Limitation of Keras/TensorFlow for solving linear regression tasks
I was trying to implement linear regression in Keras/TensorFlow and was very surprised how difficult it is. The standard examples work great on random data, but if we change the input data a little, all the examples stop working correctly.
I am trying to find the coefficients for y = 0.5 * x1 + 0.5 * x2.
import numpy as np
from sklearn import preprocessing
from tensorflow import keras

np.random.seed(1443)
n = 100000
x = np.zeros((n, 2))
y = np.zeros((n, 1))
# each feature is an independently drawn, scaled and *sorted* Poisson sample
x[:, 0] = sorted(preprocessing.scale(np.random.poisson(1000000, (n))))
x[:, 1] = sorted(preprocessing.scale(np.random.poisson(1000000, (n))))
y = (x[:, 0] + x[:, 1]) / 2

model = keras.Sequential()
model.add(keras.layers.Dense(1, input_shape=(2,), dtype="float32"))
model.compile(loss='mean_squared_error', optimizer='sgd')
model.fit(x, y, epochs=1000, batch_size=64)
print(model.get_weights())
The results:
| epochs | batch_size | bias      | x1         | x2
| -------+------------+-----------+------------+-----------
| 1000   | 64         | -5.83E-05 | 0.90410435 | 0.09594361
| 1000   | 1024       | -5.71E-06 | 0.98739249 | 0.01258729
| 1000   | 10000      | -3.07E-07 | -0.2441376 | 1.2441349
My first thought was that it was a bug in Keras, so I tried the R TensorFlow library:
library(tensorflow)

floatType <- "float32"
p <- 2L
X <- tf$placeholder(floatType, shape = shape(NULL, p), name = "x-data")
Y <- tf$placeholder(floatType, name = "y-data")
W <- tf$Variable(tf$zeros(list(p, 1L), dtype = floatType))
b <- tf$Variable(tf$zeros(list(1L), dtype = floatType))
Y_hat <- tf$add(tf$matmul(X, W), b)
cost <- tf$reduce_mean(tf$square(Y_hat - Y))
generator <- tf$train$GradientDescentOptimizer(learning_rate = 0.01)
optimizer <- generator$minimize(cost)
session <- tf$Session()
session$run(tf$global_variables_initializer())

set.seed(1443)
n <- 10^5
# two scaled, sorted Poisson features, as in the Keras example
x <- matrix(replicate(p, sort(scale(rpois(n, 10^6)))), nrow = n)
y <- matrix((x[, 1] + x[, 2]) / 2)

i <- 1
batch_size <- 10000
epoch_number <- 1000
iterationNumber <- n * epoch_number / batch_size
while (iterationNumber > 0) {
  feed_dict <- dict(X = x[i:(i + batch_size - 1), , drop = FALSE],
                    Y = y[i:(i + batch_size - 1), , drop = FALSE])
  session$run(optimizer, feed_dict = feed_dict)
  i <- i + batch_size
  # wrap around to the start of the data once the end is reached
  if (i > n - batch_size)
    i <- i %% batch_size
  iterationNumber <- iterationNumber - 1
}

# compare against R's exact least-squares fit
r_model <- lm(y ~ x)
tf_coef <- c(session$run(b), session$run(W))
r_coef <- r_model$coefficients
print(rbind(tf_coef, r_coef))
The results:
| epochs | batch_size | bias      | x1        | x2
| -------+------------+-----------+-----------+----------
| 2000   | 64         | -1.33E-06 | 0.500307  | 0.4996932
| 1000   | 1000       | 2.79E-08  | 0.5000809 | 0.499919
| 1000   | 10000      | -4.33E-07 | 0.5004921 | 0.499507
| 1000   | 100000     | 2.96E-18  | 0.5       | 0.5
TensorFlow finds the correct result only when the batch size equals the number of samples and the optimizer is plain SGD. With "adam" or "adagrad" the errors were much larger.
- For obvious reasons, I cannot choose the hyperparameter batch_size = n. Could you recommend any approaches to solve this problem with precision 1E-07 in Keras or TensorFlow?
- Why does TensorFlow find better solutions than Keras?
Comment 1.
Based on the answer by "today" below: shuffling the training dataset significantly improves the performance of the TensorFlow version:
shuffledIndex <- sample(1:nrow(x))
x <- x[shuffledIndex, ]
y <- y[shuffledIndex, , drop = FALSE]
For batch size = 2000:
| (Intercept)   | x1        | x2
|---------------+-----------+----------
| -1.130693e-09 | 0.5000004 | 0.4999989
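For the Keras/Python side, the equivalent shuffle would be the following (a minimal sketch, assuming the x and y arrays built in the Keras example above; note that Keras's fit already shuffles batches each epoch by default, which may be why explicit shuffling did not change the Keras results, as discussed in the comments below):
idx = np.random.permutation(len(x))   # random row order for all n samples
x, y = x[idx], y[idx]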
tensorflow machine-learning keras linear-regression
I liked this question :)
– today
Nov 21 '18 at 8:31
If the answer resolved your issue, kindly accept it by clicking on the checkmark next to the answer to mark it as "answered" - see What should I do when someone answers my question?
– today
Nov 26 '18 at 15:56
It resolves my issue in TensorFlow; however, the simple Keras model was not improved by shuffling the training data. This model is so simple that I am starting to suspect a bug in the Keras library.
– Andrei Pazniak
Dec 2 '18 at 15:14
My answer was all in Keras! The issue is not about shuffling or not. It is about sorting! You should not sort the values.
– today
Dec 2 '18 at 15:25
I am sorry that I did not explain well enough. I found this problem when I tried to implement a simple moving average on EUR/USD rates. I can't attach 100 MB of input data here, so I found the closest model: a sorted Poisson distribution. An example of the real data is: [[1.13005/1.13007], [1.13006/1.13007], [1.13016/1.13018], [1.13026/1.13027]]. The real data has similar properties to the sorted Poisson sample. So I am looking for a solution for sorted data, and it works in TensorFlow but not in Keras.
– Andrei Pazniak
Dec 3 '18 at 0:09
1 Answer
The problem is that you are sorting the generated random numbers for each feature. As a result, the two feature columns end up very close to each other:
>>> np.mean(np.abs(x[:,0]-x[:,1]))
0.004125721684553685
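To make the near-collinearity concrete (a quick sanity check, assuming the x built in the question), the sample correlation between the two sorted features is essentially 1:
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])  # expected to print a value extremely close to 1.0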
As a result we would have:
y = (x1 + x2) / 2
~= (x1 + x1) / 2
= x1
= 0.5 * x1 + 0.5 * x1
= 0.3 * x1 + 0.7 * x1
= -0.3 * x1 + 1.3 * x1
= 10.1 * x1 - 9.1 * x1
= thousands of other possible combinations
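All of these combinations fit the sorted data almost equally well, which is why mini-batch SGD drifts between them. A direct least-squares solve still pins down [0.5, 0.5], but the normal equations are nearly singular (a minimal sketch, assuming the x and y from the question):
X1 = np.column_stack([np.ones(len(x)), x])   # design matrix with an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coef)                        # expected to be close to [0, 0.5, 0.5]
print(np.linalg.cond(X1.T @ X1))   # a very large condition number, reflecting the collinearity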
In this case, the solution that Keras converges to depends heavily on the initial values of the weights and bias of the Dense layer. With different initial values you would get different results (and for some of them it may not converge at all):
# set the initial weights of the Dense layer
model.layers[0].set_weights([np.array([[0], [1]]), np.array([0])])
# fit the model ...
# the final weights
model.get_weights()

[array([[0.00203656],
        [0.9981099 ]], dtype=float32),
 array([4.5520876e-05], dtype=float32)]  # because: y = 0 * x1 + 1 * x2 = x2 ~= (x1 + x2) / 2

# again set the weights to something different
model.layers[0].set_weights([np.array([[0], [0]]), np.array([1])])
# fit the model ...
# the final weights
model.get_weights()

[array([[0.49986306],
        [0.50013727]], dtype=float32),
 array([1.4176634e-08], dtype=float32)]  # the one you were looking for!
However, if you don't sort the features (i.e. just remove sorted), it is very likely that the converged weights will be very close to [0.5, 0.5].
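A minimal way to check this claim (a sketch reusing the question's setup but skipping the sorted calls; the epoch count here is a guess and may need tuning):
np.random.seed(1443)
xu = np.zeros((n, 2))
xu[:, 0] = preprocessing.scale(np.random.poisson(1000000, (n)))   # no sorting
xu[:, 1] = preprocessing.scale(np.random.poisson(1000000, (n)))
yu = (xu[:, 0] + xu[:, 1]) / 2

m = keras.Sequential([keras.layers.Dense(1, input_shape=(2,))])
m.compile(loss='mean_squared_error', optimizer='sgd')
m.fit(xu, yu, epochs=20, batch_size=64, verbose=0)
print(m.get_weights())   # weights expected to land near [[0.5], [0.5]]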
Actually, I was trying to simulate financial prices, and sorted numbers from a Poisson distribution are the simplest model of them. Your detailed explanation helped me understand the partial solution for the TensorFlow version: if we shuffle the training data, TensorFlow converges with a much smaller batch size (~2000 samples). Thanks for your help, I greatly appreciate it.
– Andrei Pazniak
Nov 22 '18 at 13:25