Limitation of Keras/Tensorflow for solving Linear Regression tasks

I was trying to implement linear regression in Keras/TensorFlow and was very surprised by how difficult it is. The standard examples work great on random data, but if we change the input data a little, all of the examples stop working correctly.



I am trying to find the coefficients for y = 0.5 * x1 + 0.5 * x2.



import numpy as np
from sklearn import preprocessing
from tensorflow import keras

np.random.seed(1443)
n = 100000
x = np.zeros((n, 2))

# each feature: standardized Poisson samples, sorted in ascending order
x[:, 0] = sorted(preprocessing.scale(np.random.poisson(1000000, n)))
x[:, 1] = sorted(preprocessing.scale(np.random.poisson(1000000, n)))
y = (x[:, 0] + x[:, 1]) / 2

# a single Dense unit is exactly a linear regression model
model = keras.Sequential()
model.add(keras.layers.Dense(1, input_shape=(2,), dtype="float32"))
model.compile(loss='mean_squared_error', optimizer='sgd')

model.fit(x, y, epochs=1000, batch_size=64)
print(model.get_weights())


The results:



| epochs | batch_size | bias      | x1         | x2
| -------+------------+-----------+------------+-----------
| 1000   | 64         | -5.83E-05 | 0.90410435 | 0.09594361
| 1000   | 1024       | -5.71E-06 | 0.98739249 | 0.01258729
| 1000   | 10000      | -3.07E-07 | -0.2441376 | 1.2441349


My first thought was that it was a bug in Keras, so I tried the R/TensorFlow interface:



floatType <- "float32"
p <- 2L
X <- tf$placeholder(floatType, shape = shape(NULL, p), name = "x-data")
Y <- tf$placeholder(floatType, name = "y-data")
W <- tf$Variable(tf$zeros(list(p, 1L), dtype = floatType))
b <- tf$Variable(tf$zeros(list(1L), dtype = floatType))
Y_hat <- tf$add(tf$matmul(X, W), b)
cost <- tf$reduce_mean(tf$square(Y_hat - Y))
generator <- tf$train$GradientDescentOptimizer(learning_rate = 0.01)
optimizer <- generator$minimize(cost)

session <- tf$Session()
session$run(tf$global_variables_initializer())

set.seed(1443)
n <- 10^5
x <- matrix(replicate(p, sort(scale(rpois(n, 10^6)))), nrow = n)
y <- matrix((x[, 1] + x[, 2]) / 2)

i <- 1
batch_size <- 10000
epoch_number <- 1000
iterationNumber <- n * epoch_number / batch_size

while (iterationNumber > 0) {
  feed_dict <- dict(X = x[i:(i + batch_size - 1), , drop = FALSE],
                    Y = y[i:(i + batch_size - 1), , drop = FALSE])
  session$run(optimizer, feed_dict = feed_dict)

  i <- i + batch_size
  if (i > n - batch_size)
    i <- i %% batch_size

  iterationNumber <- iterationNumber - 1
}

r_model <- lm(y ~ x)
tf_coef <- c(session$run(b), session$run(W))
r_coef <- r_model$coefficients
print(rbind(tf_coef, r_coef))


The results:



| epochs | batch_size | bias      | x1        | x2
| -------+------------+-----------+-----------+----------
| 2000   | 64         | -1.33E-06 | 0.500307  | 0.4996932
| 1000   | 1000       | 2.79E-08  | 0.5000809 | 0.499919
| 1000   | 10000      | -4.33E-07 | 0.5004921 | 0.499507
| 1000   | 100000     | 2.96E-18  | 0.5       | 0.5


TensorFlow finds the correct result only when the batch size equals the number of samples and the optimizer is plain SGD. With the "adam" or "adagrad" optimizers the errors were much larger.




  1. For obvious reasons, I cannot choose the hyperparameter batch_size = n. Could you recommend any approach that solves this problem with precision 1E-07 in Keras or TensorFlow? (A closed-form baseline is sketched after this list for comparison.)

  2. Why does TensorFlow find better solutions than Keras?
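
For reference, ordinary least squares has a closed-form solution, so a precision of 1E-07 is achievable in principle. A minimal NumPy baseline sketch (my own addition for comparison, not part of the training code above):

X_design = np.column_stack([x, np.ones(n)])          # add an intercept column
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)  # exact least-squares solve
print(coef)  # approximately [0.5, 0.5, 0.0]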


Comment 1.
Based on the answer by "today" below: shuffling the training dataset significantly improves the performance of the TensorFlow version:



shuffledIndex <- sample(1:nrow(x))
x <- x[shuffledIndex, ]
y <- y[shuffledIndex, , drop = FALSE]
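
For the record, the equivalent shuffle in the Python/Keras version would be a sketch along these lines (note that Keras's model.fit already shuffles the samples each epoch by default via its shuffle=True argument):

idx = np.random.permutation(n)  # random order of the training samples
model.fit(x[idx], y[idx], epochs=1000, batch_size=2000)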


For batch size = 2000:



| (Intercept)   | x1        | x2
| --------------+-----------+----------
| -1.130693e-09 | 0.5000004 | 0.4999989









tensorflow machine-learning keras linear-regression






edited Nov 22 '18 at 13:47
asked Nov 20 '18 at 13:25
Andrei Pazniak

  • I liked this question :)

    – today
    Nov 21 '18 at 8:31











  • If the answer resolved your issue, kindly accept it by clicking on the checkmark next to the answer to mark it as "answered" - see What should I do when someone answers my question?

    – today
    Nov 26 '18 at 15:56











  • It resolves my issue in TensorFlow; however, the simple Keras model was not improved by shuffling the training data. The model is so simple that I am starting to suspect a bug in the Keras library.

    – Andrei Pazniak
    Dec 2 '18 at 15:14











  • My answer was all in Keras! The issue is not about shuffling or not. It is about sorting! You should not sort the values.

    – today
    Dec 2 '18 at 15:25













  • I am sorry that I did not explain well enough. I found this problem when I tried to implement a simple moving average on EUR/USD rates. I can't attach 100 MB of input data here, so I found the closest model - a sorted Poisson distribution. An example of the real data: [[1.13005/1.13007], [1.13006/1.13007], [1.13016/1.13018], [1.13026/1.13027]]. The real data has properties similar to the sorted Poisson, so I am looking for a solution for sorted data. It works in TensorFlow, but not in Keras.

    – Andrei Pazniak
    Dec 3 '18 at 0:09





















1 Answer

The problem is that you are sorting the generated random numbers for each feature, so the two feature columns end up very close to each other:





>>> np.mean(np.abs(x[:,0]-x[:,1]))
0.004125721684553685


As a result we would have:



y = (x1 + x2) / 2
~= (x1 + x1) / 2
= x1
= 0.5 * x1 + 0.5 * x1
= 0.3 * x1 + 0.7 * x1
= -0.3 * x1 + 1.3 * x1
= 10.1 * x1 - 9.1 * x1
= thousands of other possible combinations
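
This degeneracy is easy to confirm numerically; a minimal sketch, reusing the x and y arrays built in the question:

# many weight pairs (a, 1 - a) give a near-zero MSE because x1 ~= x2
for a in [0.5, 0.3, -0.3, 10.1]:
    mse = np.mean((a * x[:, 0] + (1 - a) * x[:, 1] - y) ** 2)
    print(f"a = {a}: mse = {mse:.2e}")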


In this case, the solution that Keras converges to depends heavily on the initial values of the weights and bias of the Dense layer. With different initial values you will get different results (and for some of them, it may not converge at all):



# set the initial weights of the Dense layer
model.layers[0].set_weights([np.array([[0], [1]]), np.array([0])])

# fit the model ...

# the final weights
model.get_weights()

[array([[0.00203656],
        [0.9981099 ]], dtype=float32),
 array([4.5520876e-05], dtype=float32)]  # because: y = 0 * x1 + 1 * x2 = x2 ~= (x1 + x2) / 2

# again, set the weights to something different
model.layers[0].set_weights([np.array([[0], [0]]), np.array([1])])

# fit the model ...

# the final weights
model.get_weights()

[array([[0.49986306],
        [0.50013727]], dtype=float32),
 array([1.4176634e-08], dtype=float32)]  # the one you were looking for!


However, if you don't sort the features (i.e. just remove sorted), it is very likely that the converged weights will be very close to [0.5, 0.5].
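
As a quick check, here is a minimal sketch of the same setup without sorting (it assumes the question's imports and n; a small epoch count should already suffice, since the unsorted problem is well-conditioned):

# same data generation as the question, but without sorted()
x_u = np.zeros((n, 2))
x_u[:, 0] = preprocessing.scale(np.random.poisson(1000000, n))
x_u[:, 1] = preprocessing.scale(np.random.poisson(1000000, n))
y_u = (x_u[:, 0] + x_u[:, 1]) / 2

model = keras.Sequential()
model.add(keras.layers.Dense(1, input_shape=(2,)))
model.compile(loss='mean_squared_error', optimizer='sgd')
model.fit(x_u, y_u, epochs=20, batch_size=64, verbose=0)
print(model.get_weights())  # expect weights near [[0.5], [0.5]] and bias near 0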






edited Nov 21 '18 at 8:37
answered Nov 21 '18 at 8:31
today

  • Actually, I was trying to simulate financial prices, and sorted numbers from a Poisson distribution are the simplest model. Your detailed explanation helped me understand the partial solution for the TensorFlow version: if we shuffle the training data, TensorFlow converges with a much smaller batch size (~2000 samples). Thanks for your help, I greatly appreciate it.

    – Andrei Pazniak
    Nov 22 '18 at 13:25


















