Limitations of Keras/TensorFlow for solving linear regression tasks

I was trying to implement linear regression in Keras/TensorFlow and was surprised at how difficult it is. The standard examples work well on random data, but if the input data is changed even slightly, they stop working correctly.



I am trying to find the coefficients of y = 0.5 * x1 + 0.5 * x2.



import numpy as np
from sklearn import preprocessing
from tensorflow import keras

np.random.seed(1443)
n = 100000
x = np.zeros((n, 2))

# each feature: standardized Poisson samples, sorted in ascending order
x[:, 0] = sorted(preprocessing.scale(np.random.poisson(1000000, n)))
x[:, 1] = sorted(preprocessing.scale(np.random.poisson(1000000, n)))
y = (x[:, 0] + x[:, 1]) / 2

model = keras.Sequential()
model.add(keras.layers.Dense(1, input_shape=(2,), dtype="float32"))
model.compile(loss='mean_squared_error', optimizer='sgd')

model.fit(x, y, epochs=1000, batch_size=64)
print(model.get_weights())


The results:



| epochs | batch_size | bias      | x1         | x2         |
|--------+------------+-----------+------------+------------|
| 1000   | 64         | -5.83E-05 | 0.90410435 | 0.09594361 |
| 1000   | 1024       | -5.71E-06 | 0.98739249 | 0.01258729 |
| 1000   | 10000      | -3.07E-07 | -0.2441376 | 1.2441349  |


My first thought was that this was a bug in Keras, so I tried the R TensorFlow library:



library(tensorflow)

floatType <- "float32"
p <- 2L
X <- tf$placeholder(floatType, shape = shape(NULL, p), name = "x-data")
Y <- tf$placeholder(floatType, name = "y-data")
W <- tf$Variable(tf$zeros(list(p, 1L), dtype = floatType))
b <- tf$Variable(tf$zeros(list(1L), dtype = floatType))
Y_hat <- tf$add(tf$matmul(X, W), b)
cost <- tf$reduce_mean(tf$square(Y_hat - Y))
generator <- tf$train$GradientDescentOptimizer(learning_rate = 0.01)
optimizer <- generator$minimize(cost)

session <- tf$Session()
session$run(tf$global_variables_initializer())

set.seed(1443)
n <- 10^5
x <- matrix(replicate(p, sort(scale(rpois(n, 10^6)))), nrow = n)
y <- matrix((x[, 1] + x[, 2]) / 2)

i <- 1
batch_size <- 10000
epoch_number <- 1000
iterationNumber <- n * epoch_number / batch_size

while (iterationNumber > 0) {
  feed_dict <- dict(X = x[i:(i + batch_size - 1), , drop = FALSE],
                    Y = y[i:(i + batch_size - 1), , drop = FALSE])
  session$run(optimizer, feed_dict = feed_dict)

  # advance to the next batch, wrapping around at the end of the data
  i <- i + batch_size
  if (i > n - batch_size)
    i <- i %% batch_size

  iterationNumber <- iterationNumber - 1
}

# compare against R's exact least-squares fit
r_model <- lm(y ~ x)
tf_coef <- c(session$run(b), session$run(W))
r_coef <- r_model$coefficients
print(rbind(tf_coef, r_coef))


The results:



| epochs | batch_size | bias      | x1        | x2        |
|--------+------------+-----------+-----------+-----------|
| 2000   | 64         | -1.33E-06 | 0.500307  | 0.4996932 |
| 1000   | 1000       | 2.79E-08  | 0.5000809 | 0.499919  |
| 1000   | 10000      | -4.33E-07 | 0.5004921 | 0.499507  |
| 1000   | 100000     | 2.96E-18  | 0.5       | 0.5       |


TensorFlow finds the correct result only when the batch size equals the number of samples and the optimizer is plain SGD. With "adam" or "adagrad", the errors were much larger.
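(Editorial note, not from the original post.) All of the runs above use float32, whose machine epsilon (~1.2e-7) is on the same order as the precision asked about below, so the dtype itself is one variable worth isolating. A minimal sketch of switching Keras to float64 before building the model; this only raises numeric precision and does not change the optimizer:

from tensorflow import keras

# Keras builds layers in float32 by default; switch the backend default
# dtype to float64 before constructing any layers.
keras.backend.set_floatx('float64')

model = keras.Sequential()
model.add(keras.layers.Dense(1, input_shape=(2,)))  # created as float64 now
model.compile(loss='mean_squared_error', optimizer='sgd')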




  1. For obvious reasons, I cannot set the hyperparameter batch_size = n. Can you recommend any approach to solve this problem with a precision of 1E-07 in Keras or TensorFlow?

  2. Why does TensorFlow find better solutions than Keras?


Update 1.
Based on the answer from "today" below:
Shuffling the training dataset significantly improves the TensorFlow version:



shuffledIndex <- sample(1:nrow(x))
x <- x[shuffledIndex, ]
y <- y[shuffledIndex, , drop = FALSE]


For batch size = 2000:



| (Intercept)   | x1        | x2        |
|---------------+-----------+-----------|
| -1.130693e-09 | 0.5000004 | 0.4999989 |
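
A note on the Keras side (my addition, assuming the x and y arrays from the snippet above): Keras's model.fit already reshuffles the training samples between epochs by default (shuffle=True), which is consistent with the comment thread below where explicit shuffling alone did not fix the Keras model. The Python equivalent of the R pre-shuffle would be:

import numpy as np

# Permute the samples once before training; model.fit(..., shuffle=True)
# already does this per epoch by default, so this mainly matters when
# batching manually as in the R loop above.
idx = np.random.permutation(len(x))
x, y = x[idx], y[idx]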

tensorflow machine-learning keras linear-regression

asked Nov 20 '18 at 13:25 by Andrei Pazniak, edited Nov 22 '18 at 13:47

  • I liked this question :)

    – today
    Nov 21 '18 at 8:31

  • If the answer resolved your issue, kindly accept it by clicking on the checkmark next to the answer to mark it as "answered" - see What should I do when someone answers my question?

    – today
    Nov 26 '18 at 15:56

  • It resolves my issue in TensorFlow; however, the simple Keras model was not improved by shuffling the training data. The model is so simple that I am starting to suspect a bug in the Keras library.

    – Andrei Pazniak
    Dec 2 '18 at 15:14

  • My answer was all in Keras! The issue is not about shuffling or not. It is about sorting! You should not sort the values.

    – today
    Dec 2 '18 at 15:25

  • I am sorry that I did not explain well enough. I found this problem when I tried to implement a simple moving average on EUR/USD rates. I can't attach 100 MB of input data here, so I picked the closest model: a sorted Poisson distribution. An example of the real data: [[1.13005/1.13007], [1.13006/1.13007], [1.13016/1.13018], [1.13026/1.13027]]. The real data has properties similar to the sorted Poisson data, so I am looking for a solution for sorted data. It works in TensorFlow but not in Keras.

    – Andrei Pazniak
    Dec 3 '18 at 0:09

1 Answer

The problem is that you are sorting the generated random numbers for each feature. As a result, the two feature columns end up very close to each other:





>>> np.mean(np.abs(x[:,0]-x[:,1]))
0.004125721684553685


As a result we would have:



y = (x1 + x2) / 2
~= (x1 + x1) / 2
= x1
= 0.5 * x1 + 0.5 * x1
= 0.3 * x1 + 0.7 * x1
= -0.3 * x1 + 1.3 * x1
= 10.1 * x1 - 9.1 * x1
= thousands of other possible combinations
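
To see how close the two sorted features really are (a quick check with the arrays from the question, not part of the original answer):

>>> # both columns approximate the same quantile function, so they are
>>> # almost perfectly correlated and the design matrix is nearly
>>> # rank-deficient
>>> np.corrcoef(x[:, 0], x[:, 1])[0, 1]   # expected to be ~1.0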


In this case, the solution that Keras converges to really depends on the initial values of the weights and bias of the Dense layer. With different initial values you get different results (and for some of them it may not converge at all):



# set the initial weights of the Dense layer
model.layers[0].set_weights([np.array([[0], [1]]), np.array([0])])

# fit the model ...

# the final weights
model.get_weights()

[array([[0.00203656],
        [0.9981099 ]], dtype=float32),
 array([4.5520876e-05], dtype=float32)]  # because: y = 0 * x1 + 1 * x2 = x2 ~= (x1 + x2) / 2

# again set the weights to something different
model.layers[0].set_weights([np.array([[0], [0]]), np.array([1])])

# fit the model ...

# the final weights
model.get_weights()

[array([[0.49986306],
        [0.50013727]], dtype=float32),
 array([1.4176634e-08], dtype=float32)]  # the one you were looking for!


However, if you don't sort the features (i.e. just remove sorted), it is very likely that the converged weights will be very close to [0.5, 0.5].
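
As a footnote (an editorial addition, not part of the original answer): since this is ordinary least squares, the coefficients can also be recovered with a direct linear solve, which bypasses the iterative optimizer entirely and recovers the exact coefficients even on the sorted data, just as lm did in the R comparison:

import numpy as np

# Solve min ||A @ w - y||^2 directly; the column of ones models the bias.
# np.linalg.lstsq is SVD-based, so the near-collinear columns are handled
# gracefully. Since y = (x1 + x2) / 2 holds exactly, the residual is ~0
# and w should come out as approximately [0.5, 0.5, 0.0].
A = np.hstack([x, np.ones((len(x), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)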






answered Nov 21 '18 at 8:31 by today, edited Nov 21 '18 at 8:37

  • Actually, I was trying to simulate financial prices, and sorted numbers from a Poisson distribution are the simplest model for that. Your detailed explanation helped me understand the partial solution for the TensorFlow version: if we shuffle the training data, TensorFlow converges with a much smaller batch size (~2000 samples). Thanks for your help, I greatly appreciate it.

    – Andrei Pazniak
    Nov 22 '18 at 13:25










