Scikit learn + Pandas ValueError: shapes (1,1) and (10,10) not aligned



























I have a problem with scikit-learn.



I'm doing a really simple linear regression problem. Given Hours Studied and the resulting Grade as training data, I want to estimate a student's grade based on how long they study.



In [1]: import pandas as pd
In [2]: path = 'Desktop/hoursgrades.csv'
In [3]: df = pd.read_csv(path)
In [4]: X = df['Hours Studied']
In [5]: y = df['Grade']
In [6]: training_data_in = list()
In [7]: training_data_out = list()
In [8]: training_data_in.append(X)
In [9]: training_data_out.append(y)
In [11]: from sklearn.linear_model import LinearRegression
In [12]: model = LinearRegression(n_jobs =-1)
In [13]: model.fit(X = training_data_in, y = training_data_out)
Out[13]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False)


In this example, the DF looks like this:



In [16]: df
Out[16]:
   Hours Studied  Grade
0              1   10.0
1              2   20.0
2              3   30.0
3              4   40.0
4              5   50.0
5              6   60.0
6              7   70.0
7              8   80.0
8              9   90.0
9             10  100.0


And X looks like this:



In [17]: X
Out[17]:
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
Name: Hours Studied, dtype: int64


And y looks like this:



In [18]: y
Out[18]:
0     10.0
1     20.0
2     30.0
3     40.0
4     50.0
5     60.0
6     70.0
7     80.0
8     90.0
9    100.0
Name: Grade, dtype: float64


So far so good: the model seems to have accepted everything I've given it. Now I want to test it with some input data: given that a student studied for 5 hours, I want the model to tell me the expected grade.



But when I pass that to the model, I get the error below.



Can anyone advise?



In [14]: studied_hour = [[5]]

In [15]: outcome = model.predict(X = studied_hour)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-6fdab4ae2efd> in <module>()
----> 1 outcome = model.predict(X = studied_hour)

~/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/base.py in predict(self, X)
254 Returns predicted values.
255 """
--> 256 return self._decision_function(X)
257
258 _preprocess_data = staticmethod(_preprocess_data)

~/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/base.py in _decision_function(self, X)
239 X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
240 return safe_sparse_dot(X, self.coef_.T,
--> 241 dense_output=True) + self.intercept_
242
243 def predict(self, X):

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
138 return ret
139 else:
--> 140 return np.dot(a, b)
141
142

ValueError: shapes (1,1) and (10,10) not aligned: 1 (dim 1) != 10 (dim 0)


I should add:



In [39]: X.shape
Out[39]: (10,)

In [40]: y.shape
Out[40]: (10,)









      python python-3.x pandas machine-learning scikit-learn














      edited Nov 17 '18 at 21:33







      kikee1222

















      asked Nov 17 '18 at 20:36









      kikee1222

























          1 Answer














          The input shapes of both X and y are incorrect: per the docs, X has to be (n_samples, n_features) and y has to be (n_samples,).



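          You see the error because each Series was wrapped in a list, so both inputs have shape (1, 10): the model thinks you have one sample with ten features and ten different outputs, and its coefficient matrix therefore has shape (10, 10), which is the (10, 10) in the message.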
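
To make the shapes concrete, here is a minimal sketch that rebuilds the question's data in memory (an assumption standing in for the CSV file) and inspects what the model actually receives. The exact wording of the ValueError depends on the installed scikit-learn version, so the sketch only records that one is raised.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in for Desktop/hoursgrades.csv, built from the table in the question.
df = pd.DataFrame({
    "Hours Studied": range(1, 11),
    "Grade": [10.0 * h for h in range(1, 11)],
})

# Appending a whole Series to a list gives ONE row holding ten values.
training_data_in = [df["Hours Studied"]]
training_data_out = [df["Grade"]]
in_shape = np.asarray(training_data_in).shape    # (1, 10)
out_shape = np.asarray(training_data_out).shape  # (1, 10)

# The model is fitted on 1 sample with 10 features and 10 targets.
model = LinearRegression().fit(training_data_in, training_data_out)
coef_shape = model.coef_.shape  # (n_targets, n_features)

# A single-feature input no longer matches the ten features seen in fit.
try:
    model.predict([[5]])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(in_shape, out_shape, coef_shape)
print(error_message)
```

The fit itself succeeds, which is why the problem only surfaces at predict time.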



          You get the correct results by using



          X = df[['Hours Studied']]  # note the double brackets, shape (10, 1)
          y = df['Grade']
          model = LinearRegression().fit(X, y)

          model.predict([[5]])
          array([50.])
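
If X already exists as a Series, there are other ways to get the same (n_samples, 1) shape. The sketch below is one possible variant, not part of the original answer: it uses `Series.to_frame()` and NumPy's `reshape(-1, 1)`, and again rebuilds the data in memory instead of reading the CSV.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in for the CSV from the question.
df = pd.DataFrame({
    "Hours Studied": range(1, 11),
    "Grade": [10.0 * h for h in range(1, 11)],
})

hours = df["Hours Studied"]                 # Series, shape (10,)
X_frame = hours.to_frame()                  # DataFrame, shape (10, 1)
X_array = hours.to_numpy().reshape(-1, 1)   # ndarray, shape (10, 1)

# Either 2D form works as X; y stays 1D with shape (n_samples,).
model = LinearRegression().fit(X_array, df["Grade"])
prediction = model.predict([[5]])           # one sample, one feature

print(X_frame.shape, X_array.shape)
print(prediction)
```

Because the data is exactly linear, the prediction for 5 hours should come out at approximately 50.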





























          • Thank you!! Can you explain in more detail why the double brackets are required in the definition of X?

            – kikee1222
            Nov 18 '18 at 14:27











          • @kikee1222 It's for selecting a list of columns (here of length 1), so that the selection yields a DataFrame instead of a Series (a 2D array instead of a 1D one).

            – Matthias Ossadnik
            Nov 19 '18 at 19:33
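
The difference described in the comment above can be checked directly. This is a small sketch on a three-row frame made up for illustration (the column names match the question, the data is shortened):

```python
import pandas as pd

# Shortened, illustrative version of the question's data.
df = pd.DataFrame({"Hours Studied": [1, 2, 3], "Grade": [10.0, 20.0, 30.0]})

single = df["Hours Studied"]    # one column label   -> Series (1D)
double = df[["Hours Studied"]]  # a list of labels   -> DataFrame (2D)

print(type(single).__name__, single.shape)
print(type(double).__name__, double.shape)
```

The outer brackets are the indexing operator and the inner brackets are an ordinary Python list, so `df[["Hours Studied"]]` is a request for a list of columns that happens to contain one name.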











          answered Nov 17 '18 at 21:59









          Matthias Ossadnik













