pandas loc function refers to different row when writing vs. reading -> ValueError
When running the example code below, I get a
ValueError: cannot set using a multi-index selection indexer with a different length than the value
The error is raised upon execution of
df.loc[(9, 0), ("clouds", "type")] = np.array([None, None])
here:
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
    492
    493             if len(obj[idx]) != len(value):
--> 494                 raise ValueError
The problem seems to be connected to writing a NumPy array to a "cell" of the DataFrame. It appears that obj[idx] refers to index (20,) in the DataFrame, while it should refer to (9, 0). A few iterations before the one that raises the error, when executing
df.loc[(6, 0), ("clouds", "type")] = np.array([None, None])
no error is raised, because by coincidence obj[idx] refers to index (17,), which has two sub-indices, so that by chance len(obj[idx]) == len(value) == 2.
Remark:
When I read
df.loc[(9, 0), ("clouds", "type")].values
it correctly returns [104].
Question:
Am I using .loc incorrectly? Am I doing something else wrong? Or is this a problem within pandas? How can I avoid it?
I greatly appreciate any help, as this problem has kept me stuck for a few days now :/
Code:
import pandas as pd
import numpy as np

mi = pd.MultiIndex(levels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
                           [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
                   labels=[[0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 12, 12, 13, 14, 14,
                            14, 15, 16, 17, 17, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22],
                           [0, 1, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 0,
                            0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
mc = pd.MultiIndex(levels=[['clouds', 'group', 'header', 'vertical_visibility', 'visibility', 'weather', 'wind', 'windshear'],
                           ['', 'BR', 'DS', 'DU', 'DZ', 'FC', 'FG', 'FU', 'GR', 'GS', 'HZ', 'IC', 'PL', 'PO', 'PY', 'RA',
                            'SA', 'SG', 'SN', 'SQ', 'SS', 'UP', 'VA', 'altitude', 'ceiling', 'direction', 'form',
                            'from_date', 'from_hours', 'from_minutes', 'gust', 'icao_code', 'layer', 'more', 'origin_date',
                            'origin_hours', 'origin_minutes', 'probability', 'range', 'speed', 'till_date', 'till_hours',
                            'till_minutes', 'type', 'unit', 'valid_from_date', 'valid_from_hours', 'valid_till_date',
                            'valid_till_hours'],
                           ['bool', 'intensity', 'modifier']],
                   labels=[[0, 0, 0, 1, 1, 1],
                           [24, 32, 43, 27, 28, 29],
                           [-1, -1, -1, -1, -1, -1]])
arr = np.array(range(0, len(mi) * len(mc))).reshape(len(mi), len(mc))
df = pd.DataFrame(arr, index=mi, columns=mc)
values = {0: {0: [None]}, 1: {0: [None], 1: [None], 2: [None], 3: [None]}, 2: {0: [None], 2: [None]},
          3: {0: [None], 1: [None], 2: [None], 3: [None], 4: [None], 5: [None]}, 4: {0: [None]},
          6: {0: [None, None]}, 9: {0: [None, None]}}
for i, val in values.items():
    for j, v in val.items():
        df.loc[(i, j), ("clouds", "type")] = np.array(v)
python python-3.x pandas numpy
asked Nov 12 at 12:43 by MaxMike (edited Nov 12 at 13:10 by unutbu)
Comments:
Is it really your intention to use this loop-of-loops to insert a bunch of NumPy arrays containing [None, None] or similar into a DataFrame? This is very unusual and suggests a design problem. – John Zwinck, Nov 12 at 12:51
Thanks for your remark! What I am trying to do is to fit weather forecast objects (~100 million of them) as efficiently as possible into a DataFrame. To save memory, I wanted to convert, e.g., the "cloud type" element of each forecast into a column of categorical data. The cloud elements can take various alphanumeric values, NaN (no cloud element in the forecast) and None (cloud element given but no type). As each forecast may contain several cloud layers, I wanted to store lists/arrays instead of scalar values. – MaxMike, Nov 12 at 16:19
2 Answers
Answer by unutbu (score 1; answered Nov 12 at 13:38, edited Nov 12 at 22:59):
The ("clouds", "type", None)
column has integer dtype:
In [28]: df[("clouds", "type", None)].dtype
Out[28]: dtype('int64')
So if you want to assign NumPy arrays to this column, first change the dtype to object
:
df[("clouds", "type", None)] = df[("clouds", "type", None)].astype('object')
- Use df.at or df.iat to select or assign values to particular cells of a DataFrame.
- Use df.loc or df.iloc to select or assign values to columns, rows or sub-DataFrames.
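As a quick illustration of this distinction, here is a minimal sketch on a toy frame (hypothetical data, not the question's DataFrame):
import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # toy data for illustration only

toy.at[0, "a"] = 10          # one cell, by label
toy.iat[1, 1] = 40           # one cell, by position
toy.loc[:, "b"] = [30, 40]   # a whole column, by label
toy.iloc[1] = [20, 40]       # a whole row, by position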
Therefore, use df.at here:
df[("clouds", "type", None)] = df[("clouds", "type", None)].astype('object')
for i, val in values.items():
    for j, v in val.items():
        df.at[(i, j), ("clouds", "type", None)] = np.array(v)
which yields a df that looks like
       clouds                             group
      ceiling layer          type     from_date from_hours from_minutes
          NaN   NaN           NaN           NaN        NaN          NaN
0  0        0     1        [None]             3          4            5
   1        6     7             8             9         10           11
1  0       12    13        [None]            15         16           17
   1       18    19        [None]            21         22           23
   2       24    25        [None]            27         28           29
   3       30    31        [None]            33         34           35
2  0       36    37        [None]            39         40           41
   1       42    43            44            45         46           47
   2       48    49        [None]            51         52           53
3  0       54    55        [None]            57         58           59
   1       60    61        [None]            63         64           65
   2       66    67        [None]            69         70           71
   3       72    73        [None]            75         76           77
   4       78    79        [None]            81         82           83
   5       84    85        [None]            87         88           89
4  0       90    91        [None]            93         94           95
5  0       96    97            98            99        100          101
6  0      102   103  [None, None]           105        106          107
7  0      108   109           110           111        112          113
8  0      114   115           116           117        118          119
9  0      120   121  [None, None]           123        124          125
...
Regarding the comment that you wish to use the clouds/type column for categorical data:
Columns with categorical data must contain hashable values. Generally, it does not make sense to make mutable objects hashable. So, for instance, Python's mutable builtins (such as lists) and NumPy arrays are not hashable, but Python's immutable builtins (such as tuples) are hashable. Therefore, if you use
df.at[(i, j), ("clouds", "type", None)] = tuple(v)
then you can make the ("clouds", "type", None) column of category dtype:
df[("clouds", "type", None)] = df[("clouds", "type", None)].astype('object')
for i, val in values.items():
for j, v in val.items():
df.at[(i, j), ("clouds", "type", None)] = tuple(v)
df[("clouds", "type", None)] = df[("clouds", "type", None)].astype('category')
Notice that it is necessary to first make the column of object dtype so that it may contain Python objects such as tuples, and then convert to category dtype only after all the possible values have been assigned.
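To see the hashability point concretely, here is a minimal sketch in plain Python and NumPy (independent of the question's data):
import numpy as np

hash((None, None))                 # tuples are hashable, so they can serve as category values
try:
    hash([None, None])             # lists are mutable and therefore unhashable
except TypeError as e:
    print(e)                       # unhashable type: 'list'
try:
    hash(np.array([None, None]))   # NumPy arrays are unhashable as well
except TypeError as e:
    print(e)                       # unhashable type: 'numpy.ndarray'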
Depending on what you want to do with the data, it might also make more sense to "tidy" the data by assigning only strings to the clouds/type column and using multiple rows instead of tuples. For example,
6  0      102   103  'foo'   105   106   107
6  0      102   103  'bar'   105   106   107
instead of
6  0      102   103  ('foo', 'bar')   105   106   107
One advantage of using multiple rows is that selecting all rows with clouds/type 'foo' is now easy:
df.loc[df[("clouds", "type", None)] == 'foo']
and selecting all rows with 'foo' or 'bar' clouds/type is just as simple:
df.loc[df[("clouds", "type", None)].isin(['foo', 'bar'])]
If you use tuples, you would have to use something like
df.loc[[any(kind in item for kind in ('foo', 'bar'))
        for item in df[("clouds", "type", None)]]]
Not only is this longer and harder to read, it is also slower.
One disadvantage of using multiple rows is that it creates repeated data, which may increase memory usage. There may be ways around this, such as using multiple tables (and only joining them when required), but a discussion of this would go well beyond the scope of this question.
So, in summary: in general, use tidy data and multiple rows, and keep your DataFrame dtypes simple -- integers and floats whenever possible, strings if necessary. Try to avoid using tuples, lists or NumPy arrays as DataFrame values.
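As a minimal sketch of the tidy, multiple-row layout discussed above (hypothetical data, and assuming pandas 0.25+, where DataFrame.explode is available):
import pandas as pd

# One row per forecast, cloud types held as tuples (the layout to move away from):
wide = pd.DataFrame({"station": ["A", "B"],
                     "cloud_type": [("foo", "bar"), ("foo",)]})

# DataFrame.explode turns each tuple element into its own row:
tidy = wide.explode("cloud_type")
#   station cloud_type
# 0       A        foo
# 0       A        bar
# 1       B        foo

# Selecting by cloud type is now a plain boolean mask:
foo_rows = tidy[tidy["cloud_type"] == "foo"]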
Thanks a lot. That works. The ultimate goal, however (see my comment below the original question), is to convert this into categorical data. The integer values were just filled in to help me find the error; the usual default value is NaN. I assume I may have to find a different approach, as it seems it's not possible to use lists in columns of dtype "category". – MaxMike, Nov 12 at 16:47
There is a workaround which would require minimal change to your present code: use tuples instead of NumPy arrays as values. Then you could convert the column to dtype category. (I've edited the post above with code to show how.) But from a broader perspective, beware that -- depending on what you want to do with the DataFrame -- holding tuples or arrays as values inside a DataFrame is usually not a good idea. Using tidy data and multiple rows or multiple (joinable) DataFrames is often a better choice. – unutbu, Nov 12 at 18:49
Answer by John Zwinck (score 0; answered Nov 13 at 7:38):
I think you should either:
- create one column per possible cloud layer (if order is important), or
- use a bitmask, e.g. a column of dtype 'u8', which has 64 bits, so you can set as many cloud types as are applicable to that row (if order doesn't matter).
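A minimal sketch of that bitmask encoding (the cloud-type codes here are hypothetical):
import numpy as np
import pandas as pd

# One bit per cloud type (hypothetical codes); a row's value is the OR of its types' bits.
CLOUD_BITS = {"CU": 1 << 0, "SC": 1 << 1, "CB": 1 << 2}

df = pd.DataFrame({"clouds": np.zeros(3, dtype="u8")})
df.loc[0, "clouds"] = CLOUD_BITS["CU"] | CLOUD_BITS["CB"]  # row 0 has CU and CB layers
df.loc[1, "clouds"] = CLOUD_BITS["SC"]                     # row 1 has SC only; row 2 has none

# Membership tests are a bitwise AND:
has_cb = (df["clouds"] & CLOUD_BITS["CB"]) != 0
print(df[has_cb])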