pandas loc function refers to different row when writing vs. reading -> ValueError
When running the example code below, I get a
ValueError: cannot set using a multi-index selection indexer with a different length than the value
The error is raised upon execution of
df.loc[(9, 0), ("clouds", "type")] = np.array([None, None])
here:
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
    492
    493             if len(obj[idx]) != len(value):
--> 494                 raise ValueError
The problem seems to be connected to writing a NumPy array to a "cell" of the DataFrame. It appears that obj[idx] refers to index (20,) in the DataFrame, while it should refer to (9, 0). A few iterations before the one that raises the error, when executing
df.loc[(6, 0), ("clouds", "type")] = np.array([None, None])
no error is raised, because by coincidence obj[idx] refers to index (17,), which has two sub-indices, so that by chance len(obj[idx]) == len(value) == 2.
Remark:
When I read
df.loc[(9, 0), ("clouds", "type")].values
it correctly returns [104].
Question:
Am I using .loc incorrectly? Am I doing something else wrong? Or is this a problem within pandas? How can I avoid it?
I greatly appreciate any help, as this problem has kept me stuck for a few days now :/
Code:
import pandas as pd
import numpy as np

mi = pd.MultiIndex(levels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
                           [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
                   labels=[[0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 12, 12, 13, 14, 14,
                            14, 15, 16, 17, 17, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22],
                           [0, 1, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 0,
                            0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
mc = pd.MultiIndex(levels=[['clouds', 'group', 'header', 'vertical_visibility', 'visibility', 'weather', 'wind', 'windshear'],
                           ['', 'BR', 'DS', 'DU', 'DZ', 'FC', 'FG', 'FU', 'GR', 'GS', 'HZ', 'IC', 'PL', 'PO', 'PY', 'RA',
                            'SA', 'SG', 'SN', 'SQ', 'SS', 'UP', 'VA', 'altitude', 'ceiling', 'direction', 'form',
                            'from_date', 'from_hours', 'from_minutes', 'gust', 'icao_code', 'layer', 'more', 'origin_date',
                            'origin_hours', 'origin_minutes', 'probability', 'range', 'speed', 'till_date', 'till_hours',
                            'till_minutes', 'type', 'unit', 'valid_from_date', 'valid_from_hours', 'valid_till_date',
                            'valid_till_hours'],
                           ['bool', 'intensity', 'modifier']],
                   labels=[[0, 0, 0, 1, 1, 1],
                           [24, 32, 43, 27, 28, 29],
                           [-1, -1, -1, -1, -1, -1]])
arr = np.array(range(0, len(mi) * len(mc))).reshape(len(mi), len(mc))
df = pd.DataFrame(arr, index=mi, columns=mc)
values = {0: {0: [None]}, 1: {0: [None], 1: [None], 2: [None], 3: [None]}, 2: {0: [None], 2: [None]},
          3: {0: [None], 1: [None], 2: [None], 3: [None], 4: [None], 5: [None]}, 4: {0: [None]},
          6: {0: [None, None]}, 9: {0: [None, None]}}
for i, val in values.items():
    for j, v in val.items():
        df.loc[(i, j), ("clouds", "type")] = np.array(v)
python python-3.x pandas numpy
asked Nov 12 at 12:43 by MaxMike (edited Nov 12 at 13:10 by unutbu)
Comments:
Is it really your intention to use this loop-of-loops to insert a bunch of NumPy arrays containing [None, None] or similar into a DataFrame? This is very unusual and suggests a design problem. – John Zwinck, Nov 12 at 12:51
Thanks for your remark! What I am trying to do is to fit weather forecast objects (~100 million of them) as efficiently as possible into a DataFrame. To save memory, I wanted to convert, e.g., the "cloud type" element of each forecast into a column of categorical data. The cloud elements can take various alphanumeric values, NaN (no cloud element in the forecast) and None (cloud element given but no type). As each forecast may contain several cloud layers, I wanted to store lists/arrays instead of scalar values. – MaxMike, Nov 12 at 16:19
2 Answers
Answer by unutbu (score 1; answered Nov 12 at 13:38, edited Nov 12 at 22:59):
The ("clouds", "type", None)
column has integer dtype:
In [28]: df[("clouds", "type", None)].dtype
Out[28]: dtype('int64')
So if you want to assign NumPy arrays to this column, first change the dtype to object
:
df[("clouds", "type", None)] = df[("clouds", "type", None)].astype('object')
- Use df.at or df.iat to select or assign values to particular cells of a DataFrame.
- Use df.loc or df.iloc to select or assign values to columns, rows or sub-DataFrames.
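As a quick illustration of this distinction, here is a minimal sketch on a toy frame (hypothetical data, not the question's DataFrame):
import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # toy data for illustration only

toy.at[0, "a"] = 10          # one cell, by label
toy.iat[1, 1] = 40           # one cell, by position
toy.loc[:, "b"] = [30, 40]   # a whole column, by label
toy.iloc[1] = [20, 40]       # a whole row, by position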
Therefore, use df.at here:
df[("clouds", "type", None)] = df[("clouds", "type", None)].astype('object')
for i, val in values.items():
    for j, v in val.items():
        df.at[(i, j), ("clouds", "type", None)] = np.array(v)
which yields a df that looks like
       clouds                             group
      ceiling layer          type     from_date from_hours from_minutes
          NaN   NaN           NaN           NaN        NaN          NaN
0  0        0     1        [None]             3          4            5
   1        6     7             8             9         10           11
1  0       12    13        [None]            15         16           17
   1       18    19        [None]            21         22           23
   2       24    25        [None]            27         28           29
   3       30    31        [None]            33         34           35
2  0       36    37        [None]            39         40           41
   1       42    43            44            45         46           47
   2       48    49        [None]            51         52           53
3  0       54    55        [None]            57         58           59
   1       60    61        [None]            63         64           65
   2       66    67        [None]            69         70           71
   3       72    73        [None]            75         76           77
   4       78    79        [None]            81         82           83
   5       84    85        [None]            87         88           89
4  0       90    91        [None]            93         94           95
5  0       96    97            98            99        100          101
6  0      102   103  [None, None]           105        106          107
7  0      108   109           110           111        112          113
8  0      114   115           116           117        118          119
9  0      120   121  [None, None]           123        124          125
...
Regarding the comment that you wish to use the clouds/type column for categorical data:
Columns with categorical data must contain hashable values. Generally, it does not make sense to make mutable objects hashable. So, for instance, Python's mutable builtins (such as lists) and NumPy arrays are not hashable, but Python's immutable builtins (such as tuples) are hashable. Therefore, if you use
df.at[(i, j), ("clouds", "type", None)] = tuple(v)
then you can make the ("clouds", "type", None) column of category dtype:
df[("clouds", "type", None)] = df[("clouds", "type", None)].astype('object')
for i, val in values.items():
for j, v in val.items():
df.at[(i, j), ("clouds", "type", None)] = tuple(v)
df[("clouds", "type", None)] = df[("clouds", "type", None)].astype('category')
Notice that it is necessary to first make the column of object dtype so that it may contain Python objects such as tuples, and then convert to category dtype only after all the possible values have been assigned.
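To see the hashability point concretely, here is a minimal sketch in plain Python and NumPy (independent of the question's data):
import numpy as np

hash((None, None))                 # tuples are hashable, so they can serve as category values
try:
    hash([None, None])             # lists are mutable and therefore unhashable
except TypeError as e:
    print(e)                       # unhashable type: 'list'
try:
    hash(np.array([None, None]))   # NumPy arrays are unhashable as well
except TypeError as e:
    print(e)                       # unhashable type: 'numpy.ndarray'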
Depending on what you want to do with the data, it might also make more sense to "tidy" the data by assigning only strings to the clouds/type column and using multiple rows instead of tuples. For example,
6  0      102   103  'foo'   105   106   107
6  0      102   103  'bar'   105   106   107
instead of
6  0      102   103  ('foo', 'bar')   105   106   107
One advantage of using multiple rows is that selecting all rows with clouds/type 'foo' is now easy:
df.loc[df[("clouds", "type", None)] == 'foo']
and selecting all rows with 'foo' or 'bar' clouds/type is just as simple:
df.loc[df[("clouds", "type", None)].isin(['foo', 'bar'])]
If you use tuples, you would have to use something like
df.loc[[any(kind in item for kind in ('foo', 'bar'))
        for item in df[("clouds", "type", None)]]]
Not only is this longer and harder to read, it is also slower.
One disadvantage of using multiple rows is that it creates repeated data, which may increase memory usage. There may be ways around this, such as using multiple tables (and only joining them when required), but a discussion of this would go well beyond the scope of this question.
So, in summary: in general, use tidy data and multiple rows, and keep your DataFrame dtypes simple -- integers and floats whenever possible, strings if necessary. Try to avoid using tuples, lists or NumPy arrays as DataFrame values.
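As a minimal sketch of the tidy, multiple-row layout discussed above (hypothetical data, and assuming pandas 0.25+, where DataFrame.explode is available):
import pandas as pd

# One row per forecast, cloud types held as tuples (the layout to move away from):
wide = pd.DataFrame({"station": ["A", "B"],
                     "cloud_type": [("foo", "bar"), ("foo",)]})

# DataFrame.explode turns each tuple element into its own row:
tidy = wide.explode("cloud_type")
#   station cloud_type
# 0       A        foo
# 0       A        bar
# 1       B        foo

# Selecting by cloud type is now a plain boolean mask:
foo_rows = tidy[tidy["cloud_type"] == "foo"]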
Thanks a lot. That works. The ultimate goal, however (see my comment below the original question), is to convert this into categorical data. The integer values were just filled in to help me find the error; the usual default value is NaN. I assume I may have to find a different approach, as it seems it's not possible to use lists in columns of dtype "category". – MaxMike, Nov 12 at 16:47
There is a workaround which would require minimal change to your present code: use tuples instead of NumPy arrays as values. Then you could convert the column to dtype category. (I've edited the post above with code to show how.) But from a broader perspective, beware that -- depending on what you want to do with the DataFrame -- holding tuples or arrays as values inside a DataFrame is usually not a good idea. Using tidy data and multiple rows or multiple (joinable) DataFrames is often a better choice. – unutbu, Nov 12 at 18:49
Answer by John Zwinck (score 0; answered Nov 13 at 7:38):
I think you should either:
- create one column per possible cloud layer (if order is important), or
- use a bitmask, e.g. a column of dtype 'u8', which has 64 bits, so you can set as many cloud types as are applicable to that row (if order doesn't matter).
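A minimal sketch of that bitmask encoding (the cloud-type codes here are hypothetical):
import numpy as np
import pandas as pd

# One bit per cloud type (hypothetical codes); a row's value is the OR of its types' bits.
CLOUD_BITS = {"CU": 1 << 0, "SC": 1 << 1, "CB": 1 << 2}

df = pd.DataFrame({"clouds": np.zeros(3, dtype="u8")})
df.loc[0, "clouds"] = CLOUD_BITS["CU"] | CLOUD_BITS["CB"]  # row 0 has CU and CB layers
df.loc[1, "clouds"] = CLOUD_BITS["SC"]                     # row 1 has SC only; row 2 has none

# Membership tests are a bitwise AND:
has_cb = (df["clouds"] & CLOUD_BITS["CB"]) != 0
print(df[has_cb])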