How can I convert these pandas columns containing strings into float while maintaining their meaning?

up vote
1
down vote

favorite

I want to convert the columns Actual, Forecast, Previous into float so I can perform calculations on them. The csv also contains some NaNs which should stay in place.

The csv file looks like this:

2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%

So far I have tried this:

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']



df = df[~(df['Actual'].isin('|','<']))]

#df = df[~df.Actual.str.contains("|")]



df['Actual'] = df['Actual'].str.replace('%', '')

df['Forecast'] = df['Forecast'].str.replace('%', '')

df['Previous'] = df['Previous'].str.replace('%', '')



df['Actual'] = df['Actual'].str.replace('K', '000')

df['Forecast'] = df['Forecast'].str.replace('K', '000')

df['Previous'] = df['Previous'].str.replace('K', '000')



for i in df['Actual']: float(i)

for i in df['Forecast']: float(i)

for i in df['Previous']: float(i)

The functions for getting rid of the | and < do not work. Many suggestions on the internet seem not to work with NaN values in the file.

Also I cannot figure out how to replace the % while at the same time move the decimal so the number representation is correct.

Hope someone can help. Thanks!

asked Nov 11 at 0:36

the painted cow

add a comment |

up vote
1
down vote

favorite

I want to convert the columns Actual, Forecast, Previous into float so I can perform calculations on them. The csv also contains some NaNs which should stay in place.

The csv file looks like this:

2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%

So far I have tried this:

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']



df = df[~(df['Actual'].isin('|','<']))]

#df = df[~df.Actual.str.contains("|")]



df['Actual'] = df['Actual'].str.replace('%', '')

df['Forecast'] = df['Forecast'].str.replace('%', '')

df['Previous'] = df['Previous'].str.replace('%', '')



df['Actual'] = df['Actual'].str.replace('K', '000')

df['Forecast'] = df['Forecast'].str.replace('K', '000')

df['Previous'] = df['Previous'].str.replace('K', '000')



for i in df['Actual']: float(i)

for i in df['Forecast']: float(i)

for i in df['Previous']: float(i)

The functions for getting rid of the | and < do not work. Many suggestions on the internet seem not to work with NaN values in the file.

Also I cannot figure out how to replace the % while at the same time move the decimal so the number representation is correct.

Hope someone can help. Thanks!

asked Nov 11 at 0:36

the painted cow

add a comment |

up vote
1
down vote

favorite

I want to convert the columns Actual, Forecast, Previous into float so I can perform calculations on them. The csv also contains some NaNs which should stay in place.

The csv file looks like this:

2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%

So far I have tried this:

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']



df = df[~(df['Actual'].isin('|','<']))]

#df = df[~df.Actual.str.contains("|")]



df['Actual'] = df['Actual'].str.replace('%', '')

df['Forecast'] = df['Forecast'].str.replace('%', '')

df['Previous'] = df['Previous'].str.replace('%', '')



df['Actual'] = df['Actual'].str.replace('K', '000')

df['Forecast'] = df['Forecast'].str.replace('K', '000')

df['Previous'] = df['Previous'].str.replace('K', '000')



for i in df['Actual']: float(i)

for i in df['Forecast']: float(i)

for i in df['Previous']: float(i)

The functions for getting rid of the | and < do not work. Many suggestions on the internet seem not to work with NaN values in the file.

Also I cannot figure out how to replace the % while at the same time move the decimal so the number representation is correct.

Hope someone can help. Thanks!

asked Nov 11 at 0:36

the painted cow

I want to convert the columns Actual, Forecast, Previous into float so I can perform calculations on them. The csv also contains some NaNs which should stay in place.

The csv file looks like this:

2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%

So far I have tried this:

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']



df = df[~(df['Actual'].isin('|','<']))]

#df = df[~df.Actual.str.contains("|")]



df['Actual'] = df['Actual'].str.replace('%', '')

df['Forecast'] = df['Forecast'].str.replace('%', '')

df['Previous'] = df['Previous'].str.replace('%', '')



df['Actual'] = df['Actual'].str.replace('K', '000')

df['Forecast'] = df['Forecast'].str.replace('K', '000')

df['Previous'] = df['Previous'].str.replace('K', '000')



for i in df['Actual']: float(i)

for i in df['Forecast']: float(i)

for i in df['Previous']: float(i)

The functions for getting rid of the | and < do not work. Many suggestions on the internet seem not to work with NaN values in the file.

Also I cannot figure out how to replace the % while at the same time move the decimal so the number representation is correct.

Hope someone can help. Thanks!

python pandas

asked Nov 11 at 0:36

the painted cow

asked Nov 11 at 0:36

the painted cow

asked Nov 11 at 0:36

the painted cow

asked Nov 11 at 0:36

the painted cow

asked Nov 11 at 0:36

the painted cow

add a comment |

2 Answers
2

active

oldest

votes

up vote
0
down vote

accepted

Not the prettiest way to do this, but I believe this is what you want:

from io import StringIO

import pandas as pd

df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')

df = df.loc[~df['Actual'].str.contains('[|<]')]

for col in ['Actual', 'Forecast', 'Previous']:

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

answered Nov 11 at 3:26

cosmic_inquiry

890210

add a comment |

up vote
0
down vote

Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.

import pandas as pd



#  Importing economic calendar

df = pd.read_csv('EconomicCalendar.csv')

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']





#  Remove no and low impact rows, remove votes beacuse of #format not convertable

df = df[df.Impact != 'Non-Economic']



event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']

df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]



for col in ['Actual', 'Forecast', 'Previous']:

    #  Remove rows with certain formats not convertable

    df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]



    #  Change %, K, M, B, T into numerics

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000



    #  Change all to numeric to perform calculation

    df[col] = pd.to_numeric(df[col])





#  Creating Surprise column which is Actual minus Forecast

df['Surprise'] = df['Actual']-df['Forecast']

answered Nov 11 at 10:44

the painted cow

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244794%2fhow-can-i-convert-these-pandas-columns-containing-strings-into-float-while-maint%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
0
down vote

accepted

Not the prettiest way to do this, but I believe this is what you want:

from io import StringIO

import pandas as pd

df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')

df = df.loc[~df['Actual'].str.contains('[|<]')]

for col in ['Actual', 'Forecast', 'Previous']:

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

answered Nov 11 at 3:26

cosmic_inquiry

890210

add a comment |

up vote
0
down vote

accepted

Not the prettiest way to do this, but I believe this is what you want:

from io import StringIO

import pandas as pd

df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')

df = df.loc[~df['Actual'].str.contains('[|<]')]

for col in ['Actual', 'Forecast', 'Previous']:

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

answered Nov 11 at 3:26

cosmic_inquiry

890210

add a comment |

up vote
0
down vote

accepted

Not the prettiest way to do this, but I believe this is what you want:

from io import StringIO

import pandas as pd

df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')

df = df.loc[~df['Actual'].str.contains('[|<]')]

for col in ['Actual', 'Forecast', 'Previous']:

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

answered Nov 11 at 3:26

cosmic_inquiry

890210

Not the prettiest way to do this, but I believe this is what you want:

from io import StringIO

import pandas as pd

df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K

2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0

2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9

2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')

df = df.loc[~df['Actual'].str.contains('[|<]')]

for col in ['Actual', 'Forecast', 'Previous']:

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

answered Nov 11 at 3:26

cosmic_inquiry

890210

answered Nov 11 at 3:26

cosmic_inquiry

890210

answered Nov 11 at 3:26

cosmic_inquiry

890210

answered Nov 11 at 3:26

cosmic_inquiry

890210

add a comment |

up vote
0
down vote

Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.

import pandas as pd



#  Importing economic calendar

df = pd.read_csv('EconomicCalendar.csv')

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']





#  Remove no and low impact rows, remove votes beacuse of #format not convertable

df = df[df.Impact != 'Non-Economic']



event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']

df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]



for col in ['Actual', 'Forecast', 'Previous']:

    #  Remove rows with certain formats not convertable

    df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]



    #  Change %, K, M, B, T into numerics

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000



    #  Change all to numeric to perform calculation

    df[col] = pd.to_numeric(df[col])





#  Creating Surprise column which is Actual minus Forecast

df['Surprise'] = df['Actual']-df['Forecast']

answered Nov 11 at 10:44

the painted cow

add a comment |

up vote
0
down vote

Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.

import pandas as pd



#  Importing economic calendar

df = pd.read_csv('EconomicCalendar.csv')

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']





#  Remove no and low impact rows, remove votes beacuse of #format not convertable

df = df[df.Impact != 'Non-Economic']



event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']

df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]



for col in ['Actual', 'Forecast', 'Previous']:

    #  Remove rows with certain formats not convertable

    df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]



    #  Change %, K, M, B, T into numerics

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000



    #  Change all to numeric to perform calculation

    df[col] = pd.to_numeric(df[col])





#  Creating Surprise column which is Actual minus Forecast

df['Surprise'] = df['Actual']-df['Forecast']

answered Nov 11 at 10:44

the painted cow

add a comment |

up vote
0
down vote

Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.

import pandas as pd



#  Importing economic calendar

df = pd.read_csv('EconomicCalendar.csv')

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']





#  Remove no and low impact rows, remove votes beacuse of #format not convertable

df = df[df.Impact != 'Non-Economic']



event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']

df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]



for col in ['Actual', 'Forecast', 'Previous']:

    #  Remove rows with certain formats not convertable

    df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]



    #  Change %, K, M, B, T into numerics

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000



    #  Change all to numeric to perform calculation

    df[col] = pd.to_numeric(df[col])





#  Creating Surprise column which is Actual minus Forecast

df['Surprise'] = df['Actual']-df['Forecast']

answered Nov 11 at 10:44

the painted cow

Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.

import pandas as pd



#  Importing economic calendar

df = pd.read_csv('EconomicCalendar.csv')

df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']





#  Remove no and low impact rows, remove votes beacuse of #format not convertable

df = df[df.Impact != 'Non-Economic']



event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']

df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]



for col in ['Actual', 'Forecast', 'Previous']:

    #  Remove rows with certain formats not convertable

    df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]



    #  Change %, K, M, B, T into numerics

    df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000

    df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000



    #  Change all to numeric to perform calculation

    df[col] = pd.to_numeric(df[col])





#  Creating Surprise column which is Actual minus Forecast

df['Surprise'] = df['Actual']-df['Forecast']

answered Nov 11 at 10:44

the painted cow

answered Nov 11 at 10:44

the painted cow

answered Nov 11 at 10:44

the painted cow

answered Nov 11 at 10:44

the painted cow

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Agfdhyk