Dropping column from one dataframe based on column value of second dataframe in pandas
I have two dataframes, df1 and df2, each with 8 columns, as shown below:
**df1**
╔══════════════════════════════════════════════════════════╗
║John ║ Mark ║ Jane ║ Natasha ║ Oliver ║ Tony ║ Judd ║ Ron ║
╚══════════════════════════════════════════════════════════╝
**df2**
╔══════════════════════════════════════════════════╗
║True ║True ║False ║True ║False ║False ║False ║True║
╚══════════════════════════════════════════════════╝
df1 has columns that are names of different people, while df2 has column names that are boolean values. I want to drop every column in df1 whose corresponding column in df2 is False, so the resulting output should look like this:
**output**
╔════════════════════════════╗
║John ║ Mark ║ Natasha ║ Ron ║
╚════════════════════════════╝
I am reading both dataframes from CSV files.
Any and all help would be appreciated.
Note: The actual dataframes have 500 columns each. I used 8 as an example for visualization purposes and to show that the dataframes have an equal number of columns.
Thanks in advance.
python pandas dataframe
It seems an odd way to set up and transform your dataset. I'd rename df2 to have the same headers as df1 with the boolean values as the first row. Then you could easily pd.concat both dataframes and select columns based on the last row of the new df.
– Tim Gottgetreu
Nov 20 '18 at 19:33
Renaming 500 boolean values to 500 unique names would be cumbersome, to say the least. Hence I am trying this approach.
– Stevi G
Nov 20 '18 at 19:36
Not really. If they are in the same order, use
df1head = list(df1)
df2head = list(df2)
to get both header names, then
df2.rename(columns=dict(zip(df2head, df1head)), inplace=True)
– Tim Gottgetreu
Nov 20 '18 at 19:42
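The positional pairing suggested in the comment above can be sketched as follows. This is only an illustration under assumptions: the salary values are made up, the df2 labels are what pandas would typically produce for a header of True, True, False, True, and the names in df1 are assumed to be unique. It skips the rename and concat steps and selects directly from the name-to-label mapping.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files.
df1 = pd.DataFrame(
    {"John": [100, 110], "Mark": [200, 210], "Jane": [300, 310], "Natasha": [400, 410]}
)
df2 = pd.DataFrame(columns=["True", "True.1", "False", "True.2"])

# Pair each name in df1 with the df2 label in the same position.
flag_by_name = dict(zip(df1.columns, df2.columns))

# Keep the names whose paired label starts with "True"
# (this ignores the ".1", ".2" suffixes added to repeated labels).
keep = [name for name, flag in flag_by_name.items() if str(flag).startswith("True")]

result = df1[keep]
print(result)
```

The mapping relies entirely on both headers being in the same order, as the comment states.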
does df1 have any data under the header names? Or are both df's only headers? If they are both only headers, transpose them and join on index, then select by the boolean value.
– Tim Gottgetreu
Nov 20 '18 at 19:48
df1 has data under header. It is salary for each day. df1 has 19164 rows
– Stevi G
Nov 20 '18 at 19:50
edited Nov 20 '18 at 19:34
asked Nov 20 '18 at 19:08
Stevi G
1 Answer
You can do this with basic boolean indexing. However, when pandas parses df2, the duplicate column names are altered (for example to TRUE.1 and FALSE.1), so a bit of cleaning is required.
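Setup
import pandas as pd
# Column names of df1, and the labels pandas assigns to df2 after de-duplicating its repeated header values
names = ['John', 'Mark', 'Jane', 'Natasha', 'Oliver', 'Tony', 'Judd', 'Ron']
cols = ['TRUE', 'TRUE.1', 'FALSE', 'FALSE.1', 'TRUE.2', 'FALSE.2', 'FALSE.3', 'TRUE.3']
df1 = pd.DataFrame(columns=names)
df2 = pd.DataFrame(columns=cols)
# Keep the df1 columns whose positional counterpart in df2 contains 'TRUE'
df1.loc[:, df2.columns.str.contains('TRUE')]
Empty DataFrame
Columns: [John, Mark, Oliver, Ron]
Index: []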
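As a rough sketch of the cleaning step, the numeric suffixes can be stripped from the df2 labels before comparing, which gives an exact match rather than a substring match. The flag order below follows the example in the question; the salary rows are invented for illustration, and the `.N` suffix pattern is an assumption about how the duplicate labels were renamed.

```python
import pandas as pd

names = ["John", "Mark", "Jane", "Natasha", "Oliver", "Tony", "Judd", "Ron"]
# Labels for a header of True, True, False, True, False, False, False, True
# after duplicate names have been given ".1", ".2", ... suffixes.
cols = ["TRUE", "TRUE.1", "FALSE", "TRUE.2", "FALSE.1", "FALSE.2", "FALSE.3", "TRUE.3"]

# Hypothetical salary rows; the real df1 has many more rows.
df1 = pd.DataFrame(
    [[10, 20, 30, 40, 50, 60, 70, 80], [11, 21, 31, 41, 51, 61, 71, 81]],
    columns=names,
)
df2 = pd.DataFrame(columns=cols)

# Remove a trailing ".<digits>" suffix and normalise case before comparing.
base = df2.columns.str.replace(r"\.\d+$", "", regex=True).str.upper()
mask = base == "TRUE"

# The mask is positional, so both frames must have the same number of columns.
assert len(mask) == df1.shape[1]

result = df1.loc[:, mask]
print(result)
```

Because the mask is applied by position, this only works when the two headers are in the same order.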
I am reading the dataframes from CSV files, so when I try this method I get a KeyError: "None of the [Index('0','1','2','3','4','5','6','7','8',.......'499' are in [columns]". Also, both dataframes have 500 columns, not 8; I used 8 as an example for simplicity and convenience.
– Stevi G
Nov 20 '18 at 19:21
Then you haven't accurately represented your data. I'm guessing your columns aren't boolean but str, in which case you need to convert to bool.
– user3483203
Nov 20 '18 at 19:23
Post edited to give more accurate description of problem. I apologize for the misleading post
– Stevi G
Nov 20 '18 at 19:35
@SteviG try indexing with df1.loc[:, (df2.columns == 'True')]
– user3483203
Nov 20 '18 at 19:36
Thanks a lot, man. Works perfectly now :)
– Stevi G
Nov 20 '18 at 19:55
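For readers who load both files themselves, one possible way to sidestep the renamed duplicate labels is to read the flags file with `header=None`, so the True/False values arrive as a data row instead of column names. This is a sketch under assumptions, not the method used in the thread: the in-memory CSV text stands in for the real files, and the flags file is assumed to contain a single line of True/False values.

```python
import io

import pandas as pd

# In-memory stand-ins for the two CSV files (hypothetical contents).
salaries_csv = io.StringIO("John,Mark,Jane,Natasha\n100,200,300,400\n110,210,310,410\n")
flags_csv = io.StringIO("True,True,False,True\n")

df1 = pd.read_csv(salaries_csv)

# header=None keeps the flags as the first data row, so no labels are renamed.
flags = pd.read_csv(flags_csv, header=None)

# Normalise to text first so the comparison works whether the values
# were parsed as booleans or left as strings.
mask = (flags.iloc[0].astype(str).str.strip().str.lower() == "true").to_numpy()

# The mask is positional, so both files must have the same number of columns.
assert len(mask) == df1.shape[1]

result = df1.loc[:, mask]
print(result)
```

With the real files, the `io.StringIO` objects would be replaced by the file paths.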
edited Nov 20 '18 at 19:52
answered Nov 20 '18 at 19:10
user3483203