Select columns which contain a string in PySpark
I have a PySpark DataFrame with a lot of columns, and I want to select the ones whose names contain a certain string, plus a few others. For example:
df.columns = ['hello_world','hello_country','hello_everyone','byebye','ciao','index']
I want to select the ones which contain 'hello' and also the column named 'index', so the result will be:
['hello_world','hello_country','hello_everyone','index']
I want something like df.select('hello*','index')
Thanks in advance :)
EDIT:
I found a quick way to solve it, so I answered my own question, Q&A style. If someone sees my solution and can provide a better one, I will appreciate it.
python pyspark pyspark-sql
edited Nov 21 '18 at 11:07
Manrique
asked Nov 21 '18 at 9:23
Manrique
3 Answers
I've found a quick and elegant way:
selected = [s for s in df.columns if 'hello' in s] + ['index']
df.select(selected)
With this solution I can add more columns without editing the for loop that Ali AzG suggested.
answered Nov 21 '18 at 9:49
Manrique
Great solution. And you do not need * before selected?
– Ali AzG
Nov 21 '18 at 9:52
Thanks! I don't :)
– Manrique
Nov 21 '18 at 9:58
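As the comment exchange above suggests, df.select accepts a plain list of column names, so no * is needed. The list comprehension could be wrapped in a small helper if it is reused. The following is only a sketch: pick_columns is a hypothetical name, not part of PySpark, and it works on the list of names alone, so it can be tried without a Spark session.

```python
def pick_columns(all_columns, substring, extra=()):
    """Return the names containing `substring`, followed by any `extra` names.

    Hypothetical helper for illustration; not a PySpark function.
    """
    matched = [name for name in all_columns if substring in name]
    # Append the explicitly requested columns, skipping any that already
    # matched so that no column is selected twice.
    return matched + [name for name in extra if name not in matched]


# Column names taken from the question.
columns = ['hello_world', 'hello_country', 'hello_everyone', 'byebye', 'ciao', 'index']
selected = pick_columns(columns, 'hello', extra=['index'])
print(selected)
```

With a real DataFrame the call would be df.select(pick_columns(df.columns, 'hello', extra=['index'])).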
You can also try the colRegex function, introduced in Spark 2.3, which lets you specify the column names as a regular expression.
Hope it helps.
Regards,
Neeraj
answered Nov 21 '18 at 13:59
neeraj bhadani
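A rough sketch of how the colRegex suggestion might look for the columns in the question. To my knowledge, DataFrame.colRegex expects the regular expression wrapped in backticks and returns a Column that can be passed to select. The names PATTERN, preview_matches and select_by_regex are made up for this example. Spark evaluates the pattern with Java regex rules, so the pure-Python preview below is only an approximation for simple patterns like this one.

```python
import re

# Regex over whole column names: anything starting with 'hello', or exactly 'index'.
PATTERN = '(hello.*|index)'


def preview_matches(columns, pattern):
    """Approximate, without Spark, which column names the pattern would match."""
    return [name for name in columns if re.fullmatch(pattern, name)]


def select_by_regex(df, pattern):
    """Select the columns of a PySpark DataFrame whose names match `pattern`.

    Assumes Spark 2.3 or later; the backticks tell colRegex to treat the
    string as a regular expression rather than a literal column name.
    """
    return df.select(df.colRegex('`' + pattern + '`'))


# Column names taken from the question.
columns = ['hello_world', 'hello_country', 'hello_everyone', 'byebye', 'ciao', 'index']
print(preview_matches(columns, PATTERN))
```

With a real DataFrame the call would be select_by_regex(df, PATTERN).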
This sample code does what you want:
hello_cols = []
for col in df.columns:
    if 'index' in col or 'hello' in col:
        hello_cols.append(col)
df.select(*hello_cols)
edited Nov 21 '18 at 9:44
Manrique
answered Nov 21 '18 at 9:39
Ali AzG
Thanks, I fixed an error in your code and it worked.
– Manrique
Nov 21 '18 at 9:45
@Antonio Manrique You're right. I wrote my code in python 2.7. Please accept my answer if it was helpful.
– Ali AzG
Nov 21 '18 at 9:46
I will give it an upvote! But I've found a better option for what I am doing; I'll post it as an answer and accept it. Thank you so much!
– Manrique
Nov 21 '18 at 9:47
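One detail worth noting about the loop in this answer: a substring test such as 'index' in col also matches longer names that merely contain 'index'. The sketch below compares it with a stricter variant that uses a prefix test for 'hello' and an exact comparison for 'index'. The column 'index_backup' is invented for the illustration and does not appear in the question.

```python
# 'index_backup' is a made-up column, added only to show the difference.
columns = ['hello_world', 'hello_country', 'byebye', 'index', 'index_backup']

# Substring test, as in the loop above: 'index_backup' is picked up too.
loose = [c for c in columns if 'index' in c or 'hello' in c]

# Stricter variant: prefix match for 'hello', exact match for 'index'.
strict = [c for c in columns if c.startswith('hello') or c == 'index']

print(loose)
print(strict)
```

Which variant is appropriate depends on how the columns are named; for the column list in the question both give the same result.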