Parse Dataframe and store output in a single file [duplicate]











up vote
0
down vote

favorite













This question already has an answer here:




  • Spark split a column value into multiple rows

    1 answer




I have a data frame using Spark SQL in Scala with columns A and B with values:



A | B
1 a|b|c
2 b|d
3 d|e|f


I need to store the output to a single textfile in following format



1 a
1 b
1 c
2 b
2 d
3 d
3 e
3 f


How can I do that?










share|improve this question















marked as duplicate by user6910411 apache-spark
Users with the  apache-spark badge can single-handedly close apache-spark questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 10 at 10:56


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















    up vote
    0
    down vote

    favorite













    This question already has an answer here:




    • Spark split a column value into multiple rows

      1 answer




    I have a data frame using Spark SQL in Scala with columns A and B with values:



    A | B
    1 a|b|c
    2 b|d
    3 d|e|f


    I need to store the output to a single textfile in following format



    1 a
    1 b
    1 c
    2 b
    2 d
    3 d
    3 e
    3 f


    How can I do that?










    share|improve this question















    marked as duplicate by user6910411 apache-spark
    Users with the  apache-spark badge can single-handedly close apache-spark questions as duplicates and reopen them as needed.

    StackExchange.ready(function() {
    if (StackExchange.options.isMobile) return;

    $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
    var $hover = $(this).addClass('hover-bound'),
    $msg = $hover.siblings('.dupe-hammer-message');

    $hover.hover(
    function() {
    $hover.showInfoMessage('', {
    messageElement: $msg.clone().show(),
    transient: false,
    position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
    dismissable: false,
    relativeToBody: true
    });
    },
    function() {
    StackExchange.helpers.removeMessages();
    }
    );
    });
    });
    Nov 10 at 10:56


    This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

















      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite












      This question already has an answer here:




      • Spark split a column value into multiple rows

        1 answer




      I have a data frame using Spark SQL in Scala with columns A and B with values:



      A | B
      1 a|b|c
      2 b|d
      3 d|e|f


      I need to store the output to a single textfile in following format



      1 a
      1 b
      1 c
      2 b
      2 d
      3 d
      3 e
      3 f


      How can I do that?










      share|improve this question
















      This question already has an answer here:




      • Spark split a column value into multiple rows

        1 answer




      I have a data frame using Spark SQL in Scala with columns A and B with values:



      A | B
      1 a|b|c
      2 b|d
      3 d|e|f


      I need to store the output to a single textfile in following format



      1 a
      1 b
      1 c
      2 b
      2 d
      3 d
      3 e
      3 f


      How can I do that?





      This question already has an answer here:




      • Spark split a column value into multiple rows

        1 answer








      scala apache-spark apache-spark-sql






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 10 at 9:41









      SCouto

      3,71531227




      3,71531227










      asked Nov 10 at 8:59









      Nick

      96110




      96110




      marked as duplicate by user6910411 apache-spark
      Users with the  apache-spark badge can single-handedly close apache-spark questions as duplicates and reopen them as needed.

      StackExchange.ready(function() {
      if (StackExchange.options.isMobile) return;

      $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
      var $hover = $(this).addClass('hover-bound'),
      $msg = $hover.siblings('.dupe-hammer-message');

      $hover.hover(
      function() {
      $hover.showInfoMessage('', {
      messageElement: $msg.clone().show(),
      transient: false,
      position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
      dismissable: false,
      relativeToBody: true
      });
      },
      function() {
      StackExchange.helpers.removeMessages();
      }
      );
      });
      });
      Nov 10 at 10:56


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.






      marked as duplicate by user6910411 apache-spark
      Users with the  apache-spark badge can single-handedly close apache-spark questions as duplicates and reopen them as needed.

      StackExchange.ready(function() {
      if (StackExchange.options.isMobile) return;

      $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
      var $hover = $(this).addClass('hover-bound'),
      $msg = $hover.siblings('.dupe-hammer-message');

      $hover.hover(
      function() {
      $hover.showInfoMessage('', {
      messageElement: $msg.clone().show(),
      transient: false,
      position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
      dismissable: false,
      relativeToBody: true
      });
      },
      function() {
      StackExchange.helpers.removeMessages();
      }
      );
      });
      });
      Nov 10 at 10:56


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


























          2 Answers
          2






          active

          oldest

          votes

















          up vote
          2
          down vote



          accepted










          You can get the desired Dataframe with an explode and a split:



          val resultDF = df.withColumn("B", explode(split($"B", "\\|")))


          Result



          +---+---+
          | A| B|
          +---+---+
          | 1| a|
          | 1| b|
          | 1| c|
          | 2| b|
          | 2| d|
          | 3| d|
          | 3| e|
          | 3| f|
          +---+---+


          Then you can save in a single file with a coalesce(1)



            resultDF.coalesce(1).rdd.saveAsTextFile("desiredPath")





          share|improve this answer





















          • explode function is not recognized in my code. What dependency do I need to add?
            – Nick
            Nov 10 at 10:17






          • 1




            this should be enough: import org.apache.spark.sql.functions._
            – SCouto
            Nov 10 at 10:20


















          up vote
          0
          down vote













          You can do something like,



          val df = ???
          val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))

          resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")





          share|improve this answer





















          • explode(split(col : this part of your code is not recognized
            – Nick
            Nov 10 at 10:15










          • col comes from org.apache.spark.sql.functions
            – Chitral Verma
            Nov 10 at 11:16


















          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          2
          down vote



          accepted










          You can get the desired Dataframe with an explode and a split:



          val resultDF = df.withColumn("B", explode(split($"B", "\\|")))


          Result



          +---+---+
          | A| B|
          +---+---+
          | 1| a|
          | 1| b|
          | 1| c|
          | 2| b|
          | 2| d|
          | 3| d|
          | 3| e|
          | 3| f|
          +---+---+


          Then you can save in a single file with a coalesce(1)



            resultDF.coalesce(1).rdd.saveAsTextFile("desiredPath")





          share|improve this answer





















          • explode function is not recognized in my code. What dependency do I need to add?
            – Nick
            Nov 10 at 10:17






          • 1




            this should be enough: import org.apache.spark.sql.functions._
            – SCouto
            Nov 10 at 10:20















          up vote
          2
          down vote



          accepted










          You can get the desired Dataframe with an explode and a split:



          val resultDF = df.withColumn("B", explode(split($"B", "\\|")))


          Result



          +---+---+
          | A| B|
          +---+---+
          | 1| a|
          | 1| b|
          | 1| c|
          | 2| b|
          | 2| d|
          | 3| d|
          | 3| e|
          | 3| f|
          +---+---+


          Then you can save in a single file with a coalesce(1)



            resultDF.coalesce(1).rdd.saveAsTextFile("desiredPath")





          share|improve this answer





















          • explode function is not recognized in my code. What dependency do I need to add?
            – Nick
            Nov 10 at 10:17






          • 1




            this should be enough: import org.apache.spark.sql.functions._
            – SCouto
            Nov 10 at 10:20













          up vote
          2
          down vote



          accepted







          up vote
          2
          down vote



          accepted






          You can get the desired Dataframe with an explode and a split:



          val resultDF = df.withColumn("B", explode(split($"B", "\\|")))


          Result



          +---+---+
          | A| B|
          +---+---+
          | 1| a|
          | 1| b|
          | 1| c|
          | 2| b|
          | 2| d|
          | 3| d|
          | 3| e|
          | 3| f|
          +---+---+


          Then you can save in a single file with a coalesce(1)



            resultDF.coalesce(1).rdd.saveAsTextFile("desiredPath")





          share|improve this answer












          You can get the desired Dataframe with an explode and a split:



          val resultDF = df.withColumn("B", explode(split($"B", "\\|")))


          Result



          +---+---+
          | A| B|
          +---+---+
          | 1| a|
          | 1| b|
          | 1| c|
          | 2| b|
          | 2| d|
          | 3| d|
          | 3| e|
          | 3| f|
          +---+---+


          Then you can save in a single file with a coalesce(1)



            resultDF.coalesce(1).rdd.saveAsTextFile("desiredPath")






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 10 at 9:47









          SCouto

          3,71531227




          3,71531227












          • explode function is not recognized in my code. What dependency do I need to add?
            – Nick
            Nov 10 at 10:17






          • 1




            this should be enough: import org.apache.spark.sql.functions._
            – SCouto
            Nov 10 at 10:20


















          • explode function is not recognized in my code. What dependency do I need to add?
            – Nick
            Nov 10 at 10:17






          • 1




            this should be enough: import org.apache.spark.sql.functions._
            – SCouto
            Nov 10 at 10:20
















          explode function is not recognized in my code. What dependency do I need to add?
          – Nick
          Nov 10 at 10:17




          explode function is not recognized in my code. What dependency do I need to add?
          – Nick
          Nov 10 at 10:17




          1




          1




          this should be enough: import org.apache.spark.sql.functions._
          – SCouto
          Nov 10 at 10:20




          this should be enough: import org.apache.spark.sql.functions._
          – SCouto
          Nov 10 at 10:20












          up vote
          0
          down vote













          You can do something like,



          val df = ???
          val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))

          resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")





          share|improve this answer





















          • explode(split(col : this part of your code is not recognized
            – Nick
            Nov 10 at 10:15










          • col comes from org.apache.spark.sql.functions
            – Chitral Verma
            Nov 10 at 11:16















          up vote
          0
          down vote













          You can do something like,



          val df = ???
          val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))

          resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")





          share|improve this answer





















          • explode(split(col : this part of your code is not recognized
            – Nick
            Nov 10 at 10:15










          • col comes from org.apache.spark.sql.functions
            – Chitral Verma
            Nov 10 at 11:16













          up vote
          0
          down vote










          up vote
          0
          down vote









          You can do something like,



          val df = ???
          val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))

          resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")





          share|improve this answer












          You can do something like,



          val df = ???
          val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))

          resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 10 at 9:47









          Chitral Verma

          9241317




          9241317












          • explode(split(col : this part of your code is not recognized
            – Nick
            Nov 10 at 10:15










          • col comes from org.apache.spark.sql.functions
            – Chitral Verma
            Nov 10 at 11:16


















          • explode(split(col : this part of your code is not recognized
            – Nick
            Nov 10 at 10:15










          • col comes from org.apache.spark.sql.functions
            – Chitral Verma
            Nov 10 at 11:16
















          explode(split(col : this part of your code is not recognized
          – Nick
          Nov 10 at 10:15




          explode(split(col : this part of your code is not recognized
          – Nick
          Nov 10 at 10:15












          col comes from org.apache.spark.sql.functions
          – Chitral Verma
          Nov 10 at 11:16




          col comes from org.apache.spark.sql.functions
          – Chitral Verma
          Nov 10 at 11:16



          Popular posts from this blog

          鏡平學校

          ꓛꓣだゔៀៅຸ໢ທຮ໕໒ ,ໂ'໥໓າ໼ឨឲ៵៭ៈゎゔit''䖳𥁄卿' ☨₤₨こゎもょの;ꜹꟚꞖꞵꟅꞛေၦေɯ,ɨɡ𛃵𛁹ޝ޳ޠ޾,ޤޒޯ޾𫝒𫠁သ𛅤チョ'サノބޘދ𛁐ᶿᶇᶀᶋᶠ㨑㽹⻮ꧬ꧹؍۩وَؠ㇕㇃㇪ ㇦㇋㇋ṜẰᵡᴠ 軌ᵕ搜۳ٰޗޮ޷ސޯ𫖾𫅀ल, ꙭ꙰ꚅꙁꚊꞻꝔ꟠Ꝭㄤﺟޱސꧨꧼ꧴ꧯꧽ꧲ꧯ'⽹⽭⾁⿞⼳⽋២៩ញណើꩯꩤ꩸ꩮᶻᶺᶧᶂ𫳲𫪭𬸄𫵰𬖩𬫣𬊉ၲ𛅬㕦䬺𫝌𫝼,,𫟖𫞽ហៅ஫㆔ాఆఅꙒꚞꙍ,Ꙟ꙱エ ,ポテ,フࢰࢯ𫟠𫞶 𫝤𫟠ﺕﹱﻜﻣ𪵕𪭸𪻆𪾩𫔷ġ,ŧآꞪ꟥,ꞔꝻ♚☹⛵𛀌ꬷꭞȄƁƪƬșƦǙǗdžƝǯǧⱦⱰꓕꓢႋ神 ဴ၀க௭எ௫ឫោ ' េㇷㇴㇼ神ㇸㇲㇽㇴㇼㇻㇸ'ㇸㇿㇸㇹㇰㆣꓚꓤ₡₧ ㄨㄟ㄂ㄖㄎ໗ツڒذ₶।ऩछएोञयूटक़कयँृी,冬'𛅢𛅥ㇱㇵㇶ𥄥𦒽𠣧𠊓𧢖𥞘𩔋цѰㄠſtʯʭɿʆʗʍʩɷɛ,əʏダヵㄐㄘR{gỚṖḺờṠṫảḙḭᴮᵏᴘᵀᵷᵕᴜᴏᵾq﮲ﲿﴽﭙ軌ﰬﶚﶧ﫲Ҝжюїкӈㇴffצּ﬘﭅﬈軌'ffistfflſtffतभफɳɰʊɲʎ𛁱𛁖𛁮𛀉 𛂯𛀞నఋŀŲ 𫟲𫠖𫞺ຆຆ ໹້໕໗ๆทԊꧢꧠ꧰ꓱ⿝⼑ŎḬẃẖỐẅ ,ờỰỈỗﮊDžȩꭏꭎꬻ꭮ꬿꭖꭥꭅ㇭神 ⾈ꓵꓑ⺄㄄ㄪㄙㄅㄇstA۵䞽ॶ𫞑𫝄㇉㇇゜軌𩜛𩳠Jﻺ‚Üမ႕ႌႊၐၸဓၞၞၡ៸wyvtᶎᶪᶹစဎ꣡꣰꣢꣤ٗ؋لㇳㇾㇻㇱ㆐㆔,,㆟Ⱶヤマފ޼ޝަݿݞݠݷݐ',ݘ,ݪݙݵ𬝉𬜁𫝨𫞘くせぉて¼óû×ó£…𛅑הㄙくԗԀ5606神45,神796'𪤻𫞧ꓐ㄁ㄘɥɺꓵꓲ3''7034׉ⱦⱠˆ“𫝋ȍ,ꩲ軌꩷ꩶꩧꩫఞ۔فڱێظペサ神ナᴦᵑ47 9238їﻂ䐊䔉㠸﬎ffiﬣ,לּᴷᴦᵛᵽ,ᴨᵤ ᵸᵥᴗᵈꚏꚉꚟ⻆rtǟƴ𬎎

          Why https connections are so slow when debugging (stepping over) in Java?