Parse Dataframe and store output in a single file [duplicate]











This question already has an answer here:

  • Spark split a column value into multiple rows (1 answer)

I have a DataFrame in Spark SQL (Scala) with columns A and B and the following values:

A  B
1  a|b|c
2  b|d
3  d|e|f

I need to store the output in a single text file in the following format:

1 a
1 b
1 c
2 b
2 d
3 d
3 e
3 f

How can I do that?

marked as duplicate by user6910411 (apache-spark), Nov 10 at 10:56

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

      scala apache-spark apache-spark-sql






asked Nov 10 at 8:59 by Nick
edited Nov 10 at 9:41 by SCouto

2 Answers

Accepted answer (score: 2)

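You can get the desired DataFrame with an explode and a split. The pipe is escaped as "\\|" because split takes a regular expression, and $"B" requires import spark.implicits._:

import org.apache.spark.sql.functions.{explode, split}
val resultDF = df.withColumn("B", explode(split($"B", "\\|")))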


Result:

+---+---+
|  A|  B|
+---+---+
|  1|  a|
|  1|  b|
|  1|  c|
|  2|  b|
|  2|  d|
|  3|  d|
|  3|  e|
|  3|  f|
+---+---+
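
The reshaping that split and explode perform can be illustrated without Spark. The following is only a sketch on plain Scala collections, not the Spark implementation; the names ExplodeSketch, Record and explodeB are made up for this illustration. It also shows why the pipe has to be escaped: String.split takes a regular expression, in which an unescaped | means alternation.

```scala
object ExplodeSketch {
  // One input row: the key from column A and the pipe-delimited string from column B.
  // (Record is a hypothetical type used only for this illustration.)
  final case class Record(a: Int, b: String)

  // Split B on a literal pipe and emit one (A, token) pair per token,
  // which mirrors what explode(split(...)) does to each DataFrame row.
  def explodeB(rows: Seq[Record]): Seq[(Int, String)] =
    rows.flatMap(r => r.b.split("\\|").toSeq.map(token => (r.a, token)))

  def main(args: Array[String]): Unit = {
    val input = Seq(Record(1, "a|b|c"), Record(2, "b|d"), Record(3, "d|e|f"))
    // Print each pair in the "A B" layout asked for in the question.
    explodeB(input).foreach { case (a, b) => println(s"$a $b") }
  }
}
```

Running main prints the eight lines requested in the question, from 1 a through 3 f.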


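Then you can save it as a single file with coalesce(1). Each Row is first mapped to a space-separated string; otherwise saveAsTextFile writes the Row's own string form, such as [1,a]. Note that desiredPath becomes a directory containing a single part file:

resultDF.coalesce(1).rdd.map(_.mkString(" ")).saveAsTextFile("desiredPath")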
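
An alternative that stays in the DataFrame API is to build each output line as one string column and use the text writer. This is a sketch of a different technique from the one in the answer, assuming the columns are named A and B as in the question and that spark-sql is on the classpath; SingleTextFile, toLines, writeSingle and outputDir are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws}

object SingleTextFile {
  // Build one string column holding "A B" per row; the text writer
  // accepts only a single string column. A is cast explicitly so the
  // concatenation does not rely on implicit type coercion.
  def toLines(df: DataFrame): DataFrame =
    df.select(concat_ws(" ", col("A").cast("string"), col("B")).as("value"))

  // coalesce(1) leaves a single partition, so Spark writes one part file
  // inside the directory named by outputDir.
  def writeSingle(df: DataFrame, outputDir: String): Unit =
    toLines(df).coalesce(1).write.text(outputDir)
}
```

With the exploded DataFrame from above, SingleTextFile.writeSingle(resultDF, "desiredPath") would produce the same kind of output directory. Keep in mind that coalesce(1) funnels all data through one task, which suits small results like this one.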





          • The explode function is not recognized in my code. What dependency do I need to add?
            – Nick
            Nov 10 at 10:17

          • This should be enough: import org.apache.spark.sql.functions._
            – SCouto
            Nov 10 at 10:20


















Answer (score: 0)

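You can do something like this:

import org.apache.spark.sql.functions.{col, explode, split}

val df = ???  // your input DataFrame with columns A and B
// split takes a regular expression, so the pipe is escaped as "\\|"
val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))

// one partition gives one part file; columns are separated by a space
resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")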
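
To try this end to end, the placeholder val df = ??? can be replaced with the sample data from the question. The following is a minimal sketch, assuming a local SparkSession and spark-sql on the classpath; the object name ExplodeToSingleFile, the helper explodeB and the app name are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split}

object ExplodeToSingleFile {
  // Split column B on a literal pipe and emit one row per token.
  def explodeB(df: DataFrame): DataFrame =
    df.withColumn("B", explode(split(col("B"), "\\|")))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (an assumption of this sketch).
    val spark = SparkSession.builder()
      .appName("explode-to-single-file")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables toDF on local collections

    // Sample data from the question.
    val df = Seq((1, "a|b|c"), (2, "b|d"), (3, "d|e|f")).toDF("A", "B")

    // One partition, space-delimited columns, written under path/to/file.
    explodeB(df).coalesce(1).write.option("delimiter", " ").csv("path/to/file")

    spark.stop()
  }
}
```

The write fails if path/to/file already exists, because the writer's default save mode refuses to overwrite; calling .mode("overwrite") before .csv changes that. The output is a directory holding a single part file rather than a bare file.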





          • explode(split(col : this part of your code is not recognized
            – Nick
            Nov 10 at 10:15

          • col comes from org.apache.spark.sql.functions
            – Chitral Verma
            Nov 10 at 11:16


















Accepted answer: answered Nov 10 at 9:47 by SCouto












Second answer: answered Nov 10 at 9:47 by Chitral Verma











