Why doesn't Dataset's foreach method require an encoder, but map does?
I have two datasets, Dataset[User] and Dataset[Book], where both User and Book are case classes. I join them like this:

val joinDS = ds1.join(ds2, "userid")

If I try to map over each element in joinDS, the compiler complains that an encoder is missing:

not enough arguments for method map: (implicit evidence$46: org.apache.spark.sql.Encoder[Unit])org.apache.spark.sql.Dataset[Unit].
Unspecified value parameter evidence$46.
Unable to find encoder for type stored in a Dataset.

But the same error does not occur if I use foreach instead of map. Why doesn't foreach require an encoder as well? I have already imported all the implicits from the Spark session, so why does map require an encoder at all when the dataset is the result of joining two datasets of case classes? Also, what type of dataset do I get from that join? Is it a Dataset[Row], or something else?
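For context, a minimal, self-contained version of the setup (everything except the userid column and the User/Book class names is made up for illustration):

import org.apache.spark.sql.SparkSession

case class User(userid: String, name: String)
case class Book(userid: String, title: String)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds1 = Seq(User("u1", "Alice")).toDS()   // Dataset[User]
val ds2 = Seq(Book("u1", "Dune")).toDS()    // Dataset[Book]

val joinDS = ds1.join(ds2, "userid")        // join returns a DataFrame, i.e. Dataset[Row]

joinDS.foreach(r => println(r))             // compiles without an explicit encoder
// joinDS.map(r => println(r))              // fails: Unable to find encoder ... Encoder[Unit]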
scala apache-spark
asked Nov 8 at 17:51 by vaer-k
Pretty sure you can't encode Unit.
– erip Nov 8 at 18:15
1 Answer
TL;DR An Encoder is required to transform the result into Spark SQL's internal format, and there is no need for that in the case of foreach (or any other sink).

Just take a look at the signatures. map is

def map[U](func: (T) ⇒ U)(implicit arg0: Encoder[U]): Dataset[U]

so, in plain words, it transforms records from T to U and then uses the Encoder of U to convert the result to the internal representation.
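To make that concrete, here is a small sketch (assuming spark.implicits._ is in scope): mapping to a type that has an encoder compiles, while mapping to Unit reproduces the error above, because Spark provides no Encoder[Unit].

val ds = Seq(1, 2, 3).toDS()        // Dataset[Int]

val strings = ds.map(_.toString)    // fine: Encoder[String] comes from spark.implicits._
// ds.map(i => println(i))          // does not compile: Unable to find encoder ... Encoder[Unit]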
foreach, on the other hand, is

def foreach(f: (T) ⇒ Unit): Unit

In other words, it doesn't return a new Dataset at all. Since there is no result to be stored, an Encoder is simply unnecessary.
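That is why foreach compiles on the joined data as well. A sketch using the question's joinDS, assuming userid is a string column:

joinDS.foreach(row => println(row))                  // a pure sink: no Encoder involved

// to use map instead, produce a value whose type has an encoder
val userIds = joinDS.map(_.getAs[String]("userid"))  // Encoder[String] via spark.implicits._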
answered Nov 8 at 18:19 by user10465355 (edited Nov 8 at 18:21)
I see. I thought it needed to encode the input.
– vaer-k Nov 8 at 18:20

@vaer-k Then it would need Encoder[T], not Encoder[U] (or both).
– Alexey Romanov Nov 9 at 8:21