Getting a NullPointerException when trying to use Spark IDF.fit()
I'm trying to run this example from the Spark documentation and I'm getting the error below. I get the same error using the Java version of the example as well. The exact line where the error occurs is:
idfModel = idf.fit(featurizedData)
Py4JJavaError: An error occurred while calling o1142.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 256.0 failed 1 times, most recent failure: Lost task 7.0 in stage 256.0 (TID 3308, localhost): java.lang.NullPointerException
The data I'm using is obtained by reading a JSON file with a few thousand records. In Java I read the file as follows:
DataFrame myData = sqlContext.read().json("myJsonFile.json");
The rest of the code is exactly the same as in the example linked above. featurizedData is a valid DataFrame; I printed its schema and the first element, and everything looks as expected. I have no idea why I'm getting a NullPointerException.
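For reference, the PySpark version of the pipeline up to the failing call looks roughly like this (a sketch of the documented example, not my exact code; the column name "sentence" and the pre-existing sqlContext are assumptions, since the real field names depend on the JSON schema):

from pyspark.ml.feature import HashingTF, IDF, Tokenizer

# read the raw records (assumes an existing sqlContext)
myData = sqlContext.read.json("myJsonFile.json")

# tokenize the text column, then hash the tokens into term-frequency vectors
tokenizer = Tokenizer(inputCol="sentence", outputCol="words")
wordsData = tokenizer.transform(myData)

hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=20)
featurizedData = hashingTF.transform(wordsData)

# fit the IDF model -- this is the call that throws the NullPointerException
idf = IDF(inputCol="rawFeatures", outputCol="features")
idfModel = idf.fit(featurizedData)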
java apache-spark pyspark spark-dataframe
asked Dec 22 '15 at 16:03 by Kai
Any workaround? – gtzinos, Dec 31 '17 at 8:15
1 Answer
The problem is that some of your rows have nan (null) in the text field. Since the question is tagged with pyspark, use

data_nan_imputed = data.fillna("unknown", subset=["text_col1", .., "text_coln"])

This is good practice if you have a number of text columns that you want to combine into a single text column. Otherwise, you can also use

data_nan_dropped = data.dropna()

to get rid of the rows containing nan and then fit on this dataset. Hopefully it will work.

For Scala or Java, use the equivalent null-handling calls (df.na().fill(...) or df.na().drop()).
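Putting it together, here is a minimal sketch of the fix in PySpark; the single text column name "text_col" and the existing sqlContext are assumptions about your data, so adjust them to your actual schema:

from pyspark.ml.feature import HashingTF, IDF, Tokenizer

data = sqlContext.read.json("myJsonFile.json")

# Option 1: impute missing text with a sentinel token
data_clean = data.fillna("unknown", subset=["text_col"])
# Option 2: drop the rows whose text column is null instead
# data_clean = data.dropna(subset=["text_col"])

tokenizer = Tokenizer(inputCol="text_col", outputCol="words")
hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=20)
featurizedData = hashingTF.transform(tokenizer.transform(data_clean))

idf = IDF(inputCol="rawFeatures", outputCol="features")
idfModel = idf.fit(featurizedData)  # should no longer hit the NullPointerException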
answered Nov 22 '18 at 12:44 by lU5er (edited Nov 22 '18 at 12:46 by ayaio)