Write DataFrame from Databricks to Data Lake
I am manipulating some data with Azure Databricks. The data lives in Azure Data Lake Storage Gen1. I mounted it into DBFS, but now, after transforming the data, I would like to write it back to my data lake.
To mount the data I used the following:
configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
           "dfs.adls.oauth2.client.id": "<your-service-client-id>",
           "dfs.adls.oauth2.credential": "<your-service-credentials>",
           "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<your-directory-id>/oauth2/token"}

dbutils.fs.mount(
    source = "adl://<your-data-lake-store-account-name>.azuredatalakestore.net/<your-directory-name>",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)
I want to write a .csv file back. For this task I am using the following line:
dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").csv("adl://<your-data-lake-store-account-name>.azuredatalakestore.net/<your-directory-name>")
However, I get the following error:
IllegalArgumentException: u'No value for dfs.adls.oauth2.access.token.provider found in conf file.'
Is there any piece of code that can help me, or a link that walks me through it?
Thanks.
azure azure-data-lake databricks
asked Aug 3 at 13:24 · edited Aug 3 at 14:38 · FelipePerezR (337)
1 Answer
If you mount Azure Data Lake Store, you should use the mount point to store your data, instead of the "adl://..." URL. For details on how to mount Azure Data Lake Store (ADLS) Gen1, see the Azure Databricks documentation. You can verify that the mount point works with:

dbutils.fs.ls("/mnt/<mount-name>")

So, after mounting ADLS Gen1, try:

dfGPS.write.mode("overwrite").option("header", "true").csv("/mnt/<mount-name>/<your-directory-name>")

Note the leading slash in "/mnt/..."; also, the .csv(...) call already selects the CSV writer, so a separate format("com.databricks.spark.csv") call is redundant.
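Alternatively, if you want to write to the "adl://..." URL directly without a mount, the same OAuth settings have to be present in the Spark configuration; the error in the question indicates exactly that they are missing. A minimal sketch, reusing the placeholder service-principal values from the mount above:

# Make the ADLS Gen1 OAuth settings available to the Spark session,
# so that direct "adl://" paths can be read and written without a mount.
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "<your-service-client-id>")
spark.conf.set("dfs.adls.oauth2.credential", "<your-service-credentials>")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/<your-directory-id>/oauth2/token")

# With the configuration in place, the direct write from the question should succeed.
dfGPS.write.mode("overwrite").option("header", "true").csv("adl://<your-data-lake-store-account-name>.azuredatalakestore.net/<your-directory-name>")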
This should work if you added the mount point properly and the service principal has the required access rights on the ADLS account.

Spark always writes multiple files into a directory, because each partition is saved individually; see also the Stack Overflow question linked in the comments below, and the single-file workaround sketched after the comments.

answered Aug 4 at 22:04 · edited Nov 11 at 16:37 · Hauke Mallow
Mr. Mallow, can you suggest a link where I can find good practices for working with Azure Databricks and Data Lake Storage Gen1? Thanks
– FelipePerezR
Aug 6 at 16:05
I updated my answer; please check the docs and also verify that you have sufficient rights to access ADLS with the service principal.
– Hauke Mallow
Aug 6 at 16:37
Thanks. It worked for me. Any suggestion about "good practices"?
– FelipePerezR
Aug 6 at 20:23
I have another question regarding this. When I write the file back to the data lake, a pseudorandom name is assigned. How can I choose the name I want for the .csv file?
– FelipePerezR
Nov 10 at 17:20
That's normal Spark behaviour; see also stackoverflow.com/questions/31674530/… .
– Hauke Mallow
Nov 11 at 16:37
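Regarding the file-naming question in the comments: Spark always writes a directory of part files, so there is no writer option that sets the final file name. A common workaround, sketched below under the assumption that the data fits comfortably into a single partition, is to coalesce to one partition and rename the part file with dbutils.fs (the paths and the name output.csv are placeholders):

# Write the DataFrame into a temporary directory as a single part file.
# coalesce(1) pulls everything into one partition, so only use it for
# data small enough to fit on a single worker.
tmp_dir = "/mnt/<mount-name>/<your-directory-name>/_tmp_csv"
dfGPS.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp_dir)

# Locate the generated part file and copy it to the desired file name.
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, "/mnt/<mount-name>/<your-directory-name>/output.csv")

# Clean up the temporary directory (True = recursive delete).
dbutils.fs.rm(tmp_dir, True)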