Reconstructing files uploaded to SQL Server with Python

I am working with a SQL Server database table similar to this one:

USER_ID varchar(50), FILE_NAME ntext, FILE_CONTENT ntext

Sample data:

USER_ID:      1

FILE_NAME:    (AttachedFiles:1)=file1.pdf

FILE_CONTENT: (AttachedFiles:1)=H4sIAAAAAAAAAOy8VXQcy7Ku….

Using regular expressions, I have successfully isolated the "content" of the FILE_CONTENT field by removing the "(AttachedFiles:1)=" prefix, which leaves a string similar to this:

content_str = "H4sIAAAAAAAAAOy8VXQcy7Ku22JmZmZmspiZGS2WLGa0xc…"
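For reference, that prefix-stripping step could look like the sketch below. It assumes the prefix always has the form "(AttachedFiles:N)=" with N a number; the pattern and the helper name strip_attachment_prefix are hypothetical and not taken from the original code.

```python
import re

# Hypothetical pattern: a leading "(AttachedFiles:<digits>)=" marker.
PREFIX_RE = re.compile(r"^\(AttachedFiles:\d+\)=")


def strip_attachment_prefix(value: str) -> str:
    """Remove a leading '(AttachedFiles:N)=' marker and surrounding whitespace."""
    return PREFIX_RE.sub("", value.strip(), count=1).strip()


print(strip_attachment_prefix("(AttachedFiles:1)=file1.pdf"))  # file1.pdf
```

In the sample data both FILE_NAME and FILE_CONTENT carry the same prefix, so the same helper would apply to both columns.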

My plan is to reconstruct the file from this string so that it can be downloaded from the database. While investigating, I found this post and replicated its code like this:

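import base64
import os

content_str = 'H4sIAAAAAAAAAO19B0AUR/v33...'

with open(os.path.expanduser('test.pdf'), 'wb') as f:
    # base64.decodestring() is a deprecated alias of decodebytes() (removed in Python 3.9)
    f.write(base64.decodestring(content_str))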

...and got a TypeError: expected bytes-like object, not str
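The TypeError arises because base64.decodestring() (a deprecated alias of base64.decodebytes(), removed in Python 3.9) only accepts bytes-like input, whereas base64.b64decode() also accepts an ASCII-only str. A small sketch of the difference, using a short illustrative value rather than the real column data:

```python
import base64

encoded_str = "aGVsbG8="  # base64 for b"hello" (illustrative value)

# b64decode accepts an ASCII str directly.
print(base64.b64decode(encoded_str))  # b'hello'

# decodebytes requires bytes, so passing a str raises TypeError.
try:
    base64.decodebytes(encoded_str)
except TypeError as exc:
    print("TypeError:", exc)

# Encoding the str to ASCII bytes first makes decodebytes work as well.
print(base64.decodebytes(encoded_str.encode("ascii")))  # b'hello'
```

Both calls return identical bytes for valid input, so the .encode('ascii') step alone cannot explain a corrupt file; the decoded payload itself needs to be inspected.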

Investigating further, I found this other post and proceeded like this:

import base64
import os

content_str = 'H4sIAAAAAAAAAO19B0AUR/v33...'
encoded = content_str.encode('ascii')

with open(os.path.expanduser('test.pdf'), 'wb') as f:
    f.write(base64.decodestring(encoded))

...which successfully created a PDF. However, when I try to open it, I get an error saying that the file is corrupt.

I would appreciate any suggestions on how to proceed. I am also open to rethinking the process I've come up with if necessary. Many thanks in advance!

edited Nov 19 '18 at 19:56

Tomalak

258k51428546

asked Nov 19 '18 at 16:26

DanielaC

111


python sql-server


1 Answer

The value of FILE_CONTENT is base64-encoded. This means it is a text string drawn from a 64-character alphabet (plus "=" padding), in which every 4 characters represent 3 raw bytes. All you need to do is base64-decode the string and write the resulting bytes directly to a file.

import base64
import os

content_str = "H4sIAAAAAAAAAOy8VXQcy7Ku22JmZmZmspiZGS2WLGa0xc=="

# Decode the base64 text and write the raw bytes unchanged.
with open(os.path.expanduser('test.pdf'), 'wb') as fp:
    fp.write(base64.b64decode(content_str))
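As an extra sanity check (a sketch, not part of the original answer), you can ask b64decode() to reject characters outside the base64 alphabet with validate=True, and check that the length is a multiple of 4. By default b64decode() silently discards non-alphabet characters, so stray characters left over from the regex step, or a truncated value, could otherwise go unnoticed. The helper name strict_b64decode is hypothetical.

```python
import base64


def strict_b64decode(text: str) -> bytes:
    """Decode base64 text, failing loudly on stray characters or truncation."""
    cleaned = "".join(text.split())  # drop whitespace and line breaks only
    if len(cleaned) % 4 != 0:
        raise ValueError(
            f"length {len(cleaned)} is not a multiple of 4; value may be truncated"
        )
    # validate=True raises binascii.Error (a ValueError subclass) on
    # characters outside the base64 alphabet.
    return base64.b64decode(cleaned, validate=True)


print(strict_b64decode("aGVs\nbG8="))  # b'hello'
try:
    strict_b64decode("aGVsbG8")  # 7 characters: missing padding
except ValueError as exc:
    print(exc)
```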

The base64 sequence "H4sI" at the start of your content string decodes to the bytes 0x1f, 0x8b, 0x08. A PDF file normally starts with "%PDF-", whereas 0x1f 0x8b is the gzip signature (0x08 being the DEFLATE compression method), which indicates a gzip-compressed data stream. A PDF reader may well not understand this.
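To check which format the decoded bytes actually are, you can compare their first bytes against a few well-known signatures: "%PDF-" for PDF, 1f 8b for gzip, "PK\x03\x04" for ZIP. A minimal sketch; the sniff_format helper and its small signature table are illustrative, not exhaustive:

```python
import base64

# A few well-known file signatures ("magic numbers"); not an exhaustive list.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
}


def sniff_format(data: bytes) -> str:
    """Guess the file format from its leading bytes."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


header = base64.b64decode("H4sI")
print(header.hex())          # 1f8b08
print(sniff_format(header))  # gzip
```

Renaming a file's extension does not change these bytes, which is why a signature check tends to be more informative than trying different extensions.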

I don't know for certain whether gzip compression is a valid part of the PDF file format, but it is a standard content encoding in web (HTTP) communication, so perhaps the file stream was compressed for transfer/download and was never decompressed before being written to the database.

If your PDF reader does not accept the data as-is, decompress it before saving it to a file:

import base64
import gzip
import os

# ...

# Decode the base64 text, then gunzip the result before writing it.
with open(os.path.expanduser('test.pdf'), 'wb') as fp:
    fp.write(gzip.decompress(base64.b64decode(content_str)))
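If gzip.decompress() raises EOFError ("Compressed file ended before the end-of-stream marker was reached"), the gzip stream is incomplete. The sketch below (an addition, not from the original answer) uses zlib.decompressobj() with wbits=16 + zlib.MAX_WBITS, which tells zlib to expect a gzip header and trailer, to recover whatever can be decompressed and to report whether the end of the stream was reached. The helper name decompress_partial is hypothetical, and a truncated result may still not open as a valid PDF.

```python
import gzip
import zlib


def decompress_partial(data: bytes) -> tuple[bytes, bool]:
    """Decompress gzip data as far as possible.

    Returns (decompressed_bytes, complete), where complete is False when the
    input ended before the gzip end-of-stream marker (i.e. it is truncated).
    """
    # 16 + MAX_WBITS: expect gzip framing (header and trailer).
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    out = d.decompress(data)
    return out, d.eof


original = b"%PDF-1.4 example payload " * 50
full = gzip.compress(original)
print(decompress_partial(full)[1])  # True

partial, complete = decompress_partial(full[: len(full) // 2])
print(complete)  # False: the stream was cut off
```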

edited Nov 19 '18 at 20:28

answered Nov 19 '18 at 20:16

Tomalak

258k51428546

Thanks Tomalak! I tried your suggestion, but now I am getting "EOFError: Compressed file ended before the end-of-stream marker was reached". While investigating further, I found some threads suggesting that the error is due to file corruption. Any further suggestions would be much appreciated.

– DanielaC
Nov 20 '18 at 15:23

First, try writing the stream to a file without passing it through gzip.decompress(). Then try to open the resulting file with your PDF reader, on the off-chance that it knows what to do with it. If it complains, try opening the file in 7-Zip (which can handle many compression formats) to find out whether there is anything in it at all. gzip.decompress() may not be the right tool after all; it was an educated guess on my part.

– Tomalak
Nov 20 '18 at 15:29

I created a PDF without gzip.decompress(), but could not open it in the reader. I then changed the file's extension to .zip, .rar and .7z, but 7-Zip could not extract any of them. However, when decompressing it as .gzip, the error I get is "Unexpected end of data". Thanks again!

– DanielaC
Nov 20 '18 at 15:58

Can you upload the file you currently have somewhere? I can try to take a look at it; maybe I can figure something out. No promises, though.

– Tomalak
Nov 20 '18 at 16:02

Thank you so much, Tomalak! It's on my GitHub now: github.com/dcct84/encodedfiles_test

– DanielaC
Nov 20 '18 at 16:52
