Can a large file be hashed down to 32 bytes, and then reconstructed from the hash? [duplicate]












6












$begingroup$



This question already has an answer here:




  • Would it be possible to generate the original data from a SHA-512 checksum?

    5 answers




We can hash a file or data using multihash or SHA-256, but can we retrieve the original data or file from the hash?



Are there any methods to retrieve the original file or data from a hash of it without using IPFS?



Or is there any encryption method which encrypts a 5 MB file and outputs a hash-like content of 32 bytes so that we can retrieve the original file from that 32 byte content?





















$endgroup$



marked as duplicate by user61539, Maarten Bodewes, e-sushi Nov 26 '18 at 10:08


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


















  • $begingroup$
    Comments are not for extended discussion; this conversation has been moved to chat.
    $endgroup$
    – SEJPM
    Nov 22 '18 at 12:33










  • $begingroup$
    The POINT of a hash is that it is not reversible.
    $endgroup$
    – rackandboneman
    Nov 23 '18 at 12:57










  • $begingroup$
    Please research the site for similar questions before asking. That should help you avoid those constant duplicate flags.
    $endgroup$
    – e-sushi
    Nov 26 '18 at 10:16
























hash compression














edited Nov 20 '18 at 11:35









Ilmari Karonen











asked Nov 20 '18 at 9:39









Anu Davis














6 Answers


















69












$begingroup$

No, there is no way to compress (or hash or encrypt or whatever) a 5 MB file into a 32 byte hash and then reconstruct the original file just from the hash alone.



This is simply because there are many more possible 5 MB files than there are 32 byte hashes. This means that, whatever hashing or compression or other algorithm you use, it must map many different 5 MB files to the same 32 byte hash. And that means that, given only the 32 byte hash, there is no way you can possibly tell which of those different 5 MB files it was created from.



In fact, the same thing happens already if you hash 33 byte files into 32 byte hashes, and then try to reconstruct the original files from the hashes. Since there are 256 times as many 33 byte files as there are 32 byte hashes, that already means that there must be several different files that have the same hash. With 5 MB files, it's many, many, many times worse yet.
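The counting argument above can be checked on a scale small enough to enumerate. The following is only a sketch: `tiny_hash` is a made-up 1-byte hash (SHA-256 truncated to its first byte), and 2-byte inputs stand in for the 33-byte files.

```python
import hashlib
from collections import defaultdict


def tiny_hash(data: bytes) -> bytes:
    """Toy 1-byte hash: the first byte of SHA-256 (illustration only)."""
    return hashlib.sha256(data).digest()[:1]


# Hash every possible 2-byte input (65,536 of them) down to 1 byte (256 values).
buckets = defaultdict(list)
for n in range(256 ** 2):
    data = n.to_bytes(2, "big")
    buckets[tiny_hash(data)].append(data)

largest = max(len(inputs) for inputs in buckets.values())
print(len(buckets), "distinct hashes for", 256 ** 2, "inputs")
print("largest bucket holds", largest, "inputs")
```

There are 256 times as many inputs as possible hash values, so at least one bucket must hold 256 or more inputs. Given only the 1-byte hash, nothing distinguishes the inputs that share a bucket.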





So how can something like IPFS work, then?



Basically, it relies on the fact that even the number of possible 32 byte hashes is really huge* — much, much larger than the total number of actual files (of any length) that humans have ever created, or are ever likely to create. So, while we know that there must be many possible files that have the same 32 byte hash, the chance of actually finding two different files that just happen to have the same hash by chance is still so incredibly small that we can basically assume it will never happen.



(Also, cryptographic hash functions like SHA-256 are designed so that, hopefully, there are no practical ways to deliberately find files with the same hash more efficiently than by just hashing lots of files and hoping against all odds for a random collision.)



This means that if we have some kind of a (possibly distributed) database containing a bunch of files and their SHA-256 hashes, then we can be pretty sure that it will never actually contain two files with the same hash, even if that's theoretically possible.



Thus, as long as we have access to such a database, we can use the hash of any file in the database to look it up, and be almost 100% certain that we will only get one matching file back, not two or more. Technically, the probability of getting multiple matches is not quite exactly zero, but it's so incredibly small that it can be safely neglected in practice.
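The lookup idea described above can be sketched as a toy in-memory store keyed by SHA-256. This is not how IPFS is actually implemented; the `ContentStore` class and its `put`/`get` methods are hypothetical names chosen for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical; not IPFS's API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> bytes:
        """Store the data and return its 32-byte SHA-256 hash as the key."""
        key = hashlib.sha256(data).digest()
        self._blobs[key] = data
        return key

    def get(self, key: bytes) -> bytes:
        """Look the data up by hash; raises KeyError if it was never stored."""
        return self._blobs[key]


store = ContentStore()
big_file = b"x" * (5 * 1024 * 1024)  # stand-in for a 5 MB file
key = store.put(big_file)

print(len(key), "byte key retrieves", len(store.get(key)), "bytes")
```

The 32-byte key only works because the store still holds the full 5 MB. The hash selects the file; it does not contain it.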





*) In fact, it's $2^{8 \times 32} = 2^{256} =$ 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936.
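As a quick check of the footnote's arithmetic, Python's arbitrary-precision integers can compute the number of distinct 32-byte values directly (the variable names are illustrative):

```python
hash_bytes = 32
hash_count = 2 ** (8 * hash_bytes)  # 8 bits per byte, so 2**256 possible values

print(f"{hash_count:,}")
print(len(str(hash_count)), "decimal digits")
```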

















$endgroup$









  • 16




    $begingroup$
    Your first paragraph is technically incorrect. It is entirely possible to compress a 5MB file into 32 bytes. It just isn't possible to generalise it to all 5MB files (e.g. my algorithm can be If(text == completeWorksOfShakespeare) return stringThatIs32Bytes; else return text;)
    $endgroup$
    – JBentley
    Nov 21 '18 at 19:49








  • 3




    $begingroup$
    @Sean However, an ASCII file containing the entirety of Shakespeare's works with some extra textual data in the preamble comes in at 5458199 bytes. :-)
    $endgroup$
    – oakad
    Nov 22 '18 at 3:12








  • 3




    $begingroup$
    gutenberg.org/files/100/100-0.txt - this has slightly more bytes than the original revision (because of all the copyright nonsense), but it's still pretty close to 5 MiB in size.
    $endgroup$
    – oakad
    Nov 22 '18 at 6:30






  • 6




    $begingroup$
    @JBentley under your compression scheme, both stringThatIs32Bytes and completeWorksOfShakespeare compress to stringThatIs32Bytes
    $endgroup$
    – OrangeDog
    Nov 22 '18 at 11:16








  • 3




    $begingroup$
    @OrangeDog this is easily remedied by having stringThatIs32Bytes instead "compress" e.g. to completeWorksOfShakespeare.
    $endgroup$
    – Paŭlo Ebermann
    Nov 22 '18 at 23:46
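The special-case scheme discussed in the comments above, including the swap that resolves the collision OrangeDog points out, can be sketched as follows. `COMPLETE_WORKS` and `TOKEN` are hypothetical placeholders; the former is repeated filler text, not the real Gutenberg file.

```python
# Hypothetical stand-ins: one large known file and the 32-byte token reserved for it.
COMPLETE_WORKS = b"To be, or not to be... " * 200_000  # placeholder, not the real text
TOKEN = b"SHAKESPEARE-COMPLETE-WORKS-00001"
assert len(TOKEN) == 32


def swap_codec(data: bytes) -> bytes:
    """Swap the one special file with its token; leave every other input unchanged."""
    if data == COMPLETE_WORKS:
        return TOKEN
    if data == TOKEN:
        return COMPLETE_WORKS
    return data


compressed = swap_codec(COMPLETE_WORKS)
print(len(COMPLETE_WORKS), "bytes ->", len(compressed), "bytes")
print(len(TOKEN), "bytes ->", len(swap_codec(TOKEN)), "bytes")
```

Because the function is its own inverse, it is lossless for every input. The price is that exactly one input shrinks while another grows by the same amount, which is the counting argument in miniature: a lossless scheme can only shorten some inputs by lengthening others.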



















19












$begingroup$

No, it's not possible to retrieve the original input of a hash, because the input of a hash can be of almost any size (for SHA-256, anything shorter than $2^{64}$ bits, i.e. just under 2'097'152 TiB).



But the hash value always has a fixed length, e.g. a SHA-256 hash is always 256 bits long. That's why there are multiple inputs that have the same hash value; see the pigeonhole principle.



That's also why you can never retrieve the original input: you can never be sure that what you found really is the original input and not some other input that happens to have the same hash value.

















$endgroup$













  • $begingroup$
    Thanks for the answer. Is there any encryption method which encrypts a 5 MB file and outputs hash-like content of 32 bytes, so that we can retrieve the file from that 32-byte content?
    $endgroup$
    – Anu Davis
    Nov 20 '18 at 11:14








  • 6




    $begingroup$
    No. Like Maeher already said: it is impossible. You can see this by applying the pigeonhole principle: you have $2^{256}$ possible 32-byte hashes/encryptions/encodings/pumpernickels, but many, many more possible plaintexts. It would imply that most encryptions have multiple decryptions, just like this answer explains.
    $endgroup$
    – Ruben De Smet
    Nov 20 '18 at 12:11












  • $begingroup$
    Specific hashes may limit the message length, but it's not that hard to write a padding scheme which can cope with an arbitrary message length.
    $endgroup$
    – Martin Bonner
    Nov 22 '18 at 15:08










  • $begingroup$
    Small note: it is certainly possible to guess the input if enough is known about the input file. It's easy to distinguish between two known 5 MB files if most / all data of the files is known or can be guessed.
    $endgroup$
    – Maarten Bodewes
    Jan 4 at 12:19



















6












$begingroup$

The other answers are correct: there is no way to recover data from a hash.



From your phrasing, I think that this may be an instance of an X-Y problem: you need very aggressive lossless compression, and hashes plus some way to undo them are the closest thing you know of.



Accordingly you might look into an abuse of fractal compression: oversample your bitstream such that fractal compression gives an image that results in the original bitstream after a downsampling pass. In principle this can trade the length of your transmitted message for potentially large amounts of pre- and post-computation - you only have to transmit the coefficients and stopping conditions of your fractal calculation and the program that understands them, which could be as small as a few kilobytes, but the search to find those numbers is computationally hard.















$endgroup$













  • $begingroup$
    Isn't it lossy though? You'd lose a little entropy every time stuff is compressed.
    $endgroup$
    – Paul Uszak
    Nov 21 '18 at 17:33








  • 2




    $begingroup$
    I have only imagined this, not proven it correct. In principle the oversampling accounts for the lossiness with extra redundancy. In practice I don't know how well or even if it works, I'm no information theorist.
    $endgroup$
    – cakins
    Nov 21 '18 at 20:43










  • $begingroup$
    @cakins Plus one to your comment for an interesting thought experiment, totally appropriate I think to inspire new thoughts and understanding. However, it might be helpful to edit in a note that it's conjecture.
    $endgroup$
    – whitneyland
    Nov 22 '18 at 20:18



















5












$begingroup$

Simple information theory shows that this is not possible. For any given hash value, there's an infinite number of files that produce that hash (assuming there's no arbitrary limit on the input length). It's possible(*) to produce a file that produces the same hash, but because information has been discarded, there's no way to say whether or not it's the file that was originally hashed - all of those infinitely many inputs are equally plausible.





(*) "possible" in a theoretical sense - the role of a good cryptographic hash is to make such reconstruction hard - preferably as hard as randomly guessing until a match is found. More rigorously, we say that it should be computationally infeasible for an attacker to create a matching input.
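The difference between "some matching input" and "the original input" can be seen with a deliberately weak hash. This is a sketch under a loud assumption: `short_hash` truncates SHA-256 to 16 bits so that guessing finishes quickly. Against the full 256 bits the same loop would not finish in any realistic time.

```python
import hashlib
from itertools import count


def short_hash(data: bytes) -> bytes:
    """Toy 2-byte hash: SHA-256 truncated to 16 bits (illustration only)."""
    return hashlib.sha256(data).digest()[:2]


original = b"the file that was originally hashed"
target = short_hash(original)

# Try candidate inputs "0", "1", "2", ... until one hashes to the same value.
for n in count():
    candidate = str(n).encode()
    if short_hash(candidate) == target:
        break

print("found matching input:", candidate)
print("is it the original?", candidate == original)
```

The search produces an input with the right hash, but it is not the file that was hashed, and the hash alone gives no reason to prefer one matching input over another.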

















$endgroup$













  • $begingroup$
    For completeness I would just note that it's only theoretically impossible when you assume a certain context and problem space, and your context is appropriate given the topic of cryptography. However, since the answers and comments seem to veer well into practical data compression issues, others should note it's not theoretically impossible for other applications if your context doesn't rule out massive dictionaries (see my comment above on Ilmari's answer).
    $endgroup$
    – whitneyland
    Nov 22 '18 at 20:26





















2












$begingroup$

If you have a suitably small and finite set of files, that just coincidentally have unique hashes, then yes you can derive the file from the hash, using the hash as a key.



To guarantee that hash collisions do not occur for your "library" of files, you can reserve a "scratch area" (a gap somewhere) for tie breaking purposes and populate it with arbitrary values that produce a new, unique key for the whole file, which has now changed.



It is probably easier just to use a synthetic key from the outset a la Library of Congress or ISBN.
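The "scratch area" tie-breaking described above might look like the sketch below. To make collisions actually happen, it uses a hypothetical 1-byte key (`short_key`, a truncated SHA-256), and it assumes the scratch area is a 4-byte counter appended to the end of each file. Both choices are illustrative, not part of the answer.

```python
import hashlib


def short_key(data: bytes) -> bytes:
    """Toy 1-byte key (truncated SHA-256) so that collisions actually occur."""
    return hashlib.sha256(data).digest()[:1]


def add_with_scratch(library: dict, content: bytes) -> bytes:
    """Store content plus a 4-byte scratch area, bumping it until the key is unused."""
    for salt in range(2 ** 32):
        stored = content + salt.to_bytes(4, "big")  # scratch area appended at the end
        key = short_key(stored)
        if key not in library:
            library[key] = stored
            return key
    raise RuntimeError("no free key found")


library = {}
keys = [add_with_scratch(library, f"document {i}".encode()) for i in range(100)]

# Strip the 4-byte scratch area to get the original content back.
print(library[keys[0]][:-4])
```

As in the answer, the key is unique only within this one library, and retrieval still depends on the library holding the full file.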















$endgroup$





















    0












    $begingroup$

    It's not possible to get the information back from a hash. However, there is a method for converting data into a Base64 string, which is reversible. Maybe you are confusing that with a hash?
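The contrast between the two can be shown with Python's standard `base64` and `hashlib` modules. This is only a sketch, and the sample data is arbitrary:

```python
import base64
import hashlib

data = bytes(range(256)) * 4  # arbitrary sample data

encoded = base64.b64encode(data)        # reversible, but longer than the input
digest = hashlib.sha256(data).digest()  # always 32 bytes, not reversible

print("input:", len(data), "bytes")
print("Base64:", len(encoded), "bytes, decodes back:", base64.b64decode(encoded) == data)
print("SHA-256:", len(digest), "bytes")
```

Base64 can be undone precisely because it keeps all of the information, which is why its output grows rather than shrinks.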















    $endgroup$









    • 2




      $begingroup$
      Base64 conversion is just another encoding of the plaintext; the question asked for a method of reversing a hash.
      $endgroup$
      – AleksanderRas
      Nov 20 '18 at 13:25


















    edited Nov 21 '18 at 13:22

























    answered Nov 20 '18 at 12:30









    Ilmari Karonen

    35.1k373138




    35.1k373138








    • 16




      $begingroup$
      Your first paragraph is technically incorrect. It is entirely possible to compress a 5MB file into 32 bytes. It just isn't possible to generalise it to all 5MB files (e.g. my algorithm can be If(text == completeWorksOfShakespeare) return stringThatIs32Bytes; else return text;)
      $endgroup$
      – JBentley
      Nov 21 '18 at 19:49








    • 3




      $begingroup$
      @Sean However, an ASCII file containing the entirety of Shakespear's works with some extra textual data in the preamble comes in at 5458199 bytes. :-)
      $endgroup$
      – oakad
      Nov 22 '18 at 3:12








    • 3




      $begingroup$
      gutenberg.org/files/100/100-0.txt - this has slightly more bytes than the original revision (because of all the copyright nonsense), but it's still pretty close to 5 MiB in size.
      $endgroup$
      – oakad
      Nov 22 '18 at 6:30






    • 6




      $begingroup$
      @JBentley under your compression scheme, both stringThatIs32Bytes and completeWorksOfShakespeare compress to stringThatIs32Bytes
      $endgroup$
      – OrangeDog
      Nov 22 '18 at 11:16








    • 3




      $begingroup$
      @OrangeDog this is easily remedied by having stringThatIs32Bytes instead "compress" e.g. to completeWorksOfShakespeare.
      $endgroup$
      – Paŭlo Ebermann
      Nov 22 '18 at 23:46

























    19












    $begingroup$

    No, it's not possible to retrieve the original input from a hash, because the input can be of almost any size (for SHA-256, anything shorter than $2^{64}$ bits, i.e. 2'097'152 TiB).

    But the hash value always has a fixed length; e.g. a SHA-256 hash is always 256 bits long. That's why there must be multiple inputs with the same hash value; see the pigeonhole principle.

    That's also why you can never retrieve the original input: you can never be sure that what you found really is the original input and not some other input that happens to have the same hash value.
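    The pigeonhole argument can be made concrete by shrinking the hash. The sketch below is an illustration only: `tiny_hash` is a made-up 8-bit hash (the first byte of SHA-256), chosen so that a collision shows up immediately. With 257 different inputs and only 256 possible outputs, at least two inputs must share a value.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """First byte of SHA-256: a deliberately tiny 8-bit hash (256 possible values)."""
    return hashlib.sha256(data).digest()[0]


seen = {}           # tiny hash value -> first input that produced it
collision = None
for i in range(257):                  # 257 inputs, only 256 possible outputs
    msg = str(i).encode()
    h = tiny_hash(msg)
    if h in seen:
        collision = (seen[h], msg, h)  # two different inputs, same hash value
        break
    seen[h] = msg

print(collision)
```

    The same reasoning applies to the full 256-bit output; there are just far more pigeonholes, so collisions exist but are not found by accident.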

















    $endgroup$













    • $begingroup$
      Thanks for the answer. Is there any encryption method which encrypts a 5 MB file and outputs hash-like content of 32 bytes, so that we can retrieve the file from that 32-byte content?
      $endgroup$
      – Anu Davis
      Nov 20 '18 at 11:14








    • 6




      $begingroup$
      No. Like Maeher already said: it is impossible. You can see this by applying the pigeonhole principle: with 32 bytes you have $2^{256}$ possible hashes/encryptions/encodings/pumpernickels, but many, many more possible plaintexts. It would imply that most encryptions have multiple decryptions, just like this answer explains.
      $endgroup$
      – Ruben De Smet
      Nov 20 '18 at 12:11












    • $begingroup$
      Specific hashes may limit the message length, but it's not that hard to write a padding scheme which can cope with an arbitrary message length.
      $endgroup$
      – Martin Bonner
      Nov 22 '18 at 15:08










    • $begingroup$
      Small note: it is certainly possible to guess the input if enough is known about the input file. It's easy to distinguish between two known 5 MB files if most / all data of the files is known or can be guessed.
      $endgroup$
      – Maarten Bodewes
      Jan 4 at 12:19
























    edited Nov 20 '18 at 10:19

























    answered Nov 20 '18 at 10:02









    AleksanderRas























    6












    $begingroup$

    The other answers are correct: there is no way to recover data from a hash.



    From your phrasing, I think that this may be an instance of an X-Y problem: you need very aggressive lossless compression, and hashes plus some way to undo them are the closest thing you know of.



    Accordingly you might look into an abuse of fractal compression: oversample your bitstream such that fractal compression gives an image that results in the original bitstream after a downsampling pass. In principle this can trade the length of your transmitted message for potentially large amounts of pre- and post-computation - you only have to transmit the coefficients and stopping conditions of your fractal calculation and the program that understands them, which could be as small as a few kilobytes, but the search to find those numbers is computationally hard.















    $endgroup$













    • $begingroup$
      Isn't it lossy though? You'd lose a little entropy every time stuff is compressed.
      $endgroup$
      – Paul Uszak
      Nov 21 '18 at 17:33








    • 2




      $begingroup$
      I have only imagined this, not proven it correct. In principle the oversampling accounts for the lossiness with extra redundancy. In practice I don't know how well or even if it works, I'm no information theorist.
      $endgroup$
      – cakins
      Nov 21 '18 at 20:43










    • $begingroup$
      @cakins plus one to your comment for an interesting thought experiment, totally appropriate I think to inspire new thoughts and understanding. However, it might be helpful to edit in a note that it's conjecture.
      $endgroup$
      – whitneyland
      Nov 22 '18 at 20:18


























    answered Nov 21 '18 at 17:05









    cakins























    5












    $begingroup$

    Simple information theory shows that this is not possible. For any given hash value, there's an infinite number of files that produce that hash (assuming there's no arbitrary limit on the input length). It's possible(*) to produce a file that produces the same hash, but because information has been discarded, there's no way to say whether or not it's the file that was originally hashed - all of those infinitely many inputs are equally plausible.





    (*) "possible" in a theoretical sense - the role of a good cryptographic hash is to make such reconstruction hard - preferably as hard as randomly guessing until a match is found. More rigorously, we say that it should be computationally infeasible for an attacker to create a matching input.
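    To illustrate the point about discarded information, here is a small sketch that brute-forces an input for a given hash value. It assumes a deliberately weakened hash, `short_hash` (a made-up name: the first 2 bytes of SHA-256), so the search finishes quickly; against the full 256 bits the same loop would be computationally infeasible.

```python
import hashlib
from itertools import count


def short_hash(data: bytes) -> bytes:
    """First 2 bytes of SHA-256, weakened on purpose so brute force is fast."""
    return hashlib.sha256(data).digest()[:2]


original = b"the file that was actually hashed"
target = short_hash(original)

# Try candidate inputs "0", "1", "2", ... until one matches the target hash.
for n in count():
    candidate = str(n).encode()
    if short_hash(candidate) == target:
        break

print(candidate)               # some input with the same hash value
print(candidate == original)   # but it is not the file that was hashed
```

    The search does find a matching input, yet nothing in the hash value says whether that input, the original, or any other matching input was the one actually hashed.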

















    $endgroup$













    • $begingroup$
      For completeness I would just note that it's only theoretically impossible when you assume a certain context and problem space, and your context is appropriate given the topic of cryptography. However, since the answers and comments seem to veer well into practical data compression issues, others should note it's not theoretically impossible for other applications if your context doesn't rule out massive dictionaries (see my comment above in Ilmari's answer).
      $endgroup$
      – whitneyland
      Nov 22 '18 at 20:26


























    edited Nov 22 '18 at 20:41

























    answered Nov 21 '18 at 22:08









    Toby Speight

























    2












    $begingroup$

    If you have a suitably small, finite set of files that all happen to have unique hashes, then yes, you can retrieve a file from its hash by using the hash as a lookup key.

    To guarantee that hash collisions do not occur within your "library" of files, you can reserve a "scratch area" (a gap somewhere in each file) for tie-breaking purposes and populate it with arbitrary values until the whole file, which has now changed, produces a new, unique key.

    It is probably easier just to use a synthetic key from the outset, à la Library of Congress or ISBN.
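    The "scratch area" idea might look roughly like the sketch below. Everything here is an assumption made for illustration: the scratch area is a 4-byte counter appended to the file, `add_with_scratch` is a hypothetical helper, and `tiny_hash` is an 8-bit stand-in for a real hash so that collisions actually happen and the tie-breaking gets exercised.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """First byte of SHA-256; tiny on purpose so that collisions actually occur."""
    return hashlib.sha256(data).digest()[0]


def add_with_scratch(library: dict, content: bytes) -> int:
    """Append a 4-byte scratch counter and bump it until the key is unused."""
    for scratch in range(2 ** 32):
        stored = content + scratch.to_bytes(4, "big")
        key = tiny_hash(stored)
        if key not in library:
            library[key] = stored
            return key
    raise RuntimeError("no free key found")


library = {}
keys = [add_with_scratch(library, f"file {i}".encode()) for i in range(100)]

print(len(set(keys)))          # every file got its own key
print(library[keys[0]][:-4])   # strip the scratch area to get the content back
```

    With a full-size hash such as SHA-256, the first scratch value would in practice always be accepted, which is part of why a plain synthetic key is the simpler choice.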















    $endgroup$




























        answered Nov 23 '18 at 2:01
        mckenzm

            $begingroup$

            It's not possible to recover the original data from a hash. However, there is a method for converting data into a Base64 string, and that conversion is reversible. Maybe you are confusing that with a hash?
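
To make the difference concrete, here is a small Python sketch using the standard `base64` and `hashlib` modules. The sample `data` is arbitrary and chosen only for this illustration. Base64 output grows with the input and decodes back to it exactly, while a SHA-256 digest is always 32 bytes and comes with no decode operation.

```python
import base64
import hashlib

data = bytes(range(256)) * 4  # 1024 bytes of arbitrary sample data

encoded = base64.b64encode(data)        # reversible encoding, larger than the input
digest = hashlib.sha256(data).digest()  # fixed-size fingerprint, one-way

# Base64 round-trips exactly; the encoded form carries all of the input.
assert base64.b64decode(encoded) == data
assert len(encoded) > len(data)

# The digest stays at 32 bytes no matter how large the input is.
assert len(digest) == 32
assert len(hashlib.sha256(data * 1000).digest()) == 32
```

So Base64 cannot shrink a 5 MB file to 32 bytes; the encoded form is roughly a third larger than the original.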






            $endgroup$

              $begingroup$
              Base64 conversion is just another encoding of the plaintext; the question asked for a method of reversing a hash.
              $endgroup$
              – AleksanderRas
              Nov 20 '18 at 13:25

            answered Nov 20 '18 at 12:18
            Sooraj