Python: Reversibly encode alphanumeric string to integer




















I want to convert a string (composed of alphanumeric characters) into an integer and then convert this integer back into a string:



string --> int --> string



In other words, I want to represent an alphanumeric string by an integer.



I found a working solution, which I have posted as an answer, but I do not think it is the best one, and I am interested in other ideas and methods.



Please don't mark this as a duplicate just because many similar questions already exist; I specifically want an easy way of transforming a string into an integer and vice versa.



This should work for strings that contain alphanumeric characters, i.e. letters and digits.










      python string encoding int














      edited Feb 2 at 23:15 by A-B-B

      asked Nov 21 '18 at 21:28 by charel-f
























          3 Answers














          Here's what I have so far:



          string --> bytes



          mBytes = m.encode("utf-8")


          bytes --> int



          mInt = int.from_bytes(mBytes, byteorder="big")


          int --> bytes



          mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")


          bytes --> string



          m = mBytes.decode("utf-8")


          Try it out:



          m = "test123"
          mBytes = m.encode("utf-8")
          mInt = int.from_bytes(mBytes, byteorder="big")
          mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
          m2 = mBytes2.decode("utf-8")
          print(m == m2)




          Here is an equivalent, reusable version of the above that works on bytes:



          class BytesIntEncoder:

              @staticmethod
              def encode(b: bytes) -> int:
                  return int.from_bytes(b, byteorder='big')

              @staticmethod
              def decode(i: int) -> bytes:
                  # (bit_length + 7) // 8 rounds up to the number of whole bytes
                  return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')


          The type annotations are optional and can be removed. Note that int.from_bytes and int.to_bytes themselves require Python 3.2 or later.



          Test:



          >>> s = 'Test123'
          >>> b = s.encode()
          >>> b
          b'Test123'

          >>> BytesIntEncoder.encode(b)
          23755444588720691
          >>> BytesIntEncoder.decode(_)
          b'Test123'
          >>> _.decode()
          'Test123'
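
The same round trip can also be wrapped at the string level, so that callers never touch the intermediate bytes. What follows is only a minimal sketch, not part of the answer above: the names `str_to_int` and `int_to_str` are made up for illustration, and it assumes the string does not start with a NUL character, because leading zero bytes disappear once the bytes are turned into an integer.

```python
def str_to_int(s: str, encoding: str = "utf-8") -> int:
    """Encode a string as a non-negative integer (hypothetical helper)."""
    return int.from_bytes(s.encode(encoding), byteorder="big")


def int_to_str(i: int, encoding: str = "utf-8") -> str:
    """Invert str_to_int (hypothetical helper)."""
    num_bytes = (i.bit_length() + 7) // 8  # ceiling division: bits -> whole bytes
    return i.to_bytes(num_bytes, byteorder="big").decode(encoding)


if __name__ == "__main__":
    # Round-trip a few alphanumeric strings, including the empty string.
    for s in ["", "a", "test123", "Test123"]:
        assert int_to_str(str_to_int(s)) == s
    print(str_to_int("Test123"))
```

The empty string maps to 0, and 0 maps back to the empty string, because `(0).bit_length()` is 0 and a zero-length `to_bytes` call yields `b''`.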





            This is clear & simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

            – PM 2Ring
            Feb 3 at 8:23



            BTW, you can use negation to perform ceiling division, e.g. -(-n // 8).

            – PM 2Ring
            Feb 3 at 8:25



            @A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

            – PM 2Ring
            Feb 3 at 8:43

































          Assuming the character set is purely alphanumeric, i.e. a-z A-Z 0-9, each character needs only 6 bits, since 62 symbols fit within 2^6 = 64 values. Using an 8-bit byte encoding is therefore, in theory, an inefficient use of memory.



          This answer converts the input bytes into a sequence of 6-bit integers and packs these small integers into one large integer using bitwise operations. Whether this translates into real-world storage savings can be checked with sys.getsizeof; savings are more likely for longer strings.



          This implementation adapts the encoding to the chosen character set. If, for example, you were working with just string.ascii_lowercase (5 bits) rather than string.ascii_letters + string.digits (6 bits), the encoding would be correspondingly more compact.



          Unit tests are also included.



          import string


          class BytesIntEncoder:

              def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
                  num_chars = len(chars)
                  # Map each allowed character to a small code 1..num_chars. Code 0 is
                  # deliberately unused: a character encoded as 0 in the highest
                  # position would vanish on decoding.
                  # bytes(range(...)) keeps one byte per code, also for codes above 127.
                  translation = bytes(range(1, num_chars + 1))
                  self._translation_table = bytes.maketrans(chars, translation)
                  self._reverse_translation_table = bytes.maketrans(translation, chars)
                  self._num_bits_per_char = (num_chars + 1).bit_length()

              def encode(self, chars: bytes) -> int:
                  num_bits_per_char = self._num_bits_per_char
                  output, bit_idx = 0, 0
                  # The first character lands in the lowest bits, the last in the highest.
                  for chr_idx in chars.translate(self._translation_table):
                      output |= (chr_idx << bit_idx)
                      bit_idx += num_bits_per_char
                  return output

              def decode(self, i: int) -> bytes:
                  maxint = (2 ** self._num_bits_per_char) - 1  # mask for one character
                  output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
                  return output.translate(self._reverse_translation_table)


          # Test
          import itertools
          import random
          import unittest


          class TestBytesIntEncoder(unittest.TestCase):

              chars = string.ascii_letters + string.digits
              encoder = BytesIntEncoder(chars.encode())

              def _test_encoding(self, b_in: bytes):
                  i = self.encoder.encode(b_in)
                  self.assertIsInstance(i, int)
                  b_out = self.encoder.decode(i)
                  self.assertIsInstance(b_out, bytes)
                  self.assertEqual(b_in, b_out)
                  # print(b_in, i)

              def test_thoroughly_with_small_str(self):
                  for s_len in range(4):
                      for s in itertools.combinations_with_replacement(self.chars, s_len):
                          s = ''.join(s)
                          b_in = s.encode()
                          self._test_encoding(b_in)

              def test_randomly_with_large_str(self):
                  for s_len in range(256):
                      # Boolean-keyed dict: the last condition that evaluates to True wins.
                      num_samples = {s_len <= 16: 2 ** s_len,
                                     16 < s_len <= 32: s_len ** 2,
                                     s_len > 32: s_len * 2,
                                     s_len > 64: s_len,
                                     s_len > 128: 2}[True]
                      # print(s_len, num_samples)
                      for _ in range(num_samples):
                          b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
                          self._test_encoding(b_in)


          if __name__ == '__main__':
              unittest.main()


          Usage example:



          >>> encoder = BytesIntEncoder()
          >>> s = 'Test123'
          >>> b = s.encode()
          >>> b
          b'Test123'

          >>> encoder.encode(b)
          3908257788270
          >>> encoder.decode(_)
          b'Test123'
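
To see whether the 6-bit packing pays off, the bit length and the in-memory size of both integers can be compared directly. The sketch below is an illustration under assumptions, not part of the answer: `pack_8bit` and `pack_6bit` are hypothetical stand-alone helpers that mirror the two encodings, and `sys.getsizeof` reports the size of the int object in bytes, which depends on the Python implementation.

```python
import string
import sys

ALPHABET = string.ascii_letters + string.digits   # 62 symbols
BITS_PER_CHAR = (len(ALPHABET) + 1).bit_length()  # 6, same formula as the class above


def pack_8bit(s: str) -> int:
    """Plain byte-based encoding: 8 bits per character (hypothetical helper)."""
    return int.from_bytes(s.encode("ascii"), byteorder="big")


def pack_6bit(s: str) -> int:
    """Packed encoding: each character becomes a 1-based index into ALPHABET."""
    output = 0
    for position, char in enumerate(s):
        output |= (ALPHABET.index(char) + 1) << (position * BITS_PER_CHAR)
    return output


if __name__ == "__main__":
    # Columns: length, bits (8-bit), bits (6-bit), bytes in memory (8-bit), (6-bit)
    for length in (4, 16, 64, 256):
        s = ("Test123" * length)[:length]
        plain, packed = pack_8bit(s), pack_6bit(s)
        print(length, plain.bit_length(), packed.bit_length(),
              sys.getsizeof(plain), sys.getsizeof(packed))
```

For very short strings the two objects may well occupy the same number of bytes, since the int object's fixed overhead dominates; the gap in bit length only grows with the string length.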










            Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

            – charel-f
            Feb 3 at 8:31



































          Recall that a string can be encoded to bytes, which can then be encoded to an integer. Reversing the two encodings yields the bytes and then the original string.



          This encoder uses binascii to produce the same integer encoding as the one in the answer by charel-f. I know it's identical because I extensively tested it.



          Credit: this answer.



          from binascii import hexlify, unhexlify

          class BytesIntEncoder:

              @staticmethod
              def encode(b: bytes) -> int:
                  return int(hexlify(b), 16) if b != b'' else 0

              @staticmethod
              def decode(i: int) -> bytes:
                  if i == 0:
                      return b''
                  hex_digits = '%x' % i
                  # unhexlify needs an even number of hex digits; pad when the top byte is < 0x10
                  if len(hex_digits) % 2:
                      hex_digits = '0' + hex_digits
                  return unhexlify(hex_digits)


          The type annotations are optional and can be removed.



          Quick test:



          >>> s = 'Test123'
          >>> b = s.encode()
          >>> b
          b'Test123'

          >>> BytesIntEncoder.encode(b)
          23755444588720691
          >>> BytesIntEncoder.decode(_)
          b'Test123'
          >>> _.decode()
          'Test123'
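
The claim that both encoders produce the same integer can be spot-checked with a short randomized comparison. This is a sketch of one way to do it, not the test the author actually ran: the function names, the sample count, and the fixed seed are all assumptions chosen for illustration.

```python
import random
import string
from binascii import hexlify


def encode_hex(b: bytes) -> int:
    """binascii-based encoding, as in this answer."""
    return int(hexlify(b), 16) if b else 0


def encode_from_bytes(b: bytes) -> int:
    """int.from_bytes-based encoding, as in the answer by charel-f."""
    return int.from_bytes(b, byteorder="big")


def check_equivalence(num_samples: int = 1000, max_len: int = 64, seed: int = 0) -> None:
    """Compare both encoders on random alphanumeric byte strings."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    alphabet = string.ascii_letters + string.digits
    for _ in range(num_samples):
        length = rng.randint(0, max_len)
        b = "".join(rng.choices(alphabet, k=length)).encode()
        assert encode_hex(b) == encode_from_bytes(b), b


if __name__ == "__main__":
    check_equivalence()
    print("both encoders agree on all samples")
```

A random check like this only raises confidence; it does not prove equivalence for every possible input.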





            Answer by charel-f (the int.from_bytes approach above): answered Nov 21 '18 at 21:28, edited Feb 3 at 8:48








            • 1





              This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

              – PM 2Ring
              Feb 3 at 8:23






            • 1





              BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

              – PM 2Ring
              Feb 3 at 8:25






            • 1





              @A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

              – PM 2Ring
              Feb 3 at 8:43














            • 1





              This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

              – PM 2Ring
              Feb 3 at 8:23






            • 1





              BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

              – PM 2Ring
              Feb 3 at 8:25






            • 1





              @A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

              – PM 2Ring
              Feb 3 at 8:43








            1




            1





            This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

            – PM 2Ring
            Feb 3 at 8:23





            This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

            – PM 2Ring
            Feb 3 at 8:23




            1




            1





            BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

            – PM 2Ring
            Feb 3 at 8:25





            BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

            – PM 2Ring
            Feb 3 at 8:25




            1




            1





            @A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

            – PM 2Ring
            Feb 3 at 8:43





            @A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

            – PM 2Ring
            Feb 3 at 8:43













            1














            Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.



            This answer converts the input bytes into a sequence of 6-bit integers. It encodes these small integers into one large integer using bitwise operations. Whether this actually translates into real-world storage efficiency is measured by sys.getsizeof, and is more likely for larger strings.



            This implementation customizes the encoding for the choice of character set. If for example you were working with just string.ascii_lowercase (5 bits) rather than string.ascii_uppercase + string.digits (6 bits), the encoding would be correspondingly efficient.



            Unit tests are also included.



            import string


            class BytesIntEncoder:

            def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
            num_chars = len(chars)
            translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()
            self._translation_table = bytes.maketrans(chars, translation)
            self._reverse_translation_table = bytes.maketrans(translation, chars)
            self._num_bits_per_char = (num_chars + 1).bit_length()

            def encode(self, chars: bytes) -> int:
            num_bits_per_char = self._num_bits_per_char
            output, bit_idx = 0, 0
            for chr_idx in chars.translate(self._translation_table):
            output |= (chr_idx << bit_idx)
            bit_idx += num_bits_per_char
            return output

            def decode(self, i: int) -> bytes:
            maxint = (2 ** self._num_bits_per_char) - 1
            output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
            return output.translate(self._reverse_translation_table)


            # Test
            import itertools
            import random
            import unittest


            class TestBytesIntEncoder(unittest.TestCase):

            chars = string.ascii_letters + string.digits
            encoder = BytesIntEncoder(chars.encode())

            def _test_encoding(self, b_in: bytes):
            i = self.encoder.encode(b_in)
            self.assertIsInstance(i, int)
            b_out = self.encoder.decode(i)
            self.assertIsInstance(b_out, bytes)
            self.assertEqual(b_in, b_out)
            # print(b_in, i)

            def test_thoroughly_with_small_str(self):
            for s_len in range(4):
            for s in itertools.combinations_with_replacement(self.chars, s_len):
            s = ''.join(s)
            b_in = s.encode()
            self._test_encoding(b_in)

            def test_randomly_with_large_str(self):
            for s_len in range(256):
            num_samples = {s_len <= 16: 2 ** s_len,
            16 < s_len <= 32: s_len ** 2,
            s_len > 32: s_len * 2,
            s_len > 64: s_len,
            s_len > 128: 2}[True]
            # print(s_len, num_samples)
            for _ in range(num_samples):
            b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
            self._test_encoding(b_in)


            if __name__ == '__main__':
            unittest.main()


            Usage example:



            >>> encoder = BytesIntEncoder()
            >>> s = 'Test123'
            >>> b = s.encode()
            >>> b
            b'Test123'

            >>> encoder.encode(b)
            3908257788270
            >>> encoder.decode(_)
            b'Test123'





            share|improve this answer





















            • 1





              Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

              – charel-f
              Feb 3 at 8:31


















            1














            Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.



            This answer converts the input bytes into a sequence of 6-bit integers. It encodes these small integers into one large integer using bitwise operations. Whether this actually translates into real-world storage efficiency is measured by sys.getsizeof, and is more likely for larger strings.



            This implementation customizes the encoding for the choice of character set. If for example you were working with just string.ascii_lowercase (5 bits) rather than string.ascii_uppercase + string.digits (6 bits), the encoding would be correspondingly efficient.



            Unit tests are also included.



            import string


            class BytesIntEncoder:

            def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
            num_chars = len(chars)
            translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()
            self._translation_table = bytes.maketrans(chars, translation)
            self._reverse_translation_table = bytes.maketrans(translation, chars)
            self._num_bits_per_char = (num_chars + 1).bit_length()

            def encode(self, chars: bytes) -> int:
            num_bits_per_char = self._num_bits_per_char








            1







            Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.



            This answer converts the input bytes into a sequence of 6-bit integers and packs these small integers into one large integer using bitwise operations. Whether this translates into real-world storage savings can be checked with sys.getsizeof; savings are more likely for longer strings.



            This implementation tailors the encoding to the chosen character set. For example, if you were working with just string.ascii_lowercase (5 bits) rather than string.ascii_uppercase + string.digits (6 bits), the encoding would be correspondingly more compact.
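As a rough illustration of how the alphabet size drives the width of each character slot, the sketch below computes the bit width with the same expression the class in this answer uses: the bit length of the alphabet size plus one. The helper name bits_per_char is made up for this example and is not part of the answer's code.

```python
import string


def bits_per_char(alphabet: str) -> int:
    # Hypothetical helper mirroring (num_chars + 1).bit_length() from the class.
    # Character codes run from 1 to len(alphabet); code 0 is left unused.
    return (len(alphabet) + 1).bit_length()


print(bits_per_char(string.ascii_lowercase))                # 5
print(bits_per_char(string.ascii_uppercase + string.digits))  # 6
print(bits_per_char(string.ascii_letters + string.digits))  # 6
```

Leaving code 0 unused means the last character of the input always sets at least one bit, which is what lets decode recover the length from the integer alone.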



            Unit tests are also included.



            import string


            class BytesIntEncoder:

                def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
                    num_chars = len(chars)
                    # Map each allowed byte to a code in 1..num_chars; 0 is never used,
                    # so the highest-order character always sets at least one bit.
                    translation = bytes(range(1, num_chars + 1))
                    self._translation_table = bytes.maketrans(chars, translation)
                    self._reverse_translation_table = bytes.maketrans(translation, chars)
                    self._num_bits_per_char = (num_chars + 1).bit_length()

                def encode(self, chars: bytes) -> int:
                    num_bits_per_char = self._num_bits_per_char
                    output, bit_idx = 0, 0
                    # Place each character code in its own slot, lowest bits first.
                    for chr_idx in chars.translate(self._translation_table):
                        output |= (chr_idx << bit_idx)
                        bit_idx += num_bits_per_char
                    return output

                def decode(self, i: int) -> bytes:
                    maxint = (2 ** self._num_bits_per_char) - 1
                    output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
                    return output.translate(self._reverse_translation_table)


            # Test
            import itertools
            import random
            import unittest


            class TestBytesIntEncoder(unittest.TestCase):

                chars = string.ascii_letters + string.digits
                encoder = BytesIntEncoder(chars.encode())

                def _test_encoding(self, b_in: bytes):
                    i = self.encoder.encode(b_in)
                    self.assertIsInstance(i, int)
                    b_out = self.encoder.decode(i)
                    self.assertIsInstance(b_out, bytes)
                    self.assertEqual(b_in, b_out)
                    # print(b_in, i)

                def test_thoroughly_with_small_str(self):
                    for s_len in range(4):
                        for s in itertools.combinations_with_replacement(self.chars, s_len):
                            s = ''.join(s)
                            b_in = s.encode()
                            self._test_encoding(b_in)

                def test_randomly_with_large_str(self):
                    for s_len in range(256):
                        # Later True keys overwrite earlier ones, so the last matching condition wins.
                        num_samples = {s_len <= 16: 2 ** s_len,
                                       16 < s_len <= 32: s_len ** 2,
                                       s_len > 32: s_len * 2,
                                       s_len > 64: s_len,
                                       s_len > 128: 2}[True]
                        # print(s_len, num_samples)
                        for _ in range(num_samples):
                            b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
                            self._test_encoding(b_in)


            if __name__ == '__main__':
                unittest.main()


            Usage example:



            >>> encoder = BytesIntEncoder()
            >>> s = 'Test123'
            >>> b = s.encode()
            >>> b
            b'Test123'

            >>> encoder.encode(b)
            3908257788270
            >>> encoder.decode(_)
            b'Test123'
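To check the storage claim rather than take it on faith, a minimal standalone sketch along these lines could be used. It re-implements only the packing step with a plain dict (the names ALPHABET, CODES, BITS and pack are assumptions for this example, not part of the class above) and prints sys.getsizeof for the raw bytes next to the packed integer. The sizes reported are specific to the Python implementation in use, so no expected figures are given here.

```python
import string
import sys

ALPHABET = (string.ascii_letters + string.digits).encode()
# Map each allowed byte to a code in 1..62; code 0 is deliberately unused.
CODES = {byte: code for code, byte in enumerate(ALPHABET, start=1)}
BITS = (len(ALPHABET) + 1).bit_length()  # 6 for this alphabet


def pack(data: bytes) -> int:
    """Pack each byte of data into a BITS-wide slot of one integer."""
    packed = 0
    for position, byte in enumerate(data):
        packed |= CODES[byte] << (position * BITS)
    return packed


# Compare the in-memory size of the bytes object and the packed integer.
for length in (8, 64, 512):
    data = ALPHABET[:1] * length
    print(length, sys.getsizeof(data), sys.getsizeof(pack(data)))
```

Both objects carry a fixed per-object overhead, which is why any saving only becomes visible once the input is reasonably long.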













            edited Feb 3 at 16:35

























            answered Feb 3 at 7:48









            A-B-B








            • 1





              Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

              – charel-f
              Feb 3 at 8:31





























            1














            Recall that a string can be encoded to bytes, which can then be encoded to an integer. The encodings can then be reversed to get the bytes followed by the original string.



            This encoder uses binascii to produce an identical integer encoding to the one in the answer by charel-f. I know it's identical because I extensively tested it.



            Credit: this answer.



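            from binascii import hexlify, unhexlify


            class BytesIntEncoder:

                @staticmethod
                def encode(b: bytes) -> int:
                    # Read the bytes as one big-endian hexadecimal number.
                    return int(hexlify(b), 16) if b != b'' else 0

                @staticmethod
                def decode(i: int) -> bytes:
                    if i == 0:
                        return b''
                    hex_digits = '%x' % i
                    # unhexlify needs an even number of hex digits.
                    if len(hex_digits) % 2:
                        hex_digits = '0' + hex_digits
                    return unhexlify(hex_digits)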


            The type annotations are optional. Function annotations require Python 3.0 or later, so remove them if you are on an older version.



            Quick test:



            >>> s = 'Test123'
            >>> b = s.encode()
            >>> b
            b'Test123'

            >>> BytesIntEncoder.encode(b)
            23755444588720691
            >>> BytesIntEncoder.decode(_)
            b'Test123'
            >>> _.decode()
            'Test123'
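For comparison, the same bytes-to-integer round trip can also be sketched with the built-in int.from_bytes and int.to_bytes methods instead of binascii. This is an alternative technique, not the code from this answer, and the function names encode and decode are chosen only for the example. With big-endian byte order it yields the same integer as the hexlify version for the input shown above.

```python
def encode(b: bytes) -> int:
    # Big-endian: the first byte lands in the most significant position.
    return int.from_bytes(b, 'big')


def decode(i: int) -> bytes:
    # Round the bit length up to whole bytes; 0 decodes to b''.
    return i.to_bytes((i.bit_length() + 7) // 8, 'big')


n = encode('Test123'.encode())
print(n)                   # 23755444588720691
print(decode(n).decode())  # Test123
```

Like the hexlify version, this drops leading zero bytes, which does not matter for alphanumeric input because none of those characters encode to a zero byte.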













                edited Mar 19 at 12:51

























                answered Feb 3 at 7:26









                A-B-B





























