Python: Reversibly encode alphanumeric string to integer


I want to convert a string (composed of alphanumeric characters) into an integer and then convert this integer back into a string:

string --> int --> string

In other words, I want to represent an alphanumeric string by an integer.

I found a working solution, which I included in the answer, but I do not think it is the best solution, and I am interested in other ideas/methods.

Please don't mark this as a duplicate just because many similar questions already exist; I specifically want an easy way of transforming a string into an integer and vice versa.

This should work for strings that contain alphanumeric characters, i.e. strings containing numbers and letters.

edited Feb 2 at 23:15

A-B-B

24.4k66470

asked Nov 21 '18 at 21:28

charel-f

339614

python string encoding int


3 Answers

Here's what I have so far:

string --> bytes

mBytes = m.encode("utf-8")

bytes --> int

mInt = int.from_bytes(mBytes, byteorder="big")

int --> bytes

mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

bytes --> string

m = mBytes.decode("utf-8")

Try it out:

m = "test123"
mBytes = m.encode("utf-8")
mInt = int.from_bytes(mBytes, byteorder="big")
mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")
m2 = mBytes2.decode("utf-8")
print(m == m2)
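The four steps above can be wrapped into a pair of helper functions. This is only a minimal sketch: the names `str_to_int` and `int_to_str` are hypothetical and do not appear in the original answer. One caveat: a string that starts with NUL characters would not survive the round trip, because leading zero bytes vanish in the integer. That cannot happen with purely alphanumeric input.

```python
def str_to_int(s: str) -> int:
    """Encode a string as an integer via its UTF-8 bytes (hypothetical helper)."""
    return int.from_bytes(s.encode("utf-8"), byteorder="big")


def int_to_str(n: int) -> str:
    """Invert str_to_int: integer -> bytes -> string."""
    num_bytes = (n.bit_length() + 7) // 8  # ceiling division by 8
    return n.to_bytes(num_bytes, byteorder="big").decode("utf-8")


# Round-trip check on a few alphanumeric inputs, including the empty string.
for text in ["test123", "Test123", "", "0"]:
    assert int_to_str(str_to_int(text)) == text
```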

Here is an identical reusable version of the above:

class BytesIntEncoder:

    @staticmethod
    def encode(b: bytes) -> int:
        return int.from_bytes(b, byteorder='big')

    @staticmethod
    def decode(i: int) -> bytes:
        return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')

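The type annotations are optional; function annotations are accepted by every Python 3 release, so they need not be removed on versions before 3.6. Note that int.from_bytes and int.to_bytes require Python 3.2 or later.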

Test:

>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'

>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'

edited Feb 3 at 8:48

answered Nov 21 '18 at 21:28

charel-f

339614

This is clear & simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

– PM 2Ring
Feb 3 at 8:23

BTW, you can use negation to perform ceiling division. E.g., -(-n // 8).

– PM 2Ring
Feb 3 at 8:25

@A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

– PM 2Ring
Feb 3 at 8:43
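The ceiling-division idiom mentioned in the comments can be checked against the `(n + 7) // 8` form used in the answer. The sketch below assumes a hypothetical helper name, `ceil_div`; it relies only on the fact that Python's `//` rounds toward negative infinity.

```python
def ceil_div(n: int, d: int) -> int:
    """Ceiling division using only integer arithmetic (hypothetical helper)."""
    return -(-n // d)  # floor division rounds toward negative infinity


# Both forms agree for every bit count from 0 to 64.
for bits in range(0, 65):
    assert ceil_div(bits, 8) == (bits + 7) // 8
```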

Assuming the character set is strictly alphanumeric, i.e. a-z, A-Z and 0-9, there are 62 distinct characters, so 6 bits per character are enough. Using an 8-bit byte encoding is therefore, in theory, an inefficient use of memory.

This answer converts the input bytes into a sequence of 6-bit integers and packs these small integers into one large integer using bitwise operations. Whether this translates into real-world storage savings can be checked with sys.getsizeof; savings are more likely for longer strings.

This implementation adapts the encoding to the chosen character set. If, for example, you were working with just string.ascii_lowercase (5 bits) rather than string.ascii_uppercase + string.digits (6 bits), the encoding would be correspondingly more compact.
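As a rough way to check the storage claim, the sketch below compares the byte-based integer with a 6-bit packed integer for the same text. The names `ALPHABET`, `BITS` and `pack` are assumptions made for this illustration, and `pack` is a simplified stand-in for the packing idea, not the answer's class. The values reported by `sys.getsizeof` depend on the interpreter, so no expected output is shown.

```python
import string
import sys

ALPHABET = string.ascii_letters + string.digits  # 62 symbols
BITS = (len(ALPHABET) + 1).bit_length()          # 6 bits; index 0 is left unused


def pack(s: str) -> int:
    """Pack each character as a 1-based alphabet index, BITS bits per character."""
    n = 0
    for pos, ch in enumerate(s):
        n |= (ALPHABET.index(ch) + 1) << (pos * BITS)
    return n


text = "Test123" * 10
as_bytes_int = int.from_bytes(text.encode(), byteorder="big")
as_packed_int = pack(text)

# Bit lengths show the theoretical saving; getsizeof shows the actual object sizes.
print(as_bytes_int.bit_length(), as_packed_int.bit_length())
print(sys.getsizeof(as_bytes_int), sys.getsizeof(as_packed_int))
```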

Unit tests are also included.

import string


class BytesIntEncoder:

    def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
        num_chars = len(chars)
        translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()
        self._translation_table = bytes.maketrans(chars, translation)
        self._reverse_translation_table = bytes.maketrans(translation, chars)
        self._num_bits_per_char = (num_chars + 1).bit_length()

    def encode(self, chars: bytes) -> int:
        num_bits_per_char = self._num_bits_per_char
        output, bit_idx = 0, 0
        for chr_idx in chars.translate(self._translation_table):
            output |= (chr_idx << bit_idx)
            bit_idx += num_bits_per_char
        return output

    def decode(self, i: int) -> bytes:
        maxint = (2 ** self._num_bits_per_char) - 1
        output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
        return output.translate(self._reverse_translation_table)


# Test
import itertools
import random
import unittest


class TestBytesIntEncoder(unittest.TestCase):

    chars = string.ascii_letters + string.digits
    encoder = BytesIntEncoder(chars.encode())

    def _test_encoding(self, b_in: bytes):
        i = self.encoder.encode(b_in)
        self.assertIsInstance(i, int)
        b_out = self.encoder.decode(i)
        self.assertIsInstance(b_out, bytes)
        self.assertEqual(b_in, b_out)
        # print(b_in, i)

    def test_thoroughly_with_small_str(self):
        for s_len in range(4):
            for s in itertools.combinations_with_replacement(self.chars, s_len):
                s = ''.join(s)
                b_in = s.encode()
                self._test_encoding(b_in)

    def test_randomly_with_large_str(self):
        for s_len in range(256):
            num_samples = {s_len <= 16: 2 ** s_len,
                           16 < s_len <= 32: s_len ** 2,
                           s_len > 32: s_len * 2,
                           s_len > 64: s_len,
                           s_len > 128: 2}[True]
            # print(s_len, num_samples)
            for _ in range(num_samples):
                b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
                self._test_encoding(b_in)


if __name__ == '__main__':
    unittest.main()

Usage example:

>>> encoder = BytesIntEncoder()
>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'

>>> encoder.encode(b)
3908257788270
>>> encoder.decode(_)
b'Test123'
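One thing to keep in mind with a translation-table approach: bytes.translate leaves any byte that is not in the table unchanged, so input outside the chosen character set would be packed silently and would not decode back correctly. A guard along the following lines could be applied before encoding. It is only a sketch; `ALLOWED` and `check_alphabet` are hypothetical names and not part of the answer.

```python
import string

# Assumed alphabet: the same default character set as in the answer.
ALLOWED = frozenset((string.ascii_letters + string.digits).encode())


def check_alphabet(b: bytes) -> bytes:
    """Return b unchanged, or raise ValueError if it contains a byte outside ALLOWED."""
    bad = set(b) - ALLOWED
    if bad:
        raise ValueError(f"bytes outside the alphabet: {bytes(sorted(bad))!r}")
    return b


check_alphabet(b"Test123")  # passes silently
```

With such a guard, a call would look like encoder.encode(check_alphabet(b)).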

edited Feb 3 at 16:35

answered Feb 3 at 7:48

A-B-B

24.4k66470

Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

– charel-f
Feb 3 at 8:31

Recall that a string can be encoded to bytes, which can then be encoded to an integer. The encodings can then be reversed to get the bytes followed by the original string.

This encoder uses binascii to produce an identical integer encoding to the one in the answer by charel-f. I know it's identical because I extensively tested it.

Credit: this answer.

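from binascii import hexlify, unhexlify


class BytesIntEncoder:

    @staticmethod
    def encode(b: bytes) -> int:
        return int(hexlify(b), 16) if b != b'' else 0

    @staticmethod
    def decode(i: int) -> bytes:
        if i == 0:
            return b''
        hex_digits = '%x' % i
        # unhexlify needs an even number of hex digits; pad when the leading byte is below 0x10.
        if len(hex_digits) % 2:
            hex_digits = '0' + hex_digits
        return unhexlify(hex_digits)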

The type annotations are optional; function annotations are accepted by every Python 3 release, so they need not be removed on versions before 3.6.

Quick test:

>>> s = 'Test123'
>>> b = s.encode()
>>> b
b'Test123'

>>> BytesIntEncoder.encode(b)
23755444588720691
>>> BytesIntEncoder.decode(_)
b'Test123'
>>> _.decode()
'Test123'
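The claim that the hexlify route yields the same integer as int.from_bytes can be spot-checked with random input. The sketch below is one possible check, not the author's actual test; the sample count and maximum length are arbitrary choices.

```python
import random
from binascii import hexlify

# Compare both encodings on random non-empty byte strings.
for _ in range(1000):
    length = random.randrange(1, 32)
    b = bytes(random.randrange(256) for _ in range(length))
    assert int(hexlify(b), 16) == int.from_bytes(b, byteorder="big")
```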

edited Mar 19 at 12:51

answered Feb 3 at 7:26

A-B-B

24.4k66470

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53420705%2fpython-reversibly-encode-alphanumeric-string-to-integer%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Here's what I have so far:

string --> bytes

mBytes = m.encode("utf-8")

bytes --> int

mInt = int.from_bytes(mBytes, byteorder="big")

int --> bytes

mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

bytes --> string

m = mBytes.decode("utf-8")

try it out:

m = "test123"

mBytes = m.encode("utf-8")

mInt = int.from_bytes(mBytes, byteorder="big")

mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

m2 = mBytes2.decode("utf-8")

print(m == m2)

Here is an identical reusable version of the above:

class BytesIntEncoder:



    @staticmethod

    def encode(b: bytes) -> int:

        return int.from_bytes(b, byteorder='big')



    @staticmethod

    def decode(i: int) -> bytes:

        return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')

If you're using Python <3.6, remove the optional type annotations.

Test:

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> BytesIntEncoder.encode(b)

23755444588720691

>>> BytesIntEncoder.decode(_)

b'Test123'

>>> _.decode()

'Test123'

edited Feb 3 at 8:48

answered Nov 21 '18 at 21:28

charel-f

339614

1

This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

– PM 2Ring
Feb 3 at 8:23

1

BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

– PM 2Ring
Feb 3 at 8:25

1

@A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

– PM 2Ring
Feb 3 at 8:43

add a comment |

Here's what I have so far:

string --> bytes

mBytes = m.encode("utf-8")

bytes --> int

mInt = int.from_bytes(mBytes, byteorder="big")

int --> bytes

mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

bytes --> string

m = mBytes.decode("utf-8")

try it out:

m = "test123"

mBytes = m.encode("utf-8")

mInt = int.from_bytes(mBytes, byteorder="big")

mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

m2 = mBytes2.decode("utf-8")

print(m == m2)

Here is an identical reusable version of the above:

class BytesIntEncoder:



    @staticmethod

    def encode(b: bytes) -> int:

        return int.from_bytes(b, byteorder='big')



    @staticmethod

    def decode(i: int) -> bytes:

        return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')

If you're using Python <3.6, remove the optional type annotations.

Test:

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> BytesIntEncoder.encode(b)

23755444588720691

>>> BytesIntEncoder.decode(_)

b'Test123'

>>> _.decode()

'Test123'

edited Feb 3 at 8:48

answered Nov 21 '18 at 21:28

charel-f

339614

1

This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

– PM 2Ring
Feb 3 at 8:23

1

BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

– PM 2Ring
Feb 3 at 8:25

1

@A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

– PM 2Ring
Feb 3 at 8:43

add a comment |

Here's what I have so far:

string --> bytes

mBytes = m.encode("utf-8")

bytes --> int

mInt = int.from_bytes(mBytes, byteorder="big")

int --> bytes

mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

bytes --> string

m = mBytes.decode("utf-8")

try it out:

m = "test123"

mBytes = m.encode("utf-8")

mInt = int.from_bytes(mBytes, byteorder="big")

mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

m2 = mBytes2.decode("utf-8")

print(m == m2)

Here is an identical reusable version of the above:

class BytesIntEncoder:



    @staticmethod

    def encode(b: bytes) -> int:

        return int.from_bytes(b, byteorder='big')



    @staticmethod

    def decode(i: int) -> bytes:

        return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')

If you're using Python <3.6, remove the optional type annotations.

Test:

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> BytesIntEncoder.encode(b)

23755444588720691

>>> BytesIntEncoder.decode(_)

b'Test123'

>>> _.decode()

'Test123'

edited Feb 3 at 8:48

answered Nov 21 '18 at 21:28

charel-f

339614

Here's what I have so far:

string --> bytes

mBytes = m.encode("utf-8")

bytes --> int

mInt = int.from_bytes(mBytes, byteorder="big")

int --> bytes

mBytes = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

bytes --> string

m = mBytes.decode("utf-8")

try it out:

m = "test123"

mBytes = m.encode("utf-8")

mInt = int.from_bytes(mBytes, byteorder="big")

mBytes2 = mInt.to_bytes(((mInt.bit_length() + 7) // 8), byteorder="big")

m2 = mBytes2.decode("utf-8")

print(m == m2)

Here is an identical reusable version of the above:

class BytesIntEncoder:



    @staticmethod

    def encode(b: bytes) -> int:

        return int.from_bytes(b, byteorder='big')



    @staticmethod

    def decode(i: int) -> bytes:

        return i.to_bytes(((i.bit_length() + 7) // 8), byteorder='big')

If you're using Python <3.6, remove the optional type annotations.

Test:

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> BytesIntEncoder.encode(b)

23755444588720691

>>> BytesIntEncoder.decode(_)

b'Test123'

>>> _.decode()

'Test123'

edited Feb 3 at 8:48

answered Nov 21 '18 at 21:28

charel-f

339614

edited Feb 3 at 8:48

answered Nov 21 '18 at 21:28

charel-f

339614

answered Nov 21 '18 at 21:28

charel-f

339614

answered Nov 21 '18 at 21:28

charel-f

339614

1

This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

– PM 2Ring
Feb 3 at 8:23

1

BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

– PM 2Ring
Feb 3 at 8:25

1

@A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

– PM 2Ring
Feb 3 at 8:43

add a comment |

1

This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

– PM 2Ring
Feb 3 at 8:23

1

BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

– PM 2Ring
Feb 3 at 8:25

1

@A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

– PM 2Ring
Feb 3 at 8:43

This is clear &simple. And it's fast, because all the heavy arithmetic is performed by methods that run at C speed.

– PM 2Ring
Feb 3 at 8:23

BTW, you can use negation to perform ceiling division. Eg, -(-n // 8).

– PM 2Ring
Feb 3 at 8:25

@A-B-B :) It's a nice benefit of Python's convention of handling signed operands of // & %. But it is a bit mysterious if you don't know what's going on, so I normally add a brief comment like # Ceiling division when I use it.

– PM 2Ring
Feb 3 at 8:43

add a comment |

Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.

Unit tests are also included.

import string





class BytesIntEncoder:



    def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):

        num_chars = len(chars)

        translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()

        self._translation_table = bytes.maketrans(chars, translation)

        self._reverse_translation_table = bytes.maketrans(translation, chars)

        self._num_bits_per_char = (num_chars + 1).bit_length()



    def encode(self, chars: bytes) -> int:

        num_bits_per_char = self._num_bits_per_char

        output, bit_idx = 0, 0

        for chr_idx in chars.translate(self._translation_table):

            output |= (chr_idx << bit_idx)

            bit_idx += num_bits_per_char

        return output



    def decode(self, i: int) -> bytes:

        maxint = (2 ** self._num_bits_per_char) - 1

        output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))

        return output.translate(self._reverse_translation_table)





# Test

import itertools

import random

import unittest





class TestBytesIntEncoder(unittest.TestCase):



    chars = string.ascii_letters + string.digits

    encoder = BytesIntEncoder(chars.encode())



    def _test_encoding(self, b_in: bytes):

        i = self.encoder.encode(b_in)

        self.assertIsInstance(i, int)

        b_out = self.encoder.decode(i)

        self.assertIsInstance(b_out, bytes)

        self.assertEqual(b_in, b_out)

        # print(b_in, i)



    def test_thoroughly_with_small_str(self):

        for s_len in range(4):

            for s in itertools.combinations_with_replacement(self.chars, s_len):

                s = ''.join(s)

                b_in = s.encode()

                self._test_encoding(b_in)



    def test_randomly_with_large_str(self):

        for s_len in range(256):

            num_samples = {s_len <= 16: 2 ** s_len,

                           16 < s_len <= 32: s_len ** 2,

                           s_len > 32: s_len * 2,

                           s_len > 64: s_len,

                           s_len > 128: 2}[True]

            # print(s_len, num_samples)

            for _ in range(num_samples):

                b_in = ''.join(random.choices(self.chars, k=s_len)).encode()

                self._test_encoding(b_in)





if __name__ == '__main__':

    unittest.main()

Usage example:

>>> encoder = BytesIntEncoder()

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> encoder.encode(b)

3908257788270

>>> encoder.decode(_)

b'Test123'

edited Feb 3 at 16:35

answered Feb 3 at 7:48

A-B-B

24.4k66470

1

Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

– charel-f
Feb 3 at 8:31

add a comment |

Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.

Unit tests are also included.

import string





class BytesIntEncoder:



    def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):

        num_chars = len(chars)

        translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()

        self._translation_table = bytes.maketrans(chars, translation)

        self._reverse_translation_table = bytes.maketrans(translation, chars)

        self._num_bits_per_char = (num_chars + 1).bit_length()



    def encode(self, chars: bytes) -> int:

        num_bits_per_char = self._num_bits_per_char

        output, bit_idx = 0, 0

        for chr_idx in chars.translate(self._translation_table):

            output |= (chr_idx << bit_idx)

            bit_idx += num_bits_per_char

        return output



    def decode(self, i: int) -> bytes:

        maxint = (2 ** self._num_bits_per_char) - 1

        output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))

        return output.translate(self._reverse_translation_table)





# Test

import itertools

import random

import unittest





class TestBytesIntEncoder(unittest.TestCase):



    chars = string.ascii_letters + string.digits

    encoder = BytesIntEncoder(chars.encode())



    def _test_encoding(self, b_in: bytes):

        i = self.encoder.encode(b_in)

        self.assertIsInstance(i, int)

        b_out = self.encoder.decode(i)

        self.assertIsInstance(b_out, bytes)

        self.assertEqual(b_in, b_out)

        # print(b_in, i)



    def test_thoroughly_with_small_str(self):

        for s_len in range(4):

            for s in itertools.combinations_with_replacement(self.chars, s_len):

                s = ''.join(s)

                b_in = s.encode()

                self._test_encoding(b_in)



    def test_randomly_with_large_str(self):

        for s_len in range(256):

            num_samples = {s_len <= 16: 2 ** s_len,

                           16 < s_len <= 32: s_len ** 2,

                           s_len > 32: s_len * 2,

                           s_len > 64: s_len,

                           s_len > 128: 2}[True]

            # print(s_len, num_samples)

            for _ in range(num_samples):

                b_in = ''.join(random.choices(self.chars, k=s_len)).encode()

                self._test_encoding(b_in)





if __name__ == '__main__':

    unittest.main()

Usage example:

>>> encoder = BytesIntEncoder()

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> encoder.encode(b)

3908257788270

>>> encoder.decode(_)

b'Test123'

edited Feb 3 at 16:35

answered Feb 3 at 7:48

A-B-B

24.4k66470

1

Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

– charel-f
Feb 3 at 8:31

add a comment |

Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.

Unit tests are also included.

import string





class BytesIntEncoder:



    def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):

        num_chars = len(chars)

        translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()

        self._translation_table = bytes.maketrans(chars, translation)

        self._reverse_translation_table = bytes.maketrans(translation, chars)

        self._num_bits_per_char = (num_chars + 1).bit_length()



    def encode(self, chars: bytes) -> int:

        num_bits_per_char = self._num_bits_per_char

        output, bit_idx = 0, 0

        for chr_idx in chars.translate(self._translation_table):

            output |= (chr_idx << bit_idx)

            bit_idx += num_bits_per_char

        return output



    def decode(self, i: int) -> bytes:

        maxint = (2 ** self._num_bits_per_char) - 1

        output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))

        return output.translate(self._reverse_translation_table)





# Test

import itertools

import random

import unittest





class TestBytesIntEncoder(unittest.TestCase):



    chars = string.ascii_letters + string.digits

    encoder = BytesIntEncoder(chars.encode())



    def _test_encoding(self, b_in: bytes):

        i = self.encoder.encode(b_in)

        self.assertIsInstance(i, int)

        b_out = self.encoder.decode(i)

        self.assertIsInstance(b_out, bytes)

        self.assertEqual(b_in, b_out)

        # print(b_in, i)



    def test_thoroughly_with_small_str(self):

        for s_len in range(4):

            for s in itertools.combinations_with_replacement(self.chars, s_len):

                s = ''.join(s)

                b_in = s.encode()

                self._test_encoding(b_in)



    def test_randomly_with_large_str(self):

        for s_len in range(256):

            num_samples = {s_len <= 16: 2 ** s_len,

                           16 < s_len <= 32: s_len ** 2,

                           s_len > 32: s_len * 2,

                           s_len > 64: s_len,

                           s_len > 128: 2}[True]

            # print(s_len, num_samples)

            for _ in range(num_samples):

                b_in = ''.join(random.choices(self.chars, k=s_len)).encode()

                self._test_encoding(b_in)





if __name__ == '__main__':

    unittest.main()

Usage example:

>>> encoder = BytesIntEncoder()

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> encoder.encode(b)

3908257788270

>>> encoder.decode(_)

b'Test123'

edited Feb 3 at 16:35

answered Feb 3 at 7:48

A-B-B

24.4k66470

Assuming the character set is merely alphanumeric, i.e. a-z A-Z 0-9, this requires 6 bits per character. As such, using an 8-bit byte-encoding is theoretically an inefficient use of memory.

Unit tests are also included.

import string





class BytesIntEncoder:



    def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):

        num_chars = len(chars)

        translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()

        self._translation_table = bytes.maketrans(chars, translation)

        self._reverse_translation_table = bytes.maketrans(translation, chars)

        self._num_bits_per_char = (num_chars + 1).bit_length()



    def encode(self, chars: bytes) -> int:

        num_bits_per_char = self._num_bits_per_char

        output, bit_idx = 0, 0

        for chr_idx in chars.translate(self._translation_table):

            output |= (chr_idx << bit_idx)

            bit_idx += num_bits_per_char

        return output



    def decode(self, i: int) -> bytes:

        maxint = (2 ** self._num_bits_per_char) - 1

        output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))

        return output.translate(self._reverse_translation_table)





# Test

import itertools

import random

import unittest





class TestBytesIntEncoder(unittest.TestCase):



    chars = string.ascii_letters + string.digits

    encoder = BytesIntEncoder(chars.encode())



    def _test_encoding(self, b_in: bytes):

        i = self.encoder.encode(b_in)

        self.assertIsInstance(i, int)

        b_out = self.encoder.decode(i)

        self.assertIsInstance(b_out, bytes)

        self.assertEqual(b_in, b_out)

        # print(b_in, i)



    def test_thoroughly_with_small_str(self):

        for s_len in range(4):

            for s in itertools.combinations_with_replacement(self.chars, s_len):

                s = ''.join(s)

                b_in = s.encode()

                self._test_encoding(b_in)



    def test_randomly_with_large_str(self):

        for s_len in range(256):

            num_samples = {s_len <= 16: 2 ** s_len,

                           16 < s_len <= 32: s_len ** 2,

                           s_len > 32: s_len * 2,

                           s_len > 64: s_len,

                           s_len > 128: 2}[True]

            # print(s_len, num_samples)

            for _ in range(num_samples):

                b_in = ''.join(random.choices(self.chars, k=s_len)).encode()

                self._test_encoding(b_in)





if __name__ == '__main__':

    unittest.main()

Usage example:

>>> encoder = BytesIntEncoder()

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> encoder.encode(b)

3908257788270

>>> encoder.decode(_)

b'Test123'

edited Feb 3 at 16:35

answered Feb 3 at 7:48

A-B-B

24.4k66470

edited Feb 3 at 16:35

answered Feb 3 at 7:48

A-B-B

24.4k66470

answered Feb 3 at 7:48

A-B-B

24.4k66470

answered Feb 3 at 7:48

A-B-B

24.4k66470

1

Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

– charel-f
Feb 3 at 8:31

add a comment |

1

Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

– charel-f
Feb 3 at 8:31

Thank you very much for the answer and for the time you put into this. Have a nice day, and I hope someone can benefit from one of your answers!

– charel-f
Feb 3 at 8:31

add a comment |

Recall that a string can be encoded to bytes, which can then be encoded to an integer. The encodings can then be reversed to get the bytes followed by the original string.

This encoder uses binascii to produce an identical integer encoding to the one in the answer by charel-f. I know it's identical because I extensively tested it.

Credit: this answer.

from binascii import hexlify, unhexlify



class BytesIntEncoder:



    @staticmethod

    def encode(b: bytes) -> int:

        return int(hexlify(b), 16) if b != b'' else 0



    @staticmethod

    def decode(i: int) -> bytes:

        return unhexlify('%x' % i) if i != 0 else b''

The type annotations are optional. Function annotations are valid syntax in every Python 3 release, so you only need to remove them if you're using Python 2.

Quick test:

>>> s = 'Test123'

>>> b = s.encode()

>>> b

b'Test123'



>>> BytesIntEncoder.encode(b)

23755444588720691

>>> BytesIntEncoder.decode(_)

b'Test123'

>>> _.decode()

'Test123'
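As a possible alternative, the same string --> bytes --> int round trip can be sketched with the built-in int.from_bytes and int.to_bytes (Python 3.2+) instead of binascii. The function names str_to_int and int_to_str are made up for this illustration and are not part of the answer above. The sketch assumes big-endian byte order, which gives the same integer as the hexlify approach for 'Test123'. Like any encoding that reads the bytes as one number, it cannot preserve leading NUL bytes, which is not a concern for alphanumeric strings.

```python
def str_to_int(s: str, encoding: str = "utf-8") -> int:
    """Encode a string as the big-endian integer value of its bytes."""
    return int.from_bytes(s.encode(encoding), "big")


def int_to_str(i: int, encoding: str = "utf-8") -> str:
    """Invert str_to_int; assumes i was produced by str_to_int."""
    # (bit_length + 7) // 8 is the smallest byte count that can hold i;
    # for i == 0 this is 0, which yields the empty string.
    n_bytes = (i.bit_length() + 7) // 8
    return i.to_bytes(n_bytes, "big").decode(encoding)


if __name__ == "__main__":
    n = str_to_int("Test123")
    print(n)              # 23755444588720691
    print(int_to_str(n))  # Test123
```

Computing the byte count from bit_length avoids the odd-length hex string that '%x' % i can produce when the first byte is below 0x10, at the cost of requiring Python 3.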

edited Mar 19 at 12:51

answered Feb 3 at 7:26

A-B-B

24.4k66470

