Random alphanumeric password generator with GOTOs
up vote
5
down vote
favorite
I was inspired after this thread on Stack Overflow to create a random 8 character alphanumeric password generator.
Sadly, it is closed, so I cannot provide an answer there. Anyway, here is the code. Let me know if there are any bugs/bias going on.
using System.Collections.Generic;
using System.Security.Cryptography;
public class AlphanumericGen
{
public virtual string Generate(byte length)
{
ICollection<char> chars = new LinkedList<char>();
var buffer = new byte[1];
short counter = 0;
using (var rng = new RNGCryptoServiceProvider())
{
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (nextChar >= '0' && nextChar <= '9')
{
goto addChar;
}
if (nextChar >= 'A' && nextChar <= 'Z')
{
goto addChar;
}
if (nextChar >= 'a' && nextChar <= 'z')
{
goto addChar;
}
continue;
addChar:
chars.Add(nextChar);
++counter;
}
}
return string.Concat(chars);
}
}
c# random
|
show 3 more comments
up vote
5
down vote
favorite
I was inspired after this thread on Stack Overflow to create a random 8 character alphanumeric password generator.
Sadly, it is closed, so I cannot provide an answer there. Anyway, here is the code. Let me know if there are any bugs/bias going on.
using System.Collections.Generic;
using System.Security.Cryptography;
public class AlphanumericGen
{
public virtual string Generate(byte length)
{
ICollection<char> chars = new LinkedList<char>();
var buffer = new byte[1];
short counter = 0;
using (var rng = new RNGCryptoServiceProvider())
{
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (nextChar >= '0' && nextChar <= '9')
{
goto addChar;
}
if (nextChar >= 'A' && nextChar <= 'Z')
{
goto addChar;
}
if (nextChar >= 'a' && nextChar <= 'z')
{
goto addChar;
}
continue;
addChar:
chars.Add(nextChar);
++counter;
}
}
return string.Concat(chars);
}
}
c# random
1
Why aren't you usingChar.IsDigit()
andChar.IsLetter()
instead of your own range tests?
– Barmar
Nov 8 at 3:57
@Barmar: In this caseChar.IsDigit()
andChar.IsLetter()
are probably the wrong choice, since the aim is to create a password with characters from a known range.
– Jack Aidley
Nov 8 at 12:08
Of course it's slightly more complex because you may need to concatenate mulitple calls but its should be trivial. Given that you need 8 characters and time spent to generate 256 random bytes is negligible...you may even optimize your code for the best case (just useStringBuilder
to storechar
s and put above code in awhile (stringBuffer.length < length)
(changeTake()
to actually take only required number of characters...)
– Adriano Repetti
Nov 8 at 12:11
1
@Shelby - Of course they include Unicode characters (0
, for example). In this case, we seem to be selecting from the Latin-1 set, given thatbuffer
contains abyte
. We could mask that down to ASCII using& 0x7F
(and increase the number of random values we actually use, rather than discard).
– Toby Speight
Nov 8 at 19:48
4
Moderator note: Not everyone has to mention how badgoto
is. Many of the comments that were just moved to chat did not add anything constructive at all. You don't have to come up with new ways to say how badgoto
is.
– Simon Forsberg♦
2 days ago
|
show 3 more comments
up vote
5
down vote
favorite
up vote
5
down vote
favorite
I was inspired after this thread on Stack Overflow to create a random 8 character alphanumeric password generator.
Sadly, it is closed, so I cannot provide an answer there. Anyway, here is the code. Let me know if there are any bugs/bias going on.
using System.Collections.Generic;
using System.Security.Cryptography;
public class AlphanumericGen
{
public virtual string Generate(byte length)
{
ICollection<char> chars = new LinkedList<char>();
var buffer = new byte[1];
short counter = 0;
using (var rng = new RNGCryptoServiceProvider())
{
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (nextChar >= '0' && nextChar <= '9')
{
goto addChar;
}
if (nextChar >= 'A' && nextChar <= 'Z')
{
goto addChar;
}
if (nextChar >= 'a' && nextChar <= 'z')
{
goto addChar;
}
continue;
addChar:
chars.Add(nextChar);
++counter;
}
}
return string.Concat(chars);
}
}
c# random
I was inspired after this thread on Stack Overflow to create a random 8 character alphanumeric password generator.
Sadly, it is closed, so I cannot provide an answer there. Anyway, here is the code. Let me know if there are any bugs/bias going on.
using System.Collections.Generic;
using System.Security.Cryptography;
public class AlphanumericGen
{
public virtual string Generate(byte length)
{
ICollection<char> chars = new LinkedList<char>();
var buffer = new byte[1];
short counter = 0;
using (var rng = new RNGCryptoServiceProvider())
{
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (nextChar >= '0' && nextChar <= '9')
{
goto addChar;
}
if (nextChar >= 'A' && nextChar <= 'Z')
{
goto addChar;
}
if (nextChar >= 'a' && nextChar <= 'z')
{
goto addChar;
}
continue;
addChar:
chars.Add(nextChar);
++counter;
}
}
return string.Concat(chars);
}
}
c# random
c# random
edited Nov 7 at 20:23
t3chb0t
33.2k744106
33.2k744106
asked Nov 7 at 20:08
Sam Pearson
277210
277210
1
Why aren't you usingChar.IsDigit()
andChar.IsLetter()
instead of your own range tests?
– Barmar
Nov 8 at 3:57
@Barmar: In this caseChar.IsDigit()
andChar.IsLetter()
are probably the wrong choice, since the aim is to create a password with characters from a known range.
– Jack Aidley
Nov 8 at 12:08
Of course it's slightly more complex because you may need to concatenate mulitple calls but its should be trivial. Given that you need 8 characters and time spent to generate 256 random bytes is negligible...you may even optimize your code for the best case (just useStringBuilder
to storechar
s and put above code in awhile (stringBuffer.length < length)
(changeTake()
to actually take only required number of characters...)
– Adriano Repetti
Nov 8 at 12:11
1
@Shelby - Of course they include Unicode characters (0
, for example). In this case, we seem to be selecting from the Latin-1 set, given thatbuffer
contains abyte
. We could mask that down to ASCII using& 0x7F
(and increase the number of random values we actually use, rather than discard).
– Toby Speight
Nov 8 at 19:48
4
Moderator note: Not everyone has to mention how badgoto
is. Many of the comments that were just moved to chat did not add anything constructive at all. You don't have to come up with new ways to say how badgoto
is.
– Simon Forsberg♦
2 days ago
|
show 3 more comments
1
Why aren't you usingChar.IsDigit()
andChar.IsLetter()
instead of your own range tests?
– Barmar
Nov 8 at 3:57
@Barmar: In this caseChar.IsDigit()
andChar.IsLetter()
are probably the wrong choice, since the aim is to create a password with characters from a known range.
– Jack Aidley
Nov 8 at 12:08
Of course it's slightly more complex because you may need to concatenate mulitple calls but its should be trivial. Given that you need 8 characters and time spent to generate 256 random bytes is negligible...you may even optimize your code for the best case (just useStringBuilder
to storechar
s and put above code in awhile (stringBuffer.length < length)
(changeTake()
to actually take only required number of characters...)
– Adriano Repetti
Nov 8 at 12:11
1
@Shelby - Of course they include Unicode characters (0
, for example). In this case, we seem to be selecting from the Latin-1 set, given thatbuffer
contains abyte
. We could mask that down to ASCII using& 0x7F
(and increase the number of random values we actually use, rather than discard).
– Toby Speight
Nov 8 at 19:48
4
Moderator note: Not everyone has to mention how badgoto
is. Many of the comments that were just moved to chat did not add anything constructive at all. You don't have to come up with new ways to say how badgoto
is.
– Simon Forsberg♦
2 days ago
1
1
Why aren't you using
Char.IsDigit()
and Char.IsLetter()
instead of your own range tests?– Barmar
Nov 8 at 3:57
Why aren't you using
Char.IsDigit()
and Char.IsLetter()
instead of your own range tests?– Barmar
Nov 8 at 3:57
@Barmar: In this case
Char.IsDigit()
and Char.IsLetter()
are probably the wrong choice, since the aim is to create a password with characters from a known range.– Jack Aidley
Nov 8 at 12:08
@Barmar: In this case
Char.IsDigit()
and Char.IsLetter()
are probably the wrong choice, since the aim is to create a password with characters from a known range.– Jack Aidley
Nov 8 at 12:08
Of course it's slightly more complex because you may need to concatenate mulitple calls but its should be trivial. Given that you need 8 characters and time spent to generate 256 random bytes is negligible...you may even optimize your code for the best case (just use
StringBuilder
to store char
s and put above code in a while (stringBuffer.length < length)
(change Take()
to actually take only required number of characters...)– Adriano Repetti
Nov 8 at 12:11
Of course it's slightly more complex because you may need to concatenate mulitple calls but its should be trivial. Given that you need 8 characters and time spent to generate 256 random bytes is negligible...you may even optimize your code for the best case (just use
StringBuilder
to store char
s and put above code in a while (stringBuffer.length < length)
(change Take()
to actually take only required number of characters...)– Adriano Repetti
Nov 8 at 12:11
1
1
@Shelby - Of course they include Unicode characters (
0
, for example). In this case, we seem to be selecting from the Latin-1 set, given that buffer
contains a byte
. We could mask that down to ASCII using & 0x7F
(and increase the number of random values we actually use, rather than discard).– Toby Speight
Nov 8 at 19:48
@Shelby - Of course they include Unicode characters (
0
, for example). In this case, we seem to be selecting from the Latin-1 set, given that buffer
contains a byte
. We could mask that down to ASCII using & 0x7F
(and increase the number of random values we actually use, rather than discard).– Toby Speight
Nov 8 at 19:48
4
4
Moderator note: Not everyone has to mention how bad
goto
is. Many of the comments that were just moved to chat did not add anything constructive at all. You don't have to come up with new ways to say how bad goto
is.– Simon Forsberg♦
2 days ago
Moderator note: Not everyone has to mention how bad
goto
is. Many of the comments that were just moved to chat did not add anything constructive at all. You don't have to come up with new ways to say how bad goto
is.– Simon Forsberg♦
2 days ago
|
show 3 more comments
10 Answers
10
active
oldest
votes
up vote
54
down vote
As a general rule of thumb, any time you feel the need to use goto
, take a couple of aspirin and lay down until the feeling passes. They probably should have been deprecated decades ago.
In this particular case, using a string of allowed characters and randomly picking an index in that string would do away with your goto's.
I think too a StringBuilder
would do a lot better for storing the characters than a LinkedList
.
Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday
As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't supportbreak n
syntax construct
– Agnius Vasiliauskas
yesterday
1
@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
yesterday
@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
yesterday
@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
20 hours ago
|
show 1 more comment
up vote
22
down vote
Data structure
Why are you using a LinkedList<char>
? Linked lists involve a lot of overhead for each node. You know exactly how long the result should be, so why not write to a char
array?
Actually, you may want to consider returning the result as a char
rather than as a string
. A security-sensitive program may wish to wipe the contents of the secret by overwriting it with NUL bytes, rather than waiting for the garbage collector to clean up the string.
Flow control
goto
should be avoided, and its use is not justified here.
The loop is a bit clumsy, referring to counter
all over the place.
char chars = new char[length];
using (var rng = new RNGCryptoServiceProvider())
{
for (int counter = 0; counter < length; )
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if ((nextChar >= '0' && nextChar <= '9') ||
(nextChar >= 'A' && nextChar <= 'Z') ||
(nextChar >= 'a' && nextChar <= 'z'))
{
chars[counter++] = nextChar;
}
}
return new string(chars);
}
Algorithm
The algorithm is rather inefficient:
- It reads a byte at a time from the random number generator.
.GetBytes()
is thread safe, so each call should incur some synchronization overhead. - It throws away most of the bytes it reads, keeping each byte with a probability of $62 over 256$.
A smarter approach would be to fetch a bit more than ${3 over 4} mathrm{length}$ random bytes, base64-encode it, and discard just the resulting +
and /
characters.
Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast.
– t3chb0t
Nov 7 at 20:49
4
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than achar
for this application.
– 200_success
Nov 7 at 20:53
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-)
– t3chb0t
Nov 7 at 21:00
3
@t3chb0t That makes no sense. Both versions should callGetBytes()
the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment outrng.GetBytes()
and produce the same character every time, mine is much much faster. So, the running time is completely dominated byrng.GetBytes()
, and my Base64 suggestion should make the biggest difference of all.
– 200_success
Nov 7 at 21:43
3
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong.
– 200_success
Nov 8 at 14:54
|
show 6 more comments
up vote
12
down vote
Readability & GoTo
If you want to do the same action for 3 different if-statements make them one if-statement or make them a function. goto
causes massive headaches when it's been months since you've touched the code and you come back to it not remembering a thing.
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
var isNumber = nextChar >= '0' && nextChar <= '9';
var isUppercaseLetter = nextChar >= 'A' && nextChar <= 'Z';
var isLowercaseLetter = nextChar >= 'a' && nextChar <= 'z';
if (isNumber || isUppercaseLetter || isLowercaseLetter)
{
chars.Add(nextChar);
++counter;
}
Now with the conditions named even an idiot like me can come in here and see that you're checking if the character is alpha-numeric. If you wanted to make it even more obvious you could go one step further and make it a function.
public bool IsAlphaNumericCharacter(char c)
{
var isNumber = c >= '0' && c <= '9';
var isUppercaseLetter = c >= 'A' && c <= 'Z';
var isLowercaseLetter = c >= 'a' && c <= 'z';
return isNumber
|| isUppercaseLetter
|| isLowercaseLetter;
}
Then your loop becomes shorter and clearer.
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (IsAlphaNumericCharacter(nextChar))
{
chars.Add(nextChar);
++counter;
}
}
Bytes
Is there a particular reason you're using byte
for length instead of int
? Also, was there a reason for var buffer = new byte[1];
being an array instead of just var buffer = new byte();
If the answer to both of those questions is no, then you could have a character array of size length
instead of a LinkedList<char>
.
StringBuilder
StringBuilder
could be beneficial, but if you're at a low amount of characters (like your link suggests) then it's probably not going to make a difference performance-wise (always best to test yourself instead of listening to a stranger on the internet though :P).
1
You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course.
– Toby Speight
Nov 7 at 20:53
1
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha.
– Shelby115
Nov 8 at 2:18
"||
is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note.
– Nic Hartley
2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad.
– Shelby115
2 days ago
add a comment |
up vote
9
down vote
Avoid premature optimizations, they're really hurting this code.
In the comments you defend the use of goto
using the argument that it avoids branch prediction.
Accepting the dubious premise that the compiler produces more optimized code from a series of sequential if
statements than it would from a conventional if-else
construct, lets look at the context of that optimization to see what it might get you.
As other answers have pointed out, the data structure is poorly chosen, as the cost of a node allocation is going to dwarf the cost of a missed branch prediction.
Assuming you swap that out, as pointed out in another answer, you'll also want to grab multiple bytes at a time from the RNG. That'll help, but even fully optimized that call will cost more than you could possibly save using a questionable branch prediction optimization.
That final bit converting the characters to a string? Yep, that's a traversal so you're looking at one conditional per node, plenty of chances for a bad branch prediction.
The bottom line is that, by focusing on something which makes you feel like you're "thinking for yourself" and "questioning authority" but doesn't actually help, you've given away readability for no measurable benefit.
add a comment |
up vote
5
down vote
In this case, if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version.
It might even be something like this (completely and absolutely untested) which - in average - should also perform better than original implementation:
using (var rng = new RNGCryptoServiceProvider())
{
var buffer = new byte[length * 5];
var password = new StringBuilder(length);
while (password.Length < length)
{
rng.GetBytes(buffer);
password.Append(buffer
.Select(x => (char)x)
.Where(Char.IsLetterOrDigit)
.Take(length - password.Length)
.ToArray()
);
}
}
Notes:
- Decision to use
length * 5
is arbitrary (you take only 58 values from the 256 possible). For small strings there are good chances that it doesn't work well, run this few thousands times and see how many times one call toGetBytes()
is not enough. - It performs pretty well (
GetBytes()
overhead is - typically - payed only once) and you're not building lists of expensive structures (StringBuilder
is highly optimized to build strings). - It should be slightly more fast than base64 encoding (which otherwise is a clever idea).
- It's really easy to optimize for the best case scenario adding some more code (for example when
ToArray()
returns the required string you can avoidStringBuilder()
- and its initial allocation - all together). - It scales well (parallel generation of N passwords or even generation of a huge password in parallel from M chunks).
- Given that your input is a single byte then you can use
Char.IsLetterOrDigit
directly because all the other non Latin letters and digits are > 255 when UTF-16 encoded).
The very next (or previous?) step is to test this code. Can you reliably an extensively test a function which uses a random number generator? Hardly. t3schb0t covers this aspect in his answer.
Are we done? There are some security concerns you should consider, see last section.
As you did imagine goto
is the controversial part of your code. Let me address this separately because, it seems, you're prone to use it.
goto
usually hurts legibility and you're also (probably) preventing some optimizations compiler might do. Depending where in the compiler pipeline the optimization is applied (near the front-end where goto
and if
/else
differ, or near the back-end - or even after JIT - where they already generated the exact same code) a compiler is probably not optimizing goto
(because it's uncommon to be worth of the time and when you're using it then you know what you're doing). In short: if it's because of performance than 99/100 compiler can do a better job than you.
I'm not saying that there are not use-cases (there are) but hey're so rare that hat you may live without writing a goto
for most of your career (with few more chances for its little brother goto case
). Please do not generalize this to every language, in C - for example - it might be useful for clean-up code and error handling but in higher level languages we have other powerful tools.
Does it make the code shorter or more readable? In your specific case, even without changing anything else, the very same code might be rewritten as:
while (chars.Count < length)
{
rng.GetBytes(buffer);
char nextChar = (char)buffer[0];
if (Char.IsLetterOrDigit(nextChar))
chars.Add(nextChar);
}
What did we do? We moved some logic into a separate function, this has two advantages:
- It makes logic easier to follow because we can read the function name instead of understanding the code.
- If frees us from the need of that
goto
because we can reduce those multipleif
s.
What if there wasn't Char.IsLetterOrDigit()
? You write your own and 90% you will get rid of those jumps. There are more things in the original code (beside goto
): counter
is useless, you can use an array instead of a linked list (size is known), it's preferable to avoid var
for primitive types, you're hard-coding magic constants, and...you're reducing the entropy of the generated password.
Security considerations
You're correctly using RNGCryptoServiceProvider
to generate the random numbers however you're not using a proper one-to-one encoding algorithm to transform those numbers into text. Simply dropping some values reduces the entropy. This is unavoidable if you want a fixed length alphanumeric string.
However if your goal is to generate a password with the desired amount of entropy then for this step you should use base62 (as elsewhere suggested base64 for performance reasons), a Diceware algorithm, or a perfect hash function (no, you can't use modulus for this because it, obviously, has multiple collisions).
I'm pretty sure the outherwhile
could be squeezed into a nice linq chain by usingTakeUntil
;-]
– t3chb0t
Nov 8 at 12:42
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced torng.Generate()....Take().ToArray()
!
– Adriano Repetti
Nov 8 at 12:44
I'm not sure, you can make that cast frombyte
tochar
usingCast<char>()
- you'll have to useSelect(b => (char)b)
. If you moverng.GetBytes(buffer);
inside thewhile
-loop as the first thing you'll have a fresh set ofbytes
for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits
– Henrik Hansen
Nov 8 at 14:25
2
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why?
– t3chb0t
Nov 8 at 16:28
1
Btw, coincidentally the optimization with getting a larger buffer and calligGetBytes
fewer time beats any other version presented here.
– t3chb0t
Nov 8 at 16:45
|
show 1 more comment
up vote
4
down vote
There is only one answer so far, that by @Adriano Repetti where clean code is mentioned. All others are havily focused on performane.
I'd like to add to this one answer and say that reusibility comes always first. I agree that if performance is not of great importance, you should primarily create an API that is easy to use, mantain and test.
Organizing your code from this point of view would mean to first create an extension for the RandomNumberGenerator
that is the base class of the RNGCryptoServiceProvider
. It would enumerate only char
s that you are interested in. It doesn't have to know how many, that's what we have LINQ for so it creates an endless collection of char
s.
public static IEnumerable<char> EnumerateLetterOrDigit(this RandomNumberGenerator rng, int bufferSize = 256)
{
var buffer = new byte[bufferSize];
while (true)
{
rng.GetBytes(buffer);
foreach (var c in buffer.Select(b => (char)b).Where(char.IsLetterOrDigit))
{
yield return c;
}
}
}
By doing so you could use a different provider or even mock it for testing and always return the same values. The options are there.
From here, it's now very easy to build your password generator. Create any random-number-generator, Take
as many char
s as you need and create the string.
public static string Generate(int length)
{
using (var rng = new RNGCryptoServiceProvider())
{
return new string(rng.EnumerateLetterOrDigit().Take(length).ToArray());
}
}
And even though it havily uses LINQ, it's accidentally faster then your code because of the reduced number of calls to GetBytes
(discovered by @Adriano?).
So saying that LINQ is always slow is once again proven wrong because most of the time there will be something else that isn't optimal. LINQ is just convenient and this is so great about it!
There is one thing about goto
that hasn't been said yet... and you're using something very similar everyday, it's called switch
:
public static void Main()
{
var num = 1;
switch (num)
{
case 1:
goto case 3;
case 2:
Console.WriteLine("Hi goto!");
break;
case 3:
goto case 2;
}
}
Yes, you can jump within its scope from case
to case
in any order.
add a comment |
up vote
3
down vote
goto, part deux
I don't agree with the mass consciousness that goto is a code smell.
code smell means a possible problem, not that is is a problem. So yes, it is a code smell. I too have used goto, only once in the last decade - and it was with C#.
Why goto could be bad
We do not trust ourselves to use it wisely and well. This is borne out in history. There are millllions of COBOL LOC demonstrating that when used routinely goto
tends to corrupt flow control, create inter and intra method coupling, and spawns "one off" code duplication; all to such a degree that one should assume the code will break when changed. In other words the code becomes unmaintainable.
Why goto is bad
goto
murders the bedrock of all good programs - structured programming and modular construction. goto
literally goes around them. Anything you can say about modern languages, coding principles, object orientation, etc. is fundamentally rooted in these two precepts. full stop.
As for history: the unstructured nature of COBOL, for example, gave rise to an "hard jump" coding style and mind set. It's fair to say the liberal use of goto
is a code lobotomy. Many factors contribute to crumbling code but the more goto
s the more schizophrenic the code becomes. I have worked on a system that gave a dreadful feeling of random execution when stepping through with the debugger. Many existing bugs could not be duplicated and almost no bug could be reproduced reliably. 99 percent of code tweaks induced bugs. I am not exaggerating at all here! Our in house customer head guy told me "we don't bring up problems anymore because we know they won't be fixed."
So wide spread was the problem that Edsger Dijkstra wrote a seminal paper GOTO statement considered harmful
A very readable, practical, nicely concise "using goto wisely" discussion
These are the best arguments againstgoto
I've ever seen.
– t3chb0t
2 days ago
add a comment |
up vote
1
down vote
I can only add to the already said, that your algorithm has the potential to run for ever - theoretically speaking.
The most effective approach is to select chars from a list of valid chars. If using the numbers [0, 9] and the English letters [a, z] and [A, Z] we end up with 62 entries which will cause a byte
based random selection be biased because 256 % 62 != 0
. The problem can be solved by either adding two more chars resulting in 64 char:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
or reduce the valid chars to 32:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
There are pros and cons to both obviously, but both can be used i an efficient algorithm like:
public string Generate(int length)
{
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
using (var rng = new RNGCryptoServiceProvider())
{
char result = new char[length];
byte buffer = new byte[length];
int range = chars.Length;
rng.GetBytes(buffer);
for (int i = 0; i < length; i++)
{
result[i] = chars[buffer[i] % range];
}
return new string(result);
}
}
This should be unbiased.
I deliberately avoid the use of LINQ, because it is nearly always slower than more "native" code. This not to say, I in general avoid using LINQ, because I use it very often and gladly, but it is my experience that when it comes to performance, LINQ is often significantly slower than comparable, well written traditional code.
add a comment |
up vote
1
down vote
Testing for character ranges presumes ordering and continuity of those code points. Assuming the Latin alphabet, the algorithm of course works as expected using ASCII and Unicode, but breaks down if another alphabet should be used or in the highly unlikely case of more obscure character sets such as EBCDIC.
An improved solution would be defining an array of allowable code points (as in "ABC...YZabc...xyz012..89", and then using the random number to index into this array. This resolves all above issues, and the function is trivially extendable to support additional characters as desired.
As regards the goto statements, I don't see any real fault here. It is as easy to read and logically identical to a similarly constructed switch() statement with range cases.
New contributor
add a comment |
up vote
-3
down vote
I would use a HashSet
:
HashSet<byte> validChars = new HashSet<byte> { 48, 49, 50 }; //add all valid
........
rng.GetBytes(buffer);
if(validChars.Contains(buffer[0]))
{
chars.Add(char(buffer[0]));
++counter;
}
3
Mind elaborating on why you'd use aHashSet
?
– Shelby115
2 days ago
@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago
2
@paparazzo: but this aspect of the original code is also O(1). Using aHashSet
here is slightly slower due to extra overhead.
– Pieter Witvoet
2 days ago
@PieterWitvoet Not buying. Good day.
– paparazzo
2 days ago
2
Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
2 days ago
|
show 1 more comment
10 Answers
10
active
oldest
votes
10 Answers
10
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
54
down vote
As a general rule of thumb, any time you feel the need to use goto
, take a couple of aspirin and lay down until the feeling passes. They probably should have been deprecated decades ago.
In this particular case, using a string of allowed characters and randomly picking an index in that string would do away with your goto's.
I think too a StringBuilder
would do a lot better for storing the characters than a LinkedList
.
Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday
As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't supportbreak n
syntax construct
– Agnius Vasiliauskas
yesterday
1
@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
yesterday
@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
yesterday
@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
20 hours ago
|
show 1 more comment
up vote
54
down vote
As a general rule of thumb, any time you feel the need to use goto
, take a couple of aspirin and lay down until the feeling passes. They probably should have been deprecated decades ago.
In this particular case, using a string of allowed characters and randomly picking an index in that string would do away with your goto's.
I think too a StringBuilder
would do a lot better for storing the characters than a LinkedList
.
Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday
As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't supportbreak n
syntax construct
– Agnius Vasiliauskas
yesterday
1
@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
yesterday
@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
yesterday
@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
20 hours ago
|
show 1 more comment
up vote
54
down vote
up vote
54
down vote
As a general rule of thumb, any time you feel the need to use goto
, take a couple of aspirin and lay down until the feeling passes. They probably should have been deprecated decades ago.
In this particular case, using a string of allowed characters and randomly picking an index in that string would do away with your goto's.
I think too a StringBuilder
would do a lot better for storing the characters than a LinkedList
.
As a general rule of thumb, any time you feel the need to use goto
, take a couple of aspirin and lay down until the feeling passes. They probably should have been deprecated decades ago.
In this particular case, using a string of allowed characters and randomly picking an index in that string would do away with your goto's.
I think too a StringBuilder
would do a lot better for storing the characters than a LinkedList
.
edited Nov 8 at 10:19
Wai Ha Lee
1506
1506
answered Nov 7 at 20:37
tinstaafl
6,290727
6,290727
Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday
As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't supportbreak n
syntax construct
– Agnius Vasiliauskas
yesterday
1
@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
yesterday
@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
yesterday
@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
20 hours ago
|
show 1 more comment
Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday
As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't supportbreak n
syntax construct
– Agnius Vasiliauskas
yesterday
1
@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
yesterday
@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
yesterday
@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
20 hours ago
Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday
Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday
As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't support
break n
syntax construct– Agnius Vasiliauskas
yesterday
As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't support
break n
syntax construct– Agnius Vasiliauskas
yesterday
1
1
@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
yesterday
@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
yesterday
@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
yesterday
@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
yesterday
@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
20 hours ago
@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
20 hours ago
|
show 1 more comment
up vote
22
down vote
Data structure
Why are you using a LinkedList<char>
? Linked lists involve a lot of overhead for each node. You know exactly how long the result should be, so why not write to a char
array?
Actually, you may want to consider returning the result as a char
rather than as a string
. A security-sensitive program may wish to wipe the contents of the secret by overwriting it with NUL bytes, rather than waiting for the garbage collector to clean up the string.
Flow control
goto
should be avoided, and its use is not justified here.
The loop is a bit clumsy, referring to counter
all over the place.
char chars = new char[length];
using (var rng = new RNGCryptoServiceProvider())
{
for (int counter = 0; counter < length; )
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if ((nextChar >= '0' && nextChar <= '9') ||
(nextChar >= 'A' && nextChar <= 'Z') ||
(nextChar >= 'a' && nextChar <= 'z'))
{
chars[counter++] = nextChar;
}
}
return new string(chars);
}
Algorithm
The algorithm is rather inefficient:
- It reads a byte at a time from the random number generator.
.GetBytes()
is thread safe, so each call should incur some synchronization overhead. - It throws away most of the bytes it reads, keeping each byte with a probability of $62 over 256$.
A smarter approach would be to fetch a bit more than ${3 over 4} mathrm{length}$ random bytes, base64-encode it, and discard just the resulting +
and /
characters.
Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast.
– t3chb0t
Nov 7 at 20:49
4
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than achar
for this application.
– 200_success
Nov 7 at 20:53
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-)
– t3chb0t
Nov 7 at 21:00
3
@t3chb0t That makes no sense. Both versions should callGetBytes()
the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment outrng.GetBytes()
and produce the same character every time, mine is much much faster. So, the running time is completely dominated byrng.GetBytes()
, and my Base64 suggestion should make the biggest difference of all.
– 200_success
Nov 7 at 21:43
3
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong.
– 200_success
Nov 8 at 14:54
|
show 6 more comments
up vote
22
down vote
Data structure
Why are you using a LinkedList<char>
? Linked lists involve a lot of overhead for each node. You know exactly how long the result should be, so why not write to a char
array?
Actually, you may want to consider returning the result as a char
rather than as a string
. A security-sensitive program may wish to wipe the contents of the secret by overwriting it with NUL bytes, rather than waiting for the garbage collector to clean up the string.
Flow control
goto
should be avoided, and its use is not justified here.
The loop is a bit clumsy, referring to counter
all over the place.
char chars = new char[length];
using (var rng = new RNGCryptoServiceProvider())
{
for (int counter = 0; counter < length; )
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if ((nextChar >= '0' && nextChar <= '9') ||
(nextChar >= 'A' && nextChar <= 'Z') ||
(nextChar >= 'a' && nextChar <= 'z'))
{
chars[counter++] = nextChar;
}
}
return new string(chars);
}
Algorithm
The algorithm is rather inefficient:
- It reads a byte at a time from the random number generator.
.GetBytes()
is thread safe, so each call should incur some synchronization overhead. - It throws away most of the bytes it reads, keeping each byte with a probability of $62 over 256$.
A smarter approach would be to fetch a bit more than ${3 over 4} mathrm{length}$ random bytes, base64-encode it, and discard just the resulting +
and /
characters.
Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast.
– t3chb0t
Nov 7 at 20:49
4
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than achar
for this application.
– 200_success
Nov 7 at 20:53
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-)
– t3chb0t
Nov 7 at 21:00
3
@t3chb0t That makes no sense. Both versions should callGetBytes()
the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment outrng.GetBytes()
and produce the same character every time, mine is much much faster. So, the running time is completely dominated byrng.GetBytes()
, and my Base64 suggestion should make the biggest difference of all.
– 200_success
Nov 7 at 21:43
3
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong.
– 200_success
Nov 8 at 14:54
|
show 6 more comments
up vote
22
down vote
up vote
22
down vote
Data structure
Why are you using a LinkedList<char>
? Linked lists involve a lot of overhead for each node. You know exactly how long the result should be, so why not write to a char
array?
Actually, you may want to consider returning the result as a char
rather than as a string
. A security-sensitive program may wish to wipe the contents of the secret by overwriting it with NUL bytes, rather than waiting for the garbage collector to clean up the string.
Flow control
goto
should be avoided, and its use is not justified here.
The loop is a bit clumsy, referring to counter
all over the place.
char chars = new char[length];
using (var rng = new RNGCryptoServiceProvider())
{
for (int counter = 0; counter < length; )
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if ((nextChar >= '0' && nextChar <= '9') ||
(nextChar >= 'A' && nextChar <= 'Z') ||
(nextChar >= 'a' && nextChar <= 'z'))
{
chars[counter++] = nextChar;
}
}
return new string(chars);
}
Algorithm
The algorithm is rather inefficient:
- It reads a byte at a time from the random number generator.
.GetBytes()
is thread safe, so each call should incur some synchronization overhead. - It throws away most of the bytes it reads, keeping each byte with a probability of $62 over 256$.
A smarter approach would be to fetch a bit more than ${3 over 4} mathrm{length}$ random bytes, base64-encode it, and discard just the resulting +
and /
characters.
Data structure
Why are you using a LinkedList<char>
? Linked lists involve a lot of overhead for each node. You know exactly how long the result should be, so why not write to a char
array?
Actually, you may want to consider returning the result as a char
rather than as a string
. A security-sensitive program may wish to wipe the contents of the secret by overwriting it with NUL bytes, rather than waiting for the garbage collector to clean up the string.
Flow control
goto
should be avoided, and its use is not justified here.
The loop is a bit clumsy, referring to counter
all over the place.
char chars = new char[length];
using (var rng = new RNGCryptoServiceProvider())
{
for (int counter = 0; counter < length; )
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if ((nextChar >= '0' && nextChar <= '9') ||
(nextChar >= 'A' && nextChar <= 'Z') ||
(nextChar >= 'a' && nextChar <= 'z'))
{
chars[counter++] = nextChar;
}
}
return new string(chars);
}
Algorithm
The algorithm is rather inefficient:
- It reads a byte at a time from the random number generator.
.GetBytes()
is thread safe, so each call should incur some synchronization overhead. - It throws away most of the bytes it reads, keeping each byte with a probability of $62 over 256$.
A smarter approach would be to fetch a bit more than ${3 over 4} mathrm{length}$ random bytes, base64-encode it, and discard just the resulting +
and /
characters.
edited Nov 7 at 21:03
answered Nov 7 at 20:47
200_success
127k14148410
127k14148410
Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast.
– t3chb0t
Nov 7 at 20:49
4
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than achar
for this application.
– 200_success
Nov 7 at 20:53
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-)
– t3chb0t
Nov 7 at 21:00
3
@t3chb0t That makes no sense. Both versions should callGetBytes()
the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment outrng.GetBytes()
and produce the same character every time, mine is much much faster. So, the running time is completely dominated byrng.GetBytes()
, and my Base64 suggestion should make the biggest difference of all.
– 200_success
Nov 7 at 21:43
3
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong.
– 200_success
Nov 8 at 14:54
|
show 6 more comments
Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast.
– t3chb0t
Nov 7 at 20:49
4
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than achar
for this application.
– 200_success
Nov 7 at 20:53
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-)
– t3chb0t
Nov 7 at 21:00
3
@t3chb0t That makes no sense. Both versions should callGetBytes()
the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment outrng.GetBytes()
and produce the same character every time, mine is much much faster. So, the running time is completely dominated byrng.GetBytes()
, and my Base64 suggestion should make the biggest difference of all.
– 200_success
Nov 7 at 21:43
3
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong.
– 200_success
Nov 8 at 14:54
Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast.
– t3chb0t
Nov 7 at 20:49
Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast.
– t3chb0t
Nov 7 at 20:49
4
4
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than a
char
for this application.– 200_success
Nov 7 at 20:53
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than a
char
for this application.– 200_success
Nov 7 at 20:53
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-)
– t3chb0t
Nov 7 at 21:00
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-)
– t3chb0t
Nov 7 at 21:00
3
3
@t3chb0t That makes no sense. Both versions should call
GetBytes()
the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment out rng.GetBytes()
and produce the same character every time, mine is much much faster. So, the running time is completely dominated by rng.GetBytes()
, and my Base64 suggestion should make the biggest difference of all.– 200_success
Nov 7 at 21:43
@t3chb0t That makes no sense. Both versions should call
GetBytes()
the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment out rng.GetBytes()
and produce the same character every time, mine is much much faster. So, the running time is completely dominated by rng.GetBytes()
, and my Base64 suggestion should make the biggest difference of all.– 200_success
Nov 7 at 21:43
3
3
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong.
– 200_success
Nov 8 at 14:54
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong.
– 200_success
Nov 8 at 14:54
|
show 6 more comments
up vote
12
down vote
Readability & GoTo
If you want to do the same action for 3 different if-statements make them one if-statement or make them a function. goto
causes massive headaches when it's been months since you've touched the code and you come back to it not remembering a thing.
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
var isNumber = nextChar >= '0' && nextChar <= '9';
var isUppercaseLetter = nextChar >= 'A' && nextChar <= 'Z';
var isLowercaseLetter = nextChar >= 'a' && nextChar <= 'z';
if (isNumber || isUppercaseLetter || isLowercaseLetter)
{
chars.Add(nextChar);
++counter;
}
Now with the conditions named even an idiot like me can come in here and see that you're checking if the character is alpha-numeric. If you wanted to make it even more obvious you could go one step further and make it a function.
public bool IsAlphaNumericCharacter(char c)
{
var isNumber = c >= '0' && c <= '9';
var isUppercaseLetter = c >= 'A' && c <= 'Z';
var isLowercaseLetter = c >= 'a' && c <= 'z';
return isNumber
|| isUppercaseLetter
|| isLowercaseLetter;
}
Then your loop becomes shorter and clearer.
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (IsAlphaNumericCharacter(nextChar))
{
chars.Add(nextChar);
++counter;
}
}
Bytes
Is there a particular reason you're using byte
for length instead of int
? Also, was there a reason for var buffer = new byte[1];
being an array instead of just var buffer = new byte();
If the answer to both of those questions is no, then you could have a character array of size length
instead of a LinkedList<char>
.
StringBuilder
StringBuilder
could be beneficial, but if you're at a low amount of characters (like your link suggests) then it's probably not going to make a difference performance-wise (always best to test yourself instead of listening to a stranger on the internet though :P).
1
You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course.
– Toby Speight
Nov 7 at 20:53
1
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha.
– Shelby115
Nov 8 at 2:18
"||
is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note.
– Nic Hartley
2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad.
– Shelby115
2 days ago
add a comment |
up vote
12
down vote
Readability & GoTo
If you want to do the same action for 3 different if-statements make them one if-statement or make them a function. goto
causes massive headaches when it's been months since you've touched the code and you come back to it not remembering a thing.
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
var isNumber = nextChar >= '0' && nextChar <= '9';
var isUppercaseLetter = nextChar >= 'A' && nextChar <= 'Z';
var isLowercaseLetter = nextChar >= 'a' && nextChar <= 'z';
if (isNumber || isUppercaseLetter || isLowercaseLetter)
{
chars.Add(nextChar);
++counter;
}
Now with the conditions named even an idiot like me can come in here and see that you're checking if the character is alpha-numeric. If you wanted to make it even more obvious you could go one step further and make it a function.
public bool IsAlphaNumericCharacter(char c)
{
var isNumber = c >= '0' && c <= '9';
var isUppercaseLetter = c >= 'A' && c <= 'Z';
var isLowercaseLetter = c >= 'a' && c <= 'z';
return isNumber
|| isUppercaseLetter
|| isLowercaseLetter;
}
Then your loop becomes shorter and clearer.
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (IsAlphaNumericCharacter(nextChar))
{
chars.Add(nextChar);
++counter;
}
}
Bytes
Is there a particular reason you're using byte
for length instead of int
? Also, was there a reason for var buffer = new byte[1];
being an array instead of just var buffer = new byte();
If the answer to both of those questions is no, then you could have a character array of size length
instead of a LinkedList<char>
.
StringBuilder
StringBuilder
could be beneficial, but if you're at a low amount of characters (like your link suggests) then it's probably not going to make a difference performance-wise (always best to test yourself instead of listening to a stranger on the internet though :P).
1
You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course.
– Toby Speight
Nov 7 at 20:53
1
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha.
– Shelby115
Nov 8 at 2:18
"||
is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note.
– Nic Hartley
2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad.
– Shelby115
2 days ago
add a comment |
up vote
12
down vote
up vote
12
down vote
Readability & GoTo
If you want to do the same action for 3 different if-statements make them one if-statement or make them a function. goto
causes massive headaches when it's been months since you've touched the code and you come back to it not remembering a thing.
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
var isNumber = nextChar >= '0' && nextChar <= '9';
var isUppercaseLetter = nextChar >= 'A' && nextChar <= 'Z';
var isLowercaseLetter = nextChar >= 'a' && nextChar <= 'z';
if (isNumber || isUppercaseLetter || isLowercaseLetter)
{
chars.Add(nextChar);
++counter;
}
Now with the conditions named even an idiot like me can come in here and see that you're checking if the character is alpha-numeric. If you wanted to make it even more obvious you could go one step further and make it a function.
public bool IsAlphaNumericCharacter(char c)
{
var isNumber = c >= '0' && c <= '9';
var isUppercaseLetter = c >= 'A' && c <= 'Z';
var isLowercaseLetter = c >= 'a' && c <= 'z';
return isNumber
|| isUppercaseLetter
|| isLowercaseLetter;
}
Then your loop becomes shorter and clearer.
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (IsAlphaNumericCharacter(nextChar))
{
chars.Add(nextChar);
++counter;
}
}
Bytes
Is there a particular reason you're using byte
for length instead of int
? Also, was there a reason for var buffer = new byte[1];
being an array instead of just var buffer = new byte();
If the answer to both of those questions is no, then you could have a character array of size length
instead of a LinkedList<char>
.
StringBuilder
StringBuilder
could be beneficial, but if you're at a low amount of characters (like your link suggests) then it's probably not going to make a difference performance-wise (always best to test yourself instead of listening to a stranger on the internet though :P).
Readability & GoTo
If you want to do the same action for 3 different if-statements make them one if-statement or make them a function. goto
causes massive headaches when it's been months since you've touched the code and you come back to it not remembering a thing.
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
var isNumber = nextChar >= '0' && nextChar <= '9';
var isUppercaseLetter = nextChar >= 'A' && nextChar <= 'Z';
var isLowercaseLetter = nextChar >= 'a' && nextChar <= 'z';
if (isNumber || isUppercaseLetter || isLowercaseLetter)
{
chars.Add(nextChar);
++counter;
}
Now with the conditions named even an idiot like me can come in here and see that you're checking if the character is alpha-numeric. If you wanted to make it even more obvious you could go one step further and make it a function.
public bool IsAlphaNumericCharacter(char c)
{
var isNumber = c >= '0' && c <= '9';
var isUppercaseLetter = c >= 'A' && c <= 'Z';
var isLowercaseLetter = c >= 'a' && c <= 'z';
return isNumber
|| isUppercaseLetter
|| isLowercaseLetter;
}
Then your loop becomes shorter and clearer.
while (counter < length)
{
rng.GetBytes(buffer);
var nextChar = (char)buffer[0];
if (IsAlphaNumericCharacter(nextChar))
{
chars.Add(nextChar);
++counter;
}
}
Bytes
Is there a particular reason you're using byte
for length instead of int
? Also, was there a reason for var buffer = new byte[1];
being an array instead of just var buffer = new byte();
If the answer to both of those questions is no, then you could have a character array of size length
instead of a LinkedList<char>
.
StringBuilder
StringBuilder
could be beneficial, but if you're at a low amount of characters (like your link suggests) then it's probably not going to make a difference performance-wise (always best to test yourself instead of listening to a stranger on the internet though :P).
edited 2 days ago
answered Nov 7 at 20:41
Shelby115
1,523517
1,523517
1
You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course.
– Toby Speight
Nov 7 at 20:53
1
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha.
– Shelby115
Nov 8 at 2:18
"||
is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note.
– Nic Hartley
2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad.
– Shelby115
2 days ago
add a comment |
1
You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course.
– Toby Speight
Nov 7 at 20:53
1
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha.
– Shelby115
Nov 8 at 2:18
"||
is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note.
– Nic Hartley
2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad.
– Shelby115
2 days ago
1
1
You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course.
– Toby Speight
Nov 7 at 20:53
You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course.
– Toby Speight
Nov 7 at 20:53
1
1
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha.
– Shelby115
Nov 8 at 2:18
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha.
– Shelby115
Nov 8 at 2:18
"
||
is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note.– Nic Hartley
2 days ago
"
||
is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note.– Nic Hartley
2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad.
– Shelby115
2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad.
– Shelby115
2 days ago
add a comment |
up vote
9
down vote
Avoid premature optimizations, they're really hurting this code.
In the comments you defend the use of goto
using the argument that it avoids branch prediction.
Accepting the dubious premise that the compiler produces more optimized code from a series of sequential if
statements than it would from a conventional if-else
construct, lets look at the context of that optimization to see what it might get you.
As other answers have pointed out, the data structure is poorly chosen, as the cost of a node allocation is going to dwarf the cost of a missed branch prediction.
Assuming you swap that out, as pointed out in another answer, you'll also want to grab multiple bytes at a time from the RNG. That'll help, but even fully optimized that call will cost more than you could possibly save using a questionable branch prediction optimization.
That final bit converting the characters to a string? Yep, that's a traversal so you're looking at one conditional per node, plenty of chances for a bad branch prediction.
The bottom line is that, by focusing on something which makes you feel like you're "thinking for yourself" and "questioning authority" but doesn't actually help, you've given away readability for no measurable benefit.
add a comment |
up vote
9
down vote
Avoid premature optimizations, they're really hurting this code.
In the comments you defend the use of goto
using the argument that it avoids branch prediction.
Accepting the dubious premise that the compiler produces more optimized code from a series of sequential if
statements than it would from a conventional if-else
construct, lets look at the context of that optimization to see what it might get you.
As other answers have pointed out, the data structure is poorly chosen, as the cost of a node allocation is going to dwarf the cost of a missed branch prediction.
Assuming you swap that out, as pointed out in another answer, you'll also want to grab multiple bytes at a time from the RNG. That'll help, but even fully optimized that call will cost more than you could possibly save using a questionable branch prediction optimization.
That final bit converting the characters to a string? Yep, that's a traversal so you're looking at one conditional per node, plenty of chances for a bad branch prediction.
The bottom line is that, by focusing on something which makes you feel like you're "thinking for yourself" and "questioning authority" but doesn't actually help, you've given away readability for no measurable benefit.
add a comment |
up vote
9
down vote
up vote
9
down vote
Avoid premature optimizations, they're really hurting this code.
In the comments you defend the use of goto
using the argument that it avoids branch prediction.
Accepting the dubious premise that the compiler produces more optimized code from a series of sequential if
statements than it would from a conventional if-else
construct, lets look at the context of that optimization to see what it might get you.
As other answers have pointed out, the data structure is poorly chosen, as the cost of a node allocation is going to dwarf the cost of a missed branch prediction.
Assuming you swap that out, as pointed out in another answer, you'll also want to grab multiple bytes at a time from the RNG. That'll help, but even fully optimized that call will cost more than you could possibly save using a questionable branch prediction optimization.
That final bit converting the characters to a string? Yep, that's a traversal so you're looking at one conditional per node, plenty of chances for a bad branch prediction.
The bottom line is that, by focusing on something which makes you feel like you're "thinking for yourself" and "questioning authority" but doesn't actually help, you've given away readability for no measurable benefit.
Avoid premature optimizations, they're really hurting this code.
In the comments you defend the use of goto
using the argument that it avoids branch prediction.
Accepting the dubious premise that the compiler produces more optimized code from a series of sequential if
statements than it would from a conventional if-else
construct, lets look at the context of that optimization to see what it might get you.
As other answers have pointed out, the data structure is poorly chosen, as the cost of a node allocation is going to dwarf the cost of a missed branch prediction.
Assuming you swap that out, as pointed out in another answer, you'll also want to grab multiple bytes at a time from the RNG. That'll help, but even fully optimized that call will cost more than you could possibly save using a questionable branch prediction optimization.
That final bit converting the characters to a string? Yep, that's a traversal so you're looking at one conditional per node, plenty of chances for a bad branch prediction.
The bottom line is that, by focusing on something which makes you feel like you're "thinking for yourself" and "questioning authority" but doesn't actually help, you've given away readability for no measurable benefit.
answered Nov 8 at 3:26
Morgen
63139
63139
add a comment |
add a comment |
up vote
5
down vote
In this case, if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version.
It might even be something like this (completely and absolutely untested) which - in average - should also perform better than original implementation:
using (var rng = new RNGCryptoServiceProvider())
{
var buffer = new byte[length * 5];
var password = new StringBuilder(length);
while (password.Length < length)
{
rng.GetBytes(buffer);
password.Append(buffer
.Select(x => (char)x)
.Where(Char.IsLetterOrDigit)
.Take(length - password.Length)
.ToArray()
);
}
}
Notes:
- Decision to use
length * 5
is arbitrary (you take only 58 values from the 256 possible). For small strings there are good chances that it doesn't work well, run this few thousands times and see how many times one call toGetBytes()
is not enough. - It performs pretty well (
GetBytes()
overhead is - typically - payed only once) and you're not building lists of expensive structures (StringBuilder
is highly optimized to build strings). - It should be slightly more fast than base64 encoding (which otherwise is a clever idea).
- It's really easy to optimize for the best case scenario adding some more code (for example when
ToArray()
returns the required string you can avoidStringBuilder()
- and its initial allocation - all together). - It scales well (parallel generation of N passwords or even generation of a huge password in parallel from M chunks).
- Given that your input is a single byte then you can use
Char.IsLetterOrDigit
directly because all the other non Latin letters and digits are > 255 when UTF-16 encoded).
The very next (or previous?) step is to test this code. Can you reliably an extensively test a function which uses a random number generator? Hardly. t3schb0t covers this aspect in his answer.
Are we done? There are some security concerns you should consider, see last section.
As you did imagine goto
is the controversial part of your code. Let me address this separately because, it seems, you're prone to use it.
goto
usually hurts legibility and you're also (probably) preventing some optimizations compiler might do. Depending where in the compiler pipeline the optimization is applied (near the front-end where goto
and if
/else
differ, or near the back-end - or even after JIT - where they already generated the exact same code) a compiler is probably not optimizing goto
(because it's uncommon to be worth of the time and when you're using it then you know what you're doing). In short: if it's because of performance than 99/100 compiler can do a better job than you.
I'm not saying that there are not use-cases (there are) but hey're so rare that hat you may live without writing a goto
for most of your career (with few more chances for its little brother goto case
). Please do not generalize this to every language, in C - for example - it might be useful for clean-up code and error handling but in higher level languages we have other powerful tools.
Does it make the code shorter or more readable? In your specific case, even without changing anything else, the very same code might be rewritten as:
while (chars.Count < length)
{
rng.GetBytes(buffer);
char nextChar = (char)buffer[0];
if (Char.IsLetterOrDigit(nextChar))
chars.Add(nextChar);
}
What did we do? We moved some logic into a separate function, this has two advantages:
- It makes logic easier to follow because we can read the function name instead of understanding the code.
- If frees us from the need of that
goto
because we can reduce those multipleif
s.
What if there wasn't Char.IsLetterOrDigit()
? You write your own and 90% you will get rid of those jumps. There are more things in the original code (beside goto
): counter
is useless, you can use an array instead of a linked list (size is known), it's preferable to avoid var
for primitive types, you're hard-coding magic constants, and...you're reducing the entropy of the generated password.
Security considerations
You're correctly using RNGCryptoServiceProvider
to generate the random numbers however you're not using a proper one-to-one encoding algorithm to transform those numbers into text. Simply dropping some values reduces the entropy. This is unavoidable if you want a fixed length alphanumeric string.
However if your goal is to generate a password with the desired amount of entropy then for this step you should use base62 (as elsewhere suggested base64 for performance reasons), a Diceware algorithm, or a perfect hash function (no, you can't use modulus for this because it, obviously, has multiple collisions).
I'm pretty sure the outherwhile
could be squeezed into a nice linq chain by usingTakeUntil
;-]
– t3chb0t
Nov 8 at 12:42
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced torng.Generate()....Take().ToArray()
!
– Adriano Repetti
Nov 8 at 12:44
I'm not sure, you can make that cast frombyte
tochar
usingCast<char>()
- you'll have to useSelect(b => (char)b)
. If you moverng.GetBytes(buffer);
inside thewhile
-loop as the first thing you'll have a fresh set ofbytes
for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits
– Henrik Hansen
Nov 8 at 14:25
2
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why?
– t3chb0t
Nov 8 at 16:28
1
Btw, coincidentally the optimization with getting a larger buffer and calligGetBytes
fewer time beats any other version presented here.
– t3chb0t
Nov 8 at 16:45
|
show 1 more comment
up vote
5
down vote
In this case, if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version.
It might even be something like this (completely and absolutely untested) which - in average - should also perform better than original implementation:
using (var rng = new RNGCryptoServiceProvider())
{
var buffer = new byte[length * 5];
var password = new StringBuilder(length);
while (password.Length < length)
{
rng.GetBytes(buffer);
password.Append(buffer
.Select(x => (char)x)
.Where(Char.IsLetterOrDigit)
.Take(length - password.Length)
.ToArray()
);
}
}
Notes:
- Decision to use
length * 5
is arbitrary (you take only 58 values from the 256 possible). For small strings there are good chances that it doesn't work well, run this few thousands times and see how many times one call toGetBytes()
is not enough. - It performs pretty well (
GetBytes()
overhead is - typically - payed only once) and you're not building lists of expensive structures (StringBuilder
is highly optimized to build strings). - It should be slightly more fast than base64 encoding (which otherwise is a clever idea).
- It's really easy to optimize for the best case scenario adding some more code (for example when
ToArray()
returns the required string you can avoidStringBuilder()
- and its initial allocation - all together). - It scales well (parallel generation of N passwords or even generation of a huge password in parallel from M chunks).
- Given that your input is a single byte then you can use
Char.IsLetterOrDigit
directly because all the other non Latin letters and digits are > 255 when UTF-16 encoded).
The very next (or previous?) step is to test this code. Can you reliably an extensively test a function which uses a random number generator? Hardly. t3schb0t covers this aspect in his answer.
Are we done? There are some security concerns you should consider, see last section.
As you did imagine goto
is the controversial part of your code. Let me address this separately because, it seems, you're prone to use it.
goto
usually hurts legibility and you're also (probably) preventing some optimizations compiler might do. Depending where in the compiler pipeline the optimization is applied (near the front-end where goto
and if
/else
differ, or near the back-end - or even after JIT - where they already generated the exact same code) a compiler is probably not optimizing goto
(because it's uncommon to be worth of the time and when you're using it then you know what you're doing). In short: if it's because of performance than 99/100 compiler can do a better job than you.
I'm not saying that there are not use-cases (there are) but hey're so rare that hat you may live without writing a goto
for most of your career (with few more chances for its little brother goto case
). Please do not generalize this to every language, in C - for example - it might be useful for clean-up code and error handling but in higher level languages we have other powerful tools.
Does it make the code shorter or more readable? In your specific case, even without changing anything else, the very same code might be rewritten as:
while (chars.Count < length)
{
rng.GetBytes(buffer);
char nextChar = (char)buffer[0];
if (Char.IsLetterOrDigit(nextChar))
chars.Add(nextChar);
}
What did we do? We moved some logic into a separate function, this has two advantages:
- It makes logic easier to follow because we can read the function name instead of understanding the code.
- If frees us from the need of that
goto
because we can reduce those multipleif
s.
What if there wasn't Char.IsLetterOrDigit()
? You write your own and 90% you will get rid of those jumps. There are more things in the original code (beside goto
): counter
is useless, you can use an array instead of a linked list (size is known), it's preferable to avoid var
for primitive types, you're hard-coding magic constants, and...you're reducing the entropy of the generated password.
Security considerations
You're correctly using RNGCryptoServiceProvider
to generate the random numbers however you're not using a proper one-to-one encoding algorithm to transform those numbers into text. Simply dropping some values reduces the entropy. This is unavoidable if you want a fixed length alphanumeric string.
However if your goal is to generate a password with the desired amount of entropy then for this step you should use base62 (as elsewhere suggested base64 for performance reasons), a Diceware algorithm, or a perfect hash function (no, you can't use modulus for this because it, obviously, has multiple collisions).
I'm pretty sure the outherwhile
could be squeezed into a nice linq chain by usingTakeUntil
;-]
– t3chb0t
Nov 8 at 12:42
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced torng.Generate()....Take().ToArray()
!
– Adriano Repetti
Nov 8 at 12:44
I'm not sure, you can make that cast frombyte
tochar
usingCast<char>()
- you'll have to useSelect(b => (char)b)
. If you moverng.GetBytes(buffer);
inside thewhile
-loop as the first thing you'll have a fresh set ofbytes
for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits
– Henrik Hansen
Nov 8 at 14:25
2
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why?
– t3chb0t
Nov 8 at 16:28
1
Btw, coincidentally the optimization with getting a larger buffer and calligGetBytes
fewer time beats any other version presented here.
– t3chb0t
Nov 8 at 16:45
|
show 1 more comment
up vote
5
down vote
up vote
5
down vote
In this case, if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version.
It might even be something like this (completely and absolutely untested) which - in average - should also perform better than original implementation:
using (var rng = new RNGCryptoServiceProvider())
{
var buffer = new byte[length * 5];
var password = new StringBuilder(length);
while (password.Length < length)
{
rng.GetBytes(buffer);
password.Append(buffer
.Select(x => (char)x)
.Where(Char.IsLetterOrDigit)
.Take(length - password.Length)
.ToArray()
);
}
}
Notes:
- Decision to use
length * 5
is arbitrary (you take only 58 values from the 256 possible). For small strings there are good chances that it doesn't work well, run this few thousands times and see how many times one call toGetBytes()
is not enough. - It performs pretty well (
GetBytes()
overhead is - typically - payed only once) and you're not building lists of expensive structures (StringBuilder
is highly optimized to build strings). - It should be slightly more fast than base64 encoding (which otherwise is a clever idea).
- It's really easy to optimize for the best case scenario adding some more code (for example when
ToArray()
returns the required string you can avoidStringBuilder()
- and its initial allocation - all together). - It scales well (parallel generation of N passwords or even generation of a huge password in parallel from M chunks).
- Given that your input is a single byte then you can use
Char.IsLetterOrDigit
directly because all the other non Latin letters and digits are > 255 when UTF-16 encoded).
The very next (or previous?) step is to test this code. Can you reliably an extensively test a function which uses a random number generator? Hardly. t3schb0t covers this aspect in his answer.
Are we done? There are some security concerns you should consider, see last section.
As you did imagine goto
is the controversial part of your code. Let me address this separately because, it seems, you're prone to use it.
goto
usually hurts legibility and you're also (probably) preventing some optimizations compiler might do. Depending where in the compiler pipeline the optimization is applied (near the front-end where goto
and if
/else
differ, or near the back-end - or even after JIT - where they already generated the exact same code) a compiler is probably not optimizing goto
(because it's uncommon to be worth of the time and when you're using it then you know what you're doing). In short: if it's because of performance than 99/100 compiler can do a better job than you.
I'm not saying that there are not use-cases (there are) but hey're so rare that hat you may live without writing a goto
for most of your career (with few more chances for its little brother goto case
). Please do not generalize this to every language, in C - for example - it might be useful for clean-up code and error handling but in higher level languages we have other powerful tools.
Does it make the code shorter or more readable? In your specific case, even without changing anything else, the very same code might be rewritten as:
while (chars.Count < length)
{
rng.GetBytes(buffer);
char nextChar = (char)buffer[0];
if (Char.IsLetterOrDigit(nextChar))
chars.Add(nextChar);
}
What did we do? We moved some logic into a separate function, this has two advantages:
- It makes logic easier to follow because we can read the function name instead of understanding the code.
- If frees us from the need of that
goto
because we can reduce those multipleif
s.
What if there wasn't Char.IsLetterOrDigit()
? You write your own and 90% you will get rid of those jumps. There are more things in the original code (beside goto
): counter
is useless, you can use an array instead of a linked list (size is known), it's preferable to avoid var
for primitive types, you're hard-coding magic constants, and...you're reducing the entropy of the generated password.
Security considerations
You're correctly using RNGCryptoServiceProvider
to generate the random numbers however you're not using a proper one-to-one encoding algorithm to transform those numbers into text. Simply dropping some values reduces the entropy. This is unavoidable if you want a fixed length alphanumeric string.
However if your goal is to generate a password with the desired amount of entropy then for this step you should use base62 (as elsewhere suggested base64 for performance reasons), a Diceware algorithm, or a perfect hash function (no, you can't use modulus for this because it, obviously, has multiple collisions).
In this case, if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version.
It might even be something like this (completely and absolutely untested) which - in average - should also perform better than original implementation:
using (var rng = new RNGCryptoServiceProvider())
{
var buffer = new byte[length * 5];
var password = new StringBuilder(length);
while (password.Length < length)
{
rng.GetBytes(buffer);
password.Append(buffer
.Select(x => (char)x)
.Where(Char.IsLetterOrDigit)
.Take(length - password.Length)
.ToArray()
);
}
}
Notes:
- Decision to use
length * 5
is arbitrary (you take only 58 values from the 256 possible). For small strings there are good chances that it doesn't work well, run this few thousands times and see how many times one call toGetBytes()
is not enough. - It performs pretty well (
GetBytes()
overhead is - typically - payed only once) and you're not building lists of expensive structures (StringBuilder
is highly optimized to build strings). - It should be slightly more fast than base64 encoding (which otherwise is a clever idea).
- It's really easy to optimize for the best case scenario adding some more code (for example when
ToArray()
returns the required string you can avoidStringBuilder()
- and its initial allocation - all together). - It scales well (parallel generation of N passwords or even generation of a huge password in parallel from M chunks).
- Given that your input is a single byte then you can use
Char.IsLetterOrDigit
directly because all the other non Latin letters and digits are > 255 when UTF-16 encoded).
The very next (or previous?) step is to test this code. Can you reliably an extensively test a function which uses a random number generator? Hardly. t3schb0t covers this aspect in his answer.
Are we done? There are some security concerns you should consider, see last section.
As you did imagine goto
is the controversial part of your code. Let me address this separately because, it seems, you're prone to use it.
goto
usually hurts legibility and you're also (probably) preventing some optimizations compiler might do. Depending where in the compiler pipeline the optimization is applied (near the front-end where goto
and if
/else
differ, or near the back-end - or even after JIT - where they already generated the exact same code) a compiler is probably not optimizing goto
(because it's uncommon to be worth of the time and when you're using it then you know what you're doing). In short: if it's because of performance than 99/100 compiler can do a better job than you.
I'm not saying that there are not use-cases (there are) but hey're so rare that hat you may live without writing a goto
for most of your career (with few more chances for its little brother goto case
). Please do not generalize this to every language, in C - for example - it might be useful for clean-up code and error handling but in higher level languages we have other powerful tools.
Does it make the code shorter or more readable? In your specific case, even without changing anything else, the very same code might be rewritten as:
while (chars.Count < length)
{
rng.GetBytes(buffer);
char nextChar = (char)buffer[0];
if (Char.IsLetterOrDigit(nextChar))
chars.Add(nextChar);
}
What did we do? We moved some logic into a separate function, this has two advantages:
- It makes logic easier to follow because we can read the function name instead of understanding the code.
- If frees us from the need of that
goto
because we can reduce those multipleif
s.
What if there wasn't Char.IsLetterOrDigit()
? You write your own and 90% you will get rid of those jumps. There are more things in the original code (beside goto
): counter
is useless, you can use an array instead of a linked list (size is known), it's preferable to avoid var
for primitive types, you're hard-coding magic constants, and...you're reducing the entropy of the generated password.
Security considerations
You're correctly using RNGCryptoServiceProvider
to generate the random numbers however you're not using a proper one-to-one encoding algorithm to transform those numbers into text. Simply dropping some values reduces the entropy. This is unavoidable if you want a fixed length alphanumeric string.
However if your goal is to generate a password with the desired amount of entropy then for this step you should use base62 (as elsewhere suggested base64 for performance reasons), a Diceware algorithm, or a perfect hash function (no, you can't use modulus for this because it, obviously, has multiple collisions).
edited 2 days ago
answered Nov 8 at 12:33
Adriano Repetti
9,61511440
9,61511440
I'm pretty sure the outherwhile
could be squeezed into a nice linq chain by usingTakeUntil
;-]
– t3chb0t
Nov 8 at 12:42
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced torng.Generate()....Take().ToArray()
!
– Adriano Repetti
Nov 8 at 12:44
I'm not sure, you can make that cast frombyte
tochar
usingCast<char>()
- you'll have to useSelect(b => (char)b)
. If you moverng.GetBytes(buffer);
inside thewhile
-loop as the first thing you'll have a fresh set ofbytes
for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits
– Henrik Hansen
Nov 8 at 14:25
2
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why?
– t3chb0t
Nov 8 at 16:28
1
Btw, coincidentally the optimization with getting a larger buffer and calligGetBytes
fewer time beats any other version presented here.
– t3chb0t
Nov 8 at 16:45
|
show 1 more comment
I'm pretty sure the outherwhile
could be squeezed into a nice linq chain by usingTakeUntil
;-]
– t3chb0t
Nov 8 at 12:42
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced torng.Generate()....Take().ToArray()
!
– Adriano Repetti
Nov 8 at 12:44
I'm not sure, you can make that cast frombyte
tochar
usingCast<char>()
- you'll have to useSelect(b => (char)b)
. If you moverng.GetBytes(buffer);
inside thewhile
-loop as the first thing you'll have a fresh set ofbytes
for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits
– Henrik Hansen
Nov 8 at 14:25
2
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why?
– t3chb0t
Nov 8 at 16:28
1
Btw, coincidentally the optimization with getting a larger buffer and calligGetBytes
fewer time beats any other version presented here.
– t3chb0t
Nov 8 at 16:45
I'm pretty sure the outher
while
could be squeezed into a nice linq chain by using TakeUntil
;-]– t3chb0t
Nov 8 at 12:42
I'm pretty sure the outher
while
could be squeezed into a nice linq chain by using TakeUntil
;-]– t3chb0t
Nov 8 at 12:42
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced to
rng.Generate()....Take().ToArray()
!– Adriano Repetti
Nov 8 at 12:44
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced to
rng.Generate()....Take().ToArray()
!– Adriano Repetti
Nov 8 at 12:44
I'm not sure, you can make that cast from
byte
to char
using Cast<char>()
- you'll have to use Select(b => (char)b)
. If you move rng.GetBytes(buffer);
inside the while
-loop as the first thing you'll have a fresh set of bytes
for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits– Henrik Hansen
Nov 8 at 14:25
I'm not sure, you can make that cast from
byte
to char
using Cast<char>()
- you'll have to use Select(b => (char)b)
. If you move rng.GetBytes(buffer);
inside the while
-loop as the first thing you'll have a fresh set of bytes
for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits– Henrik Hansen
Nov 8 at 14:25
2
2
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why?
– t3chb0t
Nov 8 at 16:28
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why?
– t3chb0t
Nov 8 at 16:28
1
1
Btw, coincidentally the optimization with getting a larger buffer and callig
GetBytes
fewer time beats any other version presented here.– t3chb0t
Nov 8 at 16:45
Btw, coincidentally the optimization with getting a larger buffer and callig
GetBytes
fewer time beats any other version presented here.– t3chb0t
Nov 8 at 16:45
|
show 1 more comment
up vote
4
down vote
There is only one answer so far, that by @Adriano Repetti where clean code is mentioned. All others are havily focused on performane.
I'd like to add to this one answer and say that reusibility comes always first. I agree that if performance is not of great importance, you should primarily create an API that is easy to use, mantain and test.
Organizing your code from this point of view would mean to first create an extension for the RandomNumberGenerator
that is the base class of the RNGCryptoServiceProvider
. It would enumerate only char
s that you are interested in. It doesn't have to know how many, that's what we have LINQ for so it creates an endless collection of char
s.
public static IEnumerable<char> EnumerateLetterOrDigit(this RandomNumberGenerator rng, int bufferSize = 256)
{
var buffer = new byte[bufferSize];
while (true)
{
rng.GetBytes(buffer);
foreach (var c in buffer.Select(b => (char)b).Where(char.IsLetterOrDigit))
{
yield return c;
}
}
}
By doing so you could use a different provider or even mock it for testing and always return the same values. The options are there.
From here, it's now very easy to build your password generator. Create any random-number-generator, Take
as many char
s as you need and create the string.
public static string Generate(int length)
{
using (var rng = new RNGCryptoServiceProvider())
{
return new string(rng.EnumerateLetterOrDigit().Take(length).ToArray());
}
}
And even though it havily uses LINQ, it's accidentally faster then your code because of the reduced number of calls to GetBytes
(discovered by @Adriano?).
So saying that LINQ is always slow is once again proven wrong because most of the time there will be something else that isn't optimal. LINQ is just convenient and this is so great about it!
There is one thing about goto
that hasn't been said yet... and you're using something very similar everyday, it's called switch
:
public static void Main()
{
var num = 1;
switch (num)
{
case 1:
goto case 3;
case 2:
Console.WriteLine("Hi goto!");
break;
case 3:
goto case 2;
}
}
Yes, you can jump within its scope from case
to case
in any order.
add a comment |
up vote
4
down vote
There is only one answer so far, that by @Adriano Repetti where clean code is mentioned. All others are havily focused on performane.
I'd like to add to this one answer and say that reusibility comes always first. I agree that if performance is not of great importance, you should primarily create an API that is easy to use, mantain and test.
Organizing your code from this point of view would mean to first create an extension for the RandomNumberGenerator
that is the base class of the RNGCryptoServiceProvider
. It would enumerate only char
s that you are interested in. It doesn't have to know how many, that's what we have LINQ for so it creates an endless collection of char
s.
public static IEnumerable<char> EnumerateLetterOrDigit(this RandomNumberGenerator rng, int bufferSize = 256)
{
var buffer = new byte[bufferSize];
while (true)
{
rng.GetBytes(buffer);
foreach (var c in buffer.Select(b => (char)b).Where(char.IsLetterOrDigit))
{
yield return c;
}
}
}
By doing so you could use a different provider or even mock it for testing and always return the same values. The options are there.
From here, it's now very easy to build your password generator. Create any random-number-generator, Take
as many char
s as you need and create the string.
public static string Generate(int length)
{
using (var rng = new RNGCryptoServiceProvider())
{
return new string(rng.EnumerateLetterOrDigit().Take(length).ToArray());
}
}
And even though it havily uses LINQ, it's accidentally faster then your code because of the reduced number of calls to GetBytes
(discovered by @Adriano?).
So saying that LINQ is always slow is once again proven wrong because most of the time there will be something else that isn't optimal. LINQ is just convenient and this is so great about it!
There is one thing about goto
that hasn't been said yet... and you're using something very similar everyday, it's called switch
:
public static void Main()
{
var num = 1;
switch (num)
{
case 1:
goto case 3;
case 2:
Console.WriteLine("Hi goto!");
break;
case 3:
goto case 2;
}
}
Yes, you can jump within its scope from case
to case
in any order.
add a comment |
up vote
4
down vote
up vote
4
down vote
There is only one answer so far, that by @Adriano Repetti where clean code is mentioned. All others are havily focused on performane.
I'd like to add to this one answer and say that reusibility comes always first. I agree that if performance is not of great importance, you should primarily create an API that is easy to use, mantain and test.
Organizing your code from this point of view would mean to first create an extension for the RandomNumberGenerator
that is the base class of the RNGCryptoServiceProvider
. It would enumerate only char
s that you are interested in. It doesn't have to know how many, that's what we have LINQ for so it creates an endless collection of char
s.
public static IEnumerable<char> EnumerateLetterOrDigit(this RandomNumberGenerator rng, int bufferSize = 256)
{
var buffer = new byte[bufferSize];
while (true)
{
rng.GetBytes(buffer);
foreach (var c in buffer.Select(b => (char)b).Where(char.IsLetterOrDigit))
{
yield return c;
}
}
}
By doing so you could use a different provider or even mock it for testing and always return the same values. The options are there.
From here, it's now very easy to build your password generator. Create any random-number-generator, Take
as many char
s as you need and create the string.
public static string Generate(int length)
{
using (var rng = new RNGCryptoServiceProvider())
{
return new string(rng.EnumerateLetterOrDigit().Take(length).ToArray());
}
}
And even though it havily uses LINQ, it's accidentally faster then your code because of the reduced number of calls to GetBytes
(discovered by @Adriano?).
So saying that LINQ is always slow is once again proven wrong because most of the time there will be something else that isn't optimal. LINQ is just convenient and this is so great about it!
There is one thing about goto
that hasn't been said yet... and you're using something very similar everyday, it's called switch
:
public static void Main()
{
var num = 1;
switch (num)
{
case 1:
goto case 3;
case 2:
Console.WriteLine("Hi goto!");
break;
case 3:
goto case 2;
}
}
Yes, you can jump within its scope from case
to case
in any order.
There is only one answer so far, that by @Adriano Repetti where clean code is mentioned. All others are havily focused on performane.
I'd like to add to this one answer and say that reusibility comes always first. I agree that if performance is not of great importance, you should primarily create an API that is easy to use, mantain and test.
Organizing your code from this point of view would mean to first create an extension for the RandomNumberGenerator
that is the base class of the RNGCryptoServiceProvider
. It would enumerate only char
s that you are interested in. It doesn't have to know how many, that's what we have LINQ for so it creates an endless collection of char
s.
public static IEnumerable<char> EnumerateLetterOrDigit(this RandomNumberGenerator rng, int bufferSize = 256)
{
var buffer = new byte[bufferSize];
while (true)
{
rng.GetBytes(buffer);
foreach (var c in buffer.Select(b => (char)b).Where(char.IsLetterOrDigit))
{
yield return c;
}
}
}
By doing so you could use a different provider or even mock it for testing and always return the same values. The options are there.
From here, it's now very easy to build your password generator. Create any random-number-generator, Take
as many char
s as you need and create the string.
public static string Generate(int length)
{
using (var rng = new RNGCryptoServiceProvider())
{
return new string(rng.EnumerateLetterOrDigit().Take(length).ToArray());
}
}
And even though it havily uses LINQ, it's accidentally faster then your code because of the reduced number of calls to GetBytes
(discovered by @Adriano?).
So saying that LINQ is always slow is once again proven wrong because most of the time there will be something else that isn't optimal. LINQ is just convenient and this is so great about it!
There is one thing about goto
that hasn't been said yet... and you're using something very similar everyday, it's called switch
:
public static void Main()
{
var num = 1;
switch (num)
{
case 1:
goto case 3;
case 2:
Console.WriteLine("Hi goto!");
break;
case 3:
goto case 2;
}
}
Yes, you can jump within its scope from case
to case
in any order.
edited 2 days ago
answered Nov 8 at 17:03
t3chb0t
33.2k744106
33.2k744106
add a comment |
add a comment |
up vote
3
down vote
goto, part deux
I don't agree with the mass consciousness that goto is a code smell.
code smell means a possible problem, not that is is a problem. So yes, it is a code smell. I too have used goto, only once in the last decade - and it was with C#.
Why goto could be bad
We do not trust ourselves to use it wisely and well. This is borne out in history. There are millllions of COBOL LOC demonstrating that when used routinely goto
tends to corrupt flow control, create inter and intra method coupling, and spawns "one off" code duplication; all to such a degree that one should assume the code will break when changed. In other words the code becomes unmaintainable.
Why goto is bad
goto
murders the bedrock of all good programs - structured programming and modular construction. goto
literally goes around them. Anything you can say about modern languages, coding principles, object orientation, etc. is fundamentally rooted in these two precepts. full stop.
As for history: the unstructured nature of COBOL, for example, gave rise to an "hard jump" coding style and mind set. It's fair to say the liberal use of goto
is a code lobotomy. Many factors contribute to crumbling code but the more goto
s the more schizophrenic the code becomes. I have worked on a system that gave a dreadful feeling of random execution when stepping through with the debugger. Many existing bugs could not be duplicated and almost no bug could be reproduced reliably. 99 percent of code tweaks induced bugs. I am not exaggerating at all here! Our in house customer head guy told me "we don't bring up problems anymore because we know they won't be fixed."
So wide spread was the problem that Edsger Dijkstra wrote a seminal paper GOTO statement considered harmful
A very readable, practical, nicely concise "using goto wisely" discussion
These are the best arguments againstgoto
I've ever seen.
– t3chb0t
2 days ago
add a comment |
up vote
3
down vote
goto, part deux
I don't agree with the mass consciousness that goto is a code smell.
code smell means a possible problem, not that is is a problem. So yes, it is a code smell. I too have used goto, only once in the last decade - and it was with C#.
Why goto could be bad
We do not trust ourselves to use it wisely and well. This is borne out in history. There are millllions of COBOL LOC demonstrating that when used routinely goto
tends to corrupt flow control, create inter and intra method coupling, and spawns "one off" code duplication; all to such a degree that one should assume the code will break when changed. In other words the code becomes unmaintainable.
Why goto is bad
goto
murders the bedrock of all good programs - structured programming and modular construction. goto
literally goes around them. Anything you can say about modern languages, coding principles, object orientation, etc. is fundamentally rooted in these two precepts. full stop.
As for history: the unstructured nature of COBOL, for example, gave rise to an "hard jump" coding style and mind set. It's fair to say the liberal use of goto
is a code lobotomy. Many factors contribute to crumbling code but the more goto
s the more schizophrenic the code becomes. I have worked on a system that gave a dreadful feeling of random execution when stepping through with the debugger. Many existing bugs could not be duplicated and almost no bug could be reproduced reliably. 99 percent of code tweaks induced bugs. I am not exaggerating at all here! Our in house customer head guy told me "we don't bring up problems anymore because we know they won't be fixed."
So wide spread was the problem that Edsger Dijkstra wrote a seminal paper GOTO statement considered harmful
A very readable, practical, nicely concise "using goto wisely" discussion
These are the best arguments againstgoto
I've ever seen.
– t3chb0t
2 days ago
add a comment |
up vote
3
down vote
up vote
3
down vote
goto, part deux
I don't agree with the mass consciousness that goto is a code smell.
code smell means a possible problem, not that is is a problem. So yes, it is a code smell. I too have used goto, only once in the last decade - and it was with C#.
Why goto could be bad
We do not trust ourselves to use it wisely and well. This is borne out in history. There are millllions of COBOL LOC demonstrating that when used routinely goto
tends to corrupt flow control, create inter and intra method coupling, and spawns "one off" code duplication; all to such a degree that one should assume the code will break when changed. In other words the code becomes unmaintainable.
Why goto is bad
goto
murders the bedrock of all good programs - structured programming and modular construction. goto
literally goes around them. Anything you can say about modern languages, coding principles, object orientation, etc. is fundamentally rooted in these two precepts. full stop.
As for history: the unstructured nature of COBOL, for example, gave rise to an "hard jump" coding style and mind set. It's fair to say the liberal use of goto
is a code lobotomy. Many factors contribute to crumbling code but the more goto
s the more schizophrenic the code becomes. I have worked on a system that gave a dreadful feeling of random execution when stepping through with the debugger. Many existing bugs could not be duplicated and almost no bug could be reproduced reliably. 99 percent of code tweaks induced bugs. I am not exaggerating at all here! Our in house customer head guy told me "we don't bring up problems anymore because we know they won't be fixed."
So wide spread was the problem that Edsger Dijkstra wrote a seminal paper GOTO statement considered harmful
A very readable, practical, nicely concise "using goto wisely" discussion
goto, part deux
I don't agree with the mass consciousness that goto is a code smell.
code smell means a possible problem, not that is is a problem. So yes, it is a code smell. I too have used goto, only once in the last decade - and it was with C#.
Why goto could be bad
We do not trust ourselves to use it wisely and well. This is borne out in history. There are millllions of COBOL LOC demonstrating that when used routinely goto
tends to corrupt flow control, create inter and intra method coupling, and spawns "one off" code duplication; all to such a degree that one should assume the code will break when changed. In other words the code becomes unmaintainable.
Why goto is bad
goto
murders the bedrock of all good programs - structured programming and modular construction. goto
literally goes around them. Anything you can say about modern languages, coding principles, object orientation, etc. is fundamentally rooted in these two precepts. full stop.
As for history: the unstructured nature of COBOL, for example, gave rise to an "hard jump" coding style and mind set. It's fair to say the liberal use of goto
is a code lobotomy. Many factors contribute to crumbling code but the more goto
s the more schizophrenic the code becomes. I have worked on a system that gave a dreadful feeling of random execution when stepping through with the debugger. Many existing bugs could not be duplicated and almost no bug could be reproduced reliably. 99 percent of code tweaks induced bugs. I am not exaggerating at all here! Our in house customer head guy told me "we don't bring up problems anymore because we know they won't be fixed."
So wide spread was the problem that Edsger Dijkstra wrote a seminal paper GOTO statement considered harmful
A very readable, practical, nicely concise "using goto wisely" discussion
edited Nov 8 at 20:54
answered Nov 8 at 19:53
radarbob
5,2541025
5,2541025
These are the best arguments againstgoto
I've ever seen.
– t3chb0t
2 days ago
add a comment |
These are the best arguments againstgoto
I've ever seen.
– t3chb0t
2 days ago
These are the best arguments against
goto
I've ever seen.– t3chb0t
2 days ago
These are the best arguments against
goto
I've ever seen.– t3chb0t
2 days ago
add a comment |
up vote
1
down vote
I can only add to the already said, that your algorithm has the potential to run for ever - theoretically speaking.
The most effective approach is to select chars from a list of valid chars. If using the numbers [0, 9] and the English letters [a, z] and [A, Z] we end up with 62 entries which will cause a byte
based random selection be biased because 256 % 62 != 0
. The problem can be solved by either adding two more chars resulting in 64 char:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
or reduce the valid chars to 32:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
There are pros and cons to both obviously, but both can be used i an efficient algorithm like:
public string Generate(int length)
{
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
using (var rng = new RNGCryptoServiceProvider())
{
char result = new char[length];
byte buffer = new byte[length];
int range = chars.Length;
rng.GetBytes(buffer);
for (int i = 0; i < length; i++)
{
result[i] = chars[buffer[i] % range];
}
return new string(result);
}
}
This should be unbiased.
I deliberately avoid the use of LINQ, because it is nearly always slower than more "native" code. This not to say, I in general avoid using LINQ, because I use it very often and gladly, but it is my experience that when it comes to performance, LINQ is often significantly slower than comparable, well written traditional code.
add a comment |
up vote
1
down vote
I can only add to the already said, that your algorithm has the potential to run for ever - theoretically speaking.
The most effective approach is to select chars from a list of valid chars. If using the numbers [0, 9] and the English letters [a, z] and [A, Z] we end up with 62 entries which will cause a byte
based random selection be biased because 256 % 62 != 0
. The problem can be solved by either adding two more chars resulting in 64 char:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
or reduce the valid chars to 32:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
There are pros and cons to both obviously, but both can be used i an efficient algorithm like:
public string Generate(int length)
{
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
using (var rng = new RNGCryptoServiceProvider())
{
char result = new char[length];
byte buffer = new byte[length];
int range = chars.Length;
rng.GetBytes(buffer);
for (int i = 0; i < length; i++)
{
result[i] = chars[buffer[i] % range];
}
return new string(result);
}
}
This should be unbiased.
I deliberately avoid the use of LINQ, because it is nearly always slower than more "native" code. This not to say, I in general avoid using LINQ, because I use it very often and gladly, but it is my experience that when it comes to performance, LINQ is often significantly slower than comparable, well written traditional code.
add a comment |
up vote
1
down vote
up vote
1
down vote
I can only add to the already said, that your algorithm has the potential to run for ever - theoretically speaking.
The most effective approach is to select chars from a list of valid chars. If using the numbers [0, 9] and the English letters [a, z] and [A, Z] we end up with 62 entries which will cause a byte
based random selection be biased because 256 % 62 != 0
. The problem can be solved by either adding two more chars resulting in 64 char:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
or reduce the valid chars to 32:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
There are pros and cons to both obviously, but both can be used i an efficient algorithm like:
public string Generate(int length)
{
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
using (var rng = new RNGCryptoServiceProvider())
{
char result = new char[length];
byte buffer = new byte[length];
int range = chars.Length;
rng.GetBytes(buffer);
for (int i = 0; i < length; i++)
{
result[i] = chars[buffer[i] % range];
}
return new string(result);
}
}
This should be unbiased.
I deliberately avoid the use of LINQ, because it is nearly always slower than more "native" code. This not to say, I in general avoid using LINQ, because I use it very often and gladly, but it is my experience that when it comes to performance, LINQ is often significantly slower than comparable, well written traditional code.
I can only add to the already said, that your algorithm has the potential to run for ever - theoretically speaking.
The most effective approach is to select chars from a list of valid chars. If using the numbers [0, 9] and the English letters [a, z] and [A, Z] we end up with 62 entries which will cause a byte
based random selection be biased because 256 % 62 != 0
. The problem can be solved by either adding two more chars resulting in 64 char:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
or reduce the valid chars to 32:
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
There are pros and cons to both obviously, but both can be used i an efficient algorithm like:
public string Generate(int length)
{
const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";
using (var rng = new RNGCryptoServiceProvider())
{
char result = new char[length];
byte buffer = new byte[length];
int range = chars.Length;
rng.GetBytes(buffer);
for (int i = 0; i < length; i++)
{
result[i] = chars[buffer[i] % range];
}
return new string(result);
}
}
This should be unbiased.
I deliberately avoid the use of LINQ, because it is nearly always slower than more "native" code. This not to say, I in general avoid using LINQ, because I use it very often and gladly, but it is my experience that when it comes to performance, LINQ is often significantly slower than comparable, well written traditional code.
edited Nov 8 at 18:47
answered Nov 8 at 16:22
Henrik Hansen
5,6081622
5,6081622
add a comment |
add a comment |
up vote
1
down vote
Testing for character ranges presumes ordering and continuity of those code points. Assuming the Latin alphabet, the algorithm of course works as expected using ASCII and Unicode, but breaks down if another alphabet should be used or in the highly unlikely case of more obscure character sets such as EBCDIC.
An improved solution would be defining an array of allowable code points (as in "ABC...YZabc...xyz012..89", and then using the random number to index into this array. This resolves all above issues, and the function is trivially extendable to support additional characters as desired.
As regards the goto statements, I don't see any real fault here. It is as easy to read and logically identical to a similarly constructed switch() statement with range cases.
New contributor
add a comment |
up vote
1
down vote
Testing for character ranges presumes ordering and continuity of those code points. Assuming the Latin alphabet, the algorithm of course works as expected using ASCII and Unicode, but breaks down if another alphabet should be used or in the highly unlikely case of more obscure character sets such as EBCDIC.
An improved solution would be defining an array of allowable code points (as in "ABC...YZabc...xyz012..89", and then using the random number to index into this array. This resolves all above issues, and the function is trivially extendable to support additional characters as desired.
As regards the goto statements, I don't see any real fault here. It is as easy to read and logically identical to a similarly constructed switch() statement with range cases.
New contributor
add a comment |
up vote
1
down vote
up vote
1
down vote
Testing for character ranges presumes ordering and continuity of those code points. Assuming the Latin alphabet, the algorithm of course works as expected using ASCII and Unicode, but breaks down if another alphabet should be used or in the highly unlikely case of more obscure character sets such as EBCDIC.
An improved solution would be defining an array of allowable code points (as in "ABC...YZabc...xyz012..89", and then using the random number to index into this array. This resolves all above issues, and the function is trivially extendable to support additional characters as desired.
As regards the goto statements, I don't see any real fault here. It is as easy to read and logically identical to a similarly constructed switch() statement with range cases.
New contributor
Testing for character ranges presumes ordering and continuity of those code points. Assuming the Latin alphabet, the algorithm of course works as expected using ASCII and Unicode, but breaks down if another alphabet should be used or in the highly unlikely case of more obscure character sets such as EBCDIC.
An improved solution would be defining an array of allowable code points (as in "ABC...YZabc...xyz012..89", and then using the random number to index into this array. This resolves all above issues, and the function is trivially extendable to support additional characters as desired.
As regards the goto statements, I don't see any real fault here. It is as easy to read and logically identical to a similarly constructed switch() statement with range cases.
New contributor
New contributor
answered yesterday
stephan mantler
112
112
New contributor
New contributor
add a comment |
add a comment |
up vote
-3
down vote
I would use a HashSet
:
HashSet<byte> validChars = new HashSet<byte> { 48, 49, 50 }; //add all valid
........
rng.GetBytes(buffer);
if(validChars.Contains(buffer[0]))
{
chars.Add(char(buffer[0]));
++counter;
}
3
Mind elaborating on why you'd use aHashSet
?
– Shelby115
2 days ago
@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago
2
@paparazzo: but this aspect of the original code is also O(1). Using aHashSet
here is slightly slower due to extra overhead.
– Pieter Witvoet
2 days ago
@PieterWitvoet Not buying. Good day.
– paparazzo
2 days ago
2
Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
2 days ago
|
show 1 more comment
up vote
-3
down vote
I would use a HashSet
:
HashSet<byte> validChars = new HashSet<byte> { 48, 49, 50 }; //add all valid
........
rng.GetBytes(buffer);
if(validChars.Contains(buffer[0]))
{
chars.Add(char(buffer[0]));
++counter;
}
3
Mind elaborating on why you'd use aHashSet
?
– Shelby115
2 days ago
@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago
2
@paparazzo: but this aspect of the original code is also O(1). Using aHashSet
here is slightly slower due to extra overhead.
– Pieter Witvoet
2 days ago
@PieterWitvoet Not buying. Good day.
– paparazzo
2 days ago
2
Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
2 days ago
|
show 1 more comment
up vote
-3
down vote
up vote
-3
down vote
I would use a HashSet
:
HashSet<byte> validChars = new HashSet<byte> { 48, 49, 50 }; //add all valid
........
rng.GetBytes(buffer);
if(validChars.Contains(buffer[0]))
{
chars.Add(char(buffer[0]));
++counter;
}
I would use a HashSet
:
HashSet<byte> validChars = new HashSet<byte> { 48, 49, 50 }; //add all valid
........
rng.GetBytes(buffer);
if(validChars.Contains(buffer[0]))
{
chars.Add(char(buffer[0]));
++counter;
}
edited 2 days ago
answered Nov 8 at 19:45
paparazzo
4,9911733
4,9911733
3
Mind elaborating on why you'd use aHashSet
?
– Shelby115
2 days ago
@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago
2
@paparazzo: but this aspect of the original code is also O(1). Using aHashSet
here is slightly slower due to extra overhead.
– Pieter Witvoet
2 days ago
@PieterWitvoet Not buying. Good day.
– paparazzo
2 days ago
2
Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
2 days ago
|
show 1 more comment
3
Mind elaborating on why you'd use aHashSet
?
– Shelby115
2 days ago
@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago
2
@paparazzo: but this aspect of the original code is also O(1). Using aHashSet
here is slightly slower due to extra overhead.
– Pieter Witvoet
2 days ago
@PieterWitvoet Not buying. Good day.
– paparazzo
2 days ago
2
Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
2 days ago
3
3
Mind elaborating on why you'd use a
HashSet
?– Shelby115
2 days ago
Mind elaborating on why you'd use a
HashSet
?– Shelby115
2 days ago
@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago
@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago
2
2
@paparazzo: but this aspect of the original code is also O(1). Using a
HashSet
here is slightly slower due to extra overhead.– Pieter Witvoet
2 days ago
@paparazzo: but this aspect of the original code is also O(1). Using a
HashSet
here is slightly slower due to extra overhead.– Pieter Witvoet
2 days ago
@PieterWitvoet Not buying. Good day.
– paparazzo
2 days ago
@PieterWitvoet Not buying. Good day.
– paparazzo
2 days ago
2
2
Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
2 days ago
Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
2 days ago
|
show 1 more comment
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207171%2frandom-alphanumeric-password-generator-with-gotos%23new-answer', 'question_page');
}
);
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
Why aren't you using
Char.IsDigit()
andChar.IsLetter()
instead of your own range tests?– Barmar
Nov 8 at 3:57
@Barmar: In this case
Char.IsDigit()
andChar.IsLetter()
are probably the wrong choice, since the aim is to create a password with characters from a known range.– Jack Aidley
Nov 8 at 12:08
Of course it's slightly more complex because you may need to concatenate mulitple calls but its should be trivial. Given that you need 8 characters and time spent to generate 256 random bytes is negligible...you may even optimize your code for the best case (just use
StringBuilder
to storechar
s and put above code in awhile (stringBuffer.length < length)
(changeTake()
to actually take only required number of characters...)– Adriano Repetti
Nov 8 at 12:11
1
@Shelby - Of course they include Unicode characters (
0
, for example). In this case, we seem to be selecting from the Latin-1 set, given thatbuffer
contains abyte
. We could mask that down to ASCII using& 0x7F
(and increase the number of random values we actually use, rather than discard).– Toby Speight
Nov 8 at 19:48
4
Moderator note: Not everyone has to mention how bad
goto
is. Many of the comments that were just moved to chat did not add anything constructive at all. You don't have to come up with new ways to say how badgoto
is.– Simon Forsberg♦
2 days ago