This article describes the common pitfalls when storing passwords and describes best practices for secure password authentication.
We accompany a young developer without the slightest clue of how to store passwords and follow him through his learning process. In each step I will describe the obvious and not-so-obvious problem with his solution. Let’s start!
Just store the passwords as plaintextSo our developer has to store the password of a user of his application. Since he knows how to work with databases, he just stores the password in plaintext.
The obvious problem with this approach is that everyone who has read access to the database can recover the passwords. As they are stored in plaintext and thus can easily be recovered, an attacker could even try to use the user’s username/password for other services, e.g. the email account.
Obfuscate the passwordOur developer realizes that storing the passwords as plaintext is a bad idea. So he invents his own algorithm to obfuscate the password.
Problems with this: Cryptography is hard. There are a lot of smart guys who worked together to create secure systems, so please do not roll your own crypto. Even if the output from your self-made crypto may look secure, it probably (and that is a very high probability) is not. Also, there is a principle called security through obscurity which you should avoid.
Encrypting the passwordSo our developer accepts the fact that creating his own crypto-algorithm is a bad idea. He uses his search engine of choice to find examples of good crypto. And he finds the AES cipher. This cipher is widely used and generally considered as secure. So he utilizes this cipher to encrypt the passwords before storing them in the database. To encrypt the password, he needs to use a key, which is stored somewhere in the application. This key is used to decrypt the password from the database when a user wants to log on to the application.
This approach has a big problem: If the attacker has read access to the database, it is likely that he can also find the application files. And somewhere in this files lies the key to encrypt/decrypt the passwords. The attacker then just needs to read the password from the database, decrypt it with the extracted key, and voila, he gets the plaintext password.
Hashing the passwordAfter discovering the flaw our developer reads about crypto basics and discovers a technique known as hash functions. Some well-known hash functions are MD5, SHA1 and the SHA-2xx family. A hash function is a function which takes data of an arbitrary length and maps it to a fixed-length value. Every same input data maps to the same output value. Hash functions which are used in cryptography have an interesting characteristic: they are non-invertible. That means it is very hard to find the input data for a given output value. “Hey, that’s exactly what I’m looking for”, our developer thinks. “This way I can store the password and nobody can recover the plaintext password!” Okay, you ask, but if you cannot recover the plaintext, how does the system check that the user has entered the correct password on login? Because the same input data maps to the same output value, you can take the password which the user has entered, hash it and compare it to the hash stored in the database. If they are both the same, the user has entered the correct password.
As you may have guessed, this technique also has a flaw: As every same password leads to the same hash, some guys have built big databases which do the inverse mapping, from hash to the plaintext password. As database lookups are usually really fast, the plaintext for a given hash can be looked up quite fast. Test it for yourself: Visit http://www.md5decrypt.org/ and enter some MD5 hashes. For example:
- 5ebe2294ecd0e0f08eab7690d2a6ee69 (“secret”)
- b295ab91a74ae7048c0ba523c03511e3 (“verysecret”)
Put in some saltAfter searching the internet for some time, our developer stumbles across the recommendation to add salt to his hashes. Salt is random data which is added to the plaintext password before hashing. This defeats precomputed hashes or rainbow tables, because the same input password does not produce the same hash anymore. Every password gets its own salt, and the salt is stored in plaintext along with the password hash. When the user wants to log on to the application, the salt and the hash is read from the database. The salt is now added to the password entered from the user and the hash function is applied. Then the hash from the database is compared to the calculated hash. If both hashes match, the user is allowed to login. There are still some mistakes our developer can make:
- Reuse the salt for every password. The salt should be unique for every password, otherwise the attacker can precalculate the hashes or build a rainbow table
- Use a salt which is too short: If you add just, say, for example one byte of salt, the precalculated hash table grows at the factor of 256 (2 to the power of 8), but an attacker can compensate for this.
- Use a non-random salt: If you use the username as salt for example, the salt is not random enough. Use a cryptographic random number generator (for example SecureRandom in Java) to generate the salt. If the salt isn't random, an attacker can build a rainbow table with that salt.
Performance problems the other way roundEven after using a hash function with salt, our developer recognizes, the passwords are not stored secure enough. The main problem is that hash functions are not made for password storage. Admittedly, they have the nice property of being non-invertible, but they have one flaw: they are way too fast. These hash functions have been designed to work with huge amount of data at a fast speed. With the uprising of the ability to calculate on GPUs, billions of hashes can be checked per second (!). A great tool for this is oclHashcat. An NVidia GTX 750 Ti GPU, which costs at the time of writing about 150 Euros, can create roughly 3 billion of MD5 Hashes per second. If you use a more expensive GPU or multiple GPUs, this gets even faster.
The solution, finallyAfter realizing that using a hash function is not the way to go when storing passwords, our developer reads about cryptographic functions which are more suitable for password storage. One of these functions is PBKDF2, the Password based keyderivation function #2. PBKDF2 applies a salt to the password and uses a hash function (or another pseudorandom function) multiple times. The number of applications of this function is called the iteration count. This technique is known as key stretching and slows down the hashing of passwords. As a result an attacker can calculate considerably fewer hashes per second. The iteration count of PBKDF2 can be adjusted to compensate the growth in computing power. If the computing power grows, just increase the iteration count. Other examples of password hashing functions are Bcrypt and Scrypt. When using such a function you need to decide on how many iterations you need. This depends on your performance requirements and your available performance. ThisStackoverflow article offers some clues.
The QAware solutionWe at QAware have developed a library which handles all the unpleasant parts of secure password storage for you. The workflow looks like this:
As you can see in the code, this library has a pretty cool feature: automatic detection of broken algorithms and parameters. If you used, say, PBKDF2 with 1000 iterations, that was secure in the past, but nowadays it is not. The library will detect this and give you the possibility to update the password hash. The library also handles secure salt generation with a sufficient length and has secure defaults for iteration count and the like.
- Use crypto
- Do not roll your own crypto
- Use a salt
- Apply a random and long enough salt
- Use a cryptographic function which was designed for password hashing, for example PBKDF2 or bcrypt
- Adjust the iteration count to fit your needs
We've just made the library open source. You can download (and contribute) here: https://github.com/qaware/heimdall. Enjoy!