Monster.com recently disclosed yet another major breach that compromised the personal data of over 1.3 million users. This is not unlike the previous breach in August 2007, though the attack vector was likely different. From a notice on their website (emphasis mine):
We recently learned our database was illegally accessed and certain contact and account data were taken, including Monster user IDs and passwords, email addresses, names, phone numbers, and some basic demographic data. The information accessed does not include resumes.
Considering the well-known tendency to use the same password on multiple websites, compounded with the fact that Monster pledged a comprehensive security review after the first breach, it's just embarrassing that they are still storing passwords in the clear. So let's talk about how to properly store passwords for a web application. Use a one-way cryptographic hash Don't store your passwords in the clear! If you do, an attacker just needs to find one SQL Injection vulnerability and he's got the password for every one of your users. The idea behind using a one-way algorithm is that the hash value can't be reversed to "decrypt" the password. So how does authentication work? When a user attempts to login, you apply the same one-way algorithm to convert the user-provided password into the hash value, and then compare the two hashes. If they match, then the user-provided password was correct. At no time is the password ever stored in the clear. Often, developers will hear the advice "use a hash" and interpret that as "run the plaintext password through MD5 or SHA-1 and store the result." But that only solves part of the problem -- the part about using an irreversible algorithm. It doesn't protect against pre-computation. Let's say you've used SHA-1 to hash your passwords, and your USERS table looks like this in the database:
USER PASSWORD_HASH admin 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 bob fbb73ec5afd91d5b503ca11756e33d21a9045d9d jim 7c6a61c68ef8b9b6b061b28c348bc1ed7921cb53
So if you wanted to obtain the original passwords you'd have to run a dictionary or brute force attack, hashing all possible password options with SHA-1 and comparing the output to the stored hashes. This would take a long time but eventually you'd figure some of them out. But what if you already had a list of all 8-character permutations and their corresponding SHA-1 hashes? Now all you have to do is look up the hashes, rather than computing them on-the-fly. This is the idea behind rainbow tables. An attacker with a SHA-1 rainbow table covering 8-character alphanumeric combinations would quickly look up those three hashes and obtain the original passwords of "password", "p4ssword", and "passw0rd" respectively. Use a salt The best defense against pre-computation of raw hashes is salting. To salt a password, you append or prepend a random string of bits to the plaintext password and hash the result. You then store the salt value alongside the hash so that it can be used by the authentication routine. Look in the /etc/shadow file of any modern Unix system and you'll see something like this:
user1:$1$lKorlp4C$RD5TSM6PaZ6oaWRVUuXT40:13740:0:99999:7::: user2:$1$qOmA0CUm$I6IdbZDTDl6B6m7s77VPe1:13650:0:99999:7::: user3:$1$nIEInNo5$PSxcLtvGIJArL8r2AQl74.:13749:0:99999:7:::
Let's look at the "user1" entry in the example above, paying attention to the second field which contains a bunch of alphanumeric characters separated by dollar signs. The first token, 1, is a version number, The second token, lKorlp4C, is the salt. The third token, RD5TSM6PaZ6oaWRVUuXT40, is the one-way hash that was calculated using lKorlp4C as the salt. When the user attempts to login, the system passes the user-provided password along with the stored salt into the hash routine (in this case, md5crypt), and compares the result to the stored hash. Each bit of salt used doubles the amount of storage and computation required for a pre-computed table. For instance, if we used one bit of salt -- either 0 or 1 -- the rainbow table would have to account for two variations of every password. Eight bits of salt require 2^8, or 256 variations of every password. Use a sufficiently large salt and pre-computation becomes infeasible. For example, the md5crypt utility uses 48 bits of salt (and for an extra layer of protection, it runs 1000 iterations of MD5 to slow down dictionary attacks). There are a couple of common mistakes that people make with regard to salting. First, don't use the same salt every time. If you do, you're not really increasing the search space because the attacker only has to account for a single salt value. Second, don't worry about protecting the salt values, they're not secrets. The added security is derived not from the secrecy of the salt but rather by the amount it increases the resources required for pre-computation. If you have OpenSSL installed you can play around with various salt mechanisms and see what the output looks like:
$ openssl passwd -h Usage: passwd [options] [passwords] where options are -crypt standard Unix password algorithm (default) -1 MD5-based password algorithm -apr1 MD5-based password algorithm, Apache variant -salt string use provided salt -in file read passwords from file -stdin read passwords from stdin -noverify never verify when reading password from terminal -quiet no warnings -table format output as table -reverse switch table columns $ openssl passwd -1 password $1$LH1SwzJI$0ho4XuPVfGlbWIcNuGIap/ $ openssl passwd -1 password $1$eAUtQOBh$GlvJwVsyb8In5KKkvnR0E0 $ openssl passwd -1 password $1$PgaSiWTy$ElLh6uy83Y6T4Y70AGmV20
A quick Google search shows that there is a lot of confusion about salting. But wait, now my password recovery feature won't work What's that? You say your application has one of those "Forgot My Password" features where a user can type in their username and their current password will be sent to the e-mail address on file? Clearly, that requirement depends on passwords being stored either in the clear or using a reversible mechanism such as symmetric encryption. The answer here is to redesign your password recovery feature. Don't let an unnecessary requirement force you into poor security practices. If you must e-mail a password, generate a temporary password that's only valid for a short time period, and require the user to login immediately and select a new password. This obviates the need to retrieve the original, forgotten password. Why not just use symmetric encryption? Instead of storing passwords in the clear, you could encrypt them using a symmetric algorithm such as AES and have the application encrypt/decrypt as needed. While this solves the plaintext storage problem, it creates a new problem: key management. Where do you store the key? How often does it change? How many people have access to it? What do you do if/when the key is compromised? And so on. The tradeoff really isn't worth it for something that's more elegantly solved with salted hashes. Layered defenses While you're rethinking password storage, it might be a good time to consider other common flubs such as password complexity and brute-force protections. In conclusion
- Storing passwords in the clear puts your users at unnecessary risk if (when) your application database is compromised
- Use salted hashes instead of storing passwords in a recoverable format
- Password recovery mechanisms can be implemented without needing to obtain the original password
- As with any aspect of security architecture, use layered defenses
Have fun refactoring!