NOTE: This is a revised version of my original post, updated to address readers' concerns that some of my statements did not reflect best practices for encryption key management. A big thank you to Andrew Jamieson, who reviewed and commented on this revised posting.
During the last couple of years, I have run into more questions regarding encryption and encryption key management than I thought existed. As a result, I have come to realize that, for most people, encryption is some mystical science. The stories of the Enigma machine and Bletchley Park seem only to have added to that mysticism. Over the years, I have collected my thoughts on these questions and developed this distilled and very simplified guidance for those of you struggling with encryption.
For the security and encryption purists out there, I do not represent this post in any way, shape, or form as the “be-all and end-all” on encryption. Volumes upon volumes of books and Web sites have been dedicated to encryption, and the vast majority of those discussions are about as esoteric as they can be, which is probably why encryption has the reputation it does.
In addition, this post addresses the most common method of protecting data stored in a database or file: applying an encryption algorithm to a column of data or to an entire file. It does not cover public key infrastructure (PKI) or other techniques that could be used. So please do not flame me for missing your favorite algorithm, other forms of encryption, or some other piece of encryption minutiae.
There are all sorts of nuances to encryption methods, and I do not want them to cloud the basic issues so that people can get beyond the mysticism. This post is meant to give people a modicum of knowledge so they can distinguish hyperbole from fact.
The first thing I want to clarify is that encryption and hashing are two entirely different methods. While both obscure information, the key thing to remember is that encryption is reversible and hashing is not. With the right key, encrypted data can be decrypted back to the original; a hash is a one-way, fixed-length value from which the original cannot be computed. Even security professionals get balled up and use the terms hashing and encryption interchangeably, so I want to make sure everyone understands the difference.
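To make the distinction concrete, here is a minimal Python sketch using only the standard library. The encryption half uses a one-time pad (XOR with a random key as long as the message) purely because it fits in a few lines; it is an illustrative stand-in, and real systems would use AES. The hashing half uses SHA-256 from `hashlib`. The function names are hypothetical, and the sample value is a well-known test card number, not real cardholder data.

```python
import hashlib
import secrets


def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """One-time-pad encryption: XOR each byte of the message with a key byte."""
    if len(key) != len(plaintext):
        raise ValueError("one-time pad key must match the message length")
    return bytes(p ^ k for p, k in zip(plaintext, key))


def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so decryption is the same operation with the same key.
    return otp_encrypt(ciphertext, key)


message = b"4111111111111111"               # a public test card number
key = secrets.token_bytes(len(message))     # random key, used once

ciphertext = otp_encrypt(message, key)
recovered = otp_decrypt(ciphertext, key)    # reversible: the key gets the data back

digest = hashlib.sha256(message).hexdigest()  # one-way: there is no "unhash" function

print(recovered == message)  # True
print(len(digest))           # 64
```

Holding the key lets you get the original back. With the hash there is nothing to reverse; all you can do is hash a candidate value again and compare the results.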
The most common questions I get revolve around how encryption works. Non-mathematicians should not need to know how an encryption algorithm works; that is for the experts who develop the algorithms and prove that they work. Unless you are a mathematician studying cryptography, I recommend that you trust the research the experts have conducted on encryption algorithms.
That is not to say you should not know strong cryptography from weak cryptography. I am simply suggesting that the underlying mathematics that defines a strong algorithm can be beyond even some mathematicians, so why we would expect non-mathematicians to understand encryption at that level is beyond me. My point is that the algorithms work. Understanding how they work is not, and should not be, a prerequisite for management, or even security professionals, to use encryption.
This leads me to the most important thing people need to know about encryption. If you take away only one thing from this post, it should be that strong encryption comes down to four basic principles:
- The algorithm used;
- The key used;
- How the key is managed; and
- How the key is protected.
If you understand these four basic principles, you will be miles ahead of everyone else who is getting twisted up in the details and missing these key points. If you look at PCI DSS Requirement 3, you will see that its tests are structured around these four principles.
On the algorithm side of the equation, the best algorithm currently in use is the Advanced Encryption Standard (AES). AES, based on the Rijndael algorithm, was selected by the United States National Institute of Standards and Technology (NIST) in 2001 as the official encryption standard for the US government, replacing the Data Encryption Standard (DES), which was no longer considered secure. AES was selected through a competition in which 15 algorithms were evaluated. Although they did not win the NIST competition, Twofish, Serpent, RC6 and MARS were finalists and are also considered strong encryption algorithms. Better yet, for those of you in the software development business, AES, Twofish, Serpent and MARS are open source. Other algorithms are available, but these are the most tested and reliable of the lot.
One form of DES, Triple DES (3DES) with a 168-bit key, is still considered strong encryption, although because of meet-in-the-middle attacks its effective strength is closer to 112 bits. How long it will remain strong is up for debate, and I have always recommended staying away from 3DES unless you have no other choice, which can be the case with older devices and software. If you are currently using 3DES, I highly recommend that you develop a plan to migrate away from it.
This brings up another key takeaway from this discussion: no algorithm is perfect. Over time, encryption algorithms are likely to be shown to have flaws or to be breakable with the latest computing power available. Some flaws may be annoyances that you can work around, or you may have to accept some minimal risk from their continued use. However, some flaws may be fatal and require that you stop using the algorithm, as was the case with DES. The lesson here is that you should always be prepared to change your encryption algorithm. That is not to say you will likely be required to make such a change on a moment’s notice. But as the experience with DES shows, what was considered strong in the past may no longer be strong or should no longer be relied upon. Changes in computing power and research could make any algorithm obsolete, requiring you to make a change.
Just because you use AES or another strong algorithm does not mean your encryption cannot be broken. If there is any weak link in the use of encryption, it is the belief by many that the algorithm is the only thing that matters. As a result, we end up with a strong algorithm using a weak key. Weak keys, such as a key composed of a single repeated character, a series of consecutive characters, an easily guessed phrase or a key of insufficient length, are the reasons most often cited for why encryption fails. For encryption to be effective, encryption keys need to be strong as well. Encryption keys should be a minimum of 32 characters in length. However, in the encryption game, the longer and more random the characters in a key, the better, which is why you see organizations using random key strings 64 to 256 characters long. When I use the term ‘character’, that can mean printable characters: upper and lower case alphabetic as well as numeric and special characters. But ‘character’ can also mean hexadecimal values if your key entry interface allows them to be entered. The important thing to remember is that the values you enter for your key should be as hard to guess or brute force as the maximum key size of the algorithm you are using. For example, using a seven-character password to generate a 256-bit AES key does not provide the full strength of that algorithm, because an attacker only has to guess the seven characters rather than every possible 256-bit key.
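The key-strength point comes down to simple arithmetic. A string whose characters are each chosen uniformly at random from an alphabet of N symbols carries length × log2(N) bits of entropy. The Python sketch below (the function names are hypothetical) assumes truly random characters; a human-chosen phrase carries far less than these figures suggest.

```python
import math


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Bits of entropy in a string whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def min_length(target_bits: int, alphabet_size: int) -> int:
    """Shortest uniformly random string that reaches target_bits of entropy."""
    return math.ceil(target_bits / math.log2(alphabet_size))


HEX = 16        # 0-9 and A-F
PRINTABLE = 95  # printable ASCII, space through tilde

print(entropy_bits(7, PRINTABLE))   # about 46 bits: far short of AES-256
print(entropy_bits(64, HEX))        # 256.0
print(min_length(256, PRINTABLE))   # 39
print(min_length(256, HEX))         # 64
```

By the same arithmetic, a 32-character hexadecimal key carries 128 bits, which matches AES-128 but not AES-256, while 32 random printable characters carry roughly 210 bits. Feeding a short password through a key derivation function produces a key of the right size, but it cannot add entropy the password does not have.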
This brings us to the topic of encryption key generation. There are a number of Web sites that can generate pseudo-random character strings for use as encryption keys. Strictly speaking, any Web site claiming to generate a “random” string of characters is only pseudo-random, because the generator is a mathematical algorithm and by its very nature is not truly random. My favorite Web site for this purpose is operated by Gibson Research Corporation (GRC). It is my favorite because it runs over SSL and is set up so that its pages are not cached or processed by search engines, which better protects the generated values. The GRC site generates 64-character hexadecimal strings and 63-character alphanumeric and printable ASCII strings, rather than the purely numeric strings provided by other random and pseudo-random number generator sites. Using such a site, you can generate keys or seed values for key generators. You can also combine multiple results from these Web sites to generate longer key values.
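If you would rather not rely on a Web site, most programming environments provide a generator designed for cryptographic use. Below is a minimal sketch assuming Python 3.6 or later, where the `secrets` module draws on the operating system's cryptographically secure random source. In the terms used above it is still a pseudo-random generator, but one built for key generation. The helper names and the 94-character printable set (letters, digits and punctuation, without the space) are my own choices for illustration.

```python
import secrets
import string

# Letters, digits and punctuation: 94 printable ASCII characters (space excluded).
PRINTABLE = string.ascii_letters + string.digits + string.punctuation


def random_hex_key(num_bytes: int = 32) -> str:
    """Hex string from the OS secure RNG: 32 bytes -> 64 hex characters -> 256 bits."""
    return secrets.token_hex(num_bytes)


def random_printable(length: int = 63) -> str:
    """Printable string with each character drawn independently via secrets.choice."""
    return "".join(secrets.choice(PRINTABLE) for _ in range(length))


key_hex = random_hex_key()
passphrase = random_printable()
print(len(key_hex), len(passphrase))  # 64 63
```

Generating the value locally also means the key material never has to travel over a network connection before it is used.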
In addition, you can have multiple people individually go to the Web site, obtain a pseudo-random character string and then each enter their string into the system. This is known as split knowledge, because each individual knows only their own input to the final key value. Under such an approach, the key generation system asks each key custodian to enter their value (called a ‘component’) separately and never allows a key custodian to come into contact with any other custodian’s component. The key is then generated by combining the entered values in such a way that no individual input provides any information about the final key. It is important to note that simply concatenating the input values to form the key does not provide this property, because each custodian would then know a portion of the final key, and therefore does not ensure split knowledge of the key value.
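A common way to combine components so that none of them reveals anything about the key is to XOR them together, as in the minimal Python sketch below. It assumes every component is a random value of the full key length (32 bytes here); the function names are hypothetical. With concatenation, by contrast, each custodian would know a slice of the final key outright.

```python
from __future__ import annotations

import secrets
from functools import reduce

KEY_BYTES = 32  # 256-bit key; every component must be this long


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length values."""
    if len(a) != len(b):
        raise ValueError("components must be the same length as the key")
    return bytes(x ^ y for x, y in zip(a, b))


def combine_components(components: list[bytes]) -> bytes:
    """Form the working key by XOR-ing every custodian's component together."""
    return reduce(xor_bytes, components)


def split_key(key: bytes, custodians: int) -> list[bytes]:
    """Split an existing key: n-1 random components, the last is key XOR all of them."""
    randoms = [secrets.token_bytes(len(key)) for _ in range(custodians - 1)]
    last = reduce(xor_bytes, randoms, key)
    return randoms + [last]


# Each of three custodians enters a full-length random component separately.
components = [secrets.token_bytes(KEY_BYTES) for _ in range(3)]
key = combine_components(components)

# Any single component, or any two of the three, is just random bytes on its own;
# only all three together reproduce the key.
parts = split_key(key, 3)
print(combine_components(parts) == key)  # True
```

The `split_key` variant is useful when a key already exists and needs to be handed to new custodians without changing the key itself.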
Just because you have encrypted your data does not mean your job is over. Depending on how your encryption solution is implemented, you may be required to protect your encryption keys as well as periodically change those keys. Encryption key protection can be as simple as storing the key components on separate pieces of paper in separate, sealed envelopes or as high tech as storing them on separate encrypted USB thumb drives. Each of these would then be stored in separate safes.
You can also store encryption keys on a server that is not involved in storing encrypted data. This should not be an ordinary server; it needs to be securely configured and have very limited access. This approach is where key encryption keys (KEKs) come into play. Each custodian generates a KEK and encrypts their component with it. The encrypted components can then be placed in an encrypted folder or zip file for which computer operations holds the encryption key. This is where you tend to see PGP used, because PGP allows an archive to be encrypted so that multiple decryption keys can open it. In an emergency, operations can decrypt the archive, and then the key custodians or their backups can decrypt their key components.
Finally, key changes are where a lot of organizations run into issues. This is because changing a key can require that the data be decrypted using the old key and then re-encrypted with the new key. That decrypt/encrypt process can take days, weeks or even years, depending on the volume of data involved. And depending on the time involved and how the decrypt/encrypt process is implemented, cardholder data can potentially be exposed in decrypted form, or remain protected only by a compromised key, for a long period of time.
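One way to keep a long re-keying job manageable is to record, next to each ciphertext, which key version encrypted it. Old and new keys then remain usable side by side, and the work can run in small resumable batches instead of one long decrypt/encrypt pass. The Python sketch below assumes that design; `Record`, `rekey_batch` and the key IDs are hypothetical, and `toy_xor` is an insecure placeholder standing in for a real cipher such as AES so that the example runs with only the standard library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Cipher = Callable[[bytes, bytes], bytes]  # (key, data) -> data


@dataclass
class Record:
    key_id: str        # which key version encrypted this record
    ciphertext: bytes


def rekey_batch(records: list[Record], keys: dict[str, bytes], new_key_id: str,
                decrypt: Cipher, encrypt: Cipher, batch_size: int = 1000) -> int:
    """Re-encrypt up to batch_size records still under an old key; return the count."""
    changed = 0
    for rec in records:
        if changed >= batch_size:
            break
        if rec.key_id == new_key_id:
            continue  # already migrated, so an interrupted run can simply resume
        plaintext = decrypt(keys[rec.key_id], rec.ciphertext)
        rec.ciphertext = encrypt(keys[new_key_id], plaintext)
        rec.key_id = new_key_id
        changed += 1
    return changed


def toy_xor(key: bytes, data: bytes) -> bytes:
    # TOY stand-in for a real cipher such as AES: repeating-key XOR is NOT secure.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


keys = {"k1": b"old-key-material", "k2": b"new-key-material"}
records = [Record("k1", toy_xor(keys["k1"], f"record {n}".encode())) for n in range(5)]

while rekey_batch(records, keys, "k2", toy_xor, toy_xor, batch_size=2):
    pass  # in practice each batch might run in its own maintenance window
print(all(r.key_id == "k2" for r in records))  # True
```

In a real system, each record's new ciphertext and key ID would be written in a single transaction, and the old key would be retired only once no records reference it.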
The bottom line is that organizations can find that key changes are not really feasible or introduce more risk than they are willing to accept. When that is the case, protecting the encryption keys takes on even more importance. This is another reason why sales of key management appliances are on the rise.
That is encryption in a nutshell, a sort of “CliffsNotes” for the non-geeky out there. In future posts, I intend to go into PKI and other nuances of encryption, and into how to address PCI DSS Requirements 3 and 4. For now, I wanted to put a basic educational foundation out there for people to build on and to remove the glassy-eyed look that can appear when the topic of encryption comes up.