Introduction (Yak Shaving)
As you can see from the title, we are still on the “CyberCoding for Fun” path, and although I would like to jump straight into how you make that happen, we need to step back a bit and take care of some details first. In this case, defining and describing the terminology and concepts associated with crypto and cryptocoding is what we refer to as “Yak Shaving”: something you end up doing before you can start what you set out to do.
Many times when you start a new project or endeavor where some learning is involved, it is necessary to back the task up to a point where you can actually do something. For example, if the task is painting something in the garage, it is often necessary to clean and prep the garage, go buy some paint, and probably prep the item for painting – all prior to the actual ‘painting’. This is called Yak Shaving.
In this update, we are going to go over some of the system-level concepts in modern crypto without getting bogged down in details or acronyms. The premise is that in order to understand cryptosystems, it is necessary to understand the tools (and terms) available to the cryptosystem designer.
Terms – Authentication / Authorization / Credentials / Confidentiality / Integrity / Availability
This section is all about crypto terminology, and nothing more.
- Authentication – The process by which a system verifies that a given set of credentials is valid. If these credentials are a username / password, the username identifies and the password authenticates. When a computer verifies that the username is valid, and that the password matches the password associated with that username – you have been authenticated.
- Authorization – Authorization is about access control and determining what a given authenticated user can and cannot access within some context. Authorization is what separates ‘guest’ from ‘administrator’ (and everything in between) on most computer systems.
- Credentials – Something that identifies you to the computer system. This can be a username / password pair, or a PKI smartcard, or a simple RFID token. Each one of these represents different levels of confidence (that the credentials represent you), and this means they provide different levels of security. Bottom line – the security of most forms of credentials is based on how difficult they are to fake, break or steal.
- Confidentiality – The ability to keep something secret. It is really that simple. If we assume Alice and Bob are exchanging private information, confidentiality is a characteristic of the communications channel that prevents Eve from listening in.
- Integrity – The ability to ensure that the message received is the same as the message sent. If Alice and Bob are exchanging information, integrity is a characteristic of the communications channel that prevents Eve from modifying the information (without detection) as it crosses the channel.
- Availability – The ability to ensure that the communications channel is available. If Alice and Bob are exchanging information, availability is the characteristic of the communications channel that prevents Eve from blocking information over the channel.
- Channel – Some arbitrary mechanism between two points for exchanging information. Channels can be nested on other channels. For example, the network IP protocol layer is a channel, and TCP is another protocol channel that runs on top of IP – forming TCP/IP. In this example, neither IP nor TCP is secure. Another example is the TLS secure protocol, which runs on top of the insecure TCP/IP protocol. TLS is the basis of most secure web browser sessions.
- Ciphertext – Encrypted text or data, to be contrasted with cleartext or plaintext (which is unencrypted text or data).
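To make the difference between authentication and authorization concrete, here is a minimal Python sketch using only the standard library. The user store, the role names and the permission table are all hypothetical and exist only for illustration; a real system would keep them in a database and would tune the hashing parameters.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory user store: username -> (salt, password hash, role).
USERS = {}

# Hypothetical permission table: role -> actions that role may perform.
PERMISSIONS = {"guest": {"read"}, "administrator": {"read", "write", "delete"}}


def _hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(username: str, password: str, role: str) -> None:
    salt = secrets.token_bytes(16)
    USERS[username] = (salt, _hash_password(password, salt), role)


def authenticate(username: str, password: str) -> bool:
    """Authentication: do the presented credentials match what is on file?"""
    record = USERS.get(username)
    if record is None:
        return False
    salt, stored_hash, _role = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored_hash, _hash_password(password, salt))


def authorize(username: str, action: str) -> bool:
    """Authorization: may this (already authenticated) user perform this action?"""
    _salt, _stored_hash, role = USERS[username]
    return action in PERMISSIONS.get(role, set())


register("alice", "correct horse", "administrator")
register("bob", "battery staple", "guest")
```

Here the username identifies, the password authenticates, and the role decides what the authenticated user may do.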
Interesting Sidenote – In most cybersecurity scenarios, Bob and Alice are the protagonists and Eve is the antagonist.
Encryption – Symmetric
Symmetric encryption is where plaintext is encrypted to ciphertext, and then decrypted back to plaintext using the same key used to encrypt. When used to secure some data or channel, it requires that both end points or all parties involved share the same key, which is where the term ‘pre-shared key (PSK)’ comes from.
Historically, symmetric encryption was the only form of encryption available until about 1976 (classified work notwithstanding), when the Diffie-Hellman key exchange algorithm was published. Every form of encryption or cipher prior to that time was about key generation, key management, and algorithms. Prior to WWII, all encryption was done by hand or with machines, the most sophisticated and infamous of these machines being the German Enigma machine.
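The “same key on both ends” property of symmetric encryption can be sketched in a few lines of Python. This is a toy cipher built for illustration only (a SHA-256 based keystream XORed with the data); it is an assumption of this sketch, not a vetted algorithm, and real systems would use something like AES or ChaCha20 from a maintained library.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stretch key + nonce into `length` pseudo-random bytes (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice with the same key undoes it."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))


pre_shared_key = secrets.token_bytes(32)   # both parties must already hold this
nonce = secrets.token_bytes(16)            # not secret, sent along with the ciphertext

plaintext = b"Meet me at the usual place."
ciphertext = xor_cipher(pre_shared_key, nonce, plaintext)
recovered = xor_cipher(pre_shared_key, nonce, ciphertext)

# Anyone holding a different key gets garbage back.
wrong = xor_cipher(secrets.token_bytes(32), nonce, ciphertext)
```

The same function encrypts and decrypts, which is exactly why both parties need the pre-shared key before they can talk.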
Encryption – Asymmetrical
Asymmetrical encryption is a form of encryption where there are two paired keys: data encrypted with one key of the pair can be decrypted only with the other key of that pair. In schemes such as RSA, either key can be used to encrypt and the opposite key (and only this key) is used to decrypt. The first asymmetric technique that became generally well known was the Diffie-Hellman key exchange. There have been a number of different variations of asymmetrical encryption based on various arcane and complex mathematical methods, but all share the same basic characteristic of a key pair.
In common nomenclature, one of these keys is designated the ‘public key’, which is not kept secret (and is often published publicly), and the other key is designated the ‘private key’, which is kept as secret as possible. On some operating systems, private keys are often secured in applications called ‘keyrings’ which require some form of user credentials to access. Bottom line – private keys need to be kept private.
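As a rough illustration of a key pair, here is textbook RSA with tiny primes in Python. The primes, the exponent and the helper name `apply_key` are choices made for this sketch; numbers this small are trivially breakable, and real RSA also adds padding, which is omitted here.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # safe to publish
private_key = (d, n)         # must stay secret


def apply_key(key, value):
    """Raise `value` to the key's exponent modulo n; used for both directions."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65                 # toy message, must be smaller than n

# Encrypt with the public key, decrypt with the private key (confidentiality).
ciphertext = apply_key(public_key, message)
decrypted = apply_key(private_key, ciphertext)

# Transform with the private key, check with the public key (the basis of signing).
signed = apply_key(private_key, message)
checked = apply_key(public_key, signed)
```

Either direction round-trips, but only the holder of the private key can perform the private half.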
The value of asymmetric encryption may not be immediately obvious, but let’s take a look at an example where we compare / contrast with symmetric encryption.
If Bob and Alice need to exchange some small amount of data securely over a non-secure channel and they are relying on symmetric encryption, both Bob and Alice need to have a pre-shared encryption key, and this pre-shared key needs to be kept secret from Eve (and everybody else who may be a threat). The problem with this is how Bob or Alice communicates this key to the other without a secure channel in place. Since the primary communications channel is insecure, it cannot be used to share the encryption key, which drives the need for some secondary or out of band (OOB) channel that is secure. Think about that a minute – exchanging information securely over a non-secure channel requires some other secure channel to exchange keys first. This highlights the fundamental problem with symmetric encryption: key management.
Now if we take a look at asymmetric encryption, both Bob and Alice have generated their own personal public-private key pairs. This is then followed by Bob and Alice exchanging their public keys. Since these are public keys and it is not necessary to keep them secret, this is much easier than exchanging a symmetric encryption key. Once both Bob and Alice have exchanged public keys, we can start.
- Bob has a message he wants to send to Alice securely over a non-secure channel.
- Bob takes the message, produces a hash of the message, encrypts that hash with his private key, and attaches the result to the message, producing message A. The encrypted hash is known as a digital signature.
- Bob takes message A and then encrypts it using Alice’s public key, producing ciphertext B.
- Bob then sends this encrypted message B to Alice via any non-secure channel.
- Alice gets the message and decrypts it using her private key, producing message A. Alice is the only one who can do this since she is the only one that has her private key.
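- Alice then takes message A and decrypts the attached digital signature using Bob’s public key, recovering the hash that Bob computed. She hashes the message herself and compares the two; if they match, she has Bob’s original message, unmodified.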
From this exchange, we can make the following significant statements:
- Bob knows that Alice and only Alice can decrypt the outer encryption since she is the one who has her private key.
- Alice knows that Bob and only Bob could have sent the message since the digital signature was verified and Bob is the only one that has his private key.
- Alice knows that the message was not modified since the hash recovered from the digital signature matched the hash she computed from the contents of the message.
- This was achieved without sending secret keys through a second channel or over a non-secure channel.
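The exchange between Bob and Alice can be sketched with textbook RSA in Python. Everything here is an assumption made for illustration: well-known Mersenne primes stand in for randomly generated ones, there is no padding, the whole of message A must fit in one RSA block, and the helper names are hypothetical. It shows the flow, not a secure implementation.

```python
import hashlib

E = 65537  # common public exponent


def make_keypair(p: int, q: int):
    """Build a textbook RSA key pair (public, private) from two primes."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))   # modular inverse, Python 3.8+
    return (E, n), (d, n)


def digest_as_int(message: bytes, n: int) -> int:
    """SHA-256 hash of the message, reduced so it fits under the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


# Well-known Mersenne primes stand in for randomly generated primes.
bob_public, bob_private = make_keypair(2**89 - 1, 2**107 - 1)
alice_public, alice_private = make_keypair(2**127 - 1, 2**521 - 1)

SIG_LEN = 25  # bytes needed to hold any value below Bob's 196-bit modulus


def bob_send(message: bytes) -> int:
    d, n = bob_private
    signature = pow(digest_as_int(message, n), d, n)            # sign the hash
    message_a = message + signature.to_bytes(SIG_LEN, "big")    # message A
    e, n_alice = alice_public
    m = int.from_bytes(message_a, "big")
    assert m < n_alice, "toy scheme: message too long for one RSA block"
    return pow(m, e, n_alice)                                   # ciphertext B


def alice_receive(ciphertext_b: int):
    d, n = alice_private
    m = pow(ciphertext_b, d, n)                                 # back to message A
    # Note: a message starting with a zero byte would lose it here (toy encoding).
    message_a = m.to_bytes((m.bit_length() + 7) // 8, "big")
    message, signature = message_a[:-SIG_LEN], message_a[-SIG_LEN:]
    e, n_bob = bob_public
    recovered_hash = pow(int.from_bytes(signature, "big"), e, n_bob)
    # The signature is valid if the recovered hash matches Alice's own hash.
    return message, recovered_hash == digest_as_int(message, n_bob)


message, signature_ok = alice_receive(bob_send(b"Lunch at noon?"))
```

Alice ends up with the message plus a yes/no answer on whether the signature checks out against Bob’s public key.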
These are some fairly significant features of public-private key encryption. But of course our example can be compromised by a Man in the Middle Attack (MITM). For further details on these operations read the Wikipedia reference below on RSA.
Man in the Middle Attack (MITM)
As shown in the example above, public-private key encryption provides some significant advantages. However, it is also susceptible to some new attacks, including the Man in the Middle attack. If we look at the example above, both Bob and Alice generated their own public-private key pairs and then somehow exchanged the public keys. Since they are public keys there is no need for secrecy – but there is a need for integrity. Say for example that Bob and Alice emailed their public keys to each other. Meanwhile Eve was somehow able to intercept these emails, generate her own public-private key pairs, substitute her public key in those emails, and send them on to Bob and Alice. This means that when Bob thinks he is encrypting the message with Alice’s public key, it is really Eve’s public key. After Eve intercepts the message, she decrypts it with her private key, then re-encrypts it with Alice’s real public key and sends it on to Alice. Because Eve also replaced Bob’s public key, she can re-sign the message with her own private key and Alice’s signature check will still pass. The net result is that Eve can intercept and read every message without Bob or Alice being aware of it.
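The key substitution at the heart of this attack is easy to see in a toy Python sketch. Textbook RSA with tiny primes is used purely for illustration, and `bobs_address_book` is a hypothetical stand-in for wherever Bob stores the keys he has received.

```python
def make_keypair(p, q, e=17):
    """Textbook RSA key pair (public, private) from two tiny primes: insecure."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)


def apply_key(key, value):
    exponent, modulus = key
    return pow(value, exponent, modulus)


alice_public, alice_private = make_keypair(61, 53)
eve_public, eve_private = make_keypair(67, 71)

# Eve swapped the key in transit, so Bob files her key under Alice's name.
bobs_address_book = {"alice": eve_public}

secret = 42
sent_by_bob = apply_key(bobs_address_book["alice"], secret)

# Eve intercepts, reads the message, then re-encrypts it with Alice's real key.
read_by_eve = apply_key(eve_private, sent_by_bob)
forwarded = apply_key(alice_public, read_by_eve)

# Alice decrypts normally and sees nothing wrong.
received_by_alice = apply_key(alice_private, forwarded)
```

Nothing in the math fails; the only thing that went wrong is that Bob trusted a key he had no way to verify.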
Digital Signing
The use of digital signatures is a technically interesting solution to many of the attacks on public-private key encryption. But first we need to talk about hashcodes. In the world of digital data and encryption, a ‘hashcode’ is a mathematical fingerprint of some data set. A typical hash function is SHA-2/256 (most often just SHA256), which ‘hashes’ a dataset of any size and produces a 256-bit hashcode. Due to the mathematical processes used, it is highly unlikely that a data set could be modified and still produce the same hashcode, so hashcodes are often used to verify the integrity of datasets. When combined with public-private keys this leads us to digital signing.
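The fingerprint behavior of a hashcode is easy to try with Python’s standard `hashlib` module. The sample messages below are made up for illustration.

```python
import hashlib

original = b"Pay Bob $100"
tampered = b"Pay Bob $900"   # a one-character change

original_hash = hashlib.sha256(original).hexdigest()
tampered_hash = hashlib.sha256(tampered).hexdigest()

# The output is always 64 hex characters (256 bits), whatever the input size.
large_hash = hashlib.sha256(b"x" * 1_000_000).hexdigest()

print(original_hash)
print(tampered_hash)
```

The two printed values look completely unrelated even though the inputs differ by one character, which is what makes a hashcode useful for detecting modification.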
In this example, we are going to add a fourth party: Larry’s Certificate Authority (CA). At Larry’s CA, Larry has a special public-private keypair used just for signing things. It works just like any other public-private keypair, but is only used to sign things and is treated with a much higher degree of security than most other keys, since it is used to assert the validity of many other certs.
In this example, both Bob and Alice take their public keys to their respective local offices of Larry’s CA along with identifying credentials – like driver’s licenses, passports or birth certificates. Their private keys never leave their possession. Larry’s CA examines the credentials and determines that Bob is Bob and Alice is Alice, and then generates a Digital Public Key Certificate for each of them with their respective names, possibly addresses, email addresses, and their public keys. Larry then generates a hash of each Public Key Certificate, encrypts it with the signing private key, and attaches it to the public key certificate.
Now both Bob and Alice have upgraded from simply using a self-generated public-private keypair to using a public-private key pair with a public key certificate signed by a trusted certificate authority. So when Bob and Alice exchange these public key certificates, they can each take the other’s cert, decrypt the signature using Larry’s CA public signing key to recover the hashcode, and compare it to the hashcode they generate from the certificate themselves.
If the hashcodes match, we can conclude a few things about these public key certificates.
- Since Larry’s CA is known to check physical credentials, there is a certain level of trust that the public key on the certificate really belongs to the person identified on it.
- Since the digital signature is based on the hashcode of the entire public key certificate, and the signature is valid – it is highly probable that the contents of the public key certificate have not been modified since it was signed.
- Bottom Line – If Bob has a public key certificate for Alice signed by Larry’s CA, he can trust that this public key is trustworthy (and probably has not been replaced by Eve’s key). Since Alice can know the same things about Bob’s public key certificate signed by Larry’s CA, Bob and Alice can use each other’s public keys with a much higher degree of confidence than in the example based on self-generated keypairs.
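Here is a toy Python sketch of what Larry’s CA does and how Bob or Alice would check the result. The certificate is just a dictionary serialized as JSON, the field names are made up, and the signature is unpadded textbook RSA over well-known Mersenne primes; all of that is assumed for illustration and is not how real certificates are encoded.

```python
import hashlib
import json

E = 65537


def make_keypair(p, q):
    """Textbook RSA key pair (public, private); no padding, illustration only."""
    n = p * q
    return (E, n), (pow(E, -1, (p - 1) * (q - 1)), n)


def cert_hash(fields: dict, n: int) -> int:
    """Hash a canonical JSON encoding of the certificate fields."""
    encoded = json.dumps(fields, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % n


# Larry's signing key pair; only the public half is given to Bob and Alice.
ca_public, ca_private = make_keypair(2**89 - 1, 2**107 - 1)


def ca_sign(fields: dict) -> dict:
    """What Larry does after checking IDs: hash the fields, sign the hash."""
    d, n = ca_private
    return {"fields": fields, "signature": pow(cert_hash(fields, n), d, n)}


def verify(cert: dict) -> bool:
    """What anyone can do with Larry's public key."""
    e, n = ca_public
    return pow(cert["signature"], e, n) == cert_hash(cert["fields"], n)


alice_cert = ca_sign(
    {"name": "Alice", "email": "alice@example.com", "public_key": [17, 3233]}
)

# Eve swaps in her own public key but cannot produce a matching signature.
forged_cert = {
    "fields": dict(alice_cert["fields"], public_key=[17, 4757]),
    "signature": alice_cert["signature"],
}
```

`verify(alice_cert)` succeeds while `verify(forged_cert)` fails, which is exactly the protection that was missing in the man in the middle example.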
It is important to note that any digital data can be signed with the private key of a public-private key pair. This includes public key certificates (as described above), executable code, firmware updates, and documents.
Digital Certificates
Digital Certificates are essentially what is described above in ‘Digital Signing’, but are mapped to a specific structure. Mapping the data into a standard structure means that the generation, signing, verification and general use of the certificates can work across product / technology boundaries. In other words, it makes signed public key certificates inter-operable. The most common standard for digital certificates is X.509.
Public Key Infrastructure (PKI)
Public Key Infrastructure is an operational and inter-operable set of standards and services on a network that enable anybody to procure a signed digital certificate and use this as an authentication credential. On the Internet this allows every website to inter-operate securely with SSL / TLS, with certificates from any number of different Certificate authorities, with any number of web browsers, all automatically.
Within the context of a company, consortium, or enterprise the same type of PKI services can be operated to provide an additional level of operational security.
Diffie-Hellman Key Exchange
The Diffie-Hellman key exchange (sometimes loosely called Diffie-Hellman authentication), published in 1976, was the first published asymmetric technique. In a Diffie-Hellman exchange, the two parties generate their own public-private key pairs and exchange the public keys through some open channel. The fundamental issue with asymmetric encryption (for bulk data) is that it does not scale well to large data sets, since it requires significant computing effort. So in Diffie-Hellman the asymmetric step is used only to establish a shared key for a symmetric encryption session: each party combines its own private key with the other party’s public key, and both arrive at the same secret value. Once this key has been established by both parties, a much higher performance symmetric encryption session is set up and used for all following communications in that secure session.
However – it is important to note that just like our example above with Bob and Alice, using locally generated unsigned keypairs is highly susceptible to MITM attacks and should never be done where that is a risk.
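A minimal Python sketch of a Diffie-Hellman exchange follows. The prime and generator are assumptions picked to keep the example short (a 127-bit Mersenne prime is far too small for real use); real deployments use standardized groups or elliptic curves from a maintained library.

```python
import hashlib
import secrets

# Public parameters, agreed on in the open. Toy values, for illustration only.
P = 2**127 - 1
G = 3


def generate_keypair():
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(P - 3) + 2     # in the range [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private value with the peer's public value, then hash to a key."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


alice_private, alice_public = generate_keypair()
bob_private, bob_public = generate_keypair()

# Only the public values cross the open channel.
alice_key = shared_key(alice_private, bob_public)
bob_key = shared_key(bob_private, alice_public)
```

Both sides end up holding the same 32-byte key, ready for a symmetric session, even though the key itself never crossed the channel. Note that nothing here tells Alice that the public value really came from Bob, which is the MITM weakness described above.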
SSL/TLS
Most people are familiar with SSL/TLS as the keypair-based solution for establishing secure sessions between web servers and web clients (browsers). SSL/TLS operates using the same logical steps as Diffie-Hellman, but with two differences. Rather than using locally generated unsigned keys, the server (as a minimum) has a signed key that is validated as part of the exchange. Optionally, the client may also have a signed key that can be used to authenticate the client.
SSL/TLS is very widely used and considered to be one of the most important foundational elements of privacy and security on the Internet. However, it does have its issues. One of the most significant is that, in the classic RSA-based handshake, the key that secures the symmetric encryption channel is exchanged using the certificate keypair. Now if we consider that a server will often serve a large number of clients, and the server certificate will be the same for each of these clients and for each session – the keypair is the same in all cases. We also recognize that while this is happening, the key size is very likely large enough to be impractical to break by brute force; therefore the sessions themselves are fairly secure.
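On the client side, validating the server’s signed key is mostly a matter of configuration. Here is a short sketch using Python’s standard `ssl` module; the two function names are hypothetical, and `fetch_peer_certificate` needs network access, so it is defined but not called here.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client context that validates the server's certificate chain and hostname."""
    return ssl.create_default_context()


def fetch_peer_certificate(hostname: str, port: int = 443) -> dict:
    """Open a TLS session and return the server certificate the handshake validated."""
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=10) as raw_socket:
        # The handshake fails with an exception if the certificate cannot be verified.
        with context.wrap_socket(raw_socket, server_hostname=hostname) as tls_socket:
            return tls_socket.getpeercert()


context = make_client_context()
print(context.verify_mode, context.check_hostname)
```

The default context already requires a valid certificate and a matching hostname. For the optional client-side key, `SSLContext.load_cert_chain` is the standard-library call that attaches a client certificate to the context.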
However, if this session traffic is recorded and archived by some highly capable group, and at some later date this group is able to acquire the private key for the server certificate, it means that every single one of those sessions can be decrypted. Essentially the private key can be used to decrypt the initial key exchange for each session, and the recovered session key then decrypts the remainder of that session.
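The “record now, decrypt later” problem can be shown with a toy Python sketch. Textbook RSA with tiny primes stands in for the server certificate keypair, and the session keys are made-up small numbers; none of this reflects real key sizes or the real handshake format.

```python
def make_keypair(p, q, e=17):
    """Textbook RSA key pair (public, private) from two tiny primes: insecure."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)


# One long-lived server keypair, reused for every client and every session.
server_public, server_private = make_keypair(61, 53)

# Each client picks a session key and sends it encrypted with the server's public key.
session_keys = [1234, 2021, 777]          # toy values, each smaller than n
archive = [pow(key, *server_public) for key in session_keys]   # what Eve records

# Years later the server's private key leaks, and every archived session opens.
recovered = [pow(ciphertext, *server_private) for ciphertext in archive]
```

One leaked private key recovers every session key in the archive, no matter how long ago the sessions took place.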
There is however a solution – Forward Secrecy.
Forward Secrecy
Forward secrecy is one of the most interesting developments (in my opinion) in securing communications using public-private key pairs. As discussed in SSL/TLS above, if the private key for a server certificate is ever compromised, every session ever initiated with that certificate can be compromised.
In forward secrecy, a normal SSL/TLS session is initiated with a resulting symmetric encryption secure channel. At each end of this channel the server and client each generate a fresh, temporary public-private keypair, and exchange the public keys over the secure channel. A second symmetric key is then derived from this exchange. This is essentially a Diffie-Hellman key exchange over an SSL/TLS session. This second key resulting from the Diffie-Hellman key exchange is then used to set up a symmetric session channel, and the initial channel is discarded. (Real TLS cipher suites fold the temporary Diffie-Hellman exchange into the handshake itself and use the certificate key only to sign it, but the effect is the same.)
Overall – the first key exchange authenticates the server (and possibly the client) since the session is based on signed certificates, but does not provide long term session security. The second key exchange based on Diffie-Hellman is vulnerable to MITM, but since it runs over an already authenticated secure channel, MITM is not a risk. Most significantly, since the private keys in the second key exchange are never sent over the channel or written persistently, these keys cannot be recovered from an archived session, and as a result the second symmetric session key is also unrecoverable.
This means that even if the private key for the server certificate is compromised, any archived sessions are still secure – Forward Secure.
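The forward-secret part of the exchange can be sketched in Python as a Diffie-Hellman exchange with throwaway keys. The parameters are toy values assumed for illustration, both ends are simulated inside one function (`ephemeral_session`, a hypothetical name), and the authentication step that protects the exchange from MITM is left out.

```python
import hashlib
import secrets

# Toy public parameters, for illustration only.
P = 2**127 - 1
G = 3


def ephemeral_session():
    """Run one exchange with throwaway key pairs.

    Returns what an eavesdropper could record (the public values) and the
    session key that both ends derive.
    """
    client_private = secrets.randbelow(P - 3) + 2
    server_private = secrets.randbelow(P - 3) + 2
    client_public = pow(G, client_private, P)
    server_public = pow(G, server_private, P)

    client_secret = pow(server_public, client_private, P)
    server_secret = pow(client_public, server_private, P)
    assert client_secret == server_secret

    session_key = hashlib.sha256(client_secret.to_bytes(16, "big")).digest()
    transcript = {"client_public": client_public, "server_public": server_public}
    # The private values go out of scope here: never sent, never stored.
    return transcript, session_key


transcript_1, key_1 = ephemeral_session()
transcript_2, key_2 = ephemeral_session()
```

Each session gets its own key, the recorded transcript holds only public values, and the server’s long-term certificate key plays no part in deriving the session key, so stealing it later does not help with an archived session.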
Since 1976, cryptography and all of its associated piece parts have exploded in terms of development, applications and vulnerability research, and a massive amount of it has been in open source development. However, for most engineers and programmers it is still very inaccessible. Step one in making it accessible is to learn how it fundamentally works, and this was step one.
Lastly – There are some very significant details on these topics that have been left out in order to generalize the concepts of operation and use. I strongly recommend at least skimming the references below to get a flavor of these details (that have been left out of this article). In my experience it is very easy to get lost in the details and become frustrated, so this approach was intentional – and hopefully effective.