AEAD (Authenticated Encryption with Associated Data) modes of operations for symmetric block ciphers provide authentication and integrity in addition to confidentiality. Traditional modes like CBC or CTR only guarantee confidentiality; one has to combine them with cryptographic MACs in order to have assurance of authenticity. It is not trivial to implement such generic composition without introducing subtle security flaws. AEAD modes are designed to be more secure and often more efficient than ad-hoc constructions. Widespread AEAD modes are GCM, CCM, EAX, SIV, OCB. Most AEAD modes allow to optionally authenticate the plaintext with some piece of arbitrary data that will not be encrypted, for instance a packet header. In this file, that is called "associated data" (AD) even though terminology may vary from mode to mode. === Goals of the AEAD API 1. Any piece of data (AD, ciphertext, plaintext) is passed only once. 2. It is possible to encrypt/decrypt huge files withe little memory cost. To this end, authentication, encryption, and decryption are always incremental. That is, the caller may split data in any way, and the end result does not depend on how splitting is done. Data is processed immediately without being internally buffered. 3. API is similar to a Crypto.Hash MAC object, to the point that when only AD is present, the object behaves exactly like a MAC. 4. API is similar to a classic cipher object, to the point that when no AD is present, the object behaves exactly like a classic cipher mode (e.g. CBC). 5. MACs are produced and handled separately from ciphertext/plaintext. 6. The API is intuitive and hard to misuse. Exceptions are raised immediately in case of misuse. 7. Caller does not have to add or remove padding. The following state diagram shows the proposed API for AEAD modes. In general, it is applicable to all block ciphers in the Crypto.Cipher module, even though certain modes may put restrictions on certain cipher properties (e.g. block size). .--------. | START | '--------' | | .new(key, mode, iv, ...) v .-------------. .-------------.-------------| INITIALIZED |---------------+-------------. | | '-------------' | | | | | | | | |.encrypt() |.update() |.decrypt() | | | | | | | | v | | | | .--------------.___ .update() | | | | | CLEAR HEADER |<--' | | | | '--------------' | | | | | | | | | | | | .encrypt()| | | |.decrypt() | | | v | | | | v | | .--------------. | | | | .--------------. | | | ENCRYPTING |<----' | | '---->| DECRYPTING | | | '--------------' | | '--------------' | | | ^ | | | | ^ | | | hex/digest()| | |.encrypt()* | | | | |.decrypt()* | | | ''' | | | ''' | | | / \ | | | | hex/digest()/ \hex/verify()|hex/verify() | |hex/digest() | / \ | | | | / \ | hex/verify()| | v / \ v | | .--------------. / \ .--------------. | '------->| FINISHED/ENC |<-- -->| FINISHED/DEC |<---------' '--------------' '--------------' ^ | ^ | | |hex/digest() | |hex/verify() ''' ''' [*] this transition is not always allowed for non-online or 2-pass modes The following states are defined: - INITIALIZED. The cipher has been instantiated with a key, possibly a nonce (or IV), and other parameters. The cipher object may be used for encryption or decryption. - CLEAR HEADER. The cipher is authenticating the associated data. - ENCRYPTING. It has been decided that the cipher has to be used for encrypting data. At least one piece of ciphertext has been produced. - DECRYPTING. It has been decided that the cipher has to be used for decrypting data. At least one piece of candidate plaintext has been produced. - FINISHED/ENC. Authenticated encryption has completed. The final MAC has been produced. - FINISHED/DEC. Authenticated decryption has completed. It is now established if the associated data together the plaintext are authentic. In addition to the existing Cipher methods new(), encrypt() and decrypt(), 5 new methods are added to a Cipher object: - update() to consume all associated data. It takes as input a byte string, and it can be invoked multiple times. Ciphertext / plaintext must not be passed to update(). For simplicity, update() can only be called before encryption or decryption starts. If no associated data is used, update() is not called. - digest() to output the binary MAC. It can only be called at the end of the encryption. - hexdigest() as digest(), but the output is ASCII hexadecimal. - verify() to evaluate if the binary MAC is correct. It can only be called at the end of decryption. - hexverify() as verify(), but the input is ASCII hexadecimal. The syntax of update(), digest(), and hexdigest() are consistent with the API of a Hash object. IMPORTANT: method copy() is not present. Since most AEAD modes require IV/nonce to never repeat, this method may be misused and lead to security holes. Since MAC validation (decryption mode) is subject to timing attacks, the (hex)verify() method accepts as input the expected MAC and raises a ValueError exception if it does not match. Internally, the method can perform the validation using time-independent code. === Padding The proposed API only supports AEAD modes than do not require padding of the plaintext. Adding support for that would make the API more complex (for instance, an external "finished" flag to pass encrypt()). === Pre-processing of associated data The proposed API does not support pre-processing of AD. Sometimes, a lot of encryptions/decryptions are performed with the same key and the same AD, and different nonces and plaintext. Certain modes like GCM allow to cache processing of AD, and reuse it such results across several encryptions/decryptions. === Singe/2-pass modes and online/non-online modes The proposed API supports single-pass, online AEAD modes. A "single-pass" mode processes each block of AD or data only once. Compare that to "2-pass" modes, where each block of data (not AD) twice: once for authentication and one for enciphering. An "online" mode does not need to know the size of the message before encryption starts. As a consequence: - memory usage doesn't depend on the plaintext/ciphertext size. - when encrypting, partial ciphertexts can be immediately sent to the receiver. - when decrypting, partial (unauthenticated) plaintexts can be obtained from the cipher. Most single-pass modes are patented, and only the less efficient 2-pass modes are in widespread use. That is still OK, provided that encryption can start *before* authentication is completed. If that's the case (e.g. GCM, EAX) encrypt()/decrypt() will also take care of completing the authentication over the plaintext. If that's NOT the case (e.g. SIV), encrypt()/decrypt() can only be called once. A similar problem arises with non-online modes (e.g. CCM): they have to wait to see the end of the plaintext before encryption can start. The only way to achieve that is to only allow encrypt()/decrypt() to be called once. === Associated Data Depending on the mode, the associated data AD can be defined as: 1. a single binary string or 2. a vector of binary strings For modes of the 1st type (e.g. CCM), the API allows the AD to be split in any number of segments, and fed to update() in multiple calls. The resulting AD does not depend on how splitting is performed. It is responsability of the caller to ensure that concatenation of strings is secure. For instance, there is no difference between: c.update(b"builtin") c.update(b"securely") and: c.update(b"built") c.update(b"insecurely") For modes of the 2nd type (e.g. SIV), the API assumes that each call to update() ingests one full item of the vector. The two examples above are now different. === CCM mode Number of keys required: 1 Compatible with ciphers: AES (may be used with others if block length is 128 bits) Counter/IV/nonce: 1 nonce required (length 8..13 bytes) Single-pass: no Online: no Term for AD: "associate data" (NIST) or "additional authenticated data" (RFC) Term for MAC: part of ciphertext (NIST) or "encrypted authentication value" (RFC). Padding required: no Pre-processing of AD: not possible In order to allow encrypt()/decrypt() to be called multiple times, and to reduce the memory usage, the new() method of the Cipher module optionally accepts the following parameters: - 'assoc_len', the total length (in bytes) of the AD - 'msg_len', the total length (in bytes) of the plaintext or ciphertext === GCM mode Number of keys required: 1 Compatible with ciphers: AES (may be used with others if block length is 128 bits) Counter/IV/nonce: 1 IV required (any length) Single-pass: no Online: yes Term for AD: "additional authenticated data" Term for MAC: "authentication tag" or "tag" Padding required: no Pre-processing of AD: possible Encryption can start before authentication, so encrypt()/decrypt() can always be called multiple times. === EAX mode Number of keys required: 1 Compatible with ciphers: any Counter/IV/nonce: 1 IV required (any length) Single-pass: no Online: yes Term for AD: "header" Term for MAC: "tag" Padding required: no Pre-processing of AD: possible Encryption can start before authentication, so encrypt()/decrypt() can always be called multiple times. === SIV Number of keys required: 2 Compatible with ciphers: AES Counter/IV/nonce: 1 IV required (any length) Single-pass: no Online: no Term for AD: "associated data" (vector) Term for MAC: "tag" Padding required: no Pre-processing of AD: possible Encryption can only start before authentication is ended, so encrypt()/decrypt() an only be called once. SIV is not suitable for big files or streaming. AD is a vector of strings. One item in the vector is the (optional) IV.