Doc/AEAD_API.txt



AEAD (Authenticated Encryption with Associated Data) modes of operation
for symmetric block ciphers provide authentication and integrity in addition
to confidentiality.

Traditional modes like CBC or CTR only guarantee confidentiality; one
has to combine them with cryptographic MACs in order to have assurance of
authenticity. It is not trivial to implement such generic composition
without introducing subtle security flaws.

AEAD modes are designed to be more secure and often more efficient than
ad-hoc constructions. Widely used AEAD modes include GCM, CCM, EAX, SIV,
and OCB.

Most AEAD modes can optionally authenticate, together with the plaintext,
a piece of arbitrary data that is not encrypted, for instance a
packet header.

In this file, that is called "associated data" (AD) even though terminology
may vary from mode to mode.

The term "MAC" is used to indicate the authentication tag generated as part
of the encryption process. The MAC is consumed by the receiver to tell
whether the message has been tampered with.

=== Goals of the AEAD API

1. Any piece of data (AD, ciphertext, plaintext, MAC) is passed only once.
2. It is possible to encrypt/decrypt huge files with little memory cost.
   To this end, authentication, encryption, and decryption are always
   incremental. That is, the caller may split data in any way, and the end
   result does not depend on how splitting is done.
   Data is processed immediately without being internally buffered.
3. API is similar to a Crypto.Hash MAC object, to the point that when only
   AD is present, the object behaves exactly like a MAC.
4. API is similar to a classic cipher object, to the point that when no AD is present,
   the object behaves exactly like a classic cipher mode (e.g. CBC).
5. MACs are produced and handled separately from ciphertext/plaintext.
6. The API is intuitive and hard to misuse. Exceptions are raised immediately
   in case of misuse.
7. Caller does not have to add or remove padding.
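
Goals 2 and 3 can be illustrated with the hmac module of the Python
standard library, used here only as a stand-in for an AEAD object that is
fed with AD alone. This is a minimal sketch: the key, the header, and the
helper name mac_in_chunks() are assumptions made for the example and are
not part of the proposed API.

```python
import hashlib
import hmac

def mac_in_chunks(key, data, chunk_size):
    """MAC `data` by feeding it to update() in pieces of `chunk_size` bytes."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for start in range(0, len(data), chunk_size):
        mac.update(data[start:start + chunk_size])
    return mac.digest()

key = b"k" * 32  # throwaway key, for illustration only
header = b"packet header that is authenticated but not encrypted"

one_shot = hmac.new(key, header, hashlib.sha256).digest()

# The result does not depend on how the caller splits the data.
assert mac_in_chunks(key, header, 1) == one_shot
assert mac_in_chunks(key, header, 7) == one_shot
```

An AEAD object following this API is expected to show the same property
for AD passed to update() and for data passed to encrypt()/decrypt().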

The following state diagram shows the proposed API for AEAD modes.
In general, it is applicable to all block ciphers in the Crypto.Cipher
module, even though certain modes may put restrictions on certain
cipher properties (e.g. block size).

                              .--------.
                              |  START |
                              '--------'
                                  |
                                  | .new(key, mode, iv, ...)
                                  v
                           .-------------.
.-------------.-------------| INITIALIZED |---------------+-------------.
|             |             '-------------'               |             |
|             |                    |                      |             |
|             |.encrypt()          |.update()             |.decrypt()   |
|             |                    |                      |             |
|             |                    v                      |             |
|             |            .--------------.___ .update()  |             |
|             |            | CLEAR HEADER |<--'           |             |
|             |            '--------------'               |             |
|             |               |  |   |  |                 |             |
|             |     .encrypt()|  |   |  |.decrypt()       |             |
|             v               |  |   |  |                 v             |
|        .--------------.     |  |   |  |     .--------------.          |
|        | ENCRYPTING   |<----'  |   |  '---->|  DECRYPTING  |          |
|        '--------------'        |   |        '--------------'          |
|             |   ^ |            |   |             |  ^ |               |
| hex/digest()|   | |.encrypt()* |   |             |  | |.decrypt()*    |
|             |   '''            |   |             |  '''               |
|             |                  /   \             |                    |
|             |     hex/digest()/     \hex/verify()|hex/verify()        |
|hex/digest() |                /       \           |                    |
|             |               /         \          |        hex/verify()|
|             v              /           \         v                    |
|        .--------------.   /             \   .--------------.          |
'------->| FINISHED/ENC |<--               -->| FINISHED/DEC |<---------'
         '--------------'                     '--------------'
               ^ |                                  ^ |
               | |hex/digest()                      | |hex/verify()
               '''                                  '''

 [*] this transition is not always allowed for non-online or 2-pass modes

The following states are defined:

 - INITIALIZED. The cipher has been instantiated with a key, possibly
   a nonce (or IV), and other parameters.
   The cipher object may be used for encryption or decryption.

 - CLEAR HEADER. The cipher is authenticating the associated data.

 - ENCRYPTING. It has been decided that the cipher has to be used for
   encrypting data. At least one piece of ciphertext has been produced.

 - DECRYPTING. It has been decided that the cipher has to be used for
   decrypting data. At least one piece of candidate plaintext has
   been produced.

 - FINISHED/ENC. Authenticated encryption has completed.
   The final MAC has been produced.

 - FINISHED/DEC. Authenticated decryption has completed.
   It is now established whether the associated data and the plaintext
   are authentic.
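
The state diagram can also be written down as a transition table. The
sketch below contains no cryptography: it only tracks which method is
allowed in which state. The names State, TRANSITIONS and next_state() are
hypothetical, hexdigest()/hexverify() are folded into digest()/verify(),
and raising TypeError on misuse is an assumption (this document only says
that an exception is raised).

```python
from enum import Enum

class State(Enum):
    INITIALIZED = "INITIALIZED"
    CLEAR_HEADER = "CLEAR HEADER"
    ENCRYPTING = "ENCRYPTING"
    DECRYPTING = "DECRYPTING"
    FINISHED_ENC = "FINISHED/ENC"
    FINISHED_DEC = "FINISHED/DEC"

S = State
# (current state, method called) -> next state, as drawn in the diagram.
TRANSITIONS = {
    (S.INITIALIZED, "update"): S.CLEAR_HEADER,
    (S.INITIALIZED, "encrypt"): S.ENCRYPTING,
    (S.INITIALIZED, "decrypt"): S.DECRYPTING,
    (S.INITIALIZED, "digest"): S.FINISHED_ENC,
    (S.INITIALIZED, "verify"): S.FINISHED_DEC,
    (S.CLEAR_HEADER, "update"): S.CLEAR_HEADER,
    (S.CLEAR_HEADER, "encrypt"): S.ENCRYPTING,
    (S.CLEAR_HEADER, "decrypt"): S.DECRYPTING,
    (S.CLEAR_HEADER, "digest"): S.FINISHED_ENC,
    (S.CLEAR_HEADER, "verify"): S.FINISHED_DEC,
    (S.ENCRYPTING, "encrypt"): S.ENCRYPTING,   # [*] not for every mode
    (S.ENCRYPTING, "digest"): S.FINISHED_ENC,
    (S.DECRYPTING, "decrypt"): S.DECRYPTING,   # [*] not for every mode
    (S.DECRYPTING, "verify"): S.FINISHED_DEC,
    (S.FINISHED_ENC, "digest"): S.FINISHED_ENC,
    (S.FINISHED_DEC, "verify"): S.FINISHED_DEC,
}

def next_state(state, method):
    """Return the state reached by calling `method`, or raise on misuse."""
    try:
        return TRANSITIONS[(state, method)]
    except KeyError:
        raise TypeError("%s() cannot be called in state %s"
                        % (method, state.value)) from None

# A valid encryption session: AD first, then data, then the MAC.
state = State.INITIALIZED
for call in ("update", "update", "encrypt", "encrypt", "digest"):
    state = next_state(state, call)
assert state is State.FINISHED_ENC
```

Any pair missing from the table is a misuse; for example, calling
update() once encryption has started is rejected.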

In addition to the existing Cipher methods new(), encrypt() and decrypt(),
5 new methods are added to a Cipher object:

 - update() to consume all associated data. It takes as input
   a byte string, and it can be invoked multiple times.
   Ciphertext / plaintext must not be passed to update().
   For simplicity, update() can only be called before encryption
   or decryption starts. If no associated data is used, update()
   is not called.

 - digest() to output the binary MAC. It can only be called at the end
   of the encryption.

 - hexdigest() as digest(), but the output is ASCII hexadecimal.

 - verify() to evaluate if the binary MAC is correct. It can only be
   called at the end of decryption. It takes the MAC as input.

 - hexverify() as verify(), but the input is ASCII hexadecimal.

The syntax of update(), digest(), and hexdigest() is consistent with
the API of a Hash object.

IMPORTANT: method copy() is not present. Since most AEAD modes require
IV/nonce to never repeat, this method may be misused and lead to security
holes.

Since MAC validation (decryption mode) is subject to timing attacks,
the (hex)verify() method internally compares the received and the
computed MACs using time-independent code.
A ValueError exception is raised if the MAC does not match.
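
A time-independent comparison can be sketched with hmac.compare_digest()
from the Python standard library. In the proposed API verify() is a method
that takes only the received MAC; the free functions below, which take
both MACs, are a simplification made for this example.

```python
import hashlib
import hmac
from binascii import unhexlify

def verify(computed_mac, received_mac):
    """Raise ValueError unless the MACs match; comparison is constant-time."""
    if not hmac.compare_digest(computed_mac, received_mac):
        raise ValueError("MAC check failed")

def hexverify(computed_mac, hex_received_mac):
    """Same as verify(), but the received MAC is ASCII hexadecimal."""
    verify(computed_mac, unhexlify(hex_received_mac))

# HMAC-SHA256 stands in for the MAC computed by an AEAD mode.
tag = hmac.new(b"k" * 32, b"message", hashlib.sha256).digest()
verify(tag, tag)            # passes silently
hexverify(tag, tag.hex())   # passes silently
```

A plain "==" between byte strings may return as soon as the first
differing byte is found, which is the behaviour the constant-time
comparison avoids.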

=== Padding

The proposed API only supports AEAD modes that do not require padding
of the plaintext. Adding support for that would make the API more complex
(for instance, an external "finished" flag to pass to encrypt()).

=== Pre-processing of associated data

The proposed API does not support pre-processing of AD.

Sometimes, a lot of encryptions/decryptions are performed with
the same key and the same AD, and different nonces and plaintext.

Certain modes like GCM allow the result of processing the AD to be cached
and reused across several encryptions/decryptions.
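
The idea of caching the processing of AD can be shown with HMAC as a
stand-in, since hmac objects in the Python standard library can be cloned
with copy(). This is not how GCM does it, and the proposed AEAD API
deliberately has no copy(); the sketch only shows what is being saved.
The key, the AD and the helper name tag_for() are assumptions.

```python
import hashlib
import hmac

key = b"k" * 32                      # throwaway key, for illustration only
shared_ad = b"same header for every message"

# Process the shared AD once and keep the intermediate state.
ad_state = hmac.new(key, shared_ad, hashlib.sha256)

def tag_for(message):
    """Finish the MAC for one message, starting from the cached AD state."""
    mac = ad_state.copy()            # reuse the work already done on the AD
    mac.update(message)
    return mac.digest()

# Same result as processing AD and message from scratch every time.
from_scratch = hmac.new(key, shared_ad + b"first", hashlib.sha256).digest()
assert tag_for(b"first") == from_scratch
```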

=== Single/2-pass modes and online/non-online modes

The proposed API supports single-pass, online AEAD modes.

A "single-pass" mode processes each block of AD or data only once.
Compare that to "2-pass" modes, where each block of data (not AD)
is processed twice: once for authentication and once for enciphering.

An "online" mode does not need to know the size of the message before
encryption starts. As a consequence:
 - memory usage doesn't depend on the plaintext/ciphertext size.
 - when encrypting, partial ciphertexts can be immediately sent to the receiver.
 - when decrypting, partial (unauthenticated) plaintexts can be obtained from
   the cipher.

Most single-pass modes are patented, and only the less efficient 2-pass modes
are in widespread use.

That is still OK, provided that encryption can start *before* authentication is
completed. If that's the case (e.g. GCM, EAX) encrypt()/decrypt() will also
take care of completing the authentication over the plaintext.

A similar problem arises with non-online modes (e.g. CCM): they have to wait to see
the end of the plaintext before encryption can start. The only way to achieve
that is to only allow encrypt()/decrypt() to be called once.

To summarize:

Single-pass and online:
 Fully supported by the API. The mode is most likely patented.

Single-pass and not-online:
 encrypt()/decrypt() can only be called once. The mode is most likely patented.

2-pass and online:
 Fully supported by the API, but only if encryption or decryption can start
 *before* authentication is finished.

2-pass and not-online:
 Fully supported by the API, but only if encryption or decryption can start
 *before* authentication is finished. encrypt()/decrypt() can only be called once.

=== Associated Data

Depending on the mode, the associated data AD can be defined as:
 1. a single binary string or
 2. a vector of binary strings

For modes of the 1st type (e.g. CCM), the API allows the AD to be split
in any number of segments, and fed to update() in multiple calls.
The resulting AD does not depend on how splitting is performed.
It is the responsibility of the caller to ensure that concatenation of
strings is secure. For instance, there is no difference between:

   c.update(b"builtin")
   c.update(b"securely")

and:

   c.update(b"built")
   c.update(b"insecurely")

For modes of the 2nd type (e.g. SIV), the API assumes that each call
to update() ingests one full item of the vector. The two examples above
are now different.
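
The difference between the two types of AD can be reproduced with HMAC
from the Python standard library as a stand-in. For the vector case the
sketch binds each item with an 8-byte length prefix; that encoding is an
assumption chosen for simplicity and is not what SIV does (SIV uses its
own S2V construction). The function names are hypothetical.

```python
import hashlib
import hmac
import struct

KEY = b"k" * 32  # throwaway key, for illustration only

def mac_string_ad(pieces):
    """1st type: the AD is one string; pieces are simply concatenated."""
    mac = hmac.new(KEY, digestmod=hashlib.sha256)
    for piece in pieces:
        mac.update(piece)
    return mac.digest()

def mac_vector_ad(items):
    """2nd type: the AD is a vector; each item is bound with its length."""
    mac = hmac.new(KEY, digestmod=hashlib.sha256)
    for item in items:
        mac.update(struct.pack(">Q", len(item)))  # 8-byte big-endian length
        mac.update(item)
    return mac.digest()

first = [b"builtin", b"securely"]
second = [b"built", b"insecurely"]

assert mac_string_ad(first) == mac_string_ad(second)  # indistinguishable
assert mac_vector_ad(first) != mac_vector_ad(second)  # item boundaries matter
```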

=== MAC

During encryption, the MAC is computed and returned by digest().

During decryption, the received MAC is passed as a parameter to verify()
at the very end.
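
The complete flow (AD, data, MAC) is sketched below with a toy object
built only from the Python standard library. ToyAead is hypothetical and
is NOT one of the modes described in this file: it derives a keystream
from SHA-256 and authenticates with HMAC, it does not enforce the state
diagram, and it must not be used to protect real data. It only shows how
the MAC is handled separately from the ciphertext.

```python
import hashlib
import hmac
import struct

class ToyAead:
    """Toy encrypt-then-MAC object mimicking the proposed method names."""

    def __init__(self, key, nonce):
        self._enc_key = hashlib.sha256(b"enc" + key).digest()
        mac_key = hashlib.sha256(b"mac" + key).digest()
        self._nonce = nonce
        self._mac = hmac.new(mac_key, nonce, hashlib.sha256)
        self._ad_len = 0
        self._msg_len = 0
        self._keystream = b""
        self._counter = 0

    def update(self, assoc_data):
        self._ad_len += len(assoc_data)
        self._mac.update(assoc_data)

    def _xor(self, data):
        # Extend the keystream on demand, so any split of the data works.
        while len(self._keystream) < len(data):
            block = struct.pack(">Q", self._counter)
            self._keystream += hashlib.sha256(
                self._enc_key + self._nonce + block).digest()
            self._counter += 1
        stream = self._keystream[:len(data)]
        self._keystream = self._keystream[len(data):]
        return bytes(x ^ y for x, y in zip(data, stream))

    def encrypt(self, plaintext):
        ciphertext = self._xor(plaintext)
        self._msg_len += len(ciphertext)
        self._mac.update(ciphertext)
        return ciphertext

    def decrypt(self, ciphertext):
        self._msg_len += len(ciphertext)
        self._mac.update(ciphertext)
        return self._xor(ciphertext)

    def digest(self):
        final = self._mac.copy()
        # Appending both lengths separates AD from ciphertext in the MAC.
        final.update(struct.pack(">QQ", self._ad_len, self._msg_len))
        return final.digest()

    def verify(self, received_mac):
        if not hmac.compare_digest(self.digest(), received_mac):
            raise ValueError("MAC check failed")

key, nonce = b"k" * 32, b"n" * 12

sender = ToyAead(key, nonce)
sender.update(b"header")
ciphertext = sender.encrypt(b"hello, ") + sender.encrypt(b"world")
mac = sender.digest()              # the MAC travels next to the ciphertext

receiver = ToyAead(key, nonce)
receiver.update(b"header")
plaintext = receiver.decrypt(ciphertext)
receiver.verify(mac)               # raises ValueError on tampering
assert plaintext == b"hello, world"
```

Note that the plaintext returned by decrypt() must be treated as
unauthenticated until verify() has returned without raising.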

=== CCM mode
Number of keys required:	1
Compatible with ciphers:	AES (may be used with others if block length
                                is 128 bits)
Counter/IV/nonce:		1 nonce required (length 7..13 bytes)
Single-pass:			no
Online:				no
Encryption can start before
authentication is complete:	yes
Term for AD:			"associated data" (NIST) or "additional
                                authenticated data" (RFC)
Term for MAC:			part of ciphertext (NIST) or
				"encrypted authentication value" (RFC).
Padding required:		no
Pre-processing of AD:		not possible

In order to allow encrypt()/decrypt() to be called multiple times,
and to reduce the memory usage, the new() method of the Cipher module
optionally accepts the following parameters:

 - 'assoc_len', the total length (in bytes) of the AD

 - 'msg_len', the total length (in bytes) of the plaintext or ciphertext
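
The reason the lengths must be known up front is that CCM encodes the
message length in the very first block fed to CBC-MAC (block B0 in
NIST SP 800-38C and RFC 3610), and the AD length in the block that
follows. The sketch below builds B0 only; the function name and its
parameters are hypothetical and are not part of the proposed API.

```python
def ccm_first_block(nonce, msg_len, mac_len, has_ad):
    """Build B0, the first CBC-MAC input block of CCM (128-bit block)."""
    if not 7 <= len(nonce) <= 13:
        raise ValueError("nonce must be 7..13 bytes long")
    if mac_len not in (4, 6, 8, 10, 12, 14, 16):
        raise ValueError("invalid MAC length")
    q = 15 - len(nonce)              # bytes left to encode the message length
    if not 0 <= msg_len < 1 << (8 * q):
        raise ValueError("message too long for this nonce length")
    # Flags byte: bit 6 = AD present, bits 5..3 = (mac_len-2)/2, bits 2..0 = q-1
    flags = (64 if has_ad else 0) | (((mac_len - 2) // 2) << 3) | (q - 1)
    return bytes([flags]) + nonce + msg_len.to_bytes(q, "big")

# Parameters of packet vector #1 in RFC 3610: 13-byte nonce, 23-byte
# message, 8-byte MAC, AD present.
nonce = bytes.fromhex("00000003020100a0a1a2a3a4a5")
b0 = ccm_first_block(nonce, msg_len=23, mac_len=8, has_ad=True)
assert b0.hex() == "5900000003020100a0a1a2a3a4a50017"
```

Without 'msg_len', an implementation has to buffer the whole message
before it can even start the MAC computation, which is why
encrypt()/decrypt() can then be called only once.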

=== GCM mode
Number of keys required:	1
Compatible with ciphers:	AES (may be used with others if block length
                                is 128 bits)
Counter/IV/nonce:		1 IV required (any length)
Single-pass:			no
Online:				yes
Encryption can start before
authentication is complete:	yes
Term for AD:			"additional authenticated data"
Term for MAC:			"authentication tag" or "tag"
Padding required:		no
Pre-processing of AD:		possible

=== EAX mode
Number of keys required:	1
Compatible with ciphers:	any
Counter/IV/nonce:		1 IV required (any length)
Single-pass:			no
Online:				yes
Encryption can start before
authentication is complete:	yes
Term for AD:			"header"
Term for MAC:			"tag"
Padding required:		no
Pre-processing of AD:		possible

=== SIV mode
Number of keys required:	2
Compatible with ciphers:	AES
Counter/IV/nonce:		1 IV required (any length)
Single-pass:			no
Online:				no
Encryption can start before
authentication is complete:	no
Term for AD:			"associated data" (vector)
Term for MAC:			"tag"
Padding required:		no
Pre-processing of AD:		possible

AD is a vector of strings. One item in the vector is the (optional) IV.

One property of SIV is that encryption or decryption cannot start
until the MAC is available.

That is not a problem for encryption: since encrypt() can only be called
once, it can internally compute the MAC first, perform the encryption,
and return the ciphertext. The MAC is cached for the subsequent call to digest().

The major problem is decryption: decrypt() cannot produce any output because
the MAC is meant to be passed to verify() only later.
To overcome this limitation, decrypt() *exceptionally* accepts the ciphertext
concatenated to the MAC. The MAC check is still performed with verify().