From 3663b6cc60d30a2996ba25b8dd2ca1493336d290 Mon Sep 17 00:00:00 2001 From: "A.M. Kuchling" Date: Wed, 2 Jan 2008 10:08:26 -0500 Subject: [project @ amk@amk.ca-20080102150826-02fmrui0m05jg1eh] Remove TeX file --- Doc/pycrypt.tex | 1188 ------------------------------------------------------- 1 file changed, 1188 deletions(-) delete mode 100644 Doc/pycrypt.tex (limited to 'Doc') diff --git a/Doc/pycrypt.tex b/Doc/pycrypt.tex deleted file mode 100644 index 5d6a41c..0000000 --- a/Doc/pycrypt.tex +++ /dev/null @@ -1,1188 +0,0 @@ -\documentclass{howto} - -\title{Python Cryptography Toolkit} - -\release{2.0.2} - -\author{A.M. Kuchling} -\authoraddress{\url{www.amk.ca}} - -\begin{document} -\maketitle - -\begin{abstract} -\noindent -The Python Cryptography Toolkit describes a package containing various -cryptographic modules for the Python programming language. This -documentation assumes you have some basic knowledge about the Python -language, but not necessarily about cryptography. - -\end{abstract} - -\tableofcontents - - -%====================================================================== -\section{Introduction} - -\subsection{Design Goals} -The Python cryptography toolkit is intended to provide a reliable and -stable base for writing Python programs that require cryptographic -functions. - -A central goal of the author's has been to provide a simple, -consistent interface for similar classes of algorithms. For example, -all block cipher objects have the same methods and return values, and -support the same feedback modes. Hash functions have a different -interface, but it too is consistent over all the hash functions -available. Some of these interfaces have been codified as Python -Enhancement Proposal documents, as \pep{247}, ``API for Cryptographic -Hash Functions'', and \pep{272}, ``API for Block Encryption -Algorithms''. - -This is intended to make it easy to replace old algorithms with newer, -more secure ones. If you're given a bit of portably-written Python -code that uses the DES encryption algorithm, you should be able to use -AES instead by simply changing \code{from Crypto.Cipher import DES} to -\code{from Crypto.Cipher import AES}, and changing all references to -\code{DES.new()} to \code{AES.new()}. It's also fairly simple to -write your own modules that mimic this interface, thus letting you use -combinations or permutations of algorithms. - -Some modules are implemented in C for performance; others are written -in Python for ease of modification. Generally, low-level functions -like ciphers and hash functions are written in C, while less -speed-critical functions have been written in Python. This division -may change in future releases. When speeds are quoted in this -document, they were measured on a 500 MHz Pentium II running Linux. -The exact speeds will obviously vary with different machines, -different compilers, and the phase of the moon, but they provide a -crude basis for comparison. Currently the cryptographic -implementations are acceptably fast, but not spectacularly good. I -welcome any suggestions or patches for faster code. - -I have placed the code under no restrictions; you can redistribute the -code freely or commercially, in its original form or with any -modifications you make, subject to whatever local laws may apply in your -jurisdiction. Note that you still have to come to some agreement with -the holders of any patented algorithms you're using. If you're -intensively using these modules, please tell me about it; there's little -incentive for me to work on this package if I don't know of anyone using -it. - -I also make no guarantees as to the usefulness, correctness, or legality -of these modules, nor does their inclusion constitute an endorsement of -their effectiveness. Many cryptographic algorithms are patented; -inclusion in this package does not necessarily mean you are allowed to -incorporate them in a product and sell it. Some of these algorithms may -have been cryptanalyzed, and may no longer be secure. While I will -include commentary on the relative security of the algorithms in the -sections entitled "Security Notes", there may be more recent analyses -I'm not aware of. (Or maybe I'm just clueless.) If you're implementing -an important system, don't just grab things out of a toolbox and put -them together; do some research first. On the other hand, if you're -just interested in keeping your co-workers or your relatives out of your -files, any of the components here could be used. - -This document is very much a work in progress. If you have any -questions, comments, complaints, or suggestions, please send them to me. - -\subsection{Acknowledgements} -Much of the code that actually implements the various cryptographic -algorithms was not written by me. I'd like to thank all the people who -implemented them, and released their work under terms which allowed me -to use their code. These individuals are credited in the relevant -chapters of this documentation. Bruce Schneier's book \emph{Applied -Cryptography} was also very useful in writing this toolkit; I highly -recommend it if you're interested in learning more about cryptography. - -Good luck with your cryptography hacking! - -A.M.K. - -\email{comments@amk.ca} - -Washington DC, USA - -June 2005 - - -%====================================================================== -\section{Crypto.Hash: Hash Functions} - -Hash functions take arbitrary strings as input, and produce an output -of fixed size that is dependent on the input; it should never be -possible to derive the input data given only the hash function's -output. One simple hash function consists of simply adding together -all the bytes of the input, and taking the result modulo 256. For a -hash function to be cryptographically secure, it must be very -difficult to find two messages with the same hash value, or to find a -message with a given hash value. The simple additive hash function -fails this criterion miserably and the hash functions described below -meet this criterion (as far as we know). Examples of -cryptographically secure hash functions include MD2, MD5, and SHA1. - -Hash functions can be used simply as a checksum, or, in association with a -public-key algorithm, can be used to implement digital signatures. - -The hashing algorithms currently implemented are: - -\begin{tableii}{c|l}{}{Hash function}{Digest length} -\lineii{MD2}{128 bits} -\lineii{MD4}{128 bits} -\lineii{MD5}{128 bits} -\lineii{RIPEMD}{160 bits} -\lineii{SHA1}{160 bits} -\lineii{SHA256}{256 bits} -\end{tableii} - -All hashing modules share the same interface. After importing a given -hashing module, call the \function{new()} function to create a new -hashing object. You can now feed arbitrary strings into the object -with the \method{update()} method, and can ask for the hash value at -any time by calling the \method{digest()} or \method{hexdigest()} -methods. The \function{new()} function can also be passed an optional -string parameter that will be immediately hashed into the object's -state. - -Hash function modules define one variable: - -\begin{datadesc}{digest_size} -An integer value; the size of the digest -produced by the hashing objects. You could also obtain this value by -creating a sample object, and taking the length of the digest string -it returns, but using \member{digest_size} is faster. -\end{datadesc} - -The methods for hashing objects are always the following: - -\begin{methoddesc}{copy}{} -Return a separate copy of this hashing object. An \code{update} to -this copy won't affect the original object. -\end{methoddesc} - -\begin{methoddesc}{digest}{} -Return the hash value of this hashing object, as a string containing -8-bit data. The object is not altered in any way by this function; -you can continue updating the object after calling this function. -\end{methoddesc} - -\begin{methoddesc}{hexdigest}{} -Return the hash value of this hashing object, as a string containing -the digest data as hexadecimal digits. The resulting string will be -twice as long as that returned by \method{digest()}. The object is not -altered in any way by this function; you can continue updating the -object after calling this function. -\end{methoddesc} - -\begin{methoddesc}{update}{arg} -Update this hashing object with the string \var{arg}. -\end{methoddesc} - -Here's an example, using the MD5 algorithm: - -\begin{verbatim} ->>> from Crypto.Hash import MD5 ->>> m = MD5.new() ->>> m.update('abc') ->>> m.digest() -'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr' ->>> m.hexdigest() -'900150983cd24fb0d6963f7d28e17f72' -\end{verbatim} - - -\subsection{Security Notes} - -Hashing algorithms are broken by developing an algorithm to compute a -string that produces a given hash value, or to find two messages that -produce the same hash value. Consider an example where Alice and Bob -are using digital signatures to sign a contract. Alice computes the -hash value of the text of the contract and signs the hash value with -her private key. Bob could then compute a different contract that has -the same hash value, and it would appear that Alice signed that bogus -contract; she'd have no way to prove otherwise. Finding such a -message by brute force takes \code{pow(2, b-1)} operations, where the -hash function produces \emph{b}-bit hashes. - -If Bob can only find two messages with the same hash value but can't -choose the resulting hash value, he can look for two messages with -different meanings, such as "I will mow Bob's lawn for $10" and "I owe -Bob $1,000,000", and ask Alice to sign the first, innocuous contract. -This attack is easier for Bob, since finding two such messages by brute -force will take \code{pow(2, b/2)} operations on average. However, -Alice can protect herself by changing the protocol; she can simply -append a random string to the contract before hashing and signing it; -the random string can then be kept with the signature. - -None of the algorithms implemented here have been completely broken. -There are no attacks on MD2, but it's rather slow at 1250 K/sec. MD4 -is faster at 44,500 K/sec but there have been some partial attacks on -it. MD4 makes three iterations of a basic mixing operation; two of -the three rounds have been cryptanalyzed, but the attack can't be -extended to the full algorithm. MD5 is a strengthened version of MD4 -with four rounds; an attack against one round has been found XXX -update this. MD5 is still believed secure at the moment, but people -are gravitating toward using SHA1 in new software because there are no -known attacks against SHA1. The MD5 implementation is moderately -well-optimized and thus faster on x86 processors, running at 35,500 -K/sec. MD5 may even be faster than MD4, depending on the processor -and compiler you use. - -All the MD\var{n} algorithms produce 128-bit hashes; SHA1 produces a -larger 160-bit hash, and there are no known attacks against it. The -first version of SHA had a weakness which was later corrected; the -code used here implements the second, corrected, version. It operates -at 21,000 K/sec. SHA256 is about as half as fast as SHA1. RIPEMD has -a 160-bit output, the same output size as SHA1, and operates at 17,600 -K/sec. - -\subsection{Credits} -The MD2 and MD4 implementations were written by A.M. Kuchling, and the -MD5 code was implemented by Colin Plumb. The SHA1 code was originally -written by Peter Gutmann. The RIPEMD code was written by Antoon -Bosselaers, and adapted for the toolkit by Hirendra Hindocha. The -SHA256 code was written by Tom St.~Denis and is part of the -LibTomCrypt library (\url{http://www.libtomcrypt.org/}); it was -adapted for the toolkit by Jeethu Rao and Taylor Boon. - - -%====================================================================== -\section{Crypto.Cipher: Encryption Algorithms} - -Encryption algorithms transform their input data, or \dfn{plaintext}, -in some way that is dependent on a variable \dfn{key}, producing -\dfn{ciphertext}. This transformation can easily be reversed, if (and, -hopefully, only if) one knows the key. The key can be varied by the -user or application and chosen from some very large space of possible -keys. - -For a secure encryption algorithm, it should be very difficult to -determine the original plaintext without knowing the key; usually, no -clever attacks on the algorithm are known, so the only way of breaking -the algorithm is to try all possible keys. Since the number of possible -keys is usually of the order of 2 to the power of 56 or 128, this is not -a serious threat, although 2 to the power of 56 is now considered -insecure in the face of custom-built parallel computers and distributed -key guessing efforts. - -\dfn{Block ciphers} take multibyte inputs of a fixed size -(frequently 8 or 16 bytes long) and encrypt them. Block ciphers can -be operated in various modes. The simplest is Electronic Code Book -(or ECB) mode. In this mode, each block of plaintext is simply -encrypted to produce the ciphertext. This mode can be dangerous, -because many files will contain patterns greater than the block size; -for example, the comments in a C program may contain long strings of -asterisks intended to form a box. All these identical blocks will -encrypt to identical ciphertext; an adversary may be able to use this -structure to obtain some information about the text. - -To eliminate this weakness, there are various feedback modes in which -the plaintext is combined with the previous ciphertext before -encrypting; this eliminates any repetitive structure in the -ciphertext. - -One mode is Cipher Block Chaining (CBC mode); another is Cipher -FeedBack (CFB mode). CBC mode still encrypts in blocks, and thus is -only slightly slower than ECB mode. CFB mode encrypts on a -byte-by-byte basis, and is much slower than either of the other two -modes. The chaining feedback modes require an initialization value to -start off the encryption; this is a string of the same length as the -ciphering algorithm's block size, and is passed to the \code{new()} -function. There is also a special PGP mode, which is an oddball -variant of CFB used by the PGP program. While you can use it in -non-PGP programs, it's quite non-standard. - -The currently available block ciphers are listed in the following table, -and are in the \code{Crypto.Cipher} package: - -\begin{tableii}{c|l}{}{Cipher}{Key Size/Block Size} -\lineii{AES}{16, 24, or 32 bytes/16 bytes} -\lineii{ARC2}{Variable/8 bytes} -\lineii{Blowfish}{Variable/8 bytes} -\lineii{CAST}{Variable/8 bytes} -\lineii{DES}{8 bytes/8 bytes} -\lineii{DES3 (Triple DES)}{16 bytes/8 bytes} -\lineii{IDEA}{16 bytes/8 bytes} -\lineii{RC5}{Variable/8 bytes} -\end{tableii} - -In a strict formal sense, \dfn{stream ciphers} encrypt data bit-by-bit; -practically, stream ciphers work on a character-by-character basis. -Stream ciphers use exactly the -same interface as block ciphers, with a block length that will always -be 1; this is how block and stream ciphers can be distinguished. -The only feedback mode available for stream ciphers is ECB mode. - -The currently available stream ciphers are listed in the following table: - -\begin{tableii}{c|l}{}{Cipher}{Key Size} -\lineii{Cipher}{Key Size} - \lineii{ARC4}{Variable} - \lineii{XOR}{Variable} -\end{tableii} - -ARC4 is short for `Alleged RC4'. In September of 1994, someone posted -C code to both the Cypherpunks mailing list and to the Usenet -newsgroup \code{sci.crypt}, claiming that it implemented the RC4 -algorithm. This claim turned out to be correct. Note that there's a -damaging class of weak RC4 keys; this module won't warn you about such keys. -% XXX other analyses of RC4? - -A similar anonymous posting was made for Alleged RC2 in January, 1996. - -An example usage of the DES module: -\begin{verbatim} ->>> from Crypto.Cipher import DES ->>> obj=DES.new('abcdefgh', DES.MODE_ECB) ->>> plain="Guido van Rossum is a space alien." ->>> len(plain) -34 ->>> obj.encrypt(plain) -Traceback (innermost last): - File "", line 1, in ? -ValueError: Strings for DES must be a multiple of 8 in length ->>> ciph=obj.encrypt(plain+'XXXXXX') ->>> ciph -'\021,\343Nq\214DY\337T\342pA\372\255\311s\210\363,\300j\330\250\312\347\342I\3215w\03561\303dgb/\006' ->>> obj.decrypt(ciph) -'Guido van Rossum is a space alien.XXXXXX' -\end{verbatim} - -All cipher algorithms share a common interface. After importing a -given module, there is exactly one function and two variables -available. - -\begin{funcdesc}{new}{key, mode\optional{, IV}} -Returns a ciphering object, using \var{key} and feedback mode -\var{mode}. If \var{mode} is \constant{MODE_CBC} or \constant{MODE_CFB}, \var{IV} must be provided, -and must be a string of the same length as the block size. Some -algorithms support additional keyword arguments to this function; see -the "Algorithm-specific Notes for Encryption Algorithms" section below for the details. -\end{funcdesc} - -\begin{datadesc}{block_size} -An integer value; the size of the blocks encrypted by this module. -Strings passed to the \code{encrypt} and \code{decrypt} functions -must be a multiple of this length. For stream ciphers, -\code{block_size} will be 1. -\end{datadesc} - -\begin{datadesc}{key_size} -An integer value; the size of the keys required by this module. If -\code{key_size} is zero, then the algorithm accepts arbitrary-length -keys. You cannot pass a key of length 0 (that is, the null string -\code{''} as such a variable-length key. -\end{datadesc} - -All cipher objects have at least three attributes: - -\begin{memberdesc}{block_size} -An integer value equal to the size of the blocks encrypted by this object. -Identical to the module variable of the same name. -\end{memberdesc} - -\begin{memberdesc}{IV} -Contains the initial value which will be used to start a cipher -feedback mode. After encrypting or decrypting a string, this value -will reflect the modified feedback text; it will always be one block -in length. It is read-only, and cannot be assigned a new value. -\end{memberdesc} - -\begin{memberdesc}{key_size} -An integer value equal to the size of the keys used by this object. If -\code{key_size} is zero, then the algorithm accepts arbitrary-length -keys. For algorithms that support variable length keys, this will be 0. -Identical to the module variable of the same name. -\end{memberdesc} - -All ciphering objects have the following methods: - -\begin{methoddesc}{decrypt}{string} -Decrypts \var{string}, using the key-dependent data in the object, and -with the appropriate feedback mode. The string's length must be an exact -multiple of the algorithm's block size. Returns a string containing -the plaintext. -\end{methoddesc} - -\begin{methoddesc}{encrypt}{string} -Encrypts a non-null \var{string}, using the key-dependent data in the -object, and with the appropriate feedback mode. The string's length -must be an exact multiple of the algorithm's block size; for stream -ciphers, the string can be of any length. Returns a string containing -the ciphertext. -\end{methoddesc} - - -\subsection{Algorithm-specific Notes for Encryption Algorithms} - -RC5 has a bunch of parameters; see Ronald Rivest's paper at -\url{http://theory.lcs.mit.edu/~rivest/rc5rev.ps} for the -implementation details. The keyword parameters are: - -\begin{itemize} -\item \code{version}: -The version -of the RC5 algorithm to use; currently the only legal value is -\code{0x10} for RC5 1.0. -\item \code{wordsize}: -The word size to use; -16 or 32 are the only legal values. (A larger word size is better, so -usually 32 will be used. 16-bit RC5 is probably only of academic -interest.) -\item \code{rounds}: -The number of rounds to apply, the larger the more secure: this -can be any value from 0 to 255, so you will have to choose a value -balanced between speed and security. -\end{itemize} - - -\subsection{Security Notes} -Encryption algorithms can be broken in several ways. If you have some -ciphertext and know (or can guess) the corresponding plaintext, you can -simply try every possible key in a \dfn{known-plaintext} attack. Or, it -might be possible to encrypt text of your choice using an unknown key; -for example, you might mail someone a message intending it to be -encrypted and forwarded to someone else. This is a -\dfn{chosen-plaintext} attack, which is particularly effective if it's -possible to choose plaintexts that reveal something about the key when -encrypted. - -DES (5100 K/sec) has a 56-bit key; this is starting to become too small -for safety. It has been estimated that it would only cost \$1,000,000 to -build a custom DES-cracking machine that could find a key in 3 hours. A -chosen-ciphertext attack using the technique of \dfn{linear -cryptanalysis} can break DES in \code{pow(2, 43)} steps. However, -unless you're encrypting data that you want to be safe from major -governments, DES will be fine. DES3 (1830 K/sec) uses three DES -encryptions for greater security and a 112-bit or 168-bit key, but is -correspondingly slower. - -There are no publicly known attacks against IDEA (3050 K/sec), and -it's been around long enough to have been examined. There are no -known attacks against ARC2 (2160 K/sec), ARC4 (8830 K/sec), Blowfish -(9250 K/sec), CAST (2960 K/sec), or RC5 (2060 K/sec), but they're all -relatively new algorithms and there hasn't been time for much analysis -to be performed; use them for serious applications only after careful -research. - -AES, the Advanced Encryption Standard, was chosen by the US National -Institute of Standards and Technology from among 6 competitors, and is -probably your best choice. It runs at 7060 K/sec, so it's among the -faster algorithms around. - - -\subsection{Credits} -The code for Blowfish was written by Bryan Olson, partially based on a -previous implementation by Bruce Schneier, who also invented the -algorithm; the Blowfish algorithm has been placed in the public domain -and can be used freely. (See \url{http://www.counterpane.com} for more -information about Blowfish.) The CAST implementation was written by -Wim Lewis. The DES implementation was written by Eric Young, and the -IDEA implementation by Colin Plumb. The RC5 implementation -was written by A.M. Kuchling. - -The Alleged RC4 code was posted to the \code{sci.crypt} newsgroup by an -unknown party, and re-implemented by A.M. Kuchling. - - -%====================================================================== -\section{Crypto.Protocol: Various Protocols} - -\subsection{Crypto.Protocol.AllOrNothing} - -This module implements all-or-nothing package transformations. -An all-or-nothing package transformation is one in which some text is -transformed into message blocks, such that all blocks must be obtained before -the reverse transformation can be applied. Thus, if any blocks are corrupted -or lost, the original message cannot be reproduced. - -An all-or-nothing package transformation is not encryption, although a block -cipher algorithm is used. The encryption key is randomly generated and is -extractable from the message blocks. - -\begin{classdesc}{AllOrNothing}{ciphermodule, mode=None, IV=None} -Class implementing the All-or-Nothing package transform. - -\var{ciphermodule} is a module implementing the cipher algorithm to -use. Optional arguments \var{mode} and \var{IV} are passed directly -through to the \var{ciphermodule}.\code{new()} method; they are the -feedback mode and initialization vector to use. All three arguments -must be the same for the object used to create the digest, and to -undigest'ify the message blocks. - -The module passed as \var{ciphermodule} must provide the \pep{272} -interface. An encryption key is randomly generated automatically when -needed. -\end{classdesc} - -The methods of the \class{AllOrNothing} class are: - -\begin{methoddesc}{digest}{text} -Perform the All-or-Nothing package transform on the -string \var{text}. Output is a list of message blocks describing the -transformed text, where each block is a string of bit length equal -to the cipher module's block_size. -\end{methoddesc} - -\begin{methoddesc}{undigest}{mblocks} -Perform the reverse package transformation on a list of message -blocks. Note that the cipher module used for both transformations -must be the same. \var{mblocks} is a list of strings of bit length -equal to \var{ciphermodule}'s block_size. The output is a string object. -\end{methoddesc} - - -\subsection{Crypto.Protocol.Chaffing} - -Winnowing and chaffing is a technique for enhancing privacy without requiring -strong encryption. In short, the technique takes a set of authenticated -message blocks (the wheat) and adds a number of chaff blocks which have -randomly chosen data and MAC fields. This means that to an adversary, the -chaff blocks look as valid as the wheat blocks, and so the authentication -would have to be performed on every block. By tailoring the number of chaff -blocks added to the message, the sender can make breaking the message -computationally infeasible. There are many other interesting properties of -the winnow/chaff technique. - -For example, say Alice is sending a message to Bob. She packetizes the -message and performs an all-or-nothing transformation on the packets. Then -she authenticates each packet with a message authentication code (MAC). The -MAC is a hash of the data packet, and there is a secret key which she must -share with Bob (key distribution is an exercise left to the reader). She then -adds a serial number to each packet, and sends the packets to Bob. - -Bob receives the packets, and using the shared secret authentication key, -authenticates the MACs for each packet. Those packets that have bad MACs are -simply discarded. The remainder are sorted by serial number, and passed -through the reverse all-or-nothing transform. The transform means that an -eavesdropper (say Eve) must acquire all the packets before any of the data can -be read. If even one packet is missing, the data is useless. - -There's one twist: by adding chaff packets, Alice and Bob can make Eve's job -much harder, since Eve now has to break the shared secret key, or try every -combination of wheat and chaff packet to read any of the message. The cool -thing is that Bob doesn't need to add any additional code; the chaff packets -are already filtered out because their MACs don't match (in all likelihood -- -since the data and MACs for the chaff packets are randomly chosen it is -possible, but very unlikely that a chaff MAC will match the chaff data). And -Alice need not even be the party adding the chaff! She could be completely -unaware that a third party, say Charles, is adding chaff packets to her -messages as they are transmitted. - -\begin{classdesc}{Chaff}{factor=1.0, blocksper=1} -Class implementing the chaff adding algorithm. -\var{factor} is the number of message blocks - to add chaff to, expressed as a percentage between 0.0 and 1.0; the default value is 1.0. -\var{blocksper} is the number of chaff blocks to include for each block - being chaffed, and defaults to 1. The default settings -add one chaff block to every - message block. By changing the defaults, you can adjust how - computationally difficult it could be for an adversary to - brute-force crack the message. The difficulty is expressed as: - -\begin{verbatim} -pow(blocksper, int(factor * number-of-blocks)) -\end{verbatim} - -For ease of implementation, when \var{factor} < 1.0, only the first -\code{int(\var{factor}*number-of-blocks)} message blocks are chaffed. -\end{classdesc} - -\class{Chaff} instances have the following methods: - -\begin{methoddesc}{chaff}{blocks} -Add chaff to message blocks. \var{blocks} is a list of 3-tuples of the -form (\var{serial-number}, \var{data}, \var{MAC}). - -Chaff is created by choosing a random number of the same -byte-length as \var{data}, and another random number of the same -byte-length as \var{MAC}. The message block's serial number is placed -on the chaff block and all the packet's chaff blocks are randomly -interspersed with the single wheat block. This method then -returns a list of 3-tuples of the same form. Chaffed blocks will -contain multiple instances of 3-tuples with the same serial -number, but the only way to figure out which blocks are wheat and -which are chaff is to perform the MAC hash and compare values. -\end{methoddesc} - - -%====================================================================== -\section{Crypto.PublicKey: Public-Key Algorithms} -So far, the encryption algorithms described have all been \dfn{private -key} ciphers. The same key is used for both encryption and decryption -so all correspondents must know it. This poses a problem: you may -want encryption to communicate sensitive data over an insecure -channel, but how can you tell your correspondent what the key is? You -can't just e-mail it to her because the channel is insecure. One -solution is to arrange the key via some other way: over the phone or -by meeting in person. - -Another solution is to use \dfn{public-key} cryptography. In a public -key system, there are two different keys: one for encryption and one for -decryption. The encryption key can be made public by listing it in a -directory or mailing it to your correspondent, while you keep the -decryption key secret. Your correspondent then sends you data encrypted -with your public key, and you use the private key to decrypt it. While -the two keys are related, it's very difficult to derive the private key -given only the public key; however, deriving the private key is always -possible given enough time and computing power. This makes it very -important to pick keys of the right size: large enough to be secure, but -small enough to be applied fairly quickly. - -Many public-key algorithms can also be used to sign messages; simply -run the message to be signed through a decryption with your private -key key. Anyone receiving the message can encrypt it with your -publicly available key and read the message. Some algorithms do only -one thing, others can both encrypt and authenticate. - -The currently available public-key algorithms are listed in the -following table: - -\begin{tableii}{c|l}{}{Algorithm}{Capabilities} -\lineii{RSA}{Encryption, authentication/signatures} -\lineii{ElGamal}{Encryption, authentication/signatures} -\lineii{DSA}{Authentication/signatures} -\lineii{qNEW}{Authentication/signatures} -\end{tableii} - -Many of these algorithms are patented. Before using any of them in a -commercial product, consult a patent attorney; you may have to arrange -a license with the patent holder. - -An example of using the RSA module to sign a message: -\begin{verbatim} ->>> from Crypto.Hash import MD5 ->>> from Crypto.PublicKey import RSA ->>> RSAkey = RSA.generate(384, randfunc) # This will take a while... ->>> hash = MD5.new(plaintext).digest() ->>> signature = RSAkey.sign(hash, "") ->>> signature # Print what an RSA sig looks like--you don't really care. -('\021\317\313\336\264\315' ...,) ->>> RSAkey.verify(hash, signature) # This sig will check out -1 ->>> RSAkey.verify(hash[:-1], signature)# This sig will fail -0 -\end{verbatim} - -Public-key modules make the following functions available: - -\begin{funcdesc}{construct}{tuple} -Constructs a key object from a tuple of data. This is -algorithm-specific; look at the source code for the details. (To be -documented later.) -\end{funcdesc} - -\begin{funcdesc}{generate}{size, randfunc, progress_func=\code{None}} -Generate a fresh public/private key pair. \var{size} is a -algorithm-dependent size parameter, usually measured in bits; the -larger it is, the more difficult it will be to break the key. Safe -key sizes vary from algorithm to algorithm; you'll have to research -the question and decide on a suitable key size for your application. -An N-bit keys can encrypt messages up to N-1 bits long. - -\var{randfunc} is a random number generation function; it should -accept a single integer \var{N} and return a string of random data -\var{N} bytes long. You should always use a cryptographically secure -random number generator, such as the one defined in the -\module{Crypto.Util.randpool} module; \emph{don't} just use the -current time and the \module{random} module. - -\var{progress_func} is an optional function that will be called with a short -string containing the key parameter currently being generated; it's -useful for interactive applications where a user is waiting for a key -to be generated. -\end{funcdesc} - -If you want to interface with some other program, you will have to know -the details of the algorithm being used; this isn't a big loss. If you -don't care about working with non-Python software, simply use the -\module{pickle} module when you need to write a key or a signature to a -file. It's portable across all the architectures that Python supports, -and it's simple to use. - -Public-key objects always support the following methods. Some of them -may raise exceptions if their functionality is not supported by the -algorithm. - -\begin{methoddesc}{can_blind}{} -Returns true if the algorithm is capable of blinding data; -returns false otherwise. -\end{methoddesc} - -\begin{methoddesc}{can_encrypt}{} -Returns true if the algorithm is capable of encrypting and decrypting -data; returns false otherwise. To test if a given key object can encrypt -data, use \code{key.can_encrypt() and key.has_private()}. -\end{methoddesc} - -\begin{methoddesc}{can_sign}{} -Returns true if the algorithm is capable of signing data; returns false -otherwise. To test if a given key object can sign data, use -\code{key.can_sign() and key.has_private()}. -\end{methoddesc} - -\begin{methoddesc}{decrypt}{tuple} -Decrypts \var{tuple} with the private key, returning another string. -This requires the private key to be present, and will raise an exception -if it isn't present. It will also raise an exception if \var{string} is -too long. -\end{methoddesc} - -\begin{methoddesc}{encrypt}{string, K} -Encrypts \var{string} with the private key, returning a tuple of -strings; the length of the tuple varies from algorithm to algorithm. -\var{K} should be a string of random data that is as long as -possible. Encryption does not require the private key to be present -inside the key object. It will raise an exception if \var{string} is -too long. For ElGamal objects, the value of \var{K} expressed as a -big-endian integer must be relatively prime to \code{self.p-1}; an -exception is raised if it is not. -\end{methoddesc} - -\begin{methoddesc}{has_private}{} -Returns true if the key object contains the private key data, which -will allow decrypting data and generating signatures. -Otherwise this returns false. -\end{methoddesc} - -\begin{methoddesc}{publickey}{} -Returns a new public key object that doesn't contain the private key -data. -\end{methoddesc} - -\begin{methoddesc}{sign}{string, K} -Sign \var{string}, returning a signature, which is just a tuple; in -theory the signature may be made up of any Python objects at all; in -practice they'll be either strings or numbers. \var{K} should be a -string of random data that is as long as possible. Different algorithms -will return tuples of different sizes. \code{sign()} raises an -exception if \var{string} is too long. For ElGamal objects, the value -of \var{K} expressed as a big-endian integer must be relatively prime to -\code{self.p-1}; an exception is raised if it is not. -\end{methoddesc} - -\begin{methoddesc}{size}{} -Returns the maximum size of a string that can be encrypted or signed, -measured in bits. String data is treated in big-endian format; the most -significant byte comes first. (This seems to be a \emph{de facto} standard -for cryptographical software.) If the size is not a multiple of 8, then -some of the high order bits of the first byte must be zero. Usually -it's simplest to just divide the size by 8 and round down. -\end{methoddesc} - -\begin{methoddesc}{verify}{string, signature} -Returns true if the signature is valid, and false otherwise. -\var{string} is not processed in any way; \code{verify} does -not run a hash function over the data, but you can easily do that yourself. -\end{methoddesc} - -\subsection{The ElGamal and DSA algorithms} -For RSA, the \var{K} parameters are unused; if you like, you can just -pass empty strings. The ElGamal and DSA algorithms require a real -\var{K} value for technical reasons; see Schneier's book for a detailed -explanation of the respective algorithms. This presents a possible -hazard that can -inadvertently reveal the private key. Without going into the -mathematical details, the danger is as follows. \var{K} is never derived -or needed by others; theoretically, it can be thrown away once the -encryption or signing operation is performed. However, revealing -\var{K} for a given message would enable others to derive the secret key -data; worse, reusing the same value of \var{K} for two different -messages would also enable someone to derive the secret key data. An -adversary could intercept and store every message, and then try deriving -the secret key from each pair of messages. - -This places implementors on the horns of a dilemma. On the one hand, -you want to store the \var{K} values to avoid reusing one; on the other -hand, storing them means they could fall into the hands of an adversary. -One can randomly generate \var{K} values of a suitable length such as -128 or 144 bits, and then trust that the random number generator -probably won't produce a duplicate anytime soon. This is an -implementation decision that depends on the desired level of security -and the expected usage lifetime of a private key. I can't choose and -enforce one policy for this, so I've added the \var{K} parameter to the -\method{encrypt} and \method{sign} methods. You must choose \var{K} by -generating a string of random data; for ElGamal, when interpreted as a -big-endian number (with the most significant byte being the first byte -of the string), \var{K} must be relatively prime to \code{self.p-1}; any -size will do, but brute force searches would probably start with small -primes, so it's probably good to choose fairly large numbers. It might be -simplest to generate a prime number of a suitable length using the -\module{Crypto.Util.number} module. - - -\subsection{Security Notes for Public-key Algorithms} -Any of these algorithms can be trivially broken; for example, RSA can be -broken by factoring the modulus \emph{n} into its two prime factors. -This is easily done by the following code: - -\begin{verbatim} -for i in range(2, n): - if (n%i)==0: - print i, 'is a factor' - break -\end{verbatim} - -However, \emph{n} is usually a few hundred bits long, so this simple -program wouldn't find a solution before the universe comes to an end. -Smarter algorithms can factor numbers more quickly, but it's still -possible to choose keys so large that they can't be broken in a -reasonable amount of time. For ElGamal and DSA, discrete logarithms are -used instead of factoring, but the principle is the same. - -Safe key sizes depend on the current state of number theory and -computer technology. At the moment, one can roughly define three -levels of security: low-security commercial, high-security commercial, -and military-grade. For RSA, these three levels correspond roughly to -768, 1024, and 2048-bit keys. - - -%====================================================================== -\section{Crypto.Util: Odds and Ends} -This chapter contains all the modules that don't fit into any of the -other chapters. - -\subsection{Crypto.Util.number} - -This module contains various number-theoretic functions. - -\begin{funcdesc}{GCD}{x,y} -Return the greatest common divisor of \var{x} and \var{y}. -\end{funcdesc} - -\begin{funcdesc}{getPrime}{N, randfunc} -Return an \var{N}-bit random prime number, using random data obtained -from the function \var{randfunc}. \var{randfunc} must take a single -integer argument, and return a string of random data of the -corresponding length; the \method{get_bytes()} method of a -\class{RandomPool} object will serve the purpose nicely, as will the -\method{read()} method of an opened file such as \file{/dev/random}. -\end{funcdesc} - -\begin{funcdesc}{getRandomNumber}{N, randfunc} -Return an \var{N}-bit random number, using random data obtained from the -function \var{randfunc}. As usual, \var{randfunc} must take a single -integer argument and return a string of random data of the -corresponding length. -\end{funcdesc} - -\begin{funcdesc}{inverse}{u, v} -Return the inverse of \var{u} modulo \var{v}. -\end{funcdesc} - -\begin{funcdesc}{isPrime}{N} -Returns true if the number \var{N} is prime, as determined by a -Rabin-Miller test. -\end{funcdesc} - - -\subsection{Crypto.Util.randpool} - -For cryptographic purposes, ordinary random number generators are -frequently insufficient, because if some of their output is known, it -is frequently possible to derive the generator's future (or past) -output. Given the generator's state at some point in time, someone -could try to derive any keys generated using it. The solution is to -use strong encryption or hashing algorithms to generate successive -data; this makes breaking the generator as difficult as breaking the -algorithms used. - -Understanding the concept of \dfn{entropy} is important for using the -random number generator properly. In the sense we'll be using it, -entropy measures the amount of randomness; the usual unit is in bits. -So, a single random bit has an entropy of 1 bit; a random byte has an -entropy of 8 bits. Now consider a one-byte field in a database containing a -person's sex, represented as a single character \samp{M} or \samp{F}. -What's the entropy of this field? Since there are only two possible -values, it's not 8 bits, but one; if you were trying to guess the value, -you wouldn't have to bother trying \samp{Q} or \samp{@}. - -Now imagine running that single byte field through a hash function that -produces 128 bits of output. Is the entropy of the resulting hash value -128 bits? No, it's still just 1 bit. The entropy is a measure of how many -possible states of the data exist. For English -text, the entropy of a five-character string is not 40 bits; it's -somewhat less, because not all combinations would be seen. \samp{Guido} -is a possible string, as is \samp{In th}; \samp{zJwvb} is not. - -The relevance to random number generation? We want enough bits of -entropy to avoid making an attack on our generator possible. An -example: One computer system had a mechanism which generated nonsense -passwords for its users. This is a good idea, since it would prevent -people from choosing their own name or some other easily guessed string. -Unfortunately, the random number generator used only had 65536 states, -which meant only 65536 different passwords would ever be generated, and -it was easy to compute all the possible passwords and try them. The -entropy of the random passwords was far too low. By the same token, if -you generate an RSA key with only 32 bits of entropy available, there -are only about 4.2 billion keys you could have generated, and an -adversary could compute them all to find your private key. See \rfc{1750}, -"Randomness Recommendations for Security", for an interesting discussion -of the issues related to random number generation. - -The \module{randpool} module implements a strong random number generator -in the \class{RandomPool} class. The internal state consists of a string -of random data, which is returned as callers request it. The class -keeps track of the number of bits of entropy left, and provides a function to -add new random data; this data can be obtained in various ways, such as -by using the variance in a user's keystroke timings. - -\begin{classdesc}{RandomPool}{\optional{numbytes, cipher, hash} } -An object of the \code{RandomPool} class can be created without -parameters if desired. \var{numbytes} sets the number of bytes of -random data in the pool, and defaults to 160 (1280 bits). \var{hash} -can be a string containing the module name of the hash function to use -in stirring the random data, or a module object supporting the hashing -interface. The default action is to use SHA. - -The \var{cipher} argument is vestigial; it was removed from version -1.1 so RandomPool would work even in the limited exportable subset of -the code. I recommend passing \var{hash} using a keyword argument so -that someday I can safely delete the \var{cipher} argument - -\end{classdesc} - -\class{RandomPool} objects define the following variables and methods: - -\begin{methoddesc}{add_event}{time\optional{, string}} -Adds an event to the random pool. \var{time} should be set to the -current system time, measured at the highest resolution available. -\var{string} can be a string of data that will be XORed into the pool, -and can be used to increase the entropy of the pool. For example, if -you're encrypting a document, you might use the hash value of the -document; an adversary presumably won't have the plaintext of the -document, and thus won't be able to use this information to break the -generator. -\end{methoddesc} - -The return value is the value of \member{self.entropy} after the data has -been added. The function works in the following manner: the time -between successive calls to the \method{add_event()} method is determined, -and the entropy of the data is guessed; the larger the time between -calls, the better. The system time is then read and added to the pool, -along with the \var{string} parameter, if present. The hope is that the -low-order bits of the time are effectively random. In an application, -it is recommended that \method{add_event()} be called as frequently as -possible, with whatever random data can be found. - -\begin{memberdesc}{bits} -A constant integer value containing the number of bits of data in -the pool, equal to the \member{bytes} attribute multiplied by 8. -\end{memberdesc} - -\begin{memberdesc}{bytes} -A constant integer value containing the number of bytes of data in -the pool. -\end{memberdesc} - -\begin{memberdesc}{entropy} -An integer value containing the number of bits of entropy currently in -the pool. The value is incremented by the \method{add_event()} method, -and decreased by the \method{get_bytes()} method. -\end{memberdesc} - -\begin{methoddesc}{get_bytes}{num} -Returns a string containing \var{num} bytes of random data, and -decrements the amount of entropy available. It is not an error to -reduce the entropy to zero, or to call this function when the entropy -is zero. This simply means that, in theory, enough random information has been -extracted to derive the state of the generator. It is the caller's -responsibility to monitor the amount of entropy remaining and decide -whether it is sufficent for secure operation. -\end{methoddesc} - -\begin{methoddesc}{stir}{} -Scrambles the random pool using the previously chosen encryption and -hash function. An adversary may attempt to learn or alter the state -of the pool in order to affect its future output; this function -destroys the existing state of the pool in a non-reversible way. It -is recommended that \method{stir()} be called before and after using -the \class{RandomPool} object. Even better, several calls to -\method{stir()} can be interleaved with calls to \method{add_event()}. -\end{methoddesc} - -The \class{PersistentRandomPool} class is a subclass of \class{RandomPool} -that adds the capability to save and load the pool from a disk file. - -\begin{classdesc}{PersistentRandomPool}{filename, \optional{numbytes, cipher, hash}} -The path given in \var{filename} will be automatically opened, and an -existing random pool read; if no such file exists, the pool will be -initialized as usual. If omitted, the filename defaults to the empty -string, which will prevent it from being saved to a file. These -arguments are identical to those for the \class{RandomPool} -constructor. -\end{classdesc} - -\begin{methoddesc}{save}{} -Opens the file named by the \member{filename} attribute, and saves the -random data into the file using the \module{pickle} module. -\end{methoddesc} - -The \class{KeyboardRandomPool} class is a subclass of -\class{PersistentRandomPool} that provides a method to obtain random -data from the keyboard: - -\begin{methoddesc}{randomize}{} -(Unix systems only) Obtain random data from the keyboard. This works -by prompting the -user to hit keys at random, and then using the keystroke timings (and -also the actual keys pressed) to add entropy to the pool. This works -similarly to PGP's random pool mechanism. -\end{methoddesc} - - -\subsection{Crypto.Util.RFC1751} -The keys for private-key algorithms should be arbitrary binary data. -Many systems err by asking the user to enter a password, and then -using the password as the key. This limits the space of possible -keys, as each key byte is constrained within the range of possible -ASCII characters, 32-127, instead of the whole 0-255 range possible -with ASCII. Unfortunately, it's difficult for humans to remember 16 -or 32 hex digits. - -One solution is to request a lengthy passphrase from the user, and -then run it through a hash function such as SHA or MD5. Another -solution is discussed in RFC 1751, "A Convention for Human-Readable -128-bit Keys", by Daniel L. McDonald. Binary keys are transformed -into a list of short English words that should be easier to remember. -For example, the hex key EB33F77EE73D4053 is transformed to "TIDE ITCH -SLOW REIN RULE MOT". - -\begin{funcdesc}{key_to_english}{key} -Accepts a string of arbitrary data \var{key}, and returns a string -containing uppercase English words separated by spaces. \var{key}'s -length must be a multiple of 8. -\end{funcdesc} - -\begin{funcdesc}{english_to_key}{string} -Accepts \var{string} containing English words, and returns a string of -binary data representing the key. Words must be separated by -whitespace, and can be any mixture of uppercase and lowercase -characters. 6 words are required for 8 bytes of key data, so -the number of words in \var{string} must be a multiple of 6. -\end{funcdesc} - - -%====================================================================== -\section{Extending the Toolkit} - -Preserving the a common interface for cryptographic routines is a good -idea. This chapter explains how to write new modules for the Toolkit. - -The basic process is as follows: -\begin{enumerate} - -\item Add a new \file{.c} file containing an implementation of the new -algorithm. -This file must define 3 or 4 standard functions, -a few constants, and a C \code{struct} encapsulating the state variables required by the algorithm. - -\item Add the new algorithm to \file{setup.py}. - -\item Send a copy of the code to me, if you like; code for new -algorithms will be gratefully accepted. -\end{enumerate} - - -\subsection{Adding Hash Algorithms} - -The required constant definitions are as follows: - -\begin{verbatim} -#define MODULE_NAME MD2 /* Name of algorithm */ -#define DIGEST_SIZE 16 /* Size of resulting digest in bytes */ -\end{verbatim} - -The C structure must be named \ctype{hash_state}: - -\begin{verbatim} -typedef struct { - ... whatever state variables you need ... -} hash_state; -\end{verbatim} - -There are four functions that need to be written: to initialize the -algorithm's state, to hash a string into the algorithm's state, to get -a digest from the current state, and to copy a state. - -\begin{itemize} - \item \code{void hash_init(hash_state *self);} - \item \code{void hash_update(hash_state *self, unsigned char *buffer, int length);} - \item \code{PyObject *hash_digest(hash_state *self);} - \item \code{void hash_copy(hash_state *source, hash_state *dest);} -\end{itemize} - -Put \code{\#include "hash_template.c"} at the end of the file to -include the actual implementation of the module. - - -\subsection{Adding Block Encryption Algorithms} - -The required constant definitions are as follows: - -\begin{verbatim} -#define MODULE_NAME AES /* Name of algorithm */ -#define BLOCK_SIZE 16 /* Size of encryption block */ -#define KEY_SIZE 0 /* Size of key in bytes (0 if not fixed size) */ -\end{verbatim} - -The C structure must be named \ctype{block_state}: - -\begin{verbatim} -typedef struct { - ... whatever state variables you need ... -} block_state; -\end{verbatim} - -There are three functions that need to be written: to initialize the -algorithm's state, and to encrypt and decrypt a single block. - -\begin{itemize} - \item \code{void block_init(block_state *self, unsigned char *key, - int keylen);} - \item \code{void block_encrypt(block_state *self, unsigned char *in, - unsigned char *out);} - \item \code{void block_decrypt(block_state *self, unsigned char *in, - unsigned char *out);} -\end{itemize} - -Put \code{\#include "block_template.c"} at the end of the file to -include the actual implementation of the module. - - -\subsection{Adding Stream Encryption Algorithms} - -The required constant definitions are as follows: - -\begin{verbatim} -#define MODULE_NAME ARC4 /* Name of algorithm */ -#define BLOCK_SIZE 1 /* Will always be 1 for a stream cipher */ -#define KEY_SIZE 0 /* Size of key in bytes (0 if not fixed size) */ -\end{verbatim} - -The C structure must be named \ctype{stream_state}: - -\begin{verbatim} -typedef struct { - ... whatever state variables you need ... -} stream_state; -\end{verbatim} - -There are three functions that need to be written: to initialize the -algorithm's state, and to encrypt and decrypt a single block. - -\begin{itemize} - \item \code{void stream_init(stream_state *self, unsigned char *key, - int keylen);} - \item \code{void stream_encrypt(stream_state *self, unsigned char *block, - int length);} - \item \code{void stream_decrypt(stream_state *self, unsigned char *block, - int length);} -\end{itemize} - -Put \code{\#include "stream_template.c"} at the end of the file to -include the actual implementation of the module. - - -\end{document} -- cgit v1.2.1