diff options
author | Georg Brandl <georg@python.org> | 2012-03-17 16:58:12 +0100 |
---|---|---|
committer | Georg Brandl <georg@python.org> | 2012-03-17 16:58:12 +0100 |
commit | ffd4830f6ab0942a463772a01a91f051bfa2d376 (patch) | |
tree | 14d5f982ed3d51197f0dd69370d3f6002aec8e1e /Doc/c-api | |
parent | 206a0cc1b0f2545eef212be6210b4bf7008815ba (diff) | |
parent | 5d6d9423be73e8c6f3f18fa3e006573fc5836236 (diff) | |
download | cpython-ffd4830f6ab0942a463772a01a91f051bfa2d376.tar.gz |
merge with 3.2
Diffstat (limited to 'Doc/c-api')
-rw-r--r-- | Doc/c-api/arg.rst | 10 | ||||
-rw-r--r-- | Doc/c-api/buffer.rst | 493 | ||||
-rw-r--r-- | Doc/c-api/datetime.rst | 25 | ||||
-rw-r--r-- | Doc/c-api/dict.rst | 7 | ||||
-rw-r--r-- | Doc/c-api/exceptions.rst | 210 | ||||
-rw-r--r-- | Doc/c-api/function.rst | 10 | ||||
-rw-r--r-- | Doc/c-api/import.rst | 50 | ||||
-rw-r--r-- | Doc/c-api/list.rst | 7 | ||||
-rw-r--r-- | Doc/c-api/long.rst | 14 | ||||
-rw-r--r-- | Doc/c-api/memoryview.rst | 31 | ||||
-rw-r--r-- | Doc/c-api/module.rst | 34 | ||||
-rw-r--r-- | Doc/c-api/object.rst | 29 | ||||
-rw-r--r-- | Doc/c-api/set.rst | 7 | ||||
-rw-r--r-- | Doc/c-api/type.rst | 4 | ||||
-rw-r--r-- | Doc/c-api/typeobj.rst | 102 | ||||
-rw-r--r-- | Doc/c-api/unicode.rst | 872 |
16 files changed, 1441 insertions, 464 deletions
diff --git a/Doc/c-api/arg.rst b/Doc/c-api/arg.rst index d4dda7c693..196aa772b8 100644 --- a/Doc/c-api/arg.rst +++ b/Doc/c-api/arg.rst @@ -146,7 +146,7 @@ Unless otherwise stated, buffers are not NUL-terminated. Like ``u#``, but the Python object may also be ``None``, in which case the :c:type:`Py_UNICODE` pointer is set to *NULL*. -``U`` (:class:`str`) [PyUnicodeObject \*] +``U`` (:class:`str`) [PyObject \*] Requires that the Python object is a Unicode object, without attempting any conversion. Raises :exc:`TypeError` if the object is not a Unicode object. The C variable may also be declared as :c:type:`PyObject\*`. @@ -260,9 +260,11 @@ Numbers ``n`` (:class:`int`) [Py_ssize_t] Convert a Python integer to a C :c:type:`Py_ssize_t`. -``c`` (:class:`bytes` of length 1) [char] - Convert a Python byte, represented as a :class:`bytes` object of length 1, - to a C :c:type:`char`. +``c`` (:class:`bytes` or :class:`bytearray` of length 1) [char] + Convert a Python byte, represented as a :class:`bytes` or + :class:`bytearray` object of length 1, to a C :c:type:`char`. + + .. versionchanged:: 3.3 Allow :class:`bytearray` objects ``C`` (:class:`str` of length 1) [int] Convert a Python character, represented as a :class:`str` object of diff --git a/Doc/c-api/buffer.rst b/Doc/c-api/buffer.rst index d98ece3e98..d636935754 100644 --- a/Doc/c-api/buffer.rst +++ b/Doc/c-api/buffer.rst @@ -7,6 +7,7 @@ Buffer Protocol .. sectionauthor:: Greg Stein <gstein@lyra.org> .. sectionauthor:: Benjamin Peterson +.. sectionauthor:: Stefan Krah .. index:: @@ -20,7 +21,7 @@ as image processing or numeric analysis. While each of these types have their own semantics, they share the common characteristic of being backed by a possibly large memory buffer. It is -then desireable, in some situations, to access that buffer directly and +then desirable, in some situations, to access that buffer directly and without intermediate copying. Python provides such a facility at the C level in the form of the *buffer @@ -60,8 +61,10 @@ isn't needed anymore. Failure to do so could lead to various issues such as resource leaks. -The buffer structure -==================== +.. _buffer-structure: + +Buffer structure +================ Buffer structures (or simply "buffers") are useful as a way to expose the binary data from another object to the Python programmer. They can also be @@ -78,249 +81,411 @@ allows them to be created and copied very simply. When a generic wrapper around a buffer is needed, a :ref:`memoryview <memoryview-objects>` object can be created. +For short instructions how to write an exporting object, see +:ref:`Buffer Object Structures <buffer-structs>`. For obtaining +a buffer, see :c:func:`PyObject_GetBuffer`. .. c:type:: Py_buffer - .. c:member:: void *buf + .. c:member:: void \*obj + + A new reference to the exporting object. The reference is owned by + the consumer and automatically decremented and set to *NULL* by + :c:func:`PyBuffer_Release`. The field is the equivalent of the return + value of any standard C-API function. + + As a special case, for *temporary* buffers that are wrapped by + :c:func:`PyMemoryView_FromBuffer` or :c:func:`PyBuffer_FillInfo` + this field is *NULL*. In general, exporting objects MUST NOT + use this scheme. - A pointer to the start of the memory for the object. + .. c:member:: void \*buf + + A pointer to the start of the logical structure described by the buffer + fields. This can be any location within the underlying physical memory + block of the exporter. For example, with negative :c:member:`~Py_buffer.strides` + the value may point to the end of the memory block. + + For contiguous arrays, the value points to the beginning of the memory + block. .. c:member:: Py_ssize_t len - :noindex: - The total length of the memory in bytes. + ``product(shape) * itemsize``. For contiguous arrays, this is the length + of the underlying memory block. For non-contiguous arrays, it is the length + that the logical structure would have if it were copied to a contiguous + representation. + + Accessing ``((char *)buf)[0] up to ((char *)buf)[len-1]`` is only valid + if the buffer has been obtained by a request that guarantees contiguity. In + most cases such a request will be :c:macro:`PyBUF_SIMPLE` or :c:macro:`PyBUF_WRITABLE`. .. c:member:: int readonly - An indicator of whether the buffer is read only. + An indicator of whether the buffer is read-only. This field is controlled + by the :c:macro:`PyBUF_WRITABLE` flag. + + .. c:member:: Py_ssize_t itemsize + + Item size in bytes of a single element. Same as the value of :func:`struct.calcsize` + called on non-NULL :c:member:`~Py_buffer.format` values. + + Important exception: If a consumer requests a buffer without the + :c:macro:`PyBUF_FORMAT` flag, :c:member:`~Py_Buffer.format` will + be set to *NULL*, but :c:member:`~Py_buffer.itemsize` still has + the value for the original format. + + If :c:member:`~Py_Buffer.shape` is present, the equality + ``product(shape) * itemsize == len`` still holds and the consumer + can use :c:member:`~Py_buffer.itemsize` to navigate the buffer. + + If :c:member:`~Py_Buffer.shape` is *NULL* as a result of a :c:macro:`PyBUF_SIMPLE` + or a :c:macro:`PyBUF_WRITABLE` request, the consumer must disregard + :c:member:`~Py_buffer.itemsize` and assume ``itemsize == 1``. - .. c:member:: const char *format - :noindex: + .. c:member:: const char \*format - A *NULL* terminated string in :mod:`struct` module style syntax giving - the contents of the elements available through the buffer. If this is - *NULL*, ``"B"`` (unsigned bytes) is assumed. + A *NUL* terminated string in :mod:`struct` module style syntax describing + the contents of a single item. If this is *NULL*, ``"B"`` (unsigned bytes) + is assumed. + + This field is controlled by the :c:macro:`PyBUF_FORMAT` flag. .. c:member:: int ndim - The number of dimensions the memory represents as a multi-dimensional - array. If it is 0, :c:data:`strides` and :c:data:`suboffsets` must be - *NULL*. - - .. c:member:: Py_ssize_t *shape - - An array of :c:type:`Py_ssize_t`\s the length of :c:data:`ndim` giving the - shape of the memory as a multi-dimensional array. Note that - ``((*shape)[0] * ... * (*shape)[ndims-1])*itemsize`` should be equal to - :c:data:`len`. - - .. c:member:: Py_ssize_t *strides - - An array of :c:type:`Py_ssize_t`\s the length of :c:data:`ndim` giving the - number of bytes to skip to get to a new element in each dimension. - - .. c:member:: Py_ssize_t *suboffsets - - An array of :c:type:`Py_ssize_t`\s the length of :c:data:`ndim`. If these - suboffset numbers are greater than or equal to 0, then the value stored - along the indicated dimension is a pointer and the suboffset value - dictates how many bytes to add to the pointer after de-referencing. A - suboffset value that it negative indicates that no de-referencing should - occur (striding in a contiguous memory block). - - Here is a function that returns a pointer to the element in an N-D array - pointed to by an N-dimensional index when there are both non-NULL strides - and suboffsets:: - - void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides, - Py_ssize_t *suboffsets, Py_ssize_t *indices) { - char *pointer = (char*)buf; - int i; - for (i = 0; i < ndim; i++) { - pointer += strides[i] * indices[i]; - if (suboffsets[i] >=0 ) { - pointer = *((char**)pointer) + suboffsets[i]; - } - } - return (void*)pointer; - } + The number of dimensions the memory represents as an n-dimensional array. + If it is 0, :c:member:`~Py_Buffer.buf` points to a single item representing + a scalar. In this case, :c:member:`~Py_buffer.shape`, :c:member:`~Py_buffer.strides` + and :c:member:`~Py_buffer.suboffsets` MUST be *NULL*. + The macro :c:macro:`PyBUF_MAX_NDIM` limits the maximum number of dimensions + to 64. Exporters MUST respect this limit, consumers of multi-dimensional + buffers SHOULD be able to handle up to :c:macro:`PyBUF_MAX_NDIM` dimensions. - .. c:member:: Py_ssize_t itemsize + .. c:member:: Py_ssize_t \*shape + + An array of :c:type:`Py_ssize_t` of length :c:member:`~Py_buffer.ndim` + indicating the shape of the memory as an n-dimensional array. Note that + ``shape[0] * ... * shape[ndim-1] * itemsize`` MUST be equal to + :c:member:`~Py_buffer.len`. + + Shape values are restricted to ``shape[n] >= 0``. The case + ``shape[n] == 0`` requires special attention. See `complex arrays`_ + for further information. + + The shape array is read-only for the consumer. + + .. c:member:: Py_ssize_t \*strides + + An array of :c:type:`Py_ssize_t` of length :c:member:`~Py_buffer.ndim` + giving the number of bytes to skip to get to a new element in each + dimension. + + Stride values can be any integer. For regular arrays, strides are + usually positive, but a consumer MUST be able to handle the case + ``strides[n] <= 0``. See `complex arrays`_ for further information. + + The strides array is read-only for the consumer. + + .. c:member:: Py_ssize_t \*suboffsets + + An array of :c:type:`Py_ssize_t` of length :c:member:`~Py_buffer.ndim`. + If ``suboffsets[n] >= 0``, the values stored along the nth dimension are + pointers and the suboffset value dictates how many bytes to add to each + pointer after de-referencing. A suboffset value that is negative + indicates that no de-referencing should occur (striding in a contiguous + memory block). - This is a storage for the itemsize (in bytes) of each element of the - shared memory. It is technically un-necessary as it can be obtained - using :c:func:`PyBuffer_SizeFromFormat`, however an exporter may know - this information without parsing the format string and it is necessary - to know the itemsize for proper interpretation of striding. Therefore, - storing it is more convenient and faster. + This type of array representation is used by the Python Imaging Library + (PIL). See `complex arrays`_ for further information how to access elements + of such an array. - .. c:member:: void *internal + The suboffsets array is read-only for the consumer. + + .. c:member:: void \*internal This is for use internally by the exporting object. For example, this might be re-cast as an integer by the exporter and used to store flags about whether or not the shape, strides, and suboffsets arrays must be - freed when the buffer is released. The consumer should never alter this + freed when the buffer is released. The consumer MUST NOT alter this value. +.. _buffer-request-types: -Buffer-related functions -======================== +Buffer request types +==================== +Buffers are usually obtained by sending a buffer request to an exporting +object via :c:func:`PyObject_GetBuffer`. Since the complexity of the logical +structure of the memory can vary drastically, the consumer uses the *flags* +argument to specify the exact buffer type it can handle. -.. c:function:: int PyObject_CheckBuffer(PyObject *obj) +All :c:data:`Py_buffer` fields are unambiguously defined by the request +type. + +request-independent fields +~~~~~~~~~~~~~~~~~~~~~~~~~~ +The following fields are not influenced by *flags* and must always be filled in +with the correct values: :c:member:`~Py_buffer.obj`, :c:member:`~Py_buffer.buf`, +:c:member:`~Py_buffer.len`, :c:member:`~Py_buffer.itemsize`, :c:member:`~Py_buffer.ndim`. - Return 1 if *obj* supports the buffer interface otherwise 0. When 1 is - returned, it doesn't guarantee that :c:func:`PyObject_GetBuffer` will - succeed. +readonly, format +~~~~~~~~~~~~~~~~ -.. c:function:: int PyObject_GetBuffer(PyObject *obj, Py_buffer *view, int flags) + .. c:macro:: PyBUF_WRITABLE - Export a view over some internal data from the target object *obj*. - *obj* must not be NULL, and *view* must point to an existing - :c:type:`Py_buffer` structure allocated by the caller (most uses of - this function will simply declare a local variable of type - :c:type:`Py_buffer`). The *flags* argument is a bit field indicating - what kind of buffer is requested. The buffer interface allows - for complicated memory layout possibilities; however, some callers - won't want to handle all the complexity and instead request a simple - view of the target object (using :c:macro:`PyBUF_SIMPLE` for a read-only - view and :c:macro:`PyBUF_WRITABLE` for a read-write view). + Controls the :c:member:`~Py_buffer.readonly` field. If set, the exporter + MUST provide a writable buffer or else report failure. Otherwise, the + exporter MAY provide either a read-only or writable buffer, but the choice + MUST be consistent for all consumers. - Some exporters may not be able to share memory in every possible way and - may need to raise errors to signal to some consumers that something is - just not possible. These errors should be a :exc:`BufferError` unless - there is another error that is actually causing the problem. The - exporter can use flags information to simplify how much of the - :c:data:`Py_buffer` structure is filled in with non-default values and/or - raise an error if the object can't support a simpler view of its memory. + .. c:macro:: PyBUF_FORMAT - On success, 0 is returned and the *view* structure is filled with useful - values. On error, -1 is returned and an exception is raised; the *view* - is left in an undefined state. + Controls the :c:member:`~Py_buffer.format` field. If set, this field MUST + be filled in correctly. Otherwise, this field MUST be *NULL*. - The following are the possible values to the *flags* arguments. - .. c:macro:: PyBUF_SIMPLE +:c:macro:`PyBUF_WRITABLE` can be \|'d to any of the flags in the next section. +Since :c:macro:`PyBUF_SIMPLE` is defined as 0, :c:macro:`PyBUF_WRITABLE` +can be used as a stand-alone flag to request a simple writable buffer. - This is the default flag. The returned buffer exposes a read-only - memory area. The format of data is assumed to be raw unsigned bytes, - without any particular structure. This is a "stand-alone" flag - constant. It never needs to be '|'d to the others. The exporter will - raise an error if it cannot provide such a contiguous buffer of bytes. +:c:macro:`PyBUF_FORMAT` can be \|'d to any of the flags except :c:macro:`PyBUF_SIMPLE`. +The latter already implies format ``B`` (unsigned bytes). - .. c:macro:: PyBUF_WRITABLE - Like :c:macro:`PyBUF_SIMPLE`, but the returned buffer is writable. If - the exporter doesn't support writable buffers, an error is raised. +shape, strides, suboffsets +~~~~~~~~~~~~~~~~~~~~~~~~~~ - .. c:macro:: PyBUF_STRIDES +The flags that control the logical structure of the memory are listed +in decreasing order of complexity. Note that each flag contains all bits +of the flags below it. - This implies :c:macro:`PyBUF_ND`. The returned buffer must provide - strides information (i.e. the strides cannot be NULL). This would be - used when the consumer can handle strided, discontiguous arrays. - Handling strides automatically assumes you can handle shape. The - exporter can raise an error if a strided representation of the data is - not possible (i.e. without the suboffsets). - .. c:macro:: PyBUF_ND ++-----------------------------+-------+---------+------------+ +| Request | shape | strides | suboffsets | ++=============================+=======+=========+============+ +| .. c:macro:: PyBUF_INDIRECT | yes | yes | if needed | ++-----------------------------+-------+---------+------------+ +| .. c:macro:: PyBUF_STRIDES | yes | yes | NULL | ++-----------------------------+-------+---------+------------+ +| .. c:macro:: PyBUF_ND | yes | NULL | NULL | ++-----------------------------+-------+---------+------------+ +| .. c:macro:: PyBUF_SIMPLE | NULL | NULL | NULL | ++-----------------------------+-------+---------+------------+ - The returned buffer must provide shape information. The memory will be - assumed C-style contiguous (last dimension varies the fastest). The - exporter may raise an error if it cannot provide this kind of - contiguous buffer. If this is not given then shape will be *NULL*. - .. c:macro:: PyBUF_C_CONTIGUOUS - PyBUF_F_CONTIGUOUS - PyBUF_ANY_CONTIGUOUS +contiguity requests +~~~~~~~~~~~~~~~~~~~ - These flags indicate that the contiguity returned buffer must be - respectively, C-contiguous (last dimension varies the fastest), Fortran - contiguous (first dimension varies the fastest) or either one. All of - these flags imply :c:macro:`PyBUF_STRIDES` and guarantee that the - strides buffer info structure will be filled in correctly. +C or Fortran contiguity can be explicitly requested, with and without stride +information. Without stride information, the buffer must be C-contiguous. - .. c:macro:: PyBUF_INDIRECT ++-----------------------------------+-------+---------+------------+--------+ +| Request | shape | strides | suboffsets | contig | ++===================================+=======+=========+============+========+ +| .. c:macro:: PyBUF_C_CONTIGUOUS | yes | yes | NULL | C | ++-----------------------------------+-------+---------+------------+--------+ +| .. c:macro:: PyBUF_F_CONTIGUOUS | yes | yes | NULL | F | ++-----------------------------------+-------+---------+------------+--------+ +| .. c:macro:: PyBUF_ANY_CONTIGUOUS | yes | yes | NULL | C or F | ++-----------------------------------+-------+---------+------------+--------+ +| .. c:macro:: PyBUF_ND | yes | NULL | NULL | C | ++-----------------------------------+-------+---------+------------+--------+ - This flag indicates the returned buffer must have suboffsets - information (which can be NULL if no suboffsets are needed). This can - be used when the consumer can handle indirect array referencing implied - by these suboffsets. This implies :c:macro:`PyBUF_STRIDES`. - .. c:macro:: PyBUF_FORMAT +compound requests +~~~~~~~~~~~~~~~~~ - The returned buffer must have true format information if this flag is - provided. This would be used when the consumer is going to be checking - for what 'kind' of data is actually stored. An exporter should always - be able to provide this information if requested. If format is not - explicitly requested then the format must be returned as *NULL* (which - means ``'B'``, or unsigned bytes). +All possible requests are fully defined by some combination of the flags in +the previous section. For convenience, the buffer protocol provides frequently +used combinations as single flags. - .. c:macro:: PyBUF_STRIDED +In the following table *U* stands for undefined contiguity. The consumer would +have to call :c:func:`PyBuffer_IsContiguous` to determine contiguity. - This is equivalent to ``(PyBUF_STRIDES | PyBUF_WRITABLE)``. - .. c:macro:: PyBUF_STRIDED_RO - This is equivalent to ``(PyBUF_STRIDES)``. ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| Request | shape | strides | suboffsets | contig | readonly | format | ++===============================+=======+=========+============+========+==========+========+ +| .. c:macro:: PyBUF_FULL | yes | yes | if needed | U | 0 | yes | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_FULL_RO | yes | yes | if needed | U | 1 or 0 | yes | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_RECORDS | yes | yes | NULL | U | 0 | yes | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_RECORDS_RO | yes | yes | NULL | U | 1 or 0 | yes | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_STRIDED | yes | yes | NULL | U | 0 | NULL | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_STRIDED_RO | yes | yes | NULL | U | 1 or 0 | NULL | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_CONTIG | yes | NULL | NULL | C | 0 | NULL | ++-------------------------------+-------+---------+------------+--------+----------+--------+ +| .. c:macro:: PyBUF_CONTIG_RO | yes | NULL | NULL | C | 1 or 0 | NULL | ++-------------------------------+-------+---------+------------+--------+----------+--------+ - .. c:macro:: PyBUF_RECORDS - This is equivalent to ``(PyBUF_STRIDES | PyBUF_FORMAT | - PyBUF_WRITABLE)``. +Complex arrays +============== - .. c:macro:: PyBUF_RECORDS_RO +NumPy-style: shape and strides +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The logical structure of NumPy-style arrays is defined by :c:member:`~Py_buffer.itemsize`, +:c:member:`~Py_buffer.ndim`, :c:member:`~Py_buffer.shape` and :c:member:`~Py_buffer.strides`. + +If ``ndim == 0``, the memory location pointed to by :c:member:`~Py_buffer.buf` is +interpreted as a scalar of size :c:member:`~Py_buffer.itemsize`. In that case, +both :c:member:`~Py_buffer.shape` and :c:member:`~Py_buffer.strides` are *NULL*. + +If :c:member:`~Py_buffer.strides` is *NULL*, the array is interpreted as +a standard n-dimensional C-array. Otherwise, the consumer must access an +n-dimensional array as follows: + + ``ptr = (char *)buf + indices[0] * strides[0] + ... + indices[n-1] * strides[n-1]`` + ``item = *((typeof(item) *)ptr);`` + + +As noted above, :c:member:`~Py_buffer.buf` can point to any location within +the actual memory block. An exporter can check the validity of a buffer with +this function: + +.. code-block:: python + + def verify_structure(memlen, itemsize, ndim, shape, strides, offset): + """Verify that the parameters represent a valid array within + the bounds of the allocated memory: + char *mem: start of the physical memory block + memlen: length of the physical memory block + offset: (char *)buf - mem + """ + if offset % itemsize: + return False + if offset < 0 or offset+itemsize > memlen: + return False + if any(v % itemsize for v in strides): + return False + + if ndim <= 0: + return ndim == 0 and not shape and not strides + if 0 in shape: + return True + + imin = sum(strides[j]*(shape[j]-1) for j in range(ndim) + if strides[j] <= 0) + imax = sum(strides[j]*(shape[j]-1) for j in range(ndim) + if strides[j] > 0) + + return 0 <= offset+imin and offset+imax+itemsize <= memlen + + +PIL-style: shape, strides and suboffsets +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In addition to the regular items, PIL-style arrays can contain pointers +that must be followed in order to get to the next element in a dimension. +For example, the regular three-dimensional C-array ``char v[2][2][3]`` can +also be viewed as an array of 2 pointers to 2 two-dimensional arrays: +``char (*v[2])[2][3]``. In suboffsets representation, those two pointers +can be embedded at the start of :c:member:`~Py_buffer.buf`, pointing +to two ``char x[2][3]`` arrays that can be located anywhere in memory. + + +Here is a function that returns a pointer to the element in an N-D array +pointed to by an N-dimensional index when there are both non-NULL strides +and suboffsets:: + + void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides, + Py_ssize_t *suboffsets, Py_ssize_t *indices) { + char *pointer = (char*)buf; + int i; + for (i = 0; i < ndim; i++) { + pointer += strides[i] * indices[i]; + if (suboffsets[i] >=0 ) { + pointer = *((char**)pointer) + suboffsets[i]; + } + } + return (void*)pointer; + } - This is equivalent to ``(PyBUF_STRIDES | PyBUF_FORMAT)``. - .. c:macro:: PyBUF_FULL +Buffer-related functions +======================== - This is equivalent to ``(PyBUF_INDIRECT | PyBUF_FORMAT | - PyBUF_WRITABLE)``. +.. c:function:: int PyObject_CheckBuffer(PyObject *obj) - .. c:macro:: PyBUF_FULL_RO + Return 1 if *obj* supports the buffer interface otherwise 0. When 1 is + returned, it doesn't guarantee that :c:func:`PyObject_GetBuffer` will + succeed. - This is equivalent to ``(PyBUF_INDIRECT | PyBUF_FORMAT)``. - .. c:macro:: PyBUF_CONTIG +.. c:function:: int PyObject_GetBuffer(PyObject *exporter, Py_buffer *view, int flags) - This is equivalent to ``(PyBUF_ND | PyBUF_WRITABLE)``. + Send a request to *exporter* to fill in *view* as specified by *flags*. + If the exporter cannot provide a buffer of the exact type, it MUST raise + :c:data:`PyExc_BufferError`, set :c:member:`view->obj` to *NULL* and + return -1. - .. c:macro:: PyBUF_CONTIG_RO + On success, fill in *view*, set :c:member:`view->obj` to a new reference + to *exporter* and return 0. In the case of chained buffer providers + that redirect requests to a single object, :c:member:`view->obj` MAY + refer to this object instead of *exporter* (See :ref:`Buffer Object Structures <buffer-structs>`). - This is equivalent to ``(PyBUF_ND)``. + Successful calls to :c:func:`PyObject_GetBuffer` must be paired with calls + to :c:func:`PyBuffer_Release`, similar to :c:func:`malloc` and :c:func:`free`. + Thus, after the consumer is done with the buffer, :c:func:`PyBuffer_Release` + must be called exactly once. .. c:function:: void PyBuffer_Release(Py_buffer *view) - Release the buffer *view*. This should be called when the buffer is no - longer being used as it may free memory from it. + Release the buffer *view* and decrement the reference count for + :c:member:`view->obj`. This function MUST be called when the buffer + is no longer being used, otherwise reference leaks may occur. + + It is an error to call this function on a buffer that was not obtained via + :c:func:`PyObject_GetBuffer`. .. c:function:: Py_ssize_t PyBuffer_SizeFromFormat(const char *) - Return the implied :c:data:`~Py_buffer.itemsize` from the struct-stype - :c:data:`~Py_buffer.format`. + Return the implied :c:data:`~Py_buffer.itemsize` from :c:data:`~Py_buffer.format`. + This function is not yet implemented. -.. c:function:: int PyBuffer_IsContiguous(Py_buffer *view, char fortran) +.. c:function:: int PyBuffer_IsContiguous(Py_buffer *view, char order) - Return 1 if the memory defined by the *view* is C-style (*fortran* is - ``'C'``) or Fortran-style (*fortran* is ``'F'``) contiguous or either one - (*fortran* is ``'A'``). Return 0 otherwise. + Return 1 if the memory defined by the *view* is C-style (*order* is + ``'C'``) or Fortran-style (*order* is ``'F'``) contiguous or either one + (*order* is ``'A'``). Return 0 otherwise. -.. c:function:: void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape, Py_ssize_t *strides, Py_ssize_t itemsize, char fortran) +.. c:function:: void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape, Py_ssize_t *strides, Py_ssize_t itemsize, char order) Fill the *strides* array with byte-strides of a contiguous (C-style if - *fortran* is ``'C'`` or Fortran-style if *fortran* is ``'F'``) array of the + *order* is ``'C'`` or Fortran-style if *order* is ``'F'``) array of the given shape with the given number of bytes per element. -.. c:function:: int PyBuffer_FillInfo(Py_buffer *view, PyObject *obj, void *buf, Py_ssize_t len, int readonly, int infoflags) +.. c:function:: int PyBuffer_FillInfo(Py_buffer *view, PyObject *exporter, void *buf, Py_ssize_t len, int readonly, int flags) + + Handle buffer requests for an exporter that wants to expose *buf* of size *len* + with writability set according to *readonly*. *buf* is interpreted as a sequence + of unsigned bytes. + + The *flags* argument indicates the request type. This function always fills in + *view* as specified by flags, unless *buf* has been designated as read-only + and :c:macro:`PyBUF_WRITABLE` is set in *flags*. + + On success, set :c:member:`view->obj` to a new reference to *exporter* and + return 0. Otherwise, raise :c:data:`PyExc_BufferError`, set + :c:member:`view->obj` to *NULL* and return -1; + + If this function is used as part of a :ref:`getbufferproc <buffer-structs>`, + *exporter* MUST be set to the exporting object. Otherwise, *exporter* MUST + be NULL. + - Fill in a buffer-info structure, *view*, correctly for an exporter that can - only share a contiguous chunk of memory of "unsigned bytes" of the given - length. Return 0 on success and -1 (with raising an error) on error. diff --git a/Doc/c-api/datetime.rst b/Doc/c-api/datetime.rst index fcd13954a3..39542bd17a 100644 --- a/Doc/c-api/datetime.rst +++ b/Doc/c-api/datetime.rst @@ -170,6 +170,31 @@ and the type is not checked: Return the microsecond, as an int from 0 through 999999. +Macros to extract fields from time delta objects. The argument must be an +instance of :c:data:`PyDateTime_Delta`, including subclasses. The argument must +not be *NULL*, and the type is not checked: + +.. c:function:: int PyDateTime_DELTA_GET_DAYS(PyDateTime_Delta *o) + + Return the number of days, as an int from -999999999 to 999999999. + + .. versionadded:: 3.3 + + +.. c:function:: int PyDateTime_DELTA_GET_SECONDS(PyDateTime_Delta *o) + + Return the number of seconds, as an int from 0 through 86399. + + .. versionadded:: 3.3 + + +.. c:function:: int PyDateTime_DELTA_GET_MICROSECOND(PyDateTime_Delta *o) + + Return the number of microseconds, as an int from 0 through 999999. + + .. versionadded:: 3.3 + + Macros for the convenience of modules implementing the DB API: .. c:function:: PyObject* PyDateTime_FromTimestamp(PyObject *args) diff --git a/Doc/c-api/dict.rst b/Doc/c-api/dict.rst index 6df84e075a..ac714a6911 100644 --- a/Doc/c-api/dict.rst +++ b/Doc/c-api/dict.rst @@ -209,3 +209,10 @@ Dictionary Objects for key, value in seq2: if override or key not in a: a[key] = value + + +.. c:function:: int PyDict_ClearFreeList() + + Clear the free list. Return the total number of freed items. + + .. versionadded:: 3.3 diff --git a/Doc/c-api/exceptions.rst b/Doc/c-api/exceptions.rst index 6f13c8035a..fd7aee7460 100644 --- a/Doc/c-api/exceptions.rst +++ b/Doc/c-api/exceptions.rst @@ -421,17 +421,24 @@ Exception Objects .. c:function:: PyObject* PyException_GetCause(PyObject *ex) - Return the cause (another exception instance set by ``raise ... from ...``) - associated with the exception as a new reference, as accessible from Python - through :attr:`__cause__`. If there is no cause associated, this returns - *NULL*. + Return the cause (either an exception instance, or :const:`None`, + set by ``raise ... from ...``) associated with the exception as a new + reference, as accessible from Python through :attr:`__cause__`. + + If there is no cause associated, this returns *NULL* (from Python + ``__cause__ is Ellipsis``). If the cause is :const:`None`, the default + exception display routines stop showing the context chain. .. c:function:: void PyException_SetCause(PyObject *ex, PyObject *ctx) Set the cause associated with the exception to *ctx*. Use *NULL* to clear - it. There is no type check to make sure that *ctx* is an exception instance. - This steals a reference to *ctx*. + it. There is no type check to make sure that *ctx* is either an exception + instance or :const:`None`. This steals a reference to *ctx*. + + If the cause is set to :const:`None` the default exception display + routines will not display this exception's context, and will not follow the + chain any further. .. _unicodeexceptions: @@ -525,7 +532,7 @@ recursion depth automatically). Marks a point where a recursive C-level call is about to be performed. - If :const:`USE_STACKCHECK` is defined, this function checks if the the OS + If :const:`USE_STACKCHECK` is defined, this function checks if the OS stack overflowed using :c:func:`PyOS_CheckStack`. In this is the case, it sets a :exc:`MemoryError` and returns a nonzero value. @@ -582,65 +589,116 @@ All standard Python exceptions are available as global variables whose names are :c:type:`PyObject\*`; they are all class objects. For completeness, here are all the variables: -+-------------------------------------+----------------------------+----------+ -| C Name | Python Name | Notes | -+=====================================+============================+==========+ -| :c:data:`PyExc_BaseException` | :exc:`BaseException` | \(1) | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_Exception` | :exc:`Exception` | \(1) | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_ArithmeticError` | :exc:`ArithmeticError` | \(1) | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_LookupError` | :exc:`LookupError` | \(1) | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_AssertionError` | :exc:`AssertionError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_AttributeError` | :exc:`AttributeError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_EOFError` | :exc:`EOFError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_EnvironmentError` | :exc:`EnvironmentError` | \(1) | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_FloatingPointError` | :exc:`FloatingPointError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_IOError` | :exc:`IOError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_ImportError` | :exc:`ImportError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_IndexError` | :exc:`IndexError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_KeyError` | :exc:`KeyError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_KeyboardInterrupt` | :exc:`KeyboardInterrupt` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_MemoryError` | :exc:`MemoryError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_NameError` | :exc:`NameError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_NotImplementedError` | :exc:`NotImplementedError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_OSError` | :exc:`OSError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_OverflowError` | :exc:`OverflowError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_ReferenceError` | :exc:`ReferenceError` | \(2) | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_RuntimeError` | :exc:`RuntimeError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_SyntaxError` | :exc:`SyntaxError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_SystemError` | :exc:`SystemError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_SystemExit` | :exc:`SystemExit` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_TypeError` | :exc:`TypeError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_ValueError` | :exc:`ValueError` | | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_WindowsError` | :exc:`WindowsError` | \(3) | -+-------------------------------------+----------------------------+----------+ -| :c:data:`PyExc_ZeroDivisionError` | :exc:`ZeroDivisionError` | | -+-------------------------------------+----------------------------+----------+ ++-----------------------------------------+---------------------------------+----------+ +| C Name | Python Name | Notes | ++=========================================+=================================+==========+ +| :c:data:`PyExc_BaseException` | :exc:`BaseException` | \(1) | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_Exception` | :exc:`Exception` | \(1) | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ArithmeticError` | :exc:`ArithmeticError` | \(1) | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_LookupError` | :exc:`LookupError` | \(1) | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_AssertionError` | :exc:`AssertionError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_AttributeError` | :exc:`AttributeError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_BlockingIOError` | :exc:`BlockingIOError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_BrokenPipeError` | :exc:`BrokenPipeError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ChildProcessError` | :exc:`ChildProcessError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ConnectionError` | :exc:`ConnectionError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ConnectionAbortedError` | :exc:`ConnectionAbortedError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ConnectionRefusedError` | :exc:`ConnectionRefusedError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ConnectionResetError` | :exc:`ConnectionResetError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_FileExistsError` | :exc:`FileExistsError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_FileNotFoundError` | :exc:`FileNotFoundError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_EOFError` | :exc:`EOFError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_FloatingPointError` | :exc:`FloatingPointError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ImportError` | :exc:`ImportError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_IndexError` | :exc:`IndexError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_InterruptedError` | :exc:`InterruptedError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_IsADirectoryError` | :exc:`IsADirectoryError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_KeyError` | :exc:`KeyError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_KeyboardInterrupt` | :exc:`KeyboardInterrupt` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_MemoryError` | :exc:`MemoryError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_NameError` | :exc:`NameError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_NotADirectoryError` | :exc:`NotADirectoryError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_NotImplementedError` | :exc:`NotImplementedError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_OSError` | :exc:`OSError` | \(1) | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_OverflowError` | :exc:`OverflowError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_PermissionError` | :exc:`PermissionError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ProcessLookupError` | :exc:`ProcessLookupError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ReferenceError` | :exc:`ReferenceError` | \(2) | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_RuntimeError` | :exc:`RuntimeError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_SyntaxError` | :exc:`SyntaxError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_SystemError` | :exc:`SystemError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_TimeoutError` | :exc:`TimeoutError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_SystemExit` | :exc:`SystemExit` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_TypeError` | :exc:`TypeError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ValueError` | :exc:`ValueError` | | ++-----------------------------------------+---------------------------------+----------+ +| :c:data:`PyExc_ZeroDivisionError` | :exc:`ZeroDivisionError` | | ++-----------------------------------------+---------------------------------+----------+ + +.. versionadded:: 3.3 + :c:data:`PyExc_BlockingIOError`, :c:data:`PyExc_BrokenPipeError`, + :c:data:`PyExc_ChildProcessError`, :c:data:`PyExc_ConnectionError`, + :c:data:`PyExc_ConnectionAbortedError`, :c:data:`PyExc_ConnectionRefusedError`, + :c:data:`PyExc_ConnectionResetError`, :c:data:`PyExc_FileExistsError`, + :c:data:`PyExc_FileNotFoundError`, :c:data:`PyExc_InterruptedError`, + :c:data:`PyExc_IsADirectoryError`, :c:data:`PyExc_NotADirectoryError`, + :c:data:`PyExc_PermissionError`, :c:data:`PyExc_ProcessLookupError` + and :c:data:`PyExc_TimeoutError` were introduced following :pep:`3151`. + + +These are compatibility aliases to :c:data:`PyExc_OSError`: + ++-------------------------------------+----------+ +| C Name | Notes | ++=====================================+==========+ +| :c:data:`PyExc_EnvironmentError` | | ++-------------------------------------+----------+ +| :c:data:`PyExc_IOError` | | ++-------------------------------------+----------+ +| :c:data:`PyExc_WindowsError` | \(3) | ++-------------------------------------+----------+ + +.. versionchanged:: 3.3 + These aliases used to be separate exception types. + .. index:: single: PyExc_BaseException @@ -649,28 +707,42 @@ the variables: single: PyExc_LookupError single: PyExc_AssertionError single: PyExc_AttributeError + single: PyExc_BlockingIOError + single: PyExc_BrokenPipeError + single: PyExc_ConnectionError + single: PyExc_ConnectionAbortedError + single: PyExc_ConnectionRefusedError + single: PyExc_ConnectionResetError single: PyExc_EOFError - single: PyExc_EnvironmentError + single: PyExc_FileExistsError + single: PyExc_FileNotFoundError single: PyExc_FloatingPointError - single: PyExc_IOError single: PyExc_ImportError single: PyExc_IndexError + single: PyExc_InterruptedError + single: PyExc_IsADirectoryError single: PyExc_KeyError single: PyExc_KeyboardInterrupt single: PyExc_MemoryError single: PyExc_NameError + single: PyExc_NotADirectoryError single: PyExc_NotImplementedError single: PyExc_OSError single: PyExc_OverflowError + single: PyExc_PermissionError + single: PyExc_ProcessLookupError single: PyExc_ReferenceError single: PyExc_RuntimeError single: PyExc_SyntaxError single: PyExc_SystemError single: PyExc_SystemExit + single: PyExc_TimeoutError single: PyExc_TypeError single: PyExc_ValueError - single: PyExc_WindowsError single: PyExc_ZeroDivisionError + single: PyExc_EnvironmentError + single: PyExc_IOError + single: PyExc_WindowsError Notes: diff --git a/Doc/c-api/function.rst b/Doc/c-api/function.rst index 31805fd0ad..ad9832233f 100644 --- a/Doc/c-api/function.rst +++ b/Doc/c-api/function.rst @@ -38,6 +38,16 @@ There are a few functions specific to Python functions. object, the argument defaults and closure are set to *NULL*. +.. c:function:: PyObject* PyFunction_NewWithQualName(PyObject *code, PyObject *globals, PyObject *qualname) + + As :c:func:`PyFunction_New`, but also allows to set the function object's + ``__qualname__`` attribute. *qualname* should be a unicode object or NULL; + if NULL, the ``__qualname__`` attribute is set to the same value as its + ``__name__`` attribute. + + .. versionadded:: 3.3 + + .. c:function:: PyObject* PyFunction_GetCode(PyObject *op) Return the code object associated with the function object *op*. diff --git a/Doc/c-api/import.rst b/Doc/c-api/import.rst index cf48363e10..b168751199 100644 --- a/Doc/c-api/import.rst +++ b/Doc/c-api/import.rst @@ -57,7 +57,7 @@ Importing Modules :c:func:`PyImport_ImportModule`. -.. c:function:: PyObject* PyImport_ImportModuleLevel(char *name, PyObject *globals, PyObject *locals, PyObject *fromlist, int level) +.. c:function:: PyObject* PyImport_ImportModuleLevelObject(PyObject *name, PyObject *globals, PyObject *locals, PyObject *fromlist, int level) Import a module. This is best described by referring to the built-in Python function :func:`__import__`, as the standard :func:`__import__` function calls @@ -68,6 +68,13 @@ Importing Modules the return value when a submodule of a package was requested is normally the top-level package, unless a non-empty *fromlist* was given. + .. versionadded:: 3.3 + + +.. c:function:: PyObject* PyImport_ImportModuleLevel(char *name, PyObject *globals, PyObject *locals, PyObject *fromlist, int level) + + Similar to :c:func:`PyImport_ImportModuleLevelObject`, but the name is an + UTF-8 encoded string instead of a Unicode object. .. c:function:: PyObject* PyImport_Import(PyObject *name) @@ -86,7 +93,7 @@ Importing Modules an exception set on failure (the module still exists in this case). -.. c:function:: PyObject* PyImport_AddModule(const char *name) +.. c:function:: PyObject* PyImport_AddModuleObject(PyObject *name) Return the module object corresponding to a module name. The *name* argument may be of the form ``package.module``. First check the modules dictionary if @@ -100,6 +107,14 @@ Importing Modules or one of its variants to import a module. Package structures implied by a dotted name for *name* are not created if not already present. + .. versionadded:: 3.3 + + +.. c:function:: PyObject* PyImport_AddModule(const char *name) + + Similar to :c:func:`PyImport_AddModuleObject`, but the name is a UTF-8 + encoded string instead of a Unicode object. + .. c:function:: PyObject* PyImport_ExecCodeModule(char *name, PyObject *co) @@ -136,14 +151,23 @@ Importing Modules See also :c:func:`PyImport_ExecCodeModuleWithPathnames`. -.. c:function:: PyObject* PyImport_ExecCodeModuleWithPathnames(char *name, PyObject *co, char *pathname, char *cpathname) +.. c:function:: PyObject* PyImport_ExecCodeModuleObject(PyObject *name, PyObject *co, PyObject *pathname, PyObject *cpathname) Like :c:func:`PyImport_ExecCodeModuleEx`, but the :attr:`__cached__` attribute of the module object is set to *cpathname* if it is non-``NULL``. Of the three functions, this is the preferred one to use. + .. versionadded:: 3.3 + + +.. c:function:: PyObject* PyImport_ExecCodeModuleWithPathnames(char *name, PyObject *co, char *pathname, char *cpathname) + + Like :c:func:`PyImport_ExecCodeModuleObject`, but *name*, *pathname* and + *cpathname* are UTF-8 encoded strings. + .. versionadded:: 3.2 + .. c:function:: long PyImport_GetMagicNumber() Return the magic number for Python bytecode files (a.k.a. :file:`.pyc` and @@ -200,7 +224,7 @@ Importing Modules For internal use only. -.. c:function:: int PyImport_ImportFrozenModule(char *name) +.. c:function:: int PyImport_ImportFrozenModuleObject(PyObject *name) Load a frozen module named *name*. Return ``1`` for success, ``0`` if the module is not found, and ``-1`` with an exception set if the initialization @@ -208,6 +232,14 @@ Importing Modules :c:func:`PyImport_ImportModule`. (Note the misnomer --- this function would reload the module if it was already imported.) + .. versionadded:: 3.3 + + +.. c:function:: int PyImport_ImportFrozenModule(char *name) + + Similar to :c:func:`PyImport_ImportFrozenModuleObject`, but the name is a + UTF-8 encoded string instead of a Unicode object. + .. c:type:: struct _frozen @@ -247,13 +279,13 @@ Importing Modules Structure describing a single entry in the list of built-in modules. Each of these structures gives the name and initialization function for a module built - into the interpreter. Programs which embed Python may use an array of these - structures in conjunction with :c:func:`PyImport_ExtendInittab` to provide - additional built-in modules. The structure is defined in - :file:`Include/import.h` as:: + into the interpreter. The name is an ASCII encoded string. Programs which + embed Python may use an array of these structures in conjunction with + :c:func:`PyImport_ExtendInittab` to provide additional built-in modules. + The structure is defined in :file:`Include/import.h` as:: struct _inittab { - char *name; + char *name; /* ASCII encoded string */ PyObject* (*initfunc)(void); }; diff --git a/Doc/c-api/list.rst b/Doc/c-api/list.rst index feb9015b6d..5b263a7b1c 100644 --- a/Doc/c-api/list.rst +++ b/Doc/c-api/list.rst @@ -142,3 +142,10 @@ List Objects Return a new tuple object containing the contents of *list*; equivalent to ``tuple(list)``. + + +.. c:function:: int PyList_ClearFreeList() + + Clear the free list. Return the total number of freed items. + + .. versionadded:: 3.3 diff --git a/Doc/c-api/long.rst b/Doc/c-api/long.rst index b2295e0fad..4c295fab49 100644 --- a/Doc/c-api/long.rst +++ b/Doc/c-api/long.rst @@ -100,6 +100,20 @@ All integers are implemented as "long" integer objects of arbitrary size. string is first encoded to a byte string using :c:func:`PyUnicode_EncodeDecimal` and then converted using :c:func:`PyLong_FromString`. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyLong_FromUnicodeObject`. + + +.. c:function:: PyObject* PyLong_FromUnicodeObject(PyObject *u, int base) + + Convert a sequence of Unicode digits in the string *u* to a Python integer + value. The Unicode string is first encoded to a byte string using + :c:func:`PyUnicode_EncodeDecimal` and then converted using + :c:func:`PyLong_FromString`. + + .. versionadded:: 3.3 + .. c:function:: PyObject* PyLong_FromVoidPtr(void *p) diff --git a/Doc/c-api/memoryview.rst b/Doc/c-api/memoryview.rst index 6b49cdf70a..5e50977cbe 100644 --- a/Doc/c-api/memoryview.rst +++ b/Doc/c-api/memoryview.rst @@ -17,16 +17,21 @@ any other object. Create a memoryview object from an object that provides the buffer interface. If *obj* supports writable buffer exports, the memoryview object will be - readable and writable, otherwise it will be read-only. + read/write, otherwise it may be either read-only or read/write at the + discretion of the exporter. +.. c:function:: PyObject *PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags) + + Create a memoryview object using *mem* as the underlying buffer. + *flags* can be one of :c:macro:`PyBUF_READ` or :c:macro:`PyBUF_WRITE`. + + .. versionadded:: 3.3 .. c:function:: PyObject *PyMemoryView_FromBuffer(Py_buffer *view) Create a memoryview object wrapping the given buffer structure *view*. - The memoryview object then owns the buffer represented by *view*, which - means you shouldn't try to call :c:func:`PyBuffer_Release` yourself: it - will be done on deallocation of the memoryview object. - + For simple byte buffers, :c:func:`PyMemoryView_FromMemory` is the preferred + function. .. c:function:: PyObject *PyMemoryView_GetContiguous(PyObject *obj, int buffertype, char order) @@ -43,10 +48,16 @@ any other object. currently allowed to create subclasses of :class:`memoryview`. -.. c:function:: Py_buffer *PyMemoryView_GET_BUFFER(PyObject *obj) +.. c:function:: Py_buffer *PyMemoryView_GET_BUFFER(PyObject *mview) + + Return a pointer to the memoryview's private copy of the exporter's buffer. + *mview* **must** be a memoryview instance; this macro doesn't check its type, + you must do it yourself or you will risk crashes. + +.. c:function:: Py_buffer *PyMemoryView_GET_BASE(PyObject *mview) - Return a pointer to the buffer structure wrapped by the given - memoryview object. The object **must** be a memoryview instance; - this macro doesn't check its type, you must do it yourself or you - will risk crashes. + Return either a pointer to the exporting object that the memoryview is based + on or *NULL* if the memoryview has been created by one of the functions + :c:func:`PyMemoryView_FromMemory` or :c:func:`PyMemoryView_FromBuffer`. + *mview* **must** be a memoryview instance. diff --git a/Doc/c-api/module.rst b/Doc/c-api/module.rst index ffd68e3084..32587be4c9 100644 --- a/Doc/c-api/module.rst +++ b/Doc/c-api/module.rst @@ -29,7 +29,7 @@ There are only a few functions special to module objects. :c:data:`PyModule_Type`. -.. c:function:: PyObject* PyModule_New(const char *name) +.. c:function:: PyObject* PyModule_NewObject(PyObject *name) .. index:: single: __name__ (module attribute) @@ -40,6 +40,14 @@ There are only a few functions special to module objects. Only the module's :attr:`__doc__` and :attr:`__name__` attributes are filled in; the caller is responsible for providing a :attr:`__file__` attribute. + .. versionadded:: 3.3 + + +.. c:function:: PyObject* PyModule_New(const char *name) + + Similar to :c:func:`PyImport_NewObject`, but the name is an UTF-8 encoded + string instead of a Unicode object. + .. c:function:: PyObject* PyModule_GetDict(PyObject *module) @@ -52,7 +60,7 @@ There are only a few functions special to module objects. manipulate a module's :attr:`__dict__`. -.. c:function:: char* PyModule_GetName(PyObject *module) +.. c:function:: PyObject* PyModule_GetNameObject(PyObject *module) .. index:: single: __name__ (module attribute) @@ -61,15 +69,13 @@ There are only a few functions special to module objects. Return *module*'s :attr:`__name__` value. If the module does not provide one, or if it is not a string, :exc:`SystemError` is raised and *NULL* is returned. + .. versionadded:: 3.3 -.. c:function:: char* PyModule_GetFilename(PyObject *module) - Similar to :c:func:`PyModule_GetFilenameObject` but return the filename - encoded to 'utf-8'. +.. c:function:: char* PyModule_GetName(PyObject *module) - .. deprecated:: 3.2 - :c:func:`PyModule_GetFilename` raises :c:type:`UnicodeEncodeError` on - unencodable filenames, use :c:func:`PyModule_GetFilenameObject` instead. + Similar to :c:func:`PyModule_GetNameObject` but return the name encoded to + ``'utf-8'``. .. c:function:: PyObject* PyModule_GetFilenameObject(PyObject *module) @@ -81,11 +87,21 @@ There are only a few functions special to module objects. Return the name of the file from which *module* was loaded using *module*'s :attr:`__file__` attribute. If this is not defined, or if it is not a unicode string, raise :exc:`SystemError` and return *NULL*; otherwise return - a reference to a :c:type:`PyUnicodeObject`. + a reference to a Unicode object. .. versionadded:: 3.2 +.. c:function:: char* PyModule_GetFilename(PyObject *module) + + Similar to :c:func:`PyModule_GetFilenameObject` but return the filename + encoded to 'utf-8'. + + .. deprecated:: 3.2 + :c:func:`PyModule_GetFilename` raises :c:type:`UnicodeEncodeError` on + unencodable filenames, use :c:func:`PyModule_GetFilenameObject` instead. + + .. c:function:: void* PyModule_GetState(PyObject *module) Return the "state" of the module, that is, a pointer to the block of memory diff --git a/Doc/c-api/object.rst b/Doc/c-api/object.rst index d0d45ad001..d895547f1f 100644 --- a/Doc/c-api/object.rst +++ b/Doc/c-api/object.rst @@ -6,6 +6,19 @@ Object Protocol =============== +.. c:var:: PyObject* Py_NotImplemented + + The ``NotImplemented`` singleton, used to signal that an operation is + not implemented for the given type combination. + + +.. c:macro:: Py_RETURN_NOTIMPLEMENTED + + Properly handle returning :c:data:`Py_NotImplemented` from within a C + function (that is, increment the reference count of NotImplemented and + return it). + + .. c:function:: int PyObject_Print(PyObject *o, FILE *fp, int flags) Print an object *o*, on file *fp*. Returns ``-1`` on error. The flags argument @@ -88,6 +101,22 @@ Object Protocol This is the equivalent of the Python statement ``del o.attr_name``. +.. c:function:: PyObject* PyType_GenericGetDict(PyObject *o, void *context) + + A generic implementation for the getter of a ``__dict__`` descriptor. It + creates the dictionary if necessary. + + .. versionadded:: 3.3 + + +.. c:function:: int PyType_GenericSetDict(PyObject *o, void *context) + + A generic implementation for the setter of a ``__dict__`` descriptor. This + implementation does not allow the dictionary to be deleted. + + .. versionadded:: 3.3 + + .. c:function:: PyObject* PyObject_RichCompare(PyObject *o1, PyObject *o2, int opid) Compare the values of *o1* and *o2* using the operation specified by *opid*, diff --git a/Doc/c-api/set.rst b/Doc/c-api/set.rst index 66b47c4f71..5f0ef90869 100644 --- a/Doc/c-api/set.rst +++ b/Doc/c-api/set.rst @@ -157,3 +157,10 @@ subtypes but not for instances of :class:`frozenset` or its subtypes. .. c:function:: int PySet_Clear(PyObject *set) Empty an existing set of all elements. + + +.. c:function:: int PySet_ClearFreeList() + + Clear the free list. Return the total number of freed items. + + .. versionadded:: 3.3 diff --git a/Doc/c-api/type.rst b/Doc/c-api/type.rst index b3386eaa8b..cfd0d78809 100644 --- a/Doc/c-api/type.rst +++ b/Doc/c-api/type.rst @@ -75,8 +75,8 @@ Type Objects .. c:function:: PyObject* PyType_GenericNew(PyTypeObject *type, PyObject *args, PyObject *kwds) - XXX: Document. - + Generic handler for the :attr:`tp_new` slot of a type object. Initialize + all instance variables to *NULL*. .. c:function:: int PyType_Ready(PyTypeObject *type) diff --git a/Doc/c-api/typeobj.rst b/Doc/c-api/typeobj.rst index 68ca9adaa9..ea1a0ad971 100644 --- a/Doc/c-api/typeobj.rst +++ b/Doc/c-api/typeobj.rst @@ -1198,46 +1198,88 @@ Buffer Object Structures .. sectionauthor:: Greg J. Stein <greg@lyra.org> .. sectionauthor:: Benjamin Peterson +.. sectionauthor:: Stefan Krah +.. c:type:: PyBufferProcs -The :ref:`buffer interface <bufferobjects>` exports a model where an object can expose its internal -data. + This structure holds pointers to the functions required by the + :ref:`Buffer protocol <bufferobjects>`. The protocol defines how + an exporter object can expose its internal data to consumer objects. -If an object does not export the buffer interface, then its :attr:`tp_as_buffer` -member in the :c:type:`PyTypeObject` structure should be *NULL*. Otherwise, the -:attr:`tp_as_buffer` will point to a :c:type:`PyBufferProcs` structure. +.. c:member:: getbufferproc PyBufferProcs.bf_getbuffer + The signature of this function is:: -.. c:type:: PyBufferProcs + int (PyObject *exporter, Py_buffer *view, int flags); + + Handle a request to *exporter* to fill in *view* as specified by *flags*. + Except for point (3), an implementation of this function MUST take these + steps: + + (1) Check if the request can be met. If not, raise :c:data:`PyExc_BufferError`, + set :c:data:`view->obj` to *NULL* and return -1. + + (2) Fill in the requested fields. + + (3) Increment an internal counter for the number of exports. + + (4) Set :c:data:`view->obj` to *exporter* and increment :c:data:`view->obj`. + + (5) Return 0. + + If *exporter* is part of a chain or tree of buffer providers, two main + schemes can be used: + + * Re-export: Each member of the tree acts as the exporting object and + sets :c:data:`view->obj` to a new reference to itself. + + * Redirect: The buffer request is redirected to the root object of the + tree. Here, :c:data:`view->obj` will be a new reference to the root + object. + + The individual fields of *view* are described in section + :ref:`Buffer structure <buffer-structure>`, the rules how an exporter + must react to specific requests are in section + :ref:`Buffer request types <buffer-request-types>`. + + All memory pointed to in the :c:type:`Py_buffer` structure belongs to + the exporter and must remain valid until there are no consumers left. + :c:member:`~Py_buffer.format`, :c:member:`~Py_buffer.shape`, + :c:member:`~Py_buffer.strides`, :c:member:`~Py_buffer.suboffsets` + and :c:member:`~Py_buffer.internal` + are read-only for the consumer. + + :c:func:`PyBuffer_FillInfo` provides an easy way of exposing a simple + bytes buffer while dealing correctly with all request types. + + :c:func:`PyObject_GetBuffer` is the interface for the consumer that + wraps this function. + +.. c:member:: releasebufferproc PyBufferProcs.bf_releasebuffer + + The signature of this function is:: + + void (PyObject *exporter, Py_buffer *view); - Structure used to hold the function pointers which define an implementation of - the buffer protocol. + Handle a request to release the resources of the buffer. If no resources + need to be released, :c:member:`PyBufferProcs.bf_releasebuffer` may be + *NULL*. Otherwise, a standard implementation of this function will take + these optional steps: - .. c:member:: getbufferproc bf_getbuffer + (1) Decrement an internal counter for the number of exports. - This should fill a :c:type:`Py_buffer` with the necessary data for - exporting the type. The signature of :data:`getbufferproc` is ``int - (PyObject *obj, Py_buffer *view, int flags)``. *obj* is the object to - export, *view* is the :c:type:`Py_buffer` struct to fill, and *flags* gives - the conditions the caller wants the memory under. (See - :c:func:`PyObject_GetBuffer` for all flags.) :c:member:`bf_getbuffer` is - responsible for filling *view* with the appropriate information. - (:c:func:`PyBuffer_FillView` can be used in simple cases.) See - :c:type:`Py_buffer`\s docs for what needs to be filled in. + (2) If the counter is 0, free all memory associated with *view*. + The exporter MUST use the :c:member:`~Py_buffer.internal` field to keep + track of buffer-specific resources. This field is guaranteed to remain + constant, while a consumer MAY pass a copy of the original buffer as the + *view* argument. - .. c:member:: releasebufferproc bf_releasebuffer - This should release the resources of the buffer. The signature of - :c:data:`releasebufferproc` is ``void (PyObject *obj, Py_buffer *view)``. - If the :c:data:`bf_releasebuffer` function is not provided (i.e. it is - *NULL*), then it does not ever need to be called. + This function MUST NOT decrement :c:data:`view->obj`, since that is + done automatically in :c:func:`PyBuffer_Release` (this scheme is + useful for breaking reference cycles). - The exporter of the buffer interface must make sure that any memory - pointed to in the :c:type:`Py_buffer` structure remains valid until - releasebuffer is called. Exporters will need to define a - :c:data:`bf_releasebuffer` function if they can re-allocate their memory, - strides, shape, suboffsets, or format variables which they might share - through the struct bufferinfo. - See :c:func:`PyBuffer_Release`. + :c:func:`PyBuffer_Release` is the interface for the consumer that + wraps this function. diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst index 35006547c6..a74a73d350 100644 --- a/Doc/c-api/unicode.rst +++ b/Doc/c-api/unicode.rst @@ -6,38 +6,72 @@ Unicode Objects and Codecs -------------------------- .. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com> +.. sectionauthor:: Georg Brandl <georg@python.org> Unicode Objects ^^^^^^^^^^^^^^^ +Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally +use a variety of representations, in order to allow handling the complete range +of Unicode characters while staying memory efficient. There are special cases +for strings where all code points are below 128, 256, or 65536; otherwise, code +points must be below 1114112 (which is the full Unicode range). + +:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached +in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated +and inefficient; it should be avoided in performance- or memory-sensitive +situations. + +Due to the transition between the old APIs and the new APIs, unicode objects +can internally be in two states depending on how they were created: + +* "canonical" unicode objects are all objects created by a non-deprecated + unicode API. They use the most efficient representation allowed by the + implementation. + +* "legacy" unicode objects have been created through one of the deprecated + APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the + :c:type:`Py_UNICODE*` representation; you will have to call + :c:func:`PyUnicode_READY` on them before calling any other API. + + Unicode Type """""""""""" These are the basic Unicode object types used for the Unicode implementation in Python: +.. c:type:: Py_UCS4 + Py_UCS2 + Py_UCS1 + + These types are typedefs for unsigned integer types wide enough to contain + characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with + single Unicode characters, use :c:type:`Py_UCS4`. + + .. versionadded:: 3.3 + .. c:type:: Py_UNICODE - This type represents the storage type which is used by Python internally as - basis for holding Unicode ordinals. Python's default builds use a 16-bit type - for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also - possible to build a UCS4 version of Python (most recent Linux distributions come - with UCS4 builds of Python). These builds then use a 32-bit type for - :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms - where :c:type:`wchar_t` is available and compatible with the chosen Python - Unicode build variant, :c:type:`Py_UNICODE` is a typedef alias for - :c:type:`wchar_t` to enhance native platform compatibility. On all other - platforms, :c:type:`Py_UNICODE` is a typedef alias for either :c:type:`unsigned - short` (UCS2) or :c:type:`unsigned long` (UCS4). + This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type + depending on the platform. -Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep -this in mind when writing extensions or interfaces. + .. versionchanged:: 3.3 + In previous versions, this was a 16-bit type or a 32-bit type depending on + whether you selected a "narrow" or "wide" Unicode version of Python at + build time. -.. c:type:: PyUnicodeObject +.. c:type:: PyASCIIObject + PyCompactUnicodeObject + PyUnicodeObject - This subtype of :c:type:`PyObject` represents a Python Unicode object. + These subtypes of :c:type:`PyObject` represent a Python Unicode object. In + almost all cases, they shouldn't be used directly, since all API functions + that deal with Unicode objects take and return :c:type:`PyObject` pointers. + + .. versionadded:: 3.3 .. c:var:: PyTypeObject PyUnicode_Type @@ -45,10 +79,10 @@ this in mind when writing extensions or interfaces. This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It is exposed to Python code as ``str``. + The following APIs are really C macros and can be used to do fast checks and to access internal read-only data of Unicode objects: - .. c:function:: int PyUnicode_Check(PyObject *o) Return true if the object *o* is a Unicode object or an instance of a Unicode @@ -61,28 +95,106 @@ access internal read-only data of Unicode objects: subtype. -.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o) +.. c:function:: int PyUnicode_READY(PyObject *o) - Return the size of the object. *o* has to be a :c:type:`PyUnicodeObject` (not - checked). + Ensure the string object *o* is in the "canonical" representation. This is + required before using any of the access macros described below. + .. XXX expand on when it is not required -.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o) + Returns 0 on success and -1 with an exception set on failure, which in + particular happens if memory allocation fails. - Return the size of the object's internal buffer in bytes. *o* has to be a - :c:type:`PyUnicodeObject` (not checked). + .. versionadded:: 3.3 -.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o) +.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o) + + Return the length of the Unicode string, in code points. *o* has to be a + Unicode object in the "canonical" representation (not checked). + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o) + Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o) + Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o) + + Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4 + integer types for direct character access. No checks are performed if the + canonical representation has the correct character size; use + :c:func:`PyUnicode_KIND` to select the right macro. Make sure + :c:func:`PyUnicode_READY` has been called before accessing this. + + .. versionadded:: 3.3 + + +.. c:macro:: PyUnicode_WCHAR_KIND + PyUnicode_1BYTE_KIND + PyUnicode_2BYTE_KIND + PyUnicode_4BYTE_KIND + + Return values of the :c:func:`PyUnicode_KIND` macro. + + .. versionadded:: 3.3 + + +.. c:function:: int PyUnicode_KIND(PyObject *o) + + Return one of the PyUnicode kind constants (see above) that indicate how many + bytes per character this Unicode object uses to store its data. *o* has to + be a Unicode object in the "canonical" representation (not checked). + + .. XXX document "0" return value? + + .. versionadded:: 3.3 + + +.. c:function:: void* PyUnicode_DATA(PyObject *o) + + Return a void pointer to the raw unicode buffer. *o* has to be a Unicode + object in the "canonical" representation (not checked). + + .. versionadded:: 3.3 + + +.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \ + Py_UCS4 value) + + Write into a canonical representation *data* (as obtained with + :c:func:`PyUnicode_DATA`). This macro does not do any sanity checks and is + intended for usage in loops. The caller should cache the *kind* value and + *data* pointer as obtained from other macro calls. *index* is the index in + the string (starts at 0) and *value* is the new code point value which should + be written to that location. + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index) + + Read a code point from a canonical representation *data* (as obtained with + :c:func:`PyUnicode_DATA`). No checks or ready calls are performed. + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index) + + Read a character from a Unicode object *o*, which must be in the "canonical" + representation. This is less efficient than :c:func:`PyUnicode_READ` if you + do multiple consecutive reads. + + .. versionadded:: 3.3 - Return a pointer to the internal :c:type:`Py_UNICODE` buffer of the object. *o* - has to be a :c:type:`PyUnicodeObject` (not checked). +.. c:function:: PyUnicode_MAX_CHAR_VALUE(PyObject *o) -.. c:function:: const char* PyUnicode_AS_DATA(PyObject *o) + Return the maximum code point that is suitable for creating another string + based on *o*, which must be in the "canonical" representation. This is + always an approximation but more efficient than iterating over the string. - Return a pointer to the internal buffer of the object. *o* has to be a - :c:type:`PyUnicodeObject` (not checked). + .. versionadded:: 3.3 .. c:function:: int PyUnicode_ClearFreeList() @@ -90,6 +202,46 @@ access internal read-only data of Unicode objects: Clear the free list. Return the total number of freed items. +.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o) + + Return the size of the deprecated :c:type:`Py_UNICODE` representation, in + code units (this includes surrogate pairs as 2 units). *o* has to be a + Unicode object (not checked). + + .. deprecated-removed:: 3.3 4.0 + Part of the old-style Unicode API, please migrate to using + :c:func:`PyUnicode_GET_LENGTH`. + + +.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o) + + Return the size of the deprecated :c:type:`Py_UNICODE` representation in + bytes. *o* has to be a Unicode object (not checked). + + .. deprecated-removed:: 3.3 4.0 + Part of the old-style Unicode API, please migrate to using + :c:func:`PyUnicode_GET_LENGTH`. + + +.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o) + const char* PyUnicode_AS_DATA(PyObject *o) + + Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The + ``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be + a Unicode object (not checked). + + .. versionchanged:: 3.3 + This macro is now inefficient -- because in many cases the + :c:type:`Py_UNICODE` representation does not exist and needs to be created + -- and can fail (return *NULL* with an exception set). Try to port the + code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use + :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`. + + .. deprecated-removed:: 3.3 4.0 + Part of the old-style Unicode API, please migrate to using the + :c:func:`PyUnicode_nBYTE_DATA` family of macros. + + Unicode Character Properties """""""""""""""""""""""""""" @@ -166,16 +318,25 @@ These APIs can be used for fast direct character conversions: Return the character *ch* converted to lower case. + .. deprecated:: 3.3 + This function uses simple case mappings. + .. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch) Return the character *ch* converted to upper case. + .. deprecated:: 3.3 + This function uses simple case mappings. + .. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch) Return the character *ch* converted to title case. + .. deprecated:: 3.3 + This function uses simple case mappings. + .. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch) @@ -195,31 +356,66 @@ These APIs can be used for fast direct character conversions: possible. This macro does not raise exceptions. -Plain Py_UNICODE -"""""""""""""""" +These APIs can be used to work with surrogates: + +.. c:macro:: Py_UNICODE_IS_SURROGATE(ch) + + Check if *ch* is a surrogate (``0xD800 <= ch <= 0xDFFF``). + +.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch) + + Check if *ch* is an high surrogate (``0xD800 <= ch <= 0xDBFF``). + +.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch) + + Check if *ch* is a low surrogate (``0xDC00 <= ch <= 0xDFFF``). + +.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low) + + Join two surrogate characters and return a single Py_UCS4 value. + *high* and *low* are respectively the leading and trailing surrogates in a + surrogate pair. + + +Creating and accessing Unicode strings +"""""""""""""""""""""""""""""""""""""" To create Unicode objects and access their basic sequence properties, use these APIs: +.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar) -.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size) + Create a new Unicode object. *maxchar* should be the true maximum code point + to be placed in the string. As an approximation, it can be rounded up to the + nearest value in the sequence 127, 255, 65535, 1114111. - Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u* - may be *NULL* which causes the contents to be undefined. It is the user's - responsibility to fill in the needed data. The buffer is copied into the new - object. If the buffer is not *NULL*, the return value might be a shared object. - Therefore, modification of the resulting Unicode object is only allowed when *u* - is *NULL*. + This is the recommended way to allocate a new Unicode object. Objects + created using this function are not resizable. + + .. versionadded:: 3.3 + + +.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \ + Py_ssize_t size) + + Create a new Unicode object with the given *kind* (possible values are + :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by + :c:func:`PyUnicode_KIND`). The *buffer* must point to an array of *size* + units of 1, 2 or 4 bytes per character, as given by the kind. + + .. versionadded:: 3.3 .. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size) - Create a Unicode object from the char buffer *u*. The bytes will be interpreted - as being UTF-8 encoded. *u* may also be *NULL* which - causes the contents to be undefined. It is the user's responsibility to fill in - the needed data. The buffer is copied into the new object. If the buffer is not - *NULL*, the return value might be a shared object. Therefore, modification of - the resulting Unicode object is only allowed when *u* is *NULL*. + Create a Unicode object from the char buffer *u*. The bytes will be + interpreted as being UTF-8 encoded. The buffer is copied into the new + object. If the buffer is not *NULL*, the return value might be a shared + object, i.e. modification of the data is not allowed. + + If *u* is *NULL*, this function behaves like :c:func:`PyUnicode_FromUnicode` + with the buffer set to *NULL*. This usage is deprecated in favor of + :c:func:`PyUnicode_New`. .. c:function:: PyObject *PyUnicode_FromString(const char *u) @@ -260,18 +456,27 @@ APIs: | :attr:`%ld` | long | Exactly equivalent to | | | | ``printf("%ld")``. | +-------------------+---------------------+--------------------------------+ + | :attr:`%li` | long | Exactly equivalent to | + | | | ``printf("%li")``. | + +-------------------+---------------------+--------------------------------+ | :attr:`%lu` | unsigned long | Exactly equivalent to | | | | ``printf("%lu")``. | +-------------------+---------------------+--------------------------------+ | :attr:`%lld` | long long | Exactly equivalent to | | | | ``printf("%lld")``. | +-------------------+---------------------+--------------------------------+ + | :attr:`%lli` | long long | Exactly equivalent to | + | | | ``printf("%lli")``. | + +-------------------+---------------------+--------------------------------+ | :attr:`%llu` | unsigned long long | Exactly equivalent to | | | | ``printf("%llu")``. | +-------------------+---------------------+--------------------------------+ | :attr:`%zd` | Py_ssize_t | Exactly equivalent to | | | | ``printf("%zd")``. | +-------------------+---------------------+--------------------------------+ + | :attr:`%zi` | Py_ssize_t | Exactly equivalent to | + | | | ``printf("%zi")``. | + +-------------------+---------------------+--------------------------------+ | :attr:`%zu` | size_t | Exactly equivalent to | | | | ``printf("%zu")``. | +-------------------+---------------------+--------------------------------+ @@ -322,27 +527,178 @@ APIs: .. versionchanged:: 3.2 Support for ``"%lld"`` and ``"%llu"`` added. + .. versionchanged:: 3.3 + Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added. + .. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs) Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two arguments. + +.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \ + const char *encoding, const char *errors) + + Coerce an encoded object *obj* to an Unicode object and return a reference with + incremented refcount. + + :class:`bytes`, :class:`bytearray` and other char buffer compatible objects + are decoded according to the given *encoding* and using the error handling + defined by *errors*. Both can be *NULL* to have the interface use the default + values (see the next section for details). + + All other objects, including Unicode objects, cause a :exc:`TypeError` to be + set. + + The API returns *NULL* if there was an error. The caller is responsible for + decref'ing the returned objects. + + +.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode) + + Return the length of the Unicode object, in code points. + + .. versionadded:: 3.3 + + +.. c:function:: int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, \ + PyObject *to, Py_ssize_t from_start, Py_ssize_t how_many) + + Copy characters from one Unicode object into another. This function performs + character conversion when necessary and falls back to :c:func:`memcpy` if + possible. Returns ``-1`` and sets an exception on error, otherwise returns + ``0``. + + .. versionadded:: 3.3 + + +.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \ + Py_ssize_t length, Py_UCS4 fill_char) + + Fill a string with a character: write *fill_char* into + ``unicode[start:start+length]``. + + Fail if *fill_char* is bigger than the string maximum character, or if the + string has more than 1 reference. + + Return the number of written character, or return ``-1`` and raise an + exception on error. + + .. versionadded:: 3.3 + + +.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \ + Py_UCS4 character) + + Write a character to a string. The string must have been created through + :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable, + the string must not be shared, or have been hashed yet. + + This function checks that *unicode* is a Unicode object, that the index is + not out of bounds, and that the object can be modified safely (i.e. that it + its reference count is one), in contrast to the macro version + :c:func:`PyUnicode_WRITE_CHAR`. + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index) + + Read a character from a string. This function checks that *unicode* is a + Unicode object and the index is not out of bounds, in contrast to the macro + version :c:func:`PyUnicode_READ_CHAR`. + + .. versionadded:: 3.3 + + +.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \ + Py_ssize_t end) + + Return a substring of *str*, from character index *start* (included) to + character index *end* (excluded). Negative indices are not supported. + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \ + Py_ssize_t buflen, int copy_null) + + Copy the string *u* into a UCS4 buffer, including a null character, if + *copy_null* is set. Returns *NULL* and sets an exception on error (in + particular, a :exc:`ValueError` if *buflen* is smaller than the length of + *u*). *buffer* is returned on success. + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u) + + Copy the string *u* into a new UCS4 buffer that is allocated using + :c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a + :exc:`MemoryError` set. + + .. versionadded:: 3.3 + + +Deprecated Py_UNICODE APIs +"""""""""""""""""""""""""" + +.. deprecated-removed:: 3.3 4.0 + +These API functions are deprecated with the implementation of :pep:`393`. +Extension modules can continue using them, as they will not be removed in Python +3.x, but need to be aware that their use can now cause performance and memory hits. + + +.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size) + + Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u* + may be *NULL* which causes the contents to be undefined. It is the user's + responsibility to fill in the needed data. The buffer is copied into the new + object. + + If the buffer is not *NULL*, the return value might be a shared object. + Therefore, modification of the resulting Unicode object is only allowed when + *u* is *NULL*. + + If the buffer is *NULL*, :c:func:`PyUnicode_READY` must be called once the + string content has been filled before using any of the access macros such as + :c:func:`PyUnicode_KIND`. + + Please migrate to using :c:func:`PyUnicode_FromKindAndData` or + :c:func:`PyUnicode_New`. + + +.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode) + + Return a read-only pointer to the Unicode object's internal + :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the + :c:type:`Py_UNICODE*` representation of the object if it is not yet + available. Note that the resulting :c:type:`Py_UNICODE` string may contain + embedded null characters, which would cause the string to be truncated when + used in most C functions. + + Please migrate to using :c:func:`PyUnicode_AsUCS4`, + :c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new + APIs. + + .. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size) Create a Unicode object by replacing all decimal digits in :c:type:`Py_UNICODE` buffer of the given *size* by ASCII digits 0--9 - according to their decimal value. Return *NULL* if an exception - occurs. + according to their decimal value. Return *NULL* if an exception occurs. -.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode) +.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size) - Return a read-only pointer to the Unicode object's internal - :c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object. - Note that the resulting :c:type:`Py_UNICODE*` string may contain embedded - null characters, which would cause the string to be truncated when used in - most C functions. + Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE` + array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string + may contain embedded null characters, which would cause the string to be + truncated when used in most C functions. + + .. versionadded:: 3.3 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode) @@ -350,44 +706,76 @@ APIs: Create a copy of a Unicode string ending with a nul character. Return *NULL* and raise a :exc:`MemoryError` exception on memory allocation failure, otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free - the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may contain - embedded null characters, which would cause the string to be truncated when - used in most C functions. + the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may + contain embedded null characters, which would cause the string to be + truncated when used in most C functions. .. versionadded:: 3.2 + Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs. + .. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode) - Return the length of the Unicode object. + Return the size of the deprecated :c:type:`Py_UNICODE` representation, in + code units (this includes surrogate pairs as 2 units). + Please migrate to using :c:func:`PyUnicode_GetLength`. -.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors) - Coerce an encoded object *obj* to an Unicode object and return a reference with - incremented refcount. +.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj) - :class:`bytes`, :class:`bytearray` and other char buffer compatible objects - are decoded according to the given *encoding* and using the error handling - defined by *errors*. Both can be *NULL* to have the interface use the default - values (see the next section for details). + Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used + throughout the interpreter whenever coercion to Unicode is needed. - All other objects, including Unicode objects, cause a :exc:`TypeError` to be - set. - The API returns *NULL* if there was an error. The caller is responsible for - decref'ing the returned objects. +Locale Encoding +""""""""""""""" +The current locale encoding can be used to decode text from the operating +system. -.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj) +.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, Py_ssize_t len, int surrogateescape) + + Decode a string from the current locale encoding. The decoder is strict if + *surrogateescape* is equal to zero, otherwise it uses the + ``'surrogateescape'`` error handler (:pep:`383`) to escape undecodable + bytes. If a byte sequence can be decoded as a surrogate character and + *surrogateescape* is not equal to zero, the byte sequence is escaped using + the ``'surrogateescape'`` error handler instead of being decoded. *str* + must end with a null character but cannot contain embedded null characters. + + .. seealso:: + + Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from + :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at + Python startup). + + .. versionadded:: 3.3 + + +.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, int surrogateescape) + + Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string + length using :c:func:`strlen`. + + .. versionadded:: 3.3 - Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used - throughout the interpreter whenever coercion to Unicode is needed. -If the platform supports :c:type:`wchar_t` and provides a header file wchar.h, -Python can interface directly to this type using the following functions. -Support is optimized if Python's own :c:type:`Py_UNICODE` type is identical to -the system's :c:type:`wchar_t`. +.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, int surrogateescape) + + Encode a Unicode object to the current locale encoding. The encoder is + strict if *surrogateescape* is equal to zero, otherwise it uses the + ``'surrogateescape'`` error handler (:pep:`383`). Return a :class:`bytes` + object. *str* cannot contain embedded null characters. + + .. seealso:: + + Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to + :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at + Python startup). + + .. versionadded:: 3.3 File System Encoding @@ -430,6 +818,13 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function: If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the locale encoding. + .. seealso:: + + :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the + locale encoding and cannot be modified later. If you need to decode a + string from the current locale encoding, use + :c:func:`PyUnicode_DecodeLocaleAndSize`. + .. versionchanged:: 3.2 Use ``'strict'`` error handler on Windows. @@ -458,6 +853,13 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function: If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the locale encoding. + .. seealso:: + + :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the + locale encoding and cannot be modified later. If you need to encode a + string to the current locale encoding, use + :c:func:`PyUnicode_EncodeLocale`. + .. versionadded:: 3.2 @@ -479,9 +881,9 @@ wchar_t Support Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing 0-termination character). Return the number of :c:type:`wchar_t` characters - copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t` + copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*` string may or may not be 0-terminated. It is the responsibility of the caller - to make sure that the :c:type:`wchar_t` string is 0-terminated in case this is + to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is required by the application. Also, note that the :c:type:`wchar_t*` string might contain null characters, which would cause the string to be truncated when used with most C functions. @@ -497,12 +899,32 @@ wchar_t Support Returns a buffer allocated by :c:func:`PyMem_Alloc` (use :c:func:`PyMem_Free` to free it) on success. On error, returns *NULL*, *\*size* is undefined and raises a :exc:`MemoryError`. Note that the - resulting :c:type:`wchar_t*` string might contain null characters, which + resulting :c:type:`wchar_t` string might contain null characters, which would cause the string to be truncated when used with most C functions. .. versionadded:: 3.2 +UCS4 Support +"""""""""""" + +.. versionadded:: 3.3 + +.. XXX are these meant to be public? + +.. c:function:: size_t Py_UCS4_strlen(const Py_UCS4 *u) + Py_UCS4* Py_UCS4_strcpy(Py_UCS4 *s1, const Py_UCS4 *s2) + Py_UCS4* Py_UCS4_strncpy(Py_UCS4 *s1, const Py_UCS4 *s2, size_t n) + Py_UCS4* Py_UCS4_strcat(Py_UCS4 *s1, const Py_UCS4 *s2) + int Py_UCS4_strcmp(const Py_UCS4 *s1, const Py_UCS4 *s2) + int Py_UCS4_strncmp(const Py_UCS4 *s1, const Py_UCS4 *s2, size_t n) + Py_UCS4* Py_UCS4_strchr(const Py_UCS4 *s, Py_UCS4 c) + Py_UCS4* Py_UCS4_strrchr(const Py_UCS4 *s, Py_UCS4 c) + + These utility functions work on strings of :c:type:`Py_UCS4` characters and + otherwise behave like the C standard library functions with the same name. + + .. _builtincodecs: Built-in Codecs @@ -537,7 +959,8 @@ Generic Codecs These are the generic codec APIs: -.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors) +.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \ + const char *encoding, const char *errors) Create a Unicode object by decoding *size* bytes of the encoded string *s*. *encoding* and *errors* have the same meaning as the parameters of the same name @@ -546,7 +969,18 @@ These are the generic codec APIs: the codec. -.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors) +.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \ + const char *encoding, const char *errors) + + Encode a Unicode object and return the result as Python bytes object. + *encoding* and *errors* have the same meaning as the parameters of the same + name in the Unicode :meth:`encode` method. The codec to be used is looked up + using the Python codec registry. Return *NULL* if an exception was raised by + the codec. + + +.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \ + const char *encoding, const char *errors) Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python bytes object. *encoding* and *errors* have the same meaning as the @@ -554,14 +988,9 @@ These are the generic codec APIs: to be used is looked up using the Python codec registry. Return *NULL* if an exception was raised by the codec. - -.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors) - - Encode a Unicode object and return the result as Python bytes object. - *encoding* and *errors* have the same meaning as the parameters of the same - name in the Unicode :meth:`encode` method. The codec to be used is looked up - using the Python codec registry. Return *NULL* if an exception was raised by - the codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsEncodedString`. UTF-8 Codecs @@ -576,7 +1005,8 @@ These are the UTF-8 codec APIs: *s*. Return *NULL* if an exception was raised by the codec. -.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) +.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \ + const char *errors, Py_ssize_t *consumed) If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be @@ -584,18 +1014,45 @@ These are the UTF-8 codec APIs: that have been decoded will be stored in *consumed*. +.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode) + + Encode a Unicode object using UTF-8 and return the result as Python bytes + object. Error handling is "strict". Return *NULL* if an exception was + raised by the codec. + + +.. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size) + + Return a pointer to the default encoding (UTF-8) of the Unicode object, and + store the size of the encoded representation (in bytes) in *size*. *size* + can be *NULL*, in this case no size will be stored. + + In the case of an error, *NULL* is returned with an exception set and no + *size* is stored. + + This caches the UTF-8 representation of the string in the Unicode object, and + subsequent calls will return a pointer to the same buffer. The caller is not + responsible for deallocating the buffer. + + .. versionadded:: 3.3 + + +.. c:function:: char* PyUnicode_AsUTF8(PyObject *unicode) + + As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size. + + .. versionadded:: 3.3 + + .. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors) Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and return a Python bytes object. Return *NULL* if an exception was raised by the codec. - -.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode) - - Encode a Unicode object using UTF-8 and return the result as Python bytes - object. Error handling is "strict". Return *NULL* if an exception was - raised by the codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsUTF8String` or :c:func:`PyUnicode_AsUTF8AndSize`. UTF-32 Codecs @@ -604,7 +1061,8 @@ UTF-32 Codecs These are the UTF-32 codec APIs: -.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder) +.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \ + const char *errors, int *byteorder) Decode *size* bytes from a UTF-32 encoded buffer string and return the corresponding Unicode object. *errors* (if non-*NULL*) defines the error @@ -632,7 +1090,8 @@ These are the UTF-32 codec APIs: Return *NULL* if an exception was raised by the codec. -.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed) +.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \ + const char *errors, int *byteorder, Py_ssize_t *consumed) If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat @@ -641,7 +1100,15 @@ These are the UTF-32 codec APIs: that have been decoded will be stored in *consumed*. -.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder) +.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode) + + Return a Python byte string using the UTF-32 encoding in native byte + order. The string always starts with a BOM mark. Error handling is "strict". + Return *NULL* if an exception was raised by the codec. + + +.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \ + const char *errors, int byteorder) Return a Python bytes object holding the UTF-32 encoded value of the Unicode data in *s*. Output is written according to the following byte order:: @@ -658,12 +1125,9 @@ These are the UTF-32 codec APIs: Return *NULL* if an exception was raised by the codec. - -.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode) - - Return a Python byte string using the UTF-32 encoding in native byte - order. The string always starts with a BOM mark. Error handling is "strict". - Return *NULL* if an exception was raised by the codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsUTF32String`. UTF-16 Codecs @@ -672,7 +1136,8 @@ UTF-16 Codecs These are the UTF-16 codec APIs: -.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder) +.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \ + const char *errors, int *byteorder) Decode *size* bytes from a UTF-16 encoded buffer string and return the corresponding Unicode object. *errors* (if non-*NULL*) defines the error @@ -699,7 +1164,8 @@ These are the UTF-16 codec APIs: Return *NULL* if an exception was raised by the codec. -.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed) +.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \ + const char *errors, int *byteorder, Py_ssize_t *consumed) If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat @@ -708,7 +1174,15 @@ These are the UTF-16 codec APIs: number of bytes that have been decoded will be stored in *consumed*. -.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder) +.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode) + + Return a Python byte string using the UTF-16 encoding in native byte + order. The string always starts with a BOM mark. Error handling is "strict". + Return *NULL* if an exception was raised by the codec. + + +.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \ + const char *errors, int byteorder) Return a Python bytes object holding the UTF-16 encoded value of the Unicode data in *s*. Output is written according to the following byte order:: @@ -726,12 +1200,9 @@ These are the UTF-16 codec APIs: Return *NULL* if an exception was raised by the codec. - -.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode) - - Return a Python byte string using the UTF-16 encoding in native byte - order. The string always starts with a BOM mark. Error handling is "strict". - Return *NULL* if an exception was raised by the codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsUTF16String`. UTF-7 Codecs @@ -746,7 +1217,8 @@ These are the UTF-7 codec APIs: *s*. Return *NULL* if an exception was raised by the codec. -.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) +.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \ + const char *errors, Py_ssize_t *consumed) If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`. If *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not @@ -754,7 +1226,8 @@ These are the UTF-7 codec APIs: bytes that have been decoded will be stored in *consumed*. -.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors) +.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \ + int base64SetO, int base64WhiteSpace, const char *errors) Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and return a Python bytes object. Return *NULL* if an exception was raised by @@ -765,6 +1238,11 @@ These are the UTF-7 codec APIs: nonzero, whitespace will be encoded in base-64. Both are set to zero for the Python "utf-7" codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API. + + .. XXX replace with what? + Unicode-Escape Codecs """"""""""""""""""""" @@ -772,24 +1250,29 @@ Unicode-Escape Codecs These are the "Unicode Escape" codec APIs: -.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors) +.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \ + Py_ssize_t size, const char *errors) Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded string *s*. Return *NULL* if an exception was raised by the codec. +.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode) + + Encode a Unicode object using Unicode-Escape and return the result as Python + string object. Error handling is "strict". Return *NULL* if an exception was + raised by the codec. + + .. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size) Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and return a Python string object. Return *NULL* if an exception was raised by the codec. - -.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode) - - Encode a Unicode object using Unicode-Escape and return the result as Python - string object. Error handling is "strict". Return *NULL* if an exception was - raised by the codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsUnicodeEscapeString`. Raw-Unicode-Escape Codecs @@ -798,19 +1281,13 @@ Raw-Unicode-Escape Codecs These are the "Raw Unicode Escape" codec APIs: -.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors) +.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \ + Py_ssize_t size, const char *errors) Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape encoded string *s*. Return *NULL* if an exception was raised by the codec. -.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors) - - Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape - and return a Python string object. Return *NULL* if an exception was raised by - the codec. - - .. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode) Encode a Unicode object using Raw-Unicode-Escape and return the result as @@ -818,6 +1295,18 @@ These are the "Raw Unicode Escape" codec APIs: was raised by the codec. +.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \ + Py_ssize_t size, const char *errors) + + Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape + and return a Python string object. Return *NULL* if an exception was raised by + the codec. + + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsRawUnicodeEscapeString`. + + Latin-1 Codecs """""""""""""" @@ -831,18 +1320,22 @@ ordinals and only these are accepted by the codecs during encoding. *s*. Return *NULL* if an exception was raised by the codec. +.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode) + + Encode a Unicode object using Latin-1 and return the result as Python bytes + object. Error handling is "strict". Return *NULL* if an exception was + raised by the codec. + + .. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors) Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and return a Python bytes object. Return *NULL* if an exception was raised by the codec. - -.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode) - - Encode a Unicode object using Latin-1 and return the result as Python bytes - object. Error handling is "strict". Return *NULL* if an exception was - raised by the codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsLatin1String`. ASCII Codecs @@ -858,18 +1351,22 @@ codes generate errors. *s*. Return *NULL* if an exception was raised by the codec. +.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode) + + Encode a Unicode object using ASCII and return the result as Python bytes + object. Error handling is "strict". Return *NULL* if an exception was + raised by the codec. + + .. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors) Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and return a Python bytes object. Return *NULL* if an exception was raised by the codec. - -.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode) - - Encode a Unicode object using ASCII and return the result as Python bytes - object. Error handling is "strict". Return *NULL* if an exception was - raised by the codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsASCIIString`. Character Map Codecs @@ -898,7 +1395,8 @@ characters to different code points. These are the mapping codec APIs: -.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors) +.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \ + PyObject *mapping, const char *errors) Create a Unicode object by decoding *size* bytes of the encoded string *s* using the given *mapping* object. Return *NULL* if an exception was raised by the @@ -908,13 +1406,6 @@ These are the mapping codec APIs: treated as "undefined mapping". -.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors) - - Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given - *mapping* object and return a Python string object. Return *NULL* if an - exception was raised by the codec. - - .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping) Encode a Unicode object using the given *mapping* object and return the result @@ -924,7 +1415,8 @@ These are the mapping codec APIs: The following codec API is special in that maps Unicode to Unicode. -.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors) +.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \ + PyObject *table, const char *errors) Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a character mapping *table* to it and return the resulting Unicode object. Return @@ -937,6 +1429,22 @@ The following codec API is special in that maps Unicode to Unicode. and sequences work well. Unmapped character ordinals (ones which cause a :exc:`LookupError`) are left untouched and are copied as-is. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API. + + .. XXX replace with what? + + +.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \ + PyObject *mapping, const char *errors) + + Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given + *mapping* object and return a Python string object. Return *NULL* if an + exception was raised by the codec. + + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsCharmapString`. MBCS codecs for Windows @@ -953,7 +1461,8 @@ the user settings on the machine running the codec. Return *NULL* if an exception was raised by the codec. -.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed) +.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, \ + const char *errors, int *consumed) If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode @@ -961,18 +1470,31 @@ the user settings on the machine running the codec. in *consumed*. +.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode) + + Encode a Unicode object using MBCS and return the result as Python bytes + object. Error handling is "strict". Return *NULL* if an exception was + raised by the codec. + + +.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors) + + Encode the Unicode object using the specified code page and return a Python + bytes object. Return *NULL* if an exception was raised by the codec. Use + :c:data:`CP_ACP` code page to get the MBCS encoder. + + .. versionadded:: 3.3 + + .. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors) Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return a Python bytes object. Return *NULL* if an exception was raised by the codec. - -.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode) - - Encode a Unicode object using MBCS and return the result as Python bytes - object. Error handling is "strict". Return *NULL* if an exception was - raised by the codec. + .. deprecated-removed:: 3.3 4.0 + Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using + :c:func:`PyUnicode_AsMBCSString` or :c:func:`PyUnicode_EncodeCodePage`. Methods & Slots @@ -1011,7 +1533,8 @@ They all return *NULL* or ``-1`` if an exception occurs. characters are not included in the resulting strings. -.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors) +.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \ + const char *errors) Translate a string by applying a character mapping table to it and return the resulting Unicode object. @@ -1033,14 +1556,16 @@ They all return *NULL* or ``-1`` if an exception occurs. Unicode string. -.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction) +.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \ + Py_ssize_t start, Py_ssize_t end, int direction) Return 1 if *substr* matches ``str[start:end]`` at the given tail end (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match), 0 otherwise. Return ``-1`` if an error occurred. -.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction) +.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \ + Py_ssize_t start, Py_ssize_t end, int direction) Return the first position of *substr* in ``str[start:end]`` using the given *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a @@ -1049,13 +1574,27 @@ They all return *NULL* or ``-1`` if an exception occurs. occurred and an exception has been set. -.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end) +.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \ + Py_ssize_t start, Py_ssize_t end, int direction) + + Return the first position of the character *ch* in ``str[start:end]`` using + the given *direction* (*direction* == 1 means to do a forward search, + *direction* == -1 a backward search). The return value is the index of the + first match; a value of ``-1`` indicates that no match was found, and ``-2`` + indicates that an error occurred and an exception has been set. + + .. versionadded:: 3.3 + + +.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \ + Py_ssize_t start, Py_ssize_t end) Return the number of non-overlapping occurrences of *substr* in ``str[start:end]``. Return ``-1`` if an error occurred. -.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount) +.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \ + PyObject *replstr, Py_ssize_t maxcount) Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and return the resulting Unicode object. *maxcount* == -1 means replace all @@ -1103,8 +1642,8 @@ They all return *NULL* or ``-1`` if an exception occurs. Check whether *element* is contained in *container* and return true or false accordingly. - *element* has to coerce to a one element Unicode string. ``-1`` is returned if - there was an error. + *element* has to coerce to a one element Unicode string. ``-1`` is returned + if there was an error. .. c:function:: void PyUnicode_InternInPlace(PyObject **string) @@ -1123,7 +1662,6 @@ They all return *NULL* or ``-1`` if an exception occurs. .. c:function:: PyObject* PyUnicode_InternFromString(const char *v) A combination of :c:func:`PyUnicode_FromString` and - :c:func:`PyUnicode_InternInPlace`, returning either a new unicode string object - that has been interned, or a new ("owned") reference to an earlier interned - string object with the same value. - + :c:func:`PyUnicode_InternInPlace`, returning either a new unicode string + object that has been interned, or a new ("owned") reference to an earlier + interned string object with the same value. |