author | Nathaniel J. Smith <njs@pobox.com> | 2012-05-11 14:31:50 +0100 |
---|---|---|
committer | Nathaniel J. Smith <njs@pobox.com> | 2012-06-16 10:45:38 +0100 |
commit | b272bc605ce7784be5b3edb13ad7afe22b04e71f (patch) | |
tree | 40fc10c60fd1b48d94be48a80e7cfc98525bd6e7 /doc/source | |
parent | 1b6582d98c58afd977a69ac49f7e8e0d08a800b8 (diff) | |
Remove maskna API from ndarray, and all (and only) the code supporting it

The original masked-NA-NEP branch contained a large number of changes
in addition to the core NA support. For example:
- ufunc.__call__ support for where= argument
- nditer support for arbitrary masks (in support of where=)
- ufunc.reduce support for simultaneous reduction over multiple axes
- a new "array assignment API"
- ndarray.diagonal() returning a view in all cases
- bug-fixes in __array_priority__ handling
- datetime test changes
etc. There's no consensus yet on what should be done with the
maskna-related part of this branch, but the rest is generally useful
and uncontroversial, so the goal of this branch is to identify exactly
which code changes are involved in maskna support.
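
For orientation, two of the non-NA features listed above (the ufunc where= argument and reduction over several axes at once) can be sketched as follows. This is a minimal illustration that assumes a NumPy release in which these features are available; the array values are made up for the example and are not taken from the branch's test suite.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# where=: the ufunc only computes positions where the mask is True;
# the other positions of `out` keep whatever they already held (0 here).
out = np.zeros_like(a)
np.add(a, b, out=out, where=np.array([True, False, True, False]))
# out now holds 11, 0, 33, 0

# Reduction over several axes at once: collapse axes 0 and 2 of a
# (2, 3, 4) array, leaving one sum per index along axis 1.
x = np.arange(24).reshape(2, 3, 4)
totals = np.add.reduce(x, axis=(0, 2))
# totals holds 60, 92, 124
```

Neither call involves an NA mask, which is why this functionality can be kept while the maskna code is removed.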
The basic strategy used to create this patch was:
- Remove the new masking-related fields from ndarray, so no arrays
  are masked
- Go through and remove all the code that this makes
  dead/inaccessible/irrelevant, in a largely mechanical fashion. So
  for example, if I saw 'if (PyArray_HASMASK(a)) { ... }' then that
  whole block was obviously just dead code if no arrays have masks,
  and I removed it. Likewise for function arguments like skipna that
  are useless if there aren't any NAs to skip.

This changed the signature of a number of functions that were newly
exposed in the numpy public API. I've removed all such functions from
the public API, since releasing them with the NA-less signature in 1.7
would create pointless compatibility hassles later if and when we add
back the NA-related functionality. Most such functions are removed by
this commit; the exception is PyArray_ReduceWrapper, which requires
more extensive surgery, and will be handled in followup commits.

I also removed the new ndarray.setasflat method. Reason: a comment
noted that the only reason this was added was to allow easier testing
of one branch of PyArray_CopyAsFlat. That branch is now the main
branch, so that isn't an issue. Nonetheless this function is arguably
useful, so perhaps it should have remained, but I judged that since
numpy's API is already hairier than we would like, it's not a good
idea to add extra hair "just in case". (Also AFAICT the test for this
method in test_maskna was actually incorrect, as noted here:
https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py
so I'm not confident that it ever worked in master, though I haven't
had a chance to follow-up on this.)

I also removed numpy.count_reduce_items, since without skipna it
became trivial.
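
To see why the function becomes trivial, consider what is left once no element can be skipped: the number of items combined into each output element is just the product of the array's lengths along the reduced axes. The sketch below is a hypothetical stand-in (the name count_reduce_items_sketch and its signature are assumptions for illustration, not the removed function's actual interface).

```python
import numpy as np

def count_reduce_items_sketch(arr, axis=None):
    """Hypothetical stand-in: how many elements are combined into each
    output element when `arr` is reduced over `axis`, with nothing skipped."""
    arr = np.asarray(arr)
    if axis is None:
        # Reducing over every axis combines all elements into one result.
        return arr.size
    # Accept either a single axis or a sequence of axes.
    axes = (axis,) if np.ndim(axis) == 0 else tuple(axis)
    count = 1
    for ax in axes:
        count *= arr.shape[ax]
    return count

x = np.zeros((2, 3, 4))
n_all = count_reduce_items_sketch(x)            # 2 * 3 * 4
n_mid = count_reduce_items_sketch(x, axis=1)    # 3
n_two = count_reduce_items_sketch(x, axis=(0, 2))  # 2 * 4
```

With skipna, the count would instead depend on the mask contents and could differ per output element, which is the only case that needed a dedicated function.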
I believe that these are the only exceptions to the "remove dead code"
strategy.
Diffstat (limited to 'doc/source')
-rw-r--r-- | doc/source/reference/arrays.maskna.rst | 306 |
-rw-r--r-- | doc/source/reference/arrays.rst | 1 |
-rw-r--r-- | doc/source/reference/c-api.array.rst | 114 |
-rw-r--r-- | doc/source/reference/c-api.maskna.rst | 592 |
-rw-r--r-- | doc/source/reference/c-api.rst | 1 |
-rw-r--r-- | doc/source/reference/routines.polynomials.classes.rst | 28 |
-rw-r--r-- | doc/source/reference/routines.rst | 1 |
7 files changed, 0 insertions, 1043 deletions
diff --git a/doc/source/reference/arrays.maskna.rst b/doc/source/reference/arrays.maskna.rst
deleted file mode 100644
index bd9516eba..000000000
--- a/doc/source/reference/arrays.maskna.rst
+++ /dev/null
@@ -1,306 +0,0 @@
-.. currentmodule:: numpy
-
-.. _arrays.maskna:
-
-****************
-NA-Masked Arrays
-****************
-
-.. versionadded:: 1.7.0
-
-NumPy 1.7 adds preliminary support for missing values using an interface
-based on an NA (Not Available) placeholder, implemented as masks in the
-core ndarray. This system is highly flexible, allowing NAs to be used
-with any underlying dtype, and supports creating multiple views of the same
-data with different choices of NAs.
-
-.. note:: The NA API is *experimental*, and may undergo changes in future
-   versions of NumPy. The current implementation based on masks will likely be
-   supplemented by a second one based on bit-patterns, and it is possible that
-   a difference will be made between missing and ignored data.
-
-Other Missing Data Approaches
-=============================
-
-The previous recommended approach for working with missing values was the
-:mod:`numpy.ma` module, a subclass of ndarray written purely in Python.
-By placing NA-masks directly in the NumPy core, it's possible to avoid
-the need for calling "ma.<func>(arr)" instead of "np.<func>(arr)".
-
-Another approach many people have taken is to use NaN as the
-placeholder for missing values. There are a few functions
-like :func:`numpy.nansum` which behave similarly to usage of the
-ufunc.reduce *skipna* parameter.
-
-As experienced in the R language, a programming interface based on an
-NA placeholder is generally more intuitive to work with than direct
-mask manipulation.
-
-Missing Data Model
-==================
-
-The model adopted by NumPy for missing values is that NA is a
-placeholder for a value which is there, but is unknown to computations.
-The value may be temporarily hidden by the mask, or may be unknown
-for any reason, but could be any value the dtype of the array is able
-to hold.
-
-This model affects computations in specific, well-defined ways. Any time
-we have a computation, like *c = NA + 1*, we must reason about whether
-*c* will be an NA or not. The NA is not available now, but maybe a
-measurement will be made later to determine what its value is, so anything
-we calculate must be consistent with it eventually being revealed. One way
-to do this is with thought experiments imagining we have discovered
-the value of this NA. If the NA is 0, then *c* is 1. If the NA is
-100, then *c* is 101. Because the value of *c* is ambiguous, it
-isn't available either, so must be NA as well.
-
-A consequence of separating the NA model from the dtype is that, unlike
-in R, NaNs are not considered to be NA. An NA is a value that is completely
-unknown, whereas a NaN is usually the result of an invalid computation
-as defined in the IEEE 754 floating point arithmetic specification.
-
-Most computations whose input is NA will output NA as well, a property
-known as propagation. Some operations, however, always produce the
-same result no matter what the value of the NA is. The clearest
-example of this is with the logical operations *and* and *or*. Since both
-np.logical_or(True, True) and np.logical_or(False, True) are True,
-all possible boolean values on the left hand side produce the
-same answer. This means that np.logical_or(np.NA, True) can produce
-True instead of the more conservative np.NA. There is a similar case
-for np.logical_and.
-
-A similar, but slightly deceptive, example is wanting to treat (NA * 0.0)
-as 0.0 instead of as NA. This is invalid because the NA might be Inf
-or NaN, in which case the result is NaN instead of 0.0. This idea is
-valid for integer dtypes, but NumPy still chooses to return NA because
-checking this special case would adversely affect performance.
-
-The NA Object
-=============
-
-In the root numpy namespace, there is a new object NA. This is not
-the only possible instance of an NA as is the case for None, since an NA
-may have a dtype associated with it and has been designed for future
-expansion to carry a multi-NA payload. It can be used in computations
-like any value::
-
-    >>> np.NA
-    NA
-    >>> np.NA * 3
-    NA(dtype='int64')
-    >>> np.sin(np.NA)
-    NA(dtype='float64')
-
-To check whether a value is NA, use the :func:`numpy.isna` function::
-
-    >>> np.isna(np.NA)
-    True
-    >>> np.isna(1.5)
-    False
-    >>> np.isna(np.nan)
-    False
-    >>> np.isna(np.NA * 3)
-    True
-    >>> (np.NA * 3) is np.NA
-    False
-
-
-Creating NA-Masked Arrays
-=========================
-
-Because having NA support adds some overhead to NumPy arrays, one
-must explicitly request it when creating arrays. There are several ways
-to get an NA-masked array. The easiest way is to include an NA
-value in the list used to construct the array.::
-
-    >>> a = np.array([1,3,5])
-    >>> a
-    array([1, 3, 5])
-    >>> a.flags.maskna
-    False
-
-    >>> b = np.array([1,3,np.NA])
-    >>> b
-    array([1, 3, NA])
-    >>> b.flags.maskna
-    True
-
-If one already has an array without an NA-mask, it can be added
-by directly setting the *maskna* flag to True. Assigning an NA
-to an array without NA support will raise an error rather than
-automatically creating an NA-mask, with the idea that supporting
-NA should be an explicit user choice.::
-
-    >>> a = np.array([1,3,5])
-    >>> a[1] = np.NA
-    Traceback (most recent call last):
-      File "<stdin>", line 1, in <module>
-    ValueError: Cannot assign NA to an array which does not support NAs
-    >>> a.flags.maskna = True
-    >>> a[1] = np.NA
-    >>> a
-    array([1, NA, 5])
-
-Most array construction functions have a new parameter *maskna*, which
-can be set to True to produce an array with an NA-mask.::
-
-    >>> np.arange(5., maskna=True)
-    array([ 0., 1., 2., 3., 4.], maskna=True)
-    >>> np.eye(3, maskna=True)
-    array([[ 1., 0., 0.],
-           [ 0., 1., 0.],
-           [ 0., 0., 1.]], maskna=True)
-    >>> np.array([1,3,5], maskna=True)
-    array([1, 3, 5], maskna=True)
-
-Creating NA-Masked Views
-========================
-
-It will sometimes be desirable to view an array with an NA-mask, without
-adding an NA-mask to that array. This is possible by taking an NA-masked
-view of the array. There are two ways to do this, one which simply
-guarantees that the view has an NA-mask, and another which guarantees that the
-view has its own NA-mask, even if the array already had an NA-mask.
-
-Starting with a non-masked array, we can use the :func:`ndarray.view` method
-to get an NA-masked view.::
-
-    >>> a = np.array([1,3,5])
-    >>> b = a.view(maskna=True)
-
-    >>> b[2] = np.NA
-    >>> a
-    array([1, 3, 5])
-    >>> b
-    array([1, 3, NA])
-
-    >>> b[0] = 2
-    >>> a
-    array([2, 3, 5])
-    >>> b
-    array([2, 3, NA])
-
-
-It is important to be cautious here, though, since if the array already
-has a mask, this will also take a view of that mask. This means the original
-array's mask will be affected by assigning NA to the view.::
-
-    >>> a = np.array([1,np.NA,5])
-    >>> b = a.view(maskna=True)
-
-    >>> b[2] = np.NA
-    >>> a
-    array([1, NA, NA])
-    >>> b
-    array([1, NA, NA])
-
-    >>> b[1] = 4
-    >>> a
-    array([1, 4, NA])
-    >>> b
-    array([1, 4, NA])
-
-
-To guarantee that the view created has its own NA-mask, there is another
-flag *ownmaskna*. Using this flag will cause a copy of the array's mask
-to be created for the view when the array already has a mask.::
-
-    >>> a = np.array([1,np.NA,5])
-    >>> b = a.view(ownmaskna=True)
-
-    >>> b[2] = np.NA
-    >>> a
-    array([1, NA, 5])
-    >>> b
-    array([1, NA, NA])
-
-    >>> b[1] = 4
-    >>> a
-    array([1, NA, 5])
-    >>> b
-    array([1, 4, NA])
-
-
-In general, when an NA-masked view of an array has been taken, any time
-an NA is assigned to an element of the array the data for that element
-will remain untouched. This mechanism allows for multiple temporary
-views with NAs of the same original array.
-
-NA-Masked Reductions
-====================
-
-Many of NumPy's reductions like :func:`numpy.sum` and :func:`numpy.std`
-have been extended to work with NA-masked arrays. A consequence of the
-missing value model is that any NA value in an array will cause the
-output including that value to become NA.::
-
-    >>> a = np.array([[1,2,np.NA,3], [0,np.NA,1,1]])
-    >>> a.sum(axis=0)
-    array([1, NA, NA, 4])
-    >>> a.sum(axis=1)
-    array([NA, NA], dtype=int64)
-
-This is not always the desired result, so NumPy includes a parameter
-*skipna* which causes the NA values to be skipped during computation.::
-
-    >>> a = np.array([[1,2,np.NA,3], [0,np.NA,1,1]])
-    >>> a.sum(axis=0, skipna=True)
-    array([1, 2, 1, 4])
-    >>> a.sum(axis=1, skipna=True)
-    array([6, 2])
-
-Iterating Over NA-Masked Arrays
-===============================
-
-The :class:`nditer` object can be used to iterate over arrays with
-NA values just like over normal arrays.::
-
-    >>> a = np.array([1,3,np.NA])
-    >>> for x in np.nditer(a):
-    ...     print x,
-    ...
-    1 3 NA
-    >>> b = np.zeros(3, maskna=True)
-    >>> for x, y in np.nditer([a,b], op_flags=[['readonly'],
-    ...                                 ['writeonly']]):
-    ...     y[...] = -x
-    ...
-    >>> b
-    array([-1., -3., NA])
-
-When using the C-API version of the nditer, one must explicitly
-add the NPY_ITER_USE_MASKNA flag and take care to deal with the NA
-mask appropriately. In the Python exposure, this flag is added
-automatically.
-
-Planned Future Additions
-========================
-
-The NA support in 1.7 is fairly preliminary, and is focused on getting
-the basics solid. This particularly meant getting the API in C refined
-to a level where adding NA support to all of NumPy and to third party
-software using NumPy would be a reasonable task.
-
-The biggest missing feature within the core is supporting NA values with
-structured arrays. The design for this involves a mask slot for each
-field in the structured array, motivated by the fact that many important
-uses of structured arrays involve treating the structured fields like
-another dimension.
-
-Another feature that was discussed during the design process is the ability
-to support more than one NA value. The design created supports this multi-NA
-idea with the addition of a payload to the NA value and to the NA-mask.
-The API has been designed in such a way that adding this feature in a future
-release should be possible without changing existing API functions in any way.
-
-To see a more complete list of what is supported and unsupported in the
-1.7 release of NumPy, please refer to the release notes.
-
-During the design phase of this feature, two implementation approaches
-for NA values were discussed, called "mask" and "bitpattern". What
-has been implemented is the "mask" approach, but the design document,
-or "NEP", describes a way both approaches could co-operatively exist
-in NumPy, since each has both pros and cons. This design document is
-available in the file "doc/neps/missing-data.rst" of the NumPy source
-code.
diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst
index 91b43132a..40c9f755d 100644
--- a/doc/source/reference/arrays.rst
+++ b/doc/source/reference/arrays.rst
@@ -44,7 +44,6 @@ of also more complicated arrangements of data.
    arrays.indexing
    arrays.nditer
    arrays.classes
-   arrays.maskna
    maskedarray
    arrays.interface
    arrays.datetime
diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst
index 8eedc689a..8736cbc3f 100644
--- a/doc/source/reference/c-api.array.rst
+++ b/doc/source/reference/c-api.array.rst
@@ -92,40 +92,6 @@ sub-types).
     A synonym for PyArray_DESCR, named to be consistent with the
     'dtype' usage within Python.
 
-.. cfunction:: npy_bool PyArray_HASMASKNA(PyArrayObject* arr)
-
-    .. versionadded:: 1.7
-
-    Returns true if the array has an NA-mask, false otherwise.
-
-.. cfunction:: PyArray_Descr *PyArray_MASKNA_DTYPE(PyArrayObject* arr)
-
-    .. versionadded:: 1.7
-
-    Returns a borrowed reference to the dtype property for the NA mask
-    of the array, or NULL if the array has no NA mask. This function does
-    not raise an exception when it returns NULL, it is simply returning
-    the appropriate field.
-
-.. cfunction:: char *PyArray_MASKNA_DATA(PyArrayObject* arr)
-
-    .. versionadded:: 1.7
-
-    Returns a pointer to the raw data for the NA mask of the array,
-    or NULL if the array has no NA mask. This function does
-    not raise an exception when it returns NULL, it is simply returning
-    the appropriate field.
-
-.. cfunction:: npy_intp *PyArray_MASKNA_STRIDES(PyArrayObject* arr)
-
-    .. versionadded:: 1.7
-
-    Returns a pointer to strides of the NA mask of the array, If the
-    array has no NA mask, the values contained in the array will be
-    invalid. The shape of the NA mask is identical to the shape of the
-    array itself, so the number of strides is always the same as the
-    number of array dimensions.
-
 .. cfunction:: void PyArray_ENABLEFLAGS(PyArrayObject* arr, int flags)
 
     .. versionadded:: 1.7
@@ -254,11 +220,6 @@ From scratch
     provided *dims* and *strides* are copied into newly allocated
     dimension and strides arrays for the new array object.
 
-    Because the flags are ignored when *data* is NULL, you cannot
-    create a new array from scratch with an NA mask. If one is desired,
-    call the function :cfunc:`PyArray_AllocateMaskNA` after the array
-    is created.
-
 .. cfunction:: PyObject* PyArray_NewLikeArray(PyArrayObject* prototype, NPY_ORDER order, PyArray_Descr* descr, int subok)
 
     .. versionadded:: 1.6
@@ -281,11 +242,6 @@ From scratch
     *prototype* to create the new array, otherwise it will create a
     base-class array.
 
-    The newly allocated array does not have an NA mask even if the
-    *prototype* provided does. If an NA mask is desired in the array,
-    call the function :cfunc:`PyArray_AllocateMaskNA` after the array
-    is created.
-
 .. cfunction:: PyObject* PyArray_New(PyTypeObject* subtype, int nd, npy_intp* dims, int type_num, npy_intp* strides, void* data, int itemsize, int flags, PyObject* obj)
 
     This is similar to :cfunc:`PyArray_DescrNew` (...) except you
@@ -475,31 +431,6 @@ From other objects
         with, then an error is raised. If *op* is not already an array,
         then this flag has no effect.
 
-    .. cvar:: NPY_ARRAY_MASKNA
-
-        .. versionadded:: 1.7
-
-        Make sure the array has an NA mask associated with its data.
-
-    .. cvar:: NPY_ARRAY_OWNMASKNA
-
-        .. versionadded:: 1.7
-
-        Make sure the array has an NA mask which it owns
-        associated with its data.
-
-    .. cvar:: NPY_ARRAY_ALLOWNA
-
-        .. versionadded:: 1.7
-
-        To prevent simple errors from slipping in, arrays with NA
-        masks are not permitted to pass through by default. Instead
-        an exception is raised indicating the operation doesn't support
-        NA masks yet. In order to enable NA mask support, this flag
-        must be passed in to allow the NA mask through, signalling that
-        the later code is written appropriately to handle NA mask
-        semantics.
-
     .. cvar:: NPY_ARRAY_BEHAVED
 
         :cdata:`NPY_ARRAY_ALIGNED` \| :cdata:`NPY_ARRAY_WRITEABLE`
@@ -1415,24 +1346,6 @@ or :cdata:`NPY_ARRAY_F_CONTIGUOUS` can be determined by the ``strides``,
     would have returned an error because :cdata:`NPY_ARRAY_UPDATEIFCOPY`
     would not have been possible.
 
-.. cvar:: NPY_ARRAY_MASKNA
-
-    If this flag is enabled, the array has an NA mask associated with
-    the data. C code which interacts with the NA mask must follow
-    specific semantic rules about when to overwrite data and when not
-    to. The mask can be accessed through the functions
-    :cfunc:`PyArray_MASKNA_DTYPE`, :cfunc:`PyArray_MASKNA_DATA`, and
-    :cfunc:`PyArray_MASKNA_STRIDES`.
-
-.. cvar:: NPY_ARRAY_OWNMASKNA
-
-    If this flag is enabled, the array owns its own NA mask. If it is not
-    enabled, the NA mask is a view into a different array's NA mask.
-
-    In order to ensure that an array owns its own NA mask, you can
-    call :cfunc:`PyArray_AllocateMaskNA` with the parameter *ownmaskna*
-    set to 1.
-
 :cfunc:`PyArray_UpdateFlags` (obj, flags) will update the ``obj->flags``
 for ``flags`` which can be any of :cdata:`NPY_ARRAY_C_CONTIGUOUS`,
 :cdata:`NPY_ARRAY_F_CONTIGUOUS`, :cdata:`NPY_ARRAY_ALIGNED`, or
@@ -2541,9 +2454,6 @@ Array Scalars
     if so, returns the appropriate array scalar. It should be used
     whenever 0-dimensional arrays could be returned to Python.
 
-    If *arr* is a 0-dimensional NA-masked array with its value hidden,
-    an instance of :ctype:`NpyNA *` is returned.
-
 .. cfunction:: PyObject* PyArray_Scalar(void* data, PyArray_Descr* dtype, PyObject* itemsize)
 
     Return an array scalar object of the given enumerated *typenum*
@@ -2756,19 +2666,6 @@ to.
     . No matter what is returned, you must DECREF the object returned
     by this routine in *address* when you are done with it.
 
-    If the input is an array with NA support, this will either raise
-    an error if it contains any NAs, or will make a copy of the array
-    without NA support if it does not contain any NAs. Use the function
-    :cfunc:`PyArray_AllowNAConverter` to support NA-arrays directly
-    and more efficiently.
-
-.. cfunction:: int PyArray_AllowConverter(PyObject* obj, PyObject** address)
-
-    This is the same as :cfunc:`PyArray_Converter`, but allows arrays
-    with NA support to pass through untouched. This function was created
-    so that the existing converter could raise errors appropriately
-    for functions which have not been updated with NA support
-
 .. cfunction:: int PyArray_OutputConverter(PyObject* obj, PyArrayObject** address)
 
     This is a default converter for output arrays given to
@@ -2777,17 +2674,6 @@ to.
     *obj*) is TRUE then it is returned in *\*address* without
     incrementing its reference count.
 
-    If the output is an array with NA support, this will raise an error.
-    Use the function :cfunc:`PyArray_OutputAllowNAConverter` to support
-    NA-arrays directly.
-
-.. cfunction:: int PyArray_OutputAllowNAConverter(PyObject* obj, PyArrayObject** address)
-
-    This is the same as :cfunc:`PyArray_OutputConverter`, but allows arrays
-    with NA support to pass through. This function was created
-    so that the existing output converter could raise errors appropriately
-    for functions which have not been updated with NA support
-
 .. cfunction:: int PyArray_IntpConverter(PyObject* obj, PyArray_Dims* seq)
 
     Convert any Python sequence, *obj*, smaller than :cdata:`NPY_MAXDIMS`
diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst
deleted file mode 100644
index 6abb624eb..000000000
--- a/doc/source/reference/c-api.maskna.rst
+++ /dev/null
@@ -1,592 +0,0 @@
-Array NA Mask API
-==================
-
-.. sectionauthor:: Mark Wiebe
-
-.. index::
-   pair: maskna; C-API
-   pair: C-API; maskna
-
-.. versionadded:: 1.7
-
-NA Masks in Arrays
-------------------
-
-NumPy supports the idea of NA (Not Available) missing values in its
-arrays. In the design document leading up to the implementation, two
-mechanisms for this were proposed, NA masks and NA bitpatterns. NA masks
-have been implemented as the first representation of these values. This
-mechanism supports working with NA values similar to what the R language
-provides, and when combined with views, allows one to temporarily mark
-elements as NA without affecting the original data.
-
-The C API has been updated with mechanisms to allow NumPy extensions
-to work with these masks, and this document provides some examples and
-reference for the NA mask-related functions.
-
-The NA Object
--------------
-
-The main *numpy* namespace in Python has a new object called *NA*.
-This is an instance of :ctype:`NpyNA`, which is a Python object
-representing an NA value. This object is analogous to the NumPy
-scalars, and is returned by :cfunc:`PyArray_Return` instead of
-a scalar where appropriate.
-
-The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`.
-This is an NA value with no data type or multi-NA payload. Use it
-just as you would Py_None, except use :cfunc:`NpyNA_Check` to
-see if an object is an :ctype:`NpyNA`, because :cdata:`Npy_NA` isn't
-the only instance of NA possible.
-
-If you want to see whether a general PyObject* is NA, you should
-use the API function :cfunc:`NpyNA_FromObject` with *suppress_error*
-set to true. If this returns NULL, the object is not an NA, and if
-it returns an NpyNA instance, the object is NA and you can then
-access its *dtype* and *payload* fields as needed.
-
-To make new :ctype:`NpyNA` objects, use
-:cfunc:`NpyNA_FromDTypeAndPayload`. The functions
-:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and
-:cfunc:`NpyNA_GetPayload` provide access to the data members.
-
-Working With NA-Masked Arrays
------------------------------
-
-The starting point for many C-API functions which manipulate NumPy
-arrays is the function :cfunc:`PyArray_FromAny`. This function converts
-a general PyObject* object into a NumPy ndarray, based on options
-specified in the flags. To avoid surprises, this function does
-not allow NA-masked arrays to pass through by default.
-
-To allow third-party code to work with NA-masked arrays which contain
-no NAs, :cfunc:`PyArray_FromAny` will make a copy of the array into
-a new array without an NA-mask, and return that. This allows for
-proper interoperability in cases where it's possible until functions
-are updated to provide optimal code paths for NA-masked arrays.
-
-To update a function with NA-mask support, add the flag
-:cdata:`NPY_ARRAY_ALLOWNA` when calling :cfunc:`PyArray_FromAny`.
-This allows NA-masked arrays to pass through untouched, and will
-convert PyObject lists containing NA values into NA-masked arrays
-instead of the alternative of switching to object arrays.
-
-To check whether an array has an NA-mask, use the function
-:cfunc:`PyArray_HASMASKNA`, which checks the appropriate flag.
-There are a number of things that one will typically want to do
-when encountering an NA-masked array. We'll go through a few
-of these cases.
-
-Forbidding Any NA Values
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-The simplest case is to forbid any NA values. Note that it is better
-to still be aware of the NA mask and explicitly test for NA values
-than to leave out the :cdata:`NPY_ARRAY_ALLOWNA`, because it is possible
-to avoid the extra copy that :cfunc:`PyArray_FromAny` will make. The
-check for NAs will go something like this::
-
-    PyArrayObject *arr = ...;
-    int containsna;
-
-    /* ContainsNA checks HASMASKNA() for you */
-    containsna = PyArray_ContainsNA(arr, NULL, NULL);
-    /* Error case */
-    if (containsna < 0) {
-        return NULL;
-    }
-    /* If it found an NA */
-    else if (containsna) {
-        PyErr_SetString(PyExc_ValueError,
-                "this operation does not support arrays with NA values");
-        return NULL;
-    }
-
-After this check, you can be certain that the array doesn't contain any
-NA values, and can proceed accordingly. For example, if you iterate
-over the elements of the array, you may pass the flag
-:cdata:`NPY_ITER_IGNORE_MASKNA` to iterate over the data without
-touching the NA-mask at all.
-
-Manipulating NA Values
-~~~~~~~~~~~~~~~~~~~~~~
-
-The semantics of the NA-mask demand that whenever an array element
-is hidden by the NA-mask, no computations are permitted to modify
-the data backing that element. The :ctype:`NpyIter` provides
-a number of flags to assist with visiting both the array data
-and the mask data simultaneously, and preserving the masking semantics
-even when buffering is required.
-
-The main flag for iterating over NA-masked arrays is
-:cdata:`NPY_ITER_USE_MASKNA`. For each iterator operand which has this
-flag specified, a new operand is added to the end of the iterator operand
-list, and is set to iterate over the original operand's NA-mask. Operands
-which do not have an NA mask are permitted as well when they are flagged
-as read-only. The new operand in this case points to a single exposed
-mask value and all its strides are zero. The latter feature is useful
-when combining multiple read-only inputs, where some of them have masks.
-
-Accumulating NA Values
-~~~~~~~~~~~~~~~~~~~~~~
-
-More complex operations, like the NumPy ufunc reduce functions, need
-to take extra care to follow the masking semantics. If we accumulate
-the NA mask and the data values together, we could discover half way
-through that the output is NA, and that we have violated the contract
-to never change the underlying output value when it is being assigned
-NA.
-
-The solution to this problem is to first accumulate the NA-mask as necessary
-to produce the output's NA-mask, then accumulate the data values without
-touching NA-masked values in the output. The parameter *preservena* in
-functions like :cfunc:`PyArray_AssignArray` can assist when initializing
-values in such an algorithm.
-
-Example NA-Masked Operation in C
---------------------------------
-
-As an example, let's implement a simple binary NA-masked operation
-for the double dtype. We'll make a divide operation which turns
-divide by zero into NA instead of Inf or NaN.
-
-To start, we define the function prototype and some basic
-:ctype:`NpyIter` boilerplate setup. We'll make a function which
-supports an optional *out* parameter, which may be NULL.::
-
-    static PyArrayObject*
-    SpecialDivide(PyArrayObject* a, PyArrayObject* b, PyArrayObject *out)
-    {
-        NpyIter *iter = NULL;
-        PyArrayObject *op[3];
-        PyArray_Descr *dtypes[3];
-        npy_uint32 flags, op_flags[3];
-
-        /* Iterator construction parameters */
-        op[0] = a;
-        op[1] = b;
-        op[2] = out;
-
-        dtypes[0] = PyArray_DescrFromType(NPY_DOUBLE);
-        if (dtypes[0] == NULL) {
-            return NULL;
-        }
-        dtypes[1] = dtypes[0];
-        dtypes[2] = dtypes[0];
-
-        flags = NPY_ITER_BUFFERED |
-                NPY_ITER_EXTERNAL_LOOP |
-                NPY_ITER_GROWINNER |
-                NPY_ITER_REFS_OK |
-                NPY_ITER_ZEROSIZE_OK;
-
-        /* Every operand gets the flag NPY_ITER_USE_MASKNA */
-        op_flags[0] = NPY_ITER_READONLY |
-                      NPY_ITER_ALIGNED |
-                      NPY_ITER_USE_MASKNA;
-        op_flags[1] = op_flags[0];
-        op_flags[2] = NPY_ITER_WRITEONLY |
-                      NPY_ITER_ALIGNED |
-                      NPY_ITER_USE_MASKNA |
-                      NPY_ITER_NO_BROADCAST |
-                      NPY_ITER_ALLOCATE;
-
-        iter = NpyIter_MultiNew(3, op, flags, NPY_KEEPORDER,
-                                NPY_SAME_KIND_CASTING, op_flags, dtypes);
-        /* Don't need the dtype reference anymore */
-        Py_DECREF(dtypes[0]);
-        if (iter == NULL) {
-            return NULL;
-        }
-
-At this point, the input operands have been validated according to
-the casting rule, the shapes of the arrays have been broadcast together,
-and any buffering necessary has been prepared. This means we can
-dive into the inner loop of this function.::
-
-        ...
-        if (NpyIter_GetIterSize(iter) > 0) {
-            NpyIter_IterNextFunc *iternext;
-            char **dataptr;
-            npy_intp *stridesptr, *countptr;
-
-            /* Variables needed for looping */
-            iternext = NpyIter_GetIterNext(iter, NULL);
-            if (iternext == NULL) {
-                NpyIter_Deallocate(iter);
-                return NULL;
-            }
-            dataptr = NpyIter_GetDataPtrArray(iter);
-            stridesptr = NpyIter_GetInnerStrideArray(iter);
-            countptr = NpyIter_GetInnerLoopSizePtr(iter);
-
-The loop gets a bit messy when dealing with NA-masks, because it
-doubles the number of operands being processed in the iterator. Here
-we are naming things clearly so that the content of the innermost loop
-can be easy to work with.::
-
-            ...
-            do {
-                /* Data pointers and strides needed for innermost loop */
-                char *data_a = dataptr[0], *data_b = dataptr[1];
-                char *data_out = dataptr[2];
-                char *maskna_a = dataptr[3], *maskna_b = dataptr[4];
-                char *maskna_out = dataptr[5];
-                npy_intp stride_a = stridesptr[0], stride_b = stridesptr[1];
-                npy_intp stride_out = strides[2];
-                npy_intp maskna_stride_a = stridesptr[3];
-                npy_intp maskna_stride_b = stridesptr[4];
-                npy_intp maskna_stride_out = stridesptr[5];
-                npy_intp i, count = *countptr;
-
-                for (i = 0; i < count; ++i) {
-
-Here is the code for performing one special division. We use
-the functions :cfunc:`NpyMaskValue_IsExposed` and
-:cfunc:`NpyMaskValue_Create` to work with the masks, in order to be
-as general as possible. These are inline functions, and the compiler
-optimizer should be able to produce the same result as if you performed
-these operations directly inline here.::
-
-                    ...
-                    /* If neither of the inputs are NA */
-                    if (NpyMaskValue_IsExposed((npy_mask)*maskna_a) &&
-                                NpyMaskValue_IsExposed((npy_mask)*maskna_b)) {
-                        double a_val = *(double *)data_a;
-                        double b_val = *(double *)data_b;
-                        /* Do the divide if 'b' isn't zero */
-                        if (b_val != 0.0) {
-                            *(double *)data_out = a_val / b_val;
-                            /* Need to also set this element to exposed */
-                            *maskna_out = NpyMaskValue_Create(1, 0);
-                        }
-                        /* Otherwise output an NA without touching its data */
-                        else {
-                            *maskna_out = NpyMaskValue_Create(0, 0);
-                        }
-                    }
-                    /* Turn the output into NA without touching its data */
-                    else {
-                        *maskna_out = NpyMaskValue_Create(0, 0);
-                    }
-
-                    data_a += stride_a;
-                    data_b += stride_b;
-                    data_out += stride_out;
-                    maskna_a += maskna_stride_a;
-                    maskna_b += maskna_stride_b;
-                    maskna_out += maskna_stride_out;
-                }
-            } while (iternext(iter));
-        }
-
-A little bit more boilerplate for returning the result from the iterator,
-and the function is done.::
-
-        ...
-        if (out == NULL) {
-            out = NpyIter_GetOperandArray(iter)[2];
-        }
-        Py_INCREF(out);
-        NpyIter_Deallocate(iter);
-
-        return out;
-    }
-
-To run this example, you can create a simple module with a C-file spdiv_mod.c
-consisting of::
-
-    #include <Python.h>
-    #include <numpy/arrayobject.h>
-
-    /* INSERT SpecialDivide source code here */
-
-    static PyObject *
-    spdiv(PyObject *self, PyObject *args, PyObject *kwds)
-    {
-        PyArrayObject *a, *b, *out = NULL;
-        static char *kwlist[] = {"a", "b", "out", NULL};
-
-        if (!PyArg_ParseTupleAndKeywords(args, kwds, "O&O&|O&", kwlist,
-                                &PyArray_AllowNAConverter, &a,
-                                &PyArray_AllowNAConverter, &b,
-                                &PyArray_OutputAllowNAConverter, &out)) {
-            return NULL;
-        }
-
-        /*
-         * The usual NumPy way is to only use PyArray_Return when
-         * the 'out' parameter is not provided.
-         */
-        if (out == NULL) {
-            return PyArray_Return(SpecialDivide(a, b, out));
-        }
-        else {
-            return (PyObject *)SpecialDivide(a, b, out);
-        }
-    }
-
-    static PyMethodDef SpDivMethods[] = {
-        {"spdiv", (PyCFunction)spdiv, METH_VARARGS | METH_KEYWORDS, NULL},
-        {NULL, NULL, 0, NULL}
-    };
-
-
-    PyMODINIT_FUNC initspdiv_mod(void)
-    {
-        PyObject *m;
-
-        m = Py_InitModule("spdiv_mod", SpDivMethods);
-        if (m == NULL) {
-            return;
-        }
-
-        /* Make sure NumPy is initialized */
-        import_array();
-    }
-
-Create a setup.py file like::
-
-    #!/usr/bin/env python
-    def configuration(parent_package='',top_path=None):
-        from numpy.distutils.misc_util import Configuration
-        config = Configuration('.',parent_package,top_path)
-        config.add_extension('spdiv_mod',['spdiv_mod.c'])
-        return config
-
-    if __name__ == "__main__":
-        from numpy.distutils.core import setup
-        setup(configuration=configuration)
-
-With these two files in a directory by itself, run::
-
-    $ python setup.py build_ext --inplace
-
-and the file spdiv_mod.so (or .dll) will be placed in the same directory.
-Now you can try out this sample, to see how it behaves.::
-
-    >>> import numpy as np
-    >>> from spdiv_mod import spdiv
-
-Because we used :cfunc:`PyArray_Return` when wrapping SpecialDivide,
-it returns scalars like any typical NumPy function does::
-
-    >>> spdiv(1, 2)
-    0.5
-    >>> spdiv(2, 0)
-    NA(dtype='float64')
-    >>> spdiv(np.NA, 1.5)
-    NA(dtype='float64')
-
-Here we can see how NAs propagate, and how 0 in the output turns into NA
-as desired.::
-
-    >>> a = np.arange(6)
-    >>> b = np.array([0,np.NA,0,2,1,0])
-    >>> spdiv(a, b)
-    array([ NA, NA, NA, 1.5, 4. , NA])
-
-Finally, we can see the masking behavior by creating a masked
-view of an array. The ones in *c_orig* are preserved whereever
-NA got assigned.::
-
-    >>> c_orig = np.ones(6)
-    >>> c = c_orig.view(maskna=True)
-    >>> spdiv(a, b, out=c)
-    array([ NA, NA, NA, 1.5, 4. , NA])
-    >>> c_orig
-    array([ 1. , 1. , 1. , 1.5, 4. , 1. ])
-
-NA Object Data Type
--------------------
-
-.. ctype:: NpyNA
-
-    This is the C object corresponding to objects of type
-    numpy.NAType. The fields themselves are hidden from consumers of the
-    API, you must use the functions provided to create new NA objects
-    and get their properties.
-
-    This object contains two fields, a :ctype:`PyArray_Descr *` dtype
-    which is either NULL or indicates the data type the NA represents,
-    and a payload which is there for the future addition of multi-NA support.
-
-.. cvar:: Npy_NA
-
-    This is a global singleton, similar to Py_None, which is the
-    *numpy.NA* object. Note that unlike Py_None, multiple NAs may be
-    created, for instance with different multi-NA payloads or with
-    different dtypes. If you want to return an NA with no payload
-    or dtype, return a new reference to Npy_NA.
-
-NA Object Functions
--------------------
-
-.. cfunction:: NpyNA_Check(obj)
-
-    Evaluates to true if *obj* is an instance of :ctype:`NpyNA`.
-
-..
cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)
-
-    Returns the *dtype* field of the NA object, which is NULL when
-    the NA has no dtype. Does not raise an error.
-
-.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na)
-
-    Returns true if the NA has a multi-NA payload, false otherwise.
-
-.. cfunction:: int NpyNA_GetPayload(NpyNA* na)
-
-    Gets the multi-NA payload of the NA, or 0 if *na* doesn't have
-    a multi-NA payload.
-
-.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)
-
-    If *obj* represents an object which is NA, for example if it
-    is an :ctype:`NpyNA`, or a zero-dimensional NA-masked array with
-    its value hidden by the mask, returns a new reference to an
-    :ctype:`NpyNA` object representing *obj*. Otherwise returns
-    NULL.
-
-    If *suppress_error* is true, this function doesn't raise an exception
-    when the input isn't NA and it returns NULL, otherwise it does.
-
-.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)
-
-
-    Constructs a new :ctype:`NpyNA` instance with the specified *dtype*
-    and *payload*. For an NA with no dtype, provide NULL in *dtype*.
-
-    Until multi-NA is implemented, just pass 0 for both *multina*
-    and *payload*.
-
-NA Mask Functions
------------------
-
-A mask dtype can be one of three different possibilities. It can
-be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
-fields are all mask dtypes.
-
-A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
-value 1, for an element that is exposed, and False, with underlying
-value 0, for an element that is hidden.
-
-A mask of :cdata:`NPY_MASK` can additionally carry a payload which
-is a value from 0 to 127. This allows for missing data implementations
-based on such masks to support multiple reasons for data being missing.
-
-A mask of a struct dtype can only pair up with another struct dtype
-with the same field names.
In this way, each field of the mask controls
-the masking for the corresponding field in the associated data array.
-
-Inline functions to work with masks are as follows.
-
-.. cfunction:: npy_bool NpyMaskValue_IsExposed(npy_mask mask)
-
-    Returns true if the data element corresponding to the mask element
-    can be modified, false if not.
-
-.. cfunction:: npy_uint8 NpyMaskValue_GetPayload(npy_mask mask)
-
-    Returns the payload contained in the mask. The return value
-    is between 0 and 127.
-
-.. cfunction:: npy_mask NpyMaskValue_Create(npy_bool exposed, npy_int8 payload)
-
-    Creates a mask from a flag indicating whether the element is exposed
-    or not and a payload value.
-
-NA Mask Array Functions
------------------------
-
-.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)
-
-    Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
-    if false, it only allocates an NA mask if none exists, but if
-    *ownmaskna* is true, it also allocates one if the NA mask is a view
-    into another array's NA mask. Here are the two most common usage
-    patterns::
-
-        /* Use this to make sure 'arr' has an NA mask */
-        if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) {
-            return NULL;
-        }
-
-        /* Use this to make sure 'arr' owns an NA mask */
-        if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) {
-            return NULL;
-        }
-
-    The parameter *multina* is provided for future expansion, when
-    mult-NA support is added to NumPy. This will affect the dtype of
-    the NA mask, which currently must be always NPY_BOOL, but will be
-    NPY_MASK for arrays multi-NA when this is implemented.
-
-    When a new NA mask is allocated, and the mask needs to be filled,
-    it uses the value *defaultmask*. In nearly all cases, this should be set
-    to 1, indicating that the elements are exposed. If a mask is allocated
-    just because of *ownmaskna*, the existing mask values are copied
-    into the newly allocated mask.
-
-    This function returns 0 for success, -1 for failure.
-
-.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr)
-
-    Returns true if *arr* is an array which supports NA. This function
-    exists because the design for adding NA proposed two mechanisms
-    for NAs in NumPy, NA masks and NA bitpatterns. Currently, just
-    NA masks have been implemented, but when NA bitpatterns are implemented
-    this would return true for arrays with an NA bitpattern dtype as well.
-
-.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna)
-
-    Checks whether the array *arr* contains any NA values.
-
-    If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can
-    broadcast onto *arr*. Whereever the where mask is True, *arr*
-    is checked for NA, and whereever it is False, the *arr* value is
-    ignored.
-
-    The parameter *whichna* is provided for future expansion to multi-NA
-    support. When implemented, this parameter will be a 128 element
-    array of npy_bool, with the value True for the NA values that are
-    being looked for.
-
-    This function returns 1 when the array contains NA values, 0 when
-    it does not, and -1 when a error has occurred.
-
-.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
-
-    Assigns the given *na* value to elements of *arr*.
-
-    If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
-    onto *arr*, and only elements of *arr* with a corresponding value
-    of True in *wheremask* will have *na* assigned.
-
-    The parameters *preservena* and *preservewhichna* are provided for
-    future expansion to multi-NA support. With a single NA value, one
-    NA cannot be distinguished from another, so preserving NA values
-    does not make sense. With multiple NA values, preserving NA values
-    becomes an important concept because that implies not overwriting the
-    multi-NA payloads.
The parameter *preservewhichna* will be a 128 element
-    array of npy_bool, indicating which NA payloads to preserve.
-
-    This function returns 0 for success, -1 for failure.
-
-.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
-
-    Assigns the given NA mask *maskvalue* to elements of *arr*.
-
-    If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
-    onto *arr*, and only elements of *arr* with a corresponding value
-    of True in *wheremask* will have the NA *maskvalue* assigned.
-
-    The parameters *preservena* and *preservewhichna* are provided for
-    future expansion to multi-NA support. With a single NA value, one
-    NA cannot be distinguished from another, so preserving NA values
-    does not make sense. With multiple NA values, preserving NA values
-    becomes an important concept because that implies not overwriting the
-    multi-NA payloads. The parameter *preservewhichna* will be a 128 element
-    array of npy_bool, indicating which NA payloads to preserve.
-
-    This function returns 0 for success, -1 for failure.
diff --git a/doc/source/reference/c-api.rst b/doc/source/reference/c-api.rst
index 6e97cec36..b1a5eb477 100644
--- a/doc/source/reference/c-api.rst
+++ b/doc/source/reference/c-api.rst
@@ -45,7 +45,6 @@ code.
    c-api.dtype
    c-api.array
    c-api.iterator
-   c-api.maskna
    c-api.ufunc
    c-api.generalized-ufuncs
    c-api.coremath
diff --git a/doc/source/reference/routines.polynomials.classes.rst b/doc/source/reference/routines.polynomials.classes.rst
index 2cfbec5d9..9294728c8 100644
--- a/doc/source/reference/routines.polynomials.classes.rst
+++ b/doc/source/reference/routines.polynomials.classes.rst
@@ -322,31 +322,3 @@ illustrated below for a fit to a noisy sin curve.
     >>> p.window
     array([-1., 1.])
     >>> plt.show()
-
-The fit will ignore data points masked with NA.
We demonstrate this with
-the previous example, but add an outlier that messes up the fit, then mask
-it out.
-
-.. plot::
-
-    >>> import numpy as np
-    >>> import matplotlib.pyplot as plt
-    >>> from numpy.polynomial import Chebyshev as T
-    >>> np.random.seed(11)
-    >>> x = np.linspace(0, 2*np.pi, 20)
-    >>> y = np.sin(x) + np.random.normal(scale=.1, size=x.shape)
-    >>> y[10] = 2
-    >>> p = T.fit(x, y, 5)
-    >>> plt.plot(x, y, 'o')
-    [<matplotlib.lines.Line2D object at 0x2136c10>]
-    >>> xx, yy = p.linspace()
-    >>> plt.plot(xx, yy, lw=2, label="unmasked")
-    [<matplotlib.lines.Line2D object at 0x1cf2890>]
-    >>> ym = y.view(maskna=1)
-    >>> ym[10] = np.NA
-    >>> p = T.fit(x, ym, 5)
-    >>> xx, yy = p.linspace()
-    >>> plt.plot(xx, yy, lw=2, label="masked")
-    >>> plt.legend(loc="upper right")
-    <matplotlib.legend.Legend object at 0x3b3ee10>
-    >>> plt.show()
diff --git a/doc/source/reference/routines.rst b/doc/source/reference/routines.rst
index 10d12330c..37b16de59 100644
--- a/doc/source/reference/routines.rst
+++ b/doc/source/reference/routines.rst
@@ -34,7 +34,6 @@ indentation.
    routines.linalg
    routines.logic
    routines.ma
-   routines.maskna
    routines.math
    routines.matlib
    routines.numarray |