-rw-r--r-- doc/source/user/basics.io.genfromtxt.rst 391
1 files changed, 239 insertions, 152 deletions
diff --git a/doc/source/user/basics.io.genfromtxt.rst b/doc/source/user/basics.io.genfromtxt.rst
index 82c7a661a..edf48bc15 100644
--- a/doc/source/user/basics.io.genfromtxt.rst
+++ b/doc/source/user/basics.io.genfromtxt.rst
@@ -7,32 +7,37 @@ Importing data with :func:`~numpy.genfromtxt`
Numpy provides several functions to create arrays from tabular data.
We focus here on the :func:`~numpy.genfromtxt` function.
-In a nutshell, :func:`~numpy.genfromtxt` runs two main loops.
-The first loop converts each line of the file in a sequence of strings.
-The second loop converts each string to the appropriate data type.
-This mechanism is slower than a single loop, but gives more flexibility.
-In particular, :func:`~numpy.genfromtxt` is able to take missing data into account, when other faster and simpler functions like :func:`~numpy.loadtxt` cannot
-
+In a nutshell, :func:`~numpy.genfromtxt` runs two main loops. The first
+loop converts each line of the file in a sequence of strings. The second
+loop converts each string to the appropriate data type. This mechanism is
+slower than a single loop, but gives more flexibility. In particular,
+:func:`~numpy.genfromtxt` is able to take missing data into account, when
+other faster and simpler functions like :func:`~numpy.loadtxt` cannot.
.. note::
- When giving examples, we will use the following conventions
-
- >>> import numpy as np
- >>> from StringIO import StringIO
+
+ When giving examples, we will use the following conventions::
+
+ >>> import numpy as np
+ >>> from StringIO import StringIO
Defining the input
==================
-The only mandatory argument of :func:`~numpy.genfromtxt` is the source of the data.
-It can be a string corresponding to the name of a local or remote file, or a file-like object with a :meth:`read` method (such as an actual file or a :class:`StringIO.StringIO` object).
-If the argument is the URL of a remote file, this latter is automatically downloaded in the current directory.
+The only mandatory argument of :func:`~numpy.genfromtxt` is the source of
+the data. It can be a string corresponding to the name of a local or
+remote file, or a file-like object with a :meth:`read` method (such as an
+actual file or a :class:`StringIO.StringIO` object). If the argument is
+the URL of a remote file, this latter is automatically downloaded in the
+current directory.
-The input file can be a text file or an archive.
-Currently, the function recognizes :class:`gzip` and :class:`bz2` (`bzip2`) archives.
-The type of the archive is determined by examining the extension of the file:
-if the filename ends with ``'.gz'``, a :class:`gzip` archive is expected; if it ends with ``'bz2'``, a :class:`bzip2` archive is assumed.
+The input file can be a text file or an archive. Currently, the function
+recognizes :class:`gzip` and :class:`bz2` (`bzip2`) archives. The type of
+the archive is determined by examining the extension of the file: if the
+filename ends with ``'.gz'``, a :class:`gzip` archive is expected; if it
+ends with ``'bz2'``, a :class:`bzip2` archive is assumed.
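As a rough illustration of the extension-based archive detection described in this hunk, the sketch below writes a small gzip-compressed table to a temporary directory and reads it back. It assumes Python 3 (the examples in the patch itself are Python 2); the file name ``table.txt.gz`` and the temporary directory are hypothetical choices made only for this example.

```python
import gzip
import os
import tempfile

import numpy as np

# Write a tiny whitespace-delimited table into a gzip archive.
# The ".gz" suffix is what tells genfromtxt to decompress the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.txt.gz")  # hypothetical file name
    with gzip.open(path, "wt") as fh:
        fh.write("1 2 3\n4 5 6\n")

    # Pass the path as a plain string; no manual decompression is needed.
    arr = np.genfromtxt(path)

print(arr)
```

The same call should work for a ``.bz2`` file written with the ``bz2`` module, since the archive type is chosen from the file extension.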
@@ -42,24 +47,30 @@ Splitting the lines into columns
The :keyword:`delimiter` argument
---------------------------------
-Once the file is defined and open for reading, :func:`~numpy.genfromtxt` splits each non-empty line into a sequence of strings.
-Empty or commented lines are just skipped.
-The :keyword:`delimiter` keyword is used to define how the splitting should take place.
+Once the file is defined and open for reading, :func:`~numpy.genfromtxt`
+splits each non-empty line into a sequence of strings. Empty or commented
+lines are just skipped. The :keyword:`delimiter` keyword is used to define
+how the splitting should take place.
-Quite often, a single character marks the separation between columns.
-For example, comma-separated files (CSV) use a comma (``,``) or a semicolon (``;``) as delimiter.
+Quite often, a single character marks the separation between columns. For
+example, comma-separated files (CSV) use a comma (``,``) or a semicolon
+(``;``) as delimiter::
>>> data = "1, 2, 3\n4, 5, 6"
>>> np.genfromtxt(StringIO(data), delimiter=",")
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
-Another common separator is ``"\t"``, the tabulation character.
-However, we are not limited to a single character, any string will do.
-By default, :func:`~numpy.genfromtxt` assumes ``delimiter=None``, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space.
+Another common separator is ``"\t"``, the tabulation character. However,
+we are not limited to a single character, any string will do. By default,
+:func:`~numpy.genfromtxt` assumes ``delimiter=None``, meaning that the line
+is split along white spaces (including tabs) and that consecutive white
+spaces are considered as a single white space.
-Alternatively, we may be dealing with a fixed-width file, where columns are defined as a given number of characters.
-In that case, we need to set :keyword:`delimiter` to a single integer (if all the columns have the same size) or to a sequence of integers (if columns can have different sizes).
+Alternatively, we may be dealing with a fixed-width file, where columns are
+defined as a given number of characters. In that case, we need to set
+:keyword:`delimiter` to a single integer (if all the columns have the same
+size) or to a sequence of integers (if columns can have different sizes)::
>>> data = " 1 2 3\n 4 5 67\n890123 4"
>>> np.genfromtxt(StringIO(data), delimiter=3)
@@ -76,29 +87,32 @@ In that case, we need to set :keyword:`delimiter` to a single integer (if all th
The :keyword:`autostrip` argument
---------------------------------
-By default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading nor trailing white spaces.
-This behavior can be overwritten by setting the optional argument :keyword:`autostrip` to a value of ``True``.
+By default, when a line is decomposed into a series of strings, the
+individual entries are not stripped of leading nor trailing white spaces.
+This behavior can be overwritten by setting the optional argument
+:keyword:`autostrip` to a value of ``True``::
>>> data = "1, abc , 2\n 3, xxx, 4"
>>> # Without autostrip
>>> np.genfromtxt(StringIO(data), dtype="|S5")
array([['1', ' abc ', ' 2'],
- ['3', ' xxx', ' 4']],
+ ['3', ' xxx', ' 4']],
dtype='|S5')
>>> # With autostrip
>>> np.genfromtxt(StringIO(data), dtype="|S5", autostrip=True)
array([['1', 'abc', '2'],
- ['3', 'xxx', '4']],
+ ['3', 'xxx', '4']],
dtype='|S5')
-
+
The :keyword:`comments` argument
--------------------------------
-The optional argument :keyword:`comments` is used to define a character string that marks the beginning of a comment.
-By default, :func:`~numpy.genfromtxt` assumes ``comments='#'``.
-The comment marker may occur anywhere on the line.
-Any character present after the comment marker(s) is simply ignored.
+The optional argument :keyword:`comments` is used to define a character
+string that marks the beginning of a comment. By default,
+:func:`~numpy.genfromtxt` assumes ``comments='#'``. The comment marker may
+occur anywhere on the line. Any character present after the comment
+marker(s) is simply ignored::
>>> data = """#
... # Skip me !
@@ -118,7 +132,9 @@ Any character present after the comment marker(s) is simply ignored.
[ 9. 0.]]
.. note::
- There is one notable exception to this behavior: if the optional argument ``names=True``, the first commented line will be examined for names.
+
+ There is one notable exception to this behavior: if the optional argument
+ ``names=True``, the first commented line will be examined for names.
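A minimal sketch of the exception mentioned in the note, assuming Python 3 and ``io.StringIO`` rather than the Python 2 ``StringIO`` module used elsewhere in this patch: the first line is commented out, yet it still supplies the field names when ``names=True``.

```python
from io import StringIO

import numpy as np

# The first line starts with the comment marker but is still read for names.
data = "# a b c\n1 2 3\n4 5 6"
arr = np.genfromtxt(StringIO(data), names=True)

print(arr.dtype.names)  # field names taken from the commented header
print(arr["b"])         # second column, accessed by name
```

Without ``names=True``, the same header line would simply be skipped as a comment.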
@@ -128,45 +144,54 @@ Skipping lines and choosing columns
The :keyword:`skip_header` and :keyword:`skip_footer` arguments
---------------------------------------------------------------
-The presence of a header in the file can hinder data processing.
-In that case, we need to use the :keyword:`skip_header` optional argument.
-The values of this argument must be an integer which corresponds to the number of lines to skip at the beginning of the file, before any other action is performed.
-Similarly, we can skip the last ``n`` lines of the file by using the :keyword:`skip_footer` attribute and giving it a value of ``n``.
+The presence of a header in the file can hinder data processing. In that
+case, we need to use the :keyword:`skip_header` optional argument. The
+values of this argument must be an integer which corresponds to the number
+of lines to skip at the beginning of the file, before any other action is
+performed. Similarly, we can skip the last ``n`` lines of the file by
+using the :keyword:`skip_footer` attribute and giving it a value of ``n``::
>>> data = "\n".join(str(i) for i in range(10))
>>> np.genfromtxt(StringIO(data),)
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
- >>> np.genfromtxt(StringIO(data),
+ >>> np.genfromtxt(StringIO(data),
... skip_header=3, skip_footer=5)
array([ 3., 4.])
-By default, ``skip_header=0`` and ``skip_footer=0``, meaning that no lines are skipped.
+By default, ``skip_header=0`` and ``skip_footer=0``, meaning that no lines
+are skipped.
The :keyword:`usecols` argument
-------------------------------
-In some cases, we are not interested in all the columns of the data but only a few of them.
-We can select which columns to import with the :keyword:`usecols` argument.
-This argument accepts a single integer or a sequence of integers corresponding to the indices of the columns to import.
-Remember that by convention, the first column has an index of 0.
-Negative integers behave the same as regular Python negative indexes.
+In some cases, we are not interested in all the columns of the data but
+only a few of them. We can select which columns to import with the
+:keyword:`usecols` argument. This argument accepts a single integer or a
+sequence of integers corresponding to the indices of the columns to import.
+Remember that by convention, the first column has an index of 0. Negative
+integers behave the same as regular Python negative indexes.
+
+For example, if we want to import only the first and the last columns, we
+can use ``usecols=(0, -1)``::
-For example, if we want to import only the first and the last columns, we can use ``usecols=(0, -1)``:
>>> data = "1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data), usecols=(0, -1))
array([[ 1., 3.],
[ 4., 6.]])
-If the columns have names, we can also select which columns to import by giving their name to the :keyword:`usecols` argument, either as a sequence of strings or a comma-separated string.
+If the columns have names, we can also select which columns to import by
+giving their name to the :keyword:`usecols` argument, either as a sequence
+of strings or a comma-separated string::
+
>>> data = "1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data),
... names="a, b, c", usecols=("a", "c"))
- array([(1.0, 3.0), (4.0, 6.0)],
+ array([(1.0, 3.0), (4.0, 6.0)],
dtype=[('a', '<f8'), ('c', '<f8')])
>>> np.genfromtxt(StringIO(data),
... names="a, b, c", usecols=("a, c"))
- array([(1.0, 3.0), (4.0, 6.0)],
+ array([(1.0, 3.0), (4.0, 6.0)],
dtype=[('a', '<f8'), ('c', '<f8')])
@@ -175,32 +200,40 @@ If the columns have names, we can also select which columns to import by giving
Choosing the data type
======================
-The main way to control how the sequences of strings we have read from the file are converted to other types is to set the :keyword:`dtype` argument.
+The main way to control how the sequences of strings we have read from the
+file are converted to other types is to set the :keyword:`dtype` argument.
Acceptable values for this argument are:
* a single type, such as ``dtype=float``.
- The output will be 2D with the given dtype, unless a name has been associated with each column with the use of the :keyword:`names` argument (see below).
- Note that ``dtype=float`` is the default for :func:`~numpy.genfromtxt`.
+ The output will be 2D with the given dtype, unless a name has been
+ associated with each column with the use of the :keyword:`names` argument
+ (see below). Note that ``dtype=float`` is the default for
+ :func:`~numpy.genfromtxt`.
* a sequence of types, such as ``dtype=(int, float, float)``.
* a comma-separated string, such as ``dtype="i4,f8,|S3"``.
* a dictionary with two keys ``'names'`` and ``'formats'``.
-* a sequence of tuples ``(name, type)``, such as ``dtype=[('A', int), ('B', float)]``.
+* a sequence of tuples ``(name, type)``, such as
+ ``dtype=[('A', int), ('B', float)]``.
* an existing :class:`numpy.dtype` object.
* the special value ``None``.
- In that case, the type of the columns will be determined from the data itself (see below).
-
-In all the cases but the first one, the output will be a 1D array with a structured dtype.
-This dtype has as many fields as items in the sequence.
+ In that case, the type of the columns will be determined from the data
+ itself (see below).
+
+In all the cases but the first one, the output will be a 1D array with a
+structured dtype. This dtype has as many fields as items in the sequence.
The field names are defined with the :keyword:`names` keyword.
-When ``dtype=None``, the type of each column is determined iteratively from its data.
-We start by checking whether a string can be converted to a boolean (that is, if the string matches ``true`` or ``false`` in lower cases);
-then whether it can be converted to an integer, then to a float, then to a complex and eventually to a string.
-This behavior may be changed by modifying the default mapper of the :class:`~numpy.lib._iotools.StringConverter` class.
+When ``dtype=None``, the type of each column is determined iteratively from
+its data. We start by checking whether a string can be converted to a
+boolean (that is, if the string matches ``true`` or ``false`` in lower
+cases); then whether it can be converted to an integer, then to a float,
+then to a complex and eventually to a string. This behavior may be changed
+by modifying the default mapper of the
+:class:`~numpy.lib._iotools.StringConverter` class.
-The option ``dtype=None`` is provided for convenience.
-However, it is significantly slower than setting the dtype explicitly.
+The option ``dtype=None`` is provided for convenience. However, it is
+significantly slower than setting the dtype explicitly.
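To make the ``dtype=None`` behaviour more concrete, here is a small sketch, assuming Python 3 and a NumPy version that accepts the ``encoding`` keyword (an assumption; it is not part of this patch). Each column gets its own inferred type, so the result is a 1D structured array with default field names.

```python
from io import StringIO

import numpy as np

# Three columns: integers, floats, and strings.
data = "1 2.5 abc\n4 5.5 def"

# dtype=None asks genfromtxt to infer a type per column.
# encoding="utf-8" (assumed available) makes string columns come back as str.
arr = np.genfromtxt(StringIO(data), dtype=None, encoding="utf-8")

# Inspect the inferred kind of each field: 'i' integer, 'f' float, 'U' unicode.
kinds = [arr.dtype[name].kind for name in arr.dtype.names]
print(arr.dtype.names, kinds)
```

When the column types are known in advance, passing them explicitly avoids the inference pass that makes ``dtype=None`` slower.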
@@ -210,83 +243,108 @@ Setting the names
The :keyword:`names` argument
-----------------------------
-A natural approach when dealing with tabular data is to allocate a name to each column.
-A first possibility is to use an explicit structured dtype, as mentioned previously.
+A natural approach when dealing with tabular data is to allocate a name to
+each column. A first possibility is to use an explicit structured dtype,
+as mentioned previously::
>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=[(_, int) for _ in "abc"])
- array([(1, 2, 3), (4, 5, 6)],
+ array([(1, 2, 3), (4, 5, 6)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
-
-Another simpler possibility is to use the :keyword:`names` keyword with a sequence of strings or a comma-separated string.
+
+Another simpler possibility is to use the :keyword:`names` keyword with a
+sequence of strings or a comma-separated string::
+
>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, names="A, B, C")
- array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
+ array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
In the example above, we used the fact that by default, ``dtype=float``.
-By giving a sequence of names, we are forcing the output to a structured dtype.
+By giving a sequence of names, we are forcing the output to a structured
+dtype.
-We may sometimes need to define the column names from the data itself.
-In that case, we must use the :keyword:`names` keyword with a value of ``True``.
-The names will then be read from the first line (after the ``skip_header`` ones), even if the line is commented out.
+We may sometimes need to define the column names from the data itself. In
+that case, we must use the :keyword:`names` keyword with a value of
+``True``. The names will then be read from the first line (after the
+``skip_header`` ones), even if the line is commented out::
>>> data = StringIO("So it goes\n#a b c\n1 2 3\n 4 5 6")
>>> np.genfromtxt(data, skip_header=1, names=True)
- array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
+ array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
-The default value of :keyword:`names` is ``None``.
-If we give any other value to the keyword, the new names will overwrite the field names we may have defined with the dtype.
+The default value of :keyword:`names` is ``None``. If we give any other
+value to the keyword, the new names will overwrite the field names we may
+have defined with the dtype::
>>> data = StringIO("1 2 3\n 4 5 6")
>>> ndtype=[('a',int), ('b', float), ('c', int)]
>>> names = ["A", "B", "C"]
>>> np.genfromtxt(data, names=names, dtype=ndtype)
- array([(1, 2.0, 3), (4, 5.0, 6)],
+ array([(1, 2.0, 3), (4, 5.0, 6)],
dtype=[('A', '<i8'), ('B', '<f8'), ('C', '<i8')])
The :keyword:`defaultfmt` argument
----------------------------------
-If ``names=None`` but a structured dtype is expected, names are defined with the standard NumPy default of ``"f%i"``, yielding names like ``f0``, ``f1`` and so forth.
+If ``names=None`` but a structured dtype is expected, names are defined
+with the standard NumPy default of ``"f%i"``, yielding names like ``f0``,
+``f1`` and so forth::
+
>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int))
- array([(1, 2.0, 3), (4, 5.0, 6)],
+ array([(1, 2.0, 3), (4, 5.0, 6)],
dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8')])
-In the same way, if we don't give enough names to match the length of the dtype, the missing names will be defined with this default template.
+In the same way, if we don't give enough names to match the length of the
+dtype, the missing names will be defined with this default template::
+
>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int), names="a")
- array([(1, 2.0, 3), (4, 5.0, 6)],
+ array([(1, 2.0, 3), (4, 5.0, 6)],
dtype=[('a', '<i8'), ('f0', '<f8'), ('f1', '<i8')])
-We can overwrite this default with the :keyword:`defaultfmt` argument, that takes any format string:
+We can overwrite this default with the :keyword:`defaultfmt` argument, that
+takes any format string::
+
>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int), defaultfmt="var_%02i")
- array([(1, 2.0, 3), (4, 5.0, 6)],
+ array([(1, 2.0, 3), (4, 5.0, 6)],
dtype=[('var_00', '<i8'), ('var_01', '<f8'), ('var_02', '<i8')])
.. note::
- We need to keep in mind that ``defaultfmt`` is used only if some names are expected but not defined.
+
+ We need to keep in mind that ``defaultfmt`` is used only if some names
+ are expected but not defined.
Validating names
----------------
-Numpy arrays with a structured dtype can also be viewed as :class:`~numpy.recarray`, where a field can be accessed as if it were an attribute.
-For that reason, we may need to make sure that the field name doesn't contain any space or invalid character, or that it does not correspond to the name of a standard attribute (like ``size`` or ``shape``), which would confuse the interpreter.
-:func:`~numpy.genfromtxt` accepts three optional arguments that provide a finer control on the names:
+Numpy arrays with a structured dtype can also be viewed as
+:class:`~numpy.recarray`, where a field can be accessed as if it were an
+attribute. For that reason, we may need to make sure that the field name
+doesn't contain any space or invalid character, or that it does not
+correspond to the name of a standard attribute (like ``size`` or
+``shape``), which would confuse the interpreter. :func:`~numpy.genfromtxt`
+accepts three optional arguments that provide a finer control on the names:
:keyword:`deletechars`
- Gives a string combining all the characters that must be deleted from the name. By default, invalid characters are ``~!@#$%^&*()-=+~\|]}[{';: /?.>,<``.
+ Gives a string combining all the characters that must be deleted from
+ the name. By default, invalid characters are
+ ``~!@#$%^&*()-=+~\|]}[{';:
+ /?.>,<``.
:keyword:`excludelist`
- Gives a list of the names to exclude, such as ``return``, ``file``, ``print``...
- If one of the input name is part of this list, an underscore character (``'_'``) will be appended to it.
+ Gives a list of the names to exclude, such as ``return``, ``file``,
+ ``print``... If one of the input name is part of this list, an
+ underscore character (``'_'``) will be appended to it.
:keyword:`case_sensitive`
Whether the names should be case-sensitive (``case_sensitive=True``),
- converted to upper case (``case_sensitive=False`` or ``case_sensitive='upper'``) or to lower case (``case_sensitive='lower'``).
+ converted to upper case (``case_sensitive=False`` or
+ ``case_sensitive='upper'``) or to lower case
+ (``case_sensitive='lower'``).
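The effect of the name-validation arguments can be sketched as follows, assuming Python 3 and ``io.StringIO``. The column names are hypothetical and chosen to collide with the default exclusion list; the example relies only on the defaults described above.

```python
from io import StringIO

import numpy as np

data = "1 2 3\n4 5 6"

# "return" and "file" are in the default exclude list, so an underscore
# is appended to each; "x" is left untouched.
excluded = np.genfromtxt(StringIO(data), names=["return", "file", "x"])
print(excluded.dtype.names)

# case_sensitive="lower" converts every name to lower case.
lowered = np.genfromtxt(StringIO(data), names=["A", "B", "C"],
                        case_sensitive="lower")
print(lowered.dtype.names)
```

Renaming at import time keeps the fields usable as attributes if the array is later viewed as a record array.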
@@ -296,46 +354,57 @@ Tweaking the conversion
The :keyword:`converters` argument
----------------------------------
-Usually, defining a dtype is sufficient to define how the sequence of strings must be converted.
-However, some additional control may sometimes be required.
-For example, we may want to make sure that a date in a format ``YYYY/MM/DD`` is converted to a :class:`datetime` object, or that a string like ``xx%`` is properly converted to a float between 0 and 1.
-In such cases, we should define conversion functions with the :keyword:`converters` arguments.
+Usually, defining a dtype is sufficient to define how the sequence of
+strings must be converted. However, some additional control may sometimes
+be required. For example, we may want to make sure that a date in a format
+``YYYY/MM/DD`` is converted to a :class:`datetime` object, or that a string
+like ``xx%`` is properly converted to a float between 0 and 1. In such
+cases, we should define conversion functions with the :keyword:`converters`
+arguments.
-The value of this argument is typically a dictionary with column indices or column names as keys and a conversion functions as values.
-These conversion functions can either be actual functions or lambda functions. In any case, they should accept only a string as input and output only a single element of the wanted type.
+The value of this argument is typically a dictionary with column indices or
+column names as keys and a conversion functions as values. These
+conversion functions can either be actual functions or lambda functions. In
+any case, they should accept only a string as input and output only a
+single element of the wanted type.
+
+In the following example, the second column is converted from a string
+representing a percentage to a float between 0 and 1::
-In the following example, the second column is converted from as string representing a percentage to a float between 0 and 1
>>> convertfunc = lambda x: float(x.strip("%"))/100.
>>> data = "1, 2.3%, 45.\n6, 78.9%, 0"
>>> names = ("i", "p", "n")
>>> # General case .....
- >>> np.genfromtxt(StringIO(data), delimiter=",", names=names)
- array([(1.0, nan, 45.0), (6.0, nan, 0.0)],
+ >>> np.genfromtxt(StringIO(data), delimiter=",", names=names)
+ array([(1.0, nan, 45.0), (6.0, nan, 0.0)],
dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])
-We need to keep in mind that by default, ``dtype=float``.
-A float is therefore expected for the second column.
-However, the strings ``' 2.3%'`` and ``' 78.9%'`` cannot be converted to float and we end up having ``np.nan`` instead.
-Let's now use a converter.
+We need to keep in mind that by default, ``dtype=float``. A float is
+therefore expected for the second column. However, the strings ``' 2.3%'``
+and ``' 78.9%'`` cannot be converted to float and we end up having
+``np.nan`` instead. Let's now use a converter::
>>> # Converted case ...
- >>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
+ >>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
... converters={1: convertfunc})
- array([(1.0, 0.023, 45.0), (6.0, 0.78900000000000003, 0.0)],
+ array([(1.0, 0.023, 45.0), (6.0, 0.78900000000000003, 0.0)],
dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])
-The same results can be obtained by using the name of the second column (``"p"``) as key instead of its index (1).
+The same results can be obtained by using the name of the second column
+(``"p"``) as key instead of its index (1)::
>>> # Using a name for the converter ...
- >>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
+ >>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
... converters={"p": convertfunc})
- array([(1.0, 0.023, 45.0), (6.0, 0.78900000000000003, 0.0)],
+ array([(1.0, 0.023, 45.0), (6.0, 0.78900000000000003, 0.0)],
dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])
-Converters can also be used to provide a default for missing entries.
-In the following example, the converter ``convert`` transforms a stripped string into the corresponding float or into -999 if the string is empty.
-We need to explicitly strip the string from white spaces as it is not done by default.
+Converters can also be used to provide a default for missing entries. In
+the following example, the converter ``convert`` transforms a stripped
+string into the corresponding float or into -999 if the string is empty.
+We need to explicitly strip the string from white spaces as it is not done
+by default::
>>> data = "1, , 3\n 4, 5, 6"
>>> convert = lambda x: float(x.strip() or -999)
@@ -350,33 +419,42 @@ We need to explicitly strip the string from white spaces as it is not done by de
Using missing and filling values
--------------------------------
-Some entries may be missing in the dataset we are trying to import.
-In a previous example, we used a converter to transform an empty string into a float.
-However, user-defined converters may rapidly become cumbersome to manage.
+Some entries may be missing in the dataset we are trying to import. In a
+previous example, we used a converter to transform an empty string into a
+float. However, user-defined converters may rapidly become cumbersome to
+manage.
-The :func:`~nummpy.genfromtxt` function provides two other complementary mechanisms: the :keyword:`missing_values` argument is used to recognize missing data and a second argument, :keyword:`filling_values`, is used to process these missing data.
+The :func:`~numpy.genfromtxt` function provides two other complementary
+mechanisms: the :keyword:`missing_values` argument is used to recognize
+missing data and a second argument, :keyword:`filling_values`, is used to
+process these missing data.
:keyword:`missing_values`
-------------------------
-By default, any empty string is marked as missing.
-We can also consider more complex strings, such as ``"N/A"`` or ``"???"`` to represent missing or invalid data.
-The :keyword:`missing_values` argument accepts three kind of values:
+By default, any empty string is marked as missing. We can also consider
+more complex strings, such as ``"N/A"`` or ``"???"`` to represent missing
+or invalid data. The :keyword:`missing_values` argument accepts three kind
+of values:
a string or a comma-separated string
- This string will be used as the marker for missing data for all the columns
+ This string will be used as the marker for missing data for all the
+ columns
a sequence of strings
In that case, each item is associated to a column, in order.
a dictionary
- Values of the dictionary are strings or sequence of strings.
- The corresponding keys can be column indices (integers) or column names (strings). In addition, the special key ``None`` can be used to define a default applicable to all columns.
+ Values of the dictionary are strings or sequence of strings. The
+ corresponding keys can be column indices (integers) or column names
+ (strings). In addition, the special key ``None`` can be used to
+ define a default applicable to all columns.
:keyword:`filling_values`
-------------------------
-We know how to recognize missing data, but we still need to provide a value for these missing entries.
-By default, this value is determined from the expected dtype according to this table:
+We know how to recognize missing data, but we still need to provide a value
+for these missing entries. By default, this value is determined from the
+expected dtype according to this table:
============= ==============
Expected type Default
@@ -388,37 +466,43 @@ Expected type Default
``string`` ``'???'``
============= ==============
-We can get a finer control on the conversion of missing values with the :keyword:`filling_values` optional argument.
-Like :keyword:`missing_values`, this argument accepts different kind of values:
+We can get a finer control on the conversion of missing values with the
+:keyword:`filling_values` optional argument. Like
+:keyword:`missing_values`, this argument accepts different kind of values:
a single value
This will be the default for all columns
a sequence of values
Each entry will be the default for the corresponding column
a dictionary
- Each key can be a column index or a column name, and the corresponding value should be a single object.
- We can use the special key ``None`` to define a default for all columns.
-
-In the following example, we suppose that the missing values are flagged with ``"N/A"`` in the first column and by ``"???"`` in the third column.
-We wish to transform these missing values to 0 if they occur in the first and second column, and to -999 if they occur in the last column.
-
->>> data = "N/A, 2, 3\n4, ,???"
->>> kwargs = dict(delimiter=",",
-... dtype=int,
-... names="a,b,c",
-... missing_values={0:"N/A", 'b':" ", 2:"???"},
-... filling_values={0:0, 'b':0, 2:-999})
->>> np.genfromtxt(StringIO.StringIO(data), **kwargs)
-array([(0, 2, 3), (4, 0, -999)],
- dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
+ Each key can be a column index or a column name, and the
+ corresponding value should be a single object. We can use the
+ special key ``None`` to define a default for all columns.
+
+In the following example, we suppose that the missing values are flagged
+with ``"N/A"`` in the first column and by ``"???"`` in the third column.
+We wish to transform these missing values to 0 if they occur in the first
+and second column, and to -999 if they occur in the last column::
+
+ >>> data = "N/A, 2, 3\n4, ,???"
+ >>> kwargs = dict(delimiter=",",
+ ... dtype=int,
+ ... names="a,b,c",
+ ... missing_values={0:"N/A", 'b':" ", 2:"???"},
+ ... filling_values={0:0, 'b':0, 2:-999})
+ >>> np.genfromtxt(StringIO(data), **kwargs)
+ array([(0, 2, 3), (4, 0, -999)],
+ dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
:keyword:`usemask`
------------------
-We may also want to keep track of the occurrence of missing data by constructing a boolean mask, with ``True`` entries where data was missing and ``False`` otherwise.
-To do that, we just have to set the optional argument :keyword:`usemask` to ``True`` (the default is ``False``).
-The output array will then be a :class:`~numpy.ma.MaskedArray`.
+We may also want to keep track of the occurrence of missing data by
+constructing a boolean mask, with ``True`` entries where data was missing
+and ``False`` otherwise. To do that, we just have to set the optional
+argument :keyword:`usemask` to ``True`` (the default is ``False``). The
+output array will then be a :class:`~numpy.ma.MaskedArray`.
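A short sketch of ``usemask=True``, assuming Python 3 and ``io.StringIO``: the empty second field on the first row is treated as missing by default, so it appears as ``True`` in the mask.

```python
from io import StringIO

import numpy as np

# The second entry of the first row is empty, hence missing.
data = "1,,3\n4,5,6"
arr = np.genfromtxt(StringIO(data), delimiter=",", usemask=True)

print(type(arr))  # a masked array rather than a plain ndarray
print(arr.mask)   # True where the input entry was missing
```

The mask can then be used to exclude missing entries from computations, for example through the masked array's own ``mean`` or ``sum`` methods.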
.. unpack=None, loose=True, invalid_raise=True)
@@ -427,8 +511,10 @@ The output array will then be a :class:`~numpy.ma.MaskedArray`.
Shortcut functions
==================
-In addition to :func:`~numpy.genfromtxt`, the :mod:`numpy.lib.io` module provides several convenience functions derived from :func:`~numpy.genfromtxt`.
-These functions work the same way as the original, but they have different default values.
+In addition to :func:`~numpy.genfromtxt`, the :mod:`numpy.lib.io` module
+provides several convenience functions derived from
+:func:`~numpy.genfromtxt`. These functions work the same way as the
+original, but they have different default values.
:func:`~numpy.ndfromtxt`
Always set ``usemask=False``.
@@ -437,8 +523,9 @@ These functions work the same way as the original, but they have different defau
Always set ``usemask=True``.
The output is always a :class:`~numpy.ma.MaskedArray`
:func:`~numpy.recfromtxt`
- Returns a standard :class:`numpy.recarray` (if ``usemask=False``) or a :class:`~numpy.ma.MaskedRecords` array (if ``usemaske=True``).
- The default dtype is ``dtype=None``, meaning that the types of each column will be automatically determined.
+ Returns a standard :class:`numpy.recarray` (if ``usemask=False``) or a
+ :class:`~numpy.ma.MaskedRecords` array (if ``usemask=True``). The
+ default dtype is ``dtype=None``, meaning that the types of each column
+ will be automatically determined.
:func:`~numpy.recfromcsv`
Like :func:`~numpy.recfromtxt`, but with a default ``delimiter=","``.
-