===================================================
NEP 44 — Restructuring the NumPy Documentation
===================================================

:Author: Ralf Gommers
:Author: Melissa Mendonça
:Author: Mars Lee
:Status: Draft
:Type: Process
:Created: 2020-02-11

Abstract
========

This document proposes a restructuring of the NumPy documentation, both in form
and content, with the goal of making it more organized and discoverable for
beginners and experienced users alike.

Motivation and Scope
====================

See the `front page of the latest docs <https://numpy.org/devdocs/>`_. Its
organization is quite confusing and illogical (e.g. user and developer docs are
mixed). We propose the following:

- Reorganizing the docs into the four categories mentioned in [1]_, namely
  *Tutorials*, *How-Tos*, *Reference Guide* and *Explanations* (more about this
  below);
- Creating dedicated sections for Tutorials and How-Tos, including guidance on
  how to create new content;
- Adding an Explanations section for key concepts and techniques that require
  deeper descriptions, some of which will be moved from the Reference Guide.

Usage and Impact
================

The documentation is a fundamental part of any software project, especially
open source projects. In the case of NumPy, many beginners might feel demotivated
by the current structure of the documentation, since it is difficult to discover
what to learn (unless the user has a clear view of what to look for in the
Reference docs, which is not always the case).

Looking at the results of a "NumPy Tutorial" search on any search engine also
gives an idea of the demand for this kind of content. Official high-level
documentation written with up-to-date content and techniques is likely to bring
more users (and developers/contributors) into the NumPy community.

Backward compatibility
======================

The restructuring will effectively require a complete rewrite of links and of
some of the current content. Input from the community will be useful for
identifying key links and pages that should not be broken.

Detailed description
====================

As discussed in the article [1]_, there are four categories of doc content:

- Tutorials
- How-to guides
- Explanations
- Reference guide

We propose to use these categories for writing and reviewing whenever we add a
new documentation section.

The reasoning for this is that it makes it clearer, both for
developers/documentation writers and for users, where each piece of
information should go and what the scope and tone of each document should be.
For example, if explanations are mixed with basic tutorials, beginners
might be overwhelmed and alienated. On the other hand, if the reference
guide contains basic how-tos, it might be difficult for experienced
users to quickly find the information they need.

Currently, there are many blogs and tutorials on the internet about NumPy or
using NumPy. One of the issues with this is that users searching for this
information may end up at an outdated (unofficial) tutorial before they find
the current official documentation. This can be confusing, especially for
beginners. Better documentation infrastructure also aims to solve this problem
by giving users high-level, up-to-date official documentation that can be
easily updated.

Status and ideas of each type of doc content
--------------------------------------------

**Reference guide**

NumPy has a fairly complete reference guide. All functions are documented, most
have examples, and most are cross-linked well with *See Also* sections. Further
improving the reference guide is incremental work that can be done (and is being
done) by many people. There are, however, many explanations in the reference
guide. These can be moved to a dedicated Explanations section in the docs.

**How-to guides**

NumPy does not have many how-tos. The subclassing and array ducktyping section
may be an example of a how-to. Others that could be added are:

- Parallelization (controlling BLAS multithreading with ``threadpoolctl``, using
  multiprocessing, random number generation, etc.)
- Storing and loading data (``.npy``/``.npz`` format, text formats, Zarr, HDF5,
  Bloscpack, etc.)
- Performance (memory layout, profiling, use with Numba, Cython, or Pythran)
- Writing generic code that works with NumPy, Dask, CuPy, pydata/sparse, etc.

**Explanations**

There is a reasonable amount of content on fundamental NumPy concepts such as
indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This content could
be better organized and clarified so that it really explains the concepts and is
not mixed with tutorial- or how-to-like content.

There are few explanations of anything beyond those fundamental NumPy concepts.

Some examples of concepts that could be expanded:

- Copies vs. Views;
- BLAS and other linear algebra libraries;
- Fancy indexing.

In addition, there are many explanations in the Reference Guide, which should be
moved to this new dedicated Explanations section.

**Tutorials**

There's a lot of scope for writing better tutorials. We have a new *NumPy for
absolute beginners tutorial* [3]_ (a Google Season of Docs project by Anne
Bonner). In addition, we need a number of tutorials addressing different levels
of experience with Python and NumPy. This could be done using engaging data
sets, ideas or stories. For example, curve fitting with polynomials and
functions in ``numpy.linalg`` could be done with the Keeling curve (decades'
worth of measurements of CO2 concentration in the air) rather than with
synthetic random data.

Ideas for tutorials (these capture the types of things that make sense; they are
not necessarily the exact topics we propose to implement):

- Conway's game of life with only NumPy (note: already in `Nicolas Rougier's book
  <https://www.labri.fr/perso/nrougier/from-python-to-numpy/#the-game-of-life>`_)
- Using masked arrays to deal with missing data in time series measurements
- Using Fourier transforms to analyze the Keeling curve data and extrapolate it
- Geospatial data (e.g. lat/lon/time to create maps for every year via a stacked
  array, like `gridMet data <http://www.climatologylab.org/gridmet.html>`_)
- Using text data and dtypes (e.g. use speeches from different people, shape
  ``(n_speech, n_sentences, n_words)``)

The *Preparing to Teach* document [2]_ from the Software Carpentry Instructor
Training materials is a nice summary of how to write effective lesson plans (and
tutorials would be very similar). In addition to adding new tutorials, we also
propose a *How to write a tutorial* document, which would help users contribute
new high-quality content to the documentation.

Data sets
---------

Using interesting data in the NumPy docs requires giving all users access to
that data, either inside NumPy or in a separate package. The former is not the
best idea, since it's hard to do without increasing the size of NumPy
significantly. Even for SciPy there has so far been no consensus on this (see
`scipy PR 8707 <https://github.com/scipy/scipy/pull/8707>`_ on adding a new
``scipy.datasets`` subpackage).

So we'll aim for a new (pure Python) package, named ``numpy-datasets`` or
``scipy-datasets`` or something similar. That package can take some lessons from
how, e.g., scikit-learn ships data sets. Small data sets can be included in the
repo; large data sets can be accessed via a downloader class or function.

Related Work
============

Some examples of documentation organization in other projects:

- `Documentation for Jupyter <https://jupyter.org/documentation>`_
- `Documentation for Python <https://docs.python.org/3/>`_
- `Documentation for TensorFlow <https://www.tensorflow.org/learn>`_

These projects make the intended audience for each part of the documentation
more explicit, as well as previewing some of the content in each section.

Implementation
==============

Currently, the `documentation for NumPy <https://numpy.org/devdocs/>`_ can be
confusing, especially for beginners. Our proposal is to reorganize the docs into
the following structure:

- For users:

  - Absolute Beginners Tutorial
  - Main Tutorials section
  - How-Tos for common tasks with NumPy
  - Reference Guide (API Reference)
  - Explanations
  - F2Py Guide
  - Glossary

- For developers/contributors:

  - Contributor's Guide
  - Under-the-hood docs
  - Building and extending the documentation
  - Benchmarking
  - NumPy Enhancement Proposals

- Meta information:

  - Reporting bugs
  - Release Notes
  - About NumPy
  - License

Ideas for follow-up
-------------------

Besides rewriting the current documentation to some extent, it would be ideal
to have a technical infrastructure that allows more contributions from the
community. For example, if Jupyter Notebooks could be submitted as-is as
tutorials or How-Tos, this might attract more contributors and broaden the NumPy
community.

Similarly, if people could download some of the documentation in Notebook
format, they would be more likely to learn NumPy from current rather than
outdated material.

It would also be valuable if the new structure for the documentation made
translations easier.

Discussion
==========


References and Footnotes
========================

.. [1] `What nobody tells you about documentation <https://www.divio.com/blog/documentation/>`_

.. [2] `Preparing to Teach <https://carpentries.github.io/instructor-training/15-lesson-study/index.html>`_ (from the `Software Carpentry <https://software-carpentry.org/>`_ Instructor Training materials)

.. [3] `NumPy for absolute beginners Tutorial <https://numpy.org/devdocs/user/absolute_beginners.html>`_ by Anne Bonner

Copyright
=========

This document has been placed in the public domain.