Google Season of Docs 2021 Project Ideas

Vít Tuček edited this page Mar 17, 2021 · 3 revisions

Welcome, and thank you for taking an interest in NumPy! On this page, we first provide some context about NumPy and the current state of its documentation, and then describe a couple of project ideas in detail. These ideas are not the only ones possible; we'd love to talk to you if you have your own idea for a project that you're excited about and that you think would help improve NumPy's documentation or online presence. Please note that Season of Docs is a program for writers who can show previous experience in their application. If you are a student, please consider Google Summer of Code instead.

About NumPy

NumPy is widely used in nearly every field of science and engineering. Over 32,000 packages on GitHub depend on NumPy, and 6 million users visit our website every month. Its user base spans from beginner coders to experienced researchers doing state-of-the-art scientific and industrial R&D. NumPy is the universal standard for working with numerical data in Python and is at the core of the scientific Python and PyData ecosystems. It provides ndarray, a homogeneous n-dimensional array object with methods to efficiently operate on it. The NumPy API is used extensively in pandas, SciPy, Matplotlib, scikit-learn, scikit-image, and most other data science and scientific Python packages. The API and concepts are also replicated in deep learning frameworks (e.g., TensorFlow, PyTorch) and in array computing libraries for other programming languages.
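As a minimal sketch of what "homogeneous n-dimensional array" means in practice, the snippet below builds a small 2-D ndarray and applies whole-array operations to it. The variable names and the temperature values are made up for illustration only.

```python
import numpy as np

# A 2-D ndarray: every element shares a single dtype (homogeneous).
# The values are illustrative, not real measurements.
temperatures = np.array([[20.5, 21.0, 19.8],
                         [22.1, 23.4, 21.7]])

print(temperatures.ndim)   # number of dimensions
print(temperatures.shape)  # size along each dimension
print(temperatures.dtype)  # one element type for the whole array

# Arithmetic applies to the whole array at once, with no explicit Python loop.
fahrenheit = temperatures * 9 / 5 + 32

# Methods can reduce along a chosen axis; axis=1 averages each row.
row_means = temperatures.mean(axis=1)
print(fahrenheit)
print(row_means)
```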

The state of NumPy's documentation

NumPy was a 100% volunteer project until the first half of 2018 (we now have a few part-time paid developers). Recently, the project was awarded two grants from the Chan Zuckerberg Initiative through its Essential Open Source Software for Science (EOSS) program. In 2020, the focus was to improve the project's governance structure and allow some people involved in the project to focus on documentation and community building. In 2021, our focus on documentation continues, and we want to solidify the Documentation Team.

NumPy participated in Google Season of Docs in 2019 and 2020. Last year, we had two technical writers working on Tutorials in our new Jupyter Notebook repository, numpy-tutorials, and they are both now part of our Documentation Team. We also saw improvements to our reference documentation, but NumPy is a big project, and there is still much work to do. Preliminary results from our 2020 User Survey identify documentation as one of the main areas for improvement in NumPy, and there are currently over 160 open documentation issues in our GitHub repository.

Following the ideas outlined in the NumPy Enhancement Proposal (NEP) 44, which describes our plans for the future of NumPy's documentation, we consider the documentation to be divided into four parts: Reference, Tutorials, How-tos and Explanations (which we call NumPy Fundamentals). Although we have mostly complete reference documentation for each function and class exposed to users, there is a lack of usage examples for some of them. Also, many explanations are mixed in with the reference documentation, and users would benefit greatly from an expansion of the NumPy Fundamentals section. There is also some duplication of content in our User Guide, which hinders searchability and usability. Finally, there is a clear demand for tutorials and how-tos for users with different experience levels and domain-specific knowledge. Improving the quality of NumPy documentation will be very valuable to our millions of users!

How NumPy's documentation is built

All our documentation and websites are built with Sphinx. Sphinx generates static websites (making them easy to deploy) and provides extensive functionality to transform plain-text reStructuredText documents to HTML and to extract and cross-link documentation automatically from docstrings in Python source code. Reference documentation follows the NumPy docstring standard. You can find a detailed guide on how to document functions, classes, and other objects here, and instructions on how to build the documentation here. Recently, we have created a new repository for Jupyter Notebooks, numpy-tutorials. This is built separately using tools from the Jupyter Book project.
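To give a rough idea of what a docstring in the NumPy docstring standard looks like, here is a small sketch. The function `rescale` is hypothetical and not part of NumPy; it only illustrates the layout of common sections (Parameters, Returns, Examples) that Sphinx can extract into reference pages.

```python
def rescale(values, factor=1.0):
    """
    Multiply every value in a sequence by a constant factor.

    Parameters
    ----------
    values : sequence of float
        Input values to rescale.
    factor : float, optional
        Multiplier applied to each value. Default is 1.0.

    Returns
    -------
    list of float
        The rescaled values, in the same order as `values`.

    Examples
    --------
    >>> rescale([1.0, 2.0, 3.0], factor=2.0)
    [2.0, 4.0, 6.0]
    """
    # Hypothetical helper used only to carry the docstring above.
    return [v * factor for v in values]
```

The Examples section uses doctest syntax, so the usage shown in the docstring can be checked automatically against the code.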

NumPy's approach to documentation work

The documentation and development teams drive and decide on doc changes as they are proposed. Documentation tasks and issues are maintained on our GitHub issue tracker. Changes to the documentation are made via pull requests on GitHub and reviewed with our standard review process, which is the same for documentation and code (see our contributing guide). For any new features added to NumPy, comprehensive reference documentation, including usage examples, must be added at the same time as the code. New educational content, such as Tutorials and How-Tos, can be proposed directly as issues or discussed on the mailing list first. We also welcome ideas around accessibility, usability and inclusion when creating documentation for NumPy. The Documentation Team holds bi-weekly meetings to openly discuss goals and projects with the NumPy community.

Current documentation:

Contact

As a community-driven project, we try to have all conversations about NumPy in public. The main venue for discussions related to the development of NumPy (which includes GSoD) is the numpy-discussion mailing list: https://mail.python.org/mailman/listinfo/numpy-discussion. Please register and post to that list to discuss a GSoD proposal or idea. If you want to discuss something in private first, please contact the NumPy GSoD coordinators at numpy-scipy-gsod@googlegroups.com. You are also welcome to join the Documentation Team meetings (announced on the mailing list) if you want to know more about the project.

Project idea: High-level restructuring and end-user focus

NumPy serves many kinds of users: students new to programming or Python, educators, researchers, domain experts in one of the areas that NumPy covers, data scientists, library developers, packagers, and more. NumPy's documentation is also huge (the PDF version of the last release is over 1500 pages). The challenge is to provide ways to guide those users to the parts of the documentation most relevant to them. We would love to work with a technical writer who can help us address this challenge.

Possible topics include:

  • Creating high-level documentation, such as Tutorials and How-Tos, covering topics that are missing from the official documentation.
  • Adding domain-specific Tutorials to the NumPy Tutorials repository.
  • Populating the "NumPy Fundamentals" section, organizing the content currently scattered through the reference documentation.
  • Rewriting a section of the User Guide (that can then serve as a template for other/new sections).
  • Removing duplication from the User Guide to improve searchability and discoverability for users.
  • Adding non-textual images or graphics to enhance the textual explanations.
  • Updating out-of-date references and refactoring content to latest best practices.
  • Identifying the scope of the NumPy documentation shipped with the code.
  • Integrating the documentation more cleanly with the growing body of online literature on scientific computing, data science, resources for learning Python, NumPy, and performance considerations when writing code.
  • Consulting the SciPy user survey (conducted in 2019) and the NumPy user survey (conducted in 2020) to get an overview of the features and improvements the community most often asks for.

We assume that the technical writer who takes on this project is not yet familiar with NumPy and its community. This means the project plan will need to include time at the start for getting familiar with both. Mentors will provide walkthroughs and set up time for discussions or interviews with different kinds of end users and content experts. We are open to writers' suggestions and contributions.

Project idea: Identify and create pathways for domain experts

Many domain experts (scientists, engineers and other people who use NumPy in their work) have specific needs when it comes to documentation. While they may use NumPy regularly, they may be looking for best practices and optimal usage of advanced techniques. In addition, some of the existing code and tutorials around the web may be outdated or not relevant to current versions of NumPy. Two specific examples are numpy.random and numpy.f2py, for which some of the online guides and answers are invalid or outdated.
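One plausible instance of the numpy.random situation, offered as an illustration rather than as the specific case the authors had in mind: many older guides seed and draw from the module-level functions, while NumPy 1.17 and later also provide an explicit Generator object through `numpy.random.default_rng`. The sketch below shows both styles side by side; the seed value 42 is arbitrary.

```python
import numpy as np

# Older style found in many online answers: seed and draw from a hidden,
# module-level global state.
np.random.seed(42)
legacy_sample = np.random.rand(3)

# Newer style (NumPy 1.17+): create an explicit Generator and draw from it.
rng = np.random.default_rng(42)
modern_sample = rng.random(3)

# The two styles use different underlying bit generators, so the same seed
# is not expected to produce the same numbers.
print(legacy_sample)
print(modern_sample)
```

Passing the generator object around explicitly keeps random state local to the code that uses it, which is one reason a how-to on migrating older snippets could be useful to domain experts.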

This project could involve:

  • Creating an index of essential advanced NumPy concepts and techniques
  • Creating new documentation aimed at domain experts
  • Documenting best practices and How-tos with answers to common questions
  • Documenting differences between legacy and modern NumPy code

This project will require some domain knowledge and technical writing skills. An interest in networking and building connections with other writers, users and educators will also be very helpful for this project.

Relevant material that is not yet linked above:

(Org application deadline: March 26 2021, Technical writer exploration phase Apr 16 - May 17, 2021)