Google Season of Docs 2021: NumPy Case Study

Melissa Weber Mendonça edited this page Nov 30, 2021 · 1 revision

GSoD NumPy case study

Project: High-level restructuring and end-user focus

  • Organization: NumPy
  • Organization Description: NumPy is the universal standard for working with numerical data in Python and is at the core of the scientific Python and PyData ecosystems. It provides ndarray, a homogeneous n-dimensional array object with methods to efficiently operate on it. The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image, and most other data science and scientific Python packages.

Problem statement

Following the ideas outlined in NumPy Enhancement Proposal (NEP) 44, which describes our plans for the future of NumPy's documentation, we consider the documentation to be divided into four parts: Reference, Tutorials, How-tos and Explanations (which we call NumPy Fundamentals). Although we have mostly complete reference documentation for each function and class exposed to users, some of them lack usage examples. Many explanations are also mixed in with the reference documentation, and we would like to move them into an expanded NumPy Fundamentals section. We also wish to reduce duplication of content in our User Guide, which hinders searchability and usability. Finally, preliminary results from our 2020 User Survey identify documentation as one of the main areas for improvement in NumPy, and there is clear demand for tutorials and how-tos for users with different experience levels and domain-specific knowledge.

Proposal abstract

NumPy serves many kinds of users: students new to programming or Python, educators, researchers, domain experts in one of the areas that NumPy covers, data scientists, library developers, packagers, and more. NumPy's documentation is also huge (the PDF version of the last release is over 1500 pages). The challenge is to provide ways to guide those users to the parts of the documentation most relevant to them.

Our priorities for this project are:

  • Creating high-level documentation, in the Tutorial or How-to format, covering topics that are missing from the official documentation.
  • Populating the "NumPy Fundamentals" section, organizing the content currently scattered through the reference documentation.
  • Removing duplication from the User Guide to improve searchability and discoverability for users.

The entire proposal can be found at Google Season of Docs 2021: Submitted Project.

Project Description

Creating the proposal

This proposal was created with two clear goals: continuing the documentation reorganization our project has been implementing since NEP 44 was published, and creating new documents that would benefit our users. The proposal was shared openly with members of our community, including the NumPy Documentation Team, and the feedback we collected was incorporated into it. To prioritize actions for the proposal and make sure it was viable within the timeframe of the Season of Docs program, we drew in part on our users' responses to the NumPy 2020 User Survey, which helped us understand common pain points and improvement opportunities.

Budget

We requested a full budget because we understood this was the best way to make sure our technical writer would have the time to be effectively onboarded into NumPy and would be able to do their best work. We did not face any challenges with the budget. Both mentors (Ross Barnowski, Melissa Mendonça) were partially funded by other sources to work on NumPy during the Season of Docs period, and were able to dedicate time and resources towards reviewing and mentoring the technical writer.

Participants

Our technical writer (@mukulikaa) applied through the Season of Docs program, and her Statement of Interest was complete and resonated with our own ideas about ways forward. Our previous experience with the Google Season of Docs program helped us understand our own dynamics and estimate how much work a technical writer could realistically take on, and we felt that the timeline presented was adequate and reasonable.

In practice, our technical writer (@mukulikaa) and both mentors (@rossbar and @melissawm) worked together throughout the Season of Docs period. Other maintainers also participated in reviewing and guidance. This gave the technical writer additional feedback and allowed her to make several valuable contributions that were not initially part of her project. We communicated often and the project went smoothly, with no issues or problems.

Timeline

The timeline presented in the original project was followed closely, with no unexpected problems. Currently, there is one document in review, and we expect it to be merged within the next 7 days.

Results

The following pull requests are directly associated with this project:

This work has resulted in several new or reorganized pages for the NumPy documentation, as well as several other valuable contributions not directly related to the project. These documents had been requested by users and other developers for some time, and a lot of duplicate content has been removed, which improves the discoverability and usability of our documentation.

As mentioned before, one last PR is in review (link to tutorial).

Metrics

In our original project, we mentioned we would consider the project successful if:

  • At least two pages are added to NumPy Fundamentals.
  • Either two How-tos or one How-to and one Tutorial are created.

Both goals have been achieved, with the additional positive result that the technical writer was effectively onboarded into our community and made contributions such as responding to issues and reviewing pull requests.

Analysis

The constant and clear communication with the technical writer was the main reason for the success of this proposal. Mukulika was always very organized and indicated problems or blockers quickly, which allowed us to follow the timeline with no issues.

We consider this project very successful, both in terms of the content produced and in terms of the onboarding of our technical writer into the community.

Summary

Overall, this was a highly positive experience for NumPy. Our technical writer produced high-quality work, showing independence, strong technical skills, and great communication skills. She has also expressed interest in continuing to contribute to NumPy.

In terms of planning, we feel that the timeline was adequate and that the main goals of the project have been achieved. NumPy is a complicated project, and working on its documentation requires stronger technical skills than in other projects. This calls for a longer onboarding period, which we planned for and hope to repeat in the future.

In the future, we hope to have clearer developer documentation that allows technical writers who are not necessarily as technically minded to contribute. This is in our plans and could help us create different projects focused on other aspects, such as concrete use cases and applications of NumPy or interoperability with other libraries, which are topics of interest to current users and developers of NumPy.