Policy for docstring consistency among graph classes #5584
I agree that consistency among the method docstrings for the various classes should be the goal. Based on my current understanding of how docstrings work with inheritance, there are two approaches, each with their own advantages and drawbacks:
Option 1 has the advantage that there is only one docstring that ever needs to be modified, and the method docstrings in any derived class are automatically in sync because they are inherited from the base class. The downside is that extra effort is needed to make sure the docstring is generic enough to apply to all of the subclasses. This will likely have consequences for typing (e.g. parameter/return types), and there will likely be cases where a docstring has to carry extra information that is only relevant for a subset of the derived classes (e.g. in the Examples section).

Option 2 has the advantage that it is completely flexible: each docstring can be perfectly tailored to the individual class. The downside is that it is more difficult to maintain consistency between the docstrings, as illustrated by #5529. For example, if a necessary change should apply to the method docstring for all of the graph classes, it has to be made explicitly in 4 separate docstrings.

There is another option which I didn't enumerate: combine the two approaches by developing a docstring generator that takes a template docstring and automatically modifies it for each class. See e.g. #5416. Generally I think this really only works well for simple cases (e.g. …).

Without taking a detailed look at the method docstrings for the inherited classes (which is not only the graph classes, by the way; this also applies to the various views etc.) I don't have a sense for whether one approach is better than another. In the case of … and evaluate the procedure: How much information is generic? How significant is the improvement when the information is customized on a per-class basis? How much effort/review goes into ensuring that the docstrings are all consistent? This exercise may also unlock new/better ideas involving templating/auto-generating docstrings.
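The template idea above can be sketched with the standard library alone. This is a minimal sketch, not necessarily how #5416 approaches it. `DEGREE_TEMPLATE`, `from_template`, `ToyGraph` and `ToyMultiGraph` are hypothetical names. Only the return-type names `DegreeView` (for `Graph.degree`) and `MultiDegreeView` (for `MultiGraph.degree`) come from networkx. `string.Template` is used instead of `str.format` so that literal braces in examples (e.g. dict output) do not need escaping.

```python
import string

# Hypothetical shared template: $cls_name and $view_type vary per class.
DEGREE_TEMPLATE = """Return the degree of a node or nodes in a $cls_name.

Returns
-------
$view_type or int
    A $view_type when called without a single node, otherwise an int.
"""


def from_template(template, **fields):
    """Hypothetical decorator: fill `template` and install it as the docstring."""

    def decorate(func):
        # substitute() raises KeyError if a placeholder is left unfilled,
        # so a forgotten field fails loudly when the class body executes.
        func.__doc__ = string.Template(template).substitute(fields)
        return func

    return decorate


class ToyGraph:  # hypothetical stand-in for nx.Graph
    @from_template(DEGREE_TEMPLATE, cls_name="Graph", view_type="DegreeView")
    def degree(self, nbunch=None):
        return None  # behavior elided; only the docstring matters here


class ToyMultiGraph(ToyGraph):  # hypothetical stand-in for nx.MultiGraph
    @from_template(DEGREE_TEMPLATE, cls_name="MultiGraph", view_type="MultiDegreeView")
    def degree(self, nbunch=None):
        return None


print(ToyMultiGraph.degree.__doc__.splitlines()[0])
# Return the degree of a node or nodes in a MultiGraph.
```

The maintainability concern raised in this thread still applies. Someone reading the source sees the template name rather than the final text, and `help()` output only shows the generated version.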
Nice summary and statement of the problem and some possible solutions. The current doc_strings were created under the policy "have a single doc_string for all 4 graph cases and copy any changes to all 4 supposedly identical versions". Of course, over time they have diverged, but I think only slightly. The idea behind copying the identical doc_strings was that someone reading the code has the doc_string right there with the code, and no special code is needed to attach the inherited doc_strings to the class methods. When the views and example subclasses arrived, no global effort was made to update/unify the doc_strings. So, it is not just … I think we didn't go with option 2 because …
Now that we have the wonderful CI-magic, we could construct a test that tells us every time a doc_string in one class is changed but the others are not updated. This would enforce synchrony without removing the docs from the code. For option 2, where the doc_strings differ, is there a CI-magic solution to ensure that they have a similar style and explain the differences between subclass methods, rather than just describing what one method does? I like the idea of doing a case study with …
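Here is a minimal sketch of the CI check described above. It assumes, hypothetically, that class-specific remarks are confined to a numpydoc "Notes" section, following the suggestion in this thread to describe differences in a "Note". Everything outside that section must then match exactly. `strip_notes`, `docstring_mismatches` and the toy classes are illustrative names, not networkx code. A real test might pass `nx.Graph`, `nx.DiGraph`, `nx.MultiGraph` and `nx.MultiDiGraph` instead.

```python
import inspect
import re

# Matches a numpydoc "Notes" section up to the next section header or the end.
_NOTES_SECTION = re.compile(r"^Notes\n-+\n.*?(?=^\S[^\n]*\n-+\n|\Z)", re.M | re.S)


def strip_notes(doc):
    """Hypothetical helper: drop the (assumed class-specific) Notes section."""
    return _NOTES_SECTION.sub("", doc).strip()


def docstring_mismatches(classes, method_names):
    """Return (method, reference_class, other_class) triples whose docs differ."""
    mismatches = []
    reference_cls = classes[0]
    for name in method_names:
        # inspect.getdoc dedents docstrings, so indentation is not compared.
        reference = strip_notes(inspect.getdoc(getattr(reference_cls, name)) or "")
        for cls in classes[1:]:
            doc = strip_notes(inspect.getdoc(getattr(cls, name)) or "")
            if doc != reference:
                mismatches.append((name, reference_cls.__name__, cls.__name__))
    return mismatches


class ToyGraph:  # hypothetical
    def degree(self):
        """Return the degree of each node.

        Returns
        -------
        DegreeView or int
        """


class ToyMultiGraph:  # hypothetical: same text plus a class-specific Notes section
    def degree(self):
        """Return the degree of each node.

        Returns
        -------
        DegreeView or int

        Notes
        -----
        Parallel edges each count toward the degree.
        """


class ToyDiGraph:  # hypothetical: this copy has drifted
    def degree(self):
        """Return the in-degree plus out-degree of each node."""


print(docstring_mismatches([ToyGraph, ToyMultiGraph, ToyDiGraph], ["degree"]))
# [('degree', 'ToyGraph', 'ToyDiGraph')]
```

A failing check like this keeps the copies next to the code, as preferred above, while still flagging drift in CI.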
This is a good point and something I didn't consider above: docstring inheritance works for the interactive-help use case (e.g. …)
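The sketch below uses hypothetical toy classes to illustrate the trade-off between inheritance and reading the source. An override without its own docstring has `__doc__` set to `None`, so nothing appears next to that code. Even so, `inspect.getdoc` still finds the base-class text, because it falls back to the inheritance hierarchy for methods (Python 3.5+).

```python
import inspect


class ToyBaseGraph:  # hypothetical base class holding the single docstring
    def degree(self, nbunch=None):
        """Return the degree of a node or nodes."""
        return 0


class ToyMultiGraph(ToyBaseGraph):  # hypothetical subclass
    def degree(self, nbunch=None):
        # Overridden behavior but no docstring: someone reading this file
        # sees no documentation next to the code.
        return 1


method = ToyMultiGraph().degree
print(method.__doc__)          # None
print(inspect.getdoc(method))  # Return the degree of a node or nodes.
```

Whether a particular documentation tool performs this lookup is worth checking per tool. Sphinx autodoc, for example, controls it with its `autodoc_inherit_docstrings` option.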
It looks like PR #5529 didn't fix the problem for …
I’m adding a pointer to #5699 because it relates strongly to this conversation. It started by updating the doc_string of …
This discussion was originally brought up in #5529 in the context of updating the `degree` method docstring for the `MultiGraph` class. The question then becomes: what should be done about the `degree` method for the other classes (`Graph`, `DiGraph`, etc.)? What is the best approach for keeping the class+method docstrings in sync (and correct from a typing perspective) while taking into account the differences between the classes?

Original discussion from @dschult below:

The 4 graph classes have overlapping and intentionally repetitive doc_strings across the classes. Whenever we change one, we need to change (or at least be aware of) the others. If we improve one, we should make that change to the others. If we put special comments about one graph class into one doc_string, we should try to keep that comment easily identified so that readers can recognize the rest as being the same. For example, add a "Note" describing the differences rather than just changing one or two words in a paragraph of the doc_string.
There is a tension between keeping the doc_strings in all graph classes close to being the same, and adding the tweaks to make them specialized for each class. When the docs are the same for a method that belongs to e.g. Graph, DiGraph, MultiGraph and MultiDiGraph, there is a unified presentation. Maintainers have fewer doc_strings to parse (but more to remember to update), and users can more easily recognize the docs when they look at the same method in another class and understand that the methods do the same thing. When the docs differ, the presentation diverges over time but can be tailored to each class. Maintainers don't have to remember to go and change all 4 copies of the doc_string when one is changed (though maybe they should because it is probably better for all the classes). And users get docs that are specialized for that class. But they lose the ability to see what is the same across the classes.
You can see our attempts to keep the doc_strings the same in the Examples section just below the changes in this PR.
`>>> G = nx.Graph()  # or nx.DiGraph() or nx.MultiGraph() or nx.MultiDiGraph()`
appears in all 4 graph classes. This way the user can understand that they all give the same output after reading just one doc_string. We could move toward making the examples in each graph class use that graph class. But I would claim that the doc_string would then be ever so slightly less useful.
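The `# or nx.DiGraph() ...` comment is currently not tested for the other classes. One way to check that a shared example really gives the same output for every class is to run it once per class with the standard `doctest` module. This is a minimal sketch. The `GraphClass` placeholder, `run_shared_example`, and the toy classes are hypothetical stand-ins for the networkx classes.

```python
import doctest

# Shared example; the hypothetical name GraphClass is rebound on each run.
EXAMPLE = """
>>> G = GraphClass()
>>> G.add_edge(0, 1)
>>> G.add_edge(1, 2)
>>> G.number_of_nodes()
3
"""


class ToyGraph:  # hypothetical undirected graph
    def __init__(self):
        self._adj = {}

    def add_edge(self, u, v):
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set()).add(u)

    def number_of_nodes(self):
        return len(self._adj)


class ToyDiGraph(ToyGraph):  # hypothetical directed graph
    def add_edge(self, u, v):
        # Store successors only, but still register both endpoints as nodes.
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set())


def run_shared_example(example, classes):
    """Run `example` once per class; return {class name: number of failures}."""
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    failures = {}
    for cls in classes:
        test = parser.get_doctest(example, {"GraphClass": cls}, cls.__name__, None, 0)
        # out= swallows the failure report; only the count is kept.
        failures[cls.__name__] = runner.run(test, out=lambda text: None).failed
    return failures


print(run_shared_example(EXAMPLE, [ToyGraph, ToyDiGraph]))
# {'ToyGraph': 0, 'ToyDiGraph': 0}
```

This keeps the single shared example described above while still exercising every class.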
Perhaps we should change the return type to `MultiDegreeView or DegreeView or OutDegreeView or MultiOutDegreeView or int`. But that certainly doesn't seem clear, and it might mess up typing. We could try to add another line to the doc_string stating that the return type varies from one class to another but that the types are basically the same.

If we choose to make the doc_strings differ by graph class, then we should maybe make the examples differ by graph class as well. That would test all the graph classes for the example in this doc_string (which we currently don't do). But someone will have to keep track of the doc_strings in the 4 classes and make sure that improvements in one get propagated to the others.
What is the better philosophy for the doc_strings of methods shared by similar subclasses? I am reminded of the 7 subclasses in scipy.sparse for the different types of sparse representations.