Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize reporting known inconsistencies in dask.dataframe API #9187

Open
avriiil opened this issue Jun 15, 2022 · 4 comments
Open

Standardize reporting known inconsistencies in dask.dataframe API #9187

avriiil opened this issue Jun 15, 2022 · 4 comments
Labels
documentation Improve or add to documentation good first issue Clearly described and easy to accomplish. Good for beginners to the project.

Comments

@avriiil
Copy link
Contributor

avriiil commented Jun 15, 2022

Current situation
Portions of the dask.dataframe API are copied over directly from pandas using the @derived_from decorator. This pulls in the pandas docstring for the corresponding object and adds the "This docstring was copied from pandas. Some inconsistencies with the Dask version may exist."

Problem
New users often find the vagueness in this docstring confusing and intimidating -- Are there really inconsistencies? How many of them? What are they?

Proposed Situation

  • Adopt a standardised method for reporting known inconsistencies. I've made a first attempt in Include known inconsistency in DataFrame str.split accessor docstring  #9177 - this format is very much open to discussion.
  • Launch an organised effort to comb through the dask.dataframe API systematically to report known inconsistencies wherever possible. I'm not really sure what the best way to do this would be - perhaps a meta-issue or item in the overall Dask project board (if that exists?)

This process could be replicated for dask.array

@github-actions github-actions bot added the needs triage Needs a response from a contributor label Jun 15, 2022
@pavithraes pavithraes added documentation Improve or add to documentation discussion Discussing a topic with no specific actions yet and removed needs triage Needs a response from a contributor labels Jun 15, 2022
@jrbourbeau
Copy link
Member

Thanks @rrpelgrim. Optionally enumerating important discrepancies between pandas and Dask in auto-generated docstrings seems like a nice addition. We use the @derived_from decorator for automatically copying docstrings into Dask methods

def derived_from(original_klass, version=None, ua_args=None, skipblocks=0):

One option would be to add a new inconsistencies= keyword which takes a list of known discrepancies

@derived_from(
    pd.DataFrame,
    inconsistencies=["This is different", "this other thing is also different"],
)
def method(*args, **kwargs):
    ...

@avriiil
Copy link
Contributor Author

avriiil commented Jun 16, 2022

@jrbourbeau -- I looked at this for a while today but can't quite figure out the correct location in the code where I should add the inconsistencies keyword.

Here's a draft PR that does not build correctly yet: #9192

Should it be added to the ignore_warnings function, the _derived_from helper or the derived_from function itself? Or all three?

@jrbourbeau
Copy link
Member

I would add the new keyword to derived_from and forward it down through any necessary functions so it can be used with the "Some inconsistencies with the Dask version may exist" part of the docstring

@avriiil
Copy link
Contributor Author

avriiil commented Jun 23, 2022

The inconsistencies keyword has been added here: #9192

@pavithraes pavithraes added good first issue Clearly described and easy to accomplish. Good for beginners to the project. and removed discussion Discussing a topic with no specific actions yet labels Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improve or add to documentation good first issue Clearly described and easy to accomplish. Good for beginners to the project.
Projects
None yet
Development

No branches or pull requests

3 participants