Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure dispersion plot spans full text length #3158

Merged
merged 1 commit into from
Dec 17, 2023

Conversation

dlukes
Copy link
Contributor

@dlukes dlukes commented May 23, 2023

By default, Matplotlib autoscales the x-axis based on where hits are found. This may result in an unintuitive visualization of the dispersion which "zooms in" on the portion of the text with hits, failing to represent the preceding and/or following stretches with no hits:

from nltk.draw import dispersion_plot
text = ["foo"] * 10_000 + (["bar"] + ["foo"] * 100) * 5 + ["foo"] * 10_000
ax = dispersion_plot(text, ["bar"])

image

A more intuitive visualization is one where the x-axis always spans the full length of the text. The most straightforward way to achieve this is via ax.set_xlim(0, len(text) - 1), but since this disables autoscaling, it unfortunately also gets rid of the margins added during autoscaling, which add aesthetically pleasing breathing room to the plot:

ax.set_xlim(0, len(text) - 1)

image

So instead, we can explicitly set the data limits via ax.dataLim, and re-apply autoscaling:

ax.dataLim.x0, ax.dataLim.x1 = 0, len(text) - 1
ax.autoscale(axis="x")

image

This is the solution suggested by this PR.

By default, Matplotlib autoscales the x-axis based on where hits are
found. This may result in an unintuitive visualization of the dispersion
which "zooms in" on the portion of the text with hits, failing to
represent the preceding and/or following stretches with no hits. A more
intuitive visualization is one where the x-axis always spans the full
length of the text.

The most straightforward way to achieve this is via `ax.set_xlim(0,
len(text) - 1)`, but since this disables autoscaling, it unfortunately
also gets rid of the margins added during autoscaling, which add
aesthetically pleasing breathing room to the plot. So instead, we can
explicitly set the data limits via `ax.dataLim`, and re-apply
autoscaling.
@dlukes
Copy link
Contributor Author

dlukes commented May 23, 2023

I don't think I'm on the hook for the CI failure with Python 3.11 on macos-latest:

/Users/runner/work/_temp/46fec604-9745-4658-9616-404ec3453b66.sh: line 1: pytest: command not found
Error: Process completed with exit code 127.

But please let me know if I'm misinterpreting this! :)

@stevenbird stevenbird merged commit 59a1dbc into nltk:develop Dec 17, 2023
20 checks passed
@stevenbird
Copy link
Member

Thanks @dlukes. Sorry for the delay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants