Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate GhostScript dependency in favor of svgling #2875

Closed
tomaarsen opened this issue Oct 29, 2021 · 8 comments · Fixed by #2863
Closed

Deprecate GhostScript dependency in favor of svgling #2875

tomaarsen opened this issue Oct 29, 2021 · 8 comments · Fixed by #2863

Comments

@tomaarsen
Copy link
Member

Hello!

This is following up from discussion from #1887. I would like to suggest to deprecate the dependency on GhostScript from Tree's _repr_png_:

nltk/nltk/tree.py

Lines 812 to 823 in f12df18

subprocess.call(
[
find_binary(
"gs",
binary_names=["gswin32c.exe", "gswin64c.exe"],
env_vars=["PATH"],
verbose=False,
)
]
+ "-q -dEPSCrop -sDEVICE=png16m -r90 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dSAFER -dBATCH -dNOPAUSE -sOutputFile={} {}".format(
out_path, in_path
).split()

We could start supporting SVG's rather than simply PNG's, by moving to using svgling. This module has been written with NLTK support in mind, so I presume that using it won't be too much of a hastle.

It would move the dependency from some downloaded third party tool which requires subprocess, temporarily files, etc. to a purely Python module.

Beyond how this third party tool is executed, the output also slightly differs, as is expected.
This is the current output in any jupyter notebook:
image

This is the output when simply importing svgling:
image

I would also like to invite the author of svgling, @rawlins, for their opinion on fully deprecating gs in favour of their module. Are there any issues that we should be aware of?

What's everyone's thoughts on this?

  • Tom Aarsen
@stevenbird
Copy link
Member

Great idea!

@iliakur
Copy link
Contributor

iliakur commented Oct 30, 2021

sounds good to me too!

@rawlins
Copy link

rawlins commented Oct 30, 2021

Thanks for proposing this @tomaarsen, I'd be happy for this to happen for my own selfish reasons. No doubt this dependency would stress test svgling in a new way, but I've gotten the impression from the feedback I get that people have found it to be pretty reliable so far in its lifespan.

  • I'm committed to maintaining svgling indefinitely. My time for actively adding big features varies quite a lot though. But I think it is pretty feature complete right now for nltk's (constituency tree) purposes.
  • svgling is not currently in anaconda (it is in pypi). However, I've been thinking of figuring out how to put together a conda package for it and submit it to conda-forge, this change to nltk would probably push that up in my todo list.
  • Compatibility across browsers, including mobile and tablet, is as far as I know very good. Compatibility with svg editors is a little varied/untested (in particular inkscape requires a compatibility mode in the current svgling version, and previously didn't work at all). But I don't know that this is a big issue for nltk users.

@tomaarsen
Copy link
Member Author

I like the sound of all that @rawlins. Perhaps we can take advantage of your experience to best guide an implementation. Here are some ideas I had, and I'd like to get some feedback on them:

Idea 1:

Simply import svgling at the top of the tree.py file. Although this solution is the simplest, it has some downsides:

  • Circular dependenies
  • Users who don't use trees in the context of jupyter notebooks will have to install the dependency.
  • The code responsible for rendering trees is "hidden" behind this (seemingly unused) import.

I would rather define a _repr_svg_ method, so people who read the code can understand the source of the graph.

Idea 2:

A second idea is to simply remove _repr_png_ and implement a _repr_svg_ like so:

    def _repr_svg_(self):
        from svgling import draw_tree

        return draw_tree(self)._repr_svg_()

This is the simplest implementation I could see working well. It tackles all 3 issues I had with idea 1.

Idea 3:

Beyond that, there are some iterations possible on idea 2:

    def _repr_svg_(self):
        from svgling.core import TreeLayout

        return TreeLayout(self)._repr_svg_()
    def _repr_svg_(self):
        from svgling.core import TreeLayout, TreeOptions

        return TreeLayout(self, options=TreeOptions())._repr_svg_()
    def _repr_svg_(self):
        from svgling.core import TreeLayout, nltk_tree_options

        return TreeLayout(self, options=nltk_tree_options)._repr_svg_()

Although possible, I don't feel like passing along this options value adds much, as users can't really modify it regardless.

Beyond these ideas, there might be other implementations that work better in conjunction with svgling. I'm definitely open to suggestions and discussion.


Beyond that, I would be interested in supporting the ability to use the original tool instead. There's a few options:

  1. Expose an e.g. use_legacy_plot function, which when called supports the existing _repr_png_.
  2. Add a plot function which relies on both _repr_png_ and _repr_svg_. We would rename _repr_png_ so no environment takes it as the default plotter, and then this new plot method has a legacy parameter defaulted to False. Through this new plot function, users could still access the old plotting function. However, this method would be very confusing for users who don't use trees in jupyter notebooks etc.
  3. Simply recognise that this new method ought to be better, and there shouldn't be need to still support the legacy _repr_png_.

I fear that option 1 will be extraordinarily hacky, and that option 2 would completely unnecessarily add a new method. I'm not sure if there is a simple and intuitive way to support multiple plotting methods. So I'm leaning towards option 3. After all, trees already support the draw method, which opens a tkinter GUI with the tree plotted using the old method.

I'm open to discussion on this.

  • Tom Aarsen

@rawlins
Copy link

rawlins commented Oct 31, 2021

Totally agree that idea 1 is non-ideal, I would actually prefer to eventually remove the side-effect monkeypatching code from svgling if nltk makes this change. It exists as it currently stands to make life as simple as possible for people who do want to use svgling with nltk, but there's no need for it if this change happens in nltk.

I don't know that I have strong preferences between your options 2-3; I think the main question (for you basically) is whether there is a need for passing rendering options somehow when looking at nltk.Tree objects in jupyter, or whether it's enough to just instruct users to call svgling.draw_tree (etc) directly if they want to customize. I added the global svgling.core.nltk_tree_options originally in case people wanted to do that, e.g. if they hate the default font or something, but I have no real sense if people do actually use it. An options object could could also be made available to modify as a simple dict on the nltk side and passed to draw_tree using the ** syntax, but I probably wouldn't do that unless there's demand.

Re adjusting the png code, one thing to keep in mind is that jupyter attempts to call all reprs regardless of which it displays. I might recommend keeping it as a _repr function but doing better wrapping in try blocks, e.g. along the lines of the nltk_tree_wrapper function seen in this commit (except returning None on the except blocks): rawlins/lambda-notebook@7f151a0. That would make it fail gracefully, which was one of the issues originally in #1887. Probably the crucial thing here if I'm remembering right is to catch tkinter.TclError.

Then, if users have everything installed and don't want the default they could choose a preferred _repr using the display function (e.g. display(tree, include=['image/png']) should do it, or various other approaches using IPython display objects). SVG should take precedence by default over png if both are there for both classic notebook, lab, and I think nbconvert. (I think this might even be configurable.) The one downside to this is that the additional overhead for people who do have the supporting packages installed may be non-negligible if they don't want png.

@stevenbird
Copy link
Member

@tomaarsen I favour the simplest possible solution, since that is also what is easiest to maintain in the long term. Complexities should "pay their way", ie justify the additional effort. NB people have access to the source code for old versions of NLTK if they want legacy functionality.

@tomaarsen
Copy link
Member Author

@rawlins

Totally agree that idea 1 is non-ideal, I would actually prefer to eventually remove the side-effect monkeypatching code from svgling if nltk makes this change. It exists as it currently stands to make life as simple as possible for people who do want to use svgling with nltk, but there's no need for it if this change happens in nltk.

Eventually, you might be able to use nltk.get_version() to check whether to still apply the monkeypatching. That way it works for both new and older NLTK versions.

I don't know that I have strong preferences between your options 2-3; I think the main question (for you basically) is whether there is a need for passing rendering options somehow when looking at nltk.Tree objects in jupyter, or whether it's enough to just instruct users to call svgling.draw_tree (etc) directly if they want to customize. I added the global svgling.core.nltk_tree_options originally in case people wanted to do that, e.g. if they hate the default font or something, but I have no real sense if people do actually use it. An options object could could also be made available to modify as a simple dict on the nltk side and passed to draw_tree using the ** syntax, but I probably wouldn't do that unless there's demand.

I'd rather not rely on nltk_tree_options, as it's an additional dependency which isn't really necessary. Regarding the last bit - do you know if it's possible to e.g. pass additional keyword arguments to display(), so that users can do e.g. display(tree, font_size=12)? If that's not possible, then it might be best to just rely on the default TreeOptions() that's created within svgling.

And as @stevenbird mentioned, it's likely best to keep this simple, and just remove the old outdated _repr_png_.

@rawlins
Copy link

rawlins commented Nov 1, 2021

Regarding the last bit - do you know if it's possible to e.g. pass additional keyword arguments to display(), so that users can do e.g. display(tree, font_size=12)?

Interesting question, I'll look into this, I haven't paid much attention to the extra display arguments before. Though on reflection I think the cleanest solution here would be for svgling to simply provide a means of setting default options at the package level in a general way, rather than specifically for nltk calls.

tomaarsen added a commit to tomaarsen/nltk that referenced this issue Nov 4, 2021
@tomaarsen tomaarsen linked a pull request Nov 7, 2021 that will close this issue
@tomaarsen tomaarsen self-assigned this Nov 7, 2021
stevenbird pushed a commit that referenced this issue Nov 12, 2021
* Resolved un-pickling issue with ParentedTrees for Python 3.7+

* Split tree.py up into its own module

* Fixed improper import from tree/probabilistic.py

* Import tree in the list of packages, instead of modules

* The label for a node doesn't need to be a str, it can be Any

* Renamed
ltk.tree.parse to
ltk.tree.parsing to avoid accidental overwrites

* Moved TreePrettyPrinter to tree package

The old import still works, but throws a warning. Perfect!

* Improved import issue with prettyprinter

* Moved treetransforms functions to tree package

* Heavily shrunk treetransforms.py

* Prevent shallow copy on ParentedTree - added doctest w. documentation

* Deprecate GhostScript in favor of svling

See #2875
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants