Improve efficiency of entry-point parsing #283

jaraco · 2021-02-21T15:50:09Z

In #281, this project added support for uniqueness of distributions when parsing entry points. That change degraded performance when parsing entry points, because the metadata for every project now has to be loaded and inspected. In that PR, a couple of suggestions were made to improve the performance:

- Rely on the discovery name for the distribution rather than the proper name in metadata to disambiguate distributions.
- ~~Rely on a custom, optimized parser instead of ConfigParser for parsing the entry points themselves.~~

Let's consider those two suggestions.

anntzer · 2021-02-21T16:08:05Z

You need to decide how well you want to handle malformed metadata (is misparsing it OK? is throwing some random unhelpful error OK?); obviously, the more carefully you handle it, the slower things will be. If you are OK with "we make no guarantees for malformed metadata", then the second patch at #281 (comment) should basically be usable as-is.

Likewise, it is up to you to decide how to handle distributions with different non-normalized names but identical normalized names. If you're fine with confusing them, then the first patch in the linked issue should likewise be close to usable.

jaraco · 2021-02-22T00:11:46Z

Given that most metadata is mechanically generated, I'm okay with weak error handling, but I also have a good deal of respect for regularity in parsing. That is, I'd like to avoid routines that are heavily imperative and difficult to reason about.

The normalized-names challenge concerns me more, mainly because it's going to demand consideration for distributions that don't present normalized names at all. It's currently not part of the protocol, but simply an implementation detail of PathDistributions that they have a normalized name. Distributions from another source might not have a normalized name at all. I'm less worried about uniqueness variance between PathDistributions with normalized and non-normalized names. If there's a difference, that's going to cause trouble elsewhere. Also, by relying on the normalized name as found in the filesystem, it adds a new dependency on that form, making it more difficult to later change that implementation detail. The only specified, reliable place to retrieve the name is through the metadata. This makes me wonder if maybe there's another approach that could optimize the loading of the proper name from the distribution. I think it's anything but straightforward.
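To make the trade-off concrete, here is a minimal sketch of deduplicating by discovery name, that is, the name read off a `.dist-info`/`.egg-info` directory name instead of the metadata. The helpers `discovery_name` and `unique_by_name` are hypothetical, and the PEP 503-style normalization is an assumption; the normalization this project actually uses may differ.

```python
import re


def discovery_name(dirname):
    """Derive a normalized name from a '*.dist-info' / '*.egg-info' directory name.

    Assumption: the directory is named '<name>-<version>.<suffix>' and the
    name part contains no hyphen. No metadata file is opened.
    """
    stem = dirname.rpartition(".")[0]  # drop the '.dist-info' suffix
    name = stem.partition("-")[0]      # drop '-<version>' and anything after
    # PEP 503-style normalization (assumed, not necessarily the project's rule).
    return re.sub(r"[-_.]+", "-", name).lower()


def unique_by_name(dirnames):
    """Yield the first directory seen for each normalized discovery name."""
    seen = set()
    for dirname in dirnames:
        key = discovery_name(dirname)
        if key not in seen:
            seen.add(key)
            yield dirname


found = ["Foo_Bar-1.0.dist-info", "foo.bar-2.0.dist-info", "baz-0.1.dist-info"]
print(list(unique_by_name(found)))
```

This is fast because it only inspects names, but it illustrates both concerns above: differently spelled names collapse to one key, and a distribution that does not come from the filesystem has no such name to inspect.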

anntzer · 2021-02-22T00:54:28Z

OK, let's deal with the simpler (first) problem first. I'd say the simple parser I wrote in the other thread is really as simple as it gets ("skip empty lines; if a line starts with a bracket it's a new group (record it); else it should have shape '{key} = {value}' and corresponds to a (group, key, value) entry"). It's actually simpler to follow than the ConfigParser-based parser (unless you are super-familiar with ConfigParser details, like what optionxform does...), and also shorter (as you can also delete the unused _from_config now).
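For readers without the other thread at hand, the three rules quoted above can be sketched roughly as follows. This is an illustration only, not the patch from #281; the function name `parse_entry_points` is hypothetical, and it deliberately does nothing for malformed input or comment lines.

```python
def parse_entry_points(text):
    """Yield (group, key, value) triples from entry_points.txt-style text."""
    group = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # rule 1: skip empty lines
        if line.startswith("["):
            group = line.strip("[]")  # rule 2: a bracketed line starts a new group
            continue
        # rule 3: anything else is expected to look like '{key} = {value}'
        key, _, value = line.partition("=")
        yield group, key.strip(), value.strip()


sample = """\
[console_scripts]
foo = pkg.mod:main

[gui_scripts]
bar=pkg.gui:run
"""
for entry in parse_entry_points(sample):
    print(entry)
```

Note the weak error handling: a line without `=` silently becomes a key with an empty value, and an entry before any group header gets `None` as its group.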


anntzer mentioned this issue Feb 23, 2021

Use a hand-written parser for entry points. #285

Merged

jaraco mentioned this issue Mar 13, 2021

data/config path entry_points with minimal examples jupyter/jupyter_core#209

Closed


jaraco mentioned this issue Mar 29, 2021

entry_points parsing fails on comment lines #297

Closed

jaraco added a commit that referenced this issue May 27, 2021

Use normalized names to distinguish unique distributions for performa…

ebe1333

…nce. Fixes #283.

jaraco mentioned this issue May 27, 2021

Use normalized names to distinguish unique distributions for performance #317

Merged

jaraco closed this as completed in #317 May 27, 2021
