New html parser #888

Closed
q0w opened this issue Feb 6, 2022 · 7 comments
Labels
⭐ enhancement Improvements for existing features

Comments

@q0w
Contributor

q0w commented Feb 6, 2022

Is your feature request related to a problem? Please describe.

As I mentioned earlier, html5lib will be removed from pip (pip already no longer uses html5lib by default). So maybe switch to html.parser, or add a third-party parser library like html5lib.
Or use a faster parser, like selectolax, to improve performance.

Describe the solution you'd like

I think html.parser is slow, as is html5lib, so adding selectolax could be a solution.

@q0w q0w added the ⭐ enhancement Improvements for existing features label Feb 6, 2022
@frostming
Collaborator

IMO the HTML parser shouldn't be the performance bottleneck, unless you can provide some proof.
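One way to gather the kind of proof asked for here is a small timing run. The following is only a sketch, assuming a synthetic page shaped like a simple index project page; `AnchorCounter`, `make_index_page`, `parse_once`, and the `example.invalid` URLs are made up for illustration and are not part of pip or PDM. It measures parsing time alone, so the result would still need to be compared against network and resolution time before calling the parser a bottleneck.

```python
import timeit
from html.parser import HTMLParser


class AnchorCounter(HTMLParser):
    """Counts <a> tags; stands in for a real index-page link collector."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.count += 1


def make_index_page(n_links):
    # Synthetic page with one anchor per file (hypothetical URLs).
    rows = "\n".join(
        f'<a href="https://files.example.invalid/demo-{i}.tar.gz">demo-{i}.tar.gz</a><br/>'
        for i in range(n_links)
    )
    return f"<!DOCTYPE html><html><body>\n{rows}\n</body></html>"


def parse_once(page):
    parser = AnchorCounter()
    parser.feed(page)
    parser.close()
    return parser.count


page = make_index_page(2000)
# Average over a few runs; absolute numbers depend on the machine.
seconds = timeit.timeit(lambda: parse_once(page), number=5) / 5
print(f"parsed {parse_once(page)} links in about {seconds * 1000:.1f} ms per page")
```

Swapping another parser into `parse_once` would give a like-for-like comparison on the same input.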

@frostming
Collaborator

At present, PDM reuses pip's index parsing, so there isn't room to customize the parser. But we are in the process of dropping pip, and third-party HTML parsers may be worth considering.

@q0w
Contributor Author

q0w commented Feb 6, 2022

Do you think PDM will drop pip before pip fully removes the vendored html5lib?

@frostming frostming added this to the version 2.0 milestone May 7, 2022
@abersheeran
Contributor

https://peps.python.org/pep-0691/

Maybe we don’t need a new HTML parser.
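For context, PEP 691 lets a client ask the index for JSON instead of HTML through the `Accept` header, so no HTML parsing is involved at all. A minimal sketch, assuming an index that supports PEP 691: `build_request`, `file_urls`, the sample document, and the `example.invalid` URLs are hypothetical, and the request is only constructed here, never sent.

```python
import json
import urllib.request

# Media type defined by PEP 691 for the JSON form of the simple API.
PEP691_JSON = "application/vnd.pypi.simple.v1+json"


def build_request(index_url, project):
    """Build (but do not send) a request for a project's JSON page."""
    url = f"{index_url.rstrip('/')}/{project}/"
    return urllib.request.Request(url, headers={"Accept": PEP691_JSON})


def file_urls(document):
    """Return download URLs of non-yanked files from a PEP 691 project document."""
    data = json.loads(document)
    return [f["url"] for f in data["files"] if not f.get("yanked")]


# Hand-written sample in the shape PEP 691 describes (not real index data).
sample = json.dumps(
    {
        "meta": {"api-version": "1.0"},
        "name": "demo",
        "files": [
            {
                "filename": "demo-1.0.tar.gz",
                "url": "https://files.example.invalid/demo-1.0.tar.gz",
                "hashes": {"sha256": "0" * 64},
            },
            {
                "filename": "demo-0.9.tar.gz",
                "url": "https://files.example.invalid/demo-0.9.tar.gz",
                "hashes": {"sha256": "1" * 64},
                "yanked": "broken release",
            },
        ],
    }
)

print(file_urls(sample))
```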

@q0w
Contributor Author

q0w commented Jun 27, 2022

But the existing API won't be deprecated anytime soon.

@frostming
Collaborator

In PDM 2.0 we switched from pip to unearth, which uses html.parser.
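To illustrate what index parsing with the standard library's `html.parser` can look like, here is a minimal sketch. It is not unearth's actual code: `LinkCollector`, the sample page, and the `example.invalid` URLs are assumptions made for this example. It collects each anchor's resolved URL together with its optional `data-requires-python` attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (url, requires_python) pairs from <a> tags on an index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            # Relative hrefs are resolved against the page URL.
            url = urljoin(self.base_url, href)
            self.links.append((url, attributes.get("data-requires-python")))


# Hand-written sample page (not real index data).
page = """<!DOCTYPE html>
<html><body>
<a href="demo-1.0.tar.gz#sha256=abc" data-requires-python="&gt;=3.7">demo-1.0.tar.gz</a>
<a href="https://files.example.invalid/demo-2.0-py3-none-any.whl">demo-2.0-py3-none-any.whl</a>
</body></html>"""

collector = LinkCollector("https://index.example.invalid/simple/demo/")
collector.feed(page)
collector.close()
for url, requires_python in collector.links:
    print(url, requires_python)
```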

@frostming
Collaborator

Please test it on 2.0.0a1


3 participants