Protego differs from reppy in handling of wildcards for GET-params #29

Open
gertjanol opened this issue Sep 16, 2022 · 4 comments
Labels
bug Something isn't working

Comments

gertjanol commented Sep 16, 2022

I'm looking to replace Reppy with something that is easier to install and maintain. We have some unit tests for our usage of Reppy. Some of these check that wildcards are handled correctly (whatever 'correct' may mean here). One failing test covers the behavior of wildcards in GET parameters: Reppy disallows the URL `https://mysite/?s=asd` under the rule `Disallow: /*s=`, while Protego allows it.

Could you shed some light on this? Is this something that should and can be fixed in Protego?

In [1]: from reppy.robots import Robots

In [2]: from protego import Protego

In [3]: robots_txt = """User-agent: *
   ...: Disallow: /*s=
   ...: """

In [4]: reppy = Robots.parse('', robots_txt)

In [5]: protego = Protego.parse(robots_txt)

In [6]: urls = ['https://mysite/', 'https://mysite/s/', 'https://mysite/?s=asd']

In [7]: [reppy.allowed(url, '*') for url in urls]
Out[7]: [True, True, False]

In [8]: [protego.can_fetch(url, '*') for url in urls]
Out[8]: [True, True, True]

Gallaecio added the bug label on Sep 16, 2022

@PLPeeters

I just tested this in Google's robots.txt testing tool, and according to their implementation, Reppy's behaviour is the correct one.

@Gallaecio It would be nice to see this addressed as I'm also looking to replace Reppy and stumbled upon this while considering Protego as a candidate.
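
For reference, here is a minimal sketch of the kind of matching that would produce that result. It assumes the rule is matched as a prefix against the URL path plus the query string, that `*` matches any run of characters, and that a trailing `$` anchors the end, which is how I understand Google's documented behaviour. This is not Protego's or Reppy's actual code, and `pattern_to_regex` and `is_disallowed` are hypothetical names used only for illustration.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern into a regex (hypothetical helper).

    `*` matches any run of characters; a trailing `$` anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with `.*` for every `*`.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url, disallow_pattern):
    """Return True if the URL's path plus query string matches the pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        # Assumption: the query string is part of the text being matched.
        target += "?" + parts.query
    # re.match anchors at the start only, so the rule acts as a prefix match.
    return pattern_to_regex(disallow_pattern).match(target) is not None


urls = ["https://mysite/", "https://mysite/s/", "https://mysite/?s=asd"]
print([not is_disallowed(url, "/*s=") for url in urls])
```

With the rule from the report, this prints `[True, True, False]`, the same result Reppy gives. A real parser would also have to handle `Allow` rules, rule precedence and percent-encoding, all of which this sketch leaves out.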

@Gallaecio
Member

It may take a while for someone from the core team to get around to this one, so feel free to open a PR if you have the time and motivation.

@gertjanol
Author

@Gallaecio Can you or someone from the core team provide some pointers on what needs to be done? That might help entice someone to give this a go.

@Gallaecio
Member

Gallaecio commented Nov 6, 2023

You provided a great test case already, and Protego's code is about 500 lines in a single file. I think that's enough of a starting point for many people. To provide more, I would need to spend some time on this that I don't have right now 😅.
