I'm looking to replace Reppy with something that is easier to install and maintain. We have some unit tests for our usage of Reppy. Some of these test that wildcards are handled correctly (whatever 'correct' may mean here). One failing test checks the behavior of wildcards in GET parameters: Reppy disallows that URL, while Protego allows it.
Could you shed some light on this? Is this something that should and can be fixed in Protego?
In [1]: from reppy.robots import Robots
In [2]: from protego import Protego
In [3]: robots_txt = """User-agent: *
   ...: Disallow: /*s=
   ...: """
In [4]: reppy = Robots.parse('', robots_txt)
In [5]: protego = Protego.parse(robots_txt)
In [6]: urls = ['https://mysite/', 'https://mysite/s/', 'https://mysite/?s=asd']
In [7]: [reppy.allowed(url, '*') for url in urls]
Out[7]: [True, True, False]
In [8]: [protego.can_fetch(url, '*') for url in urls]
Out[8]: [True, True, True]
I just tested this in Google's robots.txt testing tool and according to their implementation Reppy's behaviour is the correct one.
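For anyone who wants to dig into this, here is a rough sketch of the matching behaviour that Reppy and Google's tester appear to agree on. It assumes a rule is matched against the URL path plus the query string, that `*` matches any run of characters, and that a trailing `$` anchors the end. This is not Protego's or Reppy's actual code; `rule_to_regex` and `is_disallowed` are hypothetical helper names made up for illustration.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex (illustrative only)."""
    # A trailing '$' anchors the rule to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url, disallow_pattern):
    """Check one Disallow pattern against the path *and* query string of a URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        # Assumption: the query string takes part in matching.
        target += "?" + parts.query
    # Rules are prefix matches, so match from the start of the target.
    return rule_to_regex(disallow_pattern).match(target) is not None


urls = ['https://mysite/', 'https://mysite/s/', 'https://mysite/?s=asd']
results = [is_disallowed(url, "/*s=") for url in urls]
print(results)
```

Under those assumptions only the third URL is disallowed, which lines up with Reppy's `[True, True, False]` for `allowed`. If Protego drops or handles the query string differently before matching, that could explain the difference, though I have not verified this in its source.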
@Gallaecio It would be nice to see this addressed as I'm also looking to replace Reppy and stumbled upon this while considering Protego as a candidate.
You provided a great test case already, and Protego is about 500 lines of code in a single file. I think that’s enough of a starting point for many people. To provide more, I would need to spend some time on this (which I don’t have right now) 😅.