Invalid URL error when running sugarcoat pipeline on any public url #23

Open
AndreiCBogdan opened this issue Sep 12, 2022 · 1 comment

Comments

@AndreiCBogdan

I have managed to install the sugarcoat tool and download the PageGraph binary successfully, but I'm having issues actually using the tool on any URL.
I am using the command from the README:
npm run sugarcoat-pipeline -- -b <PATH_TO_PAGEGRAPH_BINARY> -u <URL> -t <SECS_TO_RUN_PAGEGRAPH> -l <FILTERLISTS>

This command opens the given URL in Brave without any issues, but the tool later fails with an 'Invalid URL' message in the terminal. Oddly, I don't think any error is actually thrown.

I searched the code for 'Invalid URL', and the only place I could find it is embedded in a PageGraph GraphML file, which is very hard to decipher.
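For reference, one possible reason the string does not appear in the project's own source (an assumption, not something confirmed in this thread) is that it comes from the JavaScript runtime itself: the built-in `URL` constructor throws a `TypeError` whose message contains "Invalid URL" when given a string that is not an absolute URL. The sketch below only illustrates that behaviour; `describeUrl` is a hypothetical helper and is not part of sugarcoat.

```javascript
// Hypothetical helper (not part of sugarcoat): reports whether a string
// parses as an absolute URL with the runtime's built-in URL constructor.
function describeUrl(input) {
  try {
    const parsed = new URL(input);
    return { ok: true, href: parsed.href };
  } catch (err) {
    // The runtime, not application code, supplies this message.
    return { ok: false, message: err.message };
  }
}

console.log(describeUrl('https://example.com/')); // parses: scheme present
console.log(describeUrl('example.com'));          // fails: no scheme
console.log(describeUrl(''));                     // fails: empty string
```

If this is the source, a URL-like value somewhere in the input (for example one recorded in the GraphML file) may be relative, empty, or missing its scheme. Logging the value just before it is parsed would confirm or rule this out.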

Device:
MacBook Pro (Intel)
OS:
macOS Monterey

Any guidance or ideas are much appreciated.

@AndreiCBogdan
Author

My initial hypothesis was that the PageGraph crawler was not working. However, I have since generated the GraphML files separately and passed them to the sugarcoat pipeline using the -g flag, and I still receive the same error.
