robbytx/web-crawler

Single-domain web crawler

Crawl a single domain, outputting the website's sitemap as a list of pages along with each page's static assets and links to other pages.
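The behaviour described above can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the repository's crawl.py. The names `PageParser`, `crawl` and `fetch_html` are hypothetical. `<img src>`, `<script src>` and `<link href>` are assumed to count as static assets. "Single domain" is taken to mean the same host (URL netloc) as the start URL. Pages are visited breadth-first: every link is recorded, but only same-host http(s) links are followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Hypothetical helper: collects link and static-asset URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()
        self.assets = set()

    def _resolve(self, ref):
        # Make relative references absolute and drop any #fragment.
        return urldefrag(urljoin(self.base_url, ref)).url

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.add(self._resolve(attrs["href"]))
        elif tag in ("img", "script") and attrs.get("src"):
            self.assets.add(self._resolve(attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Assumption: every <link href> (stylesheets, icons, ...) is an asset.
            self.assets.add(self._resolve(attrs["href"]))


def fetch_html(url):
    """Hypothetical fetcher: return the page's HTML, or None on error or non-HTML."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return None


def crawl(start_url, fetch=fetch_html):
    """Breadth-first crawl restricted to start_url's host.

    Returns {page_url: {"assets": [...], "links": [...]}}.
    """
    domain = urlparse(start_url).netloc
    sitemap = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser(url)
        parser.feed(html)
        parser.close()
        sitemap[url] = {
            "assets": sorted(parser.assets),
            "links": sorted(parser.links),
        }
        for link in parser.links:
            # Record every link, but only follow http(s) links on the same host.
            parsed = urlparse(link)
            if (parsed.scheme in ("http", "https")
                    and parsed.netloc == domain
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return sitemap
```

Because `fetch` is just a callable, the sketch can be exercised offline, e.g. `crawl("http://example.com/", site.get)` where `site` is a dict mapping URLs to HTML strings. A real crawler would also want politeness delays, robots.txt handling and URL normalization (for example, trailing slashes), none of which are shown here.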

Running it

  1. Install the dependencies from requirements.txt:
$ pip install -r requirements.txt
  2. Then run crawl.py:
$ python crawl.py
