Scrapers to scrape art images and details from Wikiart and Wikimedia Commons

sidhantls/scrape_art

Scrape Art

Scrapers that collect art images and metadata from Wikiart and Wikimedia Commons. The purpose is to help contribute an art dataset to academia for non-commercial machine learning research, for example in image captioning, image generation, or image classification. Metadata such as genre, style, and title, together with the images themselves, supports a broad range of machine learning research.

How to run

  • Download the Selenium ChromeDriver and move it to the repo's root.
  • notebook_wikiart_scraper.ipynb: Python notebook that carries out the crawling. The notebook format is useful for debugging and development.
  • python wikiart_scraper.py: runs the crawler with multiprocessing, which retrieves images and metadata faster.
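
To illustrate why the multiprocessing route is faster, here is a minimal sketch of fanning image downloads out over a worker pool. It is not the code in wikiart_scraper.py: the names `safe_filename`, `download_artwork`, and `download_all`, the `(title, image_url)` input shape, and the User-Agent string are all hypothetical, and it uses only the standard library.

```python
import re
from multiprocessing import Pool
from urllib.request import Request, urlopen


def safe_filename(title, ext=".jpg"):
    """Turn an artwork title into a filesystem-friendly file name (hypothetical helper)."""
    stem = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_").lower()
    return (stem or "untitled") + ext


def download_artwork(item):
    """Worker: fetch one image and save it to the current directory.

    `item` is assumed to be a (title, image_url) pair; the real scraper
    may pass different data.
    """
    title, url = item
    request = Request(url, headers={"User-Agent": "art-dataset-research"})
    try:
        with urlopen(request, timeout=30) as response:
            data = response.read()
    except OSError:
        # Network errors (URLError is a subclass of OSError): skip this item.
        return title, None
    path = safe_filename(title)
    with open(path, "wb") as handle:
        handle.write(data)
    return title, path


def download_all(items, processes=4):
    """Download many artworks in parallel; results arrive in completion order."""
    with Pool(processes) as pool:
        return list(pool.imap_unordered(download_artwork, items))
```

When calling `download_all` from a script, wrap the call in an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork. Downloads are network-bound, so several workers can wait on responses at the same time instead of one after another.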

Requirements

  • bs4 (BeautifulSoup)
  • urllib (part of the Python standard library)
  • selenium
  • regex
  • tqdm
