Cheapo Tumblr Backup

A scraper for Tumblr blogs that are mostly text.

What you need

  • A working Python 2.x (or 3.x) installation.
  • A Tumblr API key. This utility uses the consumer key (see below).
  • The API URL for the blog you want to scrape.

Running

  • Use pip to install the contents of requirements.txt: pip install -r requirements.txt. Ideally, use virtualenv so these packages are not installed globally, where they may conflict with other Python software.
  • Create a config.yml file in the same directory as the scrape.py script. It should contain the keys:
    • api_key: Must be a quoted string equalling the API consumer key you got from Tumblr.
    • url: The API URL for the blog you want to scrape. This is optional, and can be overridden by the --user option on the script.
  • Run the scrape.py script. It will run for a while and generate a large HTML file called posts.html containing all the text content of your posts. The content of all photo posts will be dumped into the same directory.
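
As a rough illustration of the configuration step, the sketch below builds the text of a config.yml with the two keys described above, and shows how an API key is typically attached to a Tumblr API v2 posts URL. The helper names (build_config_text, posts_request_url) and the example blog address are hypothetical, and scrape.py may build its requests differently; treat this as a sketch under those assumptions rather than a description of the script's internals.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_config_text(api_key, url=None):
    """Return config.yml text: a quoted api_key and an optional url.

    Hypothetical helper for illustration; not part of scrape.py.
    """
    lines = ['api_key: "{}"'.format(api_key)]
    if url:
        lines.append('url: "{}"'.format(url))
    return "\n".join(lines) + "\n"


def posts_request_url(base_url, api_key, offset=0):
    """Attach the consumer key and a paging offset as query parameters.

    Assumes base_url is a Tumblr API v2 posts endpoint with no query string.
    """
    query = urlencode([("api_key", api_key), ("offset", offset)])
    return base_url + "?" + query


if __name__ == "__main__":
    # Placeholder values; substitute your own consumer key and blog.
    blog_url = "https://api.tumblr.com/v2/blog/example.tumblr.com/posts"
    print(build_config_text("YOUR_CONSUMER_KEY", blog_url))
    print(posts_request_url(blog_url, "YOUR_CONSUMER_KEY", offset=20))
```

The api_key value is wrapped in double quotes because the config expects a quoted string. The url line is left out when no URL is given, since that key is optional.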