Skip to content

aprovent/twitter-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Twitter Scraper

Twitter Scraper is a simple scraper tool based on playwright. It can scrap multiple tweets and generate for each a beautiful screenshot and a pdf.

Examples:

example.jpg

and

example.pdf

Installation

Before all, you need NodeJs on your computer. Download it on https://nodejs.org/

Clone the project then execute :

npm i

Configuration

Edit the twitter-links.js file and add all the tweet urls you want to scrap.

const urls = [
  "https://twitter.com/DalaiLama/status/1559111653450727425",
  "https://twitter.com/DalaiLama/status/2",
  "https://twitter.com/DalaiLama/status/3",
  "https://twitter.com/DalaiLama/status/4"
];

Execution

npm run scrap

Pdf and Screenshots will be exported in the /tweets directory

Configuration

You can configure the scraper in the index.js file :

Change export directory:

const exportDir = "tweets";

Enable/Disable screenshot or pdf export :

const exportScreenshot = true;
const exportPdf = true;

Increase or decrease scraper timeout :

const timeout = 30; // in seconds

Increase or decrease screenshot/pdf number of page :

const pageCount = 40; // number of page (page height 1024px)

Knwown problems

Timeout

On some big tweets or slow computers, you may encounter a timeout.

In that case, you could increase the timeout

Replacing

const timeout = 30; // in seconds

by (or more)

const timeout = 60; // in seconds

Todo

  • Pass configuration to index.js by arguments
  • Best timeout handle

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published