Tom1380/Subsearch

Subsearch

Subsearch is an API that indexes YouTube subtitles and makes them searchable.

How it works

Given a YouTube video, Subsearch downloads its TTML subtitles with yt-dlp.
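As a rough illustration of the download step, the sketch below builds a yt-dlp command line that fetches only TTML subtitles for one video. The function name `subtitle_command`, the `subs` output directory and the choice of English subtitles are assumptions made for this example. This is not necessarily how `api.py` invokes yt-dlp.

```python
import subprocess  # only needed if you uncomment the run() call below


def subtitle_command(video_id, lang="en", out_dir="subs"):
    """Build a yt-dlp command that downloads TTML subtitles and skips the video.

    The language and output directory are assumptions for this sketch.
    """
    return [
        "yt-dlp",
        "--skip-download",      # subtitles only, no media file
        "--write-subs",         # uploaded subtitles, if any
        "--write-auto-subs",    # fall back to auto-generated subtitles
        "--sub-format", "ttml",
        "--sub-langs", lang,
        "-o", f"{out_dir}/%(id)s.%(ext)s",
        f"https://www.youtube.com/watch?v={video_id}",
    ]


cmd = subtitle_command("jNQXAC9IVRw")
print(" ".join(cmd))
# To actually download (requires yt-dlp on PATH):
# subprocess.run(cmd, check=True)
```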

The subtitles are parsed, converted to a JSON document and indexed in Elasticsearch.
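The parsing step could look roughly like the sketch below, which turns TTML text into a JSON-serializable document. The document layout (`video_id` plus a list of `cues`), the function name `ttml_to_document` and the sample subtitle text are all made up for illustration. The schema Subsearch actually stores in Elasticsearch may differ.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of standard TTML documents.
TTML_NS = "{http://www.w3.org/ns/ttml}"


def ttml_to_document(video_id, ttml_text):
    """Convert TTML subtitle text into one dict (hypothetical schema)."""
    root = ET.fromstring(ttml_text)
    cues = []
    for p in root.iter(TTML_NS + "p"):
        # Join nested text nodes and collapse runs of whitespace.
        text = " ".join("".join(p.itertext()).split())
        if text:
            cues.append({"begin": p.get("begin"), "end": p.get("end"), "text": text})
    return {"video_id": video_id, "cues": cues}


# Invented sample input, not real subtitles.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body><div>
    <p begin="00:00:01.000" end="00:00:03.500">first subtitle line</p>
    <p begin="00:00:03.500" end="00:00:06.000">second subtitle line</p>
  </div></body>
</tt>"""

doc = ttml_to_document("jNQXAC9IVRw", SAMPLE)
print(json.dumps(doc, indent=2))
```

Keeping the `begin` timestamp next to each line of text is what later makes it possible to link to the exact moment in the video.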

When you search for a phrase, the Elasticsearch (ES) index is queried. The ID, highlights and timestamps of each hit identify the relevant video and timestamp, from which a link is built.
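The link-building step can be sketched as follows, assuming the stored timestamp has the form `HH:MM:SS.mmm` and that the link uses YouTube's `t=<seconds>s` URL parameter. The helper names `timestamp_to_seconds` and `build_link` are hypothetical.

```python
def timestamp_to_seconds(ts):
    """Convert an 'HH:MM:SS.mmm' timestamp to whole seconds, rounding down."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + int(float(seconds))


def build_link(video_id, begin):
    """Build a YouTube URL that starts playback at the given timestamp."""
    return f"https://www.youtube.com/watch?v={video_id}&t={timestamp_to_seconds(begin)}s"


link = build_link("jNQXAC9IVRw", "00:01:05.250")
print(link)  # https://www.youtube.com/watch?v=jNQXAC9IVRw&t=65s
```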

There's also an external crawler that uses Google Trends to find hot topics and keywords to feed to the API.

Functionality

First, start the API:

./api.py

Searching for a phrase

curl "localhost:2000/search/emancipate"
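The same request can be made from Python with the standard library. This is a minimal sketch: the helper names are hypothetical, and because the response format is not described here, the body is returned as raw text. Percent-encoding the phrase is an assumption about how the API decodes its path.

```python
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "http://localhost:2000"  # address used in the curl examples


def search_url(phrase):
    """Build the search URL, percent-encoding the phrase (assumed to be accepted)."""
    return f"{API_BASE}/search/{quote(phrase, safe='')}"


def search(phrase, timeout=10):
    """Send the search request and return the raw response body as text."""
    with urlopen(search_url(phrase), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(search_url("emancipate"))
```

With the API running, `search("emancipate")` returns whatever the endpoint sends back.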

Requesting downloads

Videos

curl -XPOST "localhost:2000/request_download/jNQXAC9IVRw"

Channels

curl -XPOST "localhost:2000/request_download/@TheOffice"

YouTube search results

This downloads the subtitles from the first 10 results for the search term query. Change the 10 to download a different number of videos.

curl -XPOST "localhost:2000/request_download/ytsearch10:query"
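The three kinds of download target (video ID, channel handle, `ytsearchN:` query) all go to the same endpoint, so a small Python helper can cover them. This is a sketch with hypothetical function names. Leaving `@` and `:` unescaped mirrors the curl examples, while percent-encoding everything else is an assumption about how the API decodes the path.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "http://localhost:2000"  # address used in the curl examples


def ytsearch_target(query, count=10):
    """Build a 'ytsearchN:query' target for the first `count` search results."""
    return f"ytsearch{count}:{query}"


def download_url(target):
    """Build the request_download URL for a video ID, channel handle or search target."""
    return f"{API_BASE}/request_download/{quote(target, safe='@:')}"


def request_download(target, timeout=10):
    """POST the download request and return the HTTP status code."""
    req = Request(download_url(target), method="POST")
    with urlopen(req, timeout=timeout) as response:
        return response.status


for target in ("jNQXAC9IVRw", "@TheOffice", ytsearch_target("query", count=5)):
    print(download_url(target))
```

With the API running, `request_download(ytsearch_target("query"))` sends the same request as the last curl command.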

Running the crawler

To keep growing the index, you need to keep supplying the API with new videos.

I made a crawler to get hot topics and keywords from Google Trends and feed them to the API.

To run it:

./crawler.py
