youtube-audio-and-transcript-extract

Split YouTube audio into segments, each paired with its corresponding transcript, to build speech datasets.

How can I handle a YouTube dataset of Indian-accented speech and segment it with a correct transcript?

First, download a playlist of YouTube videos of Indian speakers as .mp3 audio, together with the .vtt subtitle file for each video.

A .vtt file lists each caption as a start and end time followed by the transcript text. I segment each YouTube audio file using these start and end times.
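The segmentation depends on reading the start and end time of each caption out of the .vtt file. The repository's own parser is not shown here, so the following is a minimal sketch of a WebVTT cue reader using only the standard library. It assumes each cue is a timing line (`start --> end`, hours optional) followed by text lines up to a blank line, and it strips inline tags such as `<c>` and the word-level timestamps that YouTube adds to auto-generated captions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches a cue timing line such as "00:01:02.345 --> 00:01:05.000 align:start".
# WebVTT allows the hours field to be omitted ("01:02.345").
TIMING_RE = re.compile(
    r"(?P<start>(?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+(?P<end>(?:\d+:)?\d{2}:\d{2}\.\d{3})"
)
TAG_RE = re.compile(r"<[^>]+>")  # inline tags like <c>...</c> or <00:00:01.234>


@dataclass
class Cue:
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to seconds."""
    parts = stamp.split(":")
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt(text: str) -> list[Cue]:
    """Return every cue in a WebVTT document, in file order."""
    cues = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = TIMING_RE.search(lines[i])
        i += 1
        if not match:
            continue  # header, cue identifier, NOTE, or blank line
        body = []
        # The cue text runs until the next blank line.
        while i < len(lines) and lines[i].strip():
            body.append(TAG_RE.sub("", lines[i]).strip())
            i += 1
        cue_text = " ".join(part for part in body if part)
        cues.append(Cue(to_seconds(match["start"]), to_seconds(match["end"]), cue_text))
    return cues
```

YouTube's auto-generated captions often use overlapping "rolling" cues that repeat the previous line, so you may need to deduplicate or merge cues before cutting audio; uploaded (manual) subtitles usually do not need this.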

I then apply some preprocessing (data cleaning and conversion to 16-bit, 16 kHz, mono WAV) and use the result for DeepSpeech training.
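DeepSpeech training reads a CSV with the columns `wav_filename`, `wav_filesize` and `transcript`. The exact cleaning rules used in this repository are not shown, so here is a minimal sketch under stated assumptions: the model alphabet is assumed to be lowercase `a-z`, apostrophe and space; caption annotations in square brackets (such as `[Music]`) are dropped; and clips that are not 16-bit, 16 kHz, mono WAV are skipped. The function names are hypothetical.

```python
import csv
import os
import re
import wave

# Assumed alphabet: lowercase a-z, apostrophe and space. Match this to the
# alphabet file your DeepSpeech model is trained with.
BRACKETS_RE = re.compile(r"\[[^\]]*\]")   # caption annotations such as [Music]
DISALLOWED_RE = re.compile(r"[^a-z' ]+")  # anything outside the assumed alphabet


def clean_transcript(text: str) -> str:
    """Lowercase, drop [annotations], replace other symbols with spaces."""
    text = BRACKETS_RE.sub(" ", text.lower())
    text = DISALLOWED_RE.sub(" ", text)
    return " ".join(text.split())


def is_deepspeech_wav(path: str) -> bool:
    """True if the file is 16-bit PCM, 16 kHz, mono."""
    with wave.open(path, "rb") as w:
        return (w.getsampwidth() == 2 and w.getframerate() == 16000
                and w.getnchannels() == 1)


def write_manifest(pairs, csv_path: str) -> int:
    """Write a DeepSpeech-style CSV from (wav_path, raw_transcript) pairs.

    Rows whose cleaned transcript is empty, or whose WAV is in the wrong
    format, are skipped. Returns the number of rows written.
    """
    written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, raw in pairs:
            transcript = clean_transcript(raw)
            if not transcript or not is_deepspeech_wav(wav_path):
                continue
            writer.writerow([wav_path, os.path.getsize(wav_path), transcript])
            written += 1
    return written
```

Note that this cleaning simply deletes digits, so "2023" disappears from the transcript; if your captions contain numbers, converting them to words first keeps the text aligned with what is spoken.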

Step 1: Create a text file listing the YouTube speaker playlists.

youtube_news.txt
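The contents of youtube_news.txt are not shown, so this sketch assumes one playlist URL per line, with blank lines and `#` comment lines allowed. The helper name `read_playlist_urls` is hypothetical; it also drops duplicate URLs so the same playlist is not downloaded twice.

```python
from pathlib import Path


def read_playlist_urls(path: str) -> list:
    """Return playlist URLs from a text file, one per line.

    Blank lines and lines starting with '#' are ignored, and duplicates are
    dropped while keeping the original order.
    """
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```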

Step 2: Download the .mp3 playlists of Indian speakers together with their .vtt subtitle files.

python3 youtube_download.py
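The source of youtube_download.py is not shown, and it may use a different downloader. As an assumed alternative, the same step can be sketched with the `yt-dlp` command-line tool (which needs `ffmpeg` installed for the MP3 conversion). The command requests both uploaded and auto-generated English subtitles in VTT format, and the output template gives each .mp3 and its .vtt the same base name so they can be paired later. The function names and the `downloads` directory are hypothetical.

```python
import shutil
import subprocess


def build_download_command(url: str, out_dir: str = "downloads", sub_lang: str = "en") -> list:
    """Build a yt-dlp command that saves MP3 audio plus VTT subtitles."""
    return [
        "yt-dlp",
        "--yes-playlist",
        "-x", "--audio-format", "mp3",        # extract audio, convert to MP3
        "--write-subs", "--write-auto-subs",  # uploaded and auto-generated subs
        "--sub-langs", sub_lang,
        "--sub-format", "vtt",
        "-o", f"{out_dir}/%(playlist_title)s/%(title)s.%(ext)s",
        url,
    ]


def download(url: str, **kwargs) -> None:
    """Run the download, failing early if yt-dlp is missing."""
    if shutil.which("yt-dlp") is None:
        raise RuntimeError("yt-dlp is not installed or not on PATH")
    subprocess.run(build_download_command(url, **kwargs), check=True)
```

For example, `download("https://www.youtube.com/playlist?list=PLAYLIST_ID")` with a placeholder playlist ID, called once per line of the playlist file.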

Step 3: Segment each YouTube audio file using the caption start and end times.

python3 text1.py
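The contents of text1.py are not shown. One way to sketch this step, assuming `ffmpeg` is installed, is to cut one clip per caption and convert it to the 16-bit, 16 kHz, mono WAV format in the same command. Cues are assumed to be `(start_seconds, end_seconds, text)` tuples, for example taken from the parsed .vtt file; the function names and the clip naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path


def segment_command(src, start: float, end: float, dst) -> list:
    """Build an ffmpeg command that cuts [start, end) seconds from src."""
    duration = end - start
    if duration <= 0:
        raise ValueError(f"cue end ({end}) must be after start ({start})")
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-ss", f"{start:.3f}",     # seek before -i for a fast input seek
        "-i", str(src),
        "-t", f"{duration:.3f}",   # clip length, not an absolute end time
        "-ac", "1", "-ar", "16000", "-acodec", "pcm_s16le",  # mono, 16 kHz, 16-bit
        str(dst),
    ]


def cut_clips(src, cues, out_dir) -> list:
    """Cut one WAV clip per cue and return (clip_path, text) pairs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pairs = []
    for n, (start, end, text) in enumerate(cues):
        clip = out / f"{Path(src).stem}_{n:05d}.wav"
        subprocess.run(segment_command(src, start, end, clip), check=True)
        pairs.append((str(clip), text))
    return pairs
```

Passing a duration with `-t` avoids any ambiguity about whether an end time is measured from the start of the file or from the seek point.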

Repository files:

README.md
text1.py
train_demo1.csv
youtube_download.py
youtube_news.txt

Rajdeep97/youtube-audio-and-transcript-extract