Pretraining example from readme fails in Colab #1402

Open
AndisDraguns opened this issue May 8, 2024 · 3 comments
Labels: 3rd party, bug, pre-training

Comments

@AndisDraguns

The pretraining example from the README fails when run in Google Colab:

!pip install 'litgpt[all]'

!mkdir -p custom_texts
!curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
!curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Download a tokenizer
!litgpt download \
  --repo_id EleutherAI/pythia-160m \
  --tokenizer_only True

# 2) Pretrain the model
!litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model

# 3) Chat with the model
!litgpt chat \
  --checkpoint_dir out/custom-model/final

@awaelchli

@rasbt
Collaborator

rasbt commented May 8, 2024

Hi there,
I just ran this code on an A10G in a Lightning Studio, and it worked fine. Do you have the error message, or more detail on how or why it failed? One hypothesis is that the Colab GPU does not have enough memory. However, based on my test, it should only require 8.62 GB on a GPU that supports bfloat16.
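
To test the memory hypothesis, a quick check in a Colab cell can show what the assigned GPU offers. This is a minimal sketch, not part of LitGPT: the helper names `gpu_is_sufficient` and `describe_current_gpu` are made up for illustration, and reading the 8.62 GB figure as 10^9-byte gigabytes is an assumption. It only reports what PyTorch says about the device.

```python
# Sketch: compare the current GPU against the requirements mentioned above.
# Assumption: 8.62 GB is read as 8.62 * 10^9 bytes; helper names are hypothetical.

REQUIRED_GB = 8.62


def gpu_is_sufficient(total_memory_bytes: int, bf16_supported: bool,
                      required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the GPU has enough total memory and reports bfloat16 support."""
    return bf16_supported and total_memory_bytes / 1e9 >= required_gb


def describe_current_gpu() -> str:
    """Summarize the first visible CUDA device, if PyTorch and a GPU are available."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU visible"
    props = torch.cuda.get_device_properties(0)
    bf16 = torch.cuda.is_bf16_supported()
    ok = gpu_is_sufficient(props.total_memory, bf16)
    return (f"{props.name}: {props.total_memory / 1e9:.2f} GB, "
            f"bf16={bf16}, sufficient={ok}")


if __name__ == "__main__":
    print(describe_current_gpu())
```

Total memory is an upper bound, since other processes in the session may already hold some of it, so a "sufficient" result does not guarantee the run fits.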

@awaelchli awaelchli self-assigned this May 8, 2024
@awaelchli
Member

@rasbt we created the issue together at ICLR. I will look into it. This is in Colab, not Studios.

@awaelchli
Member

There is a cache path resolution issue in LitData that needs to be fixed there. Thanks for the repro.
Lightning-AI/litdata#126
