Add phi-3 checkpoint #1341
Conversation
rasbt commented on Apr 23, 2024 (edited)
- Verify Phi-3-mini-4k-instruct configs (see the config sketch after this list)
- Add prompt style
- Add other config files
- Add test_model.py
- Add to test_prompts.py
- Update 2 tables in README
- Update download_model_weights.md
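For reference, a minimal sketch of what the Phi-3-mini-4k-instruct entry could look like, assuming litgpt's usual config-dict convention; the values are taken from the model's HF `config.json` and should be verified against the checkpoint:

```python
# Hypothetical config entry following litgpt's config-dict convention.
# All values assumed from microsoft/Phi-3-mini-4k-instruct's config.json;
# double-check against the actual checkpoint.
phi3_mini_4k_instruct = dict(
    name="Phi-3-mini-4k-instruct",
    hf_config=dict(org="microsoft", name="Phi-3-mini-4k-instruct"),
    vocab_size=32000,          # base SentencePiece vocab
    padded_vocab_size=32064,   # 64 added special tokens
    block_size=4096,
    n_embd=3072,
    n_layer=32,
    n_head=32,
    n_query_groups=32,         # no grouped-query attention
    intermediate_size=8192,
    rope_base=10000,
)
```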
There is a modeling_*.py file.
Haha, I finally got the weights loaded, but of course it's never easy ... it's generating gibberish.
Let the easter egg hunt begin 😭
Some more tidbits via Daniel Han:
Ok, it's becoming more interesting.
```diff
@@ -298,6 +298,20 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
         return self.proj(x)


+class Phi3MLP(nn.Module):
```
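For context, Phi-3's MLP fuses the gate and up projections into a single `gate_up_proj` linear layer. A rough sketch of the behavior, with names and the split order assumed from HF's `modeling_phi3.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Phi3MLP(nn.Module):
    """Rough sketch of Phi-3's MLP (names assumed from HF's modeling_phi3.py):
    one fused gate_up_proj whose output is split in half and SiLU-gated."""

    def __init__(self, hidden_size: int, intermediate_size: int) -> None:
        super().__init__()
        self.gate_up_proj = nn.Linear(hidden_size, 2 * intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the fused projection into its gate and up halves.
        gate, up = self.gate_up_proj(x).chunk(2, dim=-1)
        return self.down_proj(F.silu(gate) * up)
```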
It should be possible to avoid this class entirely by reshaping the weights for LLaMAMLP during checkpoint conversion.
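Something along these lines could work in the conversion script (a minimal sketch; the litgpt key names `fc_1`/`fc_2` and the stacking order of the fused weight are assumptions to verify):

```python
import torch

def split_phi3_gate_up_proj(fused: torch.Tensor, intermediate_size: int):
    # Assumed layout: Phi-3 stacks [gate; up] along dim 0, giving a
    # (2 * intermediate_size, hidden_size) matrix. Splitting it yields the
    # two separate projections that litgpt's LLaMAMLP (fc_1, fc_2) expects,
    # so no dedicated Phi3MLP class is needed.
    gate_w, up_w = fused.split(intermediate_size, dim=0)
    return gate_w, up_w  # -> mlp.fc_1.weight, mlp.fc_2.weight
```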
100% agree. I was thinking the same thing. Similar to OLMo, I was hoping to get it working first and then simplify from there.
New models by Apple have if-else statements for this case: https://huggingface.co/apple/OpenELM-270M-Instruct/blob/main/modeling_openelm.py#L405-L462
For simplicity, we definitely shouldn't do the same.
Looks like the sliding window number was a typo: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/commit/b043e05a86cfc77f8d53eb0edf6a33e39afbcb5e
The current code is in an ugly state, but at least the model produces the same output as the HF one. The missing piece is the tokenizer: it has a smaller vocab size (32k vs. 50k) that was extended with 64 special tokens.
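A quick way to inspect the mismatch (a sketch using transformers; the exact counts are assumptions to verify locally):

```python
from transformers import AutoTokenizer

# Compare the base SentencePiece vocab with the full vocab that
# includes the added special tokens.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(tok.vocab_size)         # base vocab (32,000)
print(len(tok))               # base vocab + added special tokens
print(tok.get_added_vocab())  # the extra special tokens
```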
Yeah, that sounds about right based on the Phi-3 paper.