LitGPTSDPABenchmark runs incorrect configs #317

Closed
vedaanta opened this issue Apr 30, 2024 · 4 comments · Fixed by #378

vedaanta commented Apr 30, 2024

🐛 Bug

LitGPTSDPABenchmark, located here, does not correctly run the litgpt config provided to it.
This is because it uses NanoGPTConfig underneath, located here.

This creates multiple issues when benchmarking, as an incorrect SDPA operation is launched. For example:

  1. litgpt never enables dropout, but NanoGPTConfig defaults dropout to 0.1, so every SDPA operation is launched with dropout on (see the sketch after this list).
  2. When running GQA sizes via config.n_query_groups, the value is not propagated into the model.
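
To illustrate the first point, here is a minimal sketch of the mismatch using plain torch.nn.functional.scaled_dot_product_attention, with shapes shrunk from the Llama-2-7b-hf repro below; the dropout values are the point, not the exact shapes:

import torch
import torch.nn.functional as F

# Small stand-ins for the (16, 32, 4096, 128) bf16 tensors in the repro below.
q = torch.randn(2, 4, 64, 16)
k = torch.randn(2, 4, 64, 16)
v = torch.randn(2, 4, 64, 16)

# What the benchmark currently launches: NanoGPTConfig's default dropout of 0.1.
out_benchmark = F.scaled_dot_product_attention(q, k, v, dropout_p=0.1, is_causal=True)

# What litgpt semantics call for: dropout is never enabled, so dropout_p should be 0.0.
out_expected = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)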

To Reproduce

from thunder.benchmarks import LitGPTSDPABenchmark, Benchmark
import thunder
import torch

bench: Benchmark = LitGPTSDPABenchmark(
    config="Llama-2-7b-hf", device="cuda:0", dtype=thunder.bfloat16, requires_grad=True
)

args, kwargs = bench.make_batch()
torch.cuda.synchronize()
fn = bench.fn()

jfn = thunder.jit(fn)
jfn(*args, **kwargs)

print(thunder.last_traces(jfn)[-1])

Outputs:

# Constructed by Delete Last Used (took 0 milliseconds)
import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast
def augmented_forward_fn(q, k, v):
  # q: "cuda:0 bf16[16, 32, 4096, 128]"
  # k: "cuda:0 bf16[16, 32, 4096, 128]"
  # v: "cuda:0 bf16[16, 32, 4096, 128]"
  (t0, t1, t2, t3, _, _, t4, t5, _) = sdpafx_grad_forward_scaled_dot_product_efficient_attention(q, k, v, 0.1, True, scale=None)
  return {'output': t0, 'flat_args': [q, k, v], 'flat_output': (t0,)}, ((k, q, t0, t1, t2, t3, t4, t5, v), (True, 0.1))

Notice is_causal=True and dropout_p=0.1.
The litgpt model has no code path that enables both of them (code).

Expected behavior

LitGPTSDPABenchmark should run with the exact parameters specified by the litgpt configs.
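
One way to sanity-check a fix (a sketch, not part of the original report; matching on the printed trace string is an assumption about its formatting and may need adjusting):

# Reusing jfn, args, kwargs from the reproduction above.
jfn(*args, **kwargs)
trace_str = str(thunder.last_traces(jfn)[-1])
# litgpt never enables dropout, so a fixed benchmark should not trace SDPA with dropout_p=0.1.
assert " 0.1," not in trace_str, "benchmark still launches SDPA with dropout_p=0.1"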

cc @crcrpar @carmocca

mruberry commented May 6, 2024

triage review — @riccardofelluga, is this something you'd like to look at?

@riccardofelluga

Sure! If it's not urgent, I can give it a look as soon as I have a minute.

carmocca commented May 7, 2024

I can help here too

carmocca commented May 7, 2024

Running GQA sizes, with config.n_query_groups, it is not propagated into the model.

Note that litgpt expands this dimension during training for flash attention support: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/model.py#L226-L231. So the n_query_groups argument doesn't impact the SDPA call; see the sketch below.
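
For context, a rough sketch of that expansion (illustrative shapes; repeat_interleave stands in for litgpt's actual reshape-and-expand): the grouped key/value heads are broadcast up to the number of query heads before SDPA, so the call itself always sees matching head counts regardless of n_query_groups.

import torch
import torch.nn.functional as F

B, T, head_size = 2, 8, 16
n_head, n_query_groups = 8, 2  # illustrative GQA sizes

q = torch.randn(B, n_head, T, head_size)
k = torch.randn(B, n_query_groups, T, head_size)
v = torch.randn(B, n_query_groups, T, head_size)

# Expand the grouped k/v heads so every query head has a matching key/value head,
# mirroring what litgpt does during training for flash-attention support.
q_per_kv = n_head // n_query_groups
k = k.repeat_interleave(q_per_kv, dim=1)  # (B, n_head, T, head_size)
v = v.repeat_interleave(q_per_kv, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 8, 16])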
