pytorch-labs / gpt-fast Public

Fork 464
Star 5.2k

Issues: pytorch-labs/gpt-fast

57 Open 29 Closed

Issues list

Missing Keys in state_dict

#172 opened May 6, 2024 by bjohn22

Tensor Parallel Inside notebook

#167 opened Apr 29, 2024 by nivibilla

mmap issue in bf16 of gpt-fast

#165 opened Apr 28, 2024 by yanbing-j

Naming: n_local_heads -> n_kv_heads

#162 opened Apr 23, 2024 by ad8e

INT4 quantization not working on MI210

#154 opened Apr 8, 2024 by yafehlis

int8 Woq raise Codegen Error with --compile_prefill

#144 opened Mar 22, 2024 by yanbing-j

Question about large sequence length attention kernels

#140 opened Mar 19, 2024 by loubbrad

CUDA error if enabling compile_prefill for quantization model (int8)

#137 opened Mar 14, 2024 by yanboliang

int4/int4-gptq support in Mixtral 8x7B

#129 opened Mar 11, 2024 by yanbing-j

Reducing Latency in Application with Torch Compilation: Initialization and Inference Optimization

#127 opened Mar 8, 2024 by daniyal214

index out of range: No transformer config could be loaded

#126 opened Mar 8, 2024 by SinanAkkoyun

Int4 perplexity

#125 opened Mar 7, 2024 by SinanAkkoyun

Question about the generated code of WeightOnlyInt8Linear

#114 opened Feb 29, 2024 by feiyuvl

batching/dynamic batching

#112 opened Feb 27, 2024 by nivibilla

Try Tensor Parallel on a server equipped with two V100 linked by NVLINK, but got a performance degradation

#111 opened Feb 27, 2024 by duanzhaol

What happens to bias during int8 quantization?

#108 opened Feb 24, 2024 by gchhablani

Questions on Speculative Decoding in gpt-fast generate.py

#107 opened Feb 23, 2024 by hxer7963

Bandwidth achieved for INT8 is much smaller than FP16

#99 opened Feb 6, 2024 by yafehlis

I tried to speed up LLaVA, but it is slower than eager mode. Why?

#92 opened Jan 31, 2024 by bleedingfight

Size mismatch error occurs when loading models quantized by GPTQ

#88 opened Jan 23, 2024 by sdc17

RuntimeError: CUDA error: named symbol not found

#87 opened Jan 22, 2024 by ce1190222

torch.compile leads to OOM with different prompts.

#81 opened Jan 10, 2024 by samuelstevens

Code is extremely slow!

#78 opened Jan 9, 2024 by yafehlis

Does gpt-fast work on V100 GPUs?

#72 opened Jan 3, 2024 by RomanKoshkin

Device-side assertions’ error when speculative decoding with different length of prompts.

#69 opened Dec 27, 2023 by ZipECHO
