NVIDIA / TensorRT-LLM

Pull requests: NVIDIA/TensorRT-LLM

92 open, 156 closed

Fix CUDA OOM when creating Mixtral checkpoint

#1629 opened May 19, 2024 by VivekBits2210

Fix some issue when build whl for windows

#1621 opened May 17, 2024 by shizidushu

enable medusa int8 weight only quantization

#1615 opened May 16, 2024 by XiaobingSuper

Add support for non-power-of-two heads with Alibi

#1611 opened May 15, 2024 by vmarkovtsev

Fix missing link in perf-best-practices.md

#1587 opened May 13, 2024 by bloodeagle40234

Fix the error of Ada traits for fpA_intB.

#1583 opened May 12, 2024 by JamesTheZ

[feat]: Support weight only gemm with 2bit (label: triaged)

#1568 opened May 9, 2024 by gavinchen430

Update customAllReduceKernels.cu - line 120's typo was edited

#1558 opened May 8, 2024 by sjbae1999

Update perf-best-practices.md

#1545 opened May 6, 2024 by sam-india-007

[fix] export failure with CUDA driver < 526 and pynvml>=11.5.0

#1537 opened May 3, 2024 by CoderHam

Use first bad_words as extra parameters, and implement min-p

#1536 opened May 2, 2024 by pathorn • Draft

Loading Medusa Safetensors + AWQ Conversion correction (label: triaged)

#1535 opened May 2, 2024 by Tushar-ml

Define hf_config explisitly for convert_hf_mpt_legacy

#1534 opened May 2, 2024 by bloodeagle40234

Add note on build Llama v3 (labels: need more info, triaged)

#1522 opened Apr 29, 2024 by sammcj

Update perf-overview.md

#1521 opened Apr 29, 2024 by snowmanwwg

Support SDXL and its distributed inference

#1514 opened Apr 28, 2024 by Zars19

Remove the <s> token from post_prompt of multimodal

#1508 opened Apr 26, 2024 by yupbank

fix: correct cudaSetDevice error when GPUs per node are fewer than their ranks in inter-node inference

#1495 opened Apr 24, 2024 by littlefatfat

[ModelRunner] Fix stop & bad word list pointer offset.

#1486 opened Apr 22, 2024 by fjosw

Support internlm2 (label: triaged)

#1392 opened Apr 2, 2024 by RunningLeon

llama convert add rotary_scaling param in cli_args

#1385 opened Apr 1, 2024 by activezhao

[Doc] Fix mistral v0.1 build instructions

#1373 opened Mar 29, 2024 by minwhoo

Add SmoothQuant for T5 (decoder only right now)

#1366 opened Mar 27, 2024 by eycheung

Relax python dependencies

#1346 opened Mar 24, 2024 by tdeboissiere

[feat]: Add Option to convert and run distil-whisper large-v3

#1337 opened Mar 22, 2024 by IbrahimAmin1
