Popular repositories
GPTQModel (Python, forked from ModelCloud/GPTQModel)
Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
llm-compressor (Python, forked from vllm-project/llm-compressor)
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM.
AQLM (Python, forked from Vahe1994/AQLM)
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and PV-Tuning: Beyond Straight-Through Estimation for Ext…
compressed-tensors (Python, forked from vllm-project/compressed-tensors)
A safetensors extension to efficiently store sparse quantized tensors on disk.
TensorRT-Model-Optimizer (Python, forked from NVIDIA/TensorRT-Model-Optimizer)
A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment…