Explore projects
陈洋 / mTuner
Apache License 2.0
季玮晔 / ultraattn
Apache License 2.0
黄云飞 / vllm
Apache License 2.0
李拯先 / omniserve
Apache License 2.0
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
李拯先 / KernelBench
MIT License
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
llmc is an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
董西淼 / sd_embed
Apache License 2.0
lijian / AutoAWQ
MIT License
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
李拯先 / vllm
Apache License 2.0
A high-throughput and memory-efficient inference and serving engine for LLMs
唐适之 / fairscale
BSD 3-Clause "New" or "Revised" License
PyTorch extensions for high performance and large scale training.