Explore projects
董西淼 / sd_embed
Apache License 2.0
李拯先 / KernelBench
MIT License
KernelBench: Can LLMs Write GPU Kernels? A benchmark of Torch -> CUDA problems.
唐适之 / fairscale
BSD 3-Clause "New" or "Revised" License
PyTorch extensions for high performance and large scale training.
黄云飞 / vllm
Apache License 2.0
季玮晔 / ultraattn
Apache License 2.0
陈洋 / mTuner
Apache License 2.0
llmc is an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
李拯先 / omniserve
Apache License 2.0
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention.
李拯先 / vllm
Apache License 2.0
A high-throughput and memory-efficient inference and serving engine for LLMs.