A sparsity technique for LLMs that uses conditional memory and scalable lookups to reduce model size and improve inference efficiency.
A sparsity-aware deep learning inference runtime for CPUs, optimized for performance and efficiency.
Optimizes large language models for low-bit precision and sparsity, advancing model compression.
SparseML provides a library for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models.
PaddleSlim is an open-source library for deep model compression and architecture search.
A toolkit to optimize machine learning models for deployment, including quantization and pruning.
Neural Network Compression Framework for enhanced OpenVINO™ inference.
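The projects above all revolve around sparsification and pruning. A minimal, library-agnostic sketch of unstructured magnitude pruning, the basic operation these toolkits automate: the function name, weights, and sparsity value below are illustrative assumptions, not the API of any listed project.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to zero out
    # Magnitude threshold: the k-th smallest absolute value.
    ranked = sorted(abs(w) for w in weights)
    threshold = ranked[k - 1] if k > 0 else -1.0
    out, pruned = [], 0
    for w in weights:
        # Prune at most k weights, breaking ties at the threshold by order.
        if abs(w) <= threshold and pruned < k:
            out.append(0.0)
            pruned += 1
        else:
            out.append(w)
    return out

# Example: prune 50% of a toy weight vector.
print(magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.2], 0.5))
```

Real toolkits apply this per-layer during or after training (often with gradual sparsity schedules), then hand the sparse model to a runtime that can exploit the zeros.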