Showing 41-60 of 133 projects
Optimize AI inference performance on GPUs with this Python library for selecting and tuning inference engines.
An optimized library for efficient multi-GPU communication in deep learning applications.
A fast C++/CUDA neural network framework for high-performance deep learning and rendering.
Fast C++ inference engine for Transformer models, supporting CUDA, MKL, and other optimizations.
A C++ interface for portability across different heterogeneous computing platforms like CUDA and HIP.
A retargetable MLIR-based machine learning compiler and runtime toolkit for AI/ML developers.
Accelerate string operations in C, C++, Python, Rust, and more with SIMD and GPU-powered algorithms.
A CUDA course for developers to learn about GPU computing and parallel processing.
LYGIA is a flexible, multi-language shader library designed for performance, supporting GLSL, HLSL, Metal, WGSL, WEGL, and CUDA.
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
A library of tile primitives for building fast CUDA kernels for AI workloads.
Quantized attention that achieves a 2-5x speedup over FlashAttention for language, image, and video models.
A high-performance Transformer library for accelerating AI models on NVIDIA GPUs, including low-precision support.
CUDA Python: A library that brings the power of CUDA to Python, enabling high-performance GPU acceleration.
Open-source neural network chess engine with GPU acceleration and broad hardware support.
PyTorch compiler for NVIDIA GPUs using TensorRT, enabling efficient deep learning inference on CUDA hardware.
A high-performance, auto-diff neural network library for 3D and 4D sparse tensor computations.
This repository provides guidance on optimizing algorithms for CUDA, NVIDIA's platform for parallel computing on GPUs.
RamaLama simplifies local serving of AI models and enables their use for inference in production via containers.
A collection of computer vision and AI projects in Python, C++, and embedded systems for developers.