A Transformer-based sensor fusion model for end-to-end autonomous driving.
An open-source library for training and running state-of-the-art diffusion models in Python.
SALMONN is a suite of advanced multi-modal large language models (LLMs) for audio, speech, and video understanding.
Step-Audio 2 is an end-to-end multi-modal large language model for industry-strength audio understanding and speech conversation.
A collection of recent Transformer-based computer vision and related research papers.
Standardized datasets for 2D and 3D biomedical image classification.
A TypeScript library for building AI applications with support for various AI models and frameworks.
A comprehensive survey on knowledge distillation techniques for large language models.
A Python library for generating human-centric videos using collaborative multi-modal conditioning.
A curated list of resources for building multi-modal GUI agents using large language models.
A multi-modal, multi-scenario dataset for ground-robot research and SLAM applications.
A production-ready GraphRAG platform with multi-modal indexing, AI agents, and scalable Kubernetes deployment.
A Python library for discovering and exploiting image scaling attacks for multi-modal prompt injection.