Explore Projects

Discover 53 open-source projects

Active filters (1):
Search: multi-modal

Showing 41-53 of 53 projects

autonomousvision/transfuser

A Transformer-based sensor fusion model for end-to-end autonomous driving.

1.5K
Stable
Python
Computer Vision
Agents & Orchestration
PyTorch
#autonomous-driving #sensor-fusion #transformers

thu-ml/unidiffuser

An open-source library for training and running state-of-the-art diffusion models in Python.

1.5K
Archived
Python
LLM Frameworks
ML Ops
#diffusion-models #computer-vision #machine-learning

bytedance/SALMONN

SALMONN is a suite of advanced multi-modal large language models (LLMs) for audio, speech, and video understanding.

1.4K
Stable
LLM Frameworks
Speech Recognition
#audio-processing #speech-recognition #video-understanding

stepfun-ai/Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model for industry-strength audio understanding and speech conversation.

1.4K
Stable
Python
LLM Frameworks
AI Voice & Speech
#audio-understanding #speech-conversation #multi-modal

DirtyHarryLYL/Transformer-in-Vision

A collection of recent Transformer-based computer vision and related research papers.

1.3K
Archived
Computer Vision
Vision Transformers
PyTorch
#computer-vision #deep-learning #transformer

MedMNIST/MedMNIST

Standardized datasets for 2D and 3D biomedical image classification.

1.3K
Archived
Python
PyTorch
#biomedical-image-classification #medical-image-analysis #image-processing

vercel/modelfusion

A TypeScript library for building AI applications with support for various AI models and frameworks.

1.3K
Archived
TypeScript
LLM Frameworks
React
#AI #Machine Learning #Open Source

Tebmer/Awesome-Knowledge-Distillation-of-LLMs

A comprehensive survey on knowledge distillation techniques for large language models.

1.3K
Experimental
LLM Frameworks
Tutorials & Courses
#knowledge-distillation #large-language-model #survey

Phantom-video/HuMo

A Python library for generating human-centric videos using collaborative multi-modal conditioning.

1.2K
Active
Python
Computer Vision
AI Image & Video
#video-generation #multi-modal #machine-learning

showlab/Awesome-GUI-Agent

A curated list of resources for building multi-modal GUI agents using large language models.

1.1K
Stable
Agents & Orchestration
Component Libraries (React)
React
#gui-agents #llm-agent #awesome

SJTU-ViSYS/M2DGR

A multi-modal and multi-scenario dataset for ground robots research and SLAM applications.

1.1K
Experimental
Computer Vision
API Frameworks
#robotics #slam #dataset

apecloud/ApeRAG

A production-ready GraphRAG platform with multi-modal indexing, AI agents, and scalable Kubernetes deployment.

1.1K
Active
Python
Agents & Orchestration
RAG Frameworks
#agents #context-engineering #graphrag

trailofbits/anamorpher

A Python library for discovering and exploiting image scaling attacks for multi-modal prompt injection.

1.0K
Active
Python
Computer Vision
Security Research
#image-scaling #prompt-injection #security-research
