Explore Projects

Discover 53 open source projects

Active filters (1):

Search: multi-modal

Showing 41-53 of 53 projects

autonomousvision/transfuser

A Transformer-based sensor fusion model for end-to-end autonomous driving.

1.5K

Stable

Python

Computer Vision

Agents & Orchestration

PyTorch

#autonomous-driving #sensor-fusion #transformers

thu-ml/unidiffuser

Official code and models for UniDiffuser, a unified diffusion framework that handles multi-modal generation tasks such as text-to-image and image-to-text with a single transformer.

1.5K

Archived

Python

LLM Frameworks

ML Ops

Python

#diffusion-models #computer-vision #machine-learning

bytedance/SALMONN

SALMONN is a suite of advanced multi-modal large language models (LLMs) for audio, speech, and video understanding.

1.4K

Stable

LLM Frameworks

Speech Recognition

#audio-processing #speech-recognition #video-understanding

stepfun-ai/Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model for industry-strength audio understanding and speech conversation.

1.4K

Stable

Python

LLM Frameworks

AI Voice & Speech

Python

#audio-understanding #speech-conversation #multi-modal

DirtyHarryLYL/Transformer-in-Vision

A collection of recent Transformer-based computer vision and related research papers.

1.3K

Archived

Computer Vision

Vision Transformers

PyTorch

#computer-vision #deep-learning #transformer

MedMNIST/MedMNIST

Standardized datasets for 2D and 3D biomedical image classification.

1.3K

Archived

Python

PyTorch

#biomedical-image-classification #medical-image-analysis #image-processing

vercel/modelfusion

A TypeScript library for building AI applications with support for various AI models and frameworks.

1.3K

Archived

TypeScript

LLM Frameworks

React

#AI #Machine Learning #Open Source

Tebmer/Awesome-Knowledge-Distillation-of-LLMs

A comprehensive survey on knowledge distillation techniques for large language models.

1.3K

Experimental

LLM Frameworks

Tutorials & Courses

#knowledge-distillation #large-language-model #survey

Phantom-video/HuMo

A Python library for generating human-centric videos using collaborative multi-modal conditioning.

1.2K

Active

Python

Computer Vision

AI Image & Video

Python

#video-generation #multi-modal #machine-learning

showlab/Awesome-GUI-Agent

A curated list of resources for building multi-modal GUI agents using large language models.

1.1K

Stable

Agents & Orchestration

Component Libraries (React)

React

#gui-agents #llm-agent #awesome

SJTU-ViSYS/M2DGR

A multi-modal, multi-scenario dataset for ground-robot research and SLAM applications.

1.1K

Experimental

Computer Vision

API Frameworks

#robotics #slam #dataset

apecloud/ApeRAG

A production-ready GraphRAG platform with multi-modal indexing, AI agents, and scalable Kubernetes deployment.

1.1K

Active

Python

Agents & Orchestration

RAG Frameworks

Python

#agents #context-engineering #graphrag

trailofbits/anamorpher

A Python library for discovering and exploiting image scaling attacks for multi-modal prompt injection.

1.0K

Active

Python

Computer Vision

Security Research

Python

#image-scaling #prompt-injection #security-research
