Explore Projects

Discover 53 open source projects

Active filters (1):
Search: multi-modality

Showing 1-20 of 53 projects

haotian-liu/LLaVA

LLaVA is a visual instruction tuning framework for large language and vision models, built towards GPT-4V-level capabilities.

24.5K
Archived
Python
Computer Vision
LLM Frameworks
PyTorch
#llava #gpt-4 #instruction-tuning

OpenBMB/MiniCPM-o

On-device multimodal LLM for vision, speech, and live streaming on phones

24.0K
Active
Python
Inference
Local Inference Engines
llama.cpp-omni
#minicpm-o #multimodal-llm #on-device-ai

lss233/kirara-ai

A customizable, multi-modal AI chatbot that can be integrated with various chat platforms and leverages LLMs like ChatGPT, Bard, and GPT-3.

18.4K
Experimental
Python
LLM Frameworks
React
#chatbot #llm #ai-assistant

agentscope-ai/agentscope

AgentScope is a Python library for building applications using agent-oriented programming and large language models.

17.6K
Active
Python
LLM Frameworks
React
#agent #chatbot #llm

BradyFU/Awesome-Multimodal-Large-Language-Models

A collection of multimodal large language models and their latest advances.

17.4K
Active
React
#large-language-models #multimodal-chain-of-thought #in-context-learning

HKUDS/RAG-Anything

An all-in-one Retrieval-Augmented Generation (RAG) framework for building multi-modal AI applications.

14.0K
Active
Python
RAG & Vector
Python
#multi-modal-ai #retrieval-augmented-generation #rag-framework

mlfoundations/open_clip

Open source implementation of CLIP, a contrastive learning model for multi-modal tasks like zero-shot classification.

13.5K
Stable
Python
Computer Vision
PyTorch
#computer-vision #contrastive-learning #pretrained-model

jina-ai/clip-as-service

Scalable embedding, reasoning, ranking for images and sentences with CLIP

12.8K
Archived
Python
React
#authentication #streaming #real-time

TEN-framework/ten-framework

Open-source framework for building conversational voice AI agents

10.2K
Active
Python
React
#authentication #real-time #type-safe

OpenGVLab/InternVL

An open-source, large language model-based multimodal dialogue system that achieves near-GPT-4o performance.

9.9K
Stable
Python
LLM Frameworks
Python
#gpt #llm #multimodal

activeloopai/deeplake

Versatile database for AI, supporting storage, querying, versioning, and visualization of any AI data.

9.0K
Active
C++
LLM Frameworks
Vector Databases
PyTorch
#ai #data-storage #vector-database

modelscope/modelscope

ModelScope is an open-source AI framework that brings the notion of Model-as-a-Service to life, providing a comprehensive suite of tools for building, deploying, and managing AI models.

8.8K
Active
Python
LLM Frameworks
Computer Vision
Python
#machine-learning #deep-learning #computer-vision

enricoros/big-AGI

AI suite with advanced AI/AGI functions, including personas, multi-model chats, text-to-image, voice, and more.

6.9K
Active
TypeScript
LLM Frameworks
Agents & Orchestration
TypeScript
#agi #ai-agents #ai-suite

zai-org/CogVLM

A state-of-the-art open visual language model for multimodal pretraining and applications.

6.7K
Archived
Python
LLM Frameworks
Computer Vision
Python
#cross-modality #language-model #multi-modal

datajuicer/data-juicer

A Python library for processing and analyzing data with foundation models and large language models.

6.0K
Active
Python
LLM Frameworks
ETL & Pipelines
Python
#data-processing #data-analysis #foundation-models

OFA-Sys/Chinese-CLIP

Chinese version of CLIP for cross-modal retrieval and representation generation

5.8K
Stable
Jupyter Notebook
Computer Vision
LLM Frameworks
PyTorch
#chinese #clip #computer-vision

valhalla/valhalla

Open-source routing engine for OpenStreetMap that provides advanced navigation features like directions, isochrones, and traveling salesman.

5.5K
Active
C++
API Frameworks
Databases
#routing #directions #openstreetmap

hiyouga/EasyR1

EasyR1 is an efficient and scalable multi-modality reinforcement learning training framework based on veRL.

4.7K
Active
Python
Reinforcement Learning
#reinforcement-learning #ai #deepseek

zjunlp/DeepKE

An open toolkit for knowledge graph extraction and construction in Python.

4.3K
Experimental
Python
LLM Frameworks
Knowledge Graphs
PyTorch
#knowledge-graph #information-extraction #natural-language-processing

VectorSpaceLab/OmniGen

OmniGen is a unified image generation library that supports diffusion models, multi-modal and multi-task learning.

4.3K
Stable
Jupyter Notebook
Computer Vision
Image & Video
Jupyter Notebook
#diffusion #image-generation #multi-modal
