Explore Projects

Discover 178 open source projects

Active filters (1):
Search: multimodal

Showing 101-120 of 178 projects

cambrian-mllm/cambrian

Cambrian-1 is a multimodal LLM with a vision-centric design for building AI-powered chatbots and applications.

2.0K
Stable
Python
LLM Frameworks
Computer Vision
Python
#chatbot #computer-vision #large-language-models

microsoft/Magma

Magma is a foundation model for building multimodal AI agents, enabling next-gen AI applications.

1.9K
Active
Python
LLM Frameworks
Agents & Orchestration
Python
#foundation-model #multimodal-ai #computer-vision

kyegomez/BitNet

Implementation of a 1-bit Transformer model for large language models in PyTorch.

1.9K
Active
Python
LLM Frameworks
API Frameworks
PyTorch
#artificial-intelligence #deep-learning #transformers

showlab/Show-o

A powerful multimodal transformer for combining language, vision, and other modalities in AI applications.

1.9K
Active
Python
LLM Frameworks
Multimodal
PyTorch
#multimodal-ai #language-models #vision-models

genieincodebottle/generative-ai

Comprehensive resources for developers working with Generative AI, including projects, use cases, and interview prep.

1.9K
Active
Jupyter Notebook
LLM Frameworks
Tutorials & Courses
Jupyter Notebook
#generative-ai #llm #ai-coding

YangLing0818/RPG-DiffusionMaster

A powerful text-to-image diffusion model that can be used for recaptioning, planning, and generating with multimodal LLMs.

1.8K
Experimental
Jupyter Notebook
LLM Frameworks
Computer Vision
Jupyter Notebook
#text-to-image #image-editing #diffusion

alan-ai/alan-sdk-android

The Alan AI SDK for Android provides a conversational AI platform for building voice assistants and chatbots.

1.8K
Experimental
Prompt Engineering
React
#conversational-ai #voice-assistant #chatbots

AlibabaResearch/AdvancedLiterateMachinery

An innovative AI-powered document understanding and OCR platform from Alibaba Research.

1.8K
Experimental
C++
Computer Vision
Document Intelligence
#ocr #document-recognition #document-understanding

facebookresearch/MetaCLIP

A research project from Facebook that explores multimodal AI models for computer vision and language tasks.

1.8K
Stable
Python
LLM Frameworks
Computer Vision
PyTorch
#multimodal-ai #computer-vision #language-models

hymie122/RAG-Survey

A survey paper that proposes a taxonomy of Retrieval-Augmented Generation (RAG) techniques for AI-Generated Content (AIGC).

1.8K
Archived
RAG & Vector
Tutorials & Courses
#aigc #diffusion-models #llm

FireRedTeam/FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, with outstanding singing lyrics recognition.

1.8K
Active
Python
Speech Recognition
API Frameworks
Python
#asr #speech-recognition #conformer

apple/ml-4m

4M: Massively Multimodal Masked Modeling, a Python library for large-scale multimodal language models.

1.8K
Experimental
Python
LLM Frameworks
Databases
#multimodal #language-model #dataset

alan-ai/alan-sdk-flutter

The Alan AI SDK for Flutter enables building conversational AI-powered apps and voice interfaces.

1.8K
Experimental
Ruby
AI Voice & Speech
Component Libraries (Flutter)
Flutter
#conversational-ai #voice-assistant #speech-recognition

baaivision/Emu

An open-source library for building generative multimodal AI models, with a focus on foundation models, in-context learning, and multimodal pretraining.

1.8K
Active
Python
LLM Frameworks
Multimodal Pretraining
Python
#foundation-models #generative-pretraining #multimodal

potamides/DeTikZify

A Python library for generating TikZ graphics programs to create scientific figures and sketches with AI-powered tools.

1.7K
Stable
Python
LLM Frameworks
Charts & Visualization
#draw #graph #llm

facebookresearch/multimodal

A PyTorch library for training state-of-the-art multimodal multi-task models at scale.

1.7K
Active
Python
LLM Frameworks
Computer Vision
PyTorch
#multimodal #multi-task #computer-vision

2U1/Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning Qwen-VL series, a multimodal vision-language model by Alibaba Cloud.

1.7K
Active
Python
Fine-tuning
Vision-Language
Python
#multimodal #qwen2-5-vl #qwen2-vl

alan-ai/alan-sdk-ionic

A self-coding system for Ionic apps, built on an AI-powered chatbot and voice assistant SDK.

1.7K
Experimental
TypeScript
React
#ionic #chatbot #conversational-ai

RQLuo/MixTeX-Latex-OCR

A Python-based multimodal OCR tool for efficient offline recognition of LaTeX, mixed Chinese-English (ZhEn) text, and tables on Windows.

1.6K
Experimental
Python
Computer Vision
OCR
Python
#ocr #computer-vision #latex

pixeltable/pixeltable

A Python-based data infrastructure platform for declarative, incremental multimodal AI workloads.

1.6K
Active
Python
Vector Databases
#ai #data-infrastructure #vector-database