Explore Projects

Discover 21 open source projects

Active filters (1):
Search: vision-language-model

Showing 1-20 of 21 projects

haotian-liu/LLaVA

LLaVA is a visual instruction tuning framework for large language-and-vision models, built towards GPT-4V level capabilities.

24.5K
Archived
Python
Computer Vision
LLM Frameworks
PyTorch
#llava #gpt-4 #instruction-tuning

OpenGVLab/InternVL

An open-source, large language model-based multimodal dialogue system that achieves near-GPT-4o performance.

9.9K
Stable
Python
LLM Frameworks
Python
#gpt #llm #multimodal

CVHub520/X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other powerful models.

8.3K
Active
Python
Computer Vision
ML Ops
Python
#artificial-intelligence #computer-vision #image-annotation

QwenLM/Qwen-VL

Qwen-VL is a chat and pretrained large vision-language model proposed by Alibaba Cloud.

6.5K
Archived
Python
LLM Frameworks
Computer Vision
Python
#large-language-model #vision-language-model #alibaba-cloud

volcengine/MineContext

MineContext is an AI-powered platform that provides proactive, context-aware assistance for developers building with AI tools.

5.0K
Active
Python
LLM Frameworks
AI Coding Agents
React
#context-aware #proactive-ai #agent

JIA-Lab-research/MGM

The official repository for the 'Mini-Gemini' model, a multi-modal vision-language model.

3.3K
Archived
Python
LLM Frameworks
Computer Vision
Python
#generation #large-language-models #vision-language-model

InternLM/InternLM-XComposer

A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.

2.9K
Experimental
Python
LLM Frameworks
Computer Vision
PyTorch
#chatgpt #gpt-4 #multimodal

illuin-tech/colpali

Open-source library for training and running inference with ColVision models (such as ColPali) for vision-language document retrieval.

2.5K
Active
Python
LLM Frameworks
RAG & Vector
Python
#information-retrieval #vision-language-model #colpali

BAAI-Agents/Cradle

A framework for building AI agents with strong reasoning abilities, self-improvement, and skill curation in a general computing environment.

2.5K
Archived
Python
Agents & Orchestration
LLM Frameworks
Python
#ai-agents #llm #general-computer-control

AlibabaResearch/AdvancedLiterateMachinery

An innovative AI-powered document understanding and OCR platform from Alibaba Research.

1.8K
Experimental
C++
Computer Vision
Document Intelligence
#ocr #document-recognition #document-understanding

showlab/ShowUI

Open-source, end-to-end vision-language-action model for GUI agents and computer use.

1.7K
Active
Python
Agents & Orchestration
Component Libraries (React)
React
#agent #computer-use #gui-agent

2U1/Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning the Qwen-VL series of multimodal vision-language models by Alibaba Cloud.

1.7K
Active
Python
Fine-tuning
Vision-Language
Python
#multimodal #qwen2-5-vl #qwen2-vl

ByteDance-Seed/Seed1.5-VL

A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.

1.6K
Experimental
Jupyter Notebook
LLM Frameworks
Computer Vision
Jupyter Notebook
#multimodal-ai #vision-language-model #large-language-model

emcf/thepipe

A Python library that helps developers extract structured data from tricky documents using vision-language models.

1.5K
Stable
Python
LLM Frameworks
ETL & Pipelines
Python
#document-processing #large-language-models #multimodal

NVlabs/describe-anything

An implementation for detailed localized image and video captioning using large multimodal models.

1.5K
Experimental
Python
Computer Vision
LLM Frameworks
Python
#describe-anything #detailed-localized-captioning #large-multimodal-models

AIDC-AI/Ovis

A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.

1.4K
Stable
Python
LLM Frameworks
Vision-Language Model
Python
#chatbot #llama3 #multimodal

llm-jp/awesome-japanese-llm

Comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.

1.3K
Active
TypeScript
LLM Frameworks
Generative AI
#japanese #llm #foundation-models

NVlabs/prismer

Prismer: a vision-language model with multi-task experts, for tasks such as image captioning.

1.3K
Archived
Python
React
#vision-language-model #multi-task-learning #image-captioning

gokayfem/awesome-vlm-architectures

A curated list of famous vision-language models and their architectures for developers working with AI tools.

1.2K
Active
Markdown
LLM Frameworks
Frontend Frameworks
React
#vision-language-models #multimodal #llm

SkalskiP/vlms-zero-to-hero

This Jupyter Notebook series covers the fundamentals of NLP and Computer Vision, leading to cutting-edge Vision-Language Models.

1.2K
Archived
Jupyter Notebook
LLM Frameworks
Computer Vision
#natural-language-processing #computer-vision #vision-language-model