Explore Projects

Discover 21 open source projects

Search: vision-language-model

Showing 1-20 of 21 projects

haotian-liu/LLaVA

LLaVA is a visual instruction tuning framework for large language and vision models, built towards GPT-4V level capabilities.

24.5K

Archived

Python

Computer Vision

LLM Frameworks

PyTorch

#llava #gpt-4 #instruction-tuning

OpenGVLab/InternVL

An open-source, large language model-based multimodal dialogue system that achieves near-GPT-4o performance.

9.9K

Stable

Python

LLM Frameworks

Python

#gpt #llm #multimodal

CVHub520/X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other powerful models.

8.3K

Active

Python

Computer Vision

ML Ops

Python

#artificial-intelligence #computer-vision #image-annotation

QwenLM/Qwen-VL

Qwen-VL is a large vision-language model proposed by Alibaba Cloud.

6.5K

Archived

Python

LLM Frameworks

Computer Vision

Python

#large-language-model #vision-language-model #alibaba-cloud

volcengine/MineContext

MineContext is an AI-powered platform that provides proactive, context-aware assistance for developers building with AI tools.

5.0K

Active

Python

LLM Frameworks

AI Coding Agents

React

#context-aware #proactive-ai #agent

JIA-Lab-research/MGM

The official repository for 'Mini-Gemini', a multi-modality vision-language model for generation tasks.

3.3K

Archived

Python

LLM Frameworks

Computer Vision

Python

#generation #large-language-models #vision-language-model

InternLM/InternLM-XComposer

A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.

2.9K

Experimental

Python

LLM Frameworks

Computer Vision

PyTorch

#chatgpt #gpt-4 #multimodal

illuin-tech/colpali

Open-source library for training and running inference with ColVision models for vision-language retrieval and generation.

2.5K

Active

Python

LLM Frameworks

RAG & Vector

Python

#information-retrieval #vision-language-model #colpali

BAAI-Agents/Cradle

A framework for building AI agents with strong reasoning abilities, self-improvement, and skill curation in a general computing environment.

2.5K

Archived

Python

Agents & Orchestration

LLM Frameworks

Python

#ai-agents #llm #general-computer-control

AlibabaResearch/AdvancedLiterateMachinery

An innovative AI-powered document understanding and OCR platform from Alibaba Research.

1.8K

Experimental

C++

Computer Vision

Document Intelligence

#ocr #document-recognition #document-understanding

showlab/ShowUI

Open-source end-to-end vision-language-action model for GUI agents and computer usage analysis.

1.7K

Active

Python

Agents & Orchestration

Component Libraries (React)

React

#agent #computer-use #gui-agent

2U1/Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning Qwen-VL series, a multimodal vision-language model by Alibaba Cloud.

1.7K

Active

Python

Fine-tuning

Vision-Language

Python

#multimodal #qwen2-5-vl #qwen2-vl

ByteDance-Seed/Seed1.5-VL

A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.

1.6K

Experimental

Jupyter Notebook

LLM Frameworks

Computer Vision

Jupyter Notebook

#multimodal-ai #vision-language-model #large-language-model

emcf/thepipe

A Python library that helps developers extract structured data from tricky documents using vision-language models.

1.5K

Stable

Python

LLM Frameworks

ETL & Pipelines

Python

#document-processing #large-language-models #multimodal

NVlabs/describe-anything

An implementation for detailed localized image and video captioning using large multimodal models.

1.5K

Experimental

Python

Computer Vision

LLM Frameworks

Python

#describe-anything #detailed-localized-captioning #large-multimodal-models

AIDC-AI/Ovis

A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.

1.4K

Stable

Python

LLM Frameworks

Vision-Language Model

Python

#chatbot #llama3 #multimodal

llm-jp/awesome-japanese-llm

Comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.

1.3K

Active

TypeScript

LLM Frameworks

Generative AI

#japanese #llm #foundation-models

NVlabs/prismer

Prismer: a vision-language model with multi-task experts, for image captioning and related vision-language tasks.

1.3K

Archived

Python

React

#vision-language-model #multi-task-learning #image-captioning

gokayfem/awesome-vlm-architectures

A curated list of famous vision-language models and their architectures for developers working with AI tools.

1.2K

Active

Markdown

LLM Frameworks

Frontend Frameworks

React

#vision-language-models #multimodal #llm

SkalskiP/vlms-zero-to-hero

This Jupyter Notebook series covers the fundamentals of NLP and Computer Vision, leading to cutting-edge Vision-Language Models.

1.2K

Archived

Jupyter Notebook

LLM Frameworks

Computer Vision

#natural-language-processing #computer-vision #vision-language-model
