Explore Projects

Discover 34 open-source projects

Active filters (1):

Search: vision-language

Showing 1-20 of 34 projects

Vision-CAIR/MiniGPT-4

MiniGPT-4 and MiniGPT-v2 for vision-language tasks

25.8K

Archived

Python

Computer Vision

LLM Wrappers & SDKs

PyTorch

#vision-language #llm-wrapper #computer-vision

IDEA-Research/GroundingDINO

Official implementation of the paper 'Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection'.

9.8K

Archived

Python

Computer Vision

Python

#object-detection #open-world #open-world-detection

rednote-hilab/dots.ocr

A multilingual document layout parsing model that can extract text, images, and structure from documents in a single vision-language model.

7.9K

Stable

Python

Computer Vision

Component Libraries (React)

React

#document-parsing #ocr #layout-extraction

apple/ml-fastvlm

This repository contains an efficient implementation of a vision encoding model for vision-language models.

7.2K

Experimental

Python

Computer Vision

LLM Frameworks

Python

#computer-vision #vision-language-models #efficient-encoding

OFA-Sys/Chinese-CLIP

Chinese version of CLIP for cross-modal retrieval and representation generation

5.8K

Stable

Jupyter Notebook

Computer Vision

LLM Frameworks

PyTorch

#chinese #clip #computer-vision

salesforce/BLIP

PyTorch code for Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

5.7K

Archived

Jupyter Notebook

React

#vision-language #pre-training #unified-vision-language

deepseek-ai/DeepSeek-VL2

A powerful mixture-of-experts vision-language model for advanced multimodal understanding.

5.2K

Experimental

Python

LLM Frameworks

Computer Vision

Python

#computer-vision #multimodal-ai #language-models

PKU-YuanGroup/Video-LLaVA

A large-scale vision-language model for video understanding and generation.

3.5K

Archived

Python

LLM Frameworks

Computer Vision

Python

#large-vision-language-model #video-understanding #multi-modal

JIA-Lab-research/MGM

An official repository for the 'Mini-Gemini' model, a multi-modal vision-language model for generation tasks.

3.3K

Archived

Python

LLM Frameworks

Computer Vision

Python

#generation #large-language-models #vision-language-model

SkyworkAI/Skywork-R1V

An advanced multimodal AI model series for vision-language reasoning, developed by Skywork AI.

3.2K

Stable

Python

LLM Frameworks

Agents & Orchestration

Python

#multimodal #vision-language #reasoning

jonyzhang2023/awesome-embodied-vla-va-vln

A curated list of state-of-the-art research in embodied AI, focusing on VLA, VLN, and related multimodal learning approaches.

2.6K

Active

Computer Vision

Agents & Orchestration

#embodied-ai #vision-language-action #vision-language-navigation

OFA-Sys/OFA

Official repository for the OFA (Unifying Architectures, Tasks, and Modalities) AI model, supporting various vision-language tasks.

2.6K

Archived

Python

LLM Frameworks

Computer Vision

PyTorch

#pretrained-models #multimodal #vision-language

illuin-tech/colpali

Open-source library for training and running inference with ColVision models for vision-language retrieval and generation.

2.5K

Active

Python

LLM Frameworks

RAG & Vector

Python

#information-retrieval #vision-language-model #colpali

OmniSVG/OmniSVG

An end-to-end multimodal SVG generator that leverages pre-trained Vision-Language Models to create complex and detailed SVGs.

2.4K

Active

Python

LLM Frameworks

Animation & Motion

Python

#svg-generation #vision-language-models #multimodal-ai

KaiyangZhou/CoOp

A prompt learning framework for vision-language models.

2.2K

Archived

Python

React

#prompt-engineering #multimodal-learning #foundation-models

AlibabaResearch/AdvancedLiterateMachinery

An innovative AI-powered document understanding and OCR platform from Alibaba Research.

1.8K

Experimental

C++

Computer Vision

Document Intelligence

#ocr #document-recognition #document-understanding

Turbo1123/roubao

An Android automation tool based on vision-language models that allows developers to automate mobile app interactions.

1.8K

Active

Kotlin

Computer Vision

Android

Kotlin

#android-automation #vision-language-models #mobile-agents

salesforce/ALBEF

A powerful vision-language pre-training method for tasks like image-text retrieval and captioning.

1.8K

Archived

Python

Computer Vision

Representation Learning

Python

#contrastive-learning #image-text #weakly-supervised-learning

2U1/Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning Qwen-VL series, a multimodal vision-language model by Alibaba Cloud.

1.7K

Active

Python

Fine-tuning

Vision-Language

Python

#multimodal #qwen2-5-vl #qwen2-vl

ZJU4HealthCare/HealthGPT

Official repository for a paper on a large vision-language model for medical applications

1.6K

Stable

Python

LLM Frameworks

Computer Vision

Python

#medical-ai #vision-language-model #icml
