Explore Projects

Discover 34 open-source projects

Active filters (1):
Search: vision-language

Showing 1-20 of 34 projects

Vision-CAIR/MiniGPT-4

MiniGPT-4 and MiniGPT-v2 for vision-language tasks

25.8K
Archived
Python
Computer Vision
LLM Wrappers & SDKs
PyTorch
#vision-language#llm-wrapper#computer-vision

IDEA-Research/GroundingDINO

Official implementation of the paper 'Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection'.

9.8K
Archived
Python
Computer Vision
Python
#object-detection#open-world#open-world-detection

rednote-hilab/dots.ocr

A multilingual document layout parsing model that extracts text, images, and structure from documents with a single vision-language model.

7.9K
Stable
Python
Computer Vision
Document Intelligence
PyTorch
#document-parsing#ocr#layout-extraction

apple/ml-fastvlm

This repository contains an efficient implementation of a vision encoding model for vision-language models.

7.2K
Experimental
Python
Computer Vision
LLM Frameworks
Python
#computer-vision#vision-language-models#efficient-encoding

OFA-Sys/Chinese-CLIP

Chinese version of CLIP for cross-modal retrieval and representation generation

5.8K
Stable
Jupyter Notebook
Computer Vision
LLM Frameworks
PyTorch
#chinese#clip#computer-vision

salesforce/BLIP

PyTorch code for Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

5.7K
Archived
Jupyter Notebook
PyTorch
#vision-language#pre-training#unified-vision-language

deepseek-ai/DeepSeek-VL2

A powerful mixture-of-experts vision-language model for advanced multimodal understanding.

5.2K
Experimental
Python
LLM Frameworks
Computer Vision
Python
#computer-vision#multimodal-ai#language-models

PKU-YuanGroup/Video-LLaVA

A large-scale vision-language model for video understanding and generation.

3.5K
Archived
Python
LLM Frameworks
Computer Vision
Python
#large-vision-language-model#video-understanding#multi-modal

JIA-Lab-research/MGM

The official repository for Mini-Gemini, a multimodal vision-language model for generation tasks.

3.3K
Archived
Python
LLM Frameworks
Computer Vision
Python
#generation#large-language-models#vision-language-model

SkyworkAI/Skywork-R1V

An advanced multimodal AI model series for vision-language reasoning, developed by Skywork AI.

3.2K
Stable
Python
LLM Frameworks
Agents & Orchestration
Python
#multimodal#vision-language#reasoning

jonyzhang2023/awesome-embodied-vla-va-vln

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) and vision-language navigation (VLN) models and related multimodal learning approaches.

2.6K
Active
Computer Vision
Agents & Orchestration
#embodied-ai#vision-language-action#vision-language-navigation

OFA-Sys/OFA

Official repository for OFA ("One For All"), a model that unifies architectures, tasks, and modalities and supports a range of vision-language tasks.

2.6K
Archived
Python
LLM Frameworks
Computer Vision
PyTorch
#pretrained-models#multimodal#vision-language

illuin-tech/colpali

Open-source library for training and running inference with ColVision models for vision-language retrieval and generation.

2.5K
Active
Python
LLM Frameworks
RAG & Vector
Python
#information-retrieval#vision-language-model#colpali

OmniSVG/OmniSVG

An end-to-end multimodal SVG generator that leverages pre-trained Vision-Language Models to create complex and detailed SVGs.

2.4K
Active
Python
LLM Frameworks
Animation & Motion
Python
#svg-generation#vision-language-models#multimodal-ai

KaiyangZhou/CoOp

A prompt learning framework for vision-language models.

2.2K
Archived
Python
PyTorch
#prompt-engineering#multimodal-learning#foundation-models

AlibabaResearch/AdvancedLiterateMachinery

An AI-powered document understanding and OCR platform from Alibaba Research.

1.8K
Experimental
C++
Computer Vision
Document Intelligence
#ocr#document-recognition#document-understanding

Turbo1123/roubao

An Android automation tool based on vision-language models that allows developers to automate mobile app interactions.

1.8K
Active
Kotlin
Computer Vision
Android
Kotlin
#android-automation#vision-language-models#mobile-agents

salesforce/ALBEF

A powerful vision-language pre-training method for tasks like image-text retrieval and captioning.

1.8K
Archived
Python
Computer Vision
Representation Learning
Python
#contrastive-learning#image-text#weakly-supervised-learning

2U1/Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning the Qwen-VL series of multimodal vision-language models from Alibaba Cloud.

1.7K
Active
Python
Fine-tuning
Vision-Language
Python
#multimodal#qwen2-5-vl#qwen2-vl

ZJU4HealthCare/HealthGPT

Official repository for the HealthGPT paper, which presents a large vision-language model for medical applications.

1.6K
Stable
Python
LLM Frameworks
Computer Vision
Python
#medical-ai#vision-language-model#icml
