A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.
A Python library that helps developers extract structured data from complex documents using vision-language models.
A video conversation model that combines LLM capabilities with pretrained visual encoders for video-based chatbots.
Comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.
A curated list of resources for leveraging visual information in large vision-language models (LVLMs) for complex reasoning, planning, and generation.
Prismer: A Vision-Language Model with Multi-Task Experts, applied to image captioning and other vision-language tasks.
DriveLM is a graph visual question answering model for autonomous driving tasks, built using large language models.
A curated list of famous vision-language models and their architectures for developers working with AI tools.
Kimi-VL is a multimodal AI model for advanced vision-language understanding and reasoning.
This Jupyter Notebook series covers the fundamentals of NLP and Computer Vision, leading to cutting-edge Vision-Language Models.
A curated collection of recent advances in vision-language pretrained models (VL-PTMs) for AI and multimodal applications.
A curated collection of awesome unified multimodal models for text-to-image generation and vision-language tasks.
AnomalyGPT is a powerful tool for detecting industrial anomalies using large vision-language models.
A general representation model for cross-modal learning across vision, audio, and language.