Explore Projects

Discover 34 open source projects

Active filters (1):
Search: vision-language

Showing 21-34 of 34 projects

ByteDance-Seed/Seed1.5-VL

A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.

1.6K
Experimental
Jupyter Notebook
LLM Frameworks
Computer Vision
#multimodal-ai #vision-language-model #large-language-model

emcf/thepipe

A Python library that helps developers extract structured data from complex documents using vision-language models.

1.5K
Stable
Python
LLM Frameworks
ETL & Pipelines
#document-processing #large-language-models #multimodal

mbzuai-oryx/Video-ChatGPT

A video conversation model that combines LLM capabilities with pretrained visual encoders for video-based chatbots.

1.5K
Experimental
Python
LLM Frameworks
Computer Vision
PyTorch
#chatbot #video-conversation #vision-language

llm-jp/awesome-japanese-llm

A comprehensive overview of Japanese large language models (LLMs) for developers interested in generative AI.

1.3K
Active
TypeScript
LLM Frameworks
Generative AI
#japanese #llm #foundation-models

zhaochen0110/Awesome_Think_With_Images

A curated list of resources for leveraging visual information in large vision-language models (LVLMs) for complex reasoning, planning, and generation.

1.3K
Stable
LLM Frameworks
Multimodal Reasoning & Visual Reasoning
#large-vision-language-models #multimodal-reasoning #visual-reasoning

NVlabs/prismer

Prismer: a vision-language model with multi-task experts, suited for image captioning and other vision-language applications.

1.3K
Archived
Python
React
#vision-language-model #multi-task-learning #image-captioning

OpenDriveLab/DriveLM

DriveLM is a graph visual question answering model for autonomous driving tasks, built using large language models.

1.3K
Experimental
HTML
LLM Frameworks
Vision-Language
PyTorch
#autonomous-driving #graph-based-models #visual-question-answering

gokayfem/awesome-vlm-architectures

A curated list of notable vision-language models and their architectures for developers working with AI tools.

1.2K
Active
Markdown
LLM Frameworks
Frontend Frameworks
React
#vision-language-models #multimodal #llm

MoonshotAI/Kimi-VL

Kimi-VL is a multimodal AI model for advanced vision-language understanding and reasoning.

1.2K
Experimental
LLM Frameworks
Agents & Orchestration
#multimodal-ai #vision-language #reasoning

SkalskiP/vlms-zero-to-hero

This Jupyter Notebook series covers the fundamentals of NLP and Computer Vision, leading to cutting-edge Vision-Language Models.

1.2K
Archived
Jupyter Notebook
LLM Frameworks
Computer Vision
#natural-language-processing #computer-vision #vision-language-model

yuewang-cuhk/awesome-vision-language-pretraining-papers

A curated collection of recent advances in vision-language pretrained models (VL-PTMs) for AI and multimodal applications.

1.2K
Archived
Computer Vision
LLM Frameworks
#vision-language #multimodal #pretrained-models

AIDC-AI/Awesome-Unified-Multimodal-Models

A curated collection of awesome unified multimodal models for text-to-image generation and vision-language tasks.

1.1K
Active
LLM Frameworks
Computer Vision
#multimodal-models #text-to-image-generation #vision-language-model

CASIA-LMC-Lab/AnomalyGPT

AnomalyGPT is a powerful tool for detecting industrial anomalies using large vision-language models.

1.1K
Archived
Python
Computer Vision
LLM Frameworks
#computer-vision #anomaly-detection #industrial-ai

OFA-Sys/ONE-PEACE

A general representation model for cross-modal learning across vision, audio, and language.

1.1K
Archived
Python
LLM Frameworks
Representation Learning
#multimodal #contrastive-learning #foundation-models
