A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.
A Python library that helps developers extract structured data from complex documents using vision-language models.
A video conversation model that combines LLM capabilities with pretrained visual encoders for video-based chatbots.
Comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.
A curated list of resources for leveraging visual information in large vision-language models (LVLMs) for complex reasoning, planning, and generation.
Prismer: A Vision-Language Model with Multi-Task Experts, applied to image captioning and other vision-language tasks.
DriveLM is a graph visual question answering model for autonomous driving tasks, built using large language models.
A curated list of famous vision-language models and their architectures for developers working with AI tools.
Kimi-VL is a multimodal AI model for advanced vision-language understanding and reasoning.
This Jupyter Notebook series covers the fundamentals of NLP and Computer Vision, leading to cutting-edge Vision-Language Models.
A curated collection of recent advances in vision-language pretrained models (VL-PTMs) for AI and multimodal applications.
A curated collection of awesome unified multimodal models for text-to-image generation and vision-language tasks.
AnomalyGPT is a powerful tool for detecting industrial anomalies using large vision-language models.
A general representation model for cross-modal learning across vision, audio, and language.