LLaVA is a visual instruction tuning framework for large language and vision models, built towards GPT-4V-level capabilities.
An open-source multimodal dialogue system built on large language models, with performance approaching GPT-4o.
Effortless data labeling with AI support from Segment Anything and other powerful models.
Qwen-VL is a large vision-language model proposed by Alibaba Cloud.
MineContext is an AI-powered platform that provides proactive, context-aware assistance.
The official repository for Mini-Gemini, a multimodal vision-language model that supports image understanding, reasoning, and generation.
A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.
An open-source library for training and running inference with ColVision models for vision-language document retrieval.
A framework for building AI agents with strong reasoning abilities, self-improvement, and skill curation in a general computing environment.
An AI-powered document understanding and OCR platform from Alibaba Research.
An open-source, end-to-end vision-language-action model for GUI agents and computer use.
An open-source implementation for fine-tuning the Qwen-VL series of multimodal vision-language models by Alibaba Cloud.
A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.
A Python library that helps developers extract structured data from tricky documents using vision-language models.
An implementation for detailed localized image and video captioning using large multimodal models.
A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.
A comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.
Prismer: A Vision-Language Model with Multi-Task Experts, for image captioning and other vision-language tasks.
A curated list of famous vision-language models and their architectures for developers working with AI tools.
A series of Jupyter notebooks covering the fundamentals of NLP and computer vision, building up to state-of-the-art vision-language models.