MiniGPT-4 and MiniGPT-v2 for vision-language tasks
Official implementation of the paper 'Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection'.
A multilingual document layout parsing model that can extract text, images, and structure from documents in a single vision-language model.
This repository contains an efficient implementation of a vision encoding model for vision-language models.
Chinese version of CLIP for cross-modal retrieval and representation generation
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
A powerful mixture-of-experts vision-language model for advanced multimodal understanding.
A large-scale vision-language model for video understanding and generation.
The official repository for the 'Mini-Gemini' model, a multimodal vision-language model for generation tasks.
An advanced multimodal AI model series for vision-language reasoning, developed by Skywork AI.
A curated list of state-of-the-art research in embodied AI, focusing on VLA, VLN, and related multimodal learning approaches.
Official repository for the OFA (Unifying Architectures, Tasks, and Modalities) AI model, supporting various vision-language tasks.
Open-source library for training and running inference with ColVision models for vision-language retrieval and generation.
An end-to-end multimodal SVG generator that leverages pre-trained Vision-Language Models to create complex and detailed SVGs.
A prompt learning framework for vision-language models.
An innovative AI-powered document understanding and OCR platform from Alibaba Research.
An Android automation tool based on vision-language models that allows developers to automate mobile app interactions.
A powerful vision-language pre-training method for tasks like image-text retrieval and captioning.
An open-source implementation for fine-tuning the Qwen-VL series of multimodal vision-language models by Alibaba Cloud.
Official repository for a paper on a large vision-language model for medical applications