LAVIS is a comprehensive library for multimodal deep learning, including image captioning, visual question answering, and more.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.
InternGPT is an open-source demo platform that showcases AI models such as DragGAN, ChatGPT, and ImageBind, along with multimodal chat.
A PyTorch tutorial for building an image captioning model using the Show, Attend, and Tell technique.
Official repository for OFA, a model that unifies architectures, tasks, and modalities and supports various vision-language tasks.
Caption-Anything is a versatile AI-powered tool for generating tailored image captions with diverse controls.
Bottom-up attention model for image captioning and visual question answering, built on Faster R-CNN and Visual Genome.
Simple Swift class to provide configurations for custom camera views in iOS apps.
Prismer: A Vision-Language Model with Multi-Task Experts, for image captioning and vision-language applications.
A Python-based tool for managing and captioning image datasets, with support for various AI models and frameworks.
An AI-powered image captioning and image-text search platform for developers building with AI tools.
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning.