Comprehensive resource for generative AI research, interviews, and courses.
LAVIS is a comprehensive library for multimodal deep learning, covering tasks such as image captioning, visual question answering, and more.
A vision-language pre-training method for tasks such as image-text retrieval and image captioning.
A Vision-and-Language Transformer model for multimodal tasks that requires neither convolutional features nor region supervision.
Multimodal-GPT is a library for building AI-powered applications that work with multimodal data such as text and images.
Real-time and accurate open-vocabulary end-to-end object detection library for computer vision applications.
Comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.
Prismer: A Vision-Language Model with Multi-Task Experts, for image captioning and other vision-language applications.
A curated collection of recent advances in vision-language pretrained models (VL-PTMs) for AI and multimodal applications.
Aria is an open-source multimodal AI framework for building vision and language models.
A general representation model for cross-modal learning across vision, audio, and language.
An AI-powered image captioning and image-text search platform for developers building with AI tools.