LAVIS is a library for multimodal deep learning that supports tasks such as image captioning and visual question answering.
Official implementation of the paper 'Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection'.
PyTorch code for 'BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation'.
An AI-powered document understanding and OCR platform from Alibaba Research.