Bottom-up attention model for image captioning and visual question answering, built on Faster R-CNN and Visual Genome.
A curated list of research papers on visual grounding, a key technique for multimodal AI.