Explore Projects

Discover 7 open source projects

Active filters (1):

Search: vqa

facebookresearch/mmf

A modular deep learning framework for multimodal AI research and applications from Facebook AI Research (FAIR).

5.6K

Active

Python

LLM Frameworks

Computer Vision

PyTorch

#deep-learning #multimodal #captioning

open-compass/VLMEvalKit

Open-source toolkit for evaluating large multi-modal AI models, supporting 220+ models and 80+ benchmarks.

3.9K

Active

Python

LLM Frameworks

LLM Wrappers & SDKs

PyTorch

#chatgpt #llm #multi-modal

OpenGVLab/InternGPT

InternGPT is an open-source demo platform that showcases AI models and features, including DragGAN, ChatGPT, ImageBind, and multimodal chat.

3.2K

Archived

Python

LLM Frameworks

Agents & Orchestration

React

#chatgpt #draggan #imagebind

BDBC-KG-NLP/QA-Survey-CN

A comprehensive survey of various question answering (QA) systems, including KBQA, TextQA, TableQA, VisualQA, and MRC.

1.8K

Archived

Natural Language Processing

Tutorials & Courses

#qa #question-answering #cqa

peteanderson80/bottom-up-attention

Bottom-up attention model for image captioning and visual question answering, built on Faster R-CNN and Visual Genome.

1.5K

Archived

Jupyter Notebook

Computer Vision

ML Ops

Caffe

#image-captioning #visual-question-answering #faster-rcnn

NVlabs/prismer

Prismer: a vision-language model with multi-task experts, for image captioning and other vision-language applications.

1.3K

Archived

Python

React

#vision-language-model #multi-task-learning #image-captioning

microsoft/Oscar

Oscar (Object-Semantics Aligned Pre-training) is a vision-language pre-training method for tasks such as image captioning and image-text search.

1.1K

Archived

Python

Computer Vision

Fine-tuning

#image-captioning #image-text-search #vision-and-language
