Explore Projects

Discover 7 open source projects

Search: vqa

facebookresearch/mmf

A modular deep learning framework for multimodal AI research and applications from Facebook AI Research (FAIR).

5.6K
Active
Python
LLM Frameworks
Computer Vision
PyTorch
#deep-learning #multimodal #captioning

open-compass/VLMEvalKit

Open-source toolkit for evaluating large multi-modal AI models, supporting 220+ models and 80+ benchmarks.

3.9K
Active
Python
LLM Frameworks
LLM Wrappers & SDKs
PyTorch
#chatgpt #llm #multi-modal

OpenGVLab/InternGPT

InternGPT is an open-source demo platform that showcases AI models and features including DragGAN, ChatGPT, ImageBind, and multimodal chat.

3.2K
Archived
Python
LLM Frameworks
Agents & Orchestration
React
#chatgpt #draggan #imagebind

BDBC-KG-NLP/QA-Survey-CN

A survey of question answering (QA) systems, covering KBQA, TextQA, TableQA, VisualQA, and machine reading comprehension (MRC).

1.8K
Archived
Natural Language Processing
Tutorials & Courses
#qa #question-answering #cqa

peteanderson80/bottom-up-attention

Bottom-up attention model for image captioning and visual question answering, built on Faster R-CNN and Visual Genome.

1.5K
Archived
Jupyter Notebook
Computer Vision
ML Ops
Caffe
#image-captioning #visual-question-answering #faster-rcnn

NVlabs/prismer

Prismer: A Vision-Language Model with Multi-Task Experts, applied to image captioning and other vision-language tasks.

1.3K
Archived
Python
React
#vision-language-model #multi-task-learning #image-captioning

microsoft/Oscar

Oscar: Object-Semantics Aligned Pre-training for vision-language tasks such as image captioning and image-text search.

1.1K
Archived
Python
Computer Vision
Fine-tuning
#image-captioning #image-text-search #vision-and-language
