Explore Projects

Discover 12 open-source projects

Active filters (1):
Search: vision-and-language

Showing 1-12 of 12 projects

aishwaryanr/awesome-generative-ai-guide

Comprehensive resource for generative AI research, interviews, and courses

25.1K
Active
HTML
LLM Wrappers & SDKs
Tutorials & Courses
#generative-ai#llms#interview-prep

salesforce/LAVIS

LAVIS is a one-stop library for language-vision intelligence, supporting multimodal tasks such as image captioning and visual question answering.

11.2K
Archived
Jupyter Notebook
Vision-Language Transformer
PyTorch
#deep-learning#multimodal-learning#vision-language

salesforce/ALBEF

ALBEF (Align before Fuse) is a vision-language pre-training method for tasks such as image-text retrieval, visual question answering, and weakly supervised visual grounding.

1.8K
Archived
Python
Computer Vision
Representation Learning
#contrastive-learning#image-text#weakly-supervised-learning

dandelin/ViLT

A Vision-and-Language Transformer model for multimodal tasks without the need for convolution or region supervision.

1.5K
Archived
Python
Computer Vision
LLM Frameworks
#vision-language#multimodal#transformer

open-mmlab/Multimodal-GPT

Multimodal-GPT trains a multimodal chatbot on visual and language instruction data.

1.5K
Archived
Python
LLM Frameworks
Computer Vision
PyTorch
#multimodal#gpt#llama

om-ai-lab/OmDet

Real-time and accurate open-vocabulary end-to-end object detection library for computer vision applications.

1.4K
Archived
Python
Computer Vision
API Frameworks
#computer-vision#object-detection#real-time

llm-jp/awesome-japanese-llm

Comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.

1.3K
Active
TypeScript
LLM Frameworks
Generative AI
#japanese#llm#foundation-models

NVlabs/prismer

Prismer is a vision-language model with multi-task experts, applied to tasks such as image captioning.

1.3K
Archived
Python
React
#vision-language-model#multi-task-learning#image-captioning

yuewang-cuhk/awesome-vision-language-pretraining-papers

A curated list of papers on recent advances in vision-language pretrained models (VL-PTMs).

1.2K
Archived
Computer Vision
LLM Frameworks
#vision-language#multimodal#pretrained-models

rhymes-ai/Aria

Aria is an open, multimodal-native mixture-of-experts model for vision and language; the repository provides its codebase.

1.1K
Archived
Jupyter Notebook
Agents & Orchestration
Computer Vision
#multimodal#vision-and-language#mixture-of-experts

OFA-Sys/ONE-PEACE

A general representation model for cross-modal learning across vision, audio, and language.

1.1K
Archived
Python
LLM Frameworks
Representation Learning
#multimodal#contrastive-learning#foundation-models

microsoft/Oscar

Oscar (Object-Semantics Aligned Pre-training) is a vision-language pre-training method for downstream tasks such as image captioning and image-text retrieval.

1.1K
Archived
Python
Computer Vision
Fine-tuning
#image-captioning#image-text-search#vision-and-language
