Explore Projects

Discover 47 open source projects

Active filters (1):
Search: captioning

Showing 21-40 of 47 projects

ttengwang/Caption-Anything

Caption-Anything is a versatile AI-powered tool for generating tailored image captions with diverse controls.

1.8K
Archived
Python
Computer Vision
LLM Frameworks
React
#image-captioning #controllable-generation #chatgpt

salesforce/ALBEF

A powerful vision-language pre-training method for tasks like image-text retrieval and captioning.

1.8K
Archived
Python
Computer Vision
Representation Learning
Python
#contrastive-learning #image-text #weakly-supervised-learning

abb128/LiveCaptions

A Linux desktop application that provides live captioning, useful for accessibility and inclusive communication.

1.7K
Experimental
C
Animation & Motion
CLI Tools
#accessibility #inclusive #captioning

BMSVieira/moovie.js

A JavaScript library for building customizable HTML5 video players with support for captions, controls, and streaming.

1.7K
Archived
JavaScript
Component Libraries (JavaScript)
Video Players
JavaScript
#player #captions #controls

yawiii/ComfyUI-Prompt-Assistant

A comprehensive prompt assistant that enables one-click access to LLMs/VLMs for prompt translation, expansion, and image captioning.

1.6K
Active
JavaScript
Prompt Engineering
MCP Frameworks
React
#prompt #translation #expansion

jcjohnson/densecap

A dense image captioning library in Torch for developers working on computer vision AI projects.

1.6K
Archived
Jupyter Notebook
Computer Vision
Torch
#computer-vision #image-captioning #deep-learning

google/live-transcribe-speech-engine

Live Transcribe is an Android app that provides real-time captioning for people who are deaf or hard of hearing.

1.5K
Archived
Java
AI Voice & Speech
#accessibility #captioning #transcription

ruotianluo/ImageCaptioning.pytorch

This PyTorch-based repository provides tools for developing image captioning models.

1.5K
Archived
Python
Computer Vision
PyTorch
#machine-learning #computer-vision #image-captioning

peteanderson80/bottom-up-attention

Bottom-up attention model for image captioning and visual question answering, built on Faster R-CNN and Visual Genome.

1.5K
Archived
Jupyter Notebook
Computer Vision
ML Ops
Caffe
#image-captioning #visual-question-answering #faster-rcnn

NVlabs/describe-anything

An implementation for detailed localized image and video captioning using large multimodal models.

1.5K
Experimental
Python
Computer Vision
LLM Frameworks
Python
#describe-anything #detailed-localized-captioning #large-multimodal-models

rmokady/CLIP_prefix_caption

A simple image captioning model built on the CLIP neural network.

1.4K
Archived
Jupyter Notebook
Computer Vision
LLM Frameworks
PyTorch
#image-captioning #clip #computer-vision

royshil/obs-localvocal

An OBS plugin that enables real-time speech recognition and captioning using AI models like OpenAI Whisper.

1.4K
Active
C++
AI Voice & Speech
Realtime
#live-streaming #realtime-transcription #speech-recognition

imaginary-cloud/CameraManager

Simple Swift class to provide configurations for custom camera views in iOS apps.

1.4K
Archived
Swift
Component Libraries (Swift)
iOS
Swift
#camera #qrcode-reader #video-recording

gielcobben/caption

A cross-platform desktop application for generating captions and subtitles for video content.

1.4K
Archived
JavaScript
Desktop App
General Utilities
Electron
#subtitle #caption #video

IDEA-Research/DINO-X-API

An open-source AI-powered computer vision model for object detection, segmentation, and understanding.

1.3K
Experimental
Python
Computer Vision
API Frameworks
Python
#open-set-object-detection #open-set-object-segmentation #pose-estimation

NVlabs/prismer

Prismer: a vision-language model with multi-task experts, for image captioning and other vision-language applications.

1.3K
Archived
Python
React
#vision-language-model #multi-task-learning #image-captioning

jhc13/taggui

A Python-based tool for managing and captioning image datasets, with support for various AI models and frameworks.

1.3K
Stable
Python
Computer Vision
Component Libraries (React)
#image-tagging #image-captioning #llava

ratwithacompiler/OBS-captions-plugin

A C++ plugin for OBS Studio that adds closed captioning functionality using Google Speech Recognition.

1.3K
Stable
C++
API Frameworks
AI Voice & Speech
#closed-captioning #speech-recognition #obs-studio

tylin/coco-caption

Evaluation code for image captions on the Microsoft COCO dataset.

1.2K
Archived
Jupyter Notebook
Computer Vision
Frontend Frameworks
Jupyter Notebook
#computer-vision #deep-learning #image-captioning

TheShadow29/awesome-grounding

A curated list of research papers on visual grounding, a key technique for multimodal AI.

1.1K
Stable
Computer Vision
Language Grounding
#computer-vision #language-grounding #multimodal-ai
