Caption-Anything is a versatile AI-powered tool for generating tailored image captions with diverse visual and language controls.
A vision-language pre-training method for tasks such as image-text retrieval and captioning.
A Linux desktop application that provides live captioning, useful for accessibility and inclusive communication.
A JavaScript library for building customizable HTML5 video players with support for captions, controls, and streaming.
A comprehensive prompt assistant that enables one-click access to LLMs/VLMs for prompt translation, expansion, and image captioning.
A Torch library for dense image captioning, aimed at developers working on computer vision projects.
Live Transcribe is an Android app that provides real-time captioning for people who are deaf or hard of hearing.
A PyTorch-based repository that provides tools for developing image captioning models.
A bottom-up attention model for image captioning and visual question answering, built on Faster R-CNN and trained on Visual Genome.
An implementation of detailed localized image and video captioning using large multimodal models.
A simple image captioning model built on the CLIP neural network.
An OBS plugin that enables real-time speech recognition and captioning using AI models like OpenAI Whisper.
A simple Swift class that provides configuration for custom camera views in iOS apps.
A cross-platform desktop application for generating captions and subtitles for video content.
An open-source computer vision model for object detection, segmentation, and visual understanding.
Prismer: A Vision-Language Model with Multi-Task Experts, for image captioning and other vision-language applications.
A Python-based tool for managing and captioning image datasets, with support for various AI models and frameworks.
A C++ plugin for OBS Studio that adds closed captioning functionality using Google Speech Recognition.
A library for generating captions for images using deep learning models.
A curated list of research papers on visual grounding, a key technique for multimodal AI.