Explore Projects

Discover 47 open-source projects

Active filter: search for "captions"

Showing 1-20 of 47 projects

IDEA-Research/Grounded-Segment-Anything

A set of Jupyter Notebooks that combine Grounding DINO, Segment Anything, and Stable Diffusion for automatic detection, segmentation, and generation of anything in images.

17.4K

Archived

Jupyter Notebook

Computer Vision

Jupyter Notebook

#computer-vision#image-segmentation#image-generation

chenyuntc/pytorch-book

PyTorch tutorials and fun projects, including neural talk, neural style, poem writing, and anime generation.

12.8K

Archived

Jupyter Notebook

Frameworks

PyTorch

#deep-learning#neural-networks#computer-vision

instaloader/instaloader

A Python library for downloading photos, videos, and metadata from Instagram.

11.7K

Active

Python

Backend & APIs

#instagram#instagram-downloader#instagram-scraper

salesforce/LAVIS

LAVIS is a comprehensive library for multimodal deep learning, including image captioning, visual question answering, and more.

11.2K

Archived

Jupyter Notebook

Vision-Language Transformer

PyTorch

#deep-learning#multimodal-learning#vision-language

mwaterfall/MWPhotoBrowser

A simple iOS photo and video browser with grid view, captions, and selections.

8.7K

Archived

Objective-C

Component Libraries

iOS

#photo-viewer#video-browser#grid-view

smacke/ffsubsync

Automagically synchronize subtitles with video using audio alignment and speech detection.

7.6K

Stable

Python

AI Audio & Speech

API Frameworks

#audio-alignment#speech-detection#subtitle-synchronization

jdepoix/youtube-transcript-api

A Python API for fetching YouTube video transcripts without an API key or a headless browser.

7.0K

Active

Python

API Clients & Testing

Video

#youtube#transcripts#captions

vladmandic/sdnext

All-in-one WebUI for generative AI image and video creation, captioning, and processing.

7.0K

Active

Python

AI Image & Video

Prompt Engineering

React

#ai-art#stable-diffusion#generative-art

salesforce/BLIP

PyTorch code for BLIP (Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation).

5.7K

Archived

Jupyter Notebook

PyTorch

#vision-language#pre-training#unified-vision-language

facebookresearch/mmf

A modular deep learning framework for multimodal AI research and applications from Facebook AI Research (FAIR).

5.6K

Active

Python

LLM Frameworks

Computer Vision

PyTorch

#deep-learning#multimodal#captioning

karpathy/neuraltalk2

Efficient image captioning code in Torch that runs on a GPU.

5.6K

Archived

Jupyter Notebook

Computer Vision

LLM Frameworks

Torch

#image-captioning#torch#gpu

ashnkumar/sketch-code

A Keras model that generates HTML code from hand-drawn website mockups using an image captioning architecture.

5.2K

Archived

Python

Computer Vision

Component Libraries (React)

Keras

#image-processing#deep-learning#code-generation

r0oth3x49/udemy-dl

A Python-based utility to download courses from Udemy for personal offline use across multiple platforms.

4.9K

Archived

Python

CLI Tools

Backend Frameworks

Python

#udemy#download#cross-platform

OpenGVLab/Ask-Anything

An open-source project that enables developers to build chatbots with video understanding using large language models.

3.3K

Archived

Python

React

#chatbots#video-understanding#large-language-models

OpenGVLab/InternGPT

InternGPT is an open-source demo platform that showcases various AI models, including DragGAN, ChatGPT, and ImageBind, and supports multimodal chat.

3.2K

Archived

Python

LLM Frameworks

Agents & Orchestration

React

#chatgpt#draggan#imagebind

sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

A PyTorch tutorial for building an image captioning model using the Show, Attend, and Tell technique.

2.9K

Archived

Python

Computer Vision

Tutorials & Courses

PyTorch

#image-captioning#attention-mechanism#encoder-decoder

1c7/Translate-Subtitle-File

Translate subtitle files (.srt, .ass, .vtt) using your own API keys to keep costs low.

2.6K

Stable

Component Libraries (Vue/Svelte)

CLI Tools

Electron

#subtitle#translation#srt

OFA-Sys/OFA

Official repository for OFA, a model that unifies architectures, tasks, and modalities and supports various vision-language tasks.

2.6K

Archived

Python

LLM Frameworks

Computer Vision

PyTorch

#pretrained-models#multimodal#vision-language

stephengpope/no-code-architects-toolkit

A free API toolkit for businesses, creators, and developers to streamline advanced media processing, including video editing, image transformations, and Python code execution.

2.2K

Active

Python

File Storage

CMS & Content

Python

#media-processing#video-editing#image-transformation

krzemienski/awesome-video

A curated list of awesome streaming video tools, frameworks, libraries, and learning resources.

1.8K

Experimental

HTML

Video Streaming

#video#streaming#ffmpeg

