Explore Projects

Discover 47 open source projects

Search: captions

Showing 1-20 of 47 projects

IDEA-Research/Grounded-Segment-Anything

A set of Jupyter Notebooks that combine Grounding DINO, Segment Anything, and Stable Diffusion for automatic detection, segmentation, and generation of anything in images.

17.4K
Archived
Jupyter Notebook
Computer Vision
Jupyter Notebook
#computer-vision #image-segmentation #image-generation

chenyuntc/pytorch-book

PyTorch tutorials and fun projects, including neural talk, neural style, poem writing, and anime generation.

12.8K
Archived
Jupyter Notebook
Frameworks
PyTorch
#deep-learning #neural-networks #computer-vision

instaloader/instaloader

A Python library for downloading photos, videos, and metadata from Instagram.

11.7K
Active
Python
Backend & APIs
#instagram #instagram-downloader #instagram-scraper

salesforce/LAVIS

LAVIS is a comprehensive library for multimodal deep learning, including image captioning, visual question answering, and more.

11.2K
Archived
Jupyter Notebook
Vision-Language Transformer
PyTorch
#deep-learning #multimodal-learning #vision-language

mwaterfall/MWPhotoBrowser

A simple iOS photo and video browser with grid view, captions and selections.

8.7K
Archived
Objective-C
Component Libraries (React)
iOS
#photo-viewer #video-browser #grid-view

smacke/ffsubsync

Automagically synchronize subtitles with video using audio alignment and speech detection.

7.6K
Stable
Python
AI Audio & Speech
API Frameworks
#audio-alignment #speech-detection #subtitle-synchronization

jdepoix/youtube-transcript-api

A Python API to get YouTube video transcripts without an API key or headless browser

7.0K
Active
Python
API Clients & Testing
Video
#youtube #transcripts #captions

vladmandic/sdnext

All-in-one WebUI for AI generative image and video creation, captioning and processing

7.0K
Active
Python
AI Image & Video
Prompt Engineering
React
#ai-art #stable-diffusion #generative-art

salesforce/BLIP

PyTorch code for Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

5.7K
Archived
Jupyter Notebook
React
#vision-language #pre-training #unified-vision-language

facebookresearch/mmf

A modular deep learning framework for multimodal AI research and applications from Facebook AI Research (FAIR).

5.6K
Active
Python
LLM Frameworks
Computer Vision
PyTorch
#deep-learning #multimodal #captioning

karpathy/neuraltalk2

Efficient image captioning code in Torch that runs on GPU.

5.6K
Archived
Jupyter Notebook
Computer Vision
LLM Frameworks
Torch
#image-captioning #torch #gpu

ashnkumar/sketch-code

A Keras model that generates HTML code from hand-drawn website mockups using an image captioning architecture.

5.2K
Archived
Python
Computer Vision
Component Libraries (React)
Keras
#image-processing #deep-learning #code-generation

r0oth3x49/udemy-dl

A Python-based utility to download courses from Udemy for personal offline use across multiple platforms.

4.9K
Archived
Python
CLI Tools
Backend Frameworks
Python
#udemy #download #cross-platform

OpenGVLab/Ask-Anything

An open-source project that enables developers to build chatbots with video understanding using large language models.

3.3K
Archived
Python
React
#chatbots #video-understanding #large-language-models

OpenGVLab/InternGPT

InternGPT is an open-source demo platform that showcases various AI models, including DragGAN, ChatGPT, ImageBind, and multimodal chat.

3.2K
Archived
Python
LLM Frameworks
Agents & Orchestration
React
#chatgpt #draggan #imagebind

sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

A PyTorch tutorial for building an image captioning model using the Show, Attend, and Tell technique.

2.9K
Archived
Python
Computer Vision
Tutorials & Courses
PyTorch
#image-captioning #attention-mechanism #encoder-decoder

1c7/Translate-Subtitle-File

Translate subtitle files (.srt, .ass, .vtt) with customizable API keys for affordable pricing.

2.6K
Stable
Component Libraries (Vue/Svelte)
CLI Tools
Electron
#subtitle #translation #srt

OFA-Sys/OFA

Official repository for the OFA (Unifying Architectures, Tasks, and Modalities) AI model, supporting various vision-language tasks.

2.6K
Archived
Python
LLM Frameworks
Computer Vision
PyTorch
#pretrained-models #multimodal #vision-language

stephengpope/no-code-architects-toolkit

A free API toolkit for businesses, creators, and developers to streamline advanced media processing, including video editing, image transformations, and Python code execution.

2.2K
Active
Python
File Storage
CMS & Content
Python
#media-processing #video-editing #image-transformation

krzemienski/awesome-video

A curated list of awesome streaming video tools, frameworks, libraries, and learning resources.

1.8K
Experimental
HTML
Video Streaming
#video #streaming #ffmpeg
