This repository is an archived collection of papers and code related to computer vision and machine learning.
A Python library for video inpainting, outpainting, and object removal using propagation and transformer models.
This repository provides a PyTorch implementation for monocular depth estimation from a single image.
LightGlue is a high-performance local feature matching library for computer vision tasks like pose estimation.
Tune-A-Video is a one-shot text-to-video generation tool that fine-tunes image diffusion models.
A powerful text-to-video generation model that can turn prompts into high-quality videos, built for AI-driven developers.
Text recognition (OCR) with deep learning methods, including a benchmark for scene text recognition.
Official implementation of a paper on VACE, a video creation and editing tool powered by AI.
An image inpainting model using deep neural networks and attention mechanisms.
A PyTorch implementation of the FCOS one-stage object detection model for computer vision tasks.
A curated collection of ICCV conference papers, code, and interpretations for computer vision developers.
A highly efficient module for temporal modeling in video understanding tasks.
Official implementation of the FastViT research paper, a fast hybrid vision transformer for AI/ML applications.
OminiControl is a minimal and universal control framework for diffusion transformer (DiT) models such as FLUX.
A tool for high-fidelity 3D creation from a single image using diffusion models.
ReCamMaster is a novel video generation model that enables camera-controlled generative rendering from a single input video.
PyTorch implementation of a unified framework for human motion imitation, appearance transfer, and novel view synthesis.
Grad-CAM is a deep learning visualization technique for interpreting and explaining CNN-based models.
A PyTorch-based framework for reproducible deep learning studies with 26 knowledge distillation methods.
An open-vocabulary video segmentation model that can track any object in a video, for video editing and processing.