A Jupyter Notebook project for detecting text in natural images using the Connectionist Text Proposal Network (CTPN) algorithm.
Neuroglancer is a WebGL-based viewer for volumetric data, enabling visualization and analysis of 3D biological and scientific data.
A curated collection of resources on applying Transformers to medical imaging tasks like segmentation, classification, and synthesis.
Official repository for NeuMan, a neural human radiance field model learned from a single video.
A TypeScript library that generates comic panels using a large language model and SDXL, powered by Hugging Face.
A comprehensive list of papers on world models and their use in general video generation, embodied AI, and autonomous driving.
An extension for the AUTOMATIC1111 Stable Diffusion web UI that enables creating videos using img2img and ebsynth.
A math formula OCR tool that supports handwritten formulas, formulas mixed with Chinese text, and simple symbol reasoning.
A high-performance and accurate license plate detection library built using YOLOv5 and ncnn.
Real-time photorealistic talking-head animation system built with Python and deep learning.
A 3D-informed video generation model with precise camera control for high-quality, consistent video content.
A Ruby library for creating interactive art and visuals using the Processing language.
A lightweight adapter that bridges the Segment Anything Model (SAM) with medical imaging applications.
Universal instance perception model for object detection, segmentation, and tracking in videos.
A Python library for accelerating inference of video diffusion models using timestep embedding caching.
A simple PyTorch implementation of Generative Adversarial Networks for generating anime-style faces.
A collection of Caffe models and deployment files for popular machine learning networks covering tasks such as classification, detection, and segmentation.
A differentiable renderer for 3D reasoning and reconstruction, useful for AI-driven 3D applications.
A Streamlit app that demonstrates real-time object detection on the Udacity self-driving-car dataset.
VideoLLaMA 2 is a Python library that advances spatial-temporal modeling and audio understanding in video-based large language models.