NVlabs/describe-anything

An implementation for detailed localized image and video captioning using large multimodal models.

Python
AI & Machine Learning
Computer Vision
Apache-2.0

Stars: 1.5K
Forks: 88
Created: Apr 4, 2025
Last Updated: Jun 26, 2025

Project Analytics

Stars Growth (1 Month): +6 (+0.4% change)
Avg Daily Growth (1 Month): +0.2 stars per day
Fork/Star Ratio (All Time): 6.1% (normal engagement)
Lifetime Growth: 4.3 stars/day over 337 days

[Charts: Stars, Forks, Open Issues, Pull Requests, and Commits Over Time]

AI-Generated Tags

describe-anything
detailed-localized-captioning
large-multimodal-models
vision-language-model
image-captioning
video-captioning
