CLIP is a neural network that learns a shared image-text embedding space, enabling zero-shot image classification and image-text matching.
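CLIP scores each candidate caption against an image by cosine similarity of their embeddings in the shared space, then softmaxes those scores over the captions. A minimal sketch of that matching step, with toy vectors standing in for the outputs of CLIP's image and text encoders (the embeddings and temperature value here are illustrative, not CLIP's actual weights):

```python
import numpy as np

def zero_shot_match(image_emb, text_embs, temperature=0.07):
    # L2-normalize so the dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    # Softmax over the candidate captions gives per-caption probabilities
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Toy embeddings standing in for encoder outputs
image = np.array([0.9, 0.1, 0.0])
captions = np.array([[1.0, 0.0, 0.0],   # e.g. "a photo of a dog"
                     [0.0, 1.0, 0.0]])  # e.g. "a photo of a cat"
probs = zero_shot_match(image, captions)
```

Here the first caption's embedding is closest to the image's, so it receives almost all of the probability mass.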
The I-JEPA codebase provides a self-supervised learning architecture that learns image representations by predicting the embeddings of masked image regions from visible context, without any text supervision.
A vision-language pre-training method for tasks such as image-text retrieval and captioning.
A large-scale image-text dataset for training multimodal AI models.
A large multilingual dataset of image-text pairs drawn from Wikipedia, intended for multimodal machine learning research.
An AI-powered platform offering image captioning and image-text search for developers.
A multimodal deep learning model that combines image and text inputs for object detection and recognition.