Open source implementation of CLIP, a contrastive learning model for multi-modal tasks like zero-shot classification.
Chinese-language version of CLIP for cross-modal retrieval and representation generation.
Macaw-LLM is a multi-modal language modeling framework that integrates image, video, audio, and text data.
Prismer: a vision-language model with multi-task experts for image captioning and other vision-language tasks.
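The CLIP-style projects above share one core mechanism: images and text prompts are embedded into a shared space, and zero-shot classification picks the prompt whose embedding is most similar to the image's. A minimal NumPy sketch of that scoring step, using randomly generated stand-in embeddings (a real pipeline would obtain them from a CLIP image and text encoder):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize so a dot product equals cosine similarity
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Return a probability over classes given one image embedding
    and one text embedding per class prompt."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_embs)
    logits = txt @ img / temperature          # cosine similarities, scaled
    exp = np.exp(logits - logits.max())       # stable softmax
    return exp / exp.sum()

# Hypothetical 512-d embeddings standing in for encoder outputs
rng = np.random.default_rng(0)
class_embs = rng.normal(size=(3, 512))        # one embedding per text prompt
image_emb = class_embs[1] + 0.1 * rng.normal(size=512)  # image close to class 1
probs = zero_shot_classify(image_emb, class_embs)
print(int(np.argmax(probs)))  # → 1
```

The temperature (0.07 here, matching CLIP's typical learned value) sharpens the softmax; with unit-normalized embeddings, the class whose prompt embedding has the highest cosine similarity to the image wins.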