Scalable embedding, reasoning, and ranking for images and sentences with CLIP.
A state-of-the-art open visual language model for multimodal pretraining and applications.
Chinese version of CLIP for cross-modal retrieval and representation generation.
A Python library for creating Disco Diffusion artworks using a simple one-line interface.
A Python library for representing, sending, storing, and searching multimodal data in AI and ML applications.
An official implementation of a time series forecasting model using large language models.
Cross-modal lip reading using 3D convolutional neural networks for speech recognition.
A comprehensive collection of research on knowledge graphs, covering various applications and techniques.
Phantom is a subject-consistent video generation tool built on cross-modal alignment of text and video.
Cross-modal image matching framework with large-scale pre-training for AI-powered coding tools.
A general representation model for cross-modal learning across vision, audio, and language.
VideoX is a collection of video cross-modal models for developers working with AI-powered video tools.