A multimodal evaluation toolkit for assessing AI models across text, image, video, and audio tasks.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
An open-source cookbook for getting started with Phi, a family of high-performance small language models from Microsoft.
Code and models for a multimodal large language model that can perform any-to-any tasks.
A comprehensive document search and storage platform for building AI applications using Python.
InternGPT is an open-source demo platform that showcases various AI models, including DragGAN, ChatGPT, ImageBind, and multimodal chat.
An advanced multimodal AI model series for vision-language reasoning, developed by Skywork AI.
MTEB is a benchmark for evaluating and comparing text embedding models across multiple tasks and languages.
Foundation Architecture for (M)LLMs, a powerful toolkit for building large language models.
A high-quality end-to-end speech interaction model for AI-powered voice applications.
A Python library for representing, sending, storing, and searching multimodal data in AI and ML applications.
A Python framework for efficient model inference with omni-modality AI models.
A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.
A native multimodal model for high-quality text-to-image generation.
AI-powered multimodal conversational video generation, aimed at new collaboration experiences for vibe coders.
An extensible, high-performance columnar file format for data storage and processing.
Easily compute CLIP embeddings and build a CLIP-based retrieval system with this Jupyter Notebook library.
A comprehensive analytics, versioning, and ETL toolkit for multimodal data (video, audio, PDFs, images).
A custom component for controlling Xiaomi Multimode Gateway and Aqara Hub devices on Home Assistant.
MUNIT is a deep learning method for multimodal unsupervised image-to-image translation that lets vibe coders create diverse, stylized images.