A Python library that provides a simple interface for using popular NLP models like BERT, GPT-2, XLNet, and T5 for various tasks.
Open-source toolkit for evaluating large multi-modal AI models, supporting 220+ models and 80+ benchmarks.
A large-scale vision-language model for video understanding and generation.
An official repository for the 'Mini-Gemini' model, a multi-modal vision-language model for generation tasks.
A Python library that provides superfast AI decision-making and intelligent multi-modal data processing.
A Python library for representing, sending, storing, and searching multimodal data in AI and ML applications.
A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.
A library for single- and multi-modal speaker verification, recognition, and diarization.
A reasoning-based segmentation method that uses large language models.
An open-source multi-modal trip planner for developers to build with AI tools.
A powerful multi-modal large language model family for building advanced AI chatbots and visual recognition models.
An open-source multi-modal AI model based on LLaMA-3.8B.
A Python library for working with large multi-modal models, focusing on image resolution and text labeling.
A PyTorch-based deep learning framework for 2D/3D medical image segmentation.
MotionGPT is a unified motion-language model that generates human motion using large language models.
A robust, all-in-one GPT interface for Discord with chatbot, image generation, moderation, and more.
An efficient retrieval-augmented generation framework for multi-modal information retrieval and question answering.
A high-performance, MySQL-compatible vector database that supports structured and unstructured data for AI-driven applications.
VeOmni is a scalable distributed training framework for multi-modal AI models.
Macaw-LLM is a multi-modal language modeling framework that integrates image, video, audio, and text data.