A curated list of state-of-the-art research in embodied AI, focusing on VLA, VLN, and related multimodal learning approaches.
An AI-powered tool for training supervised models without manual labeling, using foundation models and multimodal learning.
A benchmark for multimodal AI agents to tackle open-ended tasks in real computer environments.
A curated list of medical AI/ML resources, including LLMs, datasets, and benchmarks.
Official repository for the OFA (Unifying Architectures, Tasks, and Modalities) AI model, supporting various vision-language tasks.
An official implementation of a time series forecasting model using large language models.
A powerful multi-modal large language model family for building advanced AI chatbots and visual recognition models.
A multimodal AI model for real-time vision and speech interaction.
OCRFlux is a powerful PDF-to-Markdown conversion toolkit with advanced layout handling, table parsing, and cross-page content merging.
A Python-based assistant tool for improving group chat experiences using large language models.
A framework for building AI agents with strong reasoning abilities, self-improvement, and skill curation in a general computing environment.
SDK for interacting with the Stability.AI APIs, including Stable Diffusion inference.
A curated list of resources on text-to-image generation and synthesis.
An end-to-end multimodal SVG generator that leverages pre-trained Vision-Language Models to create complex and detailed SVGs.
A modular multimodal large language model for advanced document understanding and analysis.
An open-source implementation of a multimodal instruction-based editing and generation model.
A video foundation model and dataset for multimodal video understanding tasks.
A scalable multimodal reasoning framework for AI-powered applications with a focus on video and image understanding.
A prompt learning framework for vision-language models.
A component library and registry built on shadcn/ui to help you build multimodal AI agents faster.