Cambrian-1 is a multimodal LLM with a vision-centric design for building AI-powered chatbots and applications.
Magma is a foundation model for building multimodal AI agents, enabling next-gen AI applications.
Implementation of a 1-bit Transformer model for large language models in PyTorch.
A powerful multimodal transformer for combining language, vision, and other modalities in AI applications.
Comprehensive resources for developers working with Generative AI, including projects, use cases, and interview prep.
A powerful text-to-image diffusion model that can be used for recaptioning, planning, and generating with multimodal LLMs.
The Alan AI SDK for Android provides a conversational AI platform for building voice assistants and chatbots.
An innovative AI-powered document understanding and OCR platform from Alibaba Research.
A research project from Facebook that explores multimodal AI models for computer vision and language tasks.
A survey paper that proposes a taxonomy of Retrieval-Augmented Generation (RAG) techniques for AI-Generated Content (AIGC).
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, with outstanding singing lyrics recognition.
4M: Massively Multimodal Masked Modeling, a Python library for large-scale multimodal language models.
The Alan AI SDK for Flutter enables building conversational AI-powered apps and voice interfaces.
An open-source library for building generative multimodal AI models, with a focus on foundation models, in-context learning, and multimodal pretraining.
A Python library for generating TikZ graphics programs to create scientific figures and sketches with AI-powered tools.
A PyTorch library for training state-of-the-art multimodal multi-task models at scale.
An open-source implementation for fine-tuning Qwen-VL series, a multimodal vision-language model by Alibaba Cloud.
A self-coding system for Ionic apps that uses an AI-powered chatbot and voice assistant SDK.
A Python-based multimodal OCR tool for efficient offline processing of LaTeX, Chinese-English (ZhEn) text, and tables on Windows.
A Python-based data infrastructure platform for declarative, incremental multimodal AI workloads.