LLaVA is a visual instruction tuning framework for large language-and-vision models, built toward GPT-4-level capabilities.
An open-source multimodal dialogue system, built on a large language model, that approaches GPT-4o performance.
A Go-based framework for building chatbots and AI-powered assistants using Feishu (Lark) and OpenAI's GPT-4 models.
An open-source toolkit for evaluating large multimodal AI models, supporting 220+ models and 80+ benchmarks.
An official implementation of a system for improving video understanding and generation with better captions.
An AI agent using GPT-4V(ision) that can interact with web UIs via mouse/keyboard for developer productivity.