Showing 1-10 of 10 projects
LAVIS is a comprehensive library for multimodal deep learning, including image captioning, visual question answering, and more.
An open-source AI agent platform for financial analysis using large language models (LLMs).
An official implementation of a time series forecasting model using large language models.
A curated list of resources on text-to-image generation and synthesis, useful for AI-focused developers.
Implementation of a 1-bit Transformer model for large language models in PyTorch.
An innovative AI-powered document understanding and OCR platform from Alibaba Research.
A curated collection of the latest CVPR (Computer Vision and Pattern Recognition) papers, code, and demos for AI developers.
A flexible package for multimodal deep learning to combine tabular, text, and image data using Wide and Deep models in PyTorch.
A curated collection of recent advances in vision-language pretrained models (VL-PTMs) for AI and multimodal applications.
A curated list of research papers on visual grounding, a key technique for multimodal AI.