A collection of machine learning datasets used in tutorials on MachineLearningMastery.com.
A repository with a Chinese dialogue dataset and fine-tuning code for the ChatGLM language model.
A repository of public data sources for building and testing recommender systems.
This GitHub repository provides time series data on COVID-19 cases, useful for data analysis and visualization.
A dataset and methods for word-level sign language recognition from video, useful for developers building sign language applications.
A curated collection of open-source datasets for training instruction-following large language models (LLMs) like ChatGPT and LLaMA.
An open-source audio annotation tool for machine learning and speech processing datasets.
A library of chest X-ray datasets and models for medical AI/ML applications.
Open Images is a large dataset of annotated images for computer vision and machine learning research.
A dataset and reinforcement learning algorithm for endowing audio language models with bimodal reasoning abilities.
A large multimodal multilingual dataset of image-text pairs from Wikipedia for machine learning research.
A curated list of Twitter datasets and resources for data scientists and social network analysts.
A dataset of real user questions and answers for training and evaluating question answering systems.
A collection of datasets for deep learning with satellite and aerial imagery.
A dataset repository for building recommender systems, useful for developers working on AI-powered applications.
A repository containing various NLP datasets collected and organized by the owner.
An open-source corpus dataset for chatbots and question-answering systems in the insurance domain.
A Python library for generating synthetic datasets and tools for computer vision applications.
An infrared object detection dataset and benchmark for few-shot learning.