Datasets

Open datasets and data collections

Showing 41-59 of 59 projects

jbrownlee/Datasets

A collection of machine learning datasets used in tutorials on MachineLearningMastery.com.

1.2K
Archived
Datasets
#machine-learning#datasets#tutorials

hikariming/chat-dataset-baseline

A repository with a Chinese dialogue dataset and fine-tuning code for the ChatGLM language model.

1.2K
Experimental
Jupyter Notebook
LLM Frameworks
Datasets
#chatglm#language-model#fine-tuning

RUCAIBox/RecSysDatasets

A repository of public data sources for building and testing recommender systems.

1.2K
Archived
Python
Datasets
#dataset#recommender-system#recommendation-datasets

datasets/covid-19

This GitHub repository provides time series data on COVID-19 cases, useful for data analysis and visualization.

1.2K
Archived
Python
Datasets
#coronavirus#covid-19#data-package

dxli94/WLASL

A dataset and methods for word-level sign language recognition from video, useful for developers building sign language applications.

1.1K
Archived
Python
Computer Vision
Datasets
#sign-language#sign-language-recognition#computer-vision

yaodongC/awesome-instruction-dataset

A curated collection of open-source datasets for training instruction-following large language models (LLMs) like ChatGPT and LLaMA.

1.1K
Archived
LLM Frameworks
Datasets
#instruction-following#llm#chatgpt

midas-research/audino

Open-source audio annotation tool for machine learning and speech processing datasets.

1.1K
Stable
TypeScript
Speech Processing
Datasets
#audio-annotation#speech-processing#machine-learning

mlmed/torchxrayvision

A library of chest X-ray datasets and models for medical AI/ML applications.

1.1K
Stable
Jupyter Notebook
Computer Vision
Datasets
PyTorch
#chest-radiographs#chest-xray#medical-imaging

cvdfoundation/open-images-dataset

Open Images is a large dataset of annotated images for computer vision and machine learning research.

1.1K
Archived
Computer Vision
Datasets
#computer-vision#machine-learning#dataset

xid32/SoundMind

A dataset and reinforcement learning algorithm for endowing audio language models with bimodal reasoning abilities.

1.1K
Stable
Python
LLM Frameworks
Datasets
#audio-language-model#audio-reasoning#dataset

google-research-datasets/wit

A large multimodal multilingual dataset of image-text pairs from Wikipedia for machine learning research.

1.1K
Archived
Multimodal
NLP
#machine-learning#nlp#multimodal

shaypal5/awesome-twitter-data

A curated list of Twitter datasets and resources for data scientists and social network analysts.

1.1K
Archived
Datasets
Social Network Analysis
#twitter#social-media#data-science

google-research-datasets/natural-questions

A dataset of real user questions and answers for training and evaluating question answering systems.

1.1K
Archived
Python
LLM Frameworks
Datasets
#dataset#question-answering#natural-language-processing

satellite-image-deep-learning/datasets

A collection of datasets for deep learning with satellite and aerial imagery.

1.1K
Active
Computer Vision
Datasets
#earth-observation#remote-sensing#satellite-data

caserec/Datasets-for-Recommender-Systems

A repository of public datasets for building and evaluating recommender systems.

1.1K
Archived
Jupyter Notebook
Datasets
#data-science#recommender-systems#public-data

liucongg/NLPDataSet

A collection of NLP datasets gathered and organized by the repository owner.

1.1K
Archived
Datasets
#nlp#datasets#data-collection

chatopera/insuranceqa-corpus-zh

An open-source Chinese-language corpus for chatbots and question-answering systems in the insurance domain.

1.0K
Experimental
Python
LLM Frameworks
CMS & Content
#chatbot#corpus#dataset

unrealcv/synthetic-computer-vision

A Python library for generating synthetic datasets and tools for computer vision applications.

1.0K
Archived
Python
Computer Vision
Datasets
#computer-vision#dataset#synthetic-data

RuihengZhang/IFSOD-dataset

An infrared object detection dataset and benchmark for few-shot learning.

1.0K
Experimental
Computer Vision
Datasets
#computer-vision#object-detection#few-shot-learning
