Datasets

Open datasets and data collections


googlecreativelab/quickdraw-dataset

Provides access to and documentation for the Quick, Draw! dataset, a large collection of doodles used for machine learning research.

6.7K
Experimental
Computer Vision
Datasets
#dataset #computer-vision #machine-learning

SophonPlus/ChineseNlpCorpus

A repository that collects, organizes, and publishes Chinese natural language processing (NLP) datasets to advance the development of Chinese NLP.

6.5K
Archived
Jupyter Notebook
LLM Frameworks
Tutorials & Courses
#nlp #chinese #natural-language-processing

niderhoff/nlp-datasets

A curated list of free/public domain text datasets for natural language processing (NLP) tasks.

6.0K
Archived
Datasets
#nlp #text-data #public-datasets

togethercomputer/RedPajama-Data

A repository for preparing large datasets for training large language models (LLMs).

4.9K
Archived
Python
LLM Frameworks
Datasets
#language-models #dataset-preparation #cli-tool

weiaicunzai/pytorch-cifar100

A PyTorch repository for practicing image classification on the CIFAR-100 dataset using various deep learning models.

4.8K
Archived
Python
Computer Vision
Datasets
PyTorch
#cifar100 #image-classification #deep-learning

CLUEbenchmark/CLUEDatasetSearch

A comprehensive search tool for finding Chinese NLP datasets, with support for common English NLP datasets as well.

4.4K
Archived
Python
Datasets
Tutorials & Courses
#chinese #nlp #datasets

CLUEbenchmark/CLUE

CLUE is a comprehensive Chinese language understanding evaluation benchmark with datasets, baselines, pre-trained models, and a leaderboard.

4.2K
Stable
Python
LLM Frameworks
Datasets
PyTorch
#chinese #nlp #bert

pytorch/text

A PyTorch-powered library for loading and processing text data for natural language processing tasks.

3.6K
Stable
Python
LLM Frameworks
Datasets
PyTorch
#nlp #data-loader #deep-learning

waymo-research/waymo-open-dataset

Waymo Open Dataset is a large-scale dataset for autonomous driving research and development.

3.3K
Active
Python
Computer Vision
Datasets
#autonomous-driving #dataset #computer-vision

zhulf0804/3D-PointCloud

A comprehensive collection of papers and datasets for 3D point cloud processing, useful for developers working on autonomous driving and computer vision.

2.9K
Archived
Python
Computer Vision
Datasets
#point-cloud #autonomous-driving #classification

logpai/loghub

A large collection of system log datasets for AI-driven log analytics.

2.6K
Active
Anomaly Detection
Datasets
#log-analysis #log-intelligence #log-parsing

mdeff/fma

A dataset for music analysis and research, with support for deep learning and reproducible research.

2.6K
Archived
Jupyter Notebook
Datasets
ML Ops
#music-analysis #deep-learning #open-data

FreedomIntelligence/Awesome-AI4Med

A curated list of medical AI/ML resources, including LLMs, datasets, and benchmarks.

2.6K
Active
LLM Frameworks
Datasets
#medical #llms #datasets

detectRecog/CCPD

A diverse and well-annotated dataset for license plate detection and recognition.

2.5K
Archived
Python
Computer Vision
Datasets
#ccpd #dataset #detection

GanjinZero/awesome_Chinese_medical_NLP

A curated collection of open-source Chinese medical NLP resources including datasets, models, and more.

2.5K
Archived
Datasets
NLP Frameworks
#nlp #medical #chinese

colour-science/colour

A comprehensive Python library for color science and color space conversions.

2.5K
Active
Python
Datasets
API Frameworks
#color #color-science #color-space

github/CodeSearchNet

CodeSearchNet provides datasets, tools, and benchmarks for representation learning of code, enabling AI-powered code discovery.

2.4K
Archived
Jupyter Notebook
Machine Learning on Source Code
Datasets
#machine-learning #nlp #data-science

google/youtube-8m

Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.

2.4K
Archived
Python
Datasets
#youtube #dataset #video-understanding

google-research-datasets/Objectron

A dataset of annotated 3D object videos for training computer vision and augmented reality models.

2.3K
Archived
Jupyter Notebook
Computer Vision
Datasets
PyTorch
#3d-vision #object-detection #point-cloud

OpenGVLab/InternVideo

A video foundation model and dataset for multimodal and video understanding tasks.

2.2K
Stable
Python
Computer Vision
Datasets
PyTorch
#video-understanding #multimodal #foundation-models
