Showing 21-40 of 59 projects
A dataset generator that produces school-level mathematics questions and answers, useful for AI/ML research.
A curated list of cybersecurity datasets for security researchers and machine learning practitioners.
A benchmark for evaluating language understanding models and datasets for the Chinese language.
An open-source dataset for training environmental sound classification models.
A curated collection of YOLO object detection projects and datasets for developers working with computer vision and AI.
A dataset and pretrained model for detecting whether people are wearing safety helmets, useful for computer vision projects.
This repository contains a dataset of Chinese medical dialogues for NLP and conversational AI research.
The Pile is a large, diverse language model training dataset for use in AI research and development.
A cleaned and curated version of the Alpaca dataset from Stanford, useful for machine learning projects.
A comprehensive benchmark for document parsing and evaluation, designed for CVPR 2025.
A large-scale dataset of raw MRI measurements and clinical MRI images for medical imaging research.
The Omniglot dataset for one-shot learning experiments in MATLAB.
A large and diverse 3D human motion-language dataset for deep learning and motion generation.
A collection of large datasets for training conversational AI models and agents.
A Python project for detecting ChatGPT-generated text, with a corpus comparing human-written and ChatGPT-generated answers.
A fake news detection dataset for research, with accompanying Python code.
A question answering dataset for building AI-powered language models and conversational agents.
A corpus of company names, abbreviations, and brands that can be used for Chinese text segmentation and entity recognition.
This GitHub repository contains a dataset for training a raccoon detector using TensorFlow.
A large-scale image-text dataset for training AI models, primarily for vision and multimodal tasks.