Explore Projects

Discover 382 open source projects

Active filters (1):
Search: dataset

Showing 81-100 of 382 projects

CLUEbenchmark/CLUEDatasetSearch

A comprehensive search tool for finding Chinese NLP datasets, with support for common English NLP datasets as well.

4.4K
Archived
Python
Datasets
Tutorials & Courses
Python
#chinese#nlp#datasets

rom1504/img2dataset

Easily convert large sets of image URLs into a dataset for AI/ML training and experimentation.

4.4K
Stable
Python
Computer Vision
Databases
Python
#dataset#image-processing#big-data

openimages/dataset

The Open Images dataset, a large-scale, diverse dataset of images that are annotated with object bounding boxes, visual relationships, and attributes.

4.4K
Archived
Python
Computer Vision
Databases
#dataset#computer-vision#open-source

whoiskatrin/sql-translator

A TypeScript-based tool for converting natural language queries into SQL using AI.

4.3K
Experimental
TypeScript
LLM Wrappers & SDKs
Databases
TypeScript
#data-analysis#data-engineering#dataquery

wainshine/Chinese-Names-Corpus

A Chinese name corpus and generator for natural language processing and entity recognition.

4.3K
Stable
Databases
CLI Tools
#corpus#dataset#names

adaltas/node-csv

A full-featured CSV parser with a simple API and support for large datasets in Node.js.

4.3K
Stable
JavaScript
API Clients & Testing
API Frameworks
Node.js
#csv#parsing#streaming

CLUEbenchmark/CLUE

CLUE is a comprehensive Chinese language understanding evaluation benchmark with datasets, baselines, pre-trained models, and a leaderboard.

4.2K
Stable
Python
LLM Frameworks
Datasets
PyTorch
#chinese#nlp#bert

TommyZihao/Train_Custom_Dataset

A Jupyter Notebook project that helps developers label their own data and train custom AI models.

4.0K
Active
Jupyter Notebook
Fine-tuning
Inference
#machine-learning#data-labeling#custom-model-training

Charmve/Surface-Defect-Detection

A database and paper collection for surface defect research, useful for developers building with AI tools.

4.0K
Archived
Python
Next.js
#surface-defect-detection#deep-learning#image-segmentation

torchgeo/torchgeo

TorchGeo is a Python library for working with geospatial data using PyTorch, providing datasets, samplers, transforms, and pre-trained models.

3.9K
Active
Python
Computer Vision
Databases
PyTorch
#geospatial#computer-vision#deep-learning

chrieke/awesome-satellite-imagery-datasets

A list of satellite image training datasets with annotations for computer vision and deep learning.

3.9K
Archived
#satellite-image-datasets#computer-vision#deep-learning

rcoh/angle-grinder

A command-line tool, written in Rust, for slicing and dicing log data; useful for developers who work with large datasets.

3.7K
Stable
Rust
CLI Tools
API Frameworks
#logging#analytics#cli

Belval/TextRecognitionDataGenerator

A synthetic data generator for text recognition, useful for training AI-powered text detection and OCR models.

3.6K
Archived
Python
Computer Vision
Python
#text-recognition#ocr#synthetic-data

Cryakl/Ultimate-RAT-Collection

A collection of classic and modern remote access trojan (RAT) builders. It is not a developer tool for AI-powered coding.

3.6K
Active
Security Research
Penetration Testing
#backdoor-attacks#backdoors#malware

awslabs/deequ

Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.

3.6K
Active
Scala
ETL & Pipelines
Testing
Spark
#data-quality#unit-testing#apache-spark

pytorch/text

A PyTorch-powered library for loading and processing text data for natural language processing tasks.

3.6K
Stable
Python
LLM Frameworks
Datasets
PyTorch
#nlp#data-loader#deep-learning

jdorfman/awesome-json-datasets

A curated list of awesome JSON datasets that don't require authentication.

3.6K
Archived
JavaScript
Databases
Caching
#json#datasets#data

guillaume-chevalier/LSTM-Human-Activity-Recognition

A TensorFlow-based example of human activity recognition using an LSTM RNN on smartphone sensor data.

3.5K
Archived
Jupyter Notebook
Computer Vision
Tutorials & Courses
TensorFlow
#activity-recognition#deep-learning#lstm

Docta-ai/docta

A Python library that helps diagnose and curate datasets for data-centric AI applications.

3.5K
Archived
Python
LLM Frameworks
Caching
#data-curation#data-diagnosis#language-model

linhandev/dataset

A comprehensive index of medical imaging datasets for researchers and developers working in the medical imaging field.

3.5K
Archived
Databases
Tutorials & Courses
#medical-imaging#ct#mri
