Explore Projects

Discover 382 open-source projects

Search: dataset

Showing 1-20 of 382 projects

public-apis/public-apis

A collective list of free APIs for developers, including API clients and testing tools.

404.5K
Stable
Python
API Clients & Testing
#api #apis #dataset

fighting41love/funNLP

Comprehensive Chinese NLP resource collection for developers

79.2K
Archived
Python
LLM Frameworks
RAG & Vector
#nlp #chinese-nlp #ai-resources

awesomedata/awesome-public-datasets

Curated list of high-quality public datasets across various domains.

73.3K
Active
Awesome Lists
#datasets #opendata #awesome

microsoft/qlib

AI-powered quantitative investment platform for finance and trading

38.2K
Active
Python
Inference
SaaS Boilerplates
#quantitative-investment #algorithmic-trading #machine-learning

HumanSignal/label-studio

Open-source data labeling tool for AI/ML projects

26.6K
Active
TypeScript
Computer Vision
Testing
#data-labeling #annotation-tool #computer-vision

sebastianruder/NLP-progress

Tracks progress in NLP tasks with datasets and benchmarks

23.0K
Archived
Python
Computer Vision
#nlp #benchmark #datasets

langfuse/langfuse

LLM engineering platform for observability, evaluation, and prompt management

22.7K
Active
TypeScript
LLM Frameworks
LLM Wrappers & SDKs
LangChain
#llm-observability #llm-evaluation #prompt-management

huggingface/lerobot

LeRobot provides tools for robotics with PyTorch, including datasets and models for real-world applications.

22.0K
Active
Python
Computer Vision
Robotics
PyTorch
#robotics #pytorch #computer-vision

huggingface/datasets

Library for loading, managing, and preprocessing datasets for ML projects

21.2K
Active
Python
ML Ops
ETL & Pipelines
HuggingFace
#datasets #ml-ops #data-preprocessing

joke2k/faker

Faker is a Python library that generates fake data for testing and development purposes.

19.2K
Active
Python
Validation
#dataset #fake #fake-data

pytorch/vision

A computer vision library for PyTorch that provides datasets, transforms, and pre-trained models.

17.5K
Active
Python
Computer Vision
PyTorch
#computer-vision #machine-learning #deep-learning

tensorflow/tensor2tensor

A library of deep learning models and datasets to make deep learning more accessible and accelerate ML research.

17.0K
Archived
Python
ML Ops
TensorFlow
#deep-learning #machine-learning #reinforcement-learning

akfamily/akshare

AKShare is a simple and elegant Python library for accessing financial data APIs.

16.8K
Active
Python
Databases
#finance #economic-data #data-analysis

prestodb/presto

Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.

16.7K
Active
Java
Databases
#big-data #sql #query

lukas-blecher/LaTeX-OCR

A deep learning model that converts images of mathematical equations into LaTeX code.

16.2K
Archived
Python
Computer Vision
PyTorch
#ocr #latex #math

apache/hadoop

Apache Hadoop is a popular open-source distributed computing framework for processing and storing large datasets.

15.5K
Active
Java
API Frameworks
#distributed-computing #big-data #nosql

cvat-ai/cvat

CVAT is an industry-leading data engine for machine learning, trusted by teams for annotating data at scale.

15.4K
Active
Python
Computer Vision
PyTorch
#annotation #computer-vision #dataset

jindongwang/transferlearning

A comprehensive repository covering papers, code, datasets, tutorials, and applications for transfer learning, domain adaptation, and more.

14.3K
Experimental
Python
Machine Learning
#transfer-learning #domain-adaptation #domain-generalization

OpenGenus/cosmos

A comprehensive library of algorithms and data structures for developers to explore and contribute to.

13.7K
Archived
C++
CLI Tools
#algorithms #data-structures #open-source

ConardLi/easy-dataset

A powerful JavaScript tool for creating datasets for fine-tuning large language models (LLMs) and retrieval-augmented generation (RAG).

13.5K
Active
JavaScript
LLM Frameworks
#dataset #fine-tuning #llm