Explore Projects

Discover 382 open source projects

Active filters (1):
Search: dataset

Showing 341-360 of 382 projects

Oxen-AI/Oxen

A fast data versioning system for ML datasets, making it easy to version and track changes like code.

1.1K
Active
Rust
Data & Databases
Version Control
Rust
#data-versioning #machine-learning #version-control

qri-io/qri

An open-source platform for building and sharing datasets, focused on trust, privacy, and decentralization.

1.1K
Archived
Go
Databases
CLI Tools
#dataset #ipfs #p2p

vietnh1009/QuickDraw

Implementation of the QuickDraw game, a computer vision and deep learning project focused on image classification.

1.1K
Archived
Python
Computer Vision
Backend Frameworks
PyTorch
#computer-vision #image-classification #deep-learning

engcang/SLAM-application

A comprehensive collection of SLAM (Simultaneous Localization and Mapping) applications and comparisons for robotics and lidar-based navigation.

1.1K
Archived
C++
Robotics
ROS
#lidar #slam #lidar-odometry

firecrawl/fire-enrich

AI-powered data enrichment tool that transforms emails into rich datasets with company profiles, funding data, tech stacks, and more.

1.1K
Stable
TypeScript
Agents & Orchestration
Data & Databases
TypeScript
#data-enrichment #email-processing #company-profiles

spytensor/prepare_detection_dataset

This Python project provides utilities to convert datasets to COCO and VOC formats for object detection tasks.

1.1K
Archived
Python
Computer Vision
CLI Tools
Python
#object-detection #dataset-conversion #coco

cvdfoundation/open-images-dataset

Open Images is a large dataset of annotated images for computer vision and machine learning research.

1.1K
Archived
Computer Vision
Datasets
#computer-vision #machine-learning #dataset

DoneDeal0/superdiff

Superdiff is a high-performance, zero-dependency library for efficiently comparing and diffing arrays and objects.

1.1K
Active
TypeScript
API Frameworks
Validation
React
#array-comparison #object-comparison #object-diff

xid32/SoundMind

A dataset and reinforcement learning algorithm for endowing audio language models with bimodal reasoning abilities.

1.1K
Stable
Python
LLM Frameworks
Datasets
Python
#audio-language-model #audio-reasoning #dataset

Lyken17/Efficient-PyTorch

Efficient PyTorch practices for training on large datasets.

1.1K
Archived
Python
ML Ops
CLI Tools
PyTorch
#machine-learning #deep-learning #training

kazuto1011/deeplab-pytorch

PyTorch re-implementation of DeepLab v2 for semantic segmentation on COCO-Stuff and PASCAL VOC datasets.

1.1K
Archived
Python
Computer Vision
PyTorch
#semantic-segmentation #deep-learning #computer-vision

google-research-datasets/wit

A large multimodal multilingual dataset of image-text pairs from Wikipedia for machine learning research.

1.1K
Archived
Multimodal
NLP
#machine-learning #nlp #multimodal

shaypal5/awesome-twitter-data

A curated list of Twitter datasets and resources for data scientists and social network analysts.

1.1K
Archived
Datasets
Social Network Analysis
#twitter #social-media #data-science

datadreamer-dev/DataDreamer

DataDreamer is a Python library for generating synthetic data and for fine-tuning and aligning large language models.

1.1K
Experimental
Python
LLM Frameworks
Fine-tuning
PyTorch
#llms #gpt #instruction-tuning

catboost/tutorials

CatBoost tutorials repository providing hands-on examples and guides for the open-source machine learning library.

1.1K
Active
Jupyter Notebook
Tutorials & Courses
Machine Learning
Jupyter Notebook
#machine-learning #data-science #tutorials

samapriya/awesome-gee-community-datasets

A community-driven catalog of geospatial datasets for use with Google Earth Engine.

1.1K
Active
HTML
Databases
CLI Tools
#geospatial #gis #earth-engine

patrickhulce/third-party-web

A comprehensive dataset on third-party entities and their impact on the web, useful for web performance analysis.

1.1K
Stable
JavaScript
Backend & APIs
CLI Tools
JavaScript
#web-performance #http-archive #javascript

google-research-datasets/natural-questions

A dataset of real user questions and answers for training and evaluating question answering systems.

1.1K
Archived
Python
LLM Frameworks
Datasets
#dataset #question-answering #natural-language-processing

satellite-image-deep-learning/datasets

A collection of datasets for deep learning with satellite and aerial imagery.

1.1K
Active
Computer Vision
Datasets
#earth-observation #remote-sensing #satellite-data

caserec/Datasets-for-Recommender-Systems

A repository of datasets for building recommender systems.

1.1K
Archived
Jupyter Notebook
Datasets
#data-science #recommender-systems #public-data