Explore Projects

Discover 382 open source projects

Active filters (1):
Search: datasets

Showing 41-60 of 382 projects

apache/iceberg

Apache Iceberg is an open-source table format for large analytic datasets, providing a versioned and scalable data lake architecture.

8.6K
Active
Java
Databases
API Frameworks
Apache
#data-lake #versioning #scalable

vaexio/vaex

A high-performance Python library for working with large tabular datasets, offering efficient data manipulation and visualization.

8.5K
Stable
Python
Databases
Caching
Python
#bigdata #data-science #dataframe

lmcinnes/umap

UMAP is a dimension reduction library that can be used for visualization, exploration, and analysis of high-dimensional datasets.

8.1K
Active
Python
Dimensionality Reduction
ML Ops
Python
#dimensionality-reduction #machine-learning #topological-data-analysis

AntixK/PyTorch-VAE

A collection of Variational Autoencoders (VAEs) implemented in PyTorch for deep learning research and applications.

7.6K
Experimental
Python
ML Ops
Computer Vision
PyTorch
#variational-autoencoders #deep-learning #computer-vision

openlm-research/open_llama

Open-source reproduction of Meta's LLaMA 7B language model for AI and machine learning research.

7.5K
Archived
LLM Frameworks
LLM Wrappers & SDKs
#language-model #llama #open-source

imaNNeo/fl_chart

A highly customizable Flutter chart library that supports various chart types like line, bar, pie, scatter, and more.

7.5K
Stable
Dart
Charts & Visualization
Flutter
#charts #visualizations #flutter-widget

PAIR-code/facets

Facets is a set of data visualization tools for machine learning datasets, helping developers explore and understand their data.

7.4K
Archived
Jupyter Notebook
Data Visualization
Frontend Frameworks
Jupyter Notebook
#data-visualization #machine-learning #jupyter-notebook

scikit-learn-contrib/imbalanced-learn

A Python package to tackle the curse of imbalanced datasets in machine learning.

7.1K
Stable
Python
Python
#machine-learning #imbalanced-datasets #python-package

open-compass/opencompass

OpenCompass is a comprehensive LLM evaluation platform supporting a wide range of models and datasets.

6.7K
Active
Python
LLM Frameworks
Agents & Orchestration
Python
#benchmark #chatgpt #evaluation

googlecreativelab/quickdraw-dataset

Provides access and documentation for the Quick, Draw! dataset, a large collection of doodles used for machine learning research.

6.7K
Experimental
Computer Vision
Datasets
#dataset #computer-vision #machine-learning

SophonPlus/ChineseNlpCorpus

A repository that collects, organizes, and publishes Chinese natural language processing (NLP) datasets to advance the development of Chinese NLP.

6.5K
Archived
Jupyter Notebook
LLM Frameworks
Tutorials & Courses
#nlp #chinese #natural-language-processing

liuruoze/EasyPR

An easy, flexible, and accurate license plate recognition project for Chinese plates in unconstrained situations.

6.4K
Archived
C++
Computer Vision
API Frameworks
OpenCV
#computer-vision #machine-learning #plate-recognition

cocodataset/cocoapi

COCO API provides Python, MATLAB, and Lua APIs for loading, parsing, and visualizing annotations in the COCO dataset.

6.4K
Archived
Jupyter Notebook
React
#dataset #AI #COCO API

apache/pinot

Apache Pinot is a realtime distributed OLAP datastore for fast querying of large datasets.

6.0K
Active
Java
Databases
Realtime
#database #olap #realtime

apache/hive

Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.

6.0K
Active
Java
Databases
API Frameworks
#apache #big-data #database

niderhoff/nlp-datasets

A curated list of free/public domain text datasets for natural language processing (NLP) tasks.

6.0K
Archived
Datasets
#nlp #text-data #public-datasets

Fyrd/caniuse

A comprehensive dataset of browser and feature support data from caniuse.com, useful for web developers.

5.8K
Active
JSON
Frontend Frameworks
Databases
#web-development #browser-compatibility #feature-support

PriorLabs/TabPFN

A foundation model for tabular data that enables advanced machine learning on structured datasets.

5.8K
Active
Python
LLM Frameworks
ML Ops
Python
#tabular-data #foundation-model #machine-learning

oarriaga/face_classification

A Python library for real-time face detection and emotion/gender classification using deep learning and OpenCV.

5.7K
Archived
Python
Computer Vision
API Frameworks
#computer-vision #face-detection #emotion-recognition

mdn/browser-compat-data

Browser compatibility data for Web technologies as displayed on MDN.

5.6K
Active
JSON
Frontend Frameworks
API Documentation
#compat #compatibility #data
