Explore Projects

Discover 20 open-source projects

Active filters (1):
Search: data-quality

Showing 1-20 of 20 projects

GokuMohandas/Made-With-ML

Learn to build production-grade ML applications with code and best practices

46.6K
Archived
Jupyter Notebook
ML Ops
Tutorials & Courses
#machine-learning #mlops #data-science

eugeneyan/applied-ml

Curated resources for data science and machine learning in production

28.7K
Archived
ML Ops
Awesome Lists
#machine-learning #data-science #ml-ops

Data-Centric-AI-Community/ydata-profiling

A Python library for fast, customizable, and interactive data profiling and exploratory data analysis.

13.4K
Active
Python
Data Profiling
#data-profiling #exploratory-data-analysis #data-quality

cleanlab/cleanlab

An open-source library for data-centric AI with tools for data quality and machine learning on messy, real-world data.

11.4K
Active
Python
Data Quality
#data-centric-ai #data-quality #data-cleaning

great-expectations/great_expectations

A Python library that helps ensure data quality and reliability through data profiling and testing.

11.2K
Active
Python
ETL & Pipelines
#data-quality #data-testing #data-profiling

voxel51/fiftyone

Refine high-quality datasets and visual AI models with this Python library for active learning and data curation.

10.4K
Active
Python
Computer Vision
#active-learning #data-curation #data-quality

open-metadata/OpenMetadata

A unified metadata platform for data discovery, data observability, and data governance.

8.8K
Active
TypeScript
Data Catalog
Data Governance
#data-discovery #data-lineage #data-quality

evidentlyai/evidently

Evidently is an open-source ML and LLM observability framework to evaluate, test, and monitor AI-powered systems.

7.3K
Active
Jupyter Notebook
MLOps
Data Validation
#data-quality #data-validation #model-monitoring

feast-dev/feast

An open-source feature store for AI/ML applications

6.8K
Active
Python
React
#feature-store #open-source #AI/ML

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

5.2K
Active
Go
Data Lake
CLI Tools
#data-versioning #data-quality #git-for-data

datafold/data-diff

A Python library for comparing data across databases, supporting various database engines.

3.0K
Archived
Python
Databases
ETL & Pipelines
#data-diffing #data-quality #data-engineering

whylabs/whylogs

An open-source data logging library for machine learning models and data pipelines.

2.8K
Archived
Jupyter Notebook
React
#data-pipeline #machine-learning #open-source

featureform/featureform

The Virtual Feature Store that turns existing data infrastructure into a feature store for machine learning.

2.0K
Experimental
Go
Feature Engineering
Vector Databases
#data-quality #data-science #embeddings

feathr-ai/feathr

Feathr is a scalable, unified data and AI engineering platform for enterprises, offering feature engineering, feature governance, and a feature marketplace.

1.9K
Archived
Scala
Feature Flags
MLOps
Apache Spark
#data-engineering #feature-engineering #feature-governance

re-data/re-data

A data quality and observability tool for monitoring and fixing data issues before they become problems.

1.6K
Archived
HTML
ETL & Pipelines
CLI Tools
dbt
#data-quality #data-observability #data-monitoring

NVIDIA-NeMo/Curator

Scalable data preprocessing and curation toolkit for large language models (LLMs)

1.4K
Active
Python
#data-curation #large-language-models #data-preparation

opendatadiscovery/odd-platform

First open-source data discovery and observability platform for data practitioners.

1.4K
Active
Java
Data Discovery
Data Observability
#data-catalog #data-engineering #data-governance

cleanlab/cleanvision

Automatically find issues in image datasets and practice data-centric computer vision.

1.2K
Active
Python
Computer Vision
Data Exploration
#computer-vision #data-quality #data-profiling

daochenzha/data-centric-AI

A curated list of resources for data-centric AI development, including tools, frameworks, and best practices.

1.1K
Archived
LLM Frameworks
Databases
#data-centric-ai #machine-learning #data-science

rstudio/pointblank

Data quality assessment and reporting tool for data frames and database tables in R

1.0K
Active
R
Data Validation
Testing
#data-quality #data-validation #data-testing
