Explore Projects

Discover 39 open source projects

Active filters (1):
Search: data-pipelines

Showing 21-39 of 39 projects

OpenDCAI/DataFlow

LLM-based operators and pipelines for data preparation.

2.9K
Active
Python
AI Coding Tools
Gradio
#data-science #data-agent #data-cleaning

whylabs/whylogs

An open-source data logging library for machine learning models and data pipelines.

2.8K
Archived
Jupyter Notebook
React
#data-pipeline #machine-learning #open-source

meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

2.4K
Active
Python
ETL & Pipelines
API Frameworks
Python
#data-integration #data-pipelines #etl

reugn/go-streams

A lightweight stream processing library for Go developers that supports various streaming platforms.

2.2K
Active
Go
API Frameworks
ETL & Pipelines
#stream-processing #data-pipeline #kafka

data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

1.9K
Active
CSS
ETL & Pipelines
Databases
#data-engineering #data-modeling #data-pipelines

feldera/feldera

The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.

1.8K
Active
Rust
Databases
ETL & Pipelines
#data-analytics #data-pipelines #incremental-computation

yobix-ai/extractous

Powerful, fast, and efficient unstructured data extraction library written in Rust with language bindings.

1.7K
Archived
Rust
ETL & Pipelines
Rust
#data-extraction #unstructured-data #etl

bytedance/bitsail

Distributed high-performance data integration engine for batch, streaming, and incremental scenarios.

1.7K
Archived
Java
Flink
#authentication #streaming #real-time

Multiwoven/multiwoven

Open-source reverse ETL tool for data activation and customer data platform integration.

1.6K
Active
Ruby
API Frameworks
ETL & Pipelines
React
#data-activation #customer-data-platform #reverse-etl

combust/mleap

MLeap is a library for deploying machine learning pipelines to production using Scala, Python, and Spark.

1.5K
Active
Scala
ML Ops
API Frameworks
Scala
#machine-learning #pipeline #production

pyper-dev/pyper

Concurrent Python made simple, with support for asyncio, multiprocessing, and threading.

1.5K
Experimental
Python
API Frameworks
CLI Tools
Python
#asyncio #concurrency #multiprocessing

superlinked/superlinked

Superlinked is a Python framework for building high-performance search & recommendation apps with structured and unstructured data.

1.5K
Stable
Jupyter Notebook
LLM Frameworks
RAG & Vector
Python
#data-pipeline #embeddings #information-retrieval

bruin-data/bruin

A data platform that enables building data pipelines with SQL, Python, and ingesting from various sources.

1.4K
Active
Go
ETL & Pipelines
API Frameworks
Go
#data-pipelines #data-ingestion #data-transformation

GoogleCloudPlatform/data-science-on-gcp

A repository providing data science tools and examples for the Google Cloud Platform.

1.4K
Stable
Jupyter Notebook
React
#data-science #cloud-computing #google-cloud

fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

1.4K
Active
Jupyter Notebook
MLOps
CLI Tools
Python
#mlops #data-pipelines #automation

damklis/DataEngineeringProject

An end-to-end data engineering project example showcasing tools and technologies for building data pipelines.

1.4K
Archived
Python
ETL & Pipelines
API Frameworks
Django
#data-engineering #data-pipeline #etl

opendatadiscovery/odd-platform

First open-source data discovery and observability platform for data practitioners.

1.4K
Active
Java
Data Discovery
Data Observability
#data-catalog #data-engineering #data-governance

amphi-ai/amphi-etl

A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.

1.4K
Active
TypeScript
ETL & Pipelines
Data Analysis
TypeScript
#data-analysis #data-pipelines #data-transformation

datazip-inc/olake

Fastest open-source data pipeline tool for replicating databases to data lakes in Apache Iceberg format.

1.3K
Active
Go
ETL & Pipelines
Realtime
#cdc #data-pipeline #elt