ETL & Pipelines

Explore 310 open-source projects in ETL & Pipelines

Showing 141-160 of 310 projects

apache/hamilton

Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.

2.4K
Active
Jupyter Notebook
ETL & Pipelines
MLOps
Python
#etl#data-engineering#data-science

Lightning-AI/torchmetrics

Provides machine learning metrics for distributed, scalable PyTorch applications.

2.4K
Active
Python
PyTorch
#machine-learning#metrics#pytorch

malloydata/malloy

Malloy is an open-source language for describing data relationships and transformations.

2.4K
Active
TypeScript
Databases
ETL & Pipelines
TypeScript
#data-modeling#data-transformation#semantic-modeling

meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

2.4K
Active
Python
ETL & Pipelines
API Frameworks
Python
#data-integration#data-pipelines#etl

quarylabs/quary

Open-source BI platform for engineers to explore and model large-scale data pipelines.

2.4K
Active
Rust
ORMs & Query Builders
ETL & Pipelines
Rust
#analytics#big-data#data-modeling

rap2hpoutre/fast-excel

A fast and memory-efficient library for importing and exporting Excel files in Laravel applications.

2.3K
Experimental
PHP
API Frameworks
ETL & Pipelines
Laravel
#csv#excel#memory-efficiency

instill-ai/instill-core

Instill Core is an open-source AI infrastructure tool for orchestrating data, models, and pipelines to build AI-powered applications.

2.3K
Active
Python
LLM Frameworks
Agents & Orchestration
Golang
#ai#generative-ai#llm

man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.

2.2K
Active
C++
Databases
Caching
Python
#data-analysis#data-science#dataframe

supabase/etl

A Rust library for real-time Postgres data replication and streaming, used to build change data capture (CDC) pipelines.

2.2K
Active
Rust
ETL & Pipelines
Realtime
#postgres#replication#cdc

tensorflow/tfx

TFX is an end-to-end platform for deploying production ML pipelines.

2.2K
Active
Python
MLOps
API Frameworks
TensorFlow
#machine-learning#tensorflow#apache-beam

reugn/go-streams

A lightweight stream processing library for Go that supports various streaming platforms.

2.2K
Active
Go
API Frameworks
ETL & Pipelines
#stream-processing#data-pipeline#kafka

timeplus-io/proton

Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.

2.2K
Active
C++
ETL & Pipelines
API Frameworks
#sql#etl#stream-processing

minimaxir/facebook-page-post-scraper

A Python scraper for extracting data from Facebook Page posts for statistical analysis.

2.1K
Archived
Python
API Clients & Testing
Backend Frameworks
Python
#facebook#scraper#data-analysis

Jon-Becker/prediction-market-analysis

Framework for collecting and analyzing prediction market data with comprehensive Polymarket/Kalshi datasets.

2.1K
Active
Python
ETL & Pipelines
Example Projects
Python
#prediction-markets#polymarket#kalshi

0xemmkty/QuantMuse

Quantitative trading system with ML analysis, real-time data processing, and risk management.

2.0K
Experimental
Python
MLOps
ETL & Pipelines
Python
#quantitative-trading#algorithmic-trading#machine-learning

moj-analytical-services/splink

Fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.

2.0K
Active
Python
Databases
ETL & Pipelines
Python
#data-matching#data-deduplication#entity-resolution

databricks/spark-deep-learning

Deep learning library for Apache Spark that provides high-level APIs and models for building machine learning pipelines.

2.0K
Archived
Python
MLOps
ETL & Pipelines
Apache Spark
#machine-learning#deep-learning#spark

apache/datafusion-ballista

Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Apache Arrow.

2.0K
Active
Rust
Databases
ETL & Pipelines
#big-data#dataframe#distributed

shancarter/mr-data-converter

A JavaScript library that converts CSV and tab-delimited data to web-friendly formats like JSON and XML.

2.0K
Archived
JavaScript
ETL & Pipelines
CLI Tools
Node.js
#csv#json#xml

bytewax/bytewax

Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.

2.0K
Experimental
Python
ETL & Pipelines
API Frameworks
Python
#streaming#data-engineering#data-processing