Explore Projects

Discover 63 open source projects

Active filters (1):
Search: etl

Showing 21-40 of 63 projects

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL workflows for unstructured data analysis.

3.7K
Active
Python
Agents & Orchestration
ETL & Pipelines
Python
#agents #data-pipelines #document-processing

noflo/noflo

Flow-based programming framework for building complex JavaScript applications and services.

3.5K
Archived
JavaScript
Backend Frameworks
CLI Tools
Node
#flow-based-programming #etl-framework #visual-programming

xyflow/awesome-node-based-uis

A curated list of resources for creating node-based UI editors and visual programming tools.

3.5K
Experimental
Component Libraries (React)
CLI Tools
React
#node-based-ui #visual-programming #workflow-editor

blockchain-etl/ethereum-etl

Python scripts for extracting, transforming and loading Ethereum blockchain data into Google BigQuery.

3.1K
Active
Python
ETL & Pipelines
API Frameworks
#blockchain-analytics #erc20 #erc721

PeerDB-io/peerdb

Fast, cost-effective data replication tool from Postgres to data warehouses, queues, and storage.

3.0K
Active
Go
ETL & Pipelines
Realtime
#postgres #data-replication #etl

apache/incubator-devlake

An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.

2.9K
Active
Go
ETL & Pipelines
CLI Tools
Go
#devops #data-analysis #data-engineering

TobikoData/sqlmesh

Scalable and efficient data transformation framework with backwards compatibility for dbt.

2.9K
Active
Python
ETL & Pipelines
Databases
Python
#data-engineering #dataops #dbt

ETLCPP/etl

Header-only C++ library providing STL-like containers & algorithms for embedded systems without dynamic memory.

2.9K
Active
C++
General Utilities
Firmware & Drivers
C++
#embedded-template-library #cpp-containers #stl-alternative

datachain-ai/datachain

Comprehensive analytics, versioning, and ETL toolkit for multimodal data (video, audio, PDFs, images).

2.7K
Active
Python
Computer Vision
ETL & Pipelines
Python
#data-analytics #data-wrangling #embeddings

apache/hamilton

Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.

2.4K
Active
Jupyter Notebook
ETL & Pipelines
MLOps
Python
#etl #data-engineering #data-science

instill-ai/instill-core

Instill Core is an open-source AI infrastructure tool for orchestrating data, models, and pipelines to build AI-powered applications.

2.3K
Active
Python
LLM Frameworks
Agents & Orchestration
Golang
#ai #generative-ai #llm

supabase/etl

A real-time Postgres data replication and streaming library built in Rust for building CDC pipelines.

2.2K
Active
Rust
ETL & Pipelines
Realtime
#postgres #replication #cdc

reugn/go-streams

A lightweight stream processing library for Go developers that supports various streaming platforms.

2.2K
Active
Go
API Frameworks
ETL & Pipelines
#stream-processing #data-pipeline #kafka

timeplus-io/proton

Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.

2.2K
Active
C++
ETL & Pipelines
API Frameworks
#sql #etl #stream-processing

superglue-ai/superglue

superglue builds integrations and tools from natural language for long-tail and enterprise systems.

2.0K
Active
TypeScript
AI Agents
MCP Frameworks
TypeScript
#agents #ai #api-gateway

data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

1.9K
Active
CSS
ETL & Pipelines
Databases
#data-engineering #data-modeling #data-pipelines

san089/Udacity-Data-Engineering-Projects

A collection of Udacity data engineering projects showcasing various tools and technologies.

1.8K
Archived
Python
Airflow
#data-engineering #cloud #infrastructure

thbar/kiba

A data processing and ETL (Extract, Transform, Load) framework for Ruby developers.

1.8K
Active
Ruby
ETL & Pipelines
API Frameworks
#data #etl #ruby

NVIDIA/aistore

AIStore: A scalable, high-performance, and high-availability storage solution for AI applications and workloads.

1.8K
Active
Go
API Frameworks
Databases
Go
#distributed-storage #object-storage #s3-compatible

yobix-ai/extractous

Powerful, fast, and efficient unstructured data extraction library written in Rust with language bindings.

1.7K
Archived
Rust
ETL & Pipelines
Rust
#data-extraction #unstructured-data #etl
