Explore Projects

Discover 65 open source projects

Active filters (1):
Search: data-engineering

Showing 21-40 of 65 projects

cocoindex-io/cocoindex

Data transformation framework for AI with ultra-fast, incremental processing capabilities.

6.3K
Active
Rust
LLM Frameworks
ETL & Pipelines
Rust
#ai #data-engineering #data-transformation

evidence-dev/evidence

A business intelligence platform that allows developers to build interactive data visualizations in SQL and Markdown.

6.0K
Stable
JavaScript
Charts & Visualization
Databases
Svelte
#analytics #business-intelligence #dashboard

Eventual-Inc/Daft

High-performance data engine for AI and multimodal workloads, processing images, audio, video, and structured data at scale.

5.3K
Active
Rust
ML Ops
ETL & Pipelines
Rust
#ai-engineering #data-engineering #distributed

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

5.2K
Active
Go
Data Lake
CLI Tools
#data-versioning #data-quality #git-for-data

dlt-hub/dlt

An open-source Python library that simplifies the process of loading data into data lakes and warehouses.

5.0K
Active
Python
ETL & Pipelines
CLI Tools
Python
#data-engineering #data-loading #data-pipelines

rudderlabs/rudder-server

Rudder Server is a privacy-focused, Segment-alternative customer data platform written in Go and React.

4.4K
Active
Go
Customer Data Platform
ETL & Pipelines
React
#customer-data-platform #customer-data-pipeline #data-integration

whoiskatrin/sql-translator

A TypeScript-based tool for converting natural language queries into SQL using AI.

4.3K
Experimental
TypeScript
LLM Wrappers & SDKs
Databases
TypeScript
#data-analysis #data-engineering #dataquery

adilkhash/Data-Engineering-HowTo

A list of resources for learning data engineering from scratch.

4.0K
Archived
React
#data-engineering #data-pipeline #distributed-systems

quadratichq/quadratic

A spreadsheet tool with AI capabilities for data analysis, engineering, and visualization.

4.0K
Active
Rust
LLM Frameworks
ETL & Pipelines
Rust
#ai #data-analysis #data-engineering

ruc-datalab/DeepAnalyze

DeepAnalyze is an agentic LLM for autonomous data science, automating data analysis and report generation.

3.8K
Active
Python
LLM Frameworks
Agents & Orchestration
Python
#data-analysis #agentic-ai #llm

Netflix/maestro

Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.

3.7K
Active
Java
ETL & Pipelines
Background Jobs
Java
#data-engineering #batch-processing #workflow-orchestration

hemansnation/AI-Engineer-Headquarters

A comprehensive collection of AI-powered tools, techniques, and resources for building advanced data-driven applications.

3.7K
Stable
Jupyter Notebook
LLM Frameworks
Data Science
Python
#data-science #machine-learning #deep-learning

ploomber/ploomber

Ploomber is a fast and versatile tool for building and deploying data pipelines that can be used with a variety of AI and ML tools.

3.6K
Experimental
Python
ETL & Pipelines
ML Ops
Python
#data-engineering #data-science #pipelines

dathere/qsv

Blazing-fast data wrangling toolkit for AI and data engineering workflows.

3.5K
Active
Rust
ETL & Pipelines
Databases
#data-engineering #data-wrangling #etl

superstreamlabs/memphis

Memphis.dev is a highly scalable and effortless data streaming platform.

3.4K
Archived
Go
Go
#streaming #data-engineering #golang

gunnarmorling/awesome-opensource-data-engineering

An awesome list of open-source data engineering projects for developers.

3.0K
Archived
ETL & Pipelines
CLI Tools
#data-engineering #data-pipeline #etl

datafold/data-diff

A Python library for comparing data across databases, supporting various database engines.

3.0K
Archived
Python
Databases
ETL & Pipelines
#data-diffing #data-quality #data-engineering

apache/incubator-devlake

An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.

2.9K
Active
Go
ETL & Pipelines
CLI Tools
Go
#devops #data-analysis #data-engineering

decodingai-magazine/second-brain-ai-assistant-course

Learn to build a Second Brain AI assistant with LLMs, agents, and fine-tuning techniques.

2.6K
Experimental
Jupyter Notebook
React
#authentication #fine-tuning #LLMs

apache/hamilton

Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.

2.4K
Active
Jupyter Notebook
ETL & Pipelines
ML Ops
Python
#etl #data-engineering #data-science
