Explore Projects

Discover 65 open source projects

Active filters (1):
Search: data-engineering

Showing 41-60 of 65 projects

metarank/metarank

A low code Machine Learning personalized ranking service for articles, listings, search results, recommendations.

2.4K
Stable
Scala
ML Ops
API Frameworks
Scala
#automl #personalization #ranking

meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

2.4K
Active
Python
ETL & Pipelines
API Frameworks
Python
#data-integration #data-pipelines #etl

bytewax/bytewax

Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.

2.0K
Experimental
Python
ETL & Pipelines
API Frameworks
Python
#streaming #data-engineering #data-processing

feathr-ai/feathr

Feathr is a scalable, unified data and AI engineering platform for enterprises, with features like feature engineering, feature governance, and a feature marketplace.

1.9K
Archived
Scala
Feature Flags
MLOps
Apache Spark
#data-engineering #feature-engineering #feature-governance

data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

1.9K
Active
CSS
ETL & Pipelines
Databases
#data-engineering #data-modeling #data-pipelines

san089/Udacity-Data-Engineering-Projects

A collection of Udacity data engineering projects showcasing various tools and technologies.

1.8K
Archived
Python
Airflow
#data-engineering #cloud #infrastructure

mlrun/mlrun

MLRun is an open-source MLOps platform for building and managing continuous ML applications.

1.7K
Active
Python
MLOps
API Frameworks
Python
#machine-learning #data-engineering #workflow

Hiflylabs/awesome-dbt

A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.

1.6K
Active
ETL & Pipelines
#analytics-engineering #data-engineering #dbt

Multiwoven/multiwoven

Open-source reverse ETL tool for data activation and customer data platform integration.

1.6K
Active
Ruby
API Frameworks
ETL & Pipelines
React
#data-activation #customer-data-platform #reverse-etl

kantord/just-dashboard

A framework-agnostic dashboard library that allows creating dashboards using YAML or JSON files.

1.6K
Archived
JavaScript
Charts & Visualization
CLI Tools
React
#dashboard #data-visualization #yaml

OBenner/data-engineering-interview-questions

A collection of over 2,000 data engineering interview questions to help developers prepare.

1.5K
Active
Python
Interview Prep
ETL & Pipelines
#data-engineering #interview-questions #interview-prep

pyper-dev/pyper

Concurrent Python made simple, with support for asyncio, multiprocessing, and threading.

1.5K
Experimental
Python
API Frameworks
CLI Tools
Python
#asyncio #concurrency #multiprocessing

san089/goodreads_etl_pipeline

An end-to-end data pipeline for building a data lake, data warehouse, and analytics platform from GoodReads data.

1.5K
Archived
Python
ETL & Pipelines
Background Jobs
Apache Airflow
#data-engineering #etl-pipeline #data-lake

pyjanitor-devs/pyjanitor

A Python library for cleaning and transforming data, inspired by the R package Janitor.

1.5K
Active
Python
ETL & Pipelines
CLI Tools
#cleaning-data #data-transformation #pandas-extension

GoogleCloudPlatform/data-science-on-gcp

A repository providing data science tools and examples for the Google Cloud Platform.

1.4K
Stable
Jupyter Notebook
React
#data-science #cloud-computing #google-cloud

damklis/DataEngineeringProject

An end-to-end data engineering project example showcasing tools and technologies for building data pipelines.

1.4K
Archived
Python
ETL & Pipelines
API Frameworks
Django
#data-engineering #data-pipeline #etl

opendatadiscovery/odd-platform

First open-source data discovery and observability platform for data practitioners.

1.4K
Active
Java
Data Discovery
Data Observability
#data-catalog #data-engineering #data-governance

quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data, built with TypeScript.

1.4K
Active
TypeScript
ETL & Pipelines
Data Versioning
TypeScript
#data-engineering #data-versioning #data-pipeline

Data-Learn/data-engineering

A comprehensive resource for developers to learn and get started with data engineering using Python.

1.3K
Experimental
Python
ETL & Pipelines
Tutorials & Courses
Python
#data-engineering #python #tutorials

kdeldycke/awesome-billing

A curated list of resources on billing and payments for cloud platforms.

1.2K
Active
Payments & Billing
Caching
#billing #payments #cloud
