Showing 201-220 of 310 projects
cryo is a Rust library for extracting blockchain data to Parquet, CSV, JSON, or Python dataframes.
AWS Glue code samples for building data integration and ETL pipelines on AWS.
MLeap is a library for deploying machine learning pipelines to production using Scala, Python, and Spark.
This GitHub repository contains over 2,000 data engineering interview questions to help developers prepare.
A tool for generating high-quality synthetic datasets.
A Python library that helps developers extract structured data from tricky documents using vision-language models.
A Python-based scraper and converter for event websites to the Open Event format.
A Python library for simulating the performance of photovoltaic energy systems.
Superlinked is a Python framework for building high-performance search & recommendation apps with structured and unstructured data.
An end-to-end data pipeline for building a data lake, data warehouse, and analytics platform from GoodReads data.
A Python library for cleaning and transforming data, inspired by the R package Janitor.
A data workflow tool for data engineers and analysts, in the spirit of 'Make for data'.
A lightweight Python OLAP framework for multi-dimensional data analysis and reporting.
Transporter is a powerful ETL tool that allows developers to sync data between various persistence engines.
A data platform for building data pipelines with SQL and Python and for ingesting data from various sources.
A scalable data preprocessing and curation toolkit for large language models (LLMs).
tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.
This repository contains a collection of portfolio projects for a data analyst.
Declaratively construct Apache Airflow DAGs with YAML configuration files, simplifying complex data pipeline management.
ScanTailor Advanced is a C++ library for processing scanned documents, including binarization, book scanning, and digitization.