A declarative workflow management system for R that enables reproducible research and high-performance computing.
An open-source learning project for data warehousing, with examples and code for building real-time and offline data pipelines.
A collection of open-source Kafka connectors for various data sources and destinations maintained by Lenses.io.
ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.
Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.
A Kotlin library for structured data processing, suitable for data analysis and data science tasks.
A no-code, visual data integration platform for building big data pipelines and workflows.
A specification for storing geospatial vector data (points, lines, polygons) in the Parquet file format, enabling efficient cloud-native geospatial data processing.
A GitHub language statistics tool that provides insights into programming language usage across GitHub repositories.
A Python library for data migration and transformation in the Blaze project.