A Python library that integrates Scikit-learn into the Apache Spark distributed computing framework.
A Spark accelerator built on Apache DataFusion, a SQL query engine written in Rust, aimed at vibe coders.
A comprehensive collection of Nagios plugins for monitoring AWS, Hadoop, Cloud, Kafka, and other popular technologies.
A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.
GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.
Apache Amoro is an open-source Lakehouse management system built on open table formats like Hudi and Iceberg, working with engines such as Flink.
Kylo is an enterprise-grade data lake management platform built on big data technologies like Spark and Hadoop.
A big data platform for analyzing e-commerce user behavior using Hadoop, Spark, and Java.
A collection of Scala and Spark usage examples and related resources for developers.
This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.
Public runnable examples of using John Snow Labs' NLP for Apache Spark, a popular open-source library for natural language processing.
Deprecated Scikit-learn integration package for Apache Spark, useful for machine learning on big data.
CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.
ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.
A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.
A Python helper library for enhancing Jupyter Notebooks with data visualization and analysis capabilities.
Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.
SnappyData is a memory-optimized analytics database based on Apache Spark and Apache Geode, enabling real-time stream processing, transactions, and predictive analytics.
A time series library for Apache Spark that provides a high-level API for working with time series data.
Livy is an open-source REST interface for interacting with Apache Spark from anywhere.