Dremio is an open-source data analytics platform that simplifies and accelerates big data analysis.
CarbonData is a high-performance data store solution for big data analytics on Hadoop and Spark.
A Java-based tool for monitoring and analyzing the performance of MySQL databases.
An end-to-end data engineering project example showcasing tools and technologies for building data pipelines.
An Avro serialization library for JavaScript and TypeScript, used for efficient binary data encoding and schema evolution.
Seamless integration of scikit-learn with Intelex for AI inference and machine learning applications.
PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.
Open-source massively parallel processing (MPP) database, an alternative to Greenplum.
A Java library for automatically detecting anomalies in large-scale time-series data.
Automated deep learning without any human intervention; the first-place solution for the AutoDL challenge at NeurIPS.
This repository provides a Chinese translation of the official documentation for Apache Spark, a popular big data processing framework.
LakeSail is a Rust-based computation framework that unifies batch processing, stream processing, and AI workloads.
Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.
GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.
A simple Windows desktop app for viewing and querying files in Apache Parquet, a popular big data format.
Apache Amoro is an open-source lakehouse management system built on open table formats like Hudi and Iceberg, with support for engines such as Flink.
Hazelcast Jet is a distributed stream and batch processing engine for high-performance applications.
A curated collection of books and resources for various programming languages and technologies, including AI, data, and web development.
TrailDB is an efficient database for storing and querying series of events.