Explore Projects

Discover 85 open source projects

Active filters (1):
Search: big-data

Showing 61-80 of 85 projects

dremio/dremio-oss

Dremio is an open-source data analytics platform that simplifies and accelerates big data analysis.

1.5K
Stable
Java
Analytics
Big Data
#analytics #big-data #data-analytics

apache/carbondata

CarbonData is a high-performance data store solution for big data analytics on Hadoop and Spark.

1.4K
Active
Scala
Databases
API Frameworks
Spark
#big-data #hadoop #spark

yahoo/mysql_perf_analyzer

A Java-based tool for monitoring and analyzing the performance of MySQL databases.

1.4K
Archived
Java
API Frameworks
Databases
#mysql #performance-analysis #monitoring

damklis/DataEngineeringProject

An end-to-end data engineering project example showcasing tools and technologies for building data pipelines.

1.4K
Archived
Python
ETL & Pipelines
API Frameworks
Django
#data-engineering #data-pipeline #etl

mtth/avsc

An Avro serialization library for JavaScript and TypeScript, used for efficient binary data encoding and schema evolution.

1.4K
Experimental
JavaScript
API Clients & Testing
Databases
JavaScript
#avro #serialization #binary-format

uxlfoundation/scikit-learn-intelex

Intel Extension for Scikit-learn, a drop-in integration that accelerates scikit-learn machine learning and AI inference applications.

1.3K
Active
Python
React
#machine-learning #ai-inference #scikit-learn-integration

mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.

1.3K
Experimental
Jupyter Notebook
Databases
ETL & Pipelines
#big-data #data-algorithms #dataframes

apache/cloudberry

Open-source massively parallel processing (MPP) database, an alternative to Greenplum.

1.2K
Active
C
Databases
OLAP
PostgreSQL
#big-data #data-analysis #data-warehouse

yahoo/egads

A Java library for automatically detecting anomalies in large-scale time-series data.

1.2K
Archived
Java
Anomaly Detection
#anomaly-detection #time-series #big-data

DeepWisdom/AutoDL

Automated deep learning without any human intervention; the first-place solution for the AutoDL challenge at NeurIPS.

1.2K
Archived
Python
AutoML
ML Ops
PyTorch
#automated-machine-learning #automl #data-science

apachecn/spark-doc-zh

This repository provides a Chinese translation of the official documentation for Apache Spark, a popular big data processing framework.

1.2K
Archived
JavaScript
Databases
API Frameworks
#big-data #spark #java

lakehq/sail

Sail, from LakeSail, is a Rust-based computation framework that unifies batch processing, stream processing, and AI workloads.

1.2K
Active
Rust
ML Ops
ETL & Pipelines
#distributed-computing #data-engineering #big-data

apache/ozone

Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

1.2K
Active
Java
Databases
API Frameworks
Java
#big-data #hadoop #kubernetes

apache/accumulo

Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.

1.1K
Active
Java
Databases
API Frameworks
#big-data #distributed-computing #database

graphframes/graphframes

GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.

1.1K
Active
Scala
Databases
Caching
#apache-spark #big-data #graph-analysis

mukunku/ParquetViewer

A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.

1.1K
Active
C#
Databases
CLI Tools
#apache-parquet #big-data #windows-desktop

apache/amoro

Apache Amoro is an open-source lakehouse management system built on open data lake formats such as Iceberg and Hudi, and it works with compute engines such as Flink.

1.1K
Active
Java
Databases
ETL & Pipelines
Flink
#big-data #data-lake #lakehouse

hazelcast/hazelcast-jet

Hazelcast Jet is a distributed stream and batch processing engine for high-performance applications.

1.1K
Archived
Java
API Frameworks
Databases
#distributed-processing #batch-processing #stream-processing

foochane/books

A curated collection of books and resources for various programming languages and technologies, including AI, data, and web development.

1.1K
Archived
Books & Guides
Databases
#programming-books #data-engineering #machine-learning

traildb/traildb

TrailDB is an efficient database for storing and querying series of events.

1.1K
Archived
C
Databases
API Frameworks
#event-data #time-series #big-data
