A comprehensive guide to big data, covering various tools and technologies for learning and development.
Apache Nutch is an extensible and scalable web crawler for building search engines and data mining applications.
Political activism documentation on Chinese government censorship, human rights, and censorship circumvention techniques.
A big data analysis system for the Shenzhen metro, with support for various data processing tools.
A Docker image for Apache Hadoop, a popular big data processing framework.
Apache Parquet Format is a columnar data storage format used in the Apache Hadoop ecosystem.
This repository contains 100 Java ebooks and technical books in PDF format for developers.
Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.
A Docker-based Hadoop cluster for local development and testing of distributed applications.
A large-scale entity and relation database supporting aggregation of properties for big data applications.
A Vue-based chat application with real-time functionality.
An easy-to-use web report system for Java, supporting SQL, Hadoop, HBase, and more.
AI on Hadoop for developers to build and deploy machine learning models.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.
This GitHub repository contains over 2,000 data engineering interview questions to help developers prepare.
CarbonData is a high-performance data store solution for big data analytics on Hadoop and Spark.
A native Go client for interacting with the Hadoop Distributed File System (HDFS).
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL databases seamlessly.
A collection of 50+ Docker images for DevOps tools, CI/CD, Hadoop, Kafka, Cassandra, and more.