Explore Projects

Discover 56 open source projects

Search: hadoop

Showing 21-40 of 56 projects

MoRan1607/BigDataGuide

A comprehensive guide to big data, covering various tools and technologies for learning and development.

3.1K

Active

React

#bigdata #machine learning #development

apache/nutch

Apache Nutch is an extensible and scalable web crawler for building search engines and data mining applications.

3.1K

Active

Java

API Frameworks

Backend Frameworks

#apache #crawling #hadoop

cirosantilli/china-dictatorship

Political activism documentation on Chinese government censorship, human rights, and censorship circumvention techniques.

2.9K

Active

HTML

Resource Collections

Privacy Tools

#censorship-circumvention #china-dictatorship #human-rights

geekyouth/SZT-bigdata

This is a big data analysis system for the Shenzhen metro with support for various data processing tools.

2.4K

Archived

Scala

Databases

API Frameworks

Scala

#big-data #data-analysis #metro

big-data-europe/docker-hadoop

A Docker image for Apache Hadoop, a popular big data processing framework.

2.3K

Archived

Shell

docker-hadoop

Docker

#docker-hadoop #hadoop-cluster #big-data

apache/parquet-format

Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.

2.3K

Active

Thrift

Databases

#apache #parquet #columnar-storage

dahuoyzs/javapdf

This repository contains 100 Java ebooks and technical books in PDF format for developers.

2.1K

Archived

API Frameworks

Databases

#java #ebooks #technical-books

apache/kudu

Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.

1.9K

Active

C++

Databases

API Frameworks

#big-data #cplusplus #open-source

kiwenlau/hadoop-cluster-docker

A Docker-based Hadoop cluster for local development and testing of distributed applications.

1.8K

Archived

Shell

API Frameworks

Containerization

#hadoop #docker #distributed-computing

gchq/Gaffer

A large-scale entity and relation database supporting aggregation of properties for big data applications.

1.8K

Experimental

Java

Databases

API Frameworks

#big-data #graph-database #hadoop

gege-circle/.github

A Vue-based chat application with real-time functionality.

1.8K

Stable

Component Libraries (Vue/Svelte)

Authentication

Vue

#real-time #vue #chat-application

xianrendzw/EasyReport

A simple and easy-to-use web report system for Java, supporting SQL, Hadoop, HBase, and more.

1.8K

Archived

Java

API Frameworks

Databases

#sql #report-generation #web-application

Qihoo360/hbox

AI on Hadoop: a platform for developers to build and deploy machine learning models.

1.7K

Experimental

Java

MXNet

React

#machine learning #Hadoop #AI

collabH/bigdata-growth

A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.

1.7K

Stable

Shell

Databases

ETL & Pipelines

#bigdata #hadoop #spark

mongodb/mongo-hadoop

A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.

1.6K

Archived

Java

Databases

API Frameworks

#mongodb #hadoop #big-data

OBenner/data-engineering-interview-questions

This GitHub repository contains over 2,000 data engineering interview questions to help developers prepare.

1.5K

Active

Python

Interview Prep

ETL & Pipelines

#data-engineering #interview-questions #interview-prep

apache/carbondata

CarbonData is a high-performance data store solution for big data analytics on Hadoop and Spark.

1.4K

Active

Scala

Databases

API Frameworks

Spark

#big-data #hadoop #spark

colinmarc/hdfs

A native Go client for interacting with the Hadoop Distributed File System (HDFS).

1.4K

Archived

API Frameworks

Databases

#hdfs #hadoop #distributed-file-system

wgzhao/Addax

A fast and versatile ETL tool that transfers data between RDBMS and NoSQL databases.

1.4K

Active

Java

ETL & Pipelines

API Frameworks

#etl #database #rdbms

HariSekhon/Dockerfiles

A collection of 50+ Docker images for DevOps tools, CI/CD, Hadoop, Kafka, Cassandra, and more.

1.4K

Active

Shell

Containerization

CLI Tools

#docker #kubernetes #devops
