Explore Projects

Discover 56 open source projects

Active filter: Search: hadoop

Showing 21-40 of 56 projects

MoRan1607/BigDataGuide

A comprehensive guide to big data, covering various tools and technologies for learning and development.

3.1K
Active
React
#bigdata#machine learning#development

apache/nutch

Apache Nutch is an extensible and scalable web crawler for building search engines and data mining applications.

3.1K
Active
Java
API Frameworks
Backend Frameworks
#apache#crawling#hadoop

cirosantilli/china-dictatorship

Political activism documentation on Chinese government censorship, human rights, and censorship circumvention techniques.

2.9K
Active
HTML
Resource Collections
Privacy Tools
#censorship-circumvention#china-dictatorship#human-rights

geekyouth/SZT-bigdata

A big data analysis system for the Shenzhen metro that supports various data processing tools.

2.4K
Archived
Scala
Databases
API Frameworks
#big-data#data-analysis#metro

big-data-europe/docker-hadoop

A Docker image for Apache Hadoop, a popular big data processing framework.

2.3K
Archived
Shell
docker-hadoop
Docker
#docker-hadoop#hadoop-cluster#big-data

apache/parquet-format

Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.

2.3K
Active
Thrift
Databases
#apache#parquet#columnar-storage

dahuoyzs/javapdf

This repository contains 100 Java ebooks and technical books in PDF format for developers.

2.1K
Archived
API Frameworks
Databases
#java#ebooks#technical-books

apache/kudu

Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.

1.9K
Active
C++
Databases
API Frameworks
#big-data#cplusplus#open-source

kiwenlau/hadoop-cluster-docker

A Docker-based Hadoop cluster for local development and testing of distributed applications.

1.8K
Archived
Shell
API Frameworks
Containerization
#hadoop#docker#distributed-computing

gchq/Gaffer

A large-scale entity and relation database supporting aggregation of properties for big data applications.

1.8K
Experimental
Java
Databases
API Frameworks
#big-data#graph-database#hadoop

gege-circle/.github

A Vue-based chat application with real-time functionality.

1.8K
Stable
Component Libraries (Vue/Svelte)
Authentication
Vue
#real-time#vue#chat-application

xianrendzw/EasyReport

A simple and easy-to-use web report system for Java, supporting SQL, Hadoop, HBase, and more.

1.8K
Archived
Java
API Frameworks
Databases
#sql#report-generation#web-application

Qihoo360/hbox

AI on Hadoop for developers to build and deploy machine learning models.

1.7K
Experimental
Java
MXNet
React
#machine learning#Hadoop#AI

collabH/bigdata-growth

A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.

1.7K
Stable
Shell
Databases
ETL & Pipelines
#bigdata#hadoop#spark

mongodb/mongo-hadoop

A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.

1.6K
Archived
Java
Databases
API Frameworks
#mongodb#hadoop#big-data

OBenner/data-engineering-interview-questions

This GitHub repository contains over 2,000 data engineering interview questions to help developers prepare.

1.5K
Active
Python
Interview Prep
ETL & Pipelines
#data-engineering#interview-questions#interview-prep

apache/carbondata

CarbonData is a high-performance data store solution for big data analytics on Hadoop and Spark.

1.4K
Active
Scala
Databases
API Frameworks
Spark
#big-data#hadoop#spark

colinmarc/hdfs

A native Go client for interacting with the Hadoop Distributed File System (HDFS).

1.4K
Archived
Go
API Frameworks
Databases
#hdfs#hadoop#distributed-file-system

wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL databases seamlessly.

1.4K
Active
Java
ETL & Pipelines
API Frameworks
#etl#database#rdbms

HariSekhon/Dockerfiles

A collection of 50+ Docker images for DevOps tools, CI/CD, Hadoop, Kafka, Cassandra, and more.

1.4K
Active
Shell
Containerization
CLI Tools
#docker#kubernetes#devops
