Explore Projects

Discover 140 open-source projects

Active filters (1):
Search: spark

Showing 1-20 of 140 projects

apache/spark

Unified analytics engine for large-scale data processing

42.9K
Active
Scala
ETL & Pipelines
Realtime
Apache
#big-data #spark #data-processing

DataTalksClub/data-engineering-zoomcamp

Free 9-week data engineering course with hands-on modules on pipelines, dbt, Kafka, and Spark

38.9K
Active
Jupyter Notebook
Tutorials & Courses
ETL & Pipelines
dbt
#data-engineering #course #dbt

donnemartin/data-science-ipython-notebooks

Data science Python notebooks covering deep learning, machine learning, big data, and more.

28.9K
Archived
Python
Computer Vision
ML Ops
TensorFlow
#data-science #deep-learning #machine-learning

getredash/redash

Redash enables data-driven decisions by connecting to data sources and creating visualizations and dashboards.

28.3K
Active
Python
Analytics & Tracking
Search
Python
#analytics #dashboard #data-visualization

dmlc/xgboost

Distributed gradient boosting library for fast and accurate data science solutions

28.1K
Active
C++
ML Ops
Multi-Purpose
#xgboost #machine-learning #distributed-systems

yeasy/docker_practice

Docker & container tutorial and practice guide for DevOps

25.9K
Active
Go
Containerization
Books & Guides
Docker
#docker #containerization #devops

mlflow/mlflow

MLflow is an open-source platform for building, tracking, and deploying AI/ML models with end-to-end observability and evaluation tools.

24.6K
Active
Python
ML Ops
Agent Coordination
LangChain
#mlflow #ai-models #experiment-tracking

Dujltqzv/Some-Many-Books

A personal collection of books, including PDF downloads, Baidu Cloud, and e-book downloads.

18.3K
Stable
Books & Guides
#books #pdf #ebooks

NirDiamant/agents-towards-production

End-to-end tutorials covering production-grade GenAI agents with reusable patterns and blueprints.

18.0K
Active
Jupyter Notebook
Agents & Orchestration
React
#agent #generative-ai #llms

heibaiying/BigData-Notes

A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.

16.9K
Archived
Java
Databases
#big-data #hadoop #spark

FavioVazquez/ds-cheatsheets

A comprehensive collection of data science cheatsheets for developers and data scientists.

16.2K
Archived
Data Science
#datascience #cheatsheet #python

GaiZhenbiao/ChuanhuChatGPT

A GUI for ChatGPT API and many LLMs with a neat UI, supporting agents, file-based QA, GPT finetuning, and query with web search.

15.4K
Stable
Python
React
#authentication #streaming #real-time

apache/doris

Apache Doris is a high-performance, unified analytics database for real-time data processing.

15.1K
Active
Java
Databases
Spark
#database #olap #real-time

zhisheng17/flink-learning

This is a comprehensive learning resource for the Flink stream processing framework, covering concepts, principles, and real-world use cases.

15.1K
Experimental
Java
Databases
#stream-processing #flink #kafka

aalansehaiyang/technology-talk

A comprehensive collection of Java-related resources for developers, including interview prep, architecture guides, and popular middleware.

14.7K
Experimental
Tutorials & Courses
Spring
#java #spring #springboot

horovod/horovod

Distributed training framework for deep learning models using popular ML libraries like TensorFlow, Keras, PyTorch, and MXNet.

14.7K
Stable
Python
ML Ops
PyTorch
#distributed-training #deep-learning #machine-learning

deeplearning4j/deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM

14.2K
Active
Java
ML Ops
Java
#artificial-intelligence #deeplearning #neural-nets

Data-Centric-AI-Community/ydata-profiling

A Python library for fast, customizable, and interactive data profiling and exploratory data analysis.

13.4K
Active
Python
Data Profiling
Python
#data-profiling #exploratory-data-analysis #data-quality

howdyai/botkit

Botkit is an open-source framework for building chatbots, apps, and custom integrations for popular messaging platforms.

11.6K
Archived
TypeScript
Authentication
Node
#bots #chatbots #messaging-platforms

SparkAudio/Spark-TTS

Spark-TTS is an open-source Python library for high-quality text-to-speech inference.

10.9K
Experimental
Python
AI Voice & Speech
#text-to-speech #inference #open-source
