Portable Python dataframe library for data analysis and manipulation.
SynapseML is a simple and distributed machine learning library for building and deploying AI models at scale.
Apache Linkis provides a computation middleware layer to connect, govern, and orchestrate applications with data engines.
Petastorm enables training and evaluation of deep learning models from Apache Parquet datasets.
A curated list of awesome Apache Spark packages and resources for developers.
This GitHub repository contains SQL data analysis and visualization projects using various tools and databases.
Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.
Lightweight and extensible compatibility layer between popular dataframe libraries like Pandas, Dask, and PySpark.
Agile data preparation workflows made easy with popular Python data science libraries.
Provides Jupyter magics and kernels for working with remote Spark clusters, enabling data scientists to easily interact with Spark from Jupyter Notebooks.
A collection of PySpark examples covering RDD, DataFrame, and Dataset operations in Python.
Hopsworks is a feature store and MLOps platform for data-intensive AI and machine learning applications.
PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.
This is a style guide for PySpark code, providing best practices for common situations in PySpark repos.
Starter code for solving real-world text data problems using NLP techniques like Gensim Word2Vec and text classification.
LakeSail is a Rust-based computation framework that unifies batch processing, stream processing, and AI workloads.
A Python library that integrates Scikit-learn into the Apache Spark distributed computing framework.
GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.
This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.