C++ DataFrame library for statistical, financial, and machine learning analysis.
A unified framework for large-scale data computation that scales popular Python data tools like NumPy, Pandas, and Scikit-Learn.
A Python library for creating easy-to-use, visually appealing data tables and summaries.
A library for loading data from databases into DataFrames in Rust and Python, billed as the fastest of its kind.
Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.
A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.
ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.
Tidy Viewer is a cross-platform CLI tool for pretty printing CSV data with customizable column styling.
This repository helps developers learn Python and Machine Learning from scratch.
A GPU-accelerated SQL engine for Python, built on RAPIDS cuDF, for high-performance data processing and analysis.
Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.
A curated list of cybersecurity datasets for security researchers and machine learning practitioners.
A library for working with in-memory tabular data in Julia, a high-performance language suited to data manipulation and analysis.
Lightweight and extensible compatibility layer between popular dataframe libraries like Pandas, Dask, and PySpark.
cryo is a Rust library for extracting blockchain data to Parquet, CSV, JSON, or Python DataFrames.
A JavaScript library for efficient querying and transformation of array-backed data tables.
A Python library for cleaning and transforming data, inspired by the R package Janitor.
Provides Jupyter magics and kernels for working with remote Spark clusters, enabling data scientists to easily interact with Spark from Jupyter Notebooks.
pandasql is a Python library that allows developers to use SQL syntax to query Pandas DataFrames.
A collection of PySpark examples covering RDD, DataFrame, and Dataset operations in Python.