Explore Projects

Discover 50 open source projects

Active filters (1):
Search: dataframe

Showing 21-40 of 50 projects

hosseinmoein/DataFrame

C++ DataFrame library for statistical, financial, and machine learning analysis.

2.9K
Active
C++
Databases
Data Analysis
#data-analysis #statistical-analysis #financial-engineering

mars-project/mars

A unified framework for large-scale data computation that scales popular Python data tools like NumPy, Pandas, and Scikit-Learn.

2.7K
Archived
Python
ML Ops
Caching
Dask
#machine-learning #data-processing #scale

posit-dev/great-tables

A Python library for creating easy-to-use, visually appealing data tables and summaries.

2.6K
Active
Python
ORMs & Query Builders
CLI Tools
#data-visualization #pandas #polars

sfu-db/connector-x

Fastest library to load data from databases into DataFrames, in Rust and Python.

2.6K
Active
Rust
React
#dataframe #database #sql

apache/hamilton

Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.

2.4K
Active
Jupyter Notebook
ETL & Pipelines
MLOps
Python
#etl #data-engineering #data-science

chezou/tabula-py

A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.

2.3K
Archived
Python
Databases
Backend Frameworks
Python
#pdf #tabula #pandas

man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.

2.2K
Active
C++
Databases
Caching
Python
#data-analysis #data-science #dataframe

alexhallam/tv

Tidy Viewer is a cross-platform CLI tool for pretty printing CSV data with customizable column styling.

2.1K
Stable
Rust
CLI Tools
Data Visualization
#cli #csv #data-visualization

Tanu-N-Prabhu/Python

This repository helps developers learn Python and Machine Learning from scratch.

2.1K
Active
Jupyter Notebook
Tutorials & Courses
Machine Learning Algorithms
#python #machine-learning #data-analysis

BlazingDB/blazingsql

A GPU-accelerated SQL engine for Python, built on RAPIDS cuDF, for high-performance data processing and analysis.

2.0K
Archived
C++
GPU Acceleration
Databases
Python
#gpu #sql-engine #data-science

apache/datafusion-ballista

Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.

2.0K
Active
Rust
Databases
ETL & Pipelines
#big-data #dataframe #distributed

shramos/Awesome-Cybersecurity-Datasets

A curated list of cybersecurity datasets for security researchers and machine learning practitioners.

1.9K
Archived
Security Research
Datasets
#cybersecurity #dataset #security-research

JuliaData/DataFrames.jl

In-memory tabular data in Julia, a high-performance language for data manipulation and analysis.

1.8K
Active
Julia
#data-frame #tabular-data #in-memory

narwhals-dev/narwhals

Lightweight and extensible compatibility layer between popular dataframe libraries like Pandas, Dask, and PySpark.

1.5K
Active
Python
Databases
CLI Tools
Python
#dataframes #compatibility #pandas

paradigmxyz/cryo

cryo is a Rust library for extracting blockchain data to parquet, CSV, JSON, or Python dataframes.

1.5K
Archived
Rust
ETL & Pipelines
API Frameworks
#blockchain #ethereum #parquet

uwdata/arquero

A JavaScript library for efficient querying and transformation of array-backed data tables.

1.5K
Experimental
JavaScript
ORMs & Query Builders
CLI Tools
JavaScript
#data-transformation #array-data #query-builder

pyjanitor-devs/pyjanitor

A Python library for cleaning and transforming data, inspired by the R package Janitor.

1.5K
Active
Python
ETL & Pipelines
CLI Tools
#cleaning-data #data-transformation #pandas-extension

jupyter-incubator/sparkmagic

Provides Jupyter magics and kernels for working with remote Spark clusters, enabling data scientists to easily interact with Spark from Jupyter Notebooks.

1.4K
Stable
Python
API Frameworks
Databases
Jupyter
#spark #jupyter-notebook #pyspark

yhat/pandasql

pandasql is a Python library that allows developers to use SQL syntax to query Pandas DataFrames.

1.3K
Archived
Python
ORMs & Query Builders
CLI Tools
Python
#sql #pandas #dataframe

spark-examples/pyspark-examples

A collection of PySpark examples covering RDD, DataFrame, and Dataset operations in Python.

1.3K
Stable
Python
Databases
API Frameworks
Python
#pyspark #spark #big-data
