Showing 651-700 of 897 trending projects
Python code accompanying the causal inference book by Miguel Hernán and James Robins.
A Python library that implements database internals from scratch, useful for learning database concepts.
A Rust library that enables querying Excel spreadsheets using SQLite, making data extraction and analysis more efficient.
A Python toolbox for seismology and seismological observatories, providing tools for data processing and analysis.
A Python library for reading, manipulating, and writing data in various spreadsheet file formats.
Trill is a single-node query processor for temporal or streaming data.
Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.
A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.
A JavaScript library for working with multidimensional arrays, useful for data visualization and scientific computing.
An embedded time-series database written in Go for storing and querying metrics data.
The Anatomy of Matplotlib tutorial from the SciPy conference, focused on data visualization for scientific computing.
A color palette package in R inspired by works at the Metropolitan Museum of Art in New York.
An automatic DBMS configuration tool for optimizing database performance.
A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.
A high-performance logical replication extension for PostgreSQL that enables fast, cross-version database replication.
A scalable, distributed ETL framework for building data lake analytics pipelines.
A comprehensive Julia library for probability distributions and related statistical functions.
A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.
GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.
A library for generating MaxMind GeoIP2 databases for China IP addresses.
Tutorials on using the pandas library effectively for data analysis.
Mycelite is a SQLite extension that enables replication between SQLite instances.
ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.
SnappyData is a memory-optimized analytics database based on Apache Spark and Apache Geode, enabling real-time stream processing, transactions, and predictive analytics.
A compilation of R and Python code for data science and machine learning projects.
A data quality assessment and reporting tool for data frames and database tables in R.
A time series library for Apache Spark that provides a high-level API for working with time series data.
An implementation of the LevelDB key/value database in Go for building high-performance, data-intensive applications.
A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.
A versatile app for querying, scripting, and visualizing data from various databases, files, and APIs.
GridDB is a fast and scalable open-source database for time-series IoT and big data applications.
Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.
A JavaScript library that converts CSV and tab-delimited data to web-friendly formats like JSON and XML.
A versatile ORM for multiple databases including MySQL, SQLite, MariaDB, PostgreSQL, and MongoDB in Deno.
An Internet-scale distributed database system written in C++, inspired by Google's Bigtable.
Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.
A high-performance, memory-efficient Python data analysis library for handling large datasets.
A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.
A Python library for processing and visualizing satellite imagery data.
A JavaScript statistical library that provides a wide range of statistical functions for data analysis.
This R library provides historical investment returns analysis for the overall stock market.
ggplot2 is a powerful data visualization library for R that provides elegant and flexible graphics.
A collection of articles and source code on using the pandas data analysis library.
A popular Scala library for parsing and manipulating JSON data.
A collection of efficient Python tricks and tools for data scientists to improve their productivity.
Firebird is a relational database management system (RDBMS) suitable for a wide range of applications from desktop to client-server to large databases.
A powerful 3D visualization library for scientific data in Python.
pandasql is a Python library that allows developers to use SQL syntax to query Pandas DataFrames.
A tool to easily import CSV and JSON data into PostgreSQL databases.
A fast, hierarchical key-value storage engine written in C++ for applications that require high performance and scalability.