Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.
A high-performance logical replication extension for PostgreSQL that enables fast, cross-version database replication.
ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.
A Python package for time series classification, useful for developers working with time-series data.
A Python library for cleaning and transforming data, inspired by the R package Janitor.
An open-source SQL query assistant service for databases and data warehouses.
A portfolio of data science projects covering machine learning, NLP, and more for personal and academic use.
This is Facebook's branch of the Oracle MySQL database, including the MyRocks storage engine.
A curated list of community detection research papers with implementations for data science and network analysis.
A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.
Performant probabilistic data structures for processing continuous, unbounded streams in Go.
This repository provides code examples for Oracle's AI-enabled database features and integrations.
Python code for "Causal Inference: What If", a book by Miguel Hernán and James Robins.
A Python library for extracting, transforming, and loading tabular data.
Apache Impala is a high-performance, open-source SQL query engine that runs on Apache Hadoop and Apache Kudu.
This repository provides code and data for a book on statistics for data scientists.
RRDtool is a time-series database system for efficiently storing and graphing data.
A time series library for Apache Spark that provides a high-level API for working with time series data.
Apache Cassandra is a distributed wide-column database designed for high availability, scalability, and performance.
Lightweight, fast, and reliable key-value database engine in Go for high-throughput applications.
This repository provides Python implementations of exercises from the book 'An Introduction to Statistical Learning'.
A high-performance, highly available, and distributed time series database written in Rust.
A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.
An open-source financial data extraction tool that provides simple API access to data scraped from various websites.
A Python data analysis library optimized for humans instead of machines.
Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.
A Python tool to parse Redis dump.rdb files, analyze memory usage, and export data to JSON.
A collaborative, offline-first SQLite wrapper for syncing app state across users and devices.
Agile data preparation workflows made easy with popular Python data science libraries.
Firebird is a relational database management system (RDBMS) suitable for a wide range of applications from desktop to client-server to large databases.
PumpkinDB is an immutable, ordered key-value database engine written in Rust.
A Python package for handling messy CSV files with improved dialect detection and a command-line interface.
PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.
LevelDB key/value database in Go for building high-performance data-intensive applications.
A next-generation curated knowledge sharing platform for data scientists and other technical professionals.
Sequel is a Ruby library that provides a powerful and flexible object-relational mapping (ORM) for databases.
An open-source threat hunting platform built on the ELK stack for security researchers and analysts.
Python data structures library focused on serialization, deserialization, and validation of complex data schemas.
A collection of monthly reports on the internals of Alibaba Cloud's database products.
ToplingDB is a cloud-native, distributed, and searchable key-value store built on RocksDB.
DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.
GridDB is a fast and scalable open-source database for time-series IoT and big data applications.
A JavaScript statistical library that provides a wide range of statistical functions for data analysis.
A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.
A lightweight key-value store built with C++ using a skiplist data structure.
A composable data framework for building ambitious web applications using TypeScript.
A functional, type-safe, composable Scala data access library for Postgres databases.
SQLite with Branches - a lightweight, embedded database with version control capabilities.
A Go ORM and query builder for interacting with databases.
Python demos for spatial data analytics, geostatistics, and machine learning to support courses.