Showing 101-150 of 897 trending projects
A Python library for downloading, parsing, and analyzing health data from Garmin, Fitbit, and MS Health.
Apache Fluss is a real-time streaming storage platform built for big data analytics.
A Rust library for quantitative finance, including tools for machine learning, option pricing, and trading.
NetworkX is a Python library for creating, manipulating, and studying the structure and dynamics of complex networks.
Apache Arrow is a fast columnar data format and toolset for in-memory analytics and data interchange.
Fast, embeddable key-value database written in Go for building high-performance storage applications.
An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.
Citus is a distributed PostgreSQL database that enables scaling out your Postgres database across multiple nodes.
Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.
Statsmodels is a Python library for statistical modeling and econometrics, providing tools for data analysis and prediction.
A repository of data science interview questions and answers for developers.
A quantitative research and stock analysis platform for finance professionals.
A comprehensive collection of 150+ Python programs for quantitative finance and stock market data analysis.
A cross-platform TUI database management tool written in Go for developers working with databases.
MongoDB-compatible database engine for cloud-native and open-source workloads with scalability and performance.
Open-source repository for sharing code related to the MIMIC family of critical care databases.
An open-source data catalog platform for building a high-performance, federated metadata lake.
A Python library of the most common stock market technical indicators, making it easy to implement quantitative finance and algorithmic trading.
A collection of data science projects in Python using Jupyter Notebook.
DuckLake is an integrated data lake and catalog format written in C++.
A real-time Postgres data replication and streaming library built in Rust for building CDC pipelines.
Fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.
A high-performance, embeddable key-value storage engine written in Rust for developers building data-intensive applications.
A Python library for scraping soccer data from various sources for sports analytics and data science.
A Python library that provides support for pgvector, the vector similarity search extension for Postgres, enabling efficient vector search and storage.
A data platform for building data pipelines with SQL and Python and for ingesting data from various sources.
Azure/AzurePublicDataset is a Jupyter Notebook-based repository containing Microsoft Azure traces.
FoundationDB is an open-source, distributed, transactional key-value store that provides ACID guarantees.
A high-performance NoSQL data store compatible with Apache Cassandra and Amazon DynamoDB.
SciPy is a Python library for scientific and technical computing, providing a wide range of algorithms and tools.
A Python library for fast, customizable, and interactive data profiling and exploratory data analysis.
JuiceFS is a distributed POSIX file system built on top of Redis and S3 for big data and cloud-native applications.
Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.
A comprehensive list of learning materials to help developers understand database internals.
A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.
mage-ai is a Python-based platform for building, running, and managing data pipelines and for integrating and transforming data.
A curated list of data engineering tools for software developers.
A lightweight, document-oriented database optimized for happiness, used as a Python library or CLI.
A database manager for multiple database engines that runs as a desktop or web app.
A comprehensive collection of geospatial tools and resources for data analysis, machine learning, and spatial applications.
A high-quality, cross-platform data plotting library for Rust developers, including WebAssembly support.
A Python package for accessing and analyzing Formula 1 racing data, including results, schedules, timing, and telemetry.
A Python library that provides a simple and unified interface for extracting text from any document format.
A modular quantitative trading framework for algorithmic trading, backtesting, and financial analysis.
A Rust-based implementation of an LSM-Tree storage engine (database) for developers to build and learn from.
A high-performance compressed bitset library for Java used in Apache Spark, Netflix Atlas, and others.
A Python library for 3D plotting and mesh analysis using the Visualization Toolkit (VTK).
Idempotent schema management tool for MySQL, PostgreSQL, SQLite, and SQL Server databases.
A Python script to fetch Garmin health data and populate it in an InfluxDB database for visualization in Grafana.