Embeddable, persistent key-value store for fast storage, built on a log-structured merge-tree (LSM) design.
A high-performance Python library for data extraction, analysis, conversion and manipulation of PDF and other documents.
Fluvio is an event stream processing engine for developers to build responsive data-intensive apps.
dvc is a data versioning and ML experiments tool that helps developers manage and track data and model changes.
High-performance distributed graph database for real-time use cases.
Free, open-source quantitative trading data platform for China's A-share stock market.
Transporter is a powerful ETL tool that allows developers to sync data between various persistence engines.
Fast, embedded graph database with vector search and full-text search, compatible with Cypher queries.
Apache Doris is a high-performance, unified analytics database for real-time data processing.
Cloud-native distributed SQL database for modern applications.
DuckLake is an integrated data lake and catalog format written in C++.
QuestDB is a high-performance, open-source time-series database for real-time analytics and financial applications.
A Java ORM and SQL query builder that supports popular databases such as ClickHouse, Impala, MySQL, and Presto.
Official Git mirror of the SQLite source tree, the widely used embedded database engine.
JuiceFS is a distributed POSIX file system built on top of Redis and S3 for big data and cloud-native applications.
dbt enables data analysts and engineers to transform data using software engineering practices.
A high-performance, open-source query engine for sub-second analytics on data lakehouses.
Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.
Apache Iceberg is an open-source table format for large analytic datasets, providing a versioned and scalable data lake architecture.
A framework-agnostic, datastore-agnostic JavaScript ORM built for ease of use and peace of mind.
ORM for TypeScript and JavaScript with support for multiple databases and platforms.
A Python package for accessing and analyzing Formula 1 racing data, including results, schedules, timing, and telemetry.
Distributed MySQL database system for horizontal scaling.
Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.
A Rust-based, Elasticsearch-quality search engine for PostgreSQL, enabling fast, real-time analytics and HTAP use cases.
Apache DataFusion is a powerful SQL query engine written in Rust, designed for big data processing and analysis.
AliSQL is a MySQL branch originating from Alibaba Group, focused on high performance and scalability.
An open-source Python library that simplifies the process of loading data into data lakes and warehouses.
Windows builds of Redis 6.0.20 through 8.0.0, the popular open-source in-memory data structure store.
A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.
Graft is an open-source transactional storage engine optimized for lazy, partial, and strongly consistent replication, ideal for edge, offline-first, and distributed applications.
A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.
A curated list of data science interview questions and answers for developers.
A database solution that provides better analytics on top of MongoDB and makes it easier to migrate from MongoDB to SQL.
Apache Arrow is a fast columnar data format and toolset for in-memory analytics and data interchange.
SciRuby/daru is a Ruby library for data analysis and manipulation, useful for data scientists and developers working with data.
CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.
Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.
Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.
A powerful GUI/CLI tool for biologists to work with NGS data, not a vibe coder tool.
Google's Operations Research tools (OR-Tools) for combinatorial optimization, including linear programming.
A Python library for data manipulation and analysis, part of the core data science toolkit.
A Python portfolio analytics library for quantitative finance.
A large-scale open-access corpus of scientific papers and metadata for researchers and developers.