Trending Projects

Discover the fastest growing open source projects

Showing 51-100 of 897 trending projects

#51

facebook/rocksdb

Embeddable, persistent key-value store for fast storage with LSM design

+1.6K

+5.4%

31.6K

total stars

C++

#52

pymupdf/PyMuPDF

A high-performance Python library for data extraction, analysis, conversion and manipulation of PDF and other documents.

+1.6K

+20.9%

9.2K

total stars

Python

#53

fluvio-community/fluvio

Fluvio is an event stream processing engine for developers to build responsive data-intensive apps.

+1.6K

+43.5%

5.2K

total stars

Rust

#54

treeverse/dvc

dvc is a data versioning and ML experiments tool that helps developers manage and track data and model changes.

+1.5K

+10.6%

15.4K

total stars

Python

#55

dgraph-io/dgraph

High-performance distributed graph database for real-time use cases

+1.4K

+7.2%

21.6K

total stars

#56

1nchaos/adata

Open-source, free A-share quantitative trading data platform focused on China's stock market

+1.4K

+54.7%

4.0K

total stars

Python

#57

compose/transporter

Transporter is a powerful ETL tool that allows developers to sync data between various persistence engines.

+1.4K

+2794.0%

1.4K

total stars

#58

influxdata/influxdb

Time-series database for metrics & analytics

+1.3K

+4.5%

31.4K

total stars

Rust

#59

kuzudb/kuzu

Fast, embedded graph database with vector search and full-text search, compatible with Cypher queries.

+1.3K

+55.8%

3.7K

total stars

C++

#60

apache/doris

Apache Doris is a high-performance, unified analytics database for real-time data processing.

+1.3K

+9.7%

15.1K

total stars

Java

#61

pingcap/tidb

Cloud-native distributed SQL database for modern applications

+1.3K

+3.4%

39.9K

total stars

#62

duckdb/ducklake

DuckLake is an integrated data lake and catalog format written in C++.

+1.3K

+109.9%

2.5K

total stars

C++

#63

google/leveldb

Fast key-value storage library for C++

+1.3K

+3.4%

38.9K

total stars

C++

#64

questdb/questdb

QuestDB is a high-performance, open-source, time-series database for real-time analytics and financial applications.

+1.3K

+8.2%

16.7K

total stars

Java

#65

x-ream/sqli

A Java ORM SQL query builder that supports popular databases like ClickHouse, Impala, MySQL, and Presto.

+1.3K

+217.8%

1.9K

total stars

Java

#66

sqlite/sqlite

Official Git mirror of the SQLite source tree, a popular and widely-used embedded database engine.

+1.2K

+15.5%

9.1K

total stars

#67

juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3 for big data and cloud-native applications.

+1.2K

+10.1%

13.3K

total stars

#68

sqlitebrowser/sqlitebrowser

SQLite database management tool with GUI

+1.2K

+5.4%

23.7K

total stars

C++

#69

dbt-labs/dbt-core

dbt enables data analysts and engineers to transform data using software engineering practices.

+1.2K

+10.8%

12.3K

total stars

Python

#70

StarRocks/starrocks

A high-performance open source query engine for sub-second analytics on the data lakehouse.

+1.2K

+11.6%

11.4K

total stars

Java

#71

apache/hamilton

Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.

+1.1K

+90.1%

2.4K

total stars

Jupyter Notebook

#72

apache/iceberg

Apache Iceberg is an open-source table format for large analytic datasets, providing a versioned and scalable data lake architecture.

+1.1K

+15.3%

8.6K

total stars

Java

#73

mongodb/mongo

MongoDB database server and tools

+1.1K

+4.2%

28.2K

total stars

C++

#74

js-data/js-data

A framework-agnostic, datastore-agnostic JavaScript ORM built for ease of use and peace of mind.

+1.1K

+225.5%

1.6K

total stars

JavaScript

#75

typeorm/typeorm

ORM for TypeScript and JavaScript with support for multiple databases and platforms.

+1.1K

+3.2%

36.4K

total stars

TypeScript

#76

theOehrly/Fast-F1

A Python package for accessing and analyzing Formula 1 racing data, including results, schedules, timing, and telemetry.

+1.1K

+32.9%

4.5K

total stars

Python

#77

vitessio/vitess

Distributed MySQL database system for horizontal scaling

+1.1K

+5.7%

20.8K

total stars

#78

cockroachdb/cockroach

Distributed SQL database for cloud-native apps

+1.1K

+3.6%

32.0K

total stars

#79

trinodb/trino

Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.

+1.1K

+9.5%

12.6K

total stars

Java

#80

paradedb/paradedb

A Rust-based, Elasticsearch-quality search engine for PostgreSQL, enabling fast, real-time analytics and HTAP use cases.

+1.1K

+14.8%

8.5K

total stars

Rust

#81

apache/datafusion

Apache DataFusion is a powerful SQL query engine written in Rust, designed for big data processing and analysis.

+1.1K

+14.7%

8.5K

total stars

Rust

#82

alibaba/AliSQL

AliSQL is a MySQL branch originating from Alibaba Group, focused on high performance and scalability.

+1.1K

+23.2%

5.8K

total stars

C++

#83

dlt-hub/dlt

An open-source Python library that simplifies the process of loading data into data lakes and warehouses.

+1.1K

+27.4%

5.0K

total stars

Python

#84

redis-windows/redis-windows

Redis (versions 6.0.20 through 8.0.0) for Windows; Redis is a popular open-source in-memory data structure store.

+1.1K

+43.4%

3.5K

total stars

Batchfile

#85

mukunku/ParquetViewer

A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.

+1.1K

+2134.0%

1.1K

total stars

#86

orbitinghail/graft

Graft is an open-source transactional storage engine optimized for lazy, partial, and strongly consistent replication, ideal for edge, offline-first, and distributed applications.

+1.0K

+280.1%

1.4K

total stars

Rust

#87

marcboeker/go-duckdb

A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.

+1.0K

+2011.8%

1.1K

total stars

#88

youssefHosni/Data-Science-Interview-Questions-Answers

A curated list of data science interview questions and answers for developers.

+1.0K

+22.8%

5.5K

total stars

#89

torodb/stampede

A database solution that provides better analytics on top of MongoDB and makes it easier to migrate from MongoDB to SQL.

+1.0K

+139.8%

1.8K

total stars

Java

#90

redis/go-redis

Redis client for Go with support for Redis 8.0+

+1.0K

+4.8%

22.0K

total stars

#91

apache/arrow

Apache Arrow is a fast columnar data format and toolset for in-memory analytics and data interchange.

+1.0K

+6.5%

16.6K

total stars

C++

#92

SciRuby/daru

SciRuby/daru is a Ruby library for data analysis and manipulation, useful for data scientists and developers working with data.

+1.0K

+1980.4%

1.1K

total stars

Ruby

#93

databricks/spark-csv

CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.

+1.0K

+1724.1%

1.1K

total stars

Scala

#94

apache/celeborn

Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.

+989

+1978.0%

1.0K

total stars

Java

#95

facebookresearch/cc_net

Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.

+988

+1976.0%

1.0K

total stars

Python

#96

CJ-Chen/TBtools-II

A powerful GUI/CLI tool for biologists to work with NGS data, not a vibe coder tool.

+981

+1962.0%

1.0K

total stars

Shell

#97

google/or-tools

Google's Operations Research tools for combinatorial optimization, linear programming, and operations research.

+971

+8.0%

13.2K

total stars

C++

#98

KeithGalli/pandas

A Python library for data manipulation and analysis, part of the core data science toolkit.

+969

+1076.7%

1.1K

total stars

Jupyter Notebook

#99

ranaroussi/quantstats

Portfolio analytics library for quantitative finance, built with Python

+968

+16.6%

6.8K

total stars

Python

#100

allenai/s2orc

A large-scale open-access corpus of scientific papers and metadata for researchers and developers.

+968

+1936.0%

1.0K

total stars

Python
