Trending Projects

Discover the fastest growing open source projects

Showing 51-100 of 897 trending projects

#51
apache/zeppelin

Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents.

+516
+8.5%
6.6K
total stars
#52
lux-org/lux

Automatically visualize your pandas dataframes with a single print command, enabling quick EDA.

+507
+10.4%
5.4K
total stars
#53
jackzhenguo/python-small-examples

A collection of Python code examples and tutorials for data science, machine learning, and web development.

+500
+6.6%
8.1K
total stars
#54
apache/kafka

Distributed event streaming platform for data pipelines and real-time apps

+499
+1.6%
32.1K
total stars
#55
PrefectHQ/prefect

Workflow orchestration for resilient data pipelines in Python

+482
+2.3%
21.8K
total stars
#56
beekeeper-studio/beekeeper-studio

Modern SQL client for multiple databases

+468
+2.2%
22.1K
total stars
#57
dragonflydb/dragonfly

Modern in-memory key-value store for caching and data management

+466
+1.6%
30.1K
total stars
#58
1nchaos/adata

Open-source, free A-share quantitative trading data platform focused on China's stock market

+466
+13.1%
4.0K
total stars
#59
open-metadata/OpenMetadata

A unified metadata platform for data discovery, data observability, and data governance.

+459
+5.5%
8.8K
total stars
#60
pingcap/awesome-database-learning

A comprehensive list of learning materials to help developers understand database internals.

+458
+4.5%
10.7K
total stars
#61
qinwf/awesome-R

A curated list of awesome R packages, frameworks and software for data analysis and data science.

+456
+7.6%
6.4K
total stars
#62
juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3 for big data and cloud-native applications.

+439
+3.4%
13.3K
total stars
#63
facebookresearch/cc_net

Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.

+437
+72.7%
1.0K
total stars
#64
numpy/numpy

Fundamental package for scientific computing with Python

+435
+1.4%
31.6K
total stars
#65
realm/realm-java

Realm is a mobile database that serves as a replacement for SQLite and ORMs.

+431
+3.9%
11.5K
total stars
#66
ujjwalkarn/DataSciencePython

A collection of resources and tutorials for common data analysis and machine learning tasks in Python

+429
+8.1%
5.7K
total stars
#67
etcd-io/etcd

Distributed key-value store for critical distributed system data

+428
+0.8%
51.6K
total stars
#68
TurboWay/bigdata_analyse

A Python project for big data analysis, focusing on HQL, SQL, and data processing.

+416
+9.0%
5.0K
total stars
#69
CJ-Chen/TBtools-II

A powerful GUI/CLI toolkit for biologists working with NGS data.

+405
+64.7%
1.0K
total stars
#70
Jon-Becker/prediction-market-analysis

Framework for collecting and analyzing prediction market data with comprehensive Polymarket/Kalshi datasets.

+400
+23.2%
2.1K
total stars
#71
airbytehq/airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

+397
+1.9%
20.8K
total stars
#72
orioledb/orioledb

OrioleDB is a cloud-native storage engine for PostgreSQL, packaged as an extension, that targets performance and scalability challenges.

+395
+10.9%
4.0K
total stars
#73
dagster-io/dagster

An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.

+389
+2.6%
15.1K
total stars
#74
databricks/spark-csv

CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.

+382
+56.5%
1.1K
total stars
#75
chinese-poetry/chinese-poetry

Comprehensive Chinese poetry database with JSON-formatted data for developers

+378
+0.8%
51.0K
total stars
#76
apache/spark

Unified analytics engine for large-scale data processing

+378
+0.9%
42.9K
total stars
#77
jorgerojas26/lazysql

A cross-platform TUI database management tool written in Go for developers working with databases.

+378
+11.9%
3.5K
total stars
#78
tursodatabase/libsql

libSQL is an open-source, open-contribution fork of SQLite, a widely used embedded database.

+376
+2.3%
16.4K
total stars
#79
GoogleTrends/data

An open-source index of Google Trends data, useful for developers building data-driven applications.

+376
+8.5%
4.8K
total stars
#80
gopherdata/gophernotes

The Go kernel for Jupyter notebooks and nteract, enabling data science and numerical computing in Go.

+374
+10.4%
4.0K
total stars
#81
mage-ai/mage-ai

mage-ai is a Python-based platform for building, running, and managing data pipelines that integrate and transform data.

+359
+4.3%
8.7K
total stars
#82
attic-labs/noms

The versioned, forkable, syncable database for developers who need a scalable, distributed data solution.

+355
+5.0%
7.4K
total stars
#83
pymupdf/PyMuPDF

A high-performance Python library for data extraction, analysis, conversion and manipulation of PDF and other documents.

+354
+4.0%
9.2K
total stars
#84
KeithGalli/pandas

Tutorial code and data for learning pandas, the Python library for data manipulation and analysis.

+353
+50.0%
1.1K
total stars
#85
seandavi/awesome-single-cell

A curated list of software packages and data resources for single-cell analysis, including RNA-seq and ATAC-seq.

+341
+10.2%
3.7K
total stars
#86
matplotlib/mplfinance

A Python library for financial data visualization using Matplotlib, focused on candlestick and OHLC charts.

+337
+8.5%
4.3K
total stars
#87
theOehrly/Fast-F1

A Python package for accessing and analyzing Formula 1 racing data, including results, schedules, timing, and telemetry.

+330
+7.9%
4.5K
total stars
#88
zhu-xlab/GlobalBuildingAtlas

GlobalBuildingAtlas is an open, global, and complete dataset of building polygons, heights, and LoD1 3D models.

+323
+19.4%
2.0K
total stars
#89
ploomber/ploomber

Ploomber is a fast, versatile tool for building and deploying data pipelines that works with a variety of AI and ML tools.

+320
+9.7%
3.6K
total stars
#90
cockroachdb/cockroach

Distributed SQL database for cloud-native apps

+319
+1.0%
32.0K
total stars
#91
rxin/db-readings

A collection of readings and resources on databases.

+315
+4.1%
8.0K
total stars
#92
arpanghosh8453/garmin-grafana

A Python script to fetch Garmin health data and populate it in an InfluxDB database for visualization in Grafana.

+313
+12.1%
2.9K
total stars
#93
MariaDB/server

Open-source relational database management system (RDBMS) for building data-driven applications.

+312
+4.5%
7.3K
total stars
#94
apache/gravitino

An open-source data catalog platform for building a high-performance, federated metadata lake.

+312
+12.1%
2.9K
total stars
#95
influxdata/influxdb

Time-series database for metrics & analytics

+308
+1.0%
31.4K
total stars
#96
dbt-labs/dbt-core

dbt enables data analysts and engineers to transform data using software engineering practices.

+305
+2.5%
12.3K
total stars
#97
sqlite/sqlite

Official Git mirror of the SQLite source tree, a popular and widely-used embedded database engine.

+304
+3.4%
9.1K
total stars
#98
pingcap/tidb

Cloud-native distributed SQL database for modern applications

+302
+0.8%
39.9K
total stars
#99
paradedb/paradedb

A Rust-based, Elasticsearch-quality search engine for PostgreSQL, enabling fast, real-time analytics and HTAP use cases.

+301
+3.7%
8.5K
total stars
#100
kayak/pypika

PyPika is a Python SQL query builder that provides a readable, Pythonic syntax for constructing complex SQL queries.

+298
+11.5%
2.9K
total stars
