Trending Projects

Discover the fastest growing open source projects

Showing 51-100 of 897 trending projects

#51

apache/zeppelin

Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents.

+516

+8.5%

6.6K

total stars

Java

#52

lux-org/lux

Automatically visualize your pandas dataframes with a single print command, enabling quick EDA.

+507

+10.4%

5.4K

total stars

Python

#53

jackzhenguo/python-small-examples

A collection of Python code examples and tutorials for data science, machine learning, and web development.

+500

+6.6%

8.1K

total stars

Python

#54

apache/kafka

Distributed event streaming platform for data pipelines and real-time apps

+499

+1.6%

32.1K

total stars

Java

#55

PrefectHQ/prefect

Workflow orchestration for resilient data pipelines in Python

+482

+2.3%

21.8K

total stars

Python

#56

beekeeper-studio/beekeeper-studio

Modern SQL client for multiple databases

+468

+2.2%

22.1K

total stars

TypeScript

#57

dragonflydb/dragonfly

Modern in-memory key-value store for caching and data management

+466

+1.6%

30.1K

total stars

C++

#58

1nchaos/adata

Open-source, free A-share quantitative trading data platform focused on China's stock market

+466

+13.1%

4.0K

total stars

Python

#59

open-metadata/OpenMetadata

A unified metadata platform for data discovery, data observability, and data governance.

+459

+5.5%

8.8K

total stars

TypeScript

#60

pingcap/awesome-database-learning

A comprehensive list of learning materials to help developers understand database internals.

+458

+4.5%

10.7K

total stars

#61

qinwf/awesome-R

A curated list of awesome R packages, frameworks and software for data analysis and data science.

+456

+7.6%

6.4K

total stars

#62

juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3 for big data and cloud-native applications.

+439

+3.4%

13.3K

total stars

#63

facebookresearch/cc_net

Tools to download and cleanup Common Crawl data, a large web crawl dataset, for further analysis and processing.

+437

+72.7%

1.0K

total stars

Python

#64

numpy/numpy

Fundamental package for scientific computing with Python

+435

+1.4%

31.6K

total stars

Python

#65

realm/realm-java

Realm is a mobile database that serves as a replacement for SQLite and ORMs.

+431

+3.9%

11.5K

total stars

Java

#66

ujjwalkarn/DataSciencePython

A collection of resources and examples for common data analysis and machine learning tasks in Python

+429

+8.1%

5.7K

total stars

Python

#67

etcd-io/etcd

Distributed key-value store for critical distributed system data

+428

+0.8%

51.6K

total stars

#68

TurboWay/bigdata_analyse

A Python project for big data analysis, focusing on HQL, SQL, and data processing.

+416

+9.0%

5.0K

total stars

Python

#69

CJ-Chen/TBtools-II

A powerful GUI/CLI tool for biologists to work with NGS data.

+405

+64.7%

1.0K

total stars

Shell

#70

Jon-Becker/prediction-market-analysis

Framework for collecting and analyzing prediction market data with comprehensive Polymarket/Kalshi datasets.

+400

+23.2%

2.1K

total stars

Python

#71

airbytehq/airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

+397

+1.9%

20.8K

total stars

Python

#72

orioledb/orioledb

OrioleDB is a cloud-native PostgreSQL extension that solves performance and scalability challenges.

+395

+10.9%

4.0K

total stars

#73

dagster-io/dagster

An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.

+389

+2.6%

15.1K

total stars

Python

#74

databricks/spark-csv

A Scala library providing a CSV data source for Apache Spark 1.x.

+382

+56.5%

1.1K

total stars

Scala

#75

chinese-poetry/chinese-poetry

Comprehensive Chinese poetry database with JSON-formatted data for developers

+378

+0.8%

51.0K

total stars

JavaScript

#76

apache/spark

Unified analytics engine for large-scale data processing

+378

+0.9%

42.9K

total stars

Scala

#77

jorgerojas26/lazysql

A cross-platform TUI database management tool written in Go for developers working with databases.

+378

+11.9%

3.5K

total stars

#78

tursodatabase/libsql

libSQL is an open-source, open-contribution fork of SQLite, a widely used embedded database.

+376

+2.3%

16.4K

total stars

#79

GoogleTrends/data

An open-source index of Google Trends data, useful for developers building data-driven applications.

+376

+8.5%

4.8K

total stars

JavaScript

#80

gopherdata/gophernotes

The Go kernel for Jupyter notebooks and nteract, enabling data science and numerical computing in Go.

+374

+10.4%

4.0K

total stars

#81

mage-ai/mage-ai

mage-ai is a Python-based platform for building, running, and managing data pipelines and integrating/transforming data.

+359

+4.3%

8.7K

total stars

Python

#82

attic-labs/noms

The versioned, forkable, syncable database for developers who need a scalable, distributed data solution.

+355

+5.0%

7.4K

total stars

#83

pymupdf/PyMuPDF

A high-performance Python library for data extraction, analysis, conversion and manipulation of PDF and other documents.

+354

+4.0%

9.2K

total stars

Python

#84

KeithGalli/pandas

Tutorial code and notebooks for getting started with the pandas data analysis library.

+353

+50.0%

1.1K

total stars

Jupyter Notebook

#85

seandavi/awesome-single-cell

A curated list of software packages and data resources for single-cell analysis, including RNA-seq and ATAC-seq.

+341

+10.2%

3.7K

total stars

#86

matplotlib/mplfinance

A Python library for financial data visualization using Matplotlib, focused on candlestick and OHLC charts.

+337

+8.5%

4.3K

total stars

Python

#87

theOehrly/Fast-F1

A Python package for accessing and analyzing Formula 1 racing data, including results, schedules, timing, and telemetry.

+330

+7.9%

4.5K

total stars

Python

#88

zhu-xlab/GlobalBuildingAtlas

GlobalBuildingAtlas is an open global and complete dataset of building polygons, heights and LoD1 3D models.

+323

+19.4%

2.0K

total stars

Python

#89

ploomber/ploomber

Ploomber is a fast and versatile tool for building and deploying data pipelines that can be used with a variety of AI and ML tools.

+320

+9.7%

3.6K

total stars

Python

#90

cockroachdb/cockroach

Distributed SQL database for cloud-native apps

+319

+1.0%

32.0K

total stars

#91

rxin/db-readings

A collection of readings and resources related to databases.

+315

+4.1%

8.0K

total stars

#92

arpanghosh8453/garmin-grafana

A Python script to fetch Garmin health data and populate it in an InfluxDB database for visualization in Grafana.

+313

+12.1%

2.9K

total stars

Python

#93

MariaDB/server

Open-source relational database management system (RDBMS) for building data-driven applications.

+312

+4.5%

7.3K

total stars

C++

#94

apache/gravitino

An open-source data catalog platform for building a high-performance, federated metadata lake.

+312

+12.1%

2.9K

total stars

Java

#95

influxdata/influxdb

Time-series database for metrics & analytics

+308

+1.0%

31.4K

total stars

Rust

#96

dbt-labs/dbt-core

dbt enables data analysts and engineers to transform data using software engineering practices.

+305

+2.5%

12.3K

total stars

Python

#97

sqlite/sqlite

Official Git mirror of the SQLite source tree, a popular and widely-used embedded database engine.

+304

+3.4%

9.1K

total stars

#98

pingcap/tidb

Cloud-native distributed SQL database for modern applications

+302

+0.8%

39.9K

total stars

#99

paradedb/paradedb

A Rust-based, Elasticsearch-quality search engine for PostgreSQL, enabling fast, real-time analytics and HTAP use cases.

+301

+3.7%

8.5K

total stars

Rust

#100

kayak/pypika

PyPika is a Python SQL query builder that provides a readable, Pythonic syntax for constructing complex SQL queries.

+298

+11.5%

2.9K

total stars

Python

