Trending Projects

Discover the fastest growing open source projects

Showing 201-250 of 897 trending projects

#201

tikv/tikv

Distributed transactional key-value database, originally created to complement TiDB.

+235

+1.4%

16.6K

total stars

Rust

#202

opengeospatial/geoparquet

A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.

+234

+29.6%

1.0K

total stars

Python

#203

oxnr/awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

+232

+1.6%

14.3K

total stars

#204

garden-co/jazz

A distributed database with CRDT sync, offline support, and end-to-end encryption.

+232

+10.4%

2.5K

total stars

TypeScript

#205

dgraph-io/badger

Fast, embeddable key-value database written in Go for building high-performance storage applications.

+231

+1.5%

15.5K

total stars

#206

pudo/dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.

+231

+5.0%

4.9K

total stars

Python

#207

lit26/finvizfinance

A Python library for financial analysis and data scraping from the Finviz platform.

+227

+22.7%

1.2K

total stars

Jupyter Notebook

#208

hannorein/rebound

An open-source N-body simulation library for astrophysics and planetary science.

+226

+27.7%

1.0K

total stars

#209

plotters-rs/plotters

A high-quality, cross-platform data plotting library for Rust developers, including WebAssembly support.

+224

+5.2%

4.5K

total stars

Rust

#210

statsmodels/statsmodels

Statsmodels is a Python library for statistical modeling and econometrics, providing tools for data analysis and prediction.

+222

+2.0%

11.3K

total stars

Python

#211

orioledb/orioledb

OrioleDB is a cloud-native PostgreSQL extension that solves performance and scalability challenges.

+222

+5.9%

4.0K

total stars

#212

felt/tippecanoe

Build vector tilesets from large collections of GeoJSON features.

+222

+18.1%

1.4K

total stars

C++

#213

vesoft-inc/nebula

Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.

+221

+1.9%

12.1K

total stars

C++

#214

dineug/erd-editor

An open-source, TypeScript-based Entity-Relationship Diagram (ERD) editor for developers working with databases.

+220

+16.0%

1.6K

total stars

TypeScript

#215

delta-io/delta

An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.

+219

+2.6%

8.6K

total stars

Scala

#216

databendlabs/databend

Unified cloud-native data warehouse platform for analytics, search and AI, built on top of S3 storage.

+218

+2.4%

9.2K

total stars

Rust

#217

wireservice/csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

+218

+3.5%

6.4K

total stars

Python

#218

dpilger26/NumCpp

A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.

+209

+5.6%

3.9K

total stars

C++

#219

nalgeon/redka

A Redis-compatible database implemented in Go, supporting SQL and multiple backends like PostgreSQL and SQLite.

+208

+4.8%

4.5K

total stars

#220

veb-101/Data-Science-Projects

A collection of data science projects in Python using Jupyter Notebook.

+207

+8.8%

2.6K

total stars

Jupyter Notebook

#221

apache/seatunnel

A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.

+204

+2.3%

9.1K

total stars

Java

#222

chezou/tabula-py

A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.

+204

+9.7%

2.3K

total stars

Python

#223

bruin-data/bruin

A data platform that enables building data pipelines with SQL, Python, and ingesting from various sources.

+204

+16.6%

1.4K

total stars

#224

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

This GitHub repository contains SQL data analysis and visualization projects using various tools and databases.

+201

+13.7%

1.7K

total stars

Jupyter Notebook

#225

elastic/kibana

Kibana is an open-source data visualization and management tool for Elasticsearch.

+198

+0.9%

21.0K

total stars

TypeScript

#226

benbjohnson/thesecretlivesofdata

A JavaScript library for visualizing and understanding complex data structures.

+198

+5.8%

3.6K

total stars

JavaScript

#227

ngaut/builddatabase

A distributed SQL database built from scratch.

+196

+10.0%

2.1K

total stars

#228

xtensor-stack/xtensor

A C++ library for multidimensional array operations with broadcasting and lazy computing.

+195

+5.5%

3.7K

total stars

C++

#229

brandon-rhodes/pycon-pandas-tutorial

A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.

+195

+22.3%

1.1K

total stars

Jupyter Notebook

#230

ContextLab/hypertools

A Python toolbox for gaining geometric insights into high-dimensional data.

+194

+11.5%

1.9K

total stars

Python

#231

SheetJS/sheetjs

SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.

+189

+0.5%

36.2K

total stars

#232

synthetichealth/synthea

Synthea is an open-source synthetic patient population simulator for generating realistic healthcare data.

+186

+6.6%

3.0K

total stars

Java

#233

dolthub/go-mysql-server

A MySQL-compatible relational database with a storage agnostic query engine, implemented in Go.

+186

+7.7%

2.6K

total stars

#234

microsoft/sql-server-samples

This repository contains code samples for SQL Server, Azure SQL, and related data services from Microsoft.

+183

+1.7%

10.9K

total stars

#235

PeerDB-io/peerdb

Fast, cost-effective data replication tool from Postgres to data warehouses, queues, and storage.

+183

+6.5%

3.0K

total stars

#236

hosseinmoein/DataFrame

C++ DataFrame library for statistical, financial, and machine learning analysis.

+183

+6.7%

2.9K

total stars

C++

#237

apache/fluss

Apache Fluss is a real-time streaming storage platform built for big data analytics.

+182

+11.2%

1.8K

total stars

Java

#238

TobikoData/sqlmesh

Scalable and efficient data transformation framework with backwards compatibility for dbt.

+181

+6.6%

2.9K

total stars

Python

#239

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

+180

+3.6%

5.2K

total stars

#240

josonle/Coding-Now

A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.

+180

+20.8%

1.0K

total stars

Python

#241

yougov/mongo-connector

MongoDB data stream pipeline tools for managing real-time data synchronization and replication.

+178

+10.5%

1.9K

total stars

Python

#242

kedro-org/kedro

Kedro is a Python toolkit for building production-ready data science and machine learning pipelines.

+177

+1.7%

10.8K

total stars

Python

#243

huachaohuang/awesome-dbdev

A curated list of awesome materials and resources for database development.

+177

+12.5%

1.6K

total stars

#244

gunrock/gunrock

Programmable CUDA/C++ GPU Graph Analytics library for high-performance parallel graph processing.

+177

+19.9%

1.1K

total stars

C++

#245

TurboWay/bigdata_analyse

This is a Python project for big data analysis, focusing on HQL, SQL, and data processing.

+176

+3.6%

5.0K

total stars

Python

#246

intake/intake

Intake is a lightweight Python package for discovering, investigating, loading and distributing data.

+176

+19.7%

1.1K

total stars

Python

#247

Tencent/wcdb

WCDB is a cross-platform database framework developed by WeChat for Android, iOS, Linux, macOS, and Windows.

+173

+1.5%

11.7K

total stars

#248

PRQL/prql

PRQL is a modern, powerful, and pipelined SQL replacement for transforming data.

+173

+1.6%

10.7K

total stars

Rust

#249

mattn/go-sqlite3

A lightweight SQLite3 driver for Go that implements the database/sql interface.

+173

+2.0%

9.0K

total stars

#250

OSGeo/gdal

GDAL is an open-source library for working with various geospatial data formats, useful for remote sensing and GIS applications.

+173

+3.1%

5.8K

total stars

C++

