Trending Projects

Discover the fastest-growing open source projects

Showing 351-400 of 897 trending projects

#351

prestodb/presto

Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.

+111

+0.7%

16.7K

total stars

Java

#352

rougier/scientific-visualization-book

An open-access book on scientific visualization using Python and Matplotlib for data-driven developers

+111

+1.0%

11.2K

total stars

Python

#353

apache/couchdb

An open-source, scalable, and fault-tolerant NoSQL database with a focus on reliability and offline-first design.

+111

+1.6%

6.8K

total stars

Erlang

#354

orlp/slotmap

A Rust slot map data structure: a container that stores values under persistent, unique keys.

+111

+9.4%

1.3K

total stars

Rust

#355

MarcosMeli/FileHelpers

A free and easy-to-use .NET library for reading and writing CSV and fixed-length data files.

+111

+10.6%

1.2K

total stars

#356

apache/paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark.

+110

+3.5%

3.2K

total stars

Java

#357

apache/datafusion-ballista

Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.

+110

+5.9%

2.0K

total stars

Rust

#358

petewarden/dstk

A collection of open data sets and tools for data science and machine learning tasks.

+110

+10.7%

1.1K

total stars

Ruby

#359

crazyhottommy/RNA-seq-analysis

This GitHub repository contains notes and code for analyzing RNA-seq data using Python and Snakemake.

+110

+11.4%

1.1K

total stars

Python

#360

valeriansaliou/sonic

Fast, lightweight search backend alternative to Elasticsearch

+109

+0.5%

21.2K

total stars

Rust

#361

cbailes/awesome-deep-trading

A curated list of resources for machine learning-based algorithmic trading and quantitative finance.

+109

+6.3%

1.8K

total stars

#362

thinh-vu/vnstock

A beginner-friendly Python toolkit for financial data extraction, analysis, and automation.

+109

+10.4%

1.2K

total stars

Python

#363

lerocha/chinook-database

Sample database for SQL Server, Oracle, MySQL, PostgreSQL, SQLite, DB2

+108

+4.6%

2.5K

total stars

TSQL

#364

huandu/go-sqlbuilder

A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.

+108

+6.9%

1.7K

total stars

#365

TablePlus/DBngin

DBngin is a free database version management tool for developers.

+108

+9.8%

1.2K

total stars

#366

Automattic/mongoose

Mongoose is a MongoDB object modeling tool for Node.js and Deno, simplifying database interactions with schemas and models.

+107

+0.4%

27.5K

total stars

JavaScript

#367

allegro/bigcache

Efficient in-memory cache in Go for storing and retrieving large amounts of data.

+107

+1.3%

8.1K

total stars

#368

rilldata/rill

Rill is a tool for transforming data sets into powerful dashboards using SQL, enabling BI-as-code.

+107

+4.5%

2.5K

total stars

#369

duckdb/duckdb-wasm

WebAssembly version of the DuckDB analytical database, enabling fast in-browser analytics and SQL queries.

+106

+5.8%

1.9K

total stars

C++

#370

feldera/feldera

The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.

+106

+6.2%

1.8K

total stars

Rust

#371

google/tensorstore

A C++ library for reading and writing large multi-dimensional arrays, useful for scientific and data-intensive applications.

+106

+7.6%

1.5K

total stars

C++

#372

wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL databases seamlessly

+106

+8.2%

1.4K

total stars

Java

#373

linhandev/dataset

A comprehensive index of medical imaging datasets for researchers and developers working in the medical imaging field.

+105

+3.1%

3.5K

total stars

#374

MIT-LCP/mimic-code

Open-source repository for sharing code related to the MIMIC family of critical care databases.

+105

+3.5%

3.1K

total stars

Jupyter Notebook

#375

xiaoxu193/PyTeaser

A Python library that summarizes news articles by extracting the most important sentences.

+105

+9.8%

1.2K

total stars

Python

#376

rhiever/datacleaner

A Python tool that automatically cleans and preprocesses data for analysis and machine learning.

+105

+10.8%

1.1K

total stars

Python

#377

cyang-kth/fmm

An open-source C++ framework for fast and parallel map matching of GPS trajectories.

+105

+11.4%

1.0K

total stars

C++

#378

sqldelight/sqldelight

SQLDelight - Generates type-safe Kotlin APIs from SQL, enabling easier database management in Kotlin projects.

+104

+1.6%

6.8K

total stars

Kotlin

#379

eddwebster/football_analytics

A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster).

+104

+4.3%

2.5K

total stars

Jupyter Notebook

#380

apache/hamilton

Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.

+104

+4.5%

2.4K

total stars

Jupyter Notebook

#381

supabase/etl

A real-time Postgres data replication and streaming library built in Rust for building CDC pipelines.

+104

+5.0%

2.2K

total stars

Rust

#382

biopython/biopython

Biopython is a set of Python modules that provide a wide range of functionality for bioinformatics, including DNA/RNA/protein sequence analysis, phylogenetics, and more.

+103

+2.1%

4.9K

total stars

Python

#383

Netflix/maestro

Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.

+103

+2.8%

3.7K

total stars

Java

#384

sqlkata/querybuilder

SQL query builder for C# developers, supporting multiple databases and complex queries.

+102

+3.1%

3.3K

total stars

#385

VictoriaMetrics/fastcache

Fast in-memory cache library for Go with low GC overhead, optimized for a large number of entries.

+102

+4.5%

2.3K

total stars

#386

scrollmapper/bible_databases

A collection of Bible versions and cross-reference databases.

+102

+7.3%

1.5K

total stars

Python

#387

holistics/dbml

A database modeling language (DBML) that helps define and document database structures.

+101

+2.9%

3.5K

total stars

JavaScript

#388

meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

+101

+4.4%

2.4K

total stars

Python

#389

scikit-bio/scikit-bio

A versatile Python library for bioinformatics, providing data structures, algorithms, and educational resources.

+100

+9.4%

1.2K

total stars

Python

#390

arangodb/arangodb

ArangoDB is a multi-model database supporting documents, graphs, and key-values for high-performance applications.

+99

+0.7%

14.1K

total stars

C++

#391

scratchdata/scratchdata

A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.

+99

+9.7%

1.1K

total stars

#392

google/cluster-data

This is a dataset of Borg cluster traces from Google, which can be useful for researchers and developers in the field of distributed systems and cloud infrastructure.

+99

+10.5%

1.0K

total stars

TeX

#393

galaxyproject/galaxy

An open-source, community-driven platform for data-intensive scientific analysis and visualization.

+98

+5.9%

1.7K

total stars

Python

#394

apache/hive

Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.

+97

+1.6%

6.0K

total stars

Java

#395

mdeff/fma

A dataset for music analysis and research, with support for deep learning and reproducible research.

+97

+3.9%

2.6K

total stars

Jupyter Notebook

#396

man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.

+97

+4.6%

2.2K

total stars

C++

#397

itbdw/ip-database

An offline IP database for developers to look up IP address geolocation information.

+97

+7.0%

1.5K

total stars

HTML

#398

eventql/eventql

Distributed, massively parallel SQL query engine for big data analytics and timeseries workloads.

+97

+9.0%

1.2K

total stars

C++

#399

Azure/AzurePublicDataset

A repository of public Microsoft Azure traces.

+97

+9.8%

1.1K

total stars

Jupyter Notebook

#400

faroit/awesome-python-scientific-audio

Curated list of Python software and packages for scientific research in audio

+96

+6.1%

1.7K

total stars
