Trending Projects

Discover the fastest growing open source projects

Showing 351-400 of 897 trending projects

#351

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.

+248

+7.2%

3.7K

total stars

Java

#352

antonycourtney/tad

A desktop application for viewing and analyzing tabular data, with support for CSV, Parquet, and DuckDB.

+247

+7.8%

3.4K

total stars

TypeScript

#353

datastacktv/data-engineer-roadmap

A roadmap for becoming a data engineer.

+242

+1.9%

12.7K

total stars

#354

scrollmapper/bible_databases

A collection of Bible versions and cross-reference databases.

+238

+18.8%

1.5K

total stars

Python

#355

jvns/pandas-cookbook

Pandas Cookbook is a collection of recipes for using Python's powerful data analysis library, Pandas.

+237

+3.5%

7.0K

total stars

Jupyter Notebook

#356

ujjwalkarn/DataSciencePython

Resources for common data analysis and machine learning tasks in Python.

+235

+4.3%

5.7K

total stars

Python

#357

openmaptiles/openmaptiles

OpenMapTiles is an open-source vector tile schema implementation for creating custom map tiles.

+235

+8.4%

3.0K

total stars

PLpgSQL

#358

nullptrlabs/pgmodeler

An open-source data modeling tool designed for PostgreSQL, allowing developers to generate DDL commands visually.

+234

+7.1%

3.5K

total stars

C++

#359

apache/parquet-format

Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.

+233

+11.5%

2.3K

total stars

Thrift

#360

alibaba/druid

Druid is a high-performance database connection pool for Java applications, designed for monitoring and management.

+232

+0.8%

28.2K

total stars

Java

#361

dgraph-io/badger

Fast, embeddable key-value database written in Go for building high-performance storage applications.

+231

+1.5%

15.5K

total stars

#362

h5py/h5py

A Python library for accessing the HDF5 binary data format, a popular format for scientific and numerical data.

+231

+11.7%

2.2K

total stars

Python

#363

datawhalechina/competition-baseline

A collection of code examples and baselines for common data science and machine learning competitions.

+230

+5.1%

4.7K

total stars

Jupyter Notebook

#364

data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

+229

+13.7%

1.9K

total stars

CSS

#365

google/cluster-data

A dataset of Borg cluster traces from Google, useful for researchers and developers working on distributed systems and cloud infrastructure.

+228

+28.0%

1.0K

total stars

TeX

#366

nalgeon/sqlean

The ultimate set of SQLite extensions for developers building applications with SQLite databases.

+227

+5.6%

4.3K

total stars

#367

JasonKessler/scattertext

A Python library for creating beautiful visualizations of language differences across document types.

+227

+10.8%

2.3K

total stars

Python

#368

apache/hive

Apache Hive is a data warehouse software built on top of Apache Hadoop for querying and managing large datasets.

+225

+3.9%

6.0K

total stars

Java

#369

huandu/go-sqlbuilder

A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.

+225

+15.6%

1.7K

total stars

#370

galaxyproject/galaxy

An open-source, community-driven platform for data-intensive scientific analysis and visualization.

+222

+14.6%

1.7K

total stars

Python

#371

crazyhottommy/RNA-seq-analysis

Notes and code for analyzing RNA-seq data using Python and Snakemake.

+222

+26.1%

1.1K

total stars

Python

#372

pgvector/pgvector-python

A Python library that provides support for pgvector, the PostgreSQL vector search extension, enabling efficient vector search and storage.

+221

+18.0%

1.4K

total stars

Python

#373

DotNetNext/SqlSugar

A powerful, multi-database ORM for .NET that supports a wide range of SQL databases and provides a seamless data access layer.

+218

+3.9%

5.8K

total stars

#374

seandavi/awesome-single-cell

A curated list of software packages and data resources for single-cell analysis, including RNA-seq and ATAC-seq.

+218

+6.3%

3.7K

total stars

#375

hosseinmoein/DataFrame

C++ DataFrame library for statistical, financial, and machine learning analysis.

+218

+8.1%

2.9K

total stars

C++

#376

gee-community/geemap

A Python package for interactive geospatial analysis and visualization with Google Earth Engine.

+217

+5.9%

3.9K

total stars

Python

#377

jupyter/docker-stacks

Docker images containing Jupyter applications for data science and machine learning workflows.

+216

+2.6%

8.4K

total stars

Python

#378

VictoriaMetrics/fastcache

Fast in-memory cache library for Go with low GC overhead, optimized for a large number of entries.

+216

+10.2%

2.3K

total stars

#379

qinwf/awesome-R

A curated list of awesome R packages, frameworks and software for data analysis and data science.

+215

+3.5%

6.4K

total stars

#380

mdeff/fma

A dataset for music analysis and research, with support for deep learning and reproducible research.

+214

+9.1%

2.6K

total stars

Jupyter Notebook

#381

malloydata/malloy

Malloy is an open-source language for describing data relationships and transformations.

+214

+9.8%

2.4K

total stars

TypeScript

#382

mootdx/mootdx

A Python library for conveniently reading data from the Tongdaxin financial data platform.

+214

+18.3%

1.4K

total stars

Python

#383

hazelcast/hazelcast

Hazelcast is a high-performance, distributed in-memory data platform for real-time insights and stream processing.

+213

+3.3%

6.6K

total stars

Java

#384

sqlkata/querybuilder

SQL query builder for C# developers, supporting multiple databases and complex queries.

+213

+6.8%

3.3K

total stars

#385

JoshClose/CsvHelper

A C# library for reading and writing CSV files, with support for a wide range of CSV file formats.

+212

+4.2%

5.2K

total stars

#386

orbitdb/orbitdb

OrbitDB is a peer-to-peer database for the decentralized web, enabling developers to build offline-first, distributed applications.

+211

+2.5%

8.7K

total stars

JavaScript

#387

stephencelis/SQLite.swift

A type-safe, Swift-language layer over SQLite3 for building database-backed Swift applications.

+210

+2.1%

10.1K

total stars

Swift

#388

xerial/sqlite-jdbc

SQLite JDBC Driver - a Java library for accessing SQLite databases

+210

+7.0%

3.2K

total stars

Java

#389

faroit/awesome-python-scientific-audio

Curated list of Python software and packages for scientific research in audio

+209

+14.2%

1.7K

total stars

#390

apache/auron

The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.

+208

+13.8%

1.7K

total stars

Rust

#391

ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets and tutorials for data science and data analysis in Python.

+208

+21.9%

1.2K

total stars

Jupyter Notebook

#392

apache/datafusion-ballista

Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.

+207

+11.7%

2.0K

total stars

Rust

#393

felt/tippecanoe

Build vector tilesets from large collections of GeoJSON features.

+207

+16.7%

1.4K

total stars

C++

#394

Azure/AzurePublicDataset

A repository containing Microsoft Azure traces.

+205

+23.3%

1.1K

total stars

Jupyter Notebook

#395

canonical/dqlite

An embeddable, replicated, and fault-tolerant SQL engine for building robust and scalable applications.

+204

+5.0%

4.3K

total stars

#396

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis and visualization projects using various tools and databases.

+203

+13.8%

1.7K

total stars

Jupyter Notebook

#397

Hiflylabs/awesome-dbt

A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.

+202

+14.0%

1.6K

total stars

#398

apache/cloudberry

Open-source massively parallel processing (MPP) database, an alternative to Greenplum.

+202

+20.4%

1.2K

total stars

#399

isar/isar

Extremely fast, easy to use, and fully async NoSQL database for Flutter apps

+200

+5.3%

4.0K

total stars

Dart

#400

rogersce/cnpy

A C++ library for reading and writing .npy and .npz files, commonly used in scientific computing.

+200

+15.8%

1.5K

total stars

C++

