Trending Projects

Discover the fastest-growing open source projects

Showing 401-450 of 897 trending projects

#401

first20hours/google-10000-english

This repo contains a list of the 10,000 most common English words, useful for NLP and language modeling tasks.

+199

+4.8%

4.3K

total stars

#402

briatte/awesome-network-analysis

A curated list of awesome resources for network analysis and visualization, with a focus on R tools.

+199

+5.3%

4.0K

total stars

#403

bashtage/arch

A comprehensive Python library for modeling and forecasting financial time series data using ARCH models.

+199

+15.4%

1.5K

total stars

Python

#404

Netflix/maestro

Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.

+198

+5.6%

3.7K

total stars

Java

#405

DQinYuan/chinese_province_city_area_mapper

A Python module for extracting and mapping Chinese province, city, and district data.

+198

+12.5%

1.8K

total stars

Python

#406

rethinkdb/rethinkdb

Realtime NoSQL database for web apps

+195

+0.7%

27.0K

total stars

C++

#407

alibaba/clusterdata

A dataset of cluster data collected from Alibaba's production clusters for cluster management research.

+195

+10.9%

2.0K

total stars

Jupyter Notebook

#408

zarr-developers/zarr-python

A Python library for chunked, compressed N-dimensional arrays, useful for data scientists and ML engineers.

+195

+11.2%

1.9K

total stars

Python

#409

ideawu/ssdb

SSDB is a fast NoSQL database, an alternative to Redis, with support for LevelDB and RocksDB backends.

+194

+2.3%

8.5K

total stars

C++

#410

colour-science/colour

A comprehensive Python library for color science and color space conversions.

+194

+8.3%

2.5K

total stars

Python

#411

google/youtube-8m

Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.

+194

+8.9%

2.4K

total stars

Python

#412

PostgresApp/PostgresApp

An open-source macOS application that packages a full PostgreSQL server, providing an easy way to set up and manage a local PostgreSQL database.

+193

+2.6%

7.7K

total stars

Makefile

#413

bytewax/bytewax

Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.

+191

+10.8%

2.0K

total stars

Python

#414

cyang-kth/fmm

An open-source C++ framework for fast and parallel map matching of GPS trajectories.

+191

+22.9%

1.0K

total stars

C++

#415

ravendb/ravendb

A highly scalable, distributed, document-oriented NoSQL database with full-text search, spatial, and time-series support.

+189

+5.1%

3.9K

total stars

#416

pydata/numexpr

A fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more.

+189

+8.5%

2.4K

total stars

Python

#417

CliMA/Oceananigans.jl

A fast, flexible, ocean-flavored fluid dynamics library for climate and ocean modeling on CPUs and GPUs.

+189

+17.3%

1.3K

total stars

Julia

#418

mirage/irmin

Irmin is a distributed database that follows the same design principles as Git, allowing for distributed version control of data.

+188

+10.8%

1.9K

total stars

OCaml

#419

dbt-labs/metricflow

MetricFlow allows developers to define, build, and maintain metrics in code for business intelligence and analytics.

+187

+14.2%

1.5K

total stars

Python

#420

amphi-ai/amphi-etl

A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.

+185

+15.9%

1.4K

total stars

TypeScript

#421

JifuZhao/DS-Take-Home

A collection of data science take-home challenges and solutions implemented in Jupyter Notebooks.

+183

+11.9%

1.7K

total stars

Jupyter Notebook

#422

aws-samples/aws-glue-samples

AWS Glue code samples for building data integration and ETL pipelines on AWS.

+182

+13.4%

1.5K

total stars

Python

#423

scylladb/gocqlx

A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.

+182

+21.7%

1.0K

total stars

#424

lk-geimfari/mimesis

Mimesis is a fast Python library for generating fake data in multiple languages for testing and development purposes.

+179

+3.9%

4.8K

total stars

Python

#425

Yimeng-Zhang/feature-engineering-and-feature-selection

A comprehensive guide to feature engineering and feature selection techniques in Python, with examples.

+178

+12.2%

1.6K

total stars

Jupyter Notebook

#426

ClickHouse/clickhouse-go

A Go driver for the ClickHouse analytics database, enabling fast and efficient data processing.

+177

+5.8%

3.3K

total stars

#427

xtensor-stack/xtensor

A C++ library for multidimensional array operations with broadcasting and lazy computing.

+176

+5.0%

3.7K

total stars

C++

#428

GanjinZero/awesome_Chinese_medical_NLP

A curated collection of open-source Chinese medical NLP resources including datasets, models, and more.

+176

+7.4%

2.5K

total stars

#429

projectnessie/nessie

Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.

+176

+14.1%

1.4K

total stars

Java

#430

dbt-labs/dbt-utils

Utility functions for dbt projects, a popular data transformation tool for data engineers.

+175

+11.4%

1.7K

total stars

Makefile

#431

tensorchord/pgvecto.rs

Scalable, low-latency vector search in Postgres.

+173

+8.7%

2.2K

total stars

Rust

#432

rxin/db-readings

A collection of readings and resources related to databases.

+171

+2.2%

8.0K

total stars

#433

apache/hbase

Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.

+171

+3.2%

5.6K

total stars

Java

#434

jadianes/spark-py-notebooks

Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.

+171

+11.5%

1.7K

total stars

Jupyter Notebook

#435

MakieOrg/Makie.jl

A powerful data visualization and plotting library for the Julia programming language.

+170

+6.6%

2.7K

total stars

Julia

#436

uwdata/mosaic

An extensible framework for linking databases and interactive views, focused on scalability and visualization.

+169

+15.6%

1.3K

total stars

TypeScript

#437

duckdb/dbt-duckdb

A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.

+169

+15.8%

1.2K

total stars

Python

#438

orium/rpds

A Rust library that provides persistent data structures for efficient and immutable data management.

+168

+11.2%

1.7K

total stars

Rust

#439

jldbc/pybaseball

A Python library for pulling current and historical baseball statistics, including Statcast, Baseball Reference, and FanGraphs data.

+168

+11.7%

1.6K

total stars

Python

#440

cbailes/awesome-deep-trading

A curated list of resources for machine learning-based algorithmic trading and quantitative finance.

+165

+9.8%

1.8K

total stars

#441

materialsproject/pymatgen

A robust Python library for materials analysis and computational materials science.

+165

+10.0%

1.8K

total stars

Python

#442

apache/ozone

Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

+165

+16.3%

1.2K

total stars

Java

#443

igraph/igraph

A powerful C library for analyzing complex networks and graph-based data structures.

+163

+9.1%

1.9K

total stars

#444

rsvp/fecon235

Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.

+163

+14.9%

1.3K

total stars

Jupyter Notebook

#445

apache/incubator-devlake

An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.

+161

+5.8%

2.9K

total stars

#446

duneanalytics/spellbook

A dbt project providing SQL views for Dune Analytics, a popular blockchain data analysis platform.

+161

+12.4%

1.5K

total stars

Python

#447

hail-is/hail

Cloud-native genomic dataframes and batch computing for bioinformatics and genetics research.

+161

+18.1%

1.1K

total stars

Python

#448

LongOnly/Quantitative-Notebooks

Educational notebooks on quantitative finance, algorithmic trading, financial modeling, and investment strategy.

+160

+14.0%

1.3K

total stars

Jupyter Notebook

#449

manami-project/anime-offline-database

This repository provides a comprehensive JSON dataset containing metadata on anime series, movies, and cross-references to various anime sites.

+160

+14.9%

1.2K

total stars

Makefile

#450

Alluxio/alluxio

Alluxio is an open-source data orchestration platform for analytics and machine learning workloads in the cloud.

+157

+2.2%

7.2K

total stars

Java
