Trending Projects

Discover the fastest growing open source projects

Showing 401-450 of 897 trending projects

#401
first20hours/google-10000-english

This repo contains a list of the 10,000 most common English words, useful for NLP and language modeling tasks.

+199
+4.8%
4.3K
total stars
#402
briatte/awesome-network-analysis

A curated list of awesome resources for network analysis and visualization, with a focus on R tools.

+199
+5.3%
4.0K
total stars
#403
bashtage/arch

A comprehensive Python library for modeling and forecasting financial time series data using ARCH models.

+199
+15.4%
1.5K
total stars
#404
Netflix/maestro

Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.

+198
+5.6%
3.7K
total stars
#405
DQinYuan/chinese_province_city_area_mapper

A Python module for extracting and mapping Chinese province, city, and district data.

+198
+12.5%
1.8K
total stars
#406
rethinkdb/rethinkdb

Realtime NoSQL database for web apps

+195
+0.7%
27.0K
total stars
#407
alibaba/clusterdata

A dataset of cluster data collected from Alibaba's production clusters for cluster management research.

+195
+10.9%
2.0K
total stars
#408
zarr-developers/zarr-python

An efficient and compressed N-dimensional array library for Python, useful for data scientists and ML engineers.

+195
+11.2%
1.9K
total stars
#409
ideawu/ssdb

SSDB is a fast NoSQL database, an alternative to Redis, with support for leveldb and rocksdb backends.

+194
+2.3%
8.5K
total stars
#410
colour-science/colour

A comprehensive Python library for color science and color space conversions.

+194
+8.3%
2.5K
total stars
#411
google/youtube-8m

Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.

+194
+8.9%
2.4K
total stars
#412
PostgresApp/PostgresApp

A full-featured PostgreSQL installation packaged as a standard macOS app, providing an easy way to set up and run a local PostgreSQL server.

+193
+2.6%
7.7K
total stars
#413
bytewax/bytewax

Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.

+191
+10.8%
2.0K
total stars
#414
cyang-kth/fmm

An open-source C++ framework for fast and parallel map matching of GPS trajectories.

+191
+22.9%
1.0K
total stars
#415
ravendb/ravendb

A highly scalable, distributed, document-oriented NoSQL database with full-text search, spatial, and time-series support.

+189
+5.1%
3.9K
total stars
#416
pydata/numexpr

A fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more.

+189
+8.5%
2.4K
total stars
#417
CliMA/Oceananigans.jl

A fast, flexible, ocean-flavored fluid dynamics library for climate and ocean modeling on CPUs and GPUs.

+189
+17.3%
1.3K
total stars
#418
mirage/irmin

Irmin is a distributed database that follows the same design principles as Git, allowing for distributed version control of data.

+188
+10.8%
1.9K
total stars
#419
dbt-labs/metricflow

MetricFlow allows developers to define, build, and maintain metrics in code for business intelligence and analytics.

+187
+14.2%
1.5K
total stars
#420
amphi-ai/amphi-etl

A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.

+185
+15.9%
1.4K
total stars
#421
JifuZhao/DS-Take-Home

A collection of data science take-home challenges and solutions implemented in Jupyter Notebooks.

+183
+11.9%
1.7K
total stars
#422
aws-samples/aws-glue-samples

AWS Glue code samples for building data integration and ETL pipelines on AWS.

+182
+13.4%
1.5K
total stars
#423
scylladb/gocqlx

A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.

+182
+21.7%
1.0K
total stars
#424
lk-geimfari/mimesis

Mimesis is a fast Python library for generating fake data in multiple languages for testing and development purposes.

+179
+3.9%
4.8K
total stars
#425
Yimeng-Zhang/feature-engineering-and-feature-selection

A comprehensive guide to feature engineering and feature selection techniques in Python, with examples.

+178
+12.2%
1.6K
total stars
#426
ClickHouse/clickhouse-go

A Go driver for the ClickHouse analytics database, enabling fast and efficient data processing.

+177
+5.8%
3.3K
total stars
#427
xtensor-stack/xtensor

A C++ library for multidimensional array operations with broadcasting and lazy computing.

+176
+5.0%
3.7K
total stars
#428
GanjinZero/awesome_Chinese_medical_NLP

A curated collection of open-source Chinese medical NLP resources including datasets, models, and more.

+176
+7.4%
2.5K
total stars
#429
projectnessie/nessie

Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.

+176
+14.1%
1.4K
total stars
#430
dbt-labs/dbt-utils

Utility functions for dbt projects, a popular data transformation tool for data engineers.

+175
+11.4%
1.7K
total stars
#431
tensorchord/pgvecto.rs

Scalable, low-latency vector search in Postgres.

+173
+8.7%
2.2K
total stars
#432
rxin/db-readings

A collection of readings and resources related to databases.

+171
+2.2%
8.0K
total stars
#433
apache/hbase

Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.

+171
+3.2%
5.6K
total stars
#434
jadianes/spark-py-notebooks

Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.

+171
+11.5%
1.7K
total stars
#435
MakieOrg/Makie.jl

A powerful data visualization and plotting library for the Julia programming language.

+170
+6.6%
2.7K
total stars
#436
uwdata/mosaic

An extensible framework for linking databases and interactive views, focused on scalability and visualization.

+169
+15.6%
1.3K
total stars
#437
duckdb/dbt-duckdb

A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.

+169
+15.8%
1.2K
total stars
#438
orium/rpds

A Rust library that provides persistent data structures for efficient and immutable data management.

+168
+11.2%
1.7K
total stars
#439
jldbc/pybaseball

A Python library for pulling current and historical baseball statistics, including Statcast, Baseball Reference, and FanGraphs data.

+168
+11.7%
1.6K
total stars
#440
cbailes/awesome-deep-trading

A curated list of resources for machine learning-based algorithmic trading and quantitative finance.

+165
+9.8%
1.8K
total stars
#441
materialsproject/pymatgen

A robust Python library for materials analysis and computational materials science.

+165
+10.0%
1.8K
total stars
#442
apache/ozone

Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

+165
+16.3%
1.2K
total stars
#443
igraph/igraph

A powerful C library for analyzing complex networks and graph-based data structures.

+163
+9.1%
1.9K
total stars
#444
rsvp/fecon235

Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.

+163
+14.9%
1.3K
total stars
#445
apache/incubator-devlake

An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.

+161
+5.8%
2.9K
total stars
#446
duneanalytics/spellbook

A dbt project providing SQL models and views for Dune Analytics, a popular blockchain data analysis platform.

+161
+12.4%
1.5K
total stars
#447
hail-is/hail

Cloud-native genomic dataframes and batch computing for bioinformatics and genetics research.

+161
+18.1%
1.1K
total stars
#448
LongOnly/Quantitative-Notebooks

Educational notebooks on quantitative finance, algorithmic trading, financial modeling, and investment strategy.

+160
+14.0%
1.3K
total stars
#449
manami-project/anime-offline-database

This repository provides a comprehensive JSON dataset containing metadata on anime series, movies, and cross-references to various anime sites.

+160
+14.9%
1.2K
total stars
#450
Alluxio/alluxio

Alluxio is an open-source data orchestration platform for analytics and machine learning workloads in the cloud.

+157
+2.2%
7.2K
total stars