Trending Projects

Discover the fastest growing open source projects

Showing 151-200 of 897 trending projects

#151
soedinglab/MMseqs2

MMseqs2 is an ultra-fast and sensitive bioinformatics tool for sequence search and clustering.

+4
+0.2%
2.0K
total stars
#152
zhu-xlab/GlobalBuildingAtlas

GlobalBuildingAtlas is an open, globally complete dataset of building polygons, heights, and LoD1 3D models.

+4
+0.2%
2.0K
total stars
#153
TuGraph-family/tugraph-db

TuGraph-DB is a high-performance graph database built for fast and efficient graph data processing.

+4
+0.2%
1.7K
total stars
#154
narwhals-dev/narwhals

Lightweight and extensible compatibility layer between popular dataframe libraries like Pandas, Dask, and PySpark.

+4
+0.3%
1.5K
total stars
#155
moshi4/pyCirclize

A Python library for creating circular data visualizations like Circos plots, chord diagrams, and radar charts.

+4
+0.4%
1.1K
total stars
#156
typeorm/typeorm

ORM for TypeScript and JavaScript with support for multiple databases and platforms.

+3
+0.0%
36.4K
total stars
#157
Automattic/mongoose

Mongoose is a MongoDB object modeling tool for Node.js and Deno, simplifying database interactions with schemas and models.

+3
+0.0%
27.5K
total stars
#158
valeriansaliou/sonic

Fast, lightweight search backend alternative to Elasticsearch

+3
+0.0%
21.2K
total stars
#159
apache/shardingsphere

Distributed SQL database middleware for sharding, scalability, and security

+3
+0.0%
20.7K
total stars
#160
rqlite/rqlite

A lightweight, fault-tolerant distributed database built on SQLite, designed for high availability.

+3
+0.0%
17.3K
total stars
#161
heibaiying/BigData-Notes

A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.

+3
+0.0%
16.9K
total stars
#162
questdb/questdb

QuestDB is a high-performance, open-source, time-series database for real-time analytics and financial applications.

+3
+0.0%
16.7K
total stars
#163
argoproj/argo-workflows

Argo Workflows is a powerful open-source workflow engine for Kubernetes, enabling complex data processing and machine learning pipelines.

+3
+0.0%
16.5K
total stars
#164
apache/doris

Apache Doris is a high-performance, unified analytics database for real-time data processing.

+3
+0.0%
15.1K
total stars
#165
dexie/Dexie.js

Dexie.js is a minimalistic IndexedDB wrapper that simplifies offline storage and database management in web applications.

+3
+0.0%
14.1K
total stars
#166
sql-js/sql.js

A JavaScript library that allows you to run SQLite on the web, enabling local database functionality for web apps.

+3
+0.0%
13.6K
total stars
#167
datahub-project/datahub

An open-source metadata platform for managing your data and AI stack across the enterprise.

+3
+0.0%
11.6K
total stars
#168
kedro-org/kedro

Kedro is a Python toolkit for building production-ready data science and machine learning pipelines.

+3
+0.0%
10.8K
total stars
#169
drivendataorg/cookiecutter-data-science

A flexible and standardized cookiecutter template for doing and sharing data science work in Python.

+3
+0.0%
9.7K
total stars
#170
spacejam/sled

A high-performance, concurrent, embedded key-value database written in Rust for vibe coders.

+3
+0.0%
8.9K
total stars
#171
pawelsalawa/sqlitestudio

A free, open-source SQLite database manager for multiple platforms.

+3
+0.1%
6.4K
total stars
#172
apache/flink-cdc

Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.

+3
+0.1%
6.4K
total stars
#173
OSGeo/gdal

GDAL is an open-source library for working with various geospatial data formats, useful for remote sensing and GIS applications.

+3
+0.1%
5.8K
total stars
#174
DotNetNext/SqlSugar

A powerful, multi-database ORM for .NET that supports a wide range of SQL databases and provides a seamless data access layer.

+3
+0.1%
5.8K
total stars
#175
alibaba/AliSQL

AliSQL is a MySQL branch originating from Alibaba Group, focused on high performance and scalability.

+3
+0.1%
5.8K
total stars
#176
youssefHosni/Data-Science-Interview-Questions-Answers

A curated list of data science interview questions and answers for developers.

+3
+0.1%
5.5K
total stars
#177
PyPortfolio/PyPortfolioOpt

A Python library for financial portfolio optimization, including classical efficient frontier and advanced techniques.

+3
+0.1%
5.5K
total stars
#178
tidyverse/dplyr

dplyr is an R package that provides a grammar of data manipulation.

+3
+0.1%
5.0K
total stars
#179
liam-hq/liam

Automatically generates beautiful and easy-to-read ER diagrams from your database.

+3
+0.1%
4.7K
total stars
#180
orioledb/orioledb

OrioleDB is a cloud-native PostgreSQL extension that solves performance and scalability challenges.

+3
+0.1%
4.0K
total stars
#181
briatte/awesome-network-analysis

A curated list of awesome resources for network analysis and visualization, with a focus on R tools.

+3
+0.1%
4.0K
total stars
#182
cozodb/cozo

A transactional, relational-graph-vector database that uses Datalog for query, designed for AI and ML use cases.

+3
+0.1%
3.9K
total stars
#183
awslabs/deequ

Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.

+3
+0.1%
3.6K
total stars
#184
dathere/qsv

Blazing-fast data wrangling toolkit for AI and data engineering workflows

+3
+0.1%
3.5K
total stars
#185
nullptrlabs/pgmodeler

An open-source data modeling tool designed for PostgreSQL, allowing developers to generate DDL commands visually.

+3
+0.1%
3.5K
total stars
#186
apache/arrow-rs

Official Rust implementation of the Apache Arrow data format for efficient data processing and storage.

+3
+0.1%
3.4K
total stars
#187
pydata/pandas-datareader

A Python library for extracting data from a wide range of internet sources into a pandas DataFrame.

+3
+0.1%
3.2K
total stars
#188
PeerDB-io/peerdb

Fast, cost-effective data replication tool from Postgres to data warehouses, queues, and storage

+3
+0.1%
3.0K
total stars
#189
timescale/pgvectorscale

A Postgres extension for high-performance vector search, complementing pgvector for scale.

+3
+0.1%
2.9K
total stars
#190
garden-co/jazz

A distributed database with CRDT sync, offline support, and end-to-end encryption for vibe coders.

+3
+0.1%
2.5K
total stars
#191
neilotoole/sq

sq is a Go-based data wrangling tool that supports a variety of data formats and databases.

+3
+0.1%
2.5K
total stars
#192
lerocha/chinook-database

Sample database for SQL Server, Oracle, MySQL, PostgreSQL, SQLite, DB2

+3
+0.1%
2.5K
total stars
#193
apache/hamilton

Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.

+3
+0.1%
2.4K
total stars
#194
apache/parquet-format

Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.

+3
+0.1%
2.3K
total stars
#195
duckdb/duckdb-wasm

WebAssembly version of the DuckDB analytical database, enabling fast in-browser analytics and SQL queries.

+3
+0.2%
1.9K
total stars
#196
skfolio/skfolio

A Python library for portfolio optimization using scikit-learn and convex optimization techniques.

+3
+0.2%
1.9K
total stars
#197
NateScarlet/holiday-cn

A Python tool for automatically scraping data on China's statutory holidays from government announcements.

+3
+0.2%
1.8K
total stars
#198
materialsproject/pymatgen

A robust Python library for materials analysis and computational materials science.

+3
+0.2%
1.8K
total stars
#199
RoaringBitmap/CRoaring

Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.

+3
+0.2%
1.8K
total stars
#200
GreenmaskIO/greenmask

A Go-based tool for database anonymization and synthetic data generation to help with security, QA, and data masking.

+3
+0.2%
1.6K
total stars