Trending Projects

Discover the fastest growing open source projects

Showing 301-350 of 897 trending projects

#301

GreenmaskIO/greenmask

A Go-based tool for database anonymization and synthetic data generation to help with security, QA, and data masking.

+122

+8.2%

1.6K

total stars

#302

mootdx/mootdx

A Python library for conveniently reading data from the Tongdaxin financial data platform.

+122

+9.7%

1.4K

total stars

Python

#303

databricks/LearningSparkV2

Companion code for the book Learning Spark (2nd edition), which teaches how to use Apache Spark for lightning-fast data analytics.

+122

+9.7%

1.4K

total stars

Scala

#304

juliasilge/tidytext

A library for text mining and natural language processing using tidy data principles in R.

+121

+11.2%

1.2K

total stars

#305

bububa/MongoHub-Mac

MongoHub is a native macOS MongoDB client that provides a GUI for managing and interacting with MongoDB databases.

+121

+11.4%

1.2K

total stars

Objective-C

#306

zonination/investing

An R project that analyzes historical investment returns for the overall stock market.

+119

+7.3%

1.7K

total stars

#307

mysql/mysql-connector-j

MySQL Connector/J is a JDBC driver that enables Java applications to connect to MySQL databases.

+119

+13.3%

1.0K

total stars

Java

#308

vesoft-inc/nebula

Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.

+118

+1.0%

12.1K

total stars

C++

#309

hannorein/rebound

An open-source N-body simulation library for astrophysics and planetary science.

+118

+12.8%

1.0K

total stars

#310

Rockyzsu/stock

A Python library for quantitative trading and stock analysis.

+117

+1.7%

7.2K

total stars

Python

#311

Tessil/robin-map

A fast and efficient C++ hash map and hash set implementation using robin hood hashing.

+117

+8.8%

1.4K

total stars

C++

#312

iskandr/fancyimpute

A Python library providing multivariate imputation and matrix completion algorithms.

+117

+10.1%

1.3K

total stars

Python

#313

machow/siuba

A Python library for using dplyr-like syntax with pandas and SQL databases.

+117

+11.0%

1.2K

total stars

Python

#314

8080labs/ppscore

A Python library that computes the Predictive Power Score (PPS), an asymmetric measure of how well one variable predicts another.

+117

+11.1%

1.2K

total stars

Python

#315

ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets and tutorials for data science and data analysis in Python.

+117

+11.2%

1.2K

total stars

Jupyter Notebook

#316

devrimgunduz/pagila

A PostgreSQL sample database for testing and learning SQL queries.

+117

+12.8%

1.0K

total stars

PLpgSQL

#317

jeremyevans/sequel

Sequel is a Ruby library that provides a powerful and flexible object-relational mapping (ORM) for databases.

+116

+2.3%

5.1K

total stars

Ruby

#318

isar/isar

An extremely fast, easy-to-use, and fully async NoSQL database for Flutter apps.

+116

+3.0%

4.0K

total stars

Dart

#319

matplotlib/AnatomyOfMatplotlib

The Anatomy of Matplotlib tutorial, prepared for the SciPy conference and focused on data visualization for scientific computing.

+116

+10.4%

1.2K

total stars

Jupyter Notebook

#320

PeerDB-io/peerdb

A fast, cost-effective data replication tool from Postgres to data warehouses, queues, and storage.

+115

+4.0%

3.0K

total stars

#321

NateScarlet/holiday-cn

A Python tool for automatically scraping data on China's statutory holidays from government announcements.

+115

+6.7%

1.8K

total stars

Python

#322

databendlabs/databend

A unified, cloud-native data warehouse platform for analytics, search, and AI, built on top of S3 storage.

+114

+1.3%

9.2K

total stars

Rust

#323

PyPortfolio/PyPortfolioOpt

A Python library for financial portfolio optimization, including classical efficient frontier and advanced techniques.

+114

+2.1%

5.5K

total stars

Jupyter Notebook

#324

neozhaoliang/pywonderland

A collection of Python programs that tour the wonderland of math through visualizations and algorithms.

+114

+2.8%

4.2K

total stars

Python

#325

EliotAndres/kaggle-past-solutions

A searchable compilation of Kaggle past solutions for data science and machine learning developers.

+113

+8.1%

1.5K

total stars

HTML

#326

statsmodels/statsmodels

Statsmodels is a Python library for statistical modeling and econometrics, providing tools for data analysis and prediction.

+112

+1.0%

11.3K

total stars

Python

#327

Image-Py/imagepy

A Python-based image processing framework with plugins for common image processing libraries.

+112

+9.0%

1.4K

total stars

Python

#328

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

+111

+2.2%

5.2K

total stars

#329

nalgeon/redka

A Redis-compatible database implemented in Go that stores its data in SQL backends such as SQLite and PostgreSQL.

+111

+2.5%

4.5K

total stars

#330

moj-analytical-services/splink

Fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.

+111

+5.9%

2.0K

total stars

Python

#331

li6185377/LKDBHelper-SQLite-ORM

An automatic database ORM library for Objective-C that provides thread-safe and deadlock-free database operations.

+111

+10.1%

1.2K

total stars

Objective-C

#332

GeospatialPython/pyshp

A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.

+111

+10.7%

1.1K

total stars

Python

#333

capitalone/DataProfiler

A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.

+110

+7.7%

1.5K

total stars

Python

#334

lukasmartinelli/pgfutter

A tool to easily import CSV and JSON data into PostgreSQL databases.

+110

+8.9%

1.3K

total stars

#335

ycjuan/kaggle-2014-criteo

C++ code for the 2014 Kaggle Criteo Display Advertising Challenge.

+110

+9.6%

1.3K

total stars

C++

#336

brettkromkamp/contextualise

Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.

+110

+11.2%

1.1K

total stars

Python

#337

LAStools/LAStools

This repository contains efficient tools for LiDAR processing, focused on working with point cloud data.

+110

+11.8%

1.0K

total stars

C++

#338

SheetJS/sheetjs

SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.

+109

+0.3%

36.2K

total stars

#339

elastic/kibana

Kibana is an open-source data visualization and management tool for Elasticsearch.

+109

+0.5%

21.0K

total stars

TypeScript

#340

delta-io/delta

An open-source storage framework for building a data lakehouse architecture with leading big data compute engines.

+109

+1.3%

8.6K

total stars

Scala

#341

Softmotions/ejdb

EJDB2 is an embeddable JSON database engine with a simple XPath-like query language (JQL) for C/C++ applications.

+109

+8.0%

1.5K

total stars

#342

calogica/dbt-expectations

A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.

+109

+9.9%

1.2K

total stars

Shell

#343

crazyhottommy/getting-started-with-genomics-tools-and-resources

A collection of Unix, R, and Python tools for bioinformatics and data science projects.

+108

+8.5%

1.4K

total stars

Shell

#344

easystats/easystats

A collection of R packages for statistical modeling, data analysis, and visualization.

+108

+10.4%

1.1K

total stars

#345

RoaringBitmap/CRoaring

Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.

+107

+6.4%

1.8K

total stars

#346

xitongsys/parquet-go

A pure Go library for reading and writing Parquet files, a columnar data format.

+107

+8.2%

1.4K

total stars

#347

joaoh82/rust_sqlite

A simple embedded database written in Rust and modeled after SQLite.

+107

+11.0%

1.1K

total stars

Rust

#348

andkret/Cookbook

A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.

+106

+0.7%

15.0K

total stars

Python

#349

bruin-data/bruin

A data platform for building data pipelines with SQL and Python and for ingesting data from various sources.

+106

+8.0%

1.4K

total stars

#350

neumino/thinky

An ORM for RethinkDB that provides an elegant and intuitive API for interacting with the database.

+106

+10.5%

1.1K

total stars

JavaScript

