Trending Projects

Discover the fastest growing open source projects

Showing 301-350 of 897 trending projects

#301
GreenmaskIO/greenmask

A Go-based tool for database anonymization and synthetic data generation to help with security, QA, and data masking.

+122
+8.2%
1.6K
total stars
#302
mootdx/mootdx

A Python library for conveniently reading data from the Tongdaxin financial data platform.

+122
+9.7%
1.4K
total stars
#303
databricks/LearningSparkV2

Companion repository for the book Learning Spark, 2nd Edition, which teaches how to use Apache Spark for fast data analytics.

+122
+9.7%
1.4K
total stars
#304
juliasilge/tidytext

A library for text mining and natural language processing using tidy data principles in R.

+121
+11.2%
1.2K
total stars
#305
bububa/MongoHub-Mac

MongoHub is a native macOS MongoDB client that provides a GUI for managing and interacting with MongoDB databases.

+121
+11.4%
1.2K
total stars
#306
zonination/investing

This R library provides historical investment returns analysis for the overall stock market.

+119
+7.3%
1.7K
total stars
#307
mysql/mysql-connector-j

MySQL Connector/J is a JDBC driver that enables Java applications to connect to MySQL databases.

+119
+13.3%
1.0K
total stars
#308
vesoft-inc/nebula

Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.

+118
+1.0%
12.1K
total stars
#309
hannorein/rebound

An open-source N-body simulation library for astrophysics and planetary science.

+118
+12.8%
1.0K
total stars
#310
Rockyzsu/stock

A Python library for quantitative trading and stock analysis.

+117
+1.7%
7.2K
total stars
#311
Tessil/robin-map

A fast and efficient C++ hash map and hash set implementation using robin hood hashing.

+117
+8.8%
1.4K
total stars
#312
iskandr/fancyimpute

A Python library providing multivariate imputation and matrix completion algorithms.

+117
+10.1%
1.3K
total stars
#313
machow/siuba

A Python library for using dplyr-like syntax with pandas and SQL databases.

+117
+11.0%
1.2K
total stars
#314
8080labs/ppscore

A Python library that provides a Predictive Power Score (PPS) to measure the predictive power between variables.

+117
+11.1%
1.2K
total stars
#315
ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets and tutorials for data science and data analysis in Python.

+117
+11.2%
1.2K
total stars
#316
devrimgunduz/pagila

A PostgreSQL sample database for testing and learning SQL queries.

+117
+12.8%
1.0K
total stars
#317
jeremyevans/sequel

Sequel is a Ruby library that provides a powerful and flexible object-relational mapping (ORM) for databases.

+116
+2.3%
5.1K
total stars
#318
isar/isar

An extremely fast, easy-to-use, and fully async NoSQL database for Flutter apps.

+116
+3.0%
4.0K
total stars
#319
matplotlib/AnatomyOfMatplotlib

Anatomy of Matplotlib tutorial for SciPy conference, focused on data visualization for scientific computing.

+116
+10.4%
1.2K
total stars
#320
PeerDB-io/peerdb

A fast, cost-effective data replication tool from Postgres to data warehouses, queues, and storage.

+115
+4.0%
3.0K
total stars
#321
NateScarlet/holiday-cn

A Python tool for automatically scraping data on China's statutory holidays from government announcements.

+115
+6.7%
1.8K
total stars
#322
databendlabs/databend

A unified cloud-native data warehouse platform for analytics, search, and AI, built on top of S3 storage.

+114
+1.3%
9.2K
total stars
#323
PyPortfolio/PyPortfolioOpt

A Python library for financial portfolio optimization, including classical efficient frontier and advanced techniques.

+114
+2.1%
5.5K
total stars
#324
neozhaoliang/pywonderland

A Python library that provides a tour of the wonderland of math with visualizations and algorithms.

+114
+2.8%
4.2K
total stars
#325
EliotAndres/kaggle-past-solutions

A searchable compilation of Kaggle past solutions for data science and machine learning developers.

+113
+8.1%
1.5K
total stars
#326
statsmodels/statsmodels

Statsmodels is a Python library for statistical modeling and econometrics, providing tools for data analysis and prediction.

+112
+1.0%
11.3K
total stars
#327
Image-Py/imagepy

A Python-based image processing framework with plugins for common image processing libraries.

+112
+9.0%
1.4K
total stars
#328
treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

+111
+2.2%
5.2K
total stars
#329
nalgeon/redka

A Redis-compatible database implemented in Go, supporting SQL and multiple backends like PostgreSQL and SQLite.

+111
+2.5%
4.5K
total stars
#330
moj-analytical-services/splink

A library for fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.

+111
+5.9%
2.0K
total stars
#331
li6185377/LKDBHelper-SQLite-ORM

An automatic database ORM library for Objective-C that provides thread-safe and deadlock-free database operations.

+111
+10.1%
1.2K
total stars
#332
GeospatialPython/pyshp

A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.

+111
+10.7%
1.1K
total stars
#333
capitalone/DataProfiler

A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.

+110
+7.7%
1.5K
total stars
#334
lukasmartinelli/pgfutter

A tool to easily import CSV and JSON data into PostgreSQL databases.

+110
+8.9%
1.3K
total stars
#335
ycjuan/kaggle-2014-criteo

A C++ repository containing code for a 2014 Kaggle competition.

+110
+9.6%
1.3K
total stars
#336
brettkromkamp/contextualise

Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.

+110
+11.2%
1.1K
total stars
#337
LAStools/LAStools

This repository contains efficient tools for LiDAR processing, focused on working with point cloud data.

+110
+11.8%
1.0K
total stars
#338
SheetJS/sheetjs

SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.

+109
+0.3%
36.2K
total stars
#339
elastic/kibana

Kibana is an open-source data visualization and management tool for Elasticsearch.

+109
+0.5%
21.0K
total stars
#340
delta-io/delta

An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.

+109
+1.3%
8.6K
total stars
#341
Softmotions/ejdb

EJDB2 is an embeddable JSON database engine with a simple XPath-like query language (JQL) for C/C++ applications.

+109
+8.0%
1.5K
total stars
#342
calogica/dbt-expectations

A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.

+109
+9.9%
1.2K
total stars
#343
crazyhottommy/getting-started-with-genomics-tools-and-resources

A collection of Unix, R, and Python tools for bioinformatics and data science projects.

+108
+8.5%
1.4K
total stars
#344
easystats/easystats

An R project focused on providing high-performance statistical models, data analysis, and visualization tools.

+108
+10.4%
1.1K
total stars
#345
RoaringBitmap/CRoaring

Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.

+107
+6.4%
1.8K
total stars
#346
xitongsys/parquet-go

A pure Go library for reading and writing Parquet files, a columnar data format.

+107
+8.2%
1.4K
total stars
#347
joaoh82/rust_sqlite

A simple embedded database library in Rust modeled after SQLite, useful for Rust projects.

+107
+11.0%
1.1K
total stars
#348
andkret/Cookbook

A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.

+106
+0.7%
15.0K
total stars
#349
bruin-data/bruin

A data platform that enables building data pipelines with SQL, Python, and ingesting from various sources.

+106
+8.0%
1.4K
total stars
#350
neumino/thinky

An ORM for RethinkDB that provides an elegant and intuitive API for interacting with the database.

+106
+10.5%
1.1K
total stars