Trending Projects

Discover the fastest-growing open source projects

Showing 501-550 of 897 trending projects

#501
materialsproject/pymatgen

A robust Python library for materials analysis and computational materials science.

+66
+3.8%
1.8K
total stars
#502
projectnessie/nessie

Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.

+66
+4.8%
1.4K
total stars
#503
erthink/libmdbx

High-performance, transactional key-value database engine for embedded systems and cryptocurrencies.

+66
+5.1%
1.4K
total stars
#504
koaning/drawdata

A Python library that allows developers to easily draw datasets within their notebooks.

+65
+4.1%
1.6K
total stars
#505
RUCAIBox/RecSysDatasets

A repository of public data sources for building and testing recommender systems.

+65
+5.9%
1.2K
total stars
#506
xerial/sqlite-jdbc

SQLite JDBC Driver - a Java library for accessing SQLite databases

+64
+2.0%
3.2K
total stars
#507
mozilla/mentat

A persistent, relational store inspired by Datomic and DataScript, written in Rust.

+64
+4.0%
1.7K
total stars
#508
datastacktv/data-engineer-roadmap

A roadmap for becoming a data engineer.

+63
+0.5%
12.7K
total stars
#509
jupyter/docker-stacks

Docker images containing Jupyter applications for data science and machine learning workflows.

+63
+0.8%
8.4K
total stars
#510
manami-project/anime-offline-database

This repository provides a comprehensive JSON dataset containing metadata on anime series, movies, and cross-references to various anime sites.

+63
+5.4%
1.2K
total stars
#511
robjhyndman/forecast

A time series forecasting library for R, providing a wide range of models and tools for accurate predictions.

+63
+5.7%
1.2K
total stars
#512
samapriya/awesome-gee-community-datasets

A community-driven catalog of geospatial datasets for use with Google Earth Engine.

+63
+6.1%
1.1K
total stars
#513
topling/toplingdb

ToplingDB is a cloud-native, distributed, and searchable key-value store built on RocksDB.

+63
+6.6%
1.0K
total stars
#514
apache/incubator-devlake

An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.

+61
+2.1%
2.9K
total stars
#515
dolthub/go-mysql-server

A MySQL-compatible relational database with a storage-agnostic query engine, implemented in Go.

+61
+2.4%
2.6K
total stars
#516
orium/rpds

A Rust library that provides persistent data structures for efficient and immutable data management.

+61
+3.8%
1.7K
total stars
#517
amundsen-io/amundsen

Amundsen is an open-source data discovery platform for improving productivity of data analysts and engineers.

+60
+1.3%
4.7K
total stars
#518
duckdb/dbt-duckdb

A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.

+60
+5.1%
1.2K
total stars
#519
ekzhu/datasketch

A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.

+59
+2.1%
2.9K
total stars
#520
RoaringBitmap/CRoaring

Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.

+59
+3.4%
1.8K
total stars
#521
JoinQuant/jqdatasdk

A Python package for easy access to financial market data in China for quantitative finance and FinTech applications.

+59
+5.0%
1.2K
total stars
#522
egbertbouman/youtube-comment-downloader

Simple script for downloading YouTube comments without using the YouTube API.

+59
+5.2%
1.2K
total stars
#523
big-data-europe/docker-hive

This is a Docker container for running Apache Hive, a data warehousing tool for big data analysis.

+59
+5.8%
1.1K
total stars
#524
SciRuby/sciruby

SciRuby provides a collection of tools for scientific computation in Ruby, catering to developers working with data and scientific applications.

+58
+6.1%
1.0K
total stars
#525
Hiflylabs/awesome-dbt

A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.

+57
+3.6%
1.6K
total stars
#526
gobuffalo/pop

A Go ORM and query builder for interacting with databases in Go applications.

+57
+4.0%
1.5K
total stars
#527
hermitdave/FrequencyWords

A frequency word list generator and processed files for text analysis and natural language processing.

+57
+4.1%
1.5K
total stars
#528
uwdata/mosaic

An extensible framework for linking databases and interactive views, focused on scalability and visualization.

+57
+4.8%
1.3K
total stars
#529
RoaringBitmap/RoaringBitmap

A high-performance compressed bitset library for Java used in Apache Spark, Netflix Atlas, and others.

+56
+1.5%
3.8K
total stars
#530
igrigorik/gharchive.org

An open-source project that captures the public GitHub timeline and makes it accessible for analysis.

+56
+1.9%
3.0K
total stars
#531
quantopian/empyrical

A Python library that provides common financial risk and performance metrics used in financial analysis.

+56
+4.0%
1.5K
total stars
#532
sentinelsat/sentinelsat

A Python library for searching and downloading Copernicus Sentinel satellite images for geographic data analysis.

+56
+5.9%
1.0K
total stars
#533
cube2222/octosql

OctoSQL is a powerful SQL query tool that allows you to join, analyze, and transform data from multiple databases and file formats.

+55
+1.1%
5.2K
total stars
#534
jrfiedler/causal_inference_python_code

Python code for the book Causal Inference by Miguel Hernán and James Robins.

+55
+4.3%
1.3K
total stars
#535
marcboeker/gmail-to-sqlite

Index your Gmail account to a SQLite DB and perform custom data analysis on your email.

+55
+4.7%
1.2K
total stars
#536
modin-project/modin

Modin: Scalable Pandas workflows with a single line of code change, enabling distributed data processing.

+54
+0.5%
10.4K
total stars
#537
PostgresApp/PostgresApp

An open-source macOS app that packages a full PostgreSQL server, providing an easy way to set up and manage a local PostgreSQL database.

+54
+0.7%
7.7K
total stars
#538
tonsky/datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS

+54
+0.9%
5.7K
total stars
#539
google/youtube-8m

Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.

+54
+2.3%
2.4K
total stars
#540
AlaSQL/alasql

AlaSQL is a JavaScript SQL database for browser and Node.js that handles both relational tables and nested JSON data.

+53
+0.7%
7.3K
total stars
#541
isar/isar

Extremely fast, easy to use, and fully async NoSQL database for Flutter apps

+53
+1.3%
4.0K
total stars
#542
infostreams/db

A command-line tool for version controlling database snapshots, allowing developers to save, restore, and archive database state.

+53
+4.3%
1.3K
total stars
#543
tidyverse/dplyr

dplyr is an R package that provides a grammar of data manipulation.

+52
+1.1%
5.0K
total stars
#544
wainshine/Chinese-Names-Corpus

A Chinese name corpus and generator for natural language processing and entity recognition.

+52
+1.2%
4.3K
total stars
#545
Wisser/Jailer

A Java-based database subsetting and relational data browsing tool for popular databases.

+52
+1.7%
3.1K
total stars
#546
huggingface/datatrove

A Python library that provides a set of customizable pipeline processing blocks for data processing tasks.

+52
+1.8%
2.9K
total stars
#547
GanjinZero/awesome_Chinese_medical_NLP

A curated collection of open-source Chinese medical NLP resources including datasets, models, and more.

+52
+2.1%
2.5K
total stars
#548
jadianes/spark-py-notebooks

Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.

+52
+3.2%
1.7K
total stars
#549
uhub/awesome-matlab

A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.

+52
+3.2%
1.7K
total stars
#550
pyjanitor-devs/pyjanitor

A Python library for cleaning and transforming data, inspired by the R package janitor.

+52
+3.6%
1.5K
total stars
