Trending Projects

Discover the fastest growing open source projects

Showing 601-650 of 897 trending projects

#601
axiomhq/hyperloglog

HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.

+40
+4.0%
1.0K
total stars
#602
Automattic/mongoose

Mongoose is a MongoDB object modeling tool for Node.js and Deno, simplifying database interactions with schemas and models.

+39
+0.1%
27.5K
total stars
#603
apache/pinot

Apache Pinot is a real-time distributed OLAP datastore for fast querying of large datasets.

+39
+0.7%
6.0K
total stars
#604
benbjohnson/thesecretlivesofdata

A JavaScript library for visualizing and understanding complex data structures.

+39
+1.1%
3.6K
total stars
#605
sfirke/janitor

A collection of simple tools for data cleaning and wrangling in R for data science tasks.

+39
+2.8%
1.4K
total stars
#606
JuliaStats/Distributions.jl

A comprehensive Julia library for probability distributions and related statistical functions.

+39
+3.4%
1.2K
total stars
#607
lvgalvao/data-engineering-roadmap

Comprehensive roadmap for data engineering and AI development in Python.

+39
+3.6%
1.1K
total stars
#608
oetiker/rrdtool-1.x

RRDtool is a time-series database system for efficiently storing and graphing data.

+39
+3.7%
1.1K
total stars
#609
briatte/awesome-network-analysis

A curated list of awesome resources for network analysis and visualization, with a focus on R tools.

+38
+1.0%
4.0K
total stars
#610
erthink/libmdbx

High-performance, transactional key-value database engine for embedded systems and cryptocurrencies.

+38
+2.9%
1.4K
total stars
#611
x2bool/xlite

A Rust library that enables querying Excel spreadsheets using SQLite, making data extraction and analysis more efficient.

+38
+3.0%
1.3K
total stars
#612
supermarin/ObjectiveRecord

ActiveRecord-like API for CoreData, a powerful object-relational mapping (ORM) for iOS development.

+38
+3.0%
1.3K
total stars
#613
gaarason/database-all

Eloquent ORM for Java 8, 11, 17, 21, 23 and Spring Boot 2.x, 3.x.

+38
+3.6%
1.1K
total stars
#614
JoshClose/CsvHelper

A C# library for reading and writing CSV files, with support for a wide range of CSV file formats.

+37
+0.7%
5.2K
total stars
#615
meta-pytorch/data

A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.

+37
+3.0%
1.2K
total stars
#616
RUCAIBox/RecSysDatasets

A repository of public data sources for building and testing recommender systems.

+37
+3.3%
1.2K
total stars
#617
orbitdb/orbitdb

OrbitDB is a peer-to-peer database for the decentralized web, enabling developers to build offline-first, distributed applications.

+36
+0.4%
8.7K
total stars
#618
bytewax/bytewax

Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.

+36
+1.9%
2.0K
total stars
#619
edyoda/data-science-complete-tutorial

This repository provides comprehensive tutorials and resources for learning data science and machine learning using Python.

+36
+2.0%
1.8K
total stars
#620
orbitinghail/graft

Graft is an open-source transactional storage engine optimized for lazy, partial, and strongly consistent replication, ideal for edge, offline-first, and distributed applications.

+36
+2.6%
1.4K
total stars
#621
event-driven-io/Pongo

Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.

+36
+2.7%
1.4K
total stars
#622
infostreams/db

A command-line tool for version controlling database snapshots, allowing developers to save, restore, and archive database state.

+36
+2.9%
1.3K
total stars
#623
jtv/libpqxx

The official C++ client API for PostgreSQL, providing a high-level interface for interacting with PostgreSQL databases.

+36
+2.9%
1.3K
total stars
#624
apache/hive

Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.

+35
+0.6%
6.0K
total stars
#625
owid/covid-19-data

COVID-19 data repository for developers, providing daily updated case, death, and testing information.

+35
+0.6%
5.7K
total stars
#626
xerial/sqlite-jdbc

SQLite JDBC Driver - a Java library for accessing SQLite databases

+35
+1.1%
3.2K
total stars
#627
dolthub/go-mysql-server

A MySQL-compatible relational database with a storage agnostic query engine, implemented in Go.

+35
+1.4%
2.6K
total stars
#628
XTXMarkets/ternfs

An exabyte-scale, multi-region distributed file system for developers building AI-powered applications.

+35
+2.8%
1.3K
total stars
#629
duckdb/dbt-duckdb

A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.

+35
+2.9%
1.2K
total stars
#630
youngwookim/awesome-hadoop

A curated list of resources for the Hadoop ecosystem.

+35
+3.2%
1.1K
total stars
#631
tonsky/datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS

+34
+0.6%
5.7K
total stars
#632
ydb-platform/ydb

An open-source distributed SQL database with high availability, scalability, and ACID transactions.

+34
+0.7%
4.7K
total stars
#633
apache/avro

Apache Avro is a data serialization system for efficient storage and transmission of structured data.

+34
+1.1%
3.2K
total stars
#634
igrigorik/gharchive.org

An open-source project that captures the public GitHub timeline and makes it accessible for analysis.

+34
+1.2%
3.0K
total stars
#635
koaning/drawdata

A Python library that allows developers to easily draw datasets within their notebooks.

+34
+2.1%
1.6K
total stars
#636
projectnessie/nessie

Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.

+34
+2.4%
1.4K
total stars
#637
submato/xhscrawl

A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.

+34
+2.8%
1.3K
total stars
#638
scratchdata/scratchdata

A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.

+34
+3.1%
1.1K
total stars
#639
moby/datakit

Connect processes into powerful data pipelines with a simple git-like filesystem interface

+34
+3.2%
1.1K
total stars
#640
crazyhottommy/RNA-seq-analysis

This GitHub repository contains notes and code for analyzing RNA-seq data using Python and Snakemake.

+34
+3.3%
1.1K
total stars
#641
cube2222/octosql

OctoSQL is a powerful SQL query tool that allows you to join, analyze, and transform data from multiple databases and file formats.

+33
+0.6%
5.2K
total stars
#642
paradigmxyz/cryo

cryo is a Rust library for extracting blockchain data to parquet, CSV, JSON, or Python dataframes.

+33
+2.2%
1.5K
total stars
#643
hermitdave/FrequencyWords

A frequency word list generator and processed files for text analysis and natural language processing.

+33
+2.3%
1.5K
total stars
#644
PumpkinDB/PumpkinDB

PumpkinDB is an immutable, ordered key-value database engine written in Rust.

+33
+2.4%
1.4K
total stars
#645
has2k1/plotnine

A grammar of graphics library for creating highly customizable and publication-quality plots in Python.

+32
+0.7%
4.5K
total stars
#646
camelot-dev/camelot

A Python library for extracting tabular data from PDF files, useful for data processing and analysis.

+32
+0.9%
3.6K
total stars
#647
ekzhu/datasketch

A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.

+32
+1.1%
2.9K
total stars
#648
AlexTheAnalyst/PortfolioProjects

This repository contains a collection of portfolio projects for a data analyst.

+32
+2.3%
1.4K
total stars
#649
bashtage/linearmodels

This Python library provides additional linear models for statistical modeling and analysis.

+32
+3.2%
1.0K
total stars
#650
AlaSQL/alasql

AlaSQL is a JavaScript SQL database for browser and Node.js that handles both relational tables and nested JSON data.

+31
+0.4%
7.3K
total stars