Trending Projects

Discover the fastest-growing open source projects

Showing 401-450 of 897 trending projects

#401

erikgrinaker/toydb

An educational distributed SQL database written in Rust.

+95

+1.3%

7.2K

total stars

Rust

#402

deanmalmgren/textract

A Python library that provides a simple and unified interface for extracting text from any document format.

+95

+2.2%

4.5K

total stars

HTML

#403

apache/auron

The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.

+95

+5.8%

1.7K

total stars

Rust

#404

felt/tippecanoe

Build vector tilesets from large collections of GeoJSON features.

+95

+7.0%

1.4K

total stars

C++

#405

hazelcast/hazelcast

Hazelcast is a high-performance, distributed in-memory data platform for real-time insights and stream processing.

+94

+1.4%

6.6K

total stars

Java

#406

sryza/spark-timeseries

A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.

+94

+8.5%

1.2K

total stars

Scala

#407

traildb/traildb

TrailDB is an efficient database for storing and querying series of events.

+94

+9.4%

1.1K

total stars

#408

armink/FlashDB

An ultra-lightweight database that supports key-value and time series data for embedded and IoT applications.

+93

+4.0%

2.4K

total stars

#409

soedinglab/MMseqs2

MMseqs2 is an ultra-fast and sensitive bioinformatics tool for sequence search and clustering.

+93

+4.9%

2.0K

total stars

#410

rsvp/fecon235

Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.

+93

+8.0%

1.3K

total stars

Jupyter Notebook

#411

alibaba/clusterdata

A dataset of cluster data collected from Alibaba's production clusters for cluster management research.

+92

+4.9%

2.0K

total stars

Jupyter Notebook

#412

Factual/drake

A data workflow tool for data engineers and analysts, a kind of "make for data".

+92

+6.6%

1.5K

total stars

Clojure

#413

matplotlib/mplfinance

A Python library for financial data visualization using Matplotlib, focused on candlestick and OHLC charts.

+91

+2.2%

4.3K

total stars

Python

#414

gedeck/practical-statistics-for-data-scientists

A code repository for a book on practical statistics for data scientists.

+91

+2.9%

3.2K

total stars

Jupyter Notebook

#415

openmaptiles/openmaptiles

OpenMapTiles is an open-source vector tile schema implementation for creating custom map tiles.

+91

+3.1%

3.0K

total stars

PLpgSQL

#416

posit-dev/great-tables

A Python library for creating easy-to-use, visually appealing data tables and summaries.

+91

+3.6%

2.6K

total stars

Python

#417

Yimeng-Zhang/feature-engineering-and-feature-selection

A comprehensive guide to feature engineering and feature selection techniques in Python, with examples.

+91

+5.9%

1.6K

total stars

Jupyter Notebook

#418

aws-samples/aws-glue-samples

AWS Glue code samples for building data integration and ETL pipelines on AWS.

+91

+6.3%

1.5K

total stars

Python

#419

dicedb/dicedb

DiceDB is an open-source, fast, reactive, in-memory database optimized for modern hardware.

+90

+0.8%

10.7K

total stars

#420

wangzhiwubigdata/God-Of-BigData

A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.

+90

+0.9%

10.4K

total stars

#421

ron-rs/ron

A Rust library for serializing and deserializing data in the Rusty Object Notation (RON) format.

+90

+2.4%

3.9K

total stars

Rust

#422

seandavi/awesome-single-cell

A curated list of software packages and data resources for single-cell analysis, including RNA-seq and ATAC-seq.

+90

+2.5%

3.7K

total stars

#423

mono/taglib-sharp

A C# library for reading and writing metadata in media files, useful for audio and video processing applications.

+90

+6.7%

1.4K

total stars

#424

apache/druid

Apache Druid is a high-performance real-time analytics database for data-intensive applications.

+89

+0.6%

14.0K

total stars

Java

#425

jvns/pandas-cookbook

Pandas Cookbook is a collection of recipes for using Python's powerful data analysis library, Pandas.

+89

+1.3%

7.0K

total stars

Jupyter Notebook

#426

apache/flink-cdc

Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.

+89

+1.4%

6.4K

total stars

Java

#427

Visualize-ML/Book6_First-Course-in-Data-Science

A book on data science, covering topics from basic math to machine learning using Python and Jupyter Notebooks.

+89

+3.5%

2.6K

total stars

Jupyter Notebook

#428

malloydata/malloy

Malloy is an open-source language for describing data relationships and transformations.

+89

+3.8%

2.4K

total stars

TypeScript

#429

gee-community/geemap

A Python package for interactive geospatial analysis and visualization with Google Earth Engine.

+88

+2.3%

3.9K

total stars

Python

#430

mycelial/mycelite

Mycelite is a SQLite extension that enables replication between SQLite instances.

+88

+8.7%

1.1K

total stars

Rust

#431

scylladb/gocqlx

A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.

+88

+9.4%

1.0K

total stars

#432

camelot-dev/camelot

A Python library for extracting tabular data from PDF files, useful for data processing and analysis.

+87

+2.5%

3.6K

total stars

Python

#433

JifuZhao/DS-Take-Home

A collection of data science take-home challenges and solutions implemented in Jupyter Notebooks.

+87

+5.3%

1.7K

total stars

Jupyter Notebook

#434

has2k1/plotnine

A grammar of graphics library for creating highly customizable and publication-quality plots in Python.

+86

+1.9%

4.5K

total stars

Python

#435

distributedio/titan

A distributed, Redis-compatible NoSQL database that provides high performance and scalability.

+85

+6.4%

1.4K

total stars

#436

apache/ozone

Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

+85

+7.8%

1.2K

total stars

Java

#437

event-driven-io/Pongo

Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.

+83

+6.5%

1.4K

total stars

TypeScript

#438

GoogleCloudPlatform/bigquery-utils

Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.

+82

+6.8%

1.3K

total stars

Jupyter Notebook

#439

first20hours/google-10000-english

This repo contains a list of the 10,000 most common English words, useful for NLP and language modeling tasks.

+81

+1.9%

4.3K

total stars

#440

zarr-developers/zarr-python

An efficient and compressed N-dimensional array library for Python, useful for data scientists and ML engineers.

+81

+4.4%

1.9K

total stars

Python

#441

axiomhq/hyperloglog

HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.

+81

+8.5%

1.0K

total stars

#442

stephencelis/SQLite.swift

A type-safe, Swift-language layer over SQLite3 for building database-backed Swift applications.

+80

+0.8%

10.1K

total stars

Swift

#443

orbitdb/orbitdb

OrbitDB is a peer-to-peer database for the decentralized web, enabling developers to build offline-first, distributed applications.

+80

+0.9%

8.7K

total stars

JavaScript

#444

xiangyuecn/AreaCity-JsSpider-StatsGov

Comprehensive collection of city and administrative region data for China, with features like CSV export, JS code generation, and web scraping.

+80

+1.3%

6.4K

total stars

JavaScript

#445

ujjwalkarn/DataSciencePython

A collection of Python resources for common data analysis and machine learning tasks.

+80

+1.4%

5.7K

total stars

Python

#446

neilotoole/sq

sq is a Go-based data wrangling tool that supports a variety of data formats and databases.

+80

+3.4%

2.5K

total stars

#447

data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

+80

+4.4%

1.9K

total stars

CSS

#448

CliMA/Oceananigans.jl

A fast, flexible, ocean-flavored fluid dynamics library for climate and ocean modeling on CPUs and GPUs.

+80

+6.7%

1.3K

total stars

Julia

#449

qinwf/awesome-R

A curated list of awesome R packages, frameworks and software for data analysis and data science.

+79

+1.3%

6.4K

total stars

#450

apache/hbase

Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.

+79

+1.4%

5.6K

total stars

Java
