Trending Projects

Discover the fastest growing open source projects

Showing 401-450 of 897 trending projects

#401
erikgrinaker/toydb

An educational distributed SQL database written in Rust.

+95
+1.3%
7.2K
total stars
#402
deanmalmgren/textract

A Python library that provides a simple and unified interface for extracting text from any document format.

+95
+2.2%
4.5K
total stars
#403
apache/auron

The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.

+95
+5.8%
1.7K
total stars
#404
felt/tippecanoe

Build vector tilesets from large collections of GeoJSON features.

+95
+7.0%
1.4K
total stars
#405
hazelcast/hazelcast

Hazelcast is a high-performance, distributed in-memory data platform for real-time insights and stream processing.

+94
+1.4%
6.6K
total stars
#406
sryza/spark-timeseries

A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.

+94
+8.5%
1.2K
total stars
#407
traildb/traildb

TrailDB is an efficient database for storing and querying series of events.

+94
+9.4%
1.1K
total stars
#408
armink/FlashDB

An ultra-lightweight database that supports key-value and time series data for embedded and IoT applications.

+93
+4.0%
2.4K
total stars
#409
soedinglab/MMseqs2

MMseqs2 is an ultra-fast and sensitive bioinformatics tool for sequence search and clustering.

+93
+4.9%
2.0K
total stars
#410
rsvp/fecon235

Notebooks for financial economics, including analyses of Federal Reserve data, GDP, inflation, and more.

+93
+8.0%
1.3K
total stars
#411
alibaba/clusterdata

Cluster trace datasets collected from Alibaba's production clusters for cluster management research.

+92
+4.9%
2.0K
total stars
#412
Factual/drake

A data workflow tool for data engineers and analysts, described as "Make for data".

+92
+6.6%
1.5K
total stars
#413
matplotlib/mplfinance

A Python library for financial data visualization using Matplotlib, focused on candlestick and OHLC charts.

+91
+2.2%
4.3K
total stars
#414
gedeck/practical-statistics-for-data-scientists

Code repository for the book Practical Statistics for Data Scientists.

+91
+2.9%
3.2K
total stars
#415
openmaptiles/openmaptiles

OpenMapTiles is an open-source vector tile schema implementation for creating custom map tiles.

+91
+3.1%
3.0K
total stars
#416
posit-dev/great-tables

An easy-to-use Python library for creating visually appealing display tables and summaries.

+91
+3.6%
2.6K
total stars
#417
Yimeng-Zhang/feature-engineering-and-feature-selection

A comprehensive guide to feature engineering and feature selection techniques in Python, with examples.

+91
+5.9%
1.6K
total stars
#418
aws-samples/aws-glue-samples

AWS Glue code samples for building data integration and ETL pipelines on AWS.

+91
+6.3%
1.5K
total stars
#419
dicedb/dicedb

DiceDB is an open-source, fast, reactive, in-memory database optimized for modern hardware.

+90
+0.8%
10.7K
total stars
#420
wangzhiwubigdata/God-Of-BigData

A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.

+90
+0.9%
10.4K
total stars
#421
ron-rs/ron

A Rust library for serializing and deserializing data in the Rusty Object Notation (RON) format.

+90
+2.4%
3.9K
total stars
#422
seandavi/awesome-single-cell

A curated list of software packages and data resources for single-cell analysis, including RNA-seq and ATAC-seq.

+90
+2.5%
3.7K
total stars
#423
mono/taglib-sharp

A C# library for reading and writing metadata in media files, useful for audio and video processing applications.

+90
+6.7%
1.4K
total stars
#424
apache/druid

Apache Druid is a high-performance, real-time analytics database for data-intensive applications.

+89
+0.6%
14.0K
total stars
#425
jvns/pandas-cookbook

Pandas Cookbook is a collection of recipes for using Python's powerful data analysis library, Pandas.

+89
+1.3%
7.0K
total stars
#426
apache/flink-cdc

Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.

+89
+1.4%
6.4K
total stars
#427
Visualize-ML/Book6_First-Course-in-Data-Science

A book on data science, covering topics from basic math to machine learning using Python and Jupyter Notebooks.

+89
+3.5%
2.6K
total stars
#428
malloydata/malloy

Malloy is an open-source language for describing data relationships and transformations.

+89
+3.8%
2.4K
total stars
#429
gee-community/geemap

A Python package for interactive geospatial analysis and visualization with Google Earth Engine.

+88
+2.3%
3.9K
total stars
#430
mycelial/mycelite

Mycelite is a SQLite extension that enables replication between SQLite instances.

+88
+8.7%
1.1K
total stars
#431
scylladb/gocqlx

A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.

+88
+9.4%
1.0K
total stars
#432
camelot-dev/camelot

A Python library for extracting tabular data from PDF files, useful for data processing and analysis.

+87
+2.5%
3.6K
total stars
#433
JifuZhao/DS-Take-Home

A collection of data science take-home challenges and solutions implemented in Jupyter Notebooks.

+87
+5.3%
1.7K
total stars
#434
has2k1/plotnine

A grammar of graphics library for creating highly customizable and publication-quality plots in Python.

+86
+1.9%
4.5K
total stars
#435
distributedio/titan

A distributed, Redis-compatible NoSQL database that provides high performance and scalability.

+85
+6.4%
1.4K
total stars
#436
apache/ozone

Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

+85
+7.8%
1.2K
total stars
#437
event-driven-io/Pongo

Pongo provides a MongoDB-compatible API on top of PostgreSQL, offering strong consistency benefits.

+83
+6.5%
1.4K
total stars
#438
GoogleCloudPlatform/bigquery-utils

Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.

+82
+6.8%
1.3K
total stars
#439
first20hours/google-10000-english

This repo contains a list of the 10,000 most common English words, useful for NLP and language modeling tasks.

+81
+1.9%
4.3K
total stars
#440
zarr-developers/zarr-python

A Python implementation of chunked, compressed, N-dimensional arrays for efficient storage of large datasets.

+81
+4.4%
1.9K
total stars
#441
axiomhq/hyperloglog

HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.

+81
+8.5%
1.0K
total stars
#442
stephencelis/SQLite.swift

A type-safe, Swift-language layer over SQLite3 for building database-backed Swift applications.

+80
+0.8%
10.1K
total stars
#443
orbitdb/orbitdb

OrbitDB is a peer-to-peer database for the decentralized web, enabling developers to build offline-first, distributed applications.

+80
+0.9%
8.7K
total stars
#444
xiangyuecn/AreaCity-JsSpider-StatsGov

Comprehensive collection of city and administrative region data for China, with features like CSV export, JS code generation, and web scraping.

+80
+1.3%
6.4K
total stars
#445
ujjwalkarn/DataSciencePython

A collection of common data analysis and machine learning tasks in Python.

+80
+1.4%
5.7K
total stars
#446
neilotoole/sq

sq is a Go-based data wrangling tool that supports a variety of data formats and databases.

+80
+3.4%
2.5K
total stars
#447
data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

+80
+4.4%
1.9K
total stars
#448
CliMA/Oceananigans.jl

A fast, flexible, ocean-flavored fluid dynamics library for climate and ocean modeling on CPUs and GPUs.

+80
+6.7%
1.3K
total stars
#449
qinwf/awesome-R

A curated list of awesome R packages, frameworks and software for data analysis and data science.

+79
+1.3%
6.4K
total stars
#450
apache/hbase

Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.

+79
+1.4%
5.6K
total stars
