Trending Projects

Discover the fastest growing open source projects

Showing 401-450 of 897 trending projects

#401
indradb/indradb

A Rust-based graph database for developers who need to store and query connected data.

0
0.0%
2.4K
total stars
#402
jayinai/data-science-question-answer

A collection of data science related questions and answers for developers.

0
0.0%
2.4K
total stars
#403
apache/hamilton

Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.

0
0.0%
2.4K
total stars
#404
malloydata/malloy

Malloy is an open-source language for describing data relationships and transformations.

0
0.0%
2.4K
total stars
#405
youngyangyang04/Skiplist-CPP

A lightweight key-value store built with C++ using a skiplist data structure.

0
0.0%
2.4K
total stars
#406
pydata/numexpr

A fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more.

0
0.0%
2.4K
total stars
#407
meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

0
0.0%
2.4K
total stars
#408
google/youtube-8m

Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.

0
0.0%
2.4K
total stars
#409
quarylabs/quary

Open-source BI platform for engineers to explore and model large-scale data pipelines.

0
0.0%
2.4K
total stars
#410
emirozer/fake2db

A Python library that generates fake data for custom test databases.

0
0.0%
2.4K
total stars
#411
PyWavelets/pywt

PyWavelets is a Python library for wavelet transform algorithms and techniques, useful for image and signal processing.

0
0.0%
2.3K
total stars
#412
orbitjs/orbit

A composable data framework for building ambitious web applications using TypeScript.

0
0.0%
2.3K
total stars
#413
VictoriaMetrics/fastcache

Fast in-memory cache library for Go with low GC overhead, optimized for a large number of entries.

0
0.0%
2.3K
total stars
#414
JasonKessler/scattertext

A Python library for creating beautiful visualizations of language differences across document types.

0
0.0%
2.3K
total stars
#415
chezou/tabula-py

A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.

0
0.0%
2.3K
total stars
#416
apache/parquet-format

Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.

0
0.0%
2.3K
total stars
#417
binance/binance-public-data

A Python library to access historical market data from the Binance cryptocurrency exchange.

0
0.0%
2.3K
total stars
#418
h5py/h5py

A Python library for accessing the HDF5 binary data format, a popular format for scientific and numerical data.

0
0.0%
2.2K
total stars
#419
man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.

0
0.0%
2.2K
total stars
#420
supabase/etl

A real-time Postgres data replication and streaming library built in Rust for building CDC pipelines.

0
0.0%
2.2K
total stars
#421
BlankerL/DXY-COVID-19-Data

A data warehouse for COVID-19 time series data, useful for data analysis and visualization.

0
0.0%
2.2K
total stars
#422
IndrajeetPatil/ggstatsplot

ggstatsplot is an R library that enhances ggplot2 visualizations with statistical analysis and hypothesis testing.

0
0.0%
2.2K
total stars
#423
tensorchord/pgvecto.rs

Scalable, low-latency vector search in Postgres.

0
0.0%
2.2K
total stars
#424
timeplus-io/proton

Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.

0
0.0%
2.2K
total stars
#425
ngaut/builddatabase

A distributed SQL database built from scratch.

0
0.0%
2.1K
total stars
#426
RJT1990/pyflux

Open source time series library for Python, useful for statistical analysis and modeling.

0
0.0%
2.1K
total stars
#427
fugue-project/fugue

A unified interface for distributed computing on Spark, Dask and Ray without any rewrites.

0
0.0%
2.1K
total stars
#428
Jon-Becker/prediction-market-analysis

Framework for collecting and analyzing prediction market data with comprehensive Polymarket/Kalshi datasets.

0
0.0%
2.1K
total stars
#429
konradhalas/dacite

A simple Python library for creating dataclasses from dictionaries.

0
0.0%
2.0K
total stars
#430
chris1610/pbpython

A collection of Python code, notebooks, and examples for practical business data analysis and visualization.

0
0.0%
2.0K
total stars
#431
moj-analytical-services/splink

Fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.

0
0.0%
2.0K
total stars
#432
soedinglab/MMseqs2

MMseqs2 is an ultra-fast and sensitive bioinformatics tool for sequence search and clustering.

0
0.0%
2.0K
total stars
#433
apache/bookkeeper

Apache BookKeeper is a scalable, fault tolerant and low latency storage service optimized for append-only workloads.

0
0.0%
2.0K
total stars
#434
zhu-xlab/GlobalBuildingAtlas

GlobalBuildingAtlas is an open global and complete dataset of building polygons, heights and LoD1 3D models.

0
0.0%
2.0K
total stars
#435
apache/datafusion-ballista

Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.

0
0.0%
2.0K
total stars
#436
alibaba/clusterdata

A dataset of cluster data collected from Alibaba's production clusters for cluster management research.

0
0.0%
2.0K
total stars
#437
mysql2sqlite/mysql2sqlite

Converts MySQL database dumps to SQLite3 compatible formats for easier migration and data portability.

0
0.0%
2.0K
total stars
#438
LastAncientOne/Stock_Analysis_For_Quant

A collection of stock analysis tools across various programming languages and platforms.

0
0.0%
2.0K
total stars
#439
shancarter/mr-data-converter

A JavaScript library that converts CSV and tab-delimited data to web-friendly formats like JSON and XML.

0
0.0%
2.0K
total stars
#440
bytewax/bytewax

Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.

0
0.0%
2.0K
total stars
#441
igraph/igraph

A powerful C library for analyzing complex networks and graph-based data structures.

0
0.0%
1.9K
total stars
#442
JuliaPlots/Plots.jl

Powerful plotting and data visualization library for the Julia programming language.

0
0.0%
1.9K
total stars
#443
zarr-developers/zarr-python

An efficient and compressed N-dimensional array library for Python, useful for data scientists and ML engineers.

0
0.0%
1.9K
total stars
#444
openacid/slim

A space-efficient trie data structure in Go with fast lookup performance.

0
0.0%
1.9K
total stars
#445
eveningkid/denodb

A versatile ORM for multiple databases including MySQL, SQLite, MariaDB, PostgreSQL, and MongoDB in Deno.

0
0.0%
1.9K
total stars
#446
duckdb/duckdb-wasm

WebAssembly version of the DuckDB analytical database, enabling fast in-browser analytics and SQL queries.

0
0.0%
1.9K
total stars
#447
brimdata/zui

Zui is a powerful desktop app for exploring and working with data, with support for CSV, JSON, and the Zed data format.

0
0.0%
1.9K
total stars
#448
mirage/irmin

Irmin is a distributed database that follows the same design principles as Git, allowing for distributed version control of data.

0
0.0%
1.9K
total stars
#449
enhancedformysql/The-Art-of-Problem-Solving-in-Software-Engineering_How-to-Make-MySQL-Better

This repository provides a comprehensive guide on optimizing MySQL performance and solving common database problems.

0
0.0%
1.9K
total stars
#450
fjall-rs/fjall

A high-performance, embeddable key-value storage engine written in Rust for developers building data-intensive applications.

0
0.0%
1.9K
total stars
