Showing 151-200 of 897 trending projects
The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.
An open-source framework for change data capture from various databases using Apache Kafka.
A Python tool for automatically scraping data on China's statutory holidays from government announcements.
NetworkX is a Python library for creating, manipulating, and studying the structure and dynamics of complex networks.
Windows builds of Redis, versions 6.0.20 through 8.0.0, the popular open-source in-memory data structure store.
A repository of data science interview questions and answers for developers.
A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.
A comprehensive English word database with translations, parts of speech, and definitions for developers.
High-performance distributed graph database for real-time use cases.
A Python library that provides a simple and unified interface for extracting text from any document format.
WebAssembly version of the DuckDB analytical database, enabling fast in-browser analytics and SQL queries.
A curated list of awesome PostgreSQL software, libraries, tools and resources.
A curated list of data engineering tools for software developers.
A comprehensive database of countries, states, and cities with data in multiple formats.
A Python library for scraping soccer data from various sources for sports analytics and data science.
This repository contains data on Chinese administrative divisions, including names, pinyin, and codes.
An open-source metadata platform for managing your data and AI stack across the enterprise.
A distributed SQL database built from scratch.
QuestDB is a high-performance, open-source, time-series database for real-time analytics and financial applications.
A comprehensive collection of data science cheatsheets for developers and data scientists.
An educational OLAP database system built in Rust for learning and experimentation.
Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service optimized for append-only workloads.
Draco is a C++ library for compressing and decompressing 3D geometric meshes and point clouds.
ORM for TypeScript and JavaScript with support for multiple databases and platforms.
FeatureBase is a fast analytical database built on bitmaps, perfect for ML and data-intensive applications.
A simple, fast, and embeddable key-value store written in Go that supports transactions and data structures.
A curated list of data science interview questions and answers for developers.
A .NET Standard library that provides strongly typed exceptions for Entity Framework Core across multiple database providers.
A Python library that allows developers to easily draw datasets within their notebooks.
Argo Workflows is a powerful open-source workflow engine for Kubernetes, enabling complex data processing and machine learning pipelines.
Tonbo is an embedded database for serverless and edge runtimes, optimized for offline-first and big data use cases.
db.py is a Python library that provides an easier way to interact with your databases.
OpenMapTiles is an open-source vector tile schema implementation for creating custom map tiles.
A library that allows developers to use LINQ to retrieve data from spreadsheets and CSV files.
Python wrapper for the TA-Lib technical analysis library, useful for financial pattern recognition.
A unified interface for distributed computing on Spark, Dask and Ray without any rewrites.
A compilation of R and Python code for data science and machine learning projects.
SciPy is a Python library for scientific and technical computing, providing a wide range of algorithms and tools.
A parallel processing library for Pandas that improves performance on multi-core CPUs.
Fast, embedded graph database with vector search and full-text search, compatible with Cypher queries.
A high-performance datastore for time series and tick data built on top of MongoDB.
A Python library for portfolio optimization using scikit-learn and convex optimization techniques.
Highly available PostgreSQL cluster using Docker, focused on data infrastructure for developers.
A comprehensive Python library for color science and color space conversions.
A data warehouse for COVID-19 time series data, useful for data analysis and visualization.
Apache Flink is a stream processing framework for real-time and batch data processing.
A robust Python library for materials analysis and computational materials science.
Fast, embeddable key-value database written in Go for building high-performance storage applications.
A high-performance NoSQL data store compatible with Apache Cassandra and Amazon DynamoDB.