Showing 351-400 of 897 trending projects
Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.
An open-access book on scientific visualization using Python and Matplotlib for data-driven developers
An open-source, scalable, and fault-tolerant NoSQL database with a focus on reliability and offline-first design.
A Rust data structure for efficiently storing and accessing data in a sparse set.
A free and easy-to-use .NET library for reading and writing CSV and fixed-length data files.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark.
Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.
A collection of open data sets and tools for data science and machine learning tasks.
This GitHub repository contains notes and code for analyzing RNA-seq data using Python and Snakemake.
Fast, lightweight search backend alternative to Elasticsearch
A curated list of resources for machine learning-based algorithmic trading and quantitative finance.
A beginner-friendly Python toolkit for financial data extraction, analysis, and automation.
Sample database for SQL Server, Oracle, MySQL, PostgreSQL, SQLite, DB2
A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.
DBngin is a free tool for running and managing local database servers for development.
Mongoose is a MongoDB object modeling tool for Node.js and Deno, simplifying database interactions with schemas and models.
Efficient in-memory cache in Go for storing and retrieving large amounts of data.
Rill is a tool for transforming data sets into powerful dashboards using SQL, enabling BI-as-code.
WebAssembly version of the DuckDB analytical database, enabling fast in-browser analytics and SQL queries.
The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.
A C++ library for reading and writing large multi-dimensional arrays, useful for scientific and data-intensive applications.
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL databases seamlessly
A comprehensive index of medical imaging datasets for researchers and developers working in the medical imaging field.
Open-source repository for sharing code related to the MIMIC family of critical care databases.
A Python library that summarizes news articles by extracting the most important sentences.
A Python tool that automatically cleans and preprocesses data for analysis and machine learning.
An open-source C++ framework for fast and parallel map matching of GPS trajectories.
SQLDelight generates type-safe Kotlin APIs from SQL, enabling easier database management in Kotlin projects.
A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster).
Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.
A real-time Postgres data replication and streaming library built in Rust for building CDC pipelines.
Biopython is a set of Python modules that provide a wide range of functionality for bioinformatics, including DNA/RNA/protein sequence analysis, phylogenetics, and more.
Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.
SQL query builder for C# developers, supporting multiple databases and complex queries.
Fast in-memory cache library for Go with low GC overhead, optimized for a large number of entries.
A collection of Bible versions and cross-reference databases.
A database modeling language (DBML) that helps define and document database structures.
Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.
A versatile Python library for bioinformatics, providing data structures, algorithms, and educational resources.
ArangoDB is a multi-model database supporting documents, graphs, and key-values for high-performance applications.
A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.
A dataset of Borg cluster traces from Google, useful for researchers and developers in distributed systems and cloud infrastructure.
An open-source, community-driven platform for data-intensive scientific analysis and visualization.
Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.
A dataset for music analysis and research, with support for deep learning and reproducible research.
ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.
An offline IP database for developers to look up IP address geolocation information.
Distributed, massively parallel SQL query engine for big data analytics and timeseries workloads.
Azure/AzurePublicDataset is a repository of Microsoft Azure traces, with accompanying Jupyter notebooks.
Curated list of Python software and packages for scientific research in audio