Showing 451-500 of 897 trending projects
Crafty statistical graphics library for the Julia programming language
Official code repository for the Genome Analysis Toolkit (GATK), a bioinformatics library for working with next-generation DNA sequencing data.
A Jupyter Notebook repository focused on time series analysis using Python.
An Internet-scale distributed database system written in C++, inspired by Google's Bigtable.
A Python library for financial applications.
A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.
Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.
Fluid is a distributed data abstraction and acceleration framework for Big Data and AI applications on the cloud.
A Python library for portfolio optimization using scikit-learn and convex optimization techniques.
A Python toolbox for gaining geometric insight into high-dimensional data.
A high-performance, memory-efficient Python data analysis library for handling large datasets.
A fast B+ tree indexing structure in C for efficient storage and retrieval of billions of key-value pairs.
A Python statistical package based on Pandas, providing various statistical methods and tests.
MongoDB data stream pipeline tools for managing real-time data synchronization and replication.
A Python package for time series classification, useful for developers working with time-series data.
Cloud-native, MySQL-compatible, AI-ready database with Git for Data, vector search, and full-text search capabilities.
A C# library that converts Excel spreadsheets to JSON objects and saves them to a text file.
A Python library for technical analysis indicators, with Chinese translation and documentation.
An interactive tutorial for the Dask distributed computing library, focused on data analysis and manipulation.
A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.
A Java ORM SQL query builder that supports popular databases like ClickHouse, Impala, MySQL, and Presto.
A Python tool for automatically scraping data on China's statutory holidays from government announcements.
A curated list of resources for machine learning-based algorithmic trading and quantitative finance.
Fast local PDF-to-Markdown/JSON converter for RAG pipelines. No GPU needed.
A Python library for processing and visualizing satellite imagery data.
A high-performance C++ linear algebra library focused on solvers, sparse matrices, and numerical computing.
A robust Python library for materials analysis and computational materials science.
A data repository for the Seaborn data visualization library in Python.
This repository provides comprehensive tutorials and resources for learning data science and machine learning using Python.
An educational OLAP database system built in Rust for learning and experimentation.
The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.
Apache Fluss is a real-time streaming storage platform built for big data analytics.
MongoShake is a universal data replication platform based on MongoDB's oplog, enabling redundant replication and active-active replication.
Entity Framework Core provider for PostgreSQL, enabling .NET developers to easily interact with PostgreSQL databases.
Poisson Surface Reconstruction is a C++ library for reconstructing surfaces from point cloud data.
A JavaScript statistical library that provides a wide range of statistical functions for data analysis.
A C# in-memory document database that uses source generators to embed typed, read-only data.
Highly available PostgreSQL cluster using Docker, focused on data infrastructure for developers.
This Python repository contains code examples and notes for data analysis and mining.
A simple Python interface for Graphviz, the popular open-source graph visualization software.
A large-scale entity and relation database supporting aggregation of properties for big data applications.
Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.
A columnar storage extension for Postgres built as a foreign data wrapper.
SchemaCrawler is a free database schema discovery and comprehension tool that supports various database management systems.
A Python module for extracting and mapping Chinese province, city, and district data.
A collection of R packages for data science, including tools for data manipulation, visualization, and modeling.
A data processing and ETL (Extract, Transform, Load) framework for Ruby developers.
A collection of solutions to Chinese data competitions, primarily using Python.
A MongoDB schema analysis tool that helps developers understand and optimize their NoSQL database.
Fast n-dimensional filtering and grouping of records, a powerful data manipulation library for JavaScript.