Showing 251-300 of 897 trending projects
AI-native database unifying vector, text, and structured data for hybrid search and in-database AI workflows.
A Python library for efficient, compressed N-dimensional arrays, aimed at data scientists and ML engineers.
A Go-based tool for database anonymization, data masking, and synthetic data generation, useful for security and QA work.
A Python package for easy access to Chinese financial market data, for quantitative finance and FinTech applications.
An efficient in-memory cache for Go, designed to store and retrieve large amounts of data.
A comprehensive dataset of Chinese cities and administrative regions, with CSV export, JS code generation, and web scraping.
A Rust library for serializing and deserializing data in the Rusty Object Notation (RON) format.
A powerful data visualization and plotting library for the Julia programming language.
MMseqs2 is an ultra-fast and sensitive bioinformatics tool for sequence search and clustering.
Apache DataFusion Ballista is a distributed query engine for big data analysis, built in Rust on Apache Arrow.
A collection of stock analysis tools across various programming languages and platforms.
A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.
LibRaw is a C++ library for reading RAW image files from digital cameras.
Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.
An educational distributed SQL database written in Rust.
A technical analysis library built on pandas and NumPy for financial data analysis and trading strategies.
A curated list of awesome resources for network analysis and visualization, with a focus on R tools.
A Python package for interactive geospatial analysis and visualization with Google Earth Engine.
DBML (Database Markup Language), a language for defining and documenting database structures.
Auron is an accelerator framework that uses vectorized execution to speed up distributed computing on big data platforms such as Apache Spark.
A Python library for pulling current and historical baseball statistics, including Statcast, Baseball Reference, and FanGraphs data.
A Python library that adds support for pgvector, the vector similarity search extension for PostgreSQL, enabling efficient vector storage and search.
An open-source data pipeline tool, billed as the fastest of its kind, for replicating databases into data lakes in Apache Iceberg format.
A scalable, reliable distributed storage system optimized for data analytics and object store workloads.
A high-performance Python library for working with large tabular datasets, offering efficient data manipulation and visualization.
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Apache Avro is a data serialization system for efficient storage and transmission of structured data.
A dataset of cluster data collected from Alibaba's production clusters for cluster management research.
A collection of SQL data analysis and visualization projects using various tools and databases.
A Python library that allows developers to easily draw datasets within their notebooks.
A tool for building vector tilesets from large collections of GeoJSON features.
A comprehensive JSON dataset of metadata on anime series and movies, with cross-references to various anime sites.
Azure/AzurePublicDataset is a repository of Microsoft Azure traces, accompanied by Jupyter notebooks.
A type-safe Swift layer over SQLite3 for building database-backed Swift applications.
Apache Pinot is a real-time distributed OLAP datastore for fast querying of large datasets.
A Python library for common data analysis and machine learning tasks.
A C++ DataFrame library for statistical, financial, and machine learning analysis.
An in-process OLAP SQL engine powered by ClickHouse, enabling fast and efficient data analysis.
A comprehensive Python library for color science and color space conversions.
A Rust library for quantitative finance, including tools for machine learning, option pricing, and trading.
A simple, fast, and versatile Datalog database written in Clojure.
A comprehensive roadmap for data engineering and AI development in Python.
A DICOM to NIfTI converter for medical imaging research and neuroimaging applications.
Mongoose is a MongoDB object modeling tool for Node.js and Deno, simplifying database interactions with schemas and models.
Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.
Apache Druid is a high-performance real-time analytics database designed for data-intensive applications.
KurrentDB is an event-native database designed for modern software and event-driven architectures.
A lightweight data processing framework built on DuckDB and 3FS.
An open-source distributed SQL database with high availability, scalability, and ACID transactions.
A high-performance compressed bitset library for Java, used by Apache Spark, Netflix Atlas, and other projects.