Showing 701-750 of 897 trending projects
A Python driver for the ClickHouse database with native interface support.
LuxCore is a high-performance path-tracing render engine for realistic 3D graphics and visualization.
This is an astronomy visualization project that maps orbits of asteroids in the solar system.
A Rust data structure for efficiently storing and accessing data in a sparse set.
An ActiveRecord-like API for Core Data, Apple's object graph and persistence framework, for iOS development.
A PHP library that provides MySQL backup functionality, similar to the mysqldump CLI tool.
An exabyte-scale, multi-region distributed file system for developers building AI-powered applications.
A Python library for building business intelligence (BI) and OLAP solutions.
A Python library for reading, manipulating, and writing data in various spreadsheet file formats.
A Java-based framework for building agile DataOps pipelines using tools like Flink, DataX, and Chunjun with a web UI.
Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.
A comprehensive guide to technical references for data careers, including Python, machine learning, and data science.
PySpark-Tutorial provides examples of basic algorithms implemented in PySpark for big data analytics and data processing.
Trill is a single-node query processor for temporal or streaming data.
Apache Impala is a high-performance, open-source, SQL query engine that runs on Apache Hadoop and Apache Kudu.
Sample datasets for users of the Yelp Academic Dataset, useful for data analysis and machine learning.
A C++ repository for a 2014 Kaggle competition.
Embedded Go Database, a fast open-source NoSQL database solution for Go projects.
Percona Server is an enhanced, open-source version of the MySQL database management system.
Xodus is a transactional, schema-less embedded database used by JetBrains products like YouTrack and Hub.
A JavaScript library for working with multidimensional arrays, useful for data visualization and scientific computing.
A curated list of resources for graph databases and graph computing tools, useful for developers working with graph-based data.
An embedded time-series database written in Go for storing and querying metrics data.
The Anatomy of Matplotlib tutorial from the SciPy conference, focused on data visualization for scientific computing.
A color palette package in R inspired by works at the Metropolitan Museum of Art in New York.
This repository provides a comprehensive JSON dataset containing metadata on anime series, movies, and cross-references to various anime sites.
An automatic DBMS configuration tool for optimizing database performance.
Fiona is a Python library for reading and writing geographic data files, with support for CLI usage.
A scalable, SQL-based streaming analytics platform from Uber, built on top of Apache Flink.
A C++ library for processing data streams.
A portfolio of data science projects covering machine learning, NLP, and more for personal and academic use.
Index your Gmail account to a SQLite DB and perform custom data analysis on your email.
A public dataset of daily COVID-19 cases and deaths per country, useful for data analysis and visualization.
A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.
db.py is a Python library that provides an easier way to interact with your databases.
A high-performance, persistent, off-heap data structure written in Clojure for data-intensive applications.
An automatic database ORM library for Objective-C that provides thread-safe and deadlock-free database operations.
A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.
A PostgreSQL extension that adds HyperLogLog data structures as a native data type.
A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.
A high-level geospatial data visualization library for Python developers working with spatial data.
A high-performance logical replication extension for PostgreSQL that enables fast, cross-version database replication.
A library for text mining and natural language processing using tidy data principles in R.
A Java client library for connecting to the InfluxDB time series database.
A simple script for downloading YouTube comments without using the YouTube API.
A high-performance B-tree implementation for Go, useful for building database-like applications.
NFStream is a flexible network data analysis framework for network monitoring, security, and traffic classification.
A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.
A Python data analysis library optimized for humans instead of machines.
A scalable, distributed ETL framework for building data lake analytics pipelines.