A comprehensive guide to technical references for data careers, including Python, machine learning, and data science.
A fast, flexible, ocean-flavored fluid dynamics library for climate and ocean modeling on CPUs and GPUs.
Archive, search, and analyze your entire email/chat history offline with DuckDB-powered analytics and AI queries.
A Python library providing multivariate imputation and matrix completion algorithms.
PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.
Trill is a single-node query processor for temporal or streaming data.
Apache Impala is a high-performance, open-source SQL query engine that runs on Apache Hadoop and Apache Kudu.
Sample datasets for users of the Yelp Academic Dataset, useful for data analysis and machine learning.
Embedded Go Database, a fast open-source NoSQL database solution for Go projects.
A fast and elegant data exploration library for Elixir, providing series and dataframes for data science workflows.
A C++ repository for a 2014 Kaggle competition.
Percona Server is an enhanced, open-source version of the MySQL database management system.
Xodus is a transactional, schema-less embedded database used by JetBrains products like YouTrack and Hub.
An extensible framework for linking databases and interactive views, focused on scalability and visualization.
Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.
A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.
A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.
A JavaScript library for working with multidimensional arrays, useful for data visualization and scientific computing.
A curated list of resources for graph databases and graph computing tools, useful for developers working with graph-based data.
A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.
An embedded time-series database written in Go for storing and querying metrics data.
A Python package for easy access to financial market data in China for quantitative finance and FinTech applications.
The Anatomy of Matplotlib tutorial from the SciPy conference, focused on data visualization for scientific computing.
This repository provides a comprehensive JSON dataset containing metadata on anime series, movies, and cross-references to various anime sites.
A color palette package in R inspired by works at the Metropolitan Museum of Art in New York.
An automatic DBMS configuration tool for optimizing database performance.
Fiona is a Python library for reading and writing geographic data files, with support for CLI usage.
A full-featured file system for online data storage, built with Python.
A scalable, SQL-based streaming analytics platform from Uber, built on top of Apache Flink.
A Python library for financial analysis and data scraping from the Finviz platform.
A C++ library for processing data streams.
A portfolio of data science projects covering machine learning, NLP, and more for personal and academic use.
Index your Gmail account to a SQLite DB and perform custom data analysis on your email.
A public dataset of daily COVID-19 cases and deaths per country, useful for data analysis and visualization.
A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.
db.py is a Python library that provides an easier way to interact with your databases.
A high-performance, persistent, off-heap data structure written in Clojure for data-intensive applications.
An automatic database ORM library for Objective-C that provides thread-safe and deadlock-free database operations.
A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.
This repository provides code and data for a book on statistics for data scientists.
A PostgreSQL extension that adds HyperLogLog data structures as a native data type.
A Python library with data related to Brazilian municipalities, including IBGE codes, latitude, longitude, and more.
A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.
DBngin is a free, open-source, cross-platform database management tool for developers.
Lakekeeper is an open-source, secure, and fast Apache Iceberg REST Catalog written in Rust for data lakehouse governance.
A high-level geospatial data visualization library for Python developers working with spatial data.
A high-performance logical replication extension for PostgreSQL that enables fast, cross-version database replication.
A library for text mining and natural language processing using tidy data principles in R.
Java client library for connecting to the InfluxDB time series database.
Simple script for downloading YouTube comments without using the YouTube API.