Showing 451-500 of 897 trending projects
Simple script for downloading YouTube comments without using the YouTube API.
A Python library for searching and downloading Copernicus Sentinel satellite images for geographic data analysis.
An open-source repository for parsing electricity data and powering a comprehensive electricity data platform.
A collection of stock analysis tools across various programming languages and platforms.
A free database of geographic place names and corresponding geospatial data for developers to use.
Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.
cryo is a Rust library for extracting blockchain data to parquet, CSV, JSON, or Python dataframes.
The official C++ client API for PostgreSQL, providing a high-level interface for interacting with PostgreSQL databases.
dplyr is an R library that provides a grammar of data manipulation.
Amundsen is an open-source data discovery platform for improving productivity of data analysts and engineers.
LibRaw is a C++ library for reading RAW image files from digital cameras.
COVID-19 data repository for developers, providing daily updated case, death, and testing information.
Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.
A repository of public data sources for building and testing recommender systems.
A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.
sq is a Go-based data wrangling tool that supports a variety of data formats and databases.
Lightweight, fast, and reliable key-value database engine in Go for high-throughput applications.
A comprehensive search tool for finding Chinese NLP datasets, with support for common English NLP datasets as well.
A collection of SQL queries to analyze social media datasets.
BuntDB is an embeddable, in-memory key/value database for Go with custom indexing and geospatial support.
Apache Avro is a data serialization system for efficient storage and transmission of structured data.
Graph and network visualization library for R developers working with tabular data.
An open-source project that captures the public GitHub timeline and makes it accessible for analysis.
PyWavelets is a Python library for wavelet transform algorithms and techniques, useful for image and signal processing.
A collection of portfolio projects for a data analyst.
A fast, open-source embedded NoSQL database for Go projects.
A LINQ-to-database provider for .NET, supporting various database engines.
A MySQL-compatible relational database with a storage agnostic query engine, implemented in Go.
TuGraph-DB is a high-performance graph database built for fast and efficient graph data processing.
A Python library that implements database internals from scratch, useful for learning database concepts.
A high-performance compressed bitset library for Java used in Apache Spark, Netflix Atlas, and others.
A Python driver for the ClickHouse database with native interface support.
SciRuby provides a collection of tools for scientific computation in Ruby, catering to developers working with data and scientific applications.
OctoSQL is a powerful SQL query tool that allows you to join, analyze, and transform data from multiple databases and file formats.
A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.
HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.
A Python database adapter for PostgreSQL, allowing developers to interact with their databases.
PyPika is a Python SQL query builder that provides a readable, Pythonic syntax for constructing complex SQL queries.
A C# library for reading and writing metadata in media files, useful for audio and video processing applications.
High-performance, transactional key-value database engine for embedded systems and cryptocurrencies.
A cloud-native PostgreSQL database developed by Alibaba Cloud for high-performance, scalable data storage and management.
An advanced ORM library for Java and Kotlin developers that provides powerful caching and data management features.
A Python statistical package based on Pandas, providing various statistical methods and tests.
A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.
A curated list of awesome JSON datasets that don't require authentication.
A cross-platform way to express data transformations, relational algebra, and standardized record expressions and plans.
An object-relational mapping (ORM) library for .NET that supports many database providers, including SQL Server, MySQL, and PostgreSQL.
A Chinese name corpus and generator for natural language processing and entity recognition.
A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.
AlaSQL is a JavaScript SQL database for browser and Node.js that handles both relational tables and nested JSON data.