A high-performance compressed bitset library for Java used in Apache Spark, Netflix Atlas, and others.
A real-time Postgres data replication and streaming library built in Rust for building CDC pipelines.
A collection of portfolio projects for a data analyst.
High-performance, transactional key-value database engine for embedded systems and cryptocurrencies.
Pentaho Data Integration is a Java-based tool for building data integration and ETL pipelines.
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
Fluvio is an event stream processing engine for developers to build responsive data-intensive apps.
dplyr is an R package that provides a grammar of data manipulation.
An open-source global repository of address, building, and parcel data for developers and geospatial applications.
A registry of publicly available datasets hosted on AWS for data-driven developers.
A word frequency list generator, with processed files for text analysis and natural language processing.
Graft is an open-source transactional storage engine optimized for lazy, partial, and strongly consistent replication, ideal for edge, offline-first, and distributed applications.
Open Babel is a chemical toolbox for working with chemical data and cheminformatics.
A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.
A high-level geospatial data visualization library for Python developers working with spatial data.
Simple script for downloading YouTube comments without using the YouTube API.
MyBatis SQL Mapper for Java simplifies database interactions with object mapping.
This is a comprehensive learning resource for the Flink stream processing framework, covering concepts, principles, and real-world use cases.
LiteDB is a lightweight, embedded NoSQL document database for .NET applications that stores its data in a single file.
An open-source PostgreSQL client application for macOS, providing an easy way to set up and manage a local PostgreSQL database.
A Chinese name corpus and generator for natural language processing and entity recognition.
SQLite JDBC Driver, a Java library for accessing SQLite databases.
A Python library for extracting data from a wide range of internet sources into a pandas DataFrame.
A book on data science, covering topics from basic math to machine learning using Python and Jupyter Notebooks.
A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.
A curated list of Python packages for chemistry, including computational chemistry, molecular dynamics, and quantum chemistry.
The official C++ client API for PostgreSQL, providing a high-level interface for interacting with PostgreSQL databases.
An extensible framework for linking databases and interactive views, focused on scalability and visualization.
A versatile Python library for bioinformatics, providing data structures, algorithms, and educational resources.
A community-driven catalog of geospatial datasets for use with Google Earth Engine.
An open-source N-body simulation library for astrophysics and planetary science.
A PostgreSQL sample database for testing and learning SQL queries.
An immutable database and Datalog query engine for Clojure, ClojureScript, and JavaScript.
An open-source repository for parsing electricity data and powering a comprehensive electricity data platform.
A Python library for extracting tabular data from PDF files, useful for data processing and analysis.
A curated list of awesome JSON datasets that don't require authentication.
A comprehensive index of medical imaging datasets for researchers and developers working in the medical imaging field.
A desktop application for viewing and analyzing tabular data, with support for CSV, Parquet, and DuckDB.
An open-source project that captures the public GitHub timeline and makes it accessible for analysis.
An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.
A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.
RBush is a high-performance JavaScript R-tree-based 2D spatial index for points and rectangles.
Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.
A modern SQL database written in Go for embedded and mobile applications.
A cross-platform way to express data transformations, relational algebra, and standardized record expressions and plans.
A Python library for portfolio optimization and back-testing in finance.
OrbitDB is a peer-to-peer database for the decentralized web, enabling developers to build offline-first, distributed applications.
A grammar of graphics library for creating highly customizable and publication-quality plots in Python.
A collection of Jupyter Notebook files focused on data visualization and machine learning concepts.
Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.