Concurrent data pipelines in Python for building efficient and scalable data processing workflows.
Database manager for multiple database engines, runs as desktop or web app.
MMseqs2 is an ultra-fast and sensitive bioinformatics tool for sequence search and clustering.
A highly scalable, high-performance graph database that supports over 100 billion data points.
High-performance time-series database for IoT and IIoT.
EJDB2 is an embeddable JSON database engine with a simple XPath-like query language (JQL) for C/C++ applications.
A collection of data science projects in Python using Jupyter Notebook.
MongoDB data stream pipeline tools for managing real-time data synchronization and replication.
A book that teaches the basics of using the Redis in-memory data structure store.
Transporter is a powerful ETL tool that allows developers to sync data between various persistence engines.
A PostgreSQL sample database for testing and learning SQL queries.
A powerful 3D visualization library for scientific data in Python.
A personal data aggregator and analysis tool for self-tracking and quantified self enthusiasts.
BuntDB is an embeddable, in-memory key/value database for Go with custom indexing and geospatial support.
Non-native graph database abstraction layer for Node.js and web browsers.
A curated list of awesome big data frameworks, resources and other awesomeness.
A fast, scalable, and distributed database for transactional, analytical, and AI workloads.
Blazing-fast data wrangling toolkit for AI and data engineering workflows.
A high-performance, MySQL-compatible vector database that supports structured and unstructured data for AI-driven applications.
A Python library for pulling current and historical baseball statistics, including Statcast, Baseball Reference, and FanGraphs data.
QueryKit is a simple CoreData query language for Swift and Objective-C developers.
Technical Analysis Library using Pandas and Numpy for financial data analysis and trading strategies.
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
pandasql is a Python library that allows developers to use SQL syntax to query Pandas DataFrames.
An astronomy visualization project that maps the orbits of asteroids in the solar system.
A lightweight, fault-tolerant distributed database built on SQLite, designed for high availability.
A tool for comparing and evaluating databases for time series data.
A Python library for implementing the Louvain community detection algorithm on graphs.
Distributed transactional key-value database, originally created to complement TiDB.
A distributed knowledge graph store built in Go for managing large-scale semantic data.
A collection of efficient Python tricks and tools for data scientists to improve their productivity.
A tool to easily import CSV and JSON data into PostgreSQL databases.
Open source research data repository software built with Java.
A collection of Jupyter Notebook files for data analysis using Python, including a Chinese translation of the popular 'Python for Data Analysis' book.
An open-source, self-hosted database management tool with a spreadsheet-like interface for Postgres.
A collection of articles and source code on using the pandas data analysis library.
A data platform that enables building data pipelines with SQL, Python, and ingesting from various sources.
A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.
DuckLake is an integrated data lake and catalog format written in C++.
A data quality and observability tool for monitoring and fixing data issues before they become problems.
A Python library for building business intelligence (BI) and OLAP solutions.
Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.
A high-performance compression library written in C for developers working with large data sets.
Sample datasets for users of the Yelp Academic Dataset, useful for data analysis and machine learning.
A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.
An advanced geospatial data analysis platform for tasks like geomorphology, hydrology, and remote sensing.
An open-source PostgreSQL client application for macOS, providing an easy way to set up and manage a local PostgreSQL database.
A library for text mining and natural language processing using tidy data principles in R.
MongoHub is a native macOS MongoDB client that provides a GUI for managing and interacting with MongoDB databases.
A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.