Showing 851-897 of 897 trending projects
OpenRefine is a powerful data cleaning and transformation tool that helps developers work with messy data.
An open-access book on scientific visualization using Python and Matplotlib for data-driven developers.
Modin: Scalable Pandas workflows with a single line of code change, enabling distributed data processing.
A collection of Python code examples and tutorials for data science, machine learning, and web development.
Azure Data Studio is a data management and development tool with connectivity to popular cloud and on-premises databases.
A curated list of awesome R packages, frameworks and software for data analysis and data science.
No description provided for this medical data repository.
COVID-19 data repository for developers, providing daily updated case, death, and testing information.
A Python tool to parse Redis dump.rdb files, analyze memory usage, and export data to JSON.
esProc SPL is a JVM-based programming language for structured data computation, serving as both a data analysis tool and an embedded computing engine.
A collection of Jupyter Notebook files for data analysis using Python, including a Chinese translation of the popular 'Python for Data Analysis' book.
This repo contains a list of the 10,000 most common English words, useful for NLP and language modeling tasks.
An embeddable, replicated, and fault-tolerant SQL engine for building robust and scalable applications.
An open-source repository for parsing electricity data and powering a comprehensive electricity data platform.
A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.
A command-line tool to generate idiomatic Go code for SQL databases across multiple database engines.
A high-performance Java library for data analysis, visualization, and machine learning.
Ploomber is a fast and versatile tool for building and deploying data pipelines that can be used with a variety of AI and ML tools.
Fluent Migrator is a .NET migration framework for managing database schema changes across multiple database providers.
DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.
An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.
PyPika is a Python SQL query builder that provides a readable, Pythonic syntax for constructing complex SQL queries.
A Python library for creating easy-to-use, visually appealing data tables and summaries.
FeatureBase is a fast analytical database built on bitmaps, perfect for ML and data-intensive applications.
A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.
Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.
Zui is a powerful desktop app for exploring and working with data, with support for CSV, JSON, and the Zed data format.
Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.
Cloud-native, MySQL-compatible, AI-ready database with Git for Data, vector search, and full-text search capabilities.
A Java ORM SQL query builder that supports popular databases like ClickHouse, Impala, MySQL, and Presto.
A data processing and ETL (Extract, Transform, Load) framework for Ruby developers.
A Python toolbox for seismology and seismological observatories, providing tools for data processing and analysis.
Python library for clustering categorical data using k-modes and k-prototypes algorithms.
A data science and machine learning library for Go, providing DataFrame functionality similar to Python's Pandas.
A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.
Mondrian is an OLAP server that enables real-time analysis of large data sets for business users.
An ORM for RethinkDB that provides an elegant and intuitive API for interacting with the database.
Eloquent ORM for Java 8, 11, 17, 21, and 23, and Spring Boot 2.x and 3.x.
Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.
A Python library for searching and downloading Copernicus Sentinel satellite images for geographic data analysis.
Lightweight local JSON database for JavaScript/TypeScript apps.
A roadmap for becoming a data engineer.
A powerful customer data pipeline for collecting, processing, and analyzing user events and behavior.
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
A comprehensive index of medical imaging datasets for researchers and developers working in the medical imaging field.
A Python library providing multivariate imputation and matrix completion algorithms.
SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.