A curated list of community detection research papers with implementations for data science and network analysis.
A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.
A library for generating MaxMind GeoIP2 databases for China IP addresses.
A Python library that provides a set of customizable pipeline processing blocks for data processing tasks.
A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.
MongoEngine is a Python Object-Document-Mapper (ODM) for working with MongoDB databases.
A high-performance, memory-efficient Python data analysis library for handling large datasets.
ggplot2 is a powerful data visualization library for R that provides elegant and flexible graphics.
A Python library for cleaning and transforming data, inspired by the R package Janitor.
Synth is a Rust library for generating realistic, randomized test data for applications and databases.
A Python toolbox for seismology and seismological observatories, providing tools for data processing and analysis.
A repository of high-quality datasets for building recommender systems, useful for developers working on AI-powered applications.
A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.
A Python library for reading, manipulating, and writing data in various spreadsheet file formats.
A data science and machine learning library for Go, providing DataFrame functionality similar to Python's Pandas.
Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.
A curated list of awesome database libraries, resources, and tools for developers.
This Scala library provides a high-performance implementation of the node2vec algorithm for embedding graphs.
A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.
This repository provides code examples for Oracle's AI-enabled database features and integrations.
An in-memory key-value store using Python's orjson module for persistence, with SQLite support.
GridDB is a fast and scalable open-source database for time-series IoT and big data applications.
An R package that provides support for simple features, a standardized way to encode spatial vector data.
Realm is a mobile database that serves as a replacement for SQLite and ORMs.
Cloud-based database manager UI for querying, managing, and visualizing databases across multiple platforms.
This repository provides code and data for a book on statistics for data scientists.
A fast, efficient C extension for NumPy that provides optimized array functions.
A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.
Open-source relational database engine powering web apps, APIs, and data-driven backends worldwide.
Automatically generates beautiful and easy-to-read ER diagrams from your database.
A Ruby library that makes it easy to group temporal data, useful for developers working with time-series data.
A PostgreSQL extension that adds HyperLogLog data structures as a native data type.
A comprehensive Julia library for probability distributions and related statistical functions.
A powerful 3D visualization library for scientific data in Python.
An open-source financial data extraction tool that provides simple API access to data scraped from various websites.
A next-generation curated knowledge sharing platform for data scientists and other technical professionals.
MongoShake is a universal data replication platform based on MongoDB's oplog, enabling redundant and active-active replication.
PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.
A repository containing various NLP datasets collected and organized by the owner.
A Python library that provides a tour of the wonderland of math with visualizations and algorithms.
A versatile app for querying, scripting, and visualizing data from various databases, files, and APIs.
A portfolio of data science projects covering machine learning, NLP, and more for personal and academic use.
A collection of data science, machine learning, and web development project code for Dataquest's YouTube channel.
Archive, search, and analyze your entire email/chat history offline with DuckDB-powered analytics and AI queries.
A Python library that provides a Predictive Power Score (PPS) to measure the predictive power between variables.
This Python repository contains code examples and notes for data analysis and mining.
A high-performance, highly available, and distributed time series database written in Rust.
Python library for clustering categorical data using k-modes and k-prototypes algorithms.
An embedded time-series database written in Go for storing and querying metrics data.
A Python library for comparing data across databases, supporting various database engines.