Showing 351-400 of 897 trending projects
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
A desktop application for viewing and analyzing tabular data, with support for CSV, Parquet, and DuckDB.
A roadmap for becoming a data engineer.
A collection of Bible versions and cross-reference databases.
Pandas Cookbook is a collection of recipes for using Python's powerful data analysis library, Pandas.
A Python library for common data analysis and machine learning tasks.
OpenMapTiles is an open-source vector tile schema implementation for creating custom map tiles.
An open-source data modeling tool for PostgreSQL that lets developers design models visually and generate the corresponding DDL commands.
Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.
Druid is a high-performance database connection pool for Java applications, with built-in monitoring and management features.
Fast, embeddable key-value database written in Go for building high-performance storage applications.
A Python library for accessing the HDF5 binary data format, a popular format for scientific and numerical data.
A collection of code examples and baselines for common data science and machine learning competitions.
A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.
A dataset of Borg cluster traces from Google, useful for researchers and developers working on distributed systems and cloud infrastructure.
The ultimate set of SQLite extensions for developers building applications with SQLite databases.
A Python library for creating beautiful visualizations of language differences across document types.
Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.
A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.
An open-source, community-driven platform for data-intensive scientific analysis and visualization.
Notes and code for analyzing RNA-seq data with Python and Snakemake.
A Python library that adds support for pgvector, the PostgreSQL extension for vector similarity search, enabling efficient vector storage and search.
A powerful, multi-database ORM for .NET that supports a wide range of SQL databases and provides a seamless data access layer.
A curated list of software packages and data resources for single-cell analysis, including RNA-seq and ATAC-seq.
C++ DataFrame library for statistical, financial, and machine learning analysis.
A Python package for interactive geospatial analysis and visualization with Google Earth Engine.
Docker images containing Jupyter applications for data science and machine learning workflows.
Fast in-memory cache library for Go with low GC overhead, optimized for a large number of entries.
A curated list of awesome R packages, frameworks and software for data analysis and data science.
A dataset for music analysis, designed to support deep learning and reproducible research.
Malloy is an open-source language for describing data relationships and transformations.
A Python library for conveniently reading data from the Tongdaxin financial data platform.
Hazelcast is a high-performance, distributed in-memory data platform for real-time insights and stream processing.
SQL query builder for C# developers, supporting multiple databases and complex queries.
A C# library for reading and writing CSV files, with support for a wide range of CSV file formats.
OrbitDB is a peer-to-peer database for the decentralized web, enabling developers to build offline-first, distributed applications.
A type-safe, Swift-language layer over SQLite3 for building database-backed Swift applications.
SQLite JDBC Driver: a Java library for accessing SQLite databases.
A curated list of Python software and packages for scientific research in audio.
The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.
A collection of code snippets and tutorials for data science and data analysis in Python.
Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.
Build vector tilesets from large collections of GeoJSON features.
Azure/AzurePublicDataset is a repository of Microsoft Azure traces, accompanied by Jupyter notebooks.
An embeddable, replicated, and fault-tolerant SQL engine for building robust and scalable applications.
A collection of SQL data analysis and visualization projects using various tools and databases.
A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.
Open-source massively parallel processing (MPP) database, an alternative to Greenplum.
Extremely fast, easy-to-use, and fully async NoSQL database for Flutter apps.
A C++ library for reading and writing .npy and .npz files, commonly used in scientific computing.