Trending Projects

Discover the fastest growing open source projects

Showing 801-850 of 897 trending projects

#801

mukunku/ParquetViewer

A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.

0.0%

1.1K

total stars

#802

jblindsay/whitebox-tools

An advanced geospatial data analysis platform for tasks like geomorphology, hydrology, and remote sensing.

0.0%

1.1K

total stars

Rust

#803

youngwookim/awesome-hadoop

A curated list of resources for the Hadoop ecosystem.

0.0%

1.1K

total stars

#804

apache/amoro

Apache Amoro is an open-source Lakehouse management system built on open data lake formats like Hudi and Iceberg, with support for compute engines such as Flink.

0.0%

1.1K

total stars

Java

#805

Teradata/kylo

Kylo is an enterprise-grade data lake management platform built on big data technologies like Spark and Hadoop.

0.0%

1.1K

total stars

Java

#806

qri-io/qri

An open-source platform for building and sharing datasets, focused on trust, privacy, and decentralization.

0.0%

1.1K

total stars

#807

red-data-tools/pycall.rb

A library for calling Python functions from the Ruby language, enabling data science and ML workflows.

0.0%

1.1K

total stars

#808

moby/datakit

Connect processes into powerful data pipelines with a simple git-like filesystem interface.

0.0%

1.1K

total stars

OCaml

#809

OvertureMaps/data

Overture Maps Data is a Python library providing access to open-source geographic data.

0.0%

1.1K

total stars

Python

#810

paulvangentcom/heartrate_analysis_python

A Python package for analyzing heart rate data from PPG and ECG signals.

0.0%

1.1K

total stars

Python

#811

pachterlab/gget

gget is a Python library that enables efficient querying of genomic reference databases like NCBI, Ensembl, and UniProt.

0.0%

1.1K

total stars

Python

#812

openspout/openspout

A fast and scalable library for reading and writing spreadsheet files (CSV, XLSX, ODS) in PHP.

0.0%

1.1K

total stars

PHP

#813

shaypal5/awesome-twitter-data

A curated list of Twitter datasets and resources for data scientists and social network analysts.

0.0%

1.1K

total stars

#814

mycelial/mycelite

Mycelite is a SQLite extension that enables replication between SQLite instances.

0.0%

1.1K

total stars

Rust

#815

paulmach/orb

A Go library with types and utilities for working with 2D geometry, geospatial data, and mapping.

0.0%

1.1K

total stars

#816

brettkromkamp/contextualise

Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.

0.0%

1.1K

total stars

Python

#817

samapriya/awesome-gee-community-datasets

A community-driven catalog of geospatial datasets for use with Google Earth Engine.

0.0%

1.1K

total stars

HTML

#818

tangwz/db-monthly

A collection of monthly reports on the internals of Alibaba Cloud's database products.

0.0%

1.1K

total stars

#819

caserec/Datasets-for-Recommender-Systems

A repository of datasets for building recommender systems.

0.0%

1.1K

total stars

Jupyter Notebook

#820

apachecn/pyda-2e-zh

A Chinese translation of the book 'Python for Data Analysis' 2nd Edition, covering NumPy, Pandas, and other data analysis tools.

0.0%

1.1K

total stars

CSS

#821

dataquestio/project-walkthroughs

A collection of data science, machine learning, and web development project code for Dataquest's YouTube channel.

0.0%

1.1K

total stars

Jupyter Notebook

#822

traildb/traildb

TrailDB is an efficient database for storing and querying series of events.

0.0%

1.1K

total stars

#823

gaarason/database-all

Eloquent ORM for Java 8, 11, 17, 21, 23 and Spring Boot 2.x, 3.x.

0.0%

1.1K

total stars

Java

#824

mahmoudparsian/data-algorithms-book

This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.

0.0%

1.1K

total stars

Java

#825

Azure/AzurePublicDataset

A repository of Microsoft Azure traces, with accompanying Jupyter notebooks.

0.0%

1.1K

total stars

Jupyter Notebook

#826

oetiker/rrdtool-1.x

RRDtool is a time-series database system for efficiently storing and graphing data.

0.0%

1.1K

total stars

#827

fraunhoferportugal/tsfel

An intuitive library to extract features from time series data for data science and machine learning.

0.0%

1.1K

total stars

Python

#828

liucongg/NLPDataSet

A repository containing various NLP datasets collected and organized by the owner.

0.0%

1.1K

total stars

#829

mpmath/mpmath

A Python library for arbitrary-precision floating-point arithmetic, providing advanced numerical capabilities.

0.0%

1.1K

total stars

Python

#830

big-data-europe/docker-hive

This is a Docker container for running Apache Hive, a data warehousing tool for big data analysis.

0.0%

1.1K

total stars

Shell

#831

joaoh82/rust_sqlite

A simple embedded database library in Rust modeled after SQLite, useful for Rust projects.

0.0%

1.1K

total stars

Rust

#832

rhiever/datacleaner

A Python tool that automatically cleans and preprocesses data for analysis and machine learning.

0.0%

1.1K

total stars

Python

#833

marcboeker/go-duckdb

A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.

0.0%

1.1K

total stars

#834

eduosi/district

This repository contains data on Chinese administrative divisions, including names, pinyin, and codes.

0.0%

1.1K

total stars

#835

docker-library/mongo

Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.

0.0%

1.1K

total stars

Shell

#836

brandon-rhodes/pycon-pandas-tutorial

A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.

0.0%

1.1K

total stars

Jupyter Notebook

#837

crazyhottommy/RNA-seq-analysis

This GitHub repository contains notes and code for analyzing RNA-seq data using Python and Snakemake.

0.0%

1.1K

total stars

Python

#838

intake/intake

Intake is a lightweight Python package for discovering, investigating, loading and distributing data.

0.0%

1.1K

total stars

Python

#839

jorgecarleitao/arrow2

A transmute-free Rust library for working with the Arrow data format.

0.0%

1.1K

total stars

Rust

#840

gunrock/gunrock

Programmable CUDA/C++ GPU Graph Analytics library for high-performance parallel graph processing.

0.0%

1.1K

total stars

C++

#841

patx/pickledb

An in-memory Python key-value store that uses the orjson module for persistence, with SQLite support.

0.0%

1.1K

total stars

Python

#842

RedisTimeSeries/RedisTimeSeries

A Redis module that provides a time series data structure for storing and querying time series data.

0.0%

1.1K

total stars

#843

ddotta/awesome-polars

A curated list of resources for Polars, an open-source, high-performance data manipulation library for Python and Rust.

0.0%

1.1K

total stars

#844

paulyoder/LinqToExcel

A library that allows developers to use LINQ to retrieve data from spreadsheets and CSV files.

0.0%

1.1K

total stars

#845

kblin/ncbi-genome-download

Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.

0.0%

1.1K

total stars

Python

#846

SciRuby/daru

A Ruby library for data analysis and manipulation.

0.0%

1.1K

total stars

Ruby

#847

Mrkuhuo/data-warehouse-learning

Open-source data warehouse learning project with examples and code for building real-time and offline data pipelines.

0.0%

1.1K

total stars

Java

#848

KeithGalli/pandas

Code and data accompanying a tutorial on pandas, the Python library for data manipulation and analysis.

0.0%

1.1K

total stars

Jupyter Notebook

#849

databricks/spark-csv

CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.

0.0%

1.1K

total stars

Scala

#850

markwk/qs_ledger

A personal data aggregator and analysis tool for self-tracking and quantified self enthusiasts.

0.0%

1.1K

total stars

Jupyter Notebook
