Trending Projects

Discover the fastest growing open source projects

Showing 801-850 of 897 trending projects

#801
mukunku/ParquetViewer

A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.

0
0.0%
1.1K
total stars
#802
jblindsay/whitebox-tools

An advanced geospatial data analysis platform for tasks like geomorphology, hydrology, and remote sensing.

0
0.0%
1.1K
total stars
#803
youngwookim/awesome-hadoop

A curated list of resources for the Hadoop ecosystem.

0
0.0%
1.1K
total stars
#804
apache/amoro

Apache Amoro is an open-source Lakehouse management system built on open table formats like Iceberg and Hudi, working with compute engines like Flink.

0
0.0%
1.1K
total stars
#805
Teradata/kylo

Kylo is an enterprise-grade data lake management platform built on big data technologies like Spark and Hadoop.

0
0.0%
1.1K
total stars
#806
qri-io/qri

An open-source platform for building and sharing datasets, focused on trust, privacy, and decentralization.

0
0.0%
1.1K
total stars
#807
red-data-tools/pycall.rb

A library for calling Python functions from Ruby, enabling data science and ML workflows.

0
0.0%
1.1K
total stars
#808
moby/datakit

Connect processes into powerful data pipelines with a simple Git-like filesystem interface.

0
0.0%
1.1K
total stars
#809
OvertureMaps/data

Overture Maps Data provides access to the open map data released by the Overture Maps Foundation.

0
0.0%
1.1K
total stars
#810
paulvangentcom/heartrate_analysis_python

A Python package for analyzing heart rate data from PPG and ECG signals.

0
0.0%
1.1K
total stars
#811
pachterlab/gget

gget is a Python library that enables efficient querying of genomic reference databases like NCBI, Ensembl, and UniProt.

0
0.0%
1.1K
total stars
#812
openspout/openspout

A fast and scalable library for reading and writing spreadsheet files (CSV, XLSX, ODS) in PHP.

0
0.0%
1.1K
total stars
#813
shaypal5/awesome-twitter-data

A curated list of Twitter datasets and resources for data scientists and social network analysts.

0
0.0%
1.1K
total stars
#814
mycelial/mycelite

Mycelite is a SQLite extension that enables replication between SQLite instances.

0
0.0%
1.1K
total stars
#815
paulmach/orb

A Go library with types and utilities for working with 2D geometry, geospatial data, and mapping.

0
0.0%
1.1K
total stars
#816
brettkromkamp/contextualise

Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.

0
0.0%
1.1K
total stars
#817
samapriya/awesome-gee-community-datasets

A community-driven catalog of geospatial datasets for use with Google Earth Engine.

0
0.0%
1.1K
total stars
#818
tangwz/db-monthly

A collection of monthly reports on the internals of Alibaba Cloud's database products.

0
0.0%
1.1K
total stars
#819
caserec/Datasets-for-Recommender-Systems

A curated collection of datasets for building and evaluating recommender systems.

0
0.0%
1.1K
total stars
#820
apachecn/pyda-2e-zh

A Chinese translation of the book 'Python for Data Analysis' 2nd Edition, covering NumPy, Pandas, and other data analysis tools.

0
0.0%
1.1K
total stars
#821
dataquestio/project-walkthroughs

A collection of data science, machine learning, and web development project code for Dataquest's YouTube channel.

0
0.0%
1.1K
total stars
#822
traildb/traildb

TrailDB is an efficient database for storing and querying series of events.

0
0.0%
1.1K
total stars
#823
gaarason/database-all

An Eloquent-style ORM for Java 8, 11, 17, 21, and 23, and Spring Boot 2.x and 3.x.

0
0.0%
1.1K
total stars
#824
mahmoudparsian/data-algorithms-book

This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.

0
0.0%
1.1K
total stars
#825
Azure/AzurePublicDataset

A repository of public Microsoft Azure workload traces, with accompanying Jupyter notebooks.

0
0.0%
1.1K
total stars
#826
oetiker/rrdtool-1.x

RRDtool is a time-series database system for efficiently storing and graphing data.

0
0.0%
1.1K
total stars
#827
fraunhoferportugal/tsfel

An intuitive library to extract features from time series data for data science and machine learning.

0
0.0%
1.1K
total stars
#828
liucongg/NLPDataSet

A repository containing various NLP datasets collected and organized by the owner.

0
0.0%
1.1K
total stars
#829
mpmath/mpmath

A Python library for arbitrary-precision floating-point arithmetic, providing advanced numerical capabilities.

0
0.0%
1.1K
total stars
#830
big-data-europe/docker-hive

A Docker setup for running Apache Hive, a data warehousing tool for big data analysis.

0
0.0%
1.1K
total stars
#831
joaoh82/rust_sqlite

A simple embedded database library in Rust modeled after SQLite, useful for Rust projects.

0
0.0%
1.1K
total stars
#832
rhiever/datacleaner

A Python tool that automatically cleans and preprocesses data for analysis and machine learning.

0
0.0%
1.1K
total stars
#833
marcboeker/go-duckdb

A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.

0
0.0%
1.1K
total stars
#834
eduosi/district

This repository contains data on Chinese administrative divisions, including names, pinyin, and codes.

0
0.0%
1.1K
total stars
#835
docker-library/mongo

Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.

0
0.0%
1.1K
total stars
#836
brandon-rhodes/pycon-pandas-tutorial

A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.

0
0.0%
1.1K
total stars
#837
crazyhottommy/RNA-seq-analysis

This GitHub repository contains notes and code for analyzing RNA-seq data using Python and Snakemake.

0
0.0%
1.1K
total stars
#838
intake/intake

Intake is a lightweight Python package for discovering, investigating, loading and distributing data.

0
0.0%
1.1K
total stars
#839
jorgecarleitao/arrow2

A Rust library for working with the Apache Arrow data format, written to avoid unsafe transmute operations.

0
0.0%
1.1K
total stars
#840
gunrock/gunrock

Programmable CUDA/C++ GPU Graph Analytics library for high-performance parallel graph processing.

0
0.0%
1.1K
total stars
#841
patx/pickledb

An in-memory key-value store for Python that uses the orjson library for persistence, with SQLite support.

0
0.0%
1.1K
total stars
#842
RedisTimeSeries/RedisTimeSeries

A Redis module that provides a time series data structure for storing and querying time series data.

0
0.0%
1.1K
total stars
#843
ddotta/awesome-polars

A curated list of resources for Polars, an open-source, high-performance data manipulation library for Python and Rust.

0
0.0%
1.1K
total stars
#844
paulyoder/LinqToExcel

A library that allows developers to use LINQ to retrieve data from spreadsheets and CSV files.

0
0.0%
1.1K
total stars
#845
kblin/ncbi-genome-download

Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.

0
0.0%
1.1K
total stars
#846
SciRuby/daru

Daru is a Ruby library for data analysis and manipulation, aimed at data scientists and developers working with data.

0
0.0%
1.1K
total stars
#847
Mrkuhuo/data-warehouse-learning

Open-source data warehouse learning project with examples and code for building real-time and offline data pipelines.

0
0.0%
1.1K
total stars
#848
KeithGalli/pandas

Code and data for a tutorial on pandas, the core Python library for data manipulation and analysis.

0
0.0%
1.1K
total stars
#849
databricks/spark-csv

CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.

0
0.0%
1.1K
total stars
#850
markwk/qs_ledger

A personal data aggregator and analysis tool for self-tracking and quantified self enthusiasts.

0
0.0%
1.1K
total stars
