Trending Projects

Discover the fastest-growing open source projects

Showing 251-300 of 897 trending projects

#251

meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

+0.1%

2.4K

total stars

Python

#252

h5py/h5py

A Python library for accessing the HDF5 binary data format, a popular format for scientific and numerical data.

+0.1%

2.2K

total stars

Python

#253

man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.

+0.1%

2.2K

total stars

C++

#254

alibaba/clusterdata

A dataset of cluster data collected from Alibaba's production clusters for cluster management research.

+0.1%

2.0K

total stars

Jupyter Notebook

#255

igraph/igraph

A powerful C library for analyzing complex networks and graph-based data structures.

+0.1%

1.9K

total stars

#256

broadinstitute/gatk

Official code repository for the Genome Analysis Toolkit (GATK), a bioinformatics library for working with next-generation DNA sequencing data.

+0.1%

1.9K

total stars

Java

#257

feldera/feldera

The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.

+0.1%

1.8K

total stars

Rust

#258

xflr6/graphviz

A simple Python interface for Graphviz, a popular open-source graph visualization tool.

+0.1%

1.8K

total stars

Python

#259

galaxyproject/galaxy

An open-source, community-driven platform for data-intensive scientific analysis and visualization.

+0.1%

1.7K

total stars

Python

#260

huandu/go-sqlbuilder

A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.

+0.1%

1.7K

total stars

#261

uhub/awesome-matlab

A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.

+0.1%

1.7K

total stars

#262

babyfish-ct/jimmer

An advanced ORM library for Java and Kotlin developers that provides powerful caching and data management features.

+0.1%

1.6K

total stars

Java

#263

reata/sqllineage

SQL Lineage Analysis Tool that provides data discovery and governance insights through Python.

+0.1%

1.6K

total stars

Python

#264

jldbc/pybaseball

A Python library for pulling current and historical baseball statistics, including Statcast, Baseball Reference, and FanGraphs data.

+0.1%

1.6K

total stars

Python

#265

pysal/pysal

PySAL is a Python Spatial Analysis Library meta-package for geographical data analysis and modeling.

+0.1%

1.5K

total stars

Python

#266

quantopian/empyrical

A Python library that provides common financial risk and performance metrics used in financial analysis.

+0.1%

1.5K

total stars

Python

#267

DrTimothyAldenDavis/SuiteSparse

A powerful suite of sparse matrix algorithms and libraries for scientific and numerical computing.

+0.1%

1.5K

total stars

#268

felt/tippecanoe

Build vector tilesets from large collections of GeoJSON features.

+0.1%

1.4K

total stars

C++

#269

projectnessie/nessie

Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.

+0.1%

1.4K

total stars

Java

#270

tidyverse/tidyr

tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.

+0.1%

1.4K

total stars

#271

event-driven-io/Pongo

Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.

+0.1%

1.4K

total stars

TypeScript

#272

PyTables/PyTables

A powerful Python package to manage and work with extremely large amounts of data.

+0.1%

1.4K

total stars

Python

#273

wx-chevalier/Database-Notes

A comprehensive collection of notes and resources for understanding different database technologies and concepts.

+0.1%

1.4K

total stars

HTML

#274

PyO3/rust-numpy

Rust-based bindings for the NumPy C-API, enabling developers to leverage Rust for numerical computing.

+0.1%

1.3K

total stars

Rust

#275

datazip-inc/olake

Fastest open-source data pipeline tool for replicating databases to data lakes in Apache Iceberg format.

+0.1%

1.3K

total stars

#276

jtv/libpqxx

The official C++ client API for PostgreSQL, providing a high-level interface for interacting with PostgreSQL databases.

+0.2%

1.3K

total stars

C++

#277

uwdata/mosaic

An extensible framework for linking databases and interactive views, focused on scalability and visualization.

+0.2%

1.3K

total stars

TypeScript

#278

submato/xhscrawl

A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.

+0.2%

1.3K

total stars

#279

duckdb/dbt-duckdb

A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.

+0.2%

1.2K

total stars

Python

#280

JoinQuant/jqdatasdk

A Python package for easy access to financial market data in China for quantitative finance and FinTech applications.

+0.2%

1.2K

total stars

Python

#281

lit26/finvizfinance

A Python library for financial analysis and data scraping from the Finviz platform.

+0.2%

1.2K

total stars

Jupyter Notebook

#282

RUCAIBox/RecSysDatasets

A repository of public data sources for building and testing recommender systems.

+0.2%

1.2K

total stars

Python

#283

ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets and tutorials for data science and data analysis in Python.

+0.2%

1.2K

total stars

Jupyter Notebook

#284

zinggAI/zingg

Scalable identity resolution, entity resolution, data mastering, and deduplication using ML.

+0.2%

1.2K

total stars

Java

#285

mukunku/ParquetViewer

A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.

+0.2%

1.1K

total stars

#286

openspout/openspout

A fast and scalable library for reading and writing spreadsheet files (CSV, XLSX, ODS) in PHP.

+0.2%

1.1K

total stars

PHP

#287

crazyhottommy/RNA-seq-analysis

Notes and code for analyzing RNA-seq data using Python and Snakemake.

+0.2%

1.1K

total stars

Python

#288

cuge1995/awesome-time-series

A curated list of resources for time series forecasting, including papers, code, and other materials.

+0.2%

1.0K

total stars

#289

google/cluster-data

A dataset of Borg cluster traces from Google for researchers and developers working on distributed systems and cloud infrastructure.

+0.2%

1.0K

total stars

TeX

#290

opengeospatial/geoparquet

A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.

+0.2%

1.0K

total stars

Python

#291

allenai/s2orc

A large-scale open-access corpus of scientific papers and metadata for researchers and developers.

+0.2%

1.0K

total stars

Python

#292