Trending Projects

Discover the fastest growing open source projects

Showing 251-300 of 897 trending projects

#251
meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

+2
+0.1%
2.4K
total stars
#252
h5py/h5py

A Python library for accessing the HDF5 binary data format, a popular format for scientific and numerical data.

+2
+0.1%
2.2K
total stars
#253
man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.

+2
+0.1%
2.2K
total stars
#254
alibaba/clusterdata

A dataset of cluster data collected from Alibaba's production clusters for cluster management research.

+2
+0.1%
2.0K
total stars
#255
igraph/igraph

A powerful C library for analyzing complex networks and graph-based data structures.

+2
+0.1%
1.9K
total stars
#256
broadinstitute/gatk

Official code repository for the Genome Analysis Toolkit (GATK), a bioinformatics library for working with next-generation DNA sequencing data.

+2
+0.1%
1.9K
total stars
#257
feldera/feldera

The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.

+2
+0.1%
1.8K
total stars
#258
xflr6/graphviz

Simple Python interface for Graphviz, a popular open-source graph visualization tool.

+2
+0.1%
1.8K
total stars
#259
galaxyproject/galaxy

An open-source, community-driven platform for data-intensive scientific analysis and visualization.

+2
+0.1%
1.7K
total stars
#260
huandu/go-sqlbuilder

A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.

+2
+0.1%
1.7K
total stars
#261
uhub/awesome-matlab

A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.

+2
+0.1%
1.7K
total stars
#262
babyfish-ct/jimmer

An advanced ORM library for Java and Kotlin developers that provides powerful caching and data management features.

+2
+0.1%
1.6K
total stars
#263
reata/sqllineage

SQL Lineage Analysis Tool that provides data discovery and governance insights through Python.

+2
+0.1%
1.6K
total stars
#264
jldbc/pybaseball

A Python library for pulling current and historical baseball statistics, including Statcast, Baseball Reference, and FanGraphs data.

+2
+0.1%
1.6K
total stars
#265
pysal/pysal

PySAL is a Python Spatial Analysis Library meta-package for geographical data analysis and modeling.

+2
+0.1%
1.5K
total stars
#266
quantopian/empyrical

A Python library that provides common financial risk and performance metrics used in financial analysis.

+2
+0.1%
1.5K
total stars
#267
DrTimothyAldenDavis/SuiteSparse

A powerful suite of sparse matrix algorithms and libraries for scientific and numerical computing.

+2
+0.1%
1.5K
total stars
#268
felt/tippecanoe

Build vector tilesets from large collections of GeoJSON features.

+2
+0.1%
1.4K
total stars
#269
projectnessie/nessie

Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.

+2
+0.1%
1.4K
total stars
#270
tidyverse/tidyr

tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.

+2
+0.1%
1.4K
total stars
#271
event-driven-io/Pongo

Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.

+2
+0.1%
1.4K
total stars
#272
PyTables/PyTables

A powerful Python package to manage and work with extremely large amounts of data.

+2
+0.1%
1.4K
total stars
#273
wx-chevalier/Database-Notes

A comprehensive collection of notes and resources for understanding different database technologies and concepts.

+2
+0.1%
1.4K
total stars
#274
PyO3/rust-numpy

Rust-based bindings for the NumPy C-API, enabling developers to leverage Rust for numerical computing.

+2
+0.1%
1.3K
total stars
#275
datazip-inc/olake

Fastest open-source data pipeline tool for replicating databases to data lakes in Apache Iceberg format.

+2
+0.1%
1.3K
total stars
#276
jtv/libpqxx

The official C++ client API for PostgreSQL, providing a high-level interface for interacting with PostgreSQL databases.

+2
+0.2%
1.3K
total stars
#277
uwdata/mosaic

An extensible framework for linking databases and interactive views, focused on scalability and visualization.

+2
+0.2%
1.3K
total stars
#278
submato/xhscrawl

A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.

+2
+0.2%
1.3K
total stars
#279
duckdb/dbt-duckdb

A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.

+2
+0.2%
1.2K
total stars
#280
JoinQuant/jqdatasdk

A Python package for easy access to financial market data in China for quantitative finance and FinTech applications.

+2
+0.2%
1.2K
total stars
#281
lit26/finvizfinance

A Python library for financial analysis and data scraping from the Finviz platform.

+2
+0.2%
1.2K
total stars
#282
RUCAIBox/RecSysDatasets

A repository of public data sources for building and testing recommender systems.

+2
+0.2%
1.2K
total stars
#283
ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets and tutorials for data science and data analysis in Python.

+2
+0.2%
1.2K
total stars
#284
zinggAI/zingg

Scalable identity resolution, entity resolution, data mastering, and deduplication using ML.

+2
+0.2%
1.2K
total stars
#285
mukunku/ParquetViewer

A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.

+2
+0.2%
1.1K
total stars
#286
openspout/openspout

A fast and scalable library for reading and writing spreadsheet files (CSV, XLSX, ODS) in PHP.

+2
+0.2%
1.1K
total stars
#287
crazyhottommy/RNA-seq-analysis

Notes and code for analyzing RNA-seq data using Python and Snakemake.

+2
+0.2%
1.1K
total stars
#288
cuge1995/awesome-time-series

A curated list of resources for time series forecasting, including papers, code, and other materials.

+2
+0.2%
1.0K
total stars
#289
google/cluster-data

A dataset of Borg cluster traces from Google, useful for research on distributed systems and cloud infrastructure.

+2
+0.2%
1.0K
total stars
#290
opengeospatial/geoparquet

A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.

+2
+0.2%
1.0K
total stars
#291
allenai/s2orc

A large-scale open-access corpus of scientific papers and metadata for researchers and developers.

+2
+0.2%
1.0K
total stars
#292
1eez/103976

A comprehensive English word database with translations, parts of speech, and definitions for developers.

+2
+0.2%
1.0K
total stars
#293
shencangsheng/easydb_app

EasyDB is a lightweight desktop app that lets you query local CSV, Excel, and JSON files with SQL, without an external database.

+2
+0.2%
995
total stars
#294
elastic/kibana

Kibana is an open-source data visualization and management tool for Elasticsearch.

+1
+0.0%
21.0K
total stars
#295
mybatis/mybatis-3

MyBatis SQL Mapper for Java simplifies database interactions with object mapping.

+1
+0.0%
20.4K
total stars
#296
treeverse/dvc

dvc is a data versioning and ML experiments tool that helps developers manage and track data and model changes.

+1
+0.0%
15.4K
total stars
#297
zhisheng17/flink-learning

A comprehensive learning resource for the Flink stream processing framework, covering concepts, principles, and real-world use cases.

+1
+0.0%
15.1K
total stars
#298
Tencent/wcdb

WCDB is a cross-platform database framework developed by WeChat for Android, iOS, Linux, macOS, and Windows.

+1
+0.0%
11.7K
total stars
#299
great-expectations/great_expectations

A Python library that helps ensure data quality and reliability through data profiling and testing.

+1
+0.0%
11.2K
total stars
#300
microsoft/sql-server-samples

This repository contains code samples for SQL Server, Azure SQL, and related data services from Microsoft.

+1
+0.0%
10.9K
total stars