Trending Projects

Discover the fastest-growing open-source projects

Showing 501-550 of 897 trending projects

#501
Factual/drake

A data workflow tool for data engineers and analysts, often described as "Make for data".

+63
+4.4%
1.5K
total stars
#502
damklis/DataEngineeringProject

An end-to-end data engineering project example showcasing tools and technologies for building data pipelines.

+63
+4.8%
1.4K
total stars
#503
datalevin/datalevin

A simple, fast, and versatile Datalog database written in Clojure.

+63
+4.8%
1.4K
total stars
#504
CliMA/Oceananigans.jl

A fast, flexible, ocean-flavored fluid dynamics library for climate and ocean modeling on CPUs and GPUs.

+63
+5.2%
1.3K
total stars
#505
influxdata/influxdb-java

Java client library for connecting to the InfluxDB time series database.

+63
+5.5%
1.2K
total stars
#506
Teradata/kylo

Kylo is an enterprise-grade data lake management platform built on big data technologies like Spark and Hadoop.

+63
+6.0%
1.1K
total stars
#507
apache/couchdb

An open-source, scalable, and fault-tolerant NoSQL database with a focus on reliability and offline-first design.

+62
+0.9%
6.8K
total stars
#508
man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.

+62
+2.9%
2.2K
total stars
#509
attaswift/BTree

A fast, in-memory B-tree implementation for sorted collections in Swift.

+62
+4.9%
1.3K
total stars
#510
deanmalmgren/textract

A Python library that provides a simple, unified interface for extracting text from a wide range of document formats.

+61
+1.4%
4.5K
total stars
#511
rilldata/rill

Rill is a tool for transforming data sets into powerful dashboards using SQL, enabling BI-as-code.

+61
+2.5%
2.5K
total stars
#512
supabase/etl

A real-time Postgres data replication and streaming library built in Rust for building CDC pipelines.

+61
+2.9%
2.2K
total stars
#513
heibaiying/BigData-Notes

A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.

+60
+0.4%
16.9K
total stars
#514
pentaho/pentaho-kettle

Pentaho Data Integration (ETL) is a Java-based tool for building data integration and ETL pipelines.

+60
+0.7%
8.3K
total stars
#515
paul-buerkner/brms

An R package for fitting Bayesian generalized multivariate non-linear multilevel models using Stan.

+60
+4.5%
1.4K
total stars
#516
spark-examples/pyspark-examples

A collection of PySpark examples covering RDD, DataFrame, and Dataset operations in Python.

+60
+4.7%
1.3K
total stars
#517
armink/FlashDB

An ultra-lightweight database that supports key-value and time series data for embedded and IoT applications.

+59
+2.5%
2.4K
total stars
#518
Cysharp/MasterMemory

An embedded, typed, read-only in-memory document database for C#, built on source generators.

+59
+3.4%
1.8K
total stars
#519
apache/flink-cdc

Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.

+58
+0.9%
6.4K
total stars
#520
biopython/biopython

Biopython is a set of Python modules that provide a wide range of functionality for bioinformatics, including DNA/RNA/protein sequence analysis, phylogenetics, and more.

+58
+1.2%
4.9K
total stars
#521
wgzhao/Addax

A fast and versatile ETL tool for transferring data between RDBMS and NoSQL databases.

+58
+4.3%
1.4K
total stars
#522
datazip-inc/olake

A fast open-source data pipeline tool for replicating databases to data lakes in Apache Iceberg format.

+58
+4.7%
1.3K
total stars
#523
prestodb/presto

Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.

+57
+0.3%
16.7K
total stars
#524
jvns/pandas-cookbook

Pandas Cookbook is a collection of recipes for using Python's powerful data analysis library, Pandas.

+57
+0.8%
7.0K
total stars
#525
filodb/FiloDB

A distributed, scalable Prometheus-compatible time series database written in Scala.

+57
+4.1%
1.5K
total stars
#526
eralchemy/eralchemy

A Python tool that generates Entity Relationship Diagrams (ERDs) from SQLAlchemy models.

+57
+4.2%
1.4K
total stars
#527
mycelial/mycelite

Mycelite is a SQLite extension that enables replication between SQLite instances.

+57
+5.5%
1.1K
total stars
#528
cbailes/awesome-deep-trading

A curated list of resources for machine learning-based algorithmic trading and quantitative finance.

+56
+3.1%
1.8K
total stars
#529
mkazhdan/PoissonRecon

Poisson Surface Reconstruction is a C++ library for reconstructing surfaces from point cloud data.

+56
+3.2%
1.8K
total stars
#530
bashtage/arch

A comprehensive Python library for modeling and forecasting financial time series data using ARCH models.

+55
+3.8%
1.5K
total stars
#531
scylladb/gocqlx

A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.

+55
+5.7%
1.0K
total stars
#532
neilotoole/sq

sq is a Go-based data wrangling tool that supports a variety of data formats and databases.

+54
+2.3%
2.5K
total stars
#533
data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

+54
+2.9%
1.9K
total stars
#534
dask/dask-tutorial

An interactive tutorial for the Dask distributed computing library, focused on data analysis and manipulation.

+54
+3.0%
1.9K
total stars
#535
golang/leveldb

The LevelDB key-value database in the Go programming language.

+54
+4.9%
1.2K
total stars
#536
Azure/AzurePublicDataset

A repository of public Microsoft Azure trace datasets, accompanied by Jupyter notebooks for exploring them.

+54
+5.2%
1.1K
total stars
#537
kblin/ncbi-genome-download

Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.

+54
+5.3%
1.1K
total stars
#538
caj2pdf/caj2pdf

A Python tool that converts CAJ (China Academic Journals) files to PDF, useful for developers who work with academic literature.

+52
+1.7%
3.2K
total stars
#539
huggingface/datatrove

A Python library that provides a set of customizable pipeline processing blocks for data processing tasks.

+52
+1.8%
2.9K
total stars
#540
rordenlab/dcm2niix

A DICOM to NIfTI converter for medical imaging research and neuroimaging applications.

+52
+4.8%
1.1K
total stars
#541
wangzhiwubigdata/God-Of-BigData

A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.

+51
+0.5%
10.4K
total stars
#542
sqldelight/sqldelight

SQLDelight generates type-safe Kotlin APIs from SQL statements, simplifying database access in Kotlin projects.

+51
+0.8%
6.8K
total stars
#543
zarr-developers/zarr-python

A Python implementation of Zarr, a format for chunked, compressed N-dimensional arrays, useful for data scientists and ML engineers.

+51
+2.7%
1.9K
total stars
#544
uber-archive/AthenaX

A scalable, SQL-based streaming analytics platform from Uber, built on top of Apache Flink.

+51
+4.3%
1.2K
total stars
#545
google/cluster-data

Borg cluster trace datasets from Google, useful for researchers and developers working on distributed systems and cloud infrastructure.

+51
+5.2%
1.0K
total stars
#546
arangodb/arangodb

ArangoDB is a multi-model database supporting documents, graphs, and key-values for high-performance applications.

+50
+0.4%
14.1K
total stars
#547
Visualize-ML/Book6_First-Course-in-Data-Science

A book on data science, covering topics from basic math to machine learning using Python and Jupyter Notebooks.

+50
+1.9%
2.6K
total stars
#548
apache/druid

Apache Druid is a high-performance, real-time analytics database for data-intensive applications.

+49
+0.3%
14.0K
total stars
#549
mgramin/awesome-db-tools

A curated list of awesome database tools and resources to make working with databases easier.

+49
+1.0%
5.0K
total stars
#550
apache/auron

The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.

+49
+2.9%
1.7K
total stars