Trending Projects

Discover the fastest-growing open source projects

Showing 351-400 of 897 trending projects

#351

red-data-tools/pycall.rb

A library for calling Python functions from the Ruby language, enabling data science and ML workflows.

+106

+10.6%

1.1K

total stars

#352

pudo/dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.

+105

+2.2%

4.9K

total stars

Python

#353

OvertureMaps/data

Overture Maps Data is a Python library providing access to open-source geographic data.

+105

+10.5%

1.1K

total stars

Python

#354

zvtvz/zvt

A modular quantitative trading framework for algorithmic trading, backtesting, and financial analysis.

+104

+2.7%

4.0K

total stars

Python

#355

uwdata/mosaic

An extensible framework for linking databases and interactive views, focused on scalability and visualization.

+104

+9.0%

1.3K

total stars

TypeScript

#356

datasets/covid-19

This GitHub repository provides time series data on COVID-19 cases, useful for data analysis and visualization.

+104

+9.9%

1.2K

total stars

Python

#357

couchbase/forestdb

A fast, hierarchical key-value storage engine written in C++ for applications that require high performance and scalability.

+103

+8.4%

1.3K

total stars

C++

#358

egbertbouman/youtube-comment-downloader

Simple script for downloading YouTube comments without using the YouTube API.

+103

+9.4%

1.2K

total stars

Python

#359

apachecn/spark-doc-zh

A Chinese translation of the official documentation for Apache Spark, a popular big data processing framework.

+103

+9.5%

1.2K

total stars

JavaScript

#360

cn/GB2260

A Python library for looking up the administrative division codes defined by China's GB/T 2260 standard.

+102

+7.1%

1.5K

total stars

Python

#361

citusdata/postgresql-hll

A PostgreSQL extension that adds HyperLogLog data structures as a native data type.

+102

+9.2%

1.2K

total stars

#362

RedisTimeSeries/RedisTimeSeries

A Redis module that provides a time series data structure for storing and querying time series data.

+102

+10.6%

1.1K

total stars

#363

kedro-org/kedro

Kedro is a Python toolkit for building production-ready data science and machine learning pipelines.

+100

+0.9%

10.8K

total stars

Python

#364

mysql2sqlite/mysql2sqlite

Converts MySQL database dumps to SQLite3 compatible formats for easier migration and data portability.

+100

+5.3%

2.0K

total stars

Awk

#365

opengeospatial/geoparquet

A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.

+99

+10.7%

1.0K

total stars

Python

#366

liyupi/sql-mother

A free, interactive SQL learning platform with an online SQL editor, real-time query results, and syntax highlighting.

+98

+2.5%

4.0K

total stars

TypeScript

#367

microsoft/sql-server-samples

This repository contains code samples for SQL Server, Azure SQL, and related data services from Microsoft.

+97

+0.9%

10.9K

total stars

#368

h2oai/datatable

A high-performance, memory-efficient Python data analysis library for handling large datasets.

+97

+5.4%

1.9K

total stars

C++

#369

DataBrewery/cubes

A lightweight Python OLAP framework for multi-dimensional data analysis and reporting.

+97

+7.0%

1.5K

total stars

Python

#370

eventql/eventql

Distributed, massively parallel SQL query engine for big data analytics and timeseries workloads.

+97

+9.0%

1.2K

total stars

C++

#371

mahmoudparsian/data-algorithms-book

This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.

+97

+9.8%

1.1K

total stars

Java

#372

mattn/go-sqlite3

A lightweight SQLite3 driver for Go that implements the database/sql interface.

+96

+1.1%

9.0K

total stars

#373

OSGeo/gdal

GDAL is an open-source library for working with various geospatial data formats, useful for remote sensing and GIS applications.

+96

+1.7%

5.8K

total stars

C++

#374

TomAugspurger/effective-pandas

A collection of articles and source code on using the pandas data analysis library.

+96

+6.5%

1.6K

total stars

Jupyter Notebook

#375

brandon-rhodes/pycon-pandas-tutorial

A tutorial on pandas, the popular Python data analysis library, presented at PyCon 2015.

+96

+9.8%

1.1K

total stars

Jupyter Notebook

#376

blaze/odo

A Python library for data migration and transformation in the Blaze project.

+96

+10.6%

1.0K

total stars

Python

#377

apache/fluss

Apache Fluss is a real-time streaming storage platform built for big data analytics.

+95

+5.5%

1.8K

total stars

Java

#378

wannesm/dtaidistance

A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.

+95

+8.5%

1.2K

total stars

Python

#379

petewarden/dstk

A collection of open data sets and tools for data science and machine learning tasks.

+95

+9.1%

1.1K

total stars

Ruby

#380

TurboWay/bigdata_analyse

This is a Python project for big data analysis, focusing on HQL, SQL, and data processing.

+94

+1.9%

5.0K

total stars

Python

#381

cmu-db/bustub

An educational relational database management system (RDBMS) implementation in C++.

+94

+2.0%

4.9K

total stars

C++

#382

gtoonstra/etl-with-airflow

This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.

+94

+7.5%

1.4K

total stars

Shell

#383

sryza/spark-timeseries

A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.

+94

+8.5%

1.2K

total stars

Scala

#384

spatie/db-dumper

A PHP library for dumping the contents of a database to a file, supporting multiple database engines.

+94

+8.8%

1.2K

total stars

PHP

#385

traildb/traildb

TrailDB is an efficient database for storing and querying series of events.

+94

+9.4%

1.1K

total stars

#386

iamseancheney/python_for_data_analysis_2nd_chinese_version

A Chinese translation of a popular book on using Python for data analysis with libraries like pandas and NumPy.

+93

+1.1%

8.8K

total stars

#387

cantaro86/Financial-Models-Numerical-Methods

A collection of notebooks covering quantitative finance and numerical methods in Python.

+93

+1.4%

6.7K

total stars

Jupyter Notebook

#388

opendatadiscovery/odd-platform

First open-source data discovery and observability platform for data practitioners.

+93

+7.2%

1.4K

total stars

Java

#389

Toblerity/Fiona

Fiona is a Python library for reading and writing geographic data files, with support for CLI usage.

+93

+8.2%

1.2K

total stars

Python

#390

ddsjoberg/gtsummary

An R package that provides customizable and presentation-ready data summary and analytic result tables.

+93

+8.6%

1.2K

total stars

#391

apachecn/pyda-2e-zh

A Chinese translation of the book 'Python for Data Analysis', 2nd Edition, covering NumPy, pandas, and other data analysis tools.

+93

+9.3%

1.1K

total stars

CSS

#392

Data-Centric-AI-Community/ydata-profiling

A Python library for fast, customizable, and interactive data profiling and exploratory data analysis.

+92

+0.7%

13.4K

total stars

Python

#393

apache/hugegraph

A highly scalable, high-performance graph database that supports over 100 billion data points.

+92

+3.2%

3.0K

total stars

Java

#394

zhihu/kids

A log aggregation system for processing data streams.

+92

+8.1%

1.2K

total stars

C++

#395

apache/shardingsphere

Distributed SQL database middleware for sharding, scalability, and security.

+91

+0.4%

20.7K

total stars

Java

#396

apache/parquet-format

Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.

+91

+4.2%

2.3K

total stars

Thrift

#397

tidyverse/tidyverse

A collection of R packages for data science, including tools for data manipulation, visualization, and modeling.

+91

+5.4%

1.8K

total stars

#398

QueryKit/QueryKit

QueryKit is a simple Core Data query language for Swift and Objective-C developers.

+91

+6.7%

1.5K

total stars

Swift

#399

cozodb/cozo

A transactional, relational-graph-vector database that uses Datalog for query, designed for AI and ML use cases.

+90

+2.4%

3.9K

total stars

Rust

#400

TobikoData/sqlmesh

Scalable and efficient data transformation framework with backwards compatibility with dbt.

+90

+3.2%

2.9K

total stars

Python

