Trending Projects

Discover the fastest growing open source projects

Showing 351-400 of 897 trending projects

#351
red-data-tools/pycall.rb

A library for calling Python functions from the Ruby language, enabling data science and ML workflows.

+106
+10.6%
1.1K
total stars
#352
pudo/dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.

+105
+2.2%
4.9K
total stars
#353
OvertureMaps/data

Overture Maps Data is a Python library providing access to open-source geographic data.

+105
+10.5%
1.1K
total stars
#354
zvtvz/zvt

A modular quantitative trading framework for algorithmic trading, backtesting, and financial analysis.

+104
+2.7%
4.0K
total stars
#355
uwdata/mosaic

An extensible framework for linking databases and interactive views, focused on scalability and visualization.

+104
+9.0%
1.3K
total stars
#356
datasets/covid-19

This GitHub repository provides time series data on COVID-19 cases, useful for data analysis and visualization.

+104
+9.9%
1.2K
total stars
#357
couchbase/forestdb

A fast, hierarchical key-value storage engine written in C++ for applications that require high performance and scalability.

+103
+8.4%
1.3K
total stars
#358
egbertbouman/youtube-comment-downloader

Simple script for downloading YouTube comments without using the YouTube API.

+103
+9.4%
1.2K
total stars
#359
apachecn/spark-doc-zh

A Chinese translation of the official documentation for Apache Spark, a popular big data processing framework.

+103
+9.5%
1.2K
total stars
#360
cn/GB2260

A Python library for looking up the administrative division codes defined by China's GB/T 2260 standard.

+102
+7.1%
1.5K
total stars
#361
citusdata/postgresql-hll

A PostgreSQL extension that adds HyperLogLog data structures as a native data type.

+102
+9.2%
1.2K
total stars
#362
RedisTimeSeries/RedisTimeSeries

A Redis module that provides a time series data structure for storing and querying time series data.

+102
+10.6%
1.1K
total stars
#363
kedro-org/kedro

Kedro is a Python toolkit for building production-ready data science and machine learning pipelines.

+100
+0.9%
10.8K
total stars
#364
mysql2sqlite/mysql2sqlite

Converts MySQL database dumps to SQLite3 compatible formats for easier migration and data portability.

+100
+5.3%
2.0K
total stars
#365
opengeospatial/geoparquet

A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.

+99
+10.7%
1.0K
total stars
#366
liyupi/sql-mother

A free, interactive SQL learning platform with an online SQL editor, real-time query results, and syntax highlighting.

+98
+2.5%
4.0K
total stars
#367
microsoft/sql-server-samples

This repository contains code samples for SQL Server, Azure SQL, and related data services from Microsoft.

+97
+0.9%
10.9K
total stars
#368
h2oai/datatable

A high-performance, memory-efficient Python data analysis library for handling large datasets.

+97
+5.4%
1.9K
total stars
#369
DataBrewery/cubes

A lightweight Python OLAP framework for multi-dimensional data analysis and reporting.

+97
+7.0%
1.5K
total stars
#370
eventql/eventql

Distributed, massively parallel SQL query engine for big data analytics and timeseries workloads.

+97
+9.0%
1.2K
total stars
#371
mahmoudparsian/data-algorithms-book

This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.

+97
+9.8%
1.1K
total stars
#372
mattn/go-sqlite3

A lightweight SQLite3 driver for Go that implements the database/sql interface.

+96
+1.1%
9.0K
total stars
#373
OSGeo/gdal

GDAL is an open-source library for working with various geospatial data formats, useful for remote sensing and GIS applications.

+96
+1.7%
5.8K
total stars
#374
TomAugspurger/effective-pandas

A collection of articles and source code on using the pandas data analysis library.

+96
+6.5%
1.6K
total stars
#375
brandon-rhodes/pycon-pandas-tutorial

A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.

+96
+9.8%
1.1K
total stars
#376
blaze/odo

A Python library for data migration and transformation in the Blaze project.

+96
+10.6%
1.0K
total stars
#377
apache/fluss

Apache Fluss is a real-time streaming storage platform built for big data analytics.

+95
+5.5%
1.8K
total stars
#378
wannesm/dtaidistance

A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.

+95
+8.5%
1.2K
total stars
#379
petewarden/dstk

A collection of open data sets and tools for data science and machine learning tasks.

+95
+9.1%
1.1K
total stars
#380
TurboWay/bigdata_analyse

This is a Python project for big data analysis, focusing on HQL, SQL, and data processing.

+94
+1.9%
5.0K
total stars
#381
cmu-db/bustub

An educational relational database management system (RDBMS) implementation in C++.

+94
+2.0%
4.9K
total stars
#382
gtoonstra/etl-with-airflow

This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.

+94
+7.5%
1.4K
total stars
#383
sryza/spark-timeseries

A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.

+94
+8.5%
1.2K
total stars
#384
spatie/db-dumper

A PHP library for dumping the contents of a database to a file, supporting multiple database engines.

+94
+8.8%
1.2K
total stars
#385
traildb/traildb

TrailDB is an efficient database for storing and querying series of events.

+94
+9.4%
1.1K
total stars
#386
iamseancheney/python_for_data_analysis_2nd_chinese_version

A Chinese translation of a popular book on using Python for data analysis with libraries like pandas and NumPy.

+93
+1.1%
8.8K
total stars
#387
cantaro86/Financial-Models-Numerical-Methods

A collection of notebooks covering quantitative finance and numerical methods in Python.

+93
+1.4%
6.7K
total stars
#388
opendatadiscovery/odd-platform

First open-source data discovery and observability platform for data practitioners.

+93
+7.2%
1.4K
total stars
#389
Toblerity/Fiona

Fiona is a Python library for reading and writing geographic data files, with support for CLI usage.

+93
+8.2%
1.2K
total stars
#390
ddsjoberg/gtsummary

An R package that provides customizable and presentation-ready data summary and analytic result tables.

+93
+8.6%
1.2K
total stars
#391
apachecn/pyda-2e-zh

A Chinese translation of the book 'Python for Data Analysis' 2nd Edition, covering NumPy, Pandas, and other data analysis tools.

+93
+9.3%
1.1K
total stars
#392
Data-Centric-AI-Community/ydata-profiling

A Python library for fast, customizable, and interactive data profiling and exploratory data analysis.

+92
+0.7%
13.4K
total stars
#393
apache/hugegraph

A highly scalable, high-performance graph database that supports over 100 billion data points.

+92
+3.2%
3.0K
total stars
#394
zhihu/kids

A C++ library for processing data streams.

+92
+8.1%
1.2K
total stars
#395
apache/shardingsphere

Distributed SQL database middleware for sharding, scalability, and security.

+91
+0.4%
20.7K
total stars
#396
apache/parquet-format

Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.

+91
+4.2%
2.3K
total stars
#397
tidyverse/tidyverse

A collection of R packages for data science, including tools for data manipulation, visualization, and modeling.

+91
+5.4%
1.8K
total stars
#398
QueryKit/QueryKit

QueryKit is a simple CoreData query language for Swift and Objective-C developers.

+91
+6.7%
1.5K
total stars
#399
cozodb/cozo

A transactional, relational-graph-vector database that uses Datalog for query, designed for AI and ML use cases.

+90
+2.4%
3.9K
total stars
#400
TobikoData/sqlmesh

Scalable and efficient data transformation framework with backwards compatibility for dbt.

+90
+3.2%
2.9K
total stars