Trending Projects

Discover the fastest growing open source projects

Showing 851-897 of 897 trending projects

#851
OpenRefine/OpenRefine

OpenRefine is a powerful data cleaning and transformation tool that helps developers work with messy data.

-1
-0.0%
11.8K
total stars
#852
rougier/scientific-visualization-book

An open-access book on scientific visualization using Python and Matplotlib for data-driven developers

-1
-0.0%
11.2K
total stars
#853
modin-project/modin

Modin: Scalable Pandas workflows with a single line of code change, enabling distributed data processing.

-1
-0.0%
10.4K
total stars
#854
jackzhenguo/python-small-examples

A collection of Python code examples and tutorials for data science, machine learning, and web development.

-1
-0.0%
8.1K
total stars
#855
microsoft/azuredatastudio

Azure Data Studio is a data management and development tool with connectivity to popular cloud and on-premises databases.

-1
-0.0%
7.7K
total stars
#856
qinwf/awesome-R

A curated list of awesome R packages, frameworks and software for data analysis and data science.

-1
-0.0%
6.4K
total stars
#857
beamandrew/medical-data

No description provided for this medical data repository.

-1
-0.0%
6.0K
total stars
#858
owid/covid-19-data

COVID-19 data repository for developers, providing daily updated case, death, and testing information.

-1
-0.0%
5.7K
total stars
#859
sripathikrishnan/redis-rdb-tools

A Python tool to parse Redis dump.rdb files, analyze memory usage, and export data to JSON.

-1
-0.0%
5.2K
total stars
#860
SPLWare/esProc

esProc SPL is a JVM-based programming language for structured data computation, serving as both a data analysis tool and an embedded computing engine.

-1
-0.0%
4.7K
total stars
#861
BrambleXu/pydata-notebook

A collection of Jupyter Notebook files for data analysis using Python, including a Chinese translation of the popular 'Python for Data Analysis' book.

-1
-0.0%
4.7K
total stars
#862
first20hours/google-10000-english

This repo contains a list of the 10,000 most common English words, useful for NLP and language modeling tasks.

-1
-0.0%
4.3K
total stars
#863
canonical/dqlite

An embeddable, replicated, and fault-tolerant SQL engine for building robust and scalable applications.

-1
-0.0%
4.3K
total stars
#864
electricitymaps/electricitymaps-contrib

An open-source repository for parsing electricity data and powering a comprehensive electricity data platform.

-1
-0.0%
4.0K
total stars
#865
dpilger26/NumCpp

A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.

-1
-0.0%
3.9K
total stars
#866
xo/dbtpl

A command-line tool to generate idiomatic Go code for SQL databases across multiple database engines.

-1
-0.0%
3.9K
total stars
#867
jtablesaw/tablesaw

A high-performance Java library for data analysis, visualization, and machine learning.

-1
-0.0%
3.7K
total stars
#868
ploomber/ploomber

Ploomber is a fast and versatile tool for building and deploying data pipelines that can be used with a variety of AI and ML tools.

-1
-0.0%
3.6K
total stars
#869
fluentmigrator/fluentmigrator

Fluent Migrator is a .NET migration framework for managing database schema changes across multiple database providers.

-1
-0.0%
3.5K
total stars
#870
WeBankFinTech/DataSphereStudio

DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.

-1
-0.0%
3.3K
total stars
#871
apache/incubator-devlake

An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.

-1
-0.0%
2.9K
total stars
#872
kayak/pypika

PyPika is a Python SQL query builder that provides a readable, Pythonic syntax for constructing complex SQL queries.

-1
-0.0%
2.9K
total stars
#873
posit-dev/great-tables

A Python library for creating easy-to-use, visually appealing data tables and summaries.

-1
-0.0%
2.6K
total stars
#874
FeatureBaseDB/featurebase

FeatureBase is a fast analytical database built on bitmaps, perfect for ML and data-intensive applications.

-1
-0.0%
2.5K
total stars
#875
chezou/tabula-py

A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.

-1
-0.0%
2.3K
total stars
#876
bytewax/bytewax

Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.

-1
-0.1%
2.0K
total stars
#877
brimdata/zui

Zui is a powerful desktop app for exploring and working with data, with support for CSV, JSON, and the Zed data format.

-1
-0.1%
1.9K
total stars
#878
apache/kudu

Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.

-1
-0.1%
1.9K
total stars
#879
matrixorigin/matrixone

Cloud-native, MySQL-compatible, AI-ready database with Git for Data, vector search, and full-text search capabilities.

-1
-0.1%
1.9K
total stars
#880
x-ream/sqli

A Java ORM SQL query builder that supports popular databases like ClickHouse, Impala, MySQL, and Presto.

-1
-0.1%
1.9K
total stars
#881
thbar/kiba

A data processing and ETL (Extract, Transform, Load) framework for Ruby developers.

-1
-0.1%
1.8K
total stars
#882
obspy/obspy

A Python toolbox for seismology and seismological observatories, providing tools for data processing and analysis.

-1
-0.1%
1.3K
total stars
#883
nicodv/kmodes

Python library for clustering categorical data using k-modes and k-prototypes algorithms.

-1
-0.1%
1.3K
total stars
#884
rocketlaunchr/dataframe-go

A data science and machine learning library for Go, providing DataFrame functionality similar to Python's Pandas.

-1
-0.1%
1.3K
total stars
#885
meta-pytorch/data

A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.

-1
-0.1%
1.2K
total stars
#886
pentaho/mondrian

Mondrian is an OLAP server that enables real-time analysis of large data sets for business users.

-1
-0.1%
1.2K
total stars
#887
neumino/thinky

An ORM for RethinkDB that provides an elegant and intuitive API for interacting with the database.

-1
-0.1%
1.1K
total stars
#888
gaarason/database-all

Eloquent ORM for Java 8, 11, 17, 21, 23 and Spring Boot 2.x, 3.x

-1
-0.1%
1.1K
total stars
#889
facebookresearch/cc_net

Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.

-1
-0.1%
1.0K
total stars
#890
sentinelsat/sentinelsat

A Python library for searching and downloading Copernicus Sentinel satellite images for geographic data analysis.

-1
-0.1%
1.0K
total stars
#891
typicode/lowdb

Lightweight local JSON database for JavaScript/TypeScript apps

-2
-0.0%
22.5K
total stars
#892
datastacktv/data-engineer-roadmap

A roadmap for becoming a data engineer.

-2
-0.0%
12.7K
total stars
#893
snowplow/snowplow

A powerful customer data pipeline for collecting, processing, and analyzing user events and behavior.

-2
-0.0%
7.0K
total stars
#894
pudo/dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.

-2
-0.0%
4.9K
total stars
#895
linhandev/dataset

A comprehensive index of medical imaging datasets for researchers and developers working in the medical imaging field.

-2
-0.1%
3.5K
total stars
#896
iskandr/fancyimpute

A Python library providing multivariate imputation and matrix completion algorithms.

-2
-0.2%
1.3K
total stars
#897
SheetJS/sheetjs

SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.

-4
-0.0%
36.2K
total stars