Trending Projects

Discover the fastest-growing open source projects

Showing 651-700 of 897 trending projects

#651
percona/percona-toolkit

Percona Toolkit is a collection of advanced open source database tools for MySQL, MongoDB, and PostgreSQL.

+34
+2.4%
1.5K
total stars
#652
objectbox/objectbox-go

Embedded Go Database, a fast open-source NoSQL database solution for Go projects.

+34
+2.8%
1.3K
total stars
#653
submato/xhscrawl

A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.

+34
+2.8%
1.3K
total stars
#654
avehtari/BDA_py_demos

Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.

+34
+3.4%
1.0K
total stars
#655
inloop/sqlite-viewer

A simple SQLite file viewer that allows you to view and explore SQLite databases online.

+34
+3.4%
1.0K
total stars
#656
cube2222/octosql

OctoSQL is a powerful SQL query tool that allows you to join, analyze, and transform data from multiple databases and file formats.

+33
+0.6%
5.2K
total stars
#657
RoaringBitmap/RoaringBitmap

A high-performance compressed bitset library for Java used in Apache Spark, Netflix Atlas, and others.

+33
+0.9%
3.8K
total stars
#658
itbdw/ip-database

An offline IP database for developers to look up IP address geolocation information.

+33
+2.3%
1.5K
total stars
#659
rordenlab/dcm2niix

A DICOM to NIfTI converter for medical imaging research and neuroimaging applications.

+33
+3.0%
1.1K
total stars
#660
moby/datakit

Connect processes into powerful data pipelines with a simple git-like filesystem interface.

+33
+3.1%
1.1K
total stars
#661
has2k1/plotnine

A grammar of graphics library for creating highly customizable and publication-quality plots in Python.

+32
+0.7%
4.5K
total stars
#662
xtensor-stack/xtensor

A C++ library for multidimensional array operations with broadcasting and lazy computing.

+32
+0.9%
3.7K
total stars
#663
camelot-dev/camelot

A Python library for extracting tabular data from PDF files, useful for data processing and analysis.

+32
+0.9%
3.6K
total stars
#664
ekzhu/datasketch

A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.

+32
+1.1%
2.9K
total stars
#665
rogersce/cnpy

A C++ library for reading and writing .npy and .npz files, commonly used in scientific computing.

+32
+2.2%
1.5K
total stars
#666
hermitdave/FrequencyWords

A frequency word list generator and processed files for text analysis and natural language processing.

+32
+2.3%
1.5K
total stars
#667
AlexTheAnalyst/PortfolioProjects

A collection of portfolio projects for a data analyst.

+32
+2.3%
1.4K
total stars
#668
spandanb/learndb-py

A Python library that implements database internals from scratch, useful for learning database concepts.

+32
+2.5%
1.3K
total stars
#669
AlaSQL/alasql

AlaSQL is a JavaScript SQL database for browser and Node.js that handles both relational tables and nested JSON data.

+31
+0.4%
7.3K
total stars
#670
openaddresses/openaddresses

An open-source global repository of address, building, and parcel data for developers and geospatial applications.

+31
+1.0%
3.1K
total stars
#671
openbabel/openbabel

Open Babel is a chemical toolbox for working with chemical data and cheminformatics.

+31
+2.5%
1.3K
total stars
#672
scylladb/gocqlx

A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.

+31
+3.1%
1.0K
total stars
#673
jupyter/docker-stacks

Docker images containing Jupyter applications for data science and machine learning workflows.

+30
+0.4%
8.4K
total stars
#674
kurrent-io/KurrentDB

KurrentDB is an event-native database designed for modern software and event-driven architectures.

+30
+0.5%
5.7K
total stars
#675
datavane/tis

A Java-based framework for building agile DataOps pipelines using tools like Flink, DataX, and Chunjun with a web UI.

+30
+2.4%
1.3K
total stars
#676
hail-is/hail

Cloud-native genomic dataframes and batch computing for bioinformatics and genetics research.

+30
+2.9%
1.1K
total stars
#677
pydata/pandas-datareader

A Python library for extracting data from a wide range of internet sources into a pandas DataFrame.

+29
+0.9%
3.2K
total stars
#678
igraph/python-igraph

Python interface for the igraph library, a powerful tool for network analysis and visualization.

+29
+2.1%
1.4K
total stars
#679
scratchdata/scratchdata

A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.

+29
+2.7%
1.1K
total stars
#680
apache/phoenix

Apache Phoenix is a scalable, distributed SQL engine that connects to HBase for low-latency queries.

+29
+2.8%
1.1K
total stars
#681
awslabs/deequ

Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.

+28
+0.8%
3.6K
total stars
#682
fluentmigrator/fluentmigrator

Fluent Migrator is a .NET migration framework for managing database schema changes across multiple database providers.

+28
+0.8%
3.5K
total stars
#683
uhub/awesome-matlab

A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.

+28
+1.7%
1.7K
total stars
#684
pysal/pysal

PySAL is a Python Spatial Analysis Library meta-package for geographical data analysis and modeling.

+28
+1.9%
1.5K
total stars
#685
opengeos/Awesome-GEE

A curated list of Google Earth Engine resources for geospatial analysis and remote sensing applications.

+28
+2.4%
1.2K
total stars
#686
xiaoxu193/PyTeaser

A Python library that summarizes news articles by extracting the most important sentences.

+28
+2.5%
1.2K
total stars
#687
cyang-kth/fmm

An open-source C++ framework for fast and parallel map matching of GPS trajectories.

+28
+2.8%
1.0K
total stars
#688
DotNetNext/SqlSugar

A powerful, multi-database ORM for .NET that supports a wide range of SQL databases and provides a seamless data access layer.

+27
+0.5%
5.8K
total stars
#689
apache/hbase

Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.

+27
+0.5%
5.6K
total stars
#690
amundsen-io/amundsen

Amundsen is an open-source data discovery platform for improving productivity of data analysts and engineers.

+27
+0.6%
4.7K
total stars
#691
jdorfman/awesome-json-datasets

A curated list of awesome JSON datasets that don't require authentication.

+27
+0.8%
3.6K
total stars
#692
ApsaraDB/PolarDB-for-PostgreSQL

A cloud-native PostgreSQL database developed by Alibaba Cloud for high-performance, scalable data storage and management.

+27
+0.9%
3.1K
total stars
#693
orium/rpds

A Rust library that provides persistent data structures for efficient and immutable data management.

+27
+1.6%
1.7K
total stars
#694
DrTimothyAldenDavis/SuiteSparse

A powerful suite of sparse matrix algorithms and libraries for scientific and numerical computing.

+27
+1.9%
1.5K
total stars
#695
event-driven-io/Pongo

Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.

+27
+2.0%
1.4K
total stars
#696
uwdata/mosaic

An extensible framework for linking databases and interactive views, focused on scalability and visualization.

+27
+2.2%
1.3K
total stars
#697
nakabonne/tstorage

An embedded time-series database written in Go for storing and querying metrics data.

+27
+2.2%
1.2K
total stars
#698
YuLab-SMU/clusterProfiler

A comprehensive enrichment analysis tool for interpreting omics data, with support for GO, KEGG, and more.

+27
+2.4%
1.2K
total stars
#699
datastacktv/data-engineer-roadmap

A roadmap for becoming a data engineer.

+26
+0.2%
12.7K
total stars
#700
wainshine/Chinese-Names-Corpus

A Chinese name corpus and generator for natural language processing and entity recognition.

+26
+0.6%
4.3K
total stars