Trending Projects

Discover the fastest growing open source projects

Showing 701-750 of 897 trending projects

#701
benedekrozemberczki/awesome-community-detection

A curated list of community detection research papers with implementations for data science and network analysis.

+53
+2.2%
2.4K
total stars
#702
capitalone/DataProfiler

A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.

+53
+3.5%
1.5K
total stars
#703
alecthw/mmdb_china_ip_list

A library for generating MaxMind GeoIP2 databases for China IP addresses.

+53
+5.0%
1.1K
total stars
#704
huggingface/datatrove

A Python library that provides a set of customizable pipeline processing blocks for data processing tasks.

+52
+1.8%
2.9K
total stars
#705
calogica/dbt-expectations

A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.

+52
+4.5%
1.2K
total stars
#706
MongoEngine/mongoengine

MongoEngine is a Python Object-Document-Mapper (ODM) for working with MongoDB databases.

+51
+1.2%
4.4K
total stars
#707
h2oai/datatable

A high-performance, memory-efficient Python data analysis library for handling large datasets.

+51
+2.8%
1.9K
total stars
#708
hadley/ggplot2-book

Source for the book on ggplot2, a data visualization library for R that provides elegant and flexible graphics.

+51
+3.2%
1.7K
total stars
#709
pyjanitor-devs/pyjanitor

A Python library for cleaning and transforming data, inspired by the R package Janitor.

+51
+3.6%
1.5K
total stars
#710
shuttle-hq/synth

Synth is a Rust library for generating realistic, randomized test data for applications and databases.

+51
+3.6%
1.5K
total stars
#711
obspy/obspy

A Python toolbox for seismology and seismological observatories, providing tools for data processing and analysis.

+51
+4.1%
1.3K
total stars
#712
caserec/Datasets-for-Recommender-Systems

A repository of datasets for building and evaluating recommender systems.

+51
+4.9%
1.1K
total stars
#713
meta-pytorch/data

A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.

+50
+4.2%
1.2K
total stars
#714
pyexcel/pyexcel

A Python library for reading, manipulating, and writing data in various spreadsheet file formats.

+49
+4.0%
1.3K
total stars
#715
rocketlaunchr/dataframe-go

A data science and machine learning library for Go, providing DataFrame functionality similar to Python's Pandas.

+49
+4.0%
1.3K
total stars
#716
docker-library/mongo

Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.

+49
+4.8%
1.1K
total stars
#717
numetriclabz/awesome-db

A curated list of awesome database libraries, resources, and tools for developers.

+48
+3.7%
1.3K
total stars
#718
aditya-grover/node2vec

A reference implementation of the node2vec algorithm for embedding graphs, with Python and Spark (Scala) versions.

+47
+1.8%
2.7K
total stars
#719
chezou/tabula-py

A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.

+47
+2.1%
2.3K
total stars
#720
oracle-samples/oracle-db-examples

This repository provides code examples for Oracle's AI-enabled database features and integrations.

+47
+3.5%
1.4K
total stars
#721
patx/pickledb

An in-memory key-value store for Python that uses the orjson library for persistence, with SQLite support.

+47
+4.6%
1.1K
total stars
#722
griddb/griddb

GridDB is a fast and scalable open-source database for time-series IoT and big data applications.

+46
+1.9%
2.5K
total stars
#723
r-spatial/sf

An R package that provides support for simple features, a standardized way to encode spatial vector data.

+46
+3.3%
1.4K
total stars
#724
realm/realm-java

Realm is a mobile database that serves as a replacement for SQLite and ORMs.

+45
+0.4%
11.5K
total stars
#725
dbeaver/cloudbeaver

Cloud-based database manager UI for querying, managing, and visualizing databases across multiple platforms.

+45
+1.0%
4.7K
total stars
#726
andrewgbruce/statistics-for-data-scientists

This repository provides code and data for a book on statistics for data scientists.

+45
+3.9%
1.2K
total stars
#727
pydata/bottleneck

A fast, efficient C extension for NumPy that provides optimized array functions.

+45
+4.0%
1.2K
total stars
#728
abhishek-ch/around-dataengineering

A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.

+45
+4.1%
1.1K
total stars
#729
mysql/mysql-server

Open-source relational database engine powering web apps, APIs, and data-driven backends worldwide.

+44
+0.4%
12.2K
total stars
#730
liam-hq/liam

Automatically generates beautiful and easy-to-read ER diagrams from your database.

+44
+0.9%
4.7K
total stars
#731
ankane/groupdate

A Ruby library that makes it easy to group temporal data, useful for developers working with time-series data.

+44
+1.1%
3.9K
total stars
#732
citusdata/postgresql-hll

A PostgreSQL extension that adds HyperLogLog data structures as a native data type.

+44
+3.8%
1.2K
total stars
#733
JuliaStats/Distributions.jl

A comprehensive Julia library for probability distributions and related statistical functions.

+44
+3.9%
1.2K
total stars
#734
enthought/mayavi

A powerful 3D visualization library for scientific data in Python.

+43
+3.2%
1.4K
total stars
#735
PKUJohnson/OpenData

An open-source financial data extraction tool that provides a simple API for scraping data from various websites.

+43
+3.3%
1.4K
total stars
#736
airbnb/knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professionals.

+42
+0.8%
5.5K
total stars
#737
alibaba/MongoShake

MongoShake is a universal data replication platform based on MongoDB's oplog, enabling redundant replication and active-active replication.

+42
+2.4%
1.8K
total stars
#738
mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.

+42
+3.4%
1.3K
total stars
#739
liucongg/NLPDataSet

A repository containing various NLP datasets collected and organized by the owner.

+42
+4.0%
1.1K
total stars
#740
neozhaoliang/pywonderland

A Python library that provides a tour of the wonderland of math with visualizations and algorithms.

+41
+1.0%
4.2K
total stars
#741
multiprocessio/datastation

A versatile app for querying, scripting, and visualizing data from various databases, files, and APIs.

+41
+1.4%
3.0K
total stars
#742
sajal2692/data-science-portfolio

A portfolio of data science projects covering machine learning, NLP, and more for personal and academic use.

+41
+3.5%
1.2K
total stars
#743
dataquestio/project-walkthroughs

A collection of data science, machine learning, and web development project code for Dataquest's YouTube channel.

+41
+3.9%
1.1K
total stars
#744
wesm/msgvault

Archive, search, and analyze your entire email/chat history offline with DuckDB-powered analytics and AI queries.

+40
+3.2%
1.3K
total stars
#745
8080labs/ppscore

A Python library that provides a Predictive Power Score (PPS) to measure the predictive power between variables.

+40
+3.5%
1.2K
total stars
#746
apachecn/python_data_analysis_and_mining_action

This Python repository contains code examples and notes for data analysis and mining.

+37
+2.1%
1.8K
total stars
#747
cnosdb/cnosdb

A high-performance, highly available, and distributed time series database written in Rust.

+37
+2.2%
1.7K
total stars
#748
nicodv/kmodes

Python library for clustering categorical data using k-modes and k-prototypes algorithms.

+37
+3.0%
1.3K
total stars
#749
nakabonne/tstorage

An embedded time-series database written in Go for storing and querying metrics data.

+37
+3.1%
1.2K
total stars
#750
datafold/data-diff

A Python library for comparing data across databases, supporting various database engines.

+36
+1.2%
3.0K
total stars
