Trending Projects

Discover the fastest growing open source projects

Showing 751-800 of 897 trending projects

#751
mysql2sqlite/mysql2sqlite

Converts MySQL database dumps to a SQLite3-compatible format for easier migration and data portability.

+15
+0.8%
2.0K
total stars
#752
cnosdb/cnosdb

A high-performance, highly available, and distributed time series database written in Rust.

+15
+0.9%
1.7K
total stars
#753
capitalone/DataProfiler

A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.

+15
+1.0%
1.5K
total stars
#754
andrewgbruce/statistics-for-data-scientists

This repository provides code and data for a book on statistics for data scientists.

+15
+1.3%
1.2K
total stars
#755
neo4j-contrib/neo4j-apoc-procedures

A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.

+14
+0.8%
1.9K
total stars
#756
skaiworldwide-oss/agensgraph

AgensGraph is a transactional graph database based on PostgreSQL for enterprise-level applications.

+14
+1.0%
1.5K
total stars
#757
PKUJohnson/OpenData

An open-source financial data extraction tool that provides easy API access to data scraped from various websites.

+14
+1.0%
1.4K
total stars
#758
numetriclabz/awesome-db

A curated list of awesome database libraries, resources, and tools for developers.

+14
+1.1%
1.3K
total stars
#759
sajal2692/data-science-portfolio

A portfolio of data science projects covering machine learning, NLP, and more for personal and academic use.

+14
+1.2%
1.2K
total stars
#760
wannesm/dtaidistance

A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.

+14
+1.2%
1.2K
total stars
#761
citusdata/postgresql-hll

A PostgreSQL extension that adds HyperLogLog data structures as a native data type.

+14
+1.2%
1.2K
total stars
#762
calogica/dbt-expectations

A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.

+14
+1.2%
1.2K
total stars
#763
pydata/bottleneck

A fast, efficient C extension for NumPy that provides optimized array functions.

+14
+1.2%
1.2K
total stars
#764
robjhyndman/forecast

A time series forecasting library for R, providing a wide range of models and tools for accurate predictions.

+14
+1.2%
1.2K
total stars
#765
apache/accumulo

Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.

+14
+1.3%
1.1K
total stars
#766
CSSEGISandData/COVID-19

Global and U.S. COVID-19 case data tracking for developers and researchers.

+13
+0.0%
29.0K
total stars
#767
airbnb/knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professionals.

+13
+0.2%
5.5K
total stars
#768
sripathikrishnan/redis-rdb-tools

A Python tool to parse Redis dump.rdb files, analyze memory usage, and export data to JSON.

+13
+0.3%
5.2K
total stars
#769
gopherdata/gophernotes

The Go kernel for Jupyter notebooks and nteract, enabling data science and numerical computing in Go.

+13
+0.3%
4.0K
total stars
#770
fugue-project/fugue

A unified interface for distributed computing on Spark, Dask and Ray without any rewrites.

+13
+0.6%
2.1K
total stars
#771
yhilpisch/py4fi

A Python library for financial applications.

+13
+0.7%
1.9K
total stars
#772
fonnesbeck/statistical-analysis-python-tutorial

A tutorial for performing statistical data analysis using Python, covering topics like regression, hypothesis testing, and more.

+13
+0.8%
1.7K
total stars
#773
getdozer/dozer

Dozer is a real-time data movement tool that leverages CDC to move data between various sources and sinks.

+13
+0.8%
1.6K
total stars
#774
nicodv/kmodes

Python library for clustering categorical data using k-modes and k-prototypes algorithms.

+13
+1.0%
1.3K
total stars
#775
BlakeRMills/MetBrewer

A color palette package in R inspired by works at the Metropolitan Museum of Art in New York.

+13
+1.1%
1.2K
total stars
#776
facebook/mysql-5.6

This is Facebook's branch of the Oracle MySQL database, including the MyRocks storage engine.

+12
+0.5%
2.6K
total stars
#777
dblalock/bolt

A fast C++ library for high-performance matrix and vector operations.

+12
+0.5%
2.5K
total stars
#778
apache/bookkeeper

Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service optimized for append-only workloads.

+12
+0.6%
2.0K
total stars
#779
locationtech/geomesa

GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.

+12
+0.8%
1.5K
total stars
#780
tidyverse/tidyr

tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.

+12
+0.8%
1.4K
total stars
#781
xitongsys/parquet-go

A pure Go library for reading and writing Parquet files, a columnar data format.

+12
+0.8%
1.4K
total stars
#782
petl-developers/petl

A Python library for extracting, transforming, and loading tabular data.

+12
+0.9%
1.3K
total stars
#783
mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.

+12
+0.9%
1.3K
total stars
#784
EntilZha/PyFunctional

A Python library for creating data processing pipelines using functional programming principles.

+11
+0.4%
2.5K
total stars
#785
spandanb/learndb-py

A Python library that implements database internals from scratch, useful for learning database concepts.

+11
+0.8%
1.3K
total stars
#786
rocketlaunchr/dataframe-go

A data science and machine learning library for Go, providing DataFrame functionality similar to Python's Pandas.

+11
+0.9%
1.3K
total stars
#787
kevwan/go-stash

A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.

+11
+0.9%
1.2K
total stars
#788
farzaa/gemini-bball

This is a Python library focused on basketball analytics and data processing.

+11
+1.0%
1.2K
total stars
#789
microsoft/azuredatastudio

Azure Data Studio is a data management and development tool with connectivity to popular cloud and on-premises databases.

+10
+0.1%
7.7K
total stars
#790
jitsucom/jitsu

Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.

+10
+0.2%
4.7K
total stars
#791
multiprocessio/datastation

A versatile app for querying, scripting, and visualizing data from various databases, files, and APIs.

+10
+0.3%
3.0K
total stars
#792
h2oai/datatable

A high-performance, memory-efficient Python data analysis library for handling large datasets.

+10
+0.5%
1.9K
total stars
#793
alibaba/MongoShake

MongoShake is a universal data replication platform based on MongoDB's oplog, enabling redundant replication and active-active replication.

+10
+0.6%
1.8K
total stars
#794
EliotAndres/kaggle-past-solutions

A searchable compilation of Kaggle past solutions for data science and machine learning developers.

+10
+0.7%
1.5K
total stars
#795
wainshine/Company-Names-Corpus

A corpus of company names, abbreviations, and brands that can be used for Chinese text segmentation and entity recognition.

+10
+0.8%
1.3K
total stars
#796
schematics/schematics

Python data structures library focused on serialization, deserialization, and validation of complex data schemas.

+9
+0.3%
2.6K
total stars
#797
AileenNielsen/TimeSeriesAnalysisWithPython

A Jupyter Notebook repository focused on time series analysis using Python.

+9
+0.5%
1.9K
total stars
#798
Kyubyong/numpy_exercises

A repository of NumPy exercises for developers looking to improve their Python and data manipulation skills.

+9
+0.5%
1.7K
total stars
#799
topepo/caret

An R package for training and plotting classification and regression models.

+9
+0.5%
1.7K
total stars
#800
tylertreat/BoomFilters

Performant probabilistic data structures for processing continuous, unbounded streams in Go.

+9
+0.6%
1.6K
total stars