Trending Projects

Discover the fastest growing open source projects

Showing 751-800 of 897 trending projects

#751

mysql2sqlite/mysql2sqlite

Converts MySQL database dumps to SQLite3 compatible formats for easier migration and data portability.

+15

+0.8%

2.0K

total stars

Awk

#752

cnosdb/cnosdb

A high-performance, highly available, and distributed time series database written in Rust.

+15

+0.9%

1.7K

total stars

Rust

#753

capitalone/DataProfiler

A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.

+15

+1.0%

1.5K

total stars

Python

#754

andrewgbruce/statistics-for-data-scientists

This repository provides code and data for a book on statistics for data scientists.

+15

+1.3%

1.2K

total stars

#755

neo4j-contrib/neo4j-apoc-procedures

A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.

+14

+0.8%

1.9K

total stars

Java

#756

skaiworldwide-oss/agensgraph

AgensGraph is a transactional graph database based on PostgreSQL for enterprise-level applications.

+14

+1.0%

1.5K

total stars

#757

PKUJohnson/OpenData

An open-source financial data extraction tool that provides simple API access to data scraped from various websites.

+14

+1.0%

1.4K

total stars

Python

#758

numetriclabz/awesome-db

A curated list of awesome database libraries, resources, and tools for developers.

+14

+1.1%

1.3K

total stars

#759

sajal2692/data-science-portfolio

A portfolio of data science projects covering machine learning, NLP, and more for personal and academic use.

+14

+1.2%

1.2K

total stars

Jupyter Notebook

#760

wannesm/dtaidistance

A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.

+14

+1.2%

1.2K

total stars

Python

#761

citusdata/postgresql-hll

A PostgreSQL extension that adds HyperLogLog data structures as a native data type.

+14

+1.2%

1.2K

total stars

#762

calogica/dbt-expectations

A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.

+14

+1.2%

1.2K

total stars

Shell

#763

pydata/bottleneck

A fast, efficient C extension for NumPy that provides optimized array functions.

+14

+1.2%

1.2K

total stars

Python

#764

robjhyndman/forecast

A time series forecasting library for R, providing a wide range of models and tools for accurate predictions.

+14

+1.2%

1.2K

total stars

#765

apache/accumulo

Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.

+14

+1.3%

1.1K

total stars

Java

#766

CSSEGISandData/COVID-19

Real-time global and U.S. COVID-19 data tracking for developers and researchers.

+13

+0.0%

29.0K

total stars

#767

airbnb/knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professionals.

+13

+0.2%

5.5K

total stars

Python

#768

sripathikrishnan/redis-rdb-tools

A Python tool to parse Redis dump.rdb files, analyze memory usage, and export data to JSON.

+13

+0.3%

5.2K

total stars

Python

#769

gopherdata/gophernotes

The Go kernel for Jupyter notebooks and nteract, enabling data science and numerical computing in Go.

+13

+0.3%

4.0K

total stars

#770

fugue-project/fugue

A unified interface for distributed computing on Spark, Dask and Ray without any rewrites.

+13

+0.6%

2.1K

total stars

Python

#771

yhilpisch/py4fi

A Python library for financial applications.

+13

+0.7%

1.9K

total stars

Jupyter Notebook

#772

fonnesbeck/statistical-analysis-python-tutorial

A tutorial for performing statistical data analysis using Python, covering topics like regression, hypothesis testing, and more.

+13

+0.8%

1.7K

total stars

HTML

#773

getdozer/dozer

Dozer is a real-time data movement tool that leverages CDC to move data between various sources and sinks.

+13

+0.8%

1.6K

total stars

Rust

#774

nicodv/kmodes

Python library for clustering categorical data using k-modes and k-prototypes algorithms.

+13

+1.0%

1.3K

total stars

Python

#775

BlakeRMills/MetBrewer

A color palette package in R inspired by works at the Metropolitan Museum of Art in New York.

+13

+1.1%

1.2K

total stars

#776

facebook/mysql-5.6

This is Facebook's branch of the Oracle MySQL database, including the MyRocks storage engine.

+12

+0.5%

2.6K

total stars

C++

#777

dblalock/bolt

A fast C++ library for high-performance matrix and vector operations.

+12

+0.5%

2.5K

total stars

C++

#778

apache/bookkeeper

Apache BookKeeper is a scalable, fault tolerant and low latency storage service optimized for append-only workloads.

+12

+0.6%

2.0K

total stars

Java

#779

locationtech/geomesa

GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.

+12

+0.8%

1.5K

total stars

Scala

#780

tidyverse/tidyr

tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.

+12

+0.8%

1.4K

total stars

#781

xitongsys/parquet-go

A pure Go library for reading and writing Parquet files, a columnar data format.

+12

+0.8%

1.4K

total stars

#782

petl-developers/petl

A Python library for extracting, transforming, and loading tabular data.

+12

+0.9%

1.3K

total stars

Python

#783

mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.

+12

+0.9%

1.3K

total stars

Jupyter Notebook

#784

EntilZha/PyFunctional

A Python library for creating data processing pipelines using functional programming principles.

+11

+0.4%

2.5K

total stars

Python

#785

spandanb/learndb-py

A Python library that implements database internals from scratch, useful for learning database concepts.

+11

+0.8%

1.3K

total stars

Python

#786

rocketlaunchr/dataframe-go

A data science and machine learning library for Go, providing DataFrame functionality similar to Python's Pandas.

+11

+0.9%

1.3K

total stars

#787

kevwan/go-stash

A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.

+11

+0.9%

1.2K

total stars

#788

farzaa/gemini-bball

This is a Python library focused on basketball analytics and data processing.

+11

+1.0%

1.2K

total stars

Python

#789

microsoft/azuredatastudio

Azure Data Studio is a data management and development tool with connectivity to popular cloud and on-premises databases.

+10

+0.1%

7.7K

total stars

TypeScript

#790

jitsucom/jitsu

Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.

+10

+0.2%

4.7K

total stars

TypeScript

#791

multiprocessio/datastation

A versatile app for querying, scripting, and visualizing data from various databases, files, and APIs.

+10

+0.3%

3.0K

total stars

TypeScript

#792

h2oai/datatable

A high-performance, memory-efficient Python data analysis library for handling large datasets.

+10

+0.5%

1.9K

total stars

C++

#793

alibaba/MongoShake

MongoShake is a universal data replication platform based on MongoDB's oplog, enabling redundant replication and active-active replication.

+10

+0.6%

1.8K

total stars

#794

EliotAndres/kaggle-past-solutions

A searchable compilation of Kaggle past solutions for data science and machine learning developers.

+10

+0.7%

1.5K

total stars

HTML

#795

wainshine/Company-Names-Corpus

A corpus of company names, abbreviations, and brands that can be used for Chinese text segmentation and entity recognition.

+10

+0.8%

1.3K

total stars

#796

schematics/schematics

Python data structures library focused on serialization, deserialization, and validation of complex data schemas.

+0.3%

2.6K

total stars

Python

#797

AileenNielsen/TimeSeriesAnalysisWithPython

A Jupyter Notebook repository focused on time series analysis using Python.

+0.5%

1.9K

total stars

Jupyter Notebook

#798

Kyubyong/numpy_exercises

A repository of NumPy exercises for developers looking to improve their Python and data manipulation skills.

+0.5%

1.7K

total stars

Python

#799

topepo/caret

An R package for training and plotting classification and regression models.

+0.5%

1.7K

total stars

#800

tylertreat/BoomFilters

Performant probabilistic data structures for processing continuous, unbounded streams in Go.

+0.6%

1.6K

total stars
