Trending Projects

Discover the fastest growing open source projects

Showing 751-800 of 897 trending projects

#751

getdozer/dozer

Dozer is a real-time data movement tool that leverages CDC to move data between various sources and sinks.

+36

+2.3%

1.6K

total stars

Rust

#752

jbmusso/awesome-graph

A curated list of resources for graph databases and graph computing tools, useful for developers working with graph-based data.

+36

+3.0%

1.2K

total stars

#753

eleanorlutz/asteroids_atlas_of_space

This is an astronomy visualization project that maps orbits of asteroids in the solar system.

+35

+2.8%

1.3K

total stars

Jupyter Notebook

#754

PizzaDeDados/datascience-pizza

A repository for collecting study materials and resources related to data analysis and related fields.

+34

+1.4%

2.4K

total stars

#755

AileenNielsen/TimeSeriesAnalysisWithPython

A Jupyter Notebook repository focused on time series analysis using Python.

+34

+1.8%

1.9K

total stars

Jupyter Notebook

#756

matrixorigin/matrixone

Cloud-native, MySQL-compatible, AI-ready database with Git for Data, vector search, and full-text search capabilities.

+34

+1.9%

1.9K

total stars

#757

JetBrains/xodus

Xodus is a transactional, schema-less embedded database used by JetBrains products like YouTrack and Hub.

+34

+2.8%

1.3K

total stars

Java

#758

Toblerity/Fiona

Fiona is a Python library for reading and writing geographic data files, with support for CLI usage.

+33

+2.8%

1.2K

total stars

Python

#759

ResidentMario/geoplot

A high-level geospatial data visualization library for Python developers working with spatial data.

+33

+2.8%

1.2K

total stars

Python

#760

apache/accumulo

Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.

+33

+3.0%

1.1K

total stars

Java

#761

lacuna/bifurcan

A library of functional, durable data structures written in Java for developers building robust applications.

+33

+3.4%

1.0K

total stars

Java

#762

petl-developers/petl

A Python library for extracting, transforming, and loading tabular data.

+32

+2.5%

1.3K

total stars

Python

#763

JasonKessler/scattertext

A Python library for creating beautiful visualizations of language differences across document types.

+31

+1.4%

2.3K

total stars

Python

#764

dtinit/data-transfer-project

The Data Transfer Project enables direct transfer of user data between online service providers.

+30

+0.8%

3.6K

total stars

Java

#765

databricks/koalas

Koalas is a pandas-like API for Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.

+30

+0.9%

3.4K

total stars

Python

#766

GeostatsGuy/PythonNumericalDemos

Python demos for spatial data analytics, geostatistics, and machine learning to support courses.

+30

+2.1%

1.5K

total stars

Jupyter Notebook

#767

gtoonstra/etl-with-airflow

This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.

+30

+2.3%

1.4K

total stars

Shell

#768

dblalock/bolt

A fast C++ library for high-performance matrix and vector operations.

+29

+1.2%

2.5K

total stars

C++

#769

nicolaspanel/numjs

A JavaScript library that provides a NumPy-like interface for working with multi-dimensional arrays and matrices.

+29

+1.2%

2.5K

total stars

JavaScript

#770

alan-turing-institute/CleverCSV

A Python package for handling messy CSV files with improved dialect detection and a command-line interface.

+29

+2.2%

1.3K

total stars

Python

#771

topepo/caret

An R package for training and plotting classification and regression models.

+28

+1.7%

1.7K

total stars

#772

marsupialtail/quokka

A scalable, distributed ETL framework for building data lake analytics pipelines.

+28

+2.4%

1.2K

total stars

Python

#773

man-group/arctic

A high-performance datastore for time series and tick data built on top of MongoDB.

+27

+0.9%

3.1K

total stars

Python

#774

Tencent/paxosstore

PaxosStore is a high-performance, distributed database solution built for large-scale applications.

+27

+1.6%

1.7K

total stars

C++

#775

easystats/easystats

An R project focused on providing high-performance statistical models, data analysis, and visualization tools.

+27

+2.4%

1.1K

total stars

#776

samayo/country-json

A simple JSON data set of country information, useful for building apps that need country data.

+27

+2.4%

1.1K

total stars

JavaScript

#777

tangwz/db-monthly

A collection of monthly reports on the internals of Alibaba Cloud's database products.

+27

+2.5%

1.1K

total stars

#778

owid/covid-19-data

COVID-19 data repository for developers, providing daily updated case, death, and testing information.

+26

+0.5%

5.7K

total stars

Python

#779

tylertreat/BoomFilters

Performant probabilistic data structures for processing continuous, unbounded streams in Go.

+26

+1.6%

1.6K

total stars

#780

jeremycole/innodb_diagrams

Diagrams and documentation for InnoDB, the storage engine used by MySQL and MariaDB databases.

+26

+1.8%

1.5K

total stars

#781

tidyverse/tidyr

tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.

+26

+1.9%

1.4K

total stars

#782

data-forge/data-forge-ts

A TypeScript toolkit for data transformation and analysis inspired by Pandas and LINQ.

+26

+1.9%

1.4K

total stars

TypeScript

#783

microsoft/Trill

Trill is a single-node query processor for temporal or streaming data.

+26

+2.1%

1.3K

total stars

#784

spatie/db-dumper

A PHP library for dumping the contents of a database to a file, supporting multiple database engines.

+26

+2.3%

1.2K

total stars

PHP

#785

typelevel/skunk

A functional, type-safe, composable Scala data access library for Postgres databases.

+25

+1.6%

1.6K

total stars

Scala

#786

mongodb/mongo-hadoop

A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.

+25

+1.6%

1.6K

total stars

Java

#787

TomAugspurger/effective-pandas

A collection of articles and source code on using the pandas data analysis library.

+25

+1.6%

1.6K

total stars

Jupyter Notebook

#788

couchbase/forestdb

A fast, hierarchical key-value storage engine written in C++ for applications that require high performance and scalability.

+25

+1.9%

1.3K

total stars

C++

#789

GeospatialPython/pyshp

A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.

+25

+2.2%

1.1K

total stars

Python

#790

eigenteam/eigen-git-mirror

A high-performance C++ linear algebra library focused on solvers, sparse matrices, and numerical computing.

+24

+1.3%

1.8K

total stars

C++

#791

edyoda/data-science-complete-tutorial

This repository provides comprehensive tutorials and resources for learning data science and machine learning using Python.

+24

+1.3%

1.8K

total stars

Jupyter Notebook

#792

PyTables/PyTables

A powerful Python package to manage and work with extremely large amounts of data.

+24

+1.8%

1.4K

total stars

Python

#793

Kyubyong/numpy_exercises

A repository of NumPy exercises for developers looking to improve their Python and data manipulation skills.

+23

+1.3%

1.7K

total stars

Python

#794

sfirke/janitor

A collection of simple tools for data cleaning and wrangling in R for data science tasks.

+23

+1.6%

1.4K

total stars

#795

wainshine/Company-Names-Corpus

A corpus of company names, abbreviations, and brands that can be used for Chinese text segmentation and entity recognition.

+23

+1.8%

1.3K

total stars

#796

ifsnop/mysqldump-php

A PHP library that provides MySQL backup functionality, similar to the mysqldump CLI tool.

+23

+1.8%

1.3K

total stars

PHP

#797

quantopian/qgrid

An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks.

+22

+0.7%

3.1K

total stars

Python

#798

oceanbase/seekdb

AI-native database unifying vector, text, and structured data for hybrid search and in-database AI workflows.

+22

+0.9%

2.4K

total stars

C++

#799

Image-Py/imagepy

A Python-based image processing framework with plugins for common image processing libraries.

+22

+1.6%

1.4K

total stars

Python

#800

jstat/jstat

A JavaScript statistical library that provides a wide range of statistical functions for data analysis.

+21

+1.2%

1.8K

total stars

JavaScript
