Trending Projects

Discover the fastest growing open source projects

Showing 751-800 of 897 trending projects

#751
getdozer/dozer

Dozer is a real-time data movement tool that uses change data capture (CDC) to move data between various sources and sinks.

+36 stars (+2.3%), 1.6K total stars
#752
jbmusso/awesome-graph

A curated list of resources for graph databases and graph computing tools, useful for developers working with graph-based data.

+36 stars (+3.0%), 1.2K total stars
#753
eleanorlutz/asteroids_atlas_of_space

An astronomy visualization project that maps the orbits of asteroids in the solar system.

+35 stars (+2.8%), 1.3K total stars
#754
PizzaDeDados/datascience-pizza

A repository collecting study materials and resources for data analysis and adjacent fields.

+34 stars (+1.4%), 2.4K total stars
#755
AileenNielsen/TimeSeriesAnalysisWithPython

A Jupyter Notebook repository focused on time series analysis using Python.

+34 stars (+1.8%), 1.9K total stars
#756
matrixorigin/matrixone

Cloud-native, MySQL-compatible, AI-ready database with Git for Data, vector search, and full-text search capabilities.

+34 stars (+1.9%), 1.9K total stars
#757
JetBrains/xodus

Xodus is a transactional, schema-less embedded database used by JetBrains products like YouTrack and Hub.

+34 stars (+2.8%), 1.3K total stars
#758
Toblerity/Fiona

Fiona is a Python library for reading and writing geographic data files, with a command-line interface.

+33 stars (+2.8%), 1.2K total stars
#759
ResidentMario/geoplot

A high-level geospatial data visualization library for Python developers working with spatial data.

+33 stars (+2.8%), 1.2K total stars
#760
apache/accumulo

Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.

+33 stars (+3.0%), 1.1K total stars
#761
lacuna/bifurcan

A library of functional, durable data structures written in Java for developers building robust applications.

+33 stars (+3.4%), 1.0K total stars
#762
petl-developers/petl

A Python library for extracting, transforming, and loading tabular data.

+32 stars (+2.5%), 1.3K total stars
#763
JasonKessler/scattertext

A Python library for creating beautiful visualizations of language differences across document types.

+31 stars (+1.4%), 2.3K total stars
#764
dtinit/data-transfer-project

The Data Transfer Project enables direct transfer of user data between online service providers.

+30 stars (+0.8%), 3.6K total stars
#765
databricks/koalas

Koalas is a pandas-like API for Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.

+30 stars (+0.9%), 3.4K total stars
#766
GeostatsGuy/PythonNumericalDemos

Python demos for spatial data analytics, geostatistics, and machine learning to support courses.

+30 stars (+2.1%), 1.5K total stars
#767
gtoonstra/etl-with-airflow

This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.

+30 stars (+2.3%), 1.4K total stars
#768
dblalock/bolt

A C++ library for fast matrix and vector operations.

+29 stars (+1.2%), 2.5K total stars
#769
nicolaspanel/numjs

A JavaScript library that provides a NumPy-like interface for working with multi-dimensional arrays and matrices.

+29 stars (+1.2%), 2.5K total stars
#770
alan-turing-institute/CleverCSV

A Python package for handling messy CSV files with improved dialect detection and a command-line interface.

+29 stars (+2.2%), 1.3K total stars
#771
topepo/caret

An R package for training and plotting classification and regression models.

+28 stars (+1.7%), 1.7K total stars
#772
marsupialtail/quokka

A scalable, distributed ETL framework for building data lake analytics pipelines.

+28 stars (+2.4%), 1.2K total stars
#773
man-group/arctic

A high-performance datastore for time series and tick data built on top of MongoDB.

+27 stars (+0.9%), 3.1K total stars
#774
Tencent/paxosstore

PaxosStore is a high-performance, distributed database solution built for large-scale applications.

+27 stars (+1.6%), 1.7K total stars
#775
easystats/easystats

A collection of R packages for statistical modeling, data analysis, and visualization.

+27 stars (+2.4%), 1.1K total stars
#776
samayo/country-json

A simple JSON data set of country information, useful for building apps that need country data.

+27 stars (+2.4%), 1.1K total stars
#777
tangwz/db-monthly

A collection of monthly reports on the internals of Alibaba Cloud's database products.

+27 stars (+2.5%), 1.1K total stars
#778
owid/covid-19-data

COVID-19 data repository for developers, providing daily updated case, death, and testing information.

+26 stars (+0.5%), 5.7K total stars
#779
tylertreat/BoomFilters

Performant probabilistic data structures for processing continuous, unbounded streams in Go.

+26 stars (+1.6%), 1.6K total stars
#780
jeremycole/innodb_diagrams

Diagrams and documentation for InnoDB, the storage engine used by MySQL and MariaDB databases.

+26 stars (+1.8%), 1.5K total stars
#781
tidyverse/tidyr

tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.

+26 stars (+1.9%), 1.4K total stars
#782
data-forge/data-forge-ts

A TypeScript toolkit for data transformation and analysis inspired by pandas and LINQ.

+26 stars (+1.9%), 1.4K total stars
#783
microsoft/Trill

Trill is a single-node query processor for temporal or streaming data.

+26 stars (+2.1%), 1.3K total stars
#784
spatie/db-dumper

A PHP library for dumping the contents of a database to a file, supporting multiple database engines.

+26 stars (+2.3%), 1.2K total stars
#785
typelevel/skunk

A functional, type-safe, composable Scala data access library for Postgres databases.

+25 stars (+1.6%), 1.6K total stars
#786
mongodb/mongo-hadoop

A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.

+25 stars (+1.6%), 1.6K total stars
#787
TomAugspurger/effective-pandas

A collection of articles and source code on using the pandas data analysis library.

+25 stars (+1.6%), 1.6K total stars
#788
couchbase/forestdb

A fast, hierarchical key-value storage engine written in C++ for applications that require high performance and scalability.

+25 stars (+1.9%), 1.3K total stars
#789
GeospatialPython/pyshp

A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.

+25 stars (+2.2%), 1.1K total stars
#790
eigenteam/eigen-git-mirror

A high-performance C++ linear algebra library focused on solvers, sparse matrices, and numerical computing.

+24 stars (+1.3%), 1.8K total stars
#791
edyoda/data-science-complete-tutorial

This repository provides comprehensive tutorials and resources for learning data science and machine learning using Python.

+24 stars (+1.3%), 1.8K total stars
#792
PyTables/PyTables

A powerful Python package to manage and work with extremely large amounts of data.

+24 stars (+1.8%), 1.4K total stars
#793
Kyubyong/numpy_exercises

A repository of NumPy exercises for developers looking to improve their Python and data manipulation skills.

+23 stars (+1.3%), 1.7K total stars
#794
sfirke/janitor

A collection of simple R tools for cleaning and wrangling data in data science tasks.

+23 stars (+1.6%), 1.4K total stars
#795
wainshine/Company-Names-Corpus

A corpus of company names, abbreviations, and brands that can be used for Chinese text segmentation and entity recognition.

+23 stars (+1.8%), 1.3K total stars
#796
ifsnop/mysqldump-php

A PHP library that provides MySQL backup functionality, similar to the mysqldump CLI tool.

+23 stars (+1.8%), 1.3K total stars
#797
quantopian/qgrid

An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks.

+22 stars (+0.7%), 3.1K total stars
#798
oceanbase/seekdb

AI-native database unifying vector, text, and structured data for hybrid search and in-database AI workflows.

+22 stars (+0.9%), 2.4K total stars
#799
Image-Py/imagepy

A Python-based image processing framework with plugins for common image processing libraries.

+22 stars (+1.6%), 1.4K total stars
#800
jstat/jstat

A JavaScript statistical library that provides a wide range of statistical functions for data analysis.

+21 stars (+1.2%), 1.8K total stars