Trending Projects

Discover the fastest growing open source projects

Showing 801-850 of 897 trending projects

#801
locationtech/geomesa

GeoMesa is a suite of tools for working with large-scale geospatial data in a distributed fashion.

+21 stars (+1.4%), 1.5K total stars
#802
ucarGroup/DataLink

DataLink is a real-time and offline data exchange platform that supports synchronization between heterogeneous data sources.

+21 stars (+1.9%), 1.1K total stars
#803
EliotAndres/kaggle-past-solutions

A searchable compilation of Kaggle past solutions for data science and machine learning developers.

+20 stars (+1.4%), 1.5K total stars
#804
dask/dask-tutorial

An interactive tutorial for the Dask distributed computing library, focused on data analysis and manipulation.

+19 stars (+1.0%), 1.9K total stars
#805
cswinter/LocustDB

A blazingly fast analytics database built with Rust, optimized for rapidly devouring large amounts of data.

+19 stars (+1.2%), 1.6K total stars
#806
cgarciae/pypeln

Concurrent data pipelines in Python for building efficient and scalable data processing workflows.

+19 stars (+1.2%), 1.6K total stars
#807
tensorbase/tensorbase

TensorBase is a new big data warehousing solution built with Rust, focused on high-performance analytics.

+19 stars (+1.3%), 1.5K total stars
#808
Softmotions/ejdb

EJDB2 is an embeddable JSON database engine with a simple XPath-like query language (JQL) for C/C++ applications.

+18 stars (+1.2%), 1.5K total stars
#809
quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data, built with TypeScript.

+18 stars (+1.3%), 1.4K total stars
#810
mycelial/mycelite

Mycelite is a SQLite extension that enables replication between SQLite instances.

+18 stars (+1.7%), 1.1K total stars
#811
attic-labs/noms

The versioned, forkable, syncable database for developers who need a scalable, distributed data solution.

+17 stars (+0.2%), 7.4K total stars
#812
wesm/feather

Feather is a fast, interoperable binary data frame storage format for Python, R, and more, powered by Apache Arrow.

+16 stars (+0.6%), 2.8K total stars
#813
BlankerL/DXY-COVID-19-Data

A data warehouse for COVID-19 time series data, useful for data analysis and visualization.

+16 stars (+0.7%), 2.2K total stars
#814
GiovineItalia/Gadfly.jl

A crafty statistical graphics library for the Julia programming language.

+16 stars (+0.8%), 1.9K total stars
#815
CamDavidsonPilon/lifetimes

A Python library for calculating customer lifetime value metrics and cohort analysis.

+16 stars (+1.1%), 1.5K total stars
#816
karlseguin/the-little-redis-book

A book that teaches the basics of using Redis, the in-memory data structure store.

+16 stars (+1.1%), 1.5K total stars
#817
scijs/ndarray

A JavaScript library for working with multidimensional arrays, useful for data visualization and scientific computing.

+16 stars (+1.3%), 1.2K total stars
#818
datacrypt-project/hitchhiker-tree

A high-performance, persistent, off-heap data structure written in Clojure for data-intensive applications.

+16 stars (+1.3%), 1.2K total stars
#819
juliasilge/tidytext

A library for text mining and natural language processing using tidy data principles in R.

+16 stars (+1.4%), 1.2K total stars
#820
brettkromkamp/contextualise

Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.

+16 stars (+1.5%), 1.1K total stars
#821
filodb/FiloDB

A distributed, scalable Prometheus-compatible time series database written in Scala.

+15 stars (+1.0%), 1.5K total stars
#822
attaswift/BTree

A fast, in-memory B-tree implementation for sorted collections in Swift.

+15 stars (+1.1%), 1.3K total stars
#823
schematics/schematics

A Python library for defining data structures, focused on serialization, deserialization, and validation of complex data schemas.

+14 stars (+0.5%), 2.6K total stars
#824
citusdata/cstore_fdw

A columnar storage extension for Postgres, built as a foreign data wrapper.

+14 stars (+0.8%), 1.8K total stars
#825
re-data/re-data

A data quality and observability tool for monitoring and fixing data issues before they become problems.

+14 stars (+0.9%), 1.6K total stars
#826
pentaho/mondrian

Mondrian is an OLAP server that enables real-time analysis of large data sets for business users.

+14 stars (+1.2%), 1.2K total stars
#827
ricklamers/gridstudio

Grid Studio is a web-based application for data science with full integration of open source data science frameworks and languages.

+13 stars (+0.1%), 8.9K total stars
#828
eveningkid/denodb

A versatile ORM for Deno that supports multiple databases, including MySQL, SQLite, MariaDB, PostgreSQL, and MongoDB.

+13 stars (+0.7%), 1.9K total stars
#829
Intel-bigdata/HiBench

HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.

+13 stars (+0.9%), 1.5K total stars
#830
slashbase/slashbaseide

A modern database IDE for development and data workflows, supporting MySQL, PostgreSQL, and MongoDB.

+13 stars (+1.0%), 1.3K total stars
#831
prisma/prisma1

Prisma1 is a database toolkit with an ORM, migrations, and admin UI for Postgres, MySQL, and MongoDB.

+12 stars (+0.1%), 16.4K total stars
#832
thinkaurelius/titan

Titan is a distributed graph database that can be used for building large-scale data-intensive applications.

+12 stars (+0.2%), 5.2K total stars
#833
variety/variety

A MongoDB schema analysis tool that helps developers understand and optimize their NoSQL database.

+12 stars (+0.7%), 1.8K total stars
#834
cmu-db/noisepage

A self-driving database management system from Carnegie Mellon University.

+12 stars (+0.7%), 1.8K total stars
#835
cn/GB2260

A Python library for retrieving China's administrative division codes as defined by the GB/T 2260 standard.

+12 stars (+0.8%), 1.5K total stars
#836
machow/siuba

A Python library for using dplyr-like syntax with pandas and SQL databases.

+12 stars (+1.0%), 1.2K total stars
#837
RxSwiftCommunity/RxRealm

A Swift extension for RealmSwift that provides reactive programming support using RxSwift.

+12 stars (+1.0%), 1.2K total stars
#838
RJT1990/pyflux

Open source time series library for Python, useful for statistical analysis and modeling.

+11 stars (+0.5%), 2.1K total stars
#839
begeekmyfriend/bplustree

A fast B+ tree indexing structure in C for efficient storage and retrieval of billions of key-value pairs.

+11 stars (+0.6%), 1.9K total stars
#840
moby/datakit

Connect processes into powerful data pipelines with a simple Git-like filesystem interface.

+11 stars (+1.0%), 1.1K total stars
#841
rhiever/datacleaner

A Python tool that automatically cleans and preprocesses data for analysis and machine learning.

+11 stars (+1.0%), 1.1K total stars
#842
jitsucom/jitsu

An open-source data pipeline engine for real-time ETL, connecting data sources to warehouses such as BigQuery, Snowflake, and Redshift.

+10 stars (+0.2%), 4.7K total stars
#843
FeatureBaseDB/featurebase

FeatureBase is a fast analytical database built on bitmaps, perfect for ML and data-intensive applications.

+10 stars (+0.4%), 2.5K total stars
#844
lukasmartinelli/pgfutter

A tool to easily import CSV and JSON data into PostgreSQL databases.

+10 stars (+0.8%), 1.3K total stars
#845
YelpArchive/dataset-examples

Sample datasets for users of the Yelp Academic Dataset, useful for data analysis and machine learning.

+10 stars (+0.8%), 1.3K total stars
#846
scratchdata/scratchdata

A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.

+10 stars (+0.9%), 1.1K total stars
#847
mahmoudparsian/data-algorithms-book

This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.

+10 stars (+0.9%), 1.1K total stars
#848
orbitjs/orbit

A composable data framework for building ambitious web applications using TypeScript.

+9 stars (+0.4%), 2.3K total stars
#849
shancarter/mr-data-converter

A JavaScript library that converts CSV and tab-delimited data to web-friendly formats like JSON and XML.

+9 stars (+0.5%), 2.0K total stars
#850
openacid/slim

A space-efficient trie data structure in Go with fast lookup performance.

+9 stars (+0.5%), 1.9K total stars