Trending Projects

Discover the fastest growing open source projects

Showing 601-650 of 897 trending projects

#601
jadianes/spark-py-notebooks

Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.

0
0.0%
1.7K
total stars
#602
eBay/akutan

A distributed knowledge graph store built in Go for managing large-scale semantic data.

0
0.0%
1.7K
total stars
#603
mozilla/mentat

A persistent, relational store inspired by Datomic and DataScript, written in Rust.

0
0.0%
1.7K
total stars
#604
Hiflylabs/awesome-dbt

A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.

0
0.0%
1.6K
total stars
#605
tylertreat/BoomFilters

Performant probabilistic data structures for processing continuous, unbounded streams in Go.

0
0.0%
1.6K
total stars
#606
cswinter/LocustDB

A blazingly fast analytics database built with Rust, optimized for rapidly devouring large amounts of data.

0
0.0%
1.6K
total stars
#607
typelevel/skunk

A functional, type-safe, composable Scala data access library for Postgres databases.

0
0.0%
1.6K
total stars
#608
Yimeng-Zhang/feature-engineering-and-feature-selection

A comprehensive guide to feature engineering and feature selection techniques in Python, with examples.

0
0.0%
1.6K
total stars
#609
aergoio/litetree

SQLite with Branches - a lightweight, embedded database with version control capabilities.

0
0.0%
1.6K
total stars
#610
pointfreeco/sqlite-data

A fast, lightweight SQLite-based persistence layer with CloudKit synchronization for Swift developers.

0
0.0%
1.6K
total stars
#611
osm2pgsql-dev/osm2pgsql

A command-line tool, written in C++, for importing OpenStreetMap data into a PostgreSQL/PostGIS database.

0
0.0%
1.6K
total stars
#612
github/covid19-dashboard

An open-source COVID-19 dashboard powered by the fastpages framework, featuring data visualizations.

0
0.0%
1.6K
total stars
#613
roboyoshi/datacurator-filetree

A standard filetree template for data curation and organization, useful for developers interested in data management.

0
0.0%
1.6K
total stars
#614
mongodb/mongo-hadoop

A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.

0
0.0%
1.6K
total stars
#615
js-data/js-data

A framework-agnostic, datastore-agnostic JavaScript ORM built for ease of use and peace of mind.

0
0.0%
1.6K
total stars
#616
cgarciae/pypeln

Concurrent data pipelines in Python for building efficient and scalable data processing workflows.

0
0.0%
1.6K
total stars
#617
huachaohuang/awesome-dbdev

A curated list of awesome materials and resources for database development.

0
0.0%
1.6K
total stars
#618
dineug/erd-editor

An open-source, TypeScript-based Entity-Relationship Diagram (ERD) editor for developers working with databases.

0
0.0%
1.6K
total stars
#619
SciTools/cartopy

Cartopy is a Python library for creating maps and visualizing spatial data with matplotlib support.

0
0.0%
1.6K
total stars
#620
delight-im/FreeGeoDB

A free database of geographic place names and corresponding geospatial data for developers to use.

0
0.0%
1.6K
total stars
#621
TomAugspurger/effective-pandas

A collection of articles and source code on using the pandas data analysis library.

0
0.0%
1.6K
total stars
#622
re-data/re-data

A data quality and observability tool for monitoring and fixing data issues before they become problems.

0
0.0%
1.6K
total stars
#623
antontarasenko/smq

A collection of SQL queries to analyze social media datasets.

0
0.0%
1.5K
total stars
#624
hi-primus/optimus

Agile data preparation workflows made easy with popular Python data science libraries.

0
0.0%
1.5K
total stars
#625
paradigmxyz/cryo

cryo is a Rust library for extracting blockchain data to parquet, CSV, JSON, or Python dataframes.

0
0.0%
1.5K
total stars
#626
aws-samples/aws-glue-samples

AWS Glue code samples for building data integration and ETL pipelines on AWS.

0
0.0%
1.5K
total stars
#627
cn/GB2260

A Python library for retrieving administrative division codes for China's GB/T 2260 standard.

0
0.0%
1.5K
total stars
#628
EliotAndres/kaggle-past-solutions

A searchable compilation of Kaggle past solutions for data science and machine learning developers.

0
0.0%
1.5K
total stars
#629
percona/percona-xtrabackup

An open source hot backup tool for InnoDB and XtraDB databases.

0
0.0%
1.5K
total stars
#630
uwdata/arquero

A JavaScript library for efficient querying and transformation of array-backed data tables.

0
0.0%
1.5K
total stars
#631
google/tensorstore

A C++ library for reading and writing large multi-dimensional arrays, useful for scientific and data-intensive applications.

0
0.0%
1.5K
total stars
#632
Awesome-Image-Registration-Organization/awesome-image-registration

A curated collection of resources related to image registration, including books, papers, videos, and toolboxes.

0
0.0%
1.5K
total stars
#633
gobuffalo/pop

A Go ORM and query builder for interacting with databases in Go applications.

0
0.0%
1.5K
total stars
#634
karlseguin/the-little-mongodb-book

A concise guide to the MongoDB NoSQL database for developers.

0
0.0%
1.5K
total stars
#635
bashtage/arch

A comprehensive Python library for modeling and forecasting financial time series data using ARCH models.

0
0.0%
1.5K
total stars
#636
Intel-bigdata/HiBench

HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.

0
0.0%
1.5K
total stars
#637
json4s/json4s

A popular Scala library for parsing and manipulating JSON data in Scala applications.

0
0.0%
1.5K
total stars
#638
itbdw/ip-database

An offline IP database for developers to look up IP address geolocation information.

0
0.0%
1.5K
total stars
#639
pyjanitor-devs/pyjanitor

A Python library for cleaning and transforming data, inspired by the R package janitor.

0
0.0%
1.5K
total stars
#640
Factual/drake

A data workflow tool for data engineers and analysts, in the spirit of a "Make for data".

0
0.0%
1.5K
total stars
#641
DataBrewery/cubes

A lightweight Python OLAP framework for multi-dimensional data analysis and reporting.

0
0.0%
1.5K
total stars
#642
locationtech/geomesa

GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.

0
0.0%
1.5K
total stars
#643
skaiworldwide-oss/agensgraph

AgensGraph is a transactional graph database based on PostgreSQL for enterprise-level applications.

0
0.0%
1.5K
total stars
#644
shuttle-hq/synth

Synth is a Rust library for generating realistic, randomized test data for applications and databases.

0
0.0%
1.5K
total stars
#645
CamDavidsonPilon/lifetimes

A Python library for calculating customer lifetime value metrics and cohort analysis.

0
0.0%
1.5K
total stars
#646
dremio/dremio-oss

Dremio is an open-source data analytics platform that simplifies and accelerates big data analysis.

0
0.0%
1.5K
total stars
#647
Cyan4973/FiniteStateEntropy

A high-performance entropy coding library written in C, providing the Finite State Entropy and Huff0 codecs for data compression.

0
0.0%
1.5K
total stars
#648
GeostatsGuy/PythonNumericalDemos

Python demos for spatial data analytics, geostatistics, and machine learning to support courses.

0
0.0%
1.5K
total stars
#649
Softmotions/ejdb

EJDB2 is an embeddable JSON database engine with a simple XPath-like query language (JQL) for C/C++ applications.

0
0.0%
1.5K
total stars
#650
jeremycole/innodb_diagrams

Diagrams and documentation for InnoDB, the storage engine used by MySQL and MariaDB databases.

0
0.0%
1.5K
total stars