Trending Projects

Discover the fastest-growing open source projects

Showing 651-700 of 897 trending projects

#651

jrfiedler/causal_inference_python_code

Python code for Causal Inference, a book by Miguel Hernán and James Robins.

+0.1%

1.3K

total stars

Jupyter Notebook

#652

spandanb/learndb-py

A Python library that implements database internals from scratch, useful for learning database concepts.

+0.1%

1.3K

total stars

Python

#653

x2bool/xlite

A Rust library that enables querying Excel spreadsheets using SQLite, making data extraction and analysis more efficient.

+0.1%

1.3K

total stars

Rust

#654

obspy/obspy

A Python toolbox for seismology and seismological observatories, providing tools for data processing and analysis.

+0.2%

1.3K

total stars

Python

#655

pyexcel/pyexcel

A Python library for reading, manipulating, and writing data in various spreadsheet file formats.

+0.2%

1.3K

total stars

Python

#656

microsoft/Trill

Trill is a single-node query processor for temporal or streaming data.

+0.2%

1.3K

total stars

#657

rsvp/fecon235

Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.

+0.2%

1.3K

total stars

Jupyter Notebook

#658

meta-pytorch/data

A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.

+0.2%

1.2K

total stars

Python

#659

scijs/ndarray

A JavaScript library for working with multidimensional arrays, useful for data visualization and scientific computing.

+0.2%

1.2K

total stars

JavaScript

#660

nakabonne/tstorage

An embedded time-series database written in Go for storing and querying metrics data.

+0.2%

1.2K

total stars

#661

matplotlib/AnatomyOfMatplotlib

The Anatomy of Matplotlib tutorial for the SciPy conference, focused on data visualization for scientific computing.

+0.2%

1.2K

total stars

Jupyter Notebook

#662

BlakeRMills/MetBrewer

A color palette package in R inspired by works at the Metropolitan Museum of Art in New York.

+0.2%

1.2K

total stars

#663

cmu-db/ottertune

An automatic DBMS configuration tool for optimizing database performance.

+0.2%

1.2K

total stars

Python

#664

kevwan/go-stash

A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.

+0.2%

1.2K

total stars

#665

2ndQuadrant/pglogical

A high-performance logical replication extension for PostgreSQL that enables fast, cross-version database replication.

+0.2%

1.2K

total stars

#666

marsupialtail/quokka

A scalable, distributed ETL framework for building data lake analytics pipelines.

+0.2%

1.2K

total stars

Python

#667

JuliaStats/Distributions.jl

A comprehensive Julia library for probability distributions and related statistical functions.

+0.2%

1.2K

total stars

Julia

#668

abhishek-ch/around-dataengineering

A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.

+0.2%

1.1K

total stars

Python

#669

graphframes/graphframes

GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.

+0.2%

1.1K

total stars

Scala

#670

alecthw/mmdb_china_ip_list

A library for generating MaxMind GeoIP2 databases for China IP addresses.

+0.2%

1.1K

total stars

#671

tdpetrou/Learn-Pandas

Tutorials on using the pandas library effectively for data analysis.

+0.2%

1.1K

total stars

Jupyter Notebook

#672

mycelial/mycelite

Mycelite is a SQLite extension that enables replication between SQLite instances.

+0.2%

1.1K

total stars

Rust

#673

bigdatagenomics/adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.

+0.2%

1.0K

total stars

Scala

#674

TIBCOSoftware/snappydata

SnappyData is a memory-optimized analytics database based on Apache Spark and Apache Geode, enabling real-time stream processing, transactions, and predictive analytics.

+0.2%

1.0K

total stars

Scala

#675

dataprofessor/code

A compilation of R and Python code for data science and machine learning projects.

+0.2%

1.0K

total stars

Jupyter Notebook

#676

rstudio/pointblank

A data quality assessment and reporting tool for data frames and database tables in R.

+0.2%

1.0K

total stars

#677

twosigma/flint

A time series library for Apache Spark that provides a high-level API for working with time series data.

+0.2%

1.0K

total stars

Scala

#678

syndtr/goleveldb

An implementation of the LevelDB key/value database in Go for building high-performance data-intensive applications.

+0.0%

6.3K

total stars

#679

dpilger26/NumCpp

A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.

+0.0%

3.9K

total stars

C++

#680

multiprocessio/datastation

A versatile app for querying, scripting, and visualizing data from various databases, files, and APIs.

+0.0%

3.0K

total stars

TypeScript

#681

griddb/griddb

GridDB is a fast and scalable open-source database for time-series IoT and big data applications.

+0.0%

2.5K

total stars

C++

#682

google/youtube-8m

Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.

+0.0%

2.4K

total stars

Python

#683

shancarter/mr-data-converter

A JavaScript library that converts CSV and tab-delimited data to web-friendly formats like JSON and XML.

+0.1%

2.0K

total stars

JavaScript

#684

eveningkid/denodb

A versatile ORM for multiple databases including MySQL, SQLite, MariaDB, PostgreSQL, and MongoDB in Deno.

+0.1%

1.9K

total stars

TypeScript

#685

baidu/tera

An Internet-scale distributed database system written in C++, inspired by Google's Bigtable.

+0.1%

1.9K

total stars

C++

#686

apache/kudu

Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.

+0.1%

1.9K

total stars

C++

#687

h2oai/datatable

A high-performance, memory-efficient Python data analysis library for handling large datasets.

+0.1%

1.9K

total stars

C++

#688

neo4j-contrib/neo4j-apoc-procedures

A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.

+0.1%

1.9K

total stars

Java

#689

plant99/felicette

A Python library for processing and visualizing satellite imagery data.

+0.1%

1.8K

total stars

Python

#690

jstat/jstat

A JavaScript statistical library that provides a wide range of statistical functions for data analysis.

+0.1%

1.8K

total stars

JavaScript

#691

zonination/investing

R code for analyzing historical investment returns for the overall stock market.

+0.1%

1.7K

total stars

#692

hadley/ggplot2-book

Source for the ggplot2 book. ggplot2 is a data visualization library for R that provides elegant and flexible graphics.

+0.1%

1.7K

total stars

Perl

#693

TomAugspurger/effective-pandas

A collection of articles and source code on using the pandas data analysis library.

+0.1%

1.6K

total stars

Jupyter Notebook

#694

json4s/json4s

A popular Scala library for parsing and manipulating JSON data.

+0.1%

1.5K

total stars

Scala

#695

CodeCutTech/Efficient_Python_tricks_and_tools_for_data_scientists

A collection of efficient Python tricks and tools for data scientists to improve their productivity.

+0.1%

1.5K

total stars

Jupyter Notebook

#696

FirebirdSQL/firebird

Firebird is a relational database management system (RDBMS) suitable for a wide range of applications from desktop to client-server to large databases.

+0.1%

1.4K

total stars

C++

#697

enthought/mayavi

A powerful 3D visualization library for scientific data in Python.

+0.1%

1.4K

total stars

Python

#698