Trending Projects

Discover the fastest growing open source projects

Showing 651-700 of 897 trending projects

#651
jrfiedler/causal_inference_python_code

Python code for causal inference, a book by Miguel Hernán and James Robins.

+2
+0.1%
1.3K
total stars
#652
spandanb/learndb-py

A Python library that implements database internals from scratch, useful for learning database concepts.

+2
+0.1%
1.3K
total stars
#653
x2bool/xlite

A Rust library that enables querying Excel spreadsheets using SQLite, making data extraction and analysis more efficient.

+2
+0.1%
1.3K
total stars
#654
obspy/obspy

A Python toolbox for seismology and seismological observatories, providing tools for data processing and analysis.

+2
+0.2%
1.3K
total stars
#655
pyexcel/pyexcel

A Python library for reading, manipulating, and writing data in various spreadsheet file formats.

+2
+0.2%
1.3K
total stars
#656
microsoft/Trill

Trill is a single-node query processor for temporal or streaming data.

+2
+0.2%
1.3K
total stars
#657
rsvp/fecon235

Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.

+2
+0.2%
1.3K
total stars
#658
meta-pytorch/data

A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.

+2
+0.2%
1.2K
total stars
#659
scijs/ndarray

A JavaScript library for working with multidimensional arrays, useful for data visualization and scientific computing.

+2
+0.2%
1.2K
total stars
#660
nakabonne/tstorage

An embedded time-series database written in Go for storing and querying metrics data.

+2
+0.2%
1.2K
total stars
#661
matplotlib/AnatomyOfMatplotlib

The Anatomy of Matplotlib tutorial for the SciPy conference, focused on data visualization for scientific computing.

+2
+0.2%
1.2K
total stars
#662
BlakeRMills/MetBrewer

A color palette package in R inspired by works at the Metropolitan Museum of Art in New York.

+2
+0.2%
1.2K
total stars
#663
cmu-db/ottertune

An automatic DBMS configuration tool for optimizing database performance.

+2
+0.2%
1.2K
total stars
#664
kevwan/go-stash

A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.

+2
+0.2%
1.2K
total stars
#665
2ndQuadrant/pglogical

A high-performance logical replication extension for PostgreSQL that enables fast, cross-version database replication.

+2
+0.2%
1.2K
total stars
#666
marsupialtail/quokka

A scalable, distributed ETL framework for building data lake analytics pipelines.

+2
+0.2%
1.2K
total stars
#667
JuliaStats/Distributions.jl

A comprehensive Julia library for probability distributions and related statistical functions.

+2
+0.2%
1.2K
total stars
#668
abhishek-ch/around-dataengineering

A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.

+2
+0.2%
1.1K
total stars
#669
graphframes/graphframes

GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.

+2
+0.2%
1.1K
total stars
#670
alecthw/mmdb_china_ip_list

A library for generating MaxMind GeoIP2 databases for China IP addresses.

+2
+0.2%
1.1K
total stars
#671
tdpetrou/Learn-Pandas

This GitHub repository provides tutorials on effectively using the Pandas library for data analysis.

+2
+0.2%
1.1K
total stars
#672
mycelial/mycelite

Mycelite is a SQLite extension that enables replication between SQLite instances.

+2
+0.2%
1.1K
total stars
#673
bigdatagenomics/adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.

+2
+0.2%
1.0K
total stars
#674
TIBCOSoftware/snappydata

SnappyData is a memory-optimized analytics database based on Apache Spark and Apache Geode, enabling real-time stream processing, transactions, and predictive analytics.

+2
+0.2%
1.0K
total stars
#675
dataprofessor/code

A compilation of R and Python code for data science and machine learning projects.

+2
+0.2%
1.0K
total stars
#676
rstudio/pointblank

A data quality assessment and reporting tool for data frames and database tables in R.

+2
+0.2%
1.0K
total stars
#677
twosigma/flint

A time series library for Apache Spark that provides a high-level API for working with time series data.

+2
+0.2%
1.0K
total stars
#678
syndtr/goleveldb

An implementation of the LevelDB key/value database in Go, for building high-performance data-intensive applications.

+1
+0.0%
6.3K
total stars
#679
dpilger26/NumCpp

A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.

+1
+0.0%
3.9K
total stars
#680
multiprocessio/datastation

A versatile app for querying, scripting, and visualizing data from various databases, files, and APIs.

+1
+0.0%
3.0K
total stars
#681
griddb/griddb

GridDB is a fast and scalable open-source database for time-series IoT and big data applications.

+1
+0.0%
2.5K
total stars
#682
google/youtube-8m

Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.

+1
+0.0%
2.4K
total stars
#683
shancarter/mr-data-converter

A JavaScript library that converts CSV and tab-delimited data to web-friendly formats like JSON and XML.

+1
+0.1%
2.0K
total stars
#684
eveningkid/denodb

A versatile ORM for multiple databases including MySQL, SQLite, MariaDB, PostgreSQL, and MongoDB in Deno.

+1
+0.1%
1.9K
total stars
#685
baidu/tera

An Internet-scale distributed database system written in C++, inspired by Google's Bigtable.

+1
+0.1%
1.9K
total stars
#686
apache/kudu

Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.

+1
+0.1%
1.9K
total stars
#687
h2oai/datatable

A high-performance, memory-efficient Python data analysis library for handling large datasets.

+1
+0.1%
1.9K
total stars
#688
neo4j-contrib/neo4j-apoc-procedures

A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.

+1
+0.1%
1.9K
total stars
#689
plant99/felicette

A Python library for processing and visualizing satellite imagery data.

+1
+0.1%
1.8K
total stars
#690
jstat/jstat

A JavaScript statistical library that provides a wide range of statistical functions for data analysis.

+1
+0.1%
1.8K
total stars
#691
zonination/investing

This R library provides historical investment returns analysis for the overall stock market.

+1
+0.1%
1.7K
total stars
#692
hadley/ggplot2-book

Source for the book "ggplot2: Elegant Graphics for Data Analysis", which covers the ggplot2 data visualization library for R.

+1
+0.1%
1.7K
total stars
#693
TomAugspurger/effective-pandas

A collection of articles and source code on using the pandas data analysis library.

+1
+0.1%
1.6K
total stars
#694
json4s/json4s

A Scala library for parsing and manipulating JSON data.

+1
+0.1%
1.5K
total stars
#695
CodeCutTech/Efficient_Python_tricks_and_tools_for_data_scientists

A collection of efficient Python tricks and tools for data scientists to improve their productivity.

+1
+0.1%
1.5K
total stars
#696
FirebirdSQL/firebird

Firebird is a relational database management system (RDBMS) suitable for a wide range of applications from desktop to client-server to large databases.

+1
+0.1%
1.4K
total stars
#697
enthought/mayavi

A powerful 3D visualization library for scientific data in Python.

+1
+0.1%
1.4K
total stars
#698
yhat/pandasql

pandasql is a Python library that allows developers to use SQL syntax to query Pandas DataFrames.

+1
+0.1%
1.3K
total stars
#699
lukasmartinelli/pgfutter

A tool to easily import CSV and JSON data into PostgreSQL databases.

+1
+0.1%
1.3K
total stars
#700
couchbase/forestdb

A fast, hierarchical key-value storage engine written in C++ for applications that require high performance and scalability.

+1
+0.1%
1.3K
total stars