Trending Projects

Discover the fastest-growing open source projects

Showing 801-850 of 897 trending projects

#801

rhiever/datacleaner

A Python tool that automatically cleans and preprocesses data for analysis and machine learning.

0.0%

1.1K

total stars

Python

#802

marcboeker/go-duckdb

A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.

0.0%

1.1K

total stars

#803

eduosi/district

This repository contains data on Chinese administrative divisions, including names, pinyin, and codes.

0.0%

1.1K

total stars

#804

docker-library/mongo

Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.

0.0%

1.1K

total stars

Shell

#805

brandon-rhodes/pycon-pandas-tutorial

A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.

0.0%

1.1K

total stars

Jupyter Notebook

#806

intake/intake

Intake is a lightweight Python package for discovering, investigating, loading and distributing data.

0.0%

1.1K

total stars

Python

#807

jorgecarleitao/arrow2

A transmute-free Rust library for working with the Arrow data format.

0.0%

1.1K

total stars

Rust

#808

RedisTimeSeries/RedisTimeSeries

A Redis module that provides a time series data structure for storing and querying time series data.

0.0%

1.1K

total stars

#809

patx/pickledb

An in-memory key-value store for Python that uses the third-party orjson library for persistence, with SQLite support.

0.0%

1.1K

total stars

Python

#810

ddotta/awesome-polars

A curated list of resources for Polars, an open-source, high-performance data manipulation library for Python and Rust.

0.0%

1.1K

total stars

#811

paulyoder/LinqToExcel

A library that allows developers to use LINQ to retrieve data from spreadsheets and CSV files.

0.0%

1.1K

total stars

#812

kblin/ncbi-genome-download

Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.

0.0%

1.1K

total stars

Python

#813

SciRuby/daru

Daru is a Ruby library for data analysis and manipulation, aimed at data scientists and developers.

0.0%

1.1K

total stars

Ruby

#814

KeithGalli/pandas

Tutorial materials for pandas, the Python library for data manipulation and analysis that is part of the core data science toolkit.

0.0%

1.1K

total stars

Jupyter Notebook

#815

databricks/spark-csv

CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.

0.0%

1.1K

total stars

Scala

#816

markwk/qs_ledger

A personal data aggregator and analysis tool for self-tracking and quantified self enthusiasts.

0.0%

1.1K

total stars

Jupyter Notebook

#817

apache/phoenix

Apache Phoenix is a scalable, distributed SQL engine that connects to HBase for low-latency queries.

0.0%

1.1K

total stars

Java

#818

realm/realm-core

Core database component for the Realm Mobile Database SDKs, a popular NoSQL database for mobile apps.

0.0%

1.0K

total stars

C++

#819

bigdatagenomics/adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.

0.0%

1.0K

total stars

Scala

#820

josonle/Coding-Now

A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.

0.0%

1.0K

total stars

Python

#821

J535D165/recordlinkage

A powerful Python library for record linkage and duplicate detection in data-driven applications.

0.0%

1.0K

total stars

Python

#822

hannorein/rebound

An open-source N-body simulation library for astrophysics and planetary science.

0.0%

1.0K

total stars

#823

pixiedust/pixiedust

A Python helper library for enhancing Jupyter Notebooks with data visualization and analysis capabilities.

0.0%

1.0K

total stars

Jupyter Notebook

#824

avehtari/BDA_py_demos

Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.

0.0%

1.0K

total stars

Jupyter Notebook

#825

apache/celeborn

Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.

0.0%

1.0K

total stars

Java

#826

LAStools/LAStools

This repository contains efficient tools for LiDAR processing, focused on working with point cloud data.

0.0%

1.0K

total stars

C++

#827

TIBCOSoftware/snappydata

SnappyData is a memory-optimized analytics database based on Apache Spark and Apache Geode, enabling real-time stream processing, transactions, and predictive analytics.

0.0%

1.0K

total stars

Scala

#828

bashtage/linearmodels

This Python library provides additional linear models for statistical modeling and analysis.

0.0%

1.0K

total stars

Python

#829

inloop/sqlite-viewer

A simple SQLite file viewer that allows you to view and explore SQLite databases online.

0.0%

1.0K

total stars

JavaScript

#830

CJ-Chen/TBtools-II

A powerful GUI/CLI tool for biologists to work with NGS data, not a vibe coder tool.

0.0%

1.0K

total stars

Shell

#831

Kotlin/dataframe

A Kotlin library for structured data processing, suitable for data analysis and data science tasks.

0.0%

1.0K

total stars

Kotlin

#832

dataprofessor/code

Compilation of R and Python programming codes for data science and machine learning projects.

0.0%

1.0K

total stars

Jupyter Notebook

#833

axiomhq/hyperloglog

HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.

0.0%

1.0K

total stars

#834

IQSS/dataverse

Open source research data repository software built with Java.

0.0%

1.0K

total stars

Java

#835

shaiwz/data-platform-open

A no-code, visual data integration platform for building big data pipelines and workflows.

0.0%

1.0K

total stars

Java

#836

twosigma/flint

A time series library for Apache Spark that provides a high-level API for working with time series data.

0.0%

1.0K

total stars

Scala

#837

rstudio/pointblank

Data quality assessment and reporting tool for data frames and database tables in R.

0.0%

1.0K

total stars

#838

OHDSI/CommonDataModel

A definition and DDLs for the OMOP Common Data Model (CDM), a data model for healthcare data.

0.0%

1.0K

total stars

HTML

#839

scylladb/gocqlx

A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.

0.0%

1.0K

total stars

#840

elliotchance/orderedmap

An ordered map implementation in Go with amortized O(1) performance for common operations.

0.0%

1.0K

total stars

#841

topling/toplingdb

ToplingDB is a cloud-native, distributed, and searchable key-value store built on RocksDB.

0.0%

1.0K

total stars

C++

#842

lacuna/bifurcan

A library of functional, durable data structures written in Java for developers building robust applications.

0.0%

1.0K

total stars

Java

#843

opengeos/streamlit-geospatial

A multi-page Streamlit app for geospatial data visualization and analysis, useful for housing and real estate applications.

0.0%

1.0K

total stars

Python

#844

efficient/cuckoofilter

A space-efficient C++ implementation of the Cuckoo filter, a probabilistic data structure for set membership testing.

0.0%

1.0K

total stars

C++

#845

blaze/odo

A Python library for data migration and transformation in the Blaze project.

0.0%

1.0K

total stars

Python

#846

SciRuby/sciruby

SciRuby provides a collection of tools for scientific computation in Ruby, aimed at developers building data and scientific applications.

0.0%

1.0K

total stars

Ruby

#847

CSSEGISandData/COVID-19

Global and U.S. COVID-19 case data tracking for developers and researchers.

-1

0.0%

29.0K

total stars

#848

alibaba/druid

Druid is a high-performance database connection pool for Java applications, designed for monitoring and management.

-1

0.0%

28.2K

total stars

Java

#849

prisma/prisma1

Prisma1 is a database toolkit with an ORM, migrations, and admin UI for Postgres, MySQL, and MongoDB.

-1

-0.0%

16.4K

total stars

Scala

#850

FavioVazquez/ds-cheatsheets

A comprehensive collection of data science cheatsheets for developers and data scientists.

-1

-0.0%

16.2K

total stars
