Trending Projects

Discover the fastest-growing open source projects

Showing 801-850 of 897 trending projects

#801
rhiever/datacleaner

A Python tool that automatically cleans and preprocesses data for analysis and machine learning.

0
0.0%
1.1K
total stars
#802
marcboeker/go-duckdb

A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.

0
0.0%
1.1K
total stars
#803
eduosi/district

This repository contains data on Chinese administrative divisions, including names, pinyin, and codes.

0
0.0%
1.1K
total stars
#804
docker-library/mongo

Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.

0
0.0%
1.1K
total stars
#805
brandon-rhodes/pycon-pandas-tutorial

A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.

0
0.0%
1.1K
total stars
#806
intake/intake

Intake is a lightweight Python package for discovering, investigating, loading and distributing data.

0
0.0%
1.1K
total stars
#807
jorgecarleitao/arrow2

A transmute-free Rust library for working with the Arrow data format.

0
0.0%
1.1K
total stars
#808
RedisTimeSeries/RedisTimeSeries

A Redis module that provides a time series data structure for storing and querying time series data.

0
0.0%
1.1K
total stars
#809
patx/pickledb

An in-memory key-value store for Python that uses the orjson library for persistence, with SQLite support.

0
0.0%
1.1K
total stars
#810
ddotta/awesome-polars

A curated list of resources for Polars, an open-source, high-performance data manipulation library for Python and Rust.

0
0.0%
1.1K
total stars
#811
paulyoder/LinqToExcel

A library that allows developers to use LINQ to retrieve data from spreadsheets and CSV files.

0
0.0%
1.1K
total stars
#812
kblin/ncbi-genome-download

Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.

0
0.0%
1.1K
total stars
#813
SciRuby/daru

Daru is a Ruby library for data analysis and manipulation.

0
0.0%
1.1K
total stars
#814
KeithGalli/pandas

Code and data for a tutorial on pandas, the Python library for data manipulation and analysis.

0
0.0%
1.1K
total stars
#815
databricks/spark-csv

CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.

0
0.0%
1.1K
total stars
#816
markwk/qs_ledger

A personal data aggregator and analysis tool for self-tracking and quantified self enthusiasts.

0
0.0%
1.1K
total stars
#817
apache/phoenix

Apache Phoenix is a scalable, distributed SQL engine that connects to HBase for low-latency queries.

0
0.0%
1.1K
total stars
#818
realm/realm-core

Core database component for the Realm Mobile Database SDKs, a popular NoSQL database for mobile apps.

0
0.0%
1.0K
total stars
#819
bigdatagenomics/adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.

0
0.0%
1.0K
total stars
#820
josonle/Coding-Now

A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.

0
0.0%
1.0K
total stars
#821
J535D165/recordlinkage

A powerful Python library for record linkage and duplicate detection in data-driven applications.

0
0.0%
1.0K
total stars
#822
hannorein/rebound

An open-source N-body simulation library for astrophysics and planetary science.

0
0.0%
1.0K
total stars
#823
pixiedust/pixiedust

A Python helper library for enhancing Jupyter Notebooks with data visualization and analysis capabilities.

0
0.0%
1.0K
total stars
#824
avehtari/BDA_py_demos

Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.

0
0.0%
1.0K
total stars
#825
apache/celeborn

Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.

0
0.0%
1.0K
total stars
#826
LAStools/LAStools

This repository contains efficient tools for LiDAR processing, focused on working with point cloud data.

0
0.0%
1.0K
total stars
#827
TIBCOSoftware/snappydata

SnappyData is a memory-optimized analytics database based on Apache Spark and Apache Geode, enabling real-time stream processing, transactions, and predictive analytics.

0
0.0%
1.0K
total stars
#828
bashtage/linearmodels

This Python library provides additional linear models for statistical modeling and analysis.

0
0.0%
1.0K
total stars
#829
inloop/sqlite-viewer

A simple SQLite file viewer that allows you to view and explore SQLite databases online.

0
0.0%
1.0K
total stars
#830
CJ-Chen/TBtools-II

A powerful GUI/CLI tool for biologists to work with NGS data, not a vibe coder tool.

0
0.0%
1.0K
total stars
#831
Kotlin/dataframe

A Kotlin library for structured data processing, suitable for data analysis and data science tasks.

0
0.0%
1.0K
total stars
#832
dataprofessor/code

Compilation of R and Python programming codes for data science and machine learning projects.

0
0.0%
1.0K
total stars
#833
axiomhq/hyperloglog

HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.

0
0.0%
1.0K
total stars
#834
IQSS/dataverse

Open source research data repository software built with Java.

0
0.0%
1.0K
total stars
#835
shaiwz/data-platform-open

A no-code, visual data integration platform for building big data pipelines and workflows.

0
0.0%
1.0K
total stars
#836
twosigma/flint

A time series library for Apache Spark that provides a high-level API for working with time series data.

0
0.0%
1.0K
total stars
#837
rstudio/pointblank

Data quality assessment and reporting tool for data frames and database tables in R.

0
0.0%
1.0K
total stars
#838
OHDSI/CommonDataModel

A definition and DDLs for the OMOP Common Data Model (CDM), a data model for healthcare data.

0
0.0%
1.0K
total stars
#839
scylladb/gocqlx

A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.

0
0.0%
1.0K
total stars
#840
elliotchance/orderedmap

An ordered map implementation in Go with amortized O(1) performance for common operations.

0
0.0%
1.0K
total stars
#841
topling/toplingdb

ToplingDB is a cloud-native, distributed, and searchable key-value store built on RocksDB.

0
0.0%
1.0K
total stars
#842
lacuna/bifurcan

A library of functional, durable data structures written in Java for developers building robust applications.

0
0.0%
1.0K
total stars
#843
opengeos/streamlit-geospatial

A multi-page Streamlit app for geospatial data visualization and analysis, useful for housing and real estate applications.

0
0.0%
1.0K
total stars
#844
efficient/cuckoofilter

A space-efficient C++ implementation of the Cuckoo filter, a probabilistic data structure for set membership testing.

0
0.0%
1.0K
total stars
#845
blaze/odo

A Python library for data migration and transformation in the Blaze project.

0
0.0%
1.0K
total stars
#846
SciRuby/sciruby

SciRuby provides a collection of tools for scientific computation in Ruby.

0
0.0%
1.0K
total stars
#847
CSSEGISandData/COVID-19

Global and U.S. COVID-19 case data compiled by Johns Hopkins CSSE, for developers and researchers.

-1
0.0%
29.0K
total stars
#848
alibaba/druid

Druid is a high-performance database connection pool for Java applications, designed for monitoring and management.

-1
0.0%
28.2K
total stars
#849
prisma/prisma1

Prisma1 is a database toolkit with an ORM, migrations, and admin UI for Postgres, MySQL, and MongoDB.

-1
-0.0%
16.4K
total stars
#850
FavioVazquez/ds-cheatsheets

A comprehensive collection of data science cheatsheets for developers and data scientists.

-1
-0.0%
16.2K
total stars