Explore Projects

Discover 24 open source projects

Active filters (1):
Search: deduplication

Showing 1-20 of 24 projects

restic/restic

Secure, efficient backup tool with deduplication and cloud storage support.

32.5K
Active
Go
CLI Tools
CI/CD
#backup#deduplication#secure

borgbackup/borg

Borg is a deduplicating archiver with compression and authenticated encryption, suitable for backup purposes.

13.0K
Active
Python
API Frameworks
#backup#deduplication#compression

kopia/kopia

Kopia is a cross-platform, open-source backup tool with encryption, compression, and deduplication features.

12.7K
Active
Go
API Frameworks
#backup#encryption#compression

prometheus/alertmanager

Prometheus Alertmanager handles alerts sent by client applications such as the Prometheus server, deduplicating, grouping, and routing them to notification receivers such as email or PagerDuty.

8.4K
Active
Go
Monitoring
CLI Tools
#monitoring#notifications#alerts

arsenetar/dupeguru

A Python-based deduplication tool to find and remove duplicate files on your system.

7.4K
Active
Python
API Frameworks
CLI Tools
Python
#deduplication#file-management#python

bup/bup

A fast, efficient backup system based on the git packfile format.

7.3K
Active
Python
#backup#git#incremental-saves

idealo/imagededup

Finding duplicate images made easy, using hashing and deep-learning (CNN) based image deduplication.

5.6K
Stable
Python
Prompt Engineering
PyTorch
#image-deduplication#computer-vision#python

openvenues/libpostal

A C library for parsing and normalizing international street addresses using statistical NLP and open geo data.

4.7K
Stable
C
API Clients & Testing
Record Linkage
#address-parsing#address-normalization#natural-language-processing

dedupeio/dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.

4.4K
Experimental
Python
ORMs & Query Builders
API Frameworks
Python
#data-cleaning#entity-resolution#fuzzy-matching

Boris-code/feapder

A powerful Python-based web scraping framework with features like browser rendering and data deduplication.

3.6K
Stable
Python
Backend Frameworks
CLI Tools
Python
#crawler#scraper#web-scraping

rustic-rs/rustic

A fast, encrypted, deduplicating backup tool written in Rust.

2.9K
Active
Rust
#backup#deduplication#encryption

mhx/dwarfs

A fast, high-compression read-only file system with deduplication and support for multiple platforms.

2.5K
Active
C++
CLI Tools
API Frameworks
#archiving#compression#deduplication

borgmatic-collective/borgmatic

Simple, configuration-driven backup software for servers and workstations, supporting various databases and storage systems.

2.2K
Active
Python
API Frameworks
CLI Tools
Python
#backup#deduplication#monitoring

moj-analytical-services/splink

Fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.

2.0K
Active
Python
Databases
ETL & Pipelines
Python
#data-matching#data-deduplication#entity-resolution

cupcakearmy/autorestic

Easy-to-use, config-driven CLI tool for the Restic backup system, with support for deduplication and incremental backups.

1.8K
Active
Go
CLI Tools
Realtime
#backup#restic#deduplication

PlakarKorp/plakar

A backup solution powered by Kloset and ptar, designed for vibe coders and AI developers.

1.7K
Active
Go
File Storage
Analytics & Tracking
#backups#deduplication#storage

puleos/object-hash

Object-hash is a JavaScript library for generating hashes from objects, useful for caching and deduplication.

1.5K
Archived
JavaScript
General Utilities
Node
#hashing#caching#deduplication

s3git/s3git

A distributed version control system for cloud storage that enables versioning and deduplication of large datasets.

1.5K
Archived
Go
API Frameworks
Databases
Go
#cloud-storage#decentralized#distributed

NVIDIA-NeMo/Curator

Scalable data preprocessing and curation toolkit for large language models (LLMs).

1.4K
Active
Python
Python
#data-curation#large-language-models#data-preparation

scinos/yarn-deduplicate

A deduplication tool for managing duplicate packages in Yarn lock files.

1.4K
Active
TypeScript
CLI Tools
Backend Frameworks
Node
#dedupe#duplicates#lock-file
