Explore Projects

Discover 26 open source projects

Active filters (1):
Search: preprocessing

Showing 1-20 of 26 projects

huggingface/datasets

AI-powered dataset management and preprocessing library for ML projects

21.2K
Active
Python
ML Ops
ETL & Pipelines
HuggingFace
#datasets#ml-ops#data-preprocessing

Unstructured-IO/unstructured

Unstructured is an open-source ETL solution for transforming complex documents into structured data for language models.

14.1K
Active
HTML
Document Processing
#document-processing#data-pipelines#natural-language-processing

idealo/imagededup

Find duplicate images easily with AI-powered image deduplication.

5.6K
Stable
Python
Prompt Engineering
PyTorch
#image-deduplication#computer-vision#python

adbar/trafilatura

Gathers text and metadata from the web using crawling, scraping, and extraction techniques.

5.4K
Stable
Python
React
#web-scraping#text-extraction#metadata-gathering

dongrixinyu/JioNLP

A comprehensive Chinese NLP preprocessing and parsing package with high accuracy, efficiency, and ease of use.

3.8K
Stable
Python
NLP Frameworks
API Frameworks
Python
#natural-language-processing#preprocessing#parsing

nidhaloff/igel

A delightful machine learning tool that allows you to train, test, and use models without writing code

3.1K
Stable
Python
React
#machine-learning#automl#neural-networks

jbesomi/texthero

A Python library for text preprocessing, representation, and visualization in machine learning and NLP projects.

2.9K
Archived
Python
Text Preprocessing
Text Representation
Python
#nlp#text-mining#word-embeddings

reworkcss/rework

A plugin framework for CSS preprocessing in Node.js

2.8K
Archived
JavaScript
Component Libraries (React)
React
#css-preprocessing#plugin-framework#node-js

blmoistawinde/HarvestText

A versatile NLP toolkit for text mining and preprocessing, supporting tasks like sentiment analysis, entity extraction, and keyword summarization.

2.6K
Archived
Python
NLP
CLI Tools
Python
#nlp#text-mining#sentiment-analysis

The-Japan-DataScientist-Society/100knocks-preprocess

A repository for the 100 Knocks of Data Science Preprocessing, focused on structured data processing.

2.5K
Experimental
HTML
ETL & Pipelines
#data-science#preprocessing#structured-data

TorchIO-project/torchio

TorchIO is a Python library for efficient medical image preprocessing and data augmentation for AI applications.

2.4K
Active
Python
Computer Vision
Databases
PyTorch
#medical-imaging#data-augmentation#computer-vision

sveltejs/svelte-preprocess

A Svelte preprocessor with support for various languages and a focus on developer productivity.

1.8K
Archived
TypeScript
Component Libraries (Vue/Svelte)
Build Tools
Svelte
#preprocess#svelte#typescript

AxeldeRomblay/MLBox

MLBox is a powerful automated machine learning Python library that simplifies and accelerates the machine learning workflow.

1.5K
Archived
Python
ML Ops
Databases
#auto-ml#automated-machine-learning#data-science

Subash/Prepros

Compile almost any preprocessing language with live browser refresh.

1.5K
Archived
JavaScript
Frontend Frameworks
Build Tools
React
#live-reload#preprocessor#css-preprocessor

sunlabuiuc/PyHealth

A Python toolkit for deep learning and healthcare applications, with support for clinical data and electronic health records.

1.5K
Active
Python
Deep Learning
Healthcare
#clinical-data#electronic-health-record#data-mining

winedarksea/AutoTS

Automated Time Series Forecasting library for Python with advanced features like deep learning and feature engineering.

1.4K
Active
Python
ML Ops
Time Series
#time-series#forecasting#feature-engineering

yeyupiaoling/VoiceprintRecognition-Pytorch

This project provides advanced voiceprint recognition models and data preprocessing methods using PyTorch.

1.2K
Stable
Python
AI Voice & Speech
API Frameworks
PyTorch
#speaker-recognition#voice-recognition#arcface

kavgan/nlp-in-practice

Starter code for solving real-world text data problems using NLP techniques like Gensim Word2Vec and text classification.

1.2K
Archived
Jupyter Notebook
LLM Frameworks
API Frameworks
Jupyter Notebook
#gensim#machine-learning#natural-language-processing

NVIDIA-Merlin/NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data used in recommender systems.

1.1K
Stable
Python
ML Ops
ETL & Pipelines
Python
#deep-learning#feature-engineering#feature-selection

KinWaiCheuk/nnAudio

A PyTorch-based audio processing library for spectrograms, CQT, and neural network-based preprocessing.

1.1K
Stable
Python
Computer Vision
Caching
PyTorch
#audio-processing#spectrograms#neural-networks