Explore Projects

Discover 31 open source projects

Active filters (1):
Search: unstructured

Showing 21-31 of 31 projects

lotus-data/lotus

A Python library that uses LLMs and embeddings to process datasets with up to 1000x speedups

1.6K
Active
Python
LLM Frameworks
ETL & Pipelines
Python
#ai-data-processing #llm #semantic-search

emcf/thepipe

A Python library that helps developers extract structured data from tricky documents using vision-language models.

1.5K
Stable
Python
LLM Frameworks
ETL & Pipelines
Python
#document-processing #large-language-models #multimodal

tstanislawek/awesome-document-understanding

A curated list of resources for Document Understanding (DU) related to machine learning and natural language processing.

1.5K
Archived
Computer Vision
Natural Language Processing
#document-understanding #pdf-processing #ocr

superlinked/superlinked

Superlinked is a Python framework for building high-performance search & recommendation apps with structured and unstructured data.

1.5K
Stable
Jupyter Notebook
LLM Frameworks
RAG & Vector
Python
#data-pipeline #embeddings #information-retrieval

amphi-ai/amphi-etl

A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.

1.4K
Active
TypeScript
ETL & Pipelines
Data Analysis
TypeScript
#data-analysis #data-pipelines #data-transformation

Renumics/spotlight

Interactively explore unstructured datasets like audio, images, and video using this TypeScript library.

1.3K
Active
TypeScript
Computer Vision
Caching
React
#data-visualization #exploratory-data-analysis #unstructured-data

Open-Source-Legal/OpenContracts

An enterprise-grade, API-first LLM workspace for unstructured document processing, with features like data extraction, redaction, and prompt engineering.

1.2K
Active
Python
LLM Frameworks
ETL & Pipelines
Python
#llm #prompt-engineering #etl

Oxen-AI/Oxen

A fast data versioning system for ML datasets that makes it easy to version data and track changes the way you would with code.

1.1K
Active
Rust
Data & Databases
Version Control
Rust
#data-versioning #machine-learning #version-control

brettkromkamp/contextualise

Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.

1.1K
Experimental
Python
Knowledge Graphs
Research Tools
Flask
#knowledge-management #research-tool #semantic-web

databricks/lilac

An open-source Python library that helps curate better data for large language models (LLMs).

1.1K
Archived
Python
LLM Frameworks
Data Analysis
Python
#data-curation #unstructured-data #dataset-analysis

myscale/MyScaleDB

A high-performance vector search and full-text search database, forked from ClickHouse and aimed at AI and ML developers.

1.0K
Experimental
C++
Vector Databases
Databases
#ann #big-data #embedding
