Explore Projects

Discover 31 open source projects

Active filters (1):
Search: unstructured

Showing 1-20 of 31 projects

google/langextract

Extracts structured info from text using LLMs with source grounding

34.3K
Stable
Python
LLM Wrappers & SDKs
Local Inference Engines
Python
#llm #information-extraction #gemini

microsoft/graphrag

GraphRAG is a modular system for enhancing LLM outputs using knowledge graphs from unstructured text.

31.2K
Active
Python
RAG & Vector
RAG Frameworks
Python
#graphrag #llm #rag

treeverse/dvc

DVC is a data versioning and ML experiment tool that helps developers manage and track changes to data and models.

15.4K
Active
Python
ETL & Pipelines
Python
#data-versioning #machine-learning #reproducibility

Unstructured-IO/unstructured

Unstructured is an open-source ETL solution for transforming complex documents into structured data for language models.

14.1K
Active
HTML
Document Processing
#document-processing #data-pipelines #natural-language-processing

voxel51/fiftyone

Refine high-quality datasets and visual AI models with this Python library for active learning and data curation.

10.4K
Active
Python
Computer Vision
Python
#active-learning #data-curation #data-quality

neo4j-labs/llm-graph-builder

Builds a Neo4j graph from unstructured data using LLMs

4.5K
Active
Jupyter Notebook
LLM Frameworks
AI Tool Connectors
React
#graph-construction #LLM #Neo4j

varunshenoy/GraphGPT

A library for extracting knowledge graphs from unstructured text using the GPT-3 language model.

4.4K
Archived
JavaScript
LLM Frameworks
GraphQL
Node
#gpt-3 #knowledge-graph #natural-language-processing

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL workflows for unstructured data analysis.

3.7K
Active
Python
Agents & Orchestration
ETL & Pipelines
Python
#agents #data-pipelines #document-processing

towhee-io/towhee

A fast and simple framework for building neural data processing pipelines using Python.

3.5K
Archived
Python
LLM Frameworks
Computer Vision
Python
#machine-learning #computer-vision #embeddings

microsoft/table-transformer

Deep learning model, with accompanying datasets, for extracting and analyzing table structures from PDFs and images.

2.9K
Archived
Python
Computer Vision
ETL & Pipelines
PyTorch
#table-extraction #computer-vision #document-processing

logpai/loghub

A large collection of system log datasets for AI-driven log analytics.

2.6K
Active
Anomaly Detection
Datasets
#log-analysis #log-intelligence #log-parsing

milvus-io/bootcamp

A bootcamp for working with unstructured data, covering tasks such as reverse image search, audio search, and NLP.

2.4K
Active
Jupyter Notebook
Embeddings
Semantic Search
Python
#audio-search #image-search #nlp

syslog-ng/syslog-ng

syslog-ng is an enhanced log daemon supporting a wide range of input and output methods for logging and monitoring.

2.3K
Active
C
API Frameworks
Databases
#logging #syslog #elastic

instill-ai/instill-core

Instill Core is an open-source AI infrastructure tool for orchestrating data, models, and pipelines to build AI-powered applications.

2.3K
Active
Python
LLM Frameworks
Agents & Orchestration
Golang
#ai #generative-ai #llm

nomic-ai/nomic

Nomic Developer API SDK is a Python library that provides tools for clustering, duplicate detection, embeddings, and topic modeling on unstructured data.

1.9K
Stable
Python
LLM Wrappers & SDKs
Databases
Python
#clustering #embeddings #text-processing

NanoNets/docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit.

1.9K
Stable
Python
Computer Vision
API Frameworks
Python
#document-analysis #document-data-extraction #ocr-benchmark

shcherbak-ai/contextgem

A Python library for extracting data and LLM outputs from various document types with ease.

1.8K
Stable
Python
LLM Frameworks
Data Extraction
#llm #data-extraction #document-intelligence

dingodb/dingo

A high-performance, MySQL-compatible vector database that supports structured and unstructured data for AI-driven applications.

1.7K
Active
Java
Vector Databases
API Frameworks
#vector-database #mysql-compatibility #structured-data

yobix-ai/extractous

Powerful, fast, and efficient unstructured data extraction library written in Rust with language bindings.

1.7K
Archived
Rust
ETL & Pipelines
Rust
#data-extraction #unstructured-data #etl

datamade/usaddress

A Python library for parsing unstructured US addresses into structured address components.

1.6K
Stable
Python
API Frameworks
ORMs & Query Builders
Python
#address #address-parser #natural-language-processing
