PaddleOCR converts documents and images into structured data for AI applications.
Converts documents to AI-ready formats with advanced parsing
Unstructured is an open-source ETL solution for transforming complex documents into structured data for language models.
A set of TypeScript-based cloud services and utilities for processing and extracting structured data from various document formats.
Fast local PDF-to-Markdown/JSON converter for RAG pipelines. No GPU needed.
ExtractThinker is a powerful document intelligence library for LLMs, offering flexible and intuitive workflows.
An intelligent document parsing tool that extracts data from various document formats and converts it into structured formats such as Markdown, JSON, CSV, and HTML.
An open-source toolkit for general OCR research and applications, with integrated training, evaluation, and production-ready OCR systems.