About coOCR/HTR
Collaborative OCR/HTR for Historical Documents
The Project
coOCR/HTR is a browser-based tool that puts domain experts at the center of OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) workflows for historical documents. The expert leads, the AI assists.
The tool combines the pattern recognition capabilities of Large Language Models with the critical judgment of human experts. It supports Digital Humanities researchers who work with handwritten sources from the 16th to the 20th century: letters, account books, diaries, and registers.
Methodology
Critical Expert in the Loop
The AI assists; the human decides. Every transcription is a hypothesis that requires expert validation. The interface positions the user as the expert operating a precision instrument, not a consumer of automated output.
Categorical Confidence
Instead of misleading percentage scores (e.g., "92.3% confidence"), we use three meaningful categories: confident, uncertain, and problematic. This avoids automation bias, where users over-trust output that carries a high percentage.
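To illustrate the idea, here is a minimal Python sketch of how categorical confidence could be modelled. It is not taken from the coOCR/HTR codebase: the names Confidence, TranscribedLine, needs_review, and summarize are assumptions made for this example. It shows only that three labels are enough to build a review queue, with no numeric score involved.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    """The three categories named above (the enum itself is hypothetical)."""
    CONFIDENT = "confident"
    UNCERTAIN = "uncertain"
    PROBLEMATIC = "problematic"


@dataclass
class TranscribedLine:
    """One transcribed line with its category (hypothetical structure)."""
    number: int
    text: str
    confidence: Confidence


def needs_review(lines):
    """Return flagged lines, problematic first, then uncertain, by line number."""
    order = {Confidence.PROBLEMATIC: 0, Confidence.UNCERTAIN: 1}
    flagged = [ln for ln in lines if ln.confidence in order]
    return sorted(flagged, key=lambda ln: (order[ln.confidence], ln.number))


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = Counter(ln.confidence for ln in lines)
    return {c.value: counts.get(c, 0) for c in Confidence}
```

In this sketch the expert sees a short, ordered list of lines to check rather than a column of percentages, which is one way to keep attention on the passages that need judgment.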
Hybrid Validation
Validation combines deterministic rules (transcription markers, text statistics, OCR artifacts) with LLM Review. Advanced users can provide custom validation prompts for domain-specific checks.
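The combination of rule-based checks and LLM review can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the marker strings "[?]" and "[illegible]", the thresholds, and all function names are hypothetical, and the LLM reviewer is represented only by an optional callable that returns one of the three categories.

```python
import re
from typing import Callable, Optional

# Severity order used to merge results: the most severe category wins.
SEVERITY = {"confident": 0, "uncertain": 1, "problematic": 2}


def check_markers(text: str) -> str:
    """Transcription markers (assumed convention: "[?]" and "[illegible]")."""
    if "[illegible]" in text:
        return "problematic"
    if "[?]" in text:
        return "uncertain"
    return "confident"


def check_artifacts(text: str) -> str:
    """OCR artifact heuristic: the same character four or more times in a row."""
    return "uncertain" if re.search(r"(.)\1{3,}", text) else "confident"


def check_statistics(text: str) -> str:
    """Text statistic: share of symbols among non-space characters (assumed 30% limit)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return "confident"
    symbols = sum(1 for c in chars if not c.isalnum())
    return "uncertain" if symbols / len(chars) > 0.3 else "confident"


RULES = [check_markers, check_artifacts, check_statistics]


def validate(text: str, llm_review: Optional[Callable[[str], str]] = None) -> str:
    """Run all deterministic rules, then the optional reviewer; return the worst result."""
    results = [rule(text) for rule in RULES]
    if llm_review is not None:
        results.append(llm_review(text))
    return max(results, key=lambda category: SEVERITY[category])
```

A custom validation prompt would fit into this sketch as a different llm_review callable, for example one that wraps a domain-specific prompt; how the tool actually passes prompts to a model is not specified here.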
Technology
Links
Credits
coOCR/HTR is developed as part of the Digital Humanities Craft initiative, exploring practical applications of AI in humanities research.
Built with Claude using the promptotyping methodology: iterative development through AI dialogue.
License
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
You are free to share and adapt this work for any purpose, even commercially, as long as you give appropriate credit.