coOCR/HTR

About coOCR/HTR

Collaborative OCR/HTR for Historical Documents

The Project

coOCR/HTR is a browser-based tool that puts domain experts at the center of OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) workflows for historical documents. The expert leads, the AI assists.

The tool combines the pattern-recognition capabilities of Large Language Models (LLMs) with the critical judgment of human experts. It supports Digital Humanities researchers who work with handwritten sources from the 16th to the 20th century: letters, account books, diaries, and registers.

Methodology

Critical Expert in the Loop

The AI assists; the human decides. Every transcription is a hypothesis that requires expert validation. The interface treats the user as an expert operating a precision instrument, not as a consumer of automated output.
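
As a sketch of how "every transcription is a hypothesis" could be enforced at the data level, the following JavaScript (matching the project's vanilla-JS stack) models a segment whose status stays `hypothesis` until an expert explicitly reviews it. The names `createSegment`, `reviewSegment`, and `isValidated`, the field names, and the status values are all hypothetical and are not taken from the coOCR/HTR code.

```javascript
// Hypothetical segment model: names and status values are illustrative
// assumptions, not the coOCR/HTR implementation.
const Status = Object.freeze({
  HYPOTHESIS: "hypothesis", // model output, not yet reviewed by an expert
  ACCEPTED: "accepted",     // expert confirmed the model's text unchanged
  CORRECTED: "corrected",   // expert replaced the model's text
});

// Every new transcription starts as a hypothesis.
function createSegment(id, modelText) {
  return { id, modelText, text: modelText, status: Status.HYPOTHESIS, history: [] };
}

// Only an explicit expert decision changes the status. The original model
// output is kept so that each decision remains traceable.
function reviewSegment(segment, expertText) {
  const status = expertText === segment.modelText ? Status.ACCEPTED : Status.CORRECTED;
  return {
    ...segment,
    text: expertText,
    status,
    history: [...segment.history, { from: segment.text, to: expertText, status }],
  };
}

function isValidated(segment) {
  return segment.status !== Status.HYPOTHESIS;
}
```

Returning a new object instead of mutating the segment keeps the unreviewed hypothesis available, which fits the idea that the model only proposes and the expert decides.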

Categorical Confidence

Instead of misleading percentage scores (such as "92.3% confidence"), we use three meaningful categories: confident, uncertain, and problematic. This guards against automation bias, the tendency of users to over-trust output that carries a high percentage.
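
One way to keep confidence categorical end to end is to treat the three labels as an ordered set and, when several judgments exist for the same passage, keep the most cautious one instead of averaging them into a number. The JavaScript below is a minimal sketch under that assumption; `normalizeCategory`, `combineCategories`, and the fallback to `uncertain` for unrecognized labels are illustrative choices, not documented coOCR/HTR behavior.

```javascript
// The three categories named in the text, ordered from least to most cautious.
const CATEGORIES = Object.freeze(["confident", "uncertain", "problematic"]);

// Map any incoming label onto the closed set. Assumption: anything that is
// not one of the three labels (including a numeric score) counts as "uncertain".
function normalizeCategory(label) {
  const value = String(label ?? "").trim().toLowerCase();
  return CATEGORIES.includes(value) ? value : "uncertain";
}

// Combine several judgments by keeping the most cautious one, so a single
// "problematic" signal is never averaged away. No judgment at all is "uncertain".
function combineCategories(...labels) {
  if (labels.length === 0) return "uncertain";
  let worst = 0;
  for (const label of labels) {
    worst = Math.max(worst, CATEGORIES.indexOf(normalizeCategory(label)));
  }
  return CATEGORIES[worst];
}
```

For example, `combineCategories("confident", "uncertain")` returns `"uncertain"`: one cautious signal is enough to send the passage back to the expert.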

Hybrid Validation

Validation combines deterministic rules (transcription markers, text statistics, OCR artifacts) with an LLM review. Advanced users can provide custom validation prompts for domain-specific checks.
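
The hybrid approach can be sketched as two stages: cheap, reproducible rules run first, and an optional LLM review adds its findings on top, receiving the user's custom prompt if one is set. In this JavaScript sketch the concrete rules, the 0.5 letter-ratio threshold, the `[?]` / `[illegible]` marker convention, and the `llmReview(text, customPrompt)` signature are all assumptions for illustration; the actual rules and provider calls in coOCR/HTR may differ. The LLM call is injected as a function so the sketch runs without network access.

```javascript
// Hypothetical deterministic rules. The patterns, the threshold, and the
// [?] / [illegible] marker convention are illustrative assumptions.
const RULES = [
  {
    id: "uncertainty-marker",
    // Transcription markers: the transcriber flagged a doubtful reading.
    check: (text) => /\[\?\]|\[illegible\]/i.test(text),
    category: "uncertain",
  },
  {
    id: "ocr-artifact",
    // OCR artifacts: runs of | or ~, or the Unicode replacement character.
    check: (text) => /[|~]{2,}|\uFFFD/.test(text),
    category: "problematic",
  },
  {
    id: "low-letter-ratio",
    // Text statistics: fewer than half of the visible characters are letters.
    check: (text) => {
      const visible = text.replace(/\s/g, "");
      const letters = (visible.match(/\p{L}/gu) || []).length;
      return visible.length > 0 && letters / visible.length < 0.5;
    },
    category: "uncertain",
  },
];

// Apply every rule and report the ones that fire, in rule order.
function runRules(text) {
  return RULES.filter((rule) => rule.check(text)).map(({ id, category }) => ({
    source: "rule",
    id,
    category,
  }));
}

// `llmReview(text, customPrompt)` is a caller-supplied async function
// (hypothetical signature) resolving to findings shaped like { id, category }.
async function validate(text, { llmReview, customPrompt } = {}) {
  const findings = runRules(text);
  if (llmReview) {
    const llmFindings = await llmReview(text, customPrompt);
    findings.push(...llmFindings.map((f) => ({ ...f, source: "llm" })));
  }
  return findings;
}
```

Tagging each finding with its `source` keeps rule-based and model-based evidence distinguishable, so the expert can weigh a reproducible rule hit differently from an LLM opinion.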

Technology

Vanilla JavaScript (ES6+)
No Build Process
CSS Custom Properties
IndexedDB Project Persistence
Multi-Provider LLM APIs
PAGE-XML / METS-XML
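
To illustrate what a PAGE-XML export involves, the JavaScript below serializes transcribed lines into a minimal PAGE document: `Metadata` with `Creator`, `Created`, and `LastChange`, then a `Page` holding one `TextRegion` whose `TextLine` elements carry `Coords` and a `TextEquiv`/`Unicode` transcription. It is a sketch, not the coOCR/HTR exporter: the input shape, the function names `toPageXml`, `boundingBox`, and `escapeXml`, the single region `r1`, and the choice of the 2019-07-15 PAGE schema version are assumptions, and METS packaging is not covered.

```javascript
// Assumed PAGE schema version; the version coOCR/HTR targets may differ.
const PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15";

// Escape the five XML special characters for text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Axis-aligned bounding box around PAGE point strings like "10,10 200,10 ...".
function boundingBox(pointStrings) {
  const coords = pointStrings.flatMap((s) =>
    s.trim().split(/\s+/).map((p) => p.split(",").map(Number)));
  const xs = coords.map(([x]) => x);
  const ys = coords.map(([, y]) => y);
  const [minX, maxX] = [Math.min(...xs), Math.max(...xs)];
  const [minY, maxY] = [Math.min(...ys), Math.max(...ys)];
  return `${minX},${minY} ${maxX},${minY} ${maxX},${maxY} ${minX},${maxY}`;
}

// Hypothetical input: { imageFilename, width, height, lines: [{ id, points, text }] }.
// All lines go into one region whose outline is their bounding box.
function toPageXml(page, creator = "coOCR/HTR") {
  if (page.lines.length === 0) throw new Error("export needs at least one line");
  const now = new Date().toISOString();
  const lines = page.lines.map((line) => [
    `      <TextLine id="${escapeXml(line.id)}">`,
    `        <Coords points="${escapeXml(line.points)}"/>`,
    `        <TextEquiv><Unicode>${escapeXml(line.text)}</Unicode></TextEquiv>`,
    "      </TextLine>",
  ].join("\n"));
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<PcGts xmlns="${PAGE_NS}">`,
    "  <Metadata>",
    `    <Creator>${escapeXml(creator)}</Creator>`,
    `    <Created>${now}</Created>`,
    `    <LastChange>${now}</LastChange>`,
    "  </Metadata>",
    `  <Page imageFilename="${escapeXml(page.imageFilename)}" imageWidth="${Number(page.width)}" imageHeight="${Number(page.height)}">`,
    '    <TextRegion id="r1">',
    `      <Coords points="${boundingBox(page.lines.map((l) => l.points))}"/>`,
    ...lines,
    "    </TextRegion>",
    "  </Page>",
    "</PcGts>",
  ].join("\n");
}
```

Building the XML as strings keeps the sketch dependency-free, in line with the no-build-process approach; in the browser, `DOMParser` and `XMLSerializer` would be an alternative that guarantees well-formed output.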

Credits

coOCR/HTR is developed as part of the Digital Humanities Craft initiative, exploring practical applications of AI in humanities research.

Built with Claude using the promptotyping methodology: iterative development through dialogue with an AI.

License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

You are free to share and adapt this work for any purpose, even commercially, as long as you give appropriate credit.