Help
User guide for the coOCR/HTR Workbench
Quick Start
Load a Document
Click "Upload" to select an image, or choose a demo from the dropdown. PAGE-XML files with existing transcriptions are also supported.
Configure API Key
Click the key icon to enter your API key for Gemini, OpenAI, or Anthropic, or to configure a local Ollama server.
Transcribe
Click "Transcribe" to send the image to your selected LLM. The transcription appears in the middle panel.
Validate & Edit
Review validation results in the right panel. Double-click any cell to edit. Mark uncertain readings with [?].
Export
Click the export icon to download your transcription as plain text, JSON, or Markdown.
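To make the three export formats concrete, here is a minimal sketch in JavaScript. The line model { id, text } and the function names are hypothetical, not the workbench's actual data structures. The JSON variant also flags every line that contains the [?] marker used during validation.

```javascript
// Hypothetical sketch: serialize transcribed lines into the three export formats.
// The line model { id, text } is an assumption, not the workbench's own structure.
const UNCERTAIN = '[?]';

// Plain text: one transcription line per output line.
function toPlainText(lines) {
  return lines.map((l) => l.text).join('\n');
}

// JSON: keep ids and mark lines that contain an uncertainty marker.
function toJSON(lines) {
  return JSON.stringify(
    lines.map((l) => ({ id: l.id, text: l.text, uncertain: l.text.includes(UNCERTAIN) })),
    null,
    2
  );
}

// Markdown: two trailing spaces force a line break so lines are not merged.
function toMarkdown(lines, title = 'Transcription') {
  const body = lines.map((l) => `${l.text}  `).join('\n');
  return `# ${title}\n\n${body}\n`;
}
```

In a browser, the resulting string can be wrapped in a Blob and offered as a download; the Markdown variant does not escape characters such as * that may occur in historical texts.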
Keyboard Shortcuts
Confidence Markers
Use these markers in your transcriptions to indicate uncertainty:
Getting API Keys
Google Gemini
Free tier available. Get your key at Google AI Studio.
OpenAI
Paid API. Create an account and get your key on the OpenAI Platform.
Anthropic
Paid API. Sign up and create a key in the Anthropic Console.
Ollama (Local)
Free and private. Install Ollama, then download the LLaVA vision model:
ollama pull llava
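The four providers expect differently shaped requests. The sketch below shows, under stated assumptions, what a single image transcription call could look like for each one when sent directly from the browser. Endpoints and payload shapes follow each provider's public REST API; buildVisionRequest, send, and extractText are hypothetical names, model names are left to the caller because they change over time, and max_tokens is an assumed output budget. This is not necessarily how the workbench builds its requests.

```javascript
// Hypothetical sketch: build one vision request per provider.
function buildVisionRequest({ provider, apiKey, model, prompt, imageBase64, mimeType }) {
  const json = { 'Content-Type': 'application/json' };
  switch (provider) {
    case 'gemini':
      return {
        url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
        headers: { ...json, 'x-goog-api-key': apiKey },
        body: { contents: [{ parts: [
          { text: prompt },
          { inline_data: { mime_type: mimeType, data: imageBase64 } },
        ] }] },
      };
    case 'openai':
      return {
        url: 'https://api.openai.com/v1/chat/completions',
        headers: { ...json, Authorization: `Bearer ${apiKey}` },
        body: { model, messages: [{ role: 'user', content: [
          { type: 'text', text: prompt },
          { type: 'image_url', image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
        ] }] },
      };
    case 'anthropic':
      return {
        url: 'https://api.anthropic.com/v1/messages',
        headers: {
          ...json,
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
          // Needed for CORS when calling Anthropic directly from a browser.
          'anthropic-dangerous-direct-browser-access': 'true',
        },
        body: { model, max_tokens: 4096, messages: [{ role: 'user', content: [
          { type: 'image', source: { type: 'base64', media_type: mimeType, data: imageBase64 } },
          { type: 'text', text: prompt },
        ] }] },
      };
    case 'ollama':
      // Local server on Ollama's default port; no API key involved.
      return {
        url: 'http://localhost:11434/api/generate',
        headers: json,
        body: { model: model ?? 'llava', prompt, images: [imageBase64], stream: false },
      };
    default:
      throw new Error(`Unknown provider: ${provider}`);
  }
}

// POST the request with fetch (browser or Node 18+) and return the parsed JSON.
async function send(req) {
  const res = await fetch(req.url, { method: 'POST', headers: req.headers, body: JSON.stringify(req.body) });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// Pull the generated text out of each provider's response shape.
function extractText(provider, json) {
  switch (provider) {
    case 'gemini': return json.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
    case 'openai': return json.choices?.[0]?.message?.content ?? '';
    case 'anthropic': return json.content?.[0]?.text ?? '';
    case 'ollama': return json.response ?? '';
    default: throw new Error(`Unknown provider: ${provider}`);
  }
}
```

A call then reads extractText(p, await send(buildVisionRequest({ provider: p, ... }))). Keeping request building separate from sending makes the payloads easy to inspect without spending quota.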
API Key Security
By default, API keys are kept only in browser memory. If you explicitly enable "Store API key persistently", they are also saved in IndexedDB. This is convenient, but not fully secure on shared or untrusted devices.
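The memory-by-default, persist-on-opt-in behaviour can be sketched as follows. ApiKeyStore is a hypothetical name, and the backend object (async set and clear methods, for example a thin wrapper around IndexedDB) is an assumed interface, not the workbench's actual storage code.

```javascript
// Hypothetical sketch of opt-in key persistence. Keys always live in memory;
// a persistent copy is written only after the user opts in. `backend` is an
// assumed interface with async set(provider, key) and clear() methods.
class ApiKeyStore {
  constructor(backend = null) {
    this.memory = new Map();
    this.backend = backend;
    this.persist = false; // off by default
  }

  async setKey(provider, key) {
    this.memory.set(provider, key);
    if (this.persist && this.backend) await this.backend.set(provider, key);
  }

  getKey(provider) {
    return this.memory.get(provider) ?? null;
  }

  // Turning persistence on copies the current keys; turning it off wipes the stored copy.
  async setPersistence(enabled) {
    this.persist = enabled;
    if (!this.backend) return;
    if (enabled) {
      for (const [provider, key] of this.memory) await this.backend.set(provider, key);
    } else {
      await this.backend.clear();
    }
  }
}
```

Constructing the store without a backend gives the memory-only default: the keys disappear when the tab is closed, and nothing is ever written to disk.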
Known Risks
- Browser extensions with broad permissions can access page memory
- XSS vulnerabilities (if any exist) could expose keys
- Physical access to your device allows memory inspection via DevTools
- Shared computers: your session may persist if not properly closed
Recommendations
- Create a separate API key specifically for this tool
- Set spending limits at your provider
- Use a dedicated browser profile with minimal extensions
- For sensitive documents: use Ollama locally (no API key needed)
Local Setup (Convenience for Developers)
If you run coOCR/HTR locally (by cloning the repository), you can configure API keys to load automatically.
Step 1: Clone the Repository
git clone https://github.com/DigitalHumanitiesCraft/co-ocr-htr.git
cd co-ocr-htr/docs
Step 2: Create Local Config File
cp config.local.example.js config.local.js
Step 3: Add Your API Keys
Edit config.local.js and fill in your keys:
export const LOCAL_CONFIG = {
  apiKeys: {
    gemini: 'YOUR_GEMINI_KEY_HERE',
    openai: 'YOUR_OPENAI_KEY_HERE',
    anthropic: 'YOUR_ANTHROPIC_KEY_HERE'
  },
  // Optional: set default provider
  defaultProvider: 'gemini'
};
Step 4: Start a Local Server
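The app is built from ES modules, which browsers will not load from a file:// URL, so it needs a local web server. Run any of these in the docs folder: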
# Python 3
python -m http.server 8000
# Node.js (npx)
npx serve -l 8000
# PHP
php -S localhost:8000
Then open http://localhost:8000 in your browser.
Security Note: config.local.js is listed in .gitignore, so Git will not include it in your commits. Your API keys stay on your machine.
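Because config.local.js is optional, a page has to cope with it being absent. One way to do that is a dynamic import() with a fallback, sketched below. loadLocalConfig and usableKey are hypothetical names, the relative path './config.local.js' is an assumption about where the importing module lives, and treating the untouched placeholders as "no key" is an illustrative choice, not documented workbench behaviour.

```javascript
// Hypothetical sketch: load the optional config.local.js and fall back to
// manual key entry when it is missing. The importer is injectable for testing.
async function loadLocalConfig(importer = () => import('./config.local.js')) {
  try {
    const mod = await importer();
    return mod.LOCAL_CONFIG ?? null;
  } catch {
    // File absent (fresh clone, hosted version): no keys are loaded automatically.
    return null;
  }
}

// Treat untouched placeholders such as 'YOUR_GEMINI_KEY_HERE' as "no key".
function usableKey(config, provider) {
  const key = config?.apiKeys?.[provider];
  if (!key || /^YOUR_.*_HERE$/.test(key)) return null;
  return key;
}
```

A caller could then pick its starting provider with config?.defaultProvider and prompt for a key only when usableKey returns null.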
Troubleshooting
Why does transcription fail or return poor results?
Check that your API key is valid and has sufficient quota. Make sure the image is clear and not too large (under 10 MB is recommended). Some historical handwriting is difficult for LLMs; if results stay poor, try a different provider.
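A quick pre-flight check can catch oversized or unusual files before any quota is spent. In this sketch, checkImage is a hypothetical helper; reading the 10 MB recommendation as 10 MiB and the list of accepted image types are assumptions. It works with a browser File object, which exposes size and type.

```javascript
// Hypothetical pre-flight check before sending an image to a provider.
// 10 MiB mirrors the "under 10 MB" recommendation; it is not a hard API limit.
const MIB = 1024 * 1024;
const MAX_BYTES = 10 * MIB;
const ACCEPTED = ['image/jpeg', 'image/png', 'image/webp']; // assumed list

// Returns a list of human-readable problems; an empty list means "looks fine".
function checkImage({ size, type }) {
  const problems = [];
  if (!ACCEPTED.includes(type)) problems.push(`Unsupported image type: ${type}`);
  if (size > MAX_BYTES) {
    problems.push(`Image is ${(size / MIB).toFixed(1)} MB; under 10 MB is recommended`);
  }
  return problems;
}
```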
How do I import a PAGE-XML file?
Upload an image first, then upload the corresponding PAGE-XML file. The transcription and bounding boxes are imported automatically. This works with PAGE-XML exported from Transkribus.
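Importing PAGE-XML essentially means reading each TextLine's text and turning its polygon into a bounding box. The sketch below assumes the PAGE 2013 layout used by Transkribus exports, where Coords carry a points="x1,y1 x2,y2 ..." attribute, and it reads only the first line-level TextEquiv (word-level entries are ignored). pointsToBBox and parsePageXml are hypothetical names, not the workbench's importer; parsePageXml needs a browser because plain Node.js has no DOMParser.

```javascript
// Hypothetical sketch of a PAGE-XML import.

// Convert a PAGE points string into an axis-aligned bounding box.
function pointsToBBox(points) {
  const pairs = points.trim().split(/\s+/).map((p) => p.split(',').map(Number));
  const xs = pairs.map(([x]) => x);
  const ys = pairs.map(([, y]) => y);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

// First direct child with the given local name, ignoring the PAGE namespace version.
function child(el, name) {
  return Array.from(el.children).find((c) => c.localName === name) ?? null;
}

// Browser only: returns [{ id, text, bbox }] for every TextLine.
function parsePageXml(xmlString) {
  const doc = new DOMParser().parseFromString(xmlString, 'application/xml');
  if (doc.getElementsByTagName('parsererror').length > 0) throw new Error('Invalid XML');
  return Array.from(doc.getElementsByTagNameNS('*', 'TextLine')).map((line) => {
    const points = child(line, 'Coords')?.getAttribute('points');
    const equiv = child(line, 'TextEquiv');
    const unicode = equiv ? child(equiv, 'Unicode') : null;
    return {
      id: line.getAttribute('id'),
      text: unicode ? unicode.textContent : '',
      bbox: points ? pointsToBBox(points) : null,
    };
  });
}
```

Using bounding boxes rather than the full polygons keeps highlighting simple, at the cost of some overlap on slanted lines.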
Where are my API keys stored?
Keys are held in memory while the app runs. If you enabled persistent storage, an additional copy is kept in IndexedDB on your device. Browser-based key handling is not fully secure: extensions, XSS, or physical access could expose keys. Use dedicated keys with spending limits, and keep persistence disabled on shared devices.
Does coOCR/HTR work offline?
Partially. The interface works offline, but transcription requires an internet connection unless you use Ollama locally. Your work is auto-saved in the browser, so you won't lose progress.