OCR & text recognition
OculiX gives you two complementary OCR engines, with very different trade-offs:
| Engine | When to use it | Setup |
|---|---|---|
| Tesseract | Latin scripts, simple layouts, no extra process | Bundled, zero install |
| PaddleOCR | CJK, mixed scripts, complex layouts, high accuracy required | Optional HTTP server |
You can use both in the same script. Pick per-call based on what you’re reading.
Tesseract — bundled via Legerix
Section titled “Tesseract — bundled via Legerix”Tesseract is fully embedded in OculiX through Legerix. The Maven coordinate io.github.oculix-org:legerix:5.5.0-4 brings the native binaries and the tessdata traineddata files into the JAR. Nothing for you to install.
Languages bundled out of the box:
| Code | Language |
|---|---|
eng | English |
fra | French |
spa | Spanish |
chi_sim | Simplified Chinese |
hin | Hindi |
osd | Orientation & Script Detection (needed for auto-rotation) |
Reading text — high-level API
Section titled “Reading text — high-level API”from sikuli import *
# Read the whole screenprint Screen(0).text()# → "File Edit View ... Save Cancel"
# Read inside a regionbtn_area = Region(100, 200, 300, 50)print btn_area.text()Region.text() runs Tesseract on the captured region and returns the recognized text. The bounding region is automatically sharpened — for the best accuracy, capture a tight crop around the text you care about.
Find a button by its label
Section titled “Find a button by its label”# Locate the "Submit" label on screen, click on ittarget = Region(0, 0, 1920, 1080).findText("Submit")target.click()findText(label) returns a Match you can click, hover, type into, or chain with other region ops. Use it when the button’s image changes (theme switch, dynamic icon) but its label is stable.
Switch language per call
Section titled “Switch language per call”Settings.OcrLanguage = "fra"print Region(...).text() # reads as French
Settings.OcrLanguage = "chi_sim"print Region(...).text() # reads as Simplified ChineseOr temporarily:
with Settings.use(OcrLanguage="chi_sim"): print btn_area.text()Tune the engine
Section titled “Tune the engine”Tesseract exposes Page Segmentation Modes (PSM) and OCR Engine Modes (OEM):
Settings.OcrPSM = 7 # single uniform block of text (best for single labels)Settings.OcrPSM = 11 # sparse text — find as much as possibleSettings.OcrOEM = 3 # default — neural + legacyPSM 7 is the sweet spot for matching button labels — 11 is better when reading paragraphs.
PaddleOCR — optional HTTP server
Section titled “PaddleOCR — optional HTTP server”For CJK languages, vertical text, handwritten text, or noisy layouts, PaddleOCR outperforms Tesseract by a wide margin. It runs as a small Python HTTP server that OculiX talks to over localhost.
Install the server
Section titled “Install the server”pip install paddleocrserver-poweredpaddleocrserver# Listens on http://localhost:5000The server is published on PyPI as paddleocrserver-powered and runs on Flask + Waitress, single-binary, GPU-optional.
Use it from a script
Section titled “Use it from a script”from sikuli import *from org.sikuli.script import PaddleOCREngine
ocr = PaddleOCREngine() # auto-detects http://localhost:5000
# Capture the region, run OCRimg = capture(Region(0, 0, 1920, 1080))json = ocr.recognize(img.getFile())
# Find coordinates of a text on screencoords = ocr.findTextCoordinates(json, "Validate")if coords: x, y, w, h = coords[0], coords[1], coords[2], coords[3] click(Location(x + w//2, y + h//2))Inspect everything PaddleOCR sees
Section titled “Inspect everything PaddleOCR sees”results = ocr.parseTextWithConfidence(json)for text, confidence in results.items(): print "%-30s %.2f" % (text, confidence)# Submit 0.9997# Cancel 0.9981# 你好 0.9962# ...When to prefer PaddleOCR
Section titled “When to prefer PaddleOCR”- CJK — Chinese, Japanese, Korean. PaddleOCR was trained for them.
- Mixed scripts in the same image (Latin + Asian)
- Vertical text (Japanese, traditional Chinese)
- High accuracy needed on small fonts, anti-aliased text, or noisy backgrounds
Fallback chain (internals, FYI)
Section titled “Fallback chain (internals, FYI)”Commons.loadTesseract() runs a 4-level fallback when initializing the OCR stack:
- Legerix native binaries (bundled) — first attempt
- Tess4J + Legerix tessdata — second attempt if native loader fails
- System
tesseractbinary — third attempt if the JVM can’t load JNI SikuliXception("Reinstall OculiX")— give up
In practice you’ll never hit anything beyond step 1, but if you do see this in your logs, it’s almost always a corrupted install — reinstall or rebuild from source.
Performance tips
Section titled “Performance tips”- Crop tight. Tesseract works on what you give it — a small focused region runs 10x faster than the whole screen.
- Cache the engine.
PaddleOCREngine ocr = new PaddleOCREngine()once per script, not per call. - Use
findText()overtext()+ string parsing when you want to click a specific word — it’s a single round-trip. - Don’t OCR what you can match. A PNG match is always cheaper than OCR. Use OCR for dynamic labels.
Common pitfalls
Section titled “Common pitfalls”| Symptom | Likely cause |
|---|---|
Numbers come back as letters (O instead of 0) | PSM not set to single-line; relax similarity |
| Chinese returns Latin characters | Settings.OcrLanguage is still "eng" |
| PaddleOCR returns empty results | Server not running on localhost:5000 |
Connection refused | Install paddleocrserver-powered and start it |
- Vision pipeline internals — how OculiX sees
- Jython scripting in depth — using OCR inside scripts
- API reference —
Region.text(),Region.findText(),PaddleOCREngine