I am trying to train Tesseract for some funny-looking fonts, Palace for example. I tried a simple approach: I produced the traineddata with http://trainyourtesseract.com/ (via Wayback Machine) and then made calls like these:
    api->Init(".\\tessdata", "eng+Palace", OEM_TESSERACT_ONLY);
    api->SetPageSegMode(PSM_SINGLE_LINE);
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
The result for a line like
M P S T a o e h i l n p r s t u w y
is shown below; no glyph is recognized correctly:
.MDXXXo,XkX.n.mX.XnoX
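To put a number on how far off such a result is, here is a minimal sketch of a character error rate: the edit distance between the expected and the recognized line, divided by the expected length, with spaces ignored. The names EditDistance, StripSpaces and CharErrorRate are my own hypothetical helpers, not part of the Tesseract API, and the comparison is byte-wise, so it is only meaningful for plain ASCII lines like the one above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein edit distance between two byte strings.
// Hypothetical helper, not part of the Tesseract API.
std::size_t EditDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      // Cost of substituting a[i-1] with b[j-1] (0 if they already match).
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Remove ASCII spaces so that glyph spacing does not count as an error.
std::string StripSpaces(std::string s) {
  s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
  return s;
}

// Edit distance divided by the number of expected characters.
// 0.0 means a perfect match; values can exceed 1.0 if the output is longer.
double CharErrorRate(const std::string& expected, const std::string& actual) {
  const std::string e = StripSpaces(expected);
  const std::string a = StripSpaces(actual);
  if (e.empty()) return a.empty() ? 0.0 : 1.0;
  return static_cast<double>(EditDistance(e, a)) / e.size();
}
```

Calling CharErrorRate with the expected line and the string returned by GetUTF8Text() (after trimming the trailing newline) would give one figure per test line, which makes it easier to compare Palace against the less unusual fonts.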
Does trainyourtesseract produce bad traineddata, or am I making the wrong calls? And how does one handle such cases?
Actually, I have tried the same with less funny fonts, but the recognition hardly improves there either.
I am attaching the tiff file and my trained data for Palace.
Thank you everyone in advance for your help,
Yuliana