Till sidans topp

Sidansvarig: Webbredaktion
Sidan uppdaterades: 2012-09-11 15:12

Tipsa en vän
Utskriftsversion

Supervised OCR Post-Corre… - Göteborgs universitet Till startsida
Webbkarta
Till innehåll Läs mer om hur kakor används på gu.se

Supervised OCR Post-Correction of Historical Swedish Texts: What Role Does the OCR System Play?

Paper i proceeding
Författare Dana Dannélls
Persson Simon
Publicerad i Digital Humanities in the Nordic countries (DHN)
Förlag CEUR
Publiceringsår 2020
Publicerad vid Institutionen för svenska språket
Språk en
Länkar ceur-ws.org/Vol-2612/paper2.pdf
Ämnesord Historical newspaper corpus OCR Machine learning
Ämneskategorier Informatik, data- och systemvetenskap, Data- och systemvetenskap, Datorlingvistik

Sammanfattning

Current approaches for post-correction of OCR errors offer solutions that are tailored to a specific OCR system. This can be problematic if the post-correction method was trained on a specific OCR system but have to be applied on the result of another system. Whereas OCR post-correction of historical text has received much attention lately, the question of what role does the OCR system play for the post-correction method has not been addressed. In this study we explore a dataset of 400 documents of historical Swedish text which has been OCR processed by three state-of-the-art OCR systems: Abbyy Finereader, Tesseract and Ocropus. We examine the OCR results of each system and present a supervised machine learning post-correction method that tries to approach the challenges exhibited by each system. We study the performance of our method by using three evaluation tools: PrimA, Språkbanken evaluation tool and Frontiers Toolkit. Based on the evaluation analysis we discuss the impact each of the OCR systems has on the results of the post- correction method. We report on quantitative and qualitative results showing varying degrees of OCR post-processing complexity that are important to consider when developing an OCR post-correction method.

Sidansvarig: Webbredaktion|Sidan uppdaterades: 2012-09-11
Dela:

På Göteborgs universitet använder vi kakor (cookies) för att webbplatsen ska fungera på ett bra sätt för dig. Genom att surfa vidare godkänner du att vi använder kakor.  Vad är kakor?