PHP Repairing Bad Text
<p>This is something I'm working on, and I'd like input from the intelligent people here on StackOverflow.</p>
<p>What I'm attempting to write is a function that repairs text by combining several bad versions of the same page. Basically, it could be used to merge different OCR results into one with greater accuracy than any of them individually.</p>
<p>I start with a dictionary of 600,000 English words; that's pretty much everything, including legal and medical terms and common names. I have this already.</p>
<p>Then I have 4 versions of the text sample.</p>
<p>Something like this:</p>
<pre><code>$text[0] = 'Fir5t text sample is thisline';
$text[1] = 'Fir5t text Smplee is this line.';
$text[2] = 'First te*t sample i this l1ne.';
$text[3] = 'F i r st text s ample is this line.';
</code></pre>
<p>I'm attempting to combine the above to get an output that looks like this:</p>
<pre><code>$text = 'First text sample is this line.';
</code></pre>
<p>Don't tell me it's impossible, because it certainly isn't, just very difficult.</p>
<p>I would very much appreciate any ideas anyone has on this.</p>
<p>Thank you!</p>
<p>My current thoughts:</p>
<p>Just checking the words against the dictionary will not work, since some of the spaces are in the wrong place and occasionally the word will not be in the dictionary.</p>
<p>The major concern is repairing the broken spacing. Once that is fixed, the most commonly occurring dictionary word can be chosen if one exists, or else the most commonly occurring non-dictionary word.</p>
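One common way to attack the spacing problem is dictionary-based word segmentation: discard the OCR spaces entirely and let a dynamic program decide where they belong. This is a substituted technique, not something the question specifies. The sketch below is a minimal illustration under stated assumptions: `repairSpacing` is a hypothetical name, the text is assumed to be single-byte (ASCII), the dictionary is assumed to be stored as lowercase words mapped to `true`, and the cost model (1 per dictionary word, run length + 1 per run of unrecognised characters) was picked arbitrarily for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: re-split one OCR version into words using only the
 * dictionary. All whitespace is removed, then a dynamic program picks the
 * cheapest segmentation, where
 *   - a dictionary word (ignoring trailing punctuation) costs 1, and
 *   - a run of unrecognised characters costs its length + 1,
 * so adjacent unknown characters are glued into a single token.
 * Assumes ASCII text and a dictionary of lowercase words (word => true).
 * It does O(n^2) substring lookups, so it is meant to run line by line.
 *
 * @param array<string, bool> $dictionary
 */
function repairSpacing(string $text, array $dictionary): string
{
    $s = (string) preg_replace('/\s+/', '', $text);
    $lower = strtolower($s);
    $n = strlen($s);

    // $cost[$i]: cheapest segmentation of the first $i characters.
    // $from[$i]: start index of the last token in that segmentation.
    $cost = array_fill(0, $n + 1, PHP_INT_MAX);
    $from = array_fill(0, $n + 1, 0);
    $cost[0] = 0;

    for ($i = 1; $i <= $n; $i++) {
        for ($j = 0; $j < $i; $j++) {
            // "line." is looked up as "line".
            $piece = rtrim(substr($lower, $j, $i - $j), '.,;:!?');
            $isWord = $piece !== '' && isset($dictionary[$piece]);
            $step = $isWord ? 1 : ($i - $j) + 1;
            if ($cost[$j] + $step < $cost[$i]) {
                $cost[$i] = $cost[$j] + $step;
                $from[$i] = $j;
            }
        }
    }

    // Walk back from the end to recover the tokens in their original case.
    $tokens = [];
    for ($i = $n; $i > 0; $i = $from[$i]) {
        array_unshift($tokens, substr($s, $from[$i], $i - $from[$i]));
    }
    return implode(' ', $tokens);
}

// Usage with a toy dictionary standing in for the 600,000-word one.
$dictionary = array_fill_keys(['first', 'text', 'sample', 'is', 'i', 'this', 'line'], true);
echo repairSpacing('Fir5t text sample is thisline', $dictionary), PHP_EOL;
// prints: Fir5t text sample is this line
echo repairSpacing('F i r st text s ample is this line.', $dictionary), PHP_EOL;
// prints: First text sample is this line.
```

With a real 600,000-word dictionary, short entries (for example "fir", or single letters) will compete with the intended words, so a cost based on word frequencies would likely be needed; the question does not say whether such frequencies are available. Leading punctuation and multi-byte characters are also not handled here.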
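The voting rule in the last paragraph (prefer the most common dictionary word, else the most common word overall) can be sketched as below. This is a minimal sketch that assumes the spacing has already been repaired, so every version splits into the same number of words at the same positions; it does not solve that harder step. `voteWords`, the word => `true` dictionary layout, and ignoring trailing punctuation during lookup are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper implementing the voting rule from the question:
 * for each word position, take the most common dictionary word if any
 * version has one there, otherwise the most common spelling overall.
 * Assumes the versions are already word-aligned (spacing repaired) and the
 * dictionary is a set of lowercase words (word => true).
 *
 * @param list<list<string>> $versions one word list per OCR version
 * @param array<string, bool> $dictionary
 * @return list<string>
 */
function voteWords(array $versions, array $dictionary): array
{
    if ($versions === []) {
        return [];
    }
    $width = max(array_map('count', $versions));
    $result = [];

    for ($i = 0; $i < $width; $i++) {
        // Tally every spelling seen at position $i, in first-seen order.
        $counts = [];
        foreach ($versions as $words) {
            if (isset($words[$i])) {
                $counts[$words[$i]] = ($counts[$words[$i]] ?? 0) + 1;
            }
        }

        $bestAny = '';
        $bestAnyCount = 0;
        $bestDict = null;
        $bestDictCount = 0;
        foreach ($counts as $word => $n) {
            $word = (string) $word; // PHP turns numeric keys such as "5" into ints
            // Strict ">" keeps the first-seen spelling when counts tie.
            if ($n > $bestAnyCount) {
                $bestAny = $word;
                $bestAnyCount = $n;
            }
            // Ignore trailing punctuation, so "line." counts as "line".
            $key = rtrim(strtolower($word), '.,;:!?');
            if ($n > $bestDictCount && isset($dictionary[$key])) {
                $bestDict = $word;
                $bestDictCount = $n;
            }
        }
        $result[] = $bestDict ?? $bestAny;
    }
    return $result;
}

// Usage: the four samples from the question, split into words by hand.
$dictionary = array_fill_keys(['first', 'text', 'sample', 'is', 'i', 'this', 'line'], true);
$versions = [
    ['Fir5t', 'text', 'sample', 'is', 'this', 'line'],
    ['Fir5t', 'text', 'Smplee', 'is', 'this', 'line.'],
    ['First', 'te*t', 'sample', 'i', 'this', 'l1ne.'],
    ['First', 'text', 'sample', 'is', 'this', 'line.'],
];
echo implode(' ', voteWords($versions, $dictionary)), PHP_EOL;
// prints: First text sample is this line.
```

Because voting is purely positional, a single missing or extra word in one version shifts every later position; in practice the word lists would probably need to be aligned first (for example with an edit-distance alignment) before voting.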