
<p>The following prototype in Mathematica finds the coordinates of the blocks of text and performs OCR within each block. You may need to adapt the parameter values to fit the dimensions of your actual images. I do not address the machine-learning part of the question; perhaps you would not even need it for this application.</p>
<p>Import the picture, create a binary mask for the printed parts, and enlarge these parts using a horizontal closing (dilation followed by erosion).</p>
<p><img src="https://i.stack.imgur.com/KfXGk.png" alt="enter image description here"></p>
<p>Query each blob's orientation, cluster the orientations, and determine the overall rotation by averaging the orientations in the largest cluster.</p>
<p><img src="https://i.stack.imgur.com/i5bw7.png" alt="enter image description here"></p>
<p>Use this angle to straighten the image. OCR is already possible at this point, but you would lose the spatial information for the blocks of text, which would make the post-processing much more difficult than it needs to be. Instead, find blobs of text with another horizontal closing.</p>
<p><img src="https://i.stack.imgur.com/elxKZ.png" alt="enter image description here"></p>
<p>For each connected component, query the bounding box and the centroid. Use the bounding boxes to extract the corresponding image patches and perform OCR on each patch.</p>
<p><img src="https://i.stack.imgur.com/bhGjU.png" alt="enter image description here"></p>
<p>At this point, you have a list of strings and their spatial positions. That is not XML yet, but it is a good starting point that can be tailored straightforwardly to your needs.</p>
<p>This is the code.
Again, the parameters (structuring elements) of the morphological functions may need to change with the scale of your actual images; also, if the invoice is strongly tilted, you may need to roughly "rotate" the structuring elements in order to still achieve a good de-skewing.</p> <pre><code>img = ColorConvert[
   Import@"http://www.team-bhp.com/forum/attachments/test-drives-initial-ownership-reports/490952d1296308008-laura-tsi-initial-ownership-experience-img023.jpg",
   "Grayscale"];
b = ColorNegate@Binarize[img];
mask = Closing[b, BoxMatrix[{2, 20}]]
orientations = ComponentMeasurements[mask, "Orientation"];
angles = FindClusters@orientations[[All, 2]]
\[Theta] = Mean[angles[[1]]]
straight = ColorNegate@Binarize[ImageRotate[img, \[Pi] - \[Theta], Background -&gt; 1]]
TextRecognize[straight]
boxes = Closing[straight, BoxMatrix[{1, 20}]]
comp = MorphologicalComponents[boxes];
measurements = ComponentMeasurements[{comp, straight}, {"BoundingBox", "Centroid"}];
texts = TextRecognize@ImageTrim[straight, #] &amp; /@ measurements[[All, 2, 1]];
Cases[Thread[measurements[[All, 2, 2]] -&gt; texts],
  (_ -&gt; t_) /; StringLength[t] &gt; 0] // TableForm
</code></pre>
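<p>To make the orientation-clustering step concrete, here is a small pure-Python sketch of the same idea as the <code>FindClusters</code>/<code>Mean</code> pair above: sort the blob orientations, split them wherever consecutive values differ by more than a gap threshold, and average the largest group. The 0.15 rad gap threshold is an illustrative assumption, not a value taken from the Mathematica code.</p>

```python
def dominant_angle(orientations, gap=0.15):
    """Estimate page skew from a non-empty list of blob orientations
    (radians): cluster by a simple 1-D gap split, then average the
    largest cluster. The gap threshold is an illustrative assumption."""
    angles = sorted(orientations)
    clusters, current = [], [angles[0]]
    for a in angles[1:]:
        if a - current[-1] <= gap:
            current.append(a)          # same cluster: close to previous value
        else:
            clusters.append(current)   # gap too large: start a new cluster
            current = [a]
    clusters.append(current)
    largest = max(clusters, key=len)
    return sum(largest) / len(largest)
```

<p>A few outlier blobs (a stamp, a logo) then cannot drag the skew estimate away from the orientation shared by most lines of text.</p>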
 
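<p>Likewise, the measurement step (<code>MorphologicalComponents</code> plus <code>ComponentMeasurements</code> with <code>"BoundingBox"</code> and <code>"Centroid"</code>) can be sketched in pure Python with a BFS flood fill over a binary grid; this only illustrates what is measured, not Mathematica's actual implementation.</p>

```python
from collections import deque

def component_boxes(mask):
    """Label 4-connected components in a binary grid (list of rows of
    0/1) and return, for each component, its bounding box
    (xmin, ymin, xmax, ymax) and its centroid (x, y)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    results = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Flood-fill one component starting from (y, x).
                q = deque([(y, x)])
                seen[y][x] = True
                cells = []
                while q:
                    cy, cx = q.popleft()
                    cells.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                xs = [c[1] for c in cells]
                ys = [c[0] for c in cells]
                box = (min(xs), min(ys), max(xs), max(ys))
                centroid = (sum(xs) / len(xs), sum(ys) / len(ys))
                results.append((box, centroid))
    return results
```

<p>The bounding boxes play the role of the patches handed to <code>ImageTrim</code>/<code>TextRecognize</code>, and the centroids are the spatial keys you would attach to each recognized string.</p>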
