StackOverflow2013

<p><strong>Updated Answer</strong></p>
<p>After your comment, I realised that I was wrong: the diacritic character <em>should</em> appear second in the code point sequence, even though it is <em>rendered</em> to the left of the main character.</p>
<p>So, it turns out that iText doesn't support this kind of rendering for Indic scripts. Roughly speaking, iText uses AWT's <code>Graphics2D</code> to render non-Latin Unicode characters one by one, as images in the PDF. (I guess this is because the appropriate fonts are not necessarily installed on everyone's computer.) This approach doesn't take the special ordering into account.</p>
<p>iText does support similar behaviour for Arabic, using a class contributed by another developer. See <a href="http://itext.svn.sourceforge.net/viewvc/itext/trunk/itext/src/main/java/com/itextpdf/text/pdf/ArabicLigaturizer.java?revision=5075&content-type=text/plain" rel="nofollow">com.itextpdf.text.pdf.ArabicLigaturizer</a>. Perhaps you could create a similar one for Gujarati yourself? (!)</p>
<p>It looks like this has come up before:</p>
<ul>
<li><a href="http://thread.gmane.org/gmane.comp.java.lib.itext.general/56702/focus=59552" rel="nofollow">http://thread.gmane.org/gmane.comp.java.lib.itext.general/56702/focus=59552</a></li>
<li><a href="http://itext-general.2136553.n4.nabble.com/patch-for-complex-scripts-indic-rendering-td2167588.html" rel="nofollow">http://itext-general.2136553.n4.nabble.com/patch-for-complex-scripts-indic-rendering-td2167588.html</a></li>
</ul>
<p><strong>Original Answer</strong></p>
<p>Kem chho ("how are you?" in Gujarati),</p>
<p>I believe that iText is displaying the correct characters, but the first 2 characters of your input were 'flipped' before you translated the string into Unicode code points. So, the problem occurred before the data even got to iText.</p>
<p>The underlying issue is that the 'first' character is a 'pre-base' character, which is a type of <a href="http://en.wikipedia.org/wiki/Diacritic" rel="nofollow">Diacritic</a>. 
It's a bit like an 'accent' in European texts, in that it can't exist on its own, and its purpose is to embellish another character. In this case it turns a 'Ba' (બ) into a 'Bi'.</p>
<p>You'll see in the Unicode code chart that the first character (િ) is indeed code point \u0ABF, and the second (બ) is \u0AAC: <a href="http://en.wikipedia.org/wiki/Gujar%C4%81ti_script#Unicode" rel="nofollow">http://en.wikipedia.org/wiki/Gujar%C4%81ti_script#Unicode</a></p>
<p>So, somewhere between Google Transliterate and your code point representation, these characters got flipped, and you need to review how you did that conversion.</p>
<p><strong>How did you convert these characters into code points?</strong></p>
<p>Seemingly, some programs place the 'pre-base' character after the main consonant, instead of before it:</p>
<ul>
<li>When you paste those characters into a (Linux) terminal, the first 2 characters come out back-to-front. I believe something similar happened for you too.</li>
<li>You'll also notice that when you try editing this word in Google Transliterate, you can't place the cursor between the first 2 characters, and when you hit backspace, the left character is deleted before the right.</li>
</ul>
<p>So, if you can work out where this 'flipping' occurred, then hopefully your solution will present itself.</p>
<p>Hope this helps</p>
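<p>To see which order your string is actually in at each stage (straight after pasting from Google Transliterate, after reading your input, and just before handing it to iText), it may help to dump its code points. Below is a minimal Java sketch; <code>CodePointDump</code> and its <code>dump</code> method are hypothetical helpers, not part of iText, and they rely only on <code>String.codePoints()</code> and <code>Character.getName</code> from the standard library (Java 8+). For the syllable 'Bi', Unicode's logical order stores the consonant BA (U+0AAC) first and the vowel sign I (U+0ABF) second, even though the vowel sign is drawn on the left.</p>

```java
import java.util.stream.Collectors;

public class CodePointDump {

    /**
     * Lists each code point of s as "U+XXXX NAME", one per line, in the
     * order the code points are actually stored (logical order).
     * Hypothetical helper for debugging, not an iText API.
     */
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X %s", cp, Character.getName(cp)))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Gujarati "Bi": consonant BA followed by vowel sign I, as Unicode stores it.
        String bi = "\u0AAC\u0ABF";
        System.out.println(dump(bi));

        // The vowel sign is a spacing combining mark (category Mc): it cannot stand
        // alone and attaches to the consonant stored before it.
        System.out.println(Character.getType(0x0ABF) == Character.COMBINING_SPACING_MARK);
    }
}
```

<p>The first <code>println</code> should list <code>U+0AAC GUJARATI LETTER BA</code> followed by <code>U+0ABF GUJARATI VOWEL SIGN I</code>. If one of your stages shows U+0ABF first, that stage is where the order changed.</p>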
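<p>As for writing something in the spirit of <code>ArabicLigaturizer</code>: one very rough workaround for a renderer that draws one glyph per code point, strictly left to right, might be to move each pre-base vowel sign in front of the consonant cluster it belongs to before handing the text over. The sketch below is purely illustrative, under those assumptions: <code>GujaratiReorderer</code> and <code>toVisualOrder</code> are hypothetical names, not an iText API, and it only handles the vowel sign I (U+0ABF) after a consonant, optionally with a nukta (U+0ABC), or after a virama-joined (U+0ACD) cluster. It does not form conjunct ligatures, and since the PDF would then hold the characters in visual rather than logical order, copy-and-paste and search in the resulting PDF would likely see the swapped order.</p>

```java
public class GujaratiReorderer {

    // Gujarati code points used by this sketch (hypothetical, Gujarati-only).
    static final char VOWEL_SIGN_I = '\u0ABF'; // pre-base vowel sign, drawn left of its consonant
    static final char NUKTA = '\u0ABC';
    static final char VIRAMA = '\u0ACD';       // joins consonants into a cluster

    static boolean isConsonant(char c) {
        // U+0A95 (KA) .. U+0AB9 (HA); the few unassigned slots in between are harmless here.
        return c >= '\u0A95' && c <= '\u0AB9';
    }

    /**
     * Returns the index where the consonant cluster ending just before index
     * end starts, or -1 if there is no consonant there.
     */
    static int clusterStart(CharSequence s, int end) {
        int j = end - 1;
        if (j >= 0 && s.charAt(j) == NUKTA) j--;
        if (j < 0 || !isConsonant(s.charAt(j))) return -1;
        // Walk back over "consonant [nukta] virama" prefixes of the cluster.
        while (j >= 2 && s.charAt(j - 1) == VIRAMA) {
            int k = j - 2;
            if (s.charAt(k) == NUKTA) k--;
            if (k < 0 || !isConsonant(s.charAt(k))) break;
            j = k;
        }
        return j;
    }

    /** Converts logical (Unicode storage) order to a naive left-to-right visual order. */
    static String toVisualOrder(String logical) {
        StringBuilder sb = new StringBuilder(logical);
        for (int i = 0; i < sb.length(); i++) {
            if (sb.charAt(i) != VOWEL_SIGN_I) continue;
            int start = clusterStart(sb, i);
            if (start < 0) continue;        // stray vowel sign: leave it alone
            sb.deleteCharAt(i);
            sb.insert(start, VOWEL_SIGN_I); // move it in front of the whole cluster
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bi = "\u0AAC\u0ABF"; // BA + vowel sign I, logical order
        System.out.println(toVisualOrder(bi).equals("\u0ABF\u0AAC"));
    }
}
```

<p>This kind of pre-reordering only makes sense for a glyph-by-glyph renderer like the one described in the updated answer; a renderer that does proper OpenType shaping expects logical order and would most likely display pre-reordered text incorrectly.</p>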
