<p>In the end this was quite easy to achieve with only a small change to the original <code>sed(1)</code> code. It could perhaps be done better, but since the conversion code already worked in "line scope", I left it alone (apart from minor improvements that are not essential to this question). Instead, the script reads the whole file into the pattern space, replaces newlines with <code>\001</code> (<code>^A</code>) characters, lets the original code do its work, and finally turns the <code>^A</code> characters back into newlines. Here it is:</p>
<pre><code>#! /bin/sed -f
# pinyin2utf8.sed -- Convert US-ASCII Pinyin to UTF-8
# Copyright (C) 2012 Matous J. Fialka, &lt;http://mjf.cz/&gt;
# Released under the terms of The MIT License
#
# DESCRIPTION
# Script converts all occurrences of US-ASCII encoded Pinyin text
# enclosed by the solidus characters pairs to UTF-8 encoded text.
#
# USAGE
# pinyin2utf8.sed filename [ &gt; filename.out ]
#
# WARNINGS
# Script contains the ^A control character, usually displayed as
# mentioned in most text editors, that can be usually reproduced
# by pressing ^V ^A key sequence. The ^A control characters thus
# MUST NOT occur in the input stream. To find the sequences in
# the script lookup the y/// command in the code, please.
#
# In the US-ASCII encoded Pinyin to UTF-8 Pinyin conversion code
# special delimiting sequences of left and right parentheses are
# used and those two delimiting sequences of left or right parens
# SHOULD NOT be used in the input stream.

: 0
$! {
    N
    b 0
}

# HERE BE DRAGONS
y/\n/^A/

y/\//\
/

: a
h
s/[^\n]*\n//
s/\n.*//

# CONVERSION CODE BEGINNING
s/ang1/(((aq)))ng/g
s/ang2/(((aw)))ng/g
s/ang3/(((ae)))ng/g
s/ang4/(((ar)))ng/g
s/eng1/(((eq)))ng/g
s/eng2/(((ew)))ng/g
s/eng3/(((ee)))ng/g
s/eng4/(((er)))ng/g
s/ing1/(((iq)))ng/g
s/ing2/(((iw)))ng/g
s/ing3/(((ie)))ng/g
s/ing4/(((ir)))ng/g
s/ong1/(((oq)))ng/g
s/ong2/(((ow)))ng/g
s/ong3/(((oe)))ng/g
s/ong4/(((or)))ng/g
s/an1/(((aq)))n/g
s/an2/(((aw)))n/g
s/an3/(((ae)))n/g
s/an4/(((ar)))n/g
s/en1/(((eq)))n/g
s/en2/(((ew)))n/g
s/en3/(((ee)))n/g
s/en4/(((er)))n/g
s/in1/(((iq)))n/g
s/in2/(((iw)))n/g
s/in3/(((ie)))n/g
s/in4/(((ir)))n/g
s/un1/(((uq)))n/g
s/un2/(((uw)))n/g
s/un3/(((ue)))n/g
s/un4/(((ur)))n/g
s/ao1/(((aq)))o/g
s/ao2/(((aw)))o/g
s/ao3/(((ae)))o/g
s/ao4/(((ar)))o/g
s/ou1/(((oq)))u/g
s/ou2/(((ow)))u/g
s/ou3/(((oe)))u/g
s/ou4/(((or)))u/g
s/ai1/(((aq)))i/g
s/ai2/(((aw)))i/g
s/ai3/(((ae)))i/g
s/ai4/(((ar)))i/g
s/ei1/(((eq)))i/g
s/ei2/(((ew)))i/g
s/ei3/(((ee)))i/g
s/ei4/(((er)))i/g
s/a1/(((aq)))/g
s/a2/(((aw)))/g
s/a3/(((ae)))/g
s/a4/(((ar)))/g
s/a1/(((aq)))/g
s/a2/(((aw)))/g
s/a3/(((ae)))/g
s/a4/(((ar)))/g
s/er2/(((ew)))r/g
s/er3/(((ee)))r/g
s/er4/(((er)))r/g
s/lyue/l(((u:)))e/g
s/nyue/n(((u:)))e/g
s/e1/(((eq)))/g
s/e2/(((ew)))/g
s/e3/(((ee)))/g
s/e4/(((er)))/g
s/o1/(((oq)))/g
s/o2/(((ow)))/g
s/o3/(((oe)))/g
s/o4/(((or)))/g
s/i1/(((iq)))/g
s/i2/(((iw)))/g
s/i3/(((ie)))/g
s/i4/(((ir)))/g
s/nyu3/n(((u:e)))/g
s/lyu/l(((u:)))/g
s/u:1/(((u:q)))/g
s/u:2/(((u:w)))/g
s/u:3/(((u:e)))/g
s/u:4/(((u:r)))/g
s/u:0/(((u:s)))/g
s/u1/(((uq)))/g
s/u2/(((uw)))/g
s/u3/(((ue)))/g
s/u4/(((ur)))/g
s/(((aq)))/ā/g
s/(((aw)))/á/g
s/(((ae)))/ǎ/g
s/(((ar)))/à/g
s/(((eq)))/ē/g
s/(((ew)))/é/g
s/(((ee)))/ě/g
s/(((er)))/è/g
s/(((iq)))/ī/g
s/(((iw)))/í/g
s/(((ie)))/ǐ/g
s/(((ir)))/ì/g
s/(((oq)))/ō/g
s/(((ow)))/ó/g
s/(((oe)))/ǒ/g
s/(((or)))/ò/g
s/(((uq)))/ū/g
s/(((uw)))/ú/g
s/(((ue)))/ǔ/g
s/(((ur)))/ù/g
s/(((u:q)))/ǖ/g
s/(((u:w)))/ǘ/g
s/(((u:e)))/ǚ/g
s/(((u:r)))/ǜ/g
s/(((u:s)))/ü/g
# CONVERSION CODE END

G
s/\([^\n]*\)\n\([^\n]*\)\n[^\n]*\n/\2\/\1\//
/\n/ b a

# HERE BE DRAGONS
y/^A/\
/
</code></pre>
<p>Sample input text:</p>
<pre><code>$ cat test.in
ni3 hao3 /ni3 hao3/ ni3 hao3 /ni3 hao3/ /ni3 hao3/ ni3 hao3 ni3 hao3 /ni3 hao3/ ni3 hao3 ni3 hao3 /ni3 hao3/ ni3 hao3 /ni3 hao3/ /ni3 hao3/ ni3 hao3 /ni3 hao3/ /ni3 hao3/ ni3 hao3 /ni3 hao3/ ni3 hao3 ni3 hao3 /ni3 hao3/ ni3 hao3 /ni3 hao3/ ni3 hao3 /ni3 hao3/ ni3 hao3 /ni3 hao3/ ni3 hao3 /ni3 hao3/ ni3 hao3 /ni3 hao3/ ni3 hao3 /ni3 hao3 ni3 hao3 ni3 hao3/ ni3 hao3
</code></pre>
<p>Sample run:</p>
<pre><code>$ pinyin2utf8.sed test.in
ni3 hao3 /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ /nǐ hǎo/ ni3 hao3 ni3 hao3 /nǐ hǎo/ ni3 hao3 ni3 hao3 /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ ni3 hao3 ni3 hao3 /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ ni3 hao3 /nǐ hǎo/ ni3 hao3 /nǐ hǎo nǐ hǎo nǐ hǎo/ ni3 hao3
</code></pre>
<p>It seems to work just fine (at least well enough to suit my needs), so I consider this issue closed. Many thanks to everyone involved, especially Mr. Lev Levitsky!</p>
<p>P.S.: I have also placed the code <a href="https://gist.github.com/2690801" rel="nofollow">here (GitHub)</a>, where you can track possible future changes.</p>
<p>P.S. 2: The literal <code>^A</code> characters were lost when this answer was saved, so they are shown here in their printable ASCII representation. Before running the script, replace each one with the actual control character (in <code>vi(1)</code>, press <code>^V ^A</code> in insert mode), or use the <a href="https://gist.github.com/2690801" rel="nofollow">GitHub version</a> instead.</p>
<p>P.S. 3: I still find the <code>^A</code> "hack" quite ugly. If anybody knows how to avoid it in this case while keeping the middle conversion code as simple as it is now, please share your ideas.</p>
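<p>For readers who want to see the slurp, placeholder, and restore steps in isolation, here is a minimal sketch. It is not the script above: the function name <code>wrap_spans</code> is hypothetical, a printable <code>|</code> stands in for the <code>^A</code> byte (so, as with <code>^A</code>, it is assumed not to occur in the input), and a simple bracket-wrapping substitution stands in for the Pinyin conversion code.</p>

```shell
# Sketch only: slurp the input, swap newlines for a placeholder,
# run a stand-in substitution, then restore the newlines.
# Assumption: '|' is used instead of ^A to keep the example printable.
wrap_spans() {
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
      -e 'y/\n/|/' \
      -e 's,/\([^/]*\)/,[\1],g' \
      -e 'y/|/\n/'
}

# A slash-delimited span that crosses a line boundary.
out=$(printf 'a /b\nc/ d\n' | wrap_spans)
printf '%s\n' "$out"
```

<p>The first five <code>-e</code> fragments form the usual loop that appends every remaining line to the pattern space; the two <code>y</code> commands correspond to the "HERE BE DRAGONS" steps in the script above.</p>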