Regex for LaTeX umlaut escapes?
<p>I am writing a Scala script which gets information from several sources, including a BibTeX file. I am using the <a href="http://code.google.com/p/java-bibtex/" rel="nofollow">jbibtex library</a> to parse the file.</p>
<p>My BibTeX source file contains LaTeX-style escapes for non-ASCII letters, like</p>
<blockquote> <p>author = {Fjeld, Morten and Sch\"{a}r, Sissel Guttormsen}</p> </blockquote>
<p>I tried to use simple replacement, but failed, because I cannot write a proper regex to match the escape.</p>
<p>The best I could come up with was</p>
<pre><code>val stringWithEscapedUmlaut = """Sch\"{a}r"""
val properString = stringWithEscapedUmlaut.replaceAll("""\\"\{a}""", "ä")
</code></pre>
<p>but the regex engine complains about the match.</p>
<blockquote> <p>java.util.regex.PatternSyntaxException: Illegal repetition near index 2 \"{a}</p> </blockquote>
<p>As far as I am aware, I should escape <code>\</code> and <code>{</code> in a regex, but not <code>"</code> or <code>}</code>. Nevertheless, I tried adding more escape backslashes at increasingly random places :( but without success.</p>
<p>Any ideas how to match this?</p>
<p><strong>Update</strong> The solution for an A-Umlaut escape turns out to be simple (thank you Keppil for that). It is</p>
<pre><code>replace("\"{a}", "ä")
</code></pre>
<p>But LaTeX also has escapes for other characters, for example <code>\{ss}</code> for <code>ß</code>.</p>
<p>Scala won't let me use <code>"\{ss}"</code> in a string, so I tried using a raw string, <code>"""\{ss}"""</code>. Then the whole replacement falls apart.</p>
<pre><code>object Converter {
  def cleanLatexEscapes(rawString: String): String = {
    val aumlauts = rawString.replace("\"{a}", "ä")
    val oumlauts = aumlauts.replace("\"{o}", "ö")
    val uumlauts = oumlauts.replace("\"{u}", "ü")
    val scharfesEs = uumlauts.replace("""\{ss}""", "ß")
    return scharfesEs
  }
}

import org.scalatest._

class ConverterSpec extends FlatSpec {
  "cleanLatexEscapes" should "clean 'Käseklöße in der Küche'" in {
    val escaped = """K\"{a}sekl\"{o}\{ss}e in der K\"{u}che"""
    val cleaned = Converter.cleanLatexEscapes(escaped)
    assert(cleaned === "Käseklöße in der Küche")
  }
}
</code></pre>
<blockquote> <p>cleanLatexEscapes - should clean 'Käseklöße in der Küche' *** FAILED *** "K[\äsekl\öße in der K]üche" did not equal "K[äseklöße in der K]üche"</p> </blockquote>
<p>What is happening here, and how do I fix it so that both the umlaut escapes and the scharfes-es escape are covered? Also, where do the square brackets in the test output come from?</p>
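<p>For comparison, here is a minimal sketch of one possible approach, not a confirmed fix. It assumes the difference comes from the literals: in a regular Scala string, <code>"\"{a}"</code> denotes the four characters <code>"{a}</code> without any backslash, whereas a raw triple-quoted literal keeps the backslash as typed. The names <code>LatexEscapes</code> and <code>clean</code> are hypothetical, and only the four escapes from the test above are covered.</p>

```scala
object LatexEscapes {
  // Hypothetical helper, not part of jbibtex.
  // Raw (triple-quoted) literals keep every backslash as typed, so each key
  // below is exactly the text that appears in the BibTeX source.
  private val replacements: Seq[(String, String)] = Seq(
    """\"{a}""" -> "ä",
    """\"{o}""" -> "ö",
    """\"{u}""" -> "ü",
    """\{ss}""" -> "ß"
  )

  // String.replace matches its argument literally (it is not a regex),
  // so neither the backslash nor the braces need regex escaping here.
  def clean(raw: String): String =
    replacements.foldLeft(raw) { case (acc, (escape, char)) =>
      acc.replace(escape, char)
    }
}
```

<p>Under these assumptions, <code>LatexEscapes.clean("""K\"{a}sekl\"{o}\{ss}e in der K\"{u}che""")</code> yields <code>Käseklöße in der Küche</code>. Writing every key as a raw literal avoids mixing two escaping conventions in one method.</p>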