Emacs, Unicode, xterm mouse escape sequences, and wide terminals
<p>Short version: When using emacs' xterm-mouse-mode, Somebody (emacs? bash? xterm?) intercepts xterm's control sequences and replaces them with \0. This is a pain on wide monitors because only the first 223 columns get mouse support.</p>
<p>What is the culprit, and how can I work around it?</p>
<p>From what I can tell this has something to do with Unicode/UTF-8 support, because it wasn't a problem 5-6 years ago when I last had a big monitor.</p>
<p>Gory details follow...</p>
<p>Thanks!</p>
<p>Emacs' xterm-mouse-mode has a well-known weakness in handling mouse clicks starting around x=95. <a href="http://www.ece.cmu.edu/~ryanjohn/linux-hacks.html" rel="noreferrer">A workaround</a>, adopted by recent versions of emacs, pushes the problem off to x=223.</p>
<p>Several years ago I figured out that xterm encodes positions in 7-bit octets. Given a position x to encode, with X=x-96, it sends:</p>
<pre><code>\40+x                 (x &lt; 96)
\300+X/64 \200+X%64   (otherwise)
</code></pre>
<p>We have to add one to the x position emacs gives us, because positions in xterm start at one, not zero. That is where the magic number x=95 comes from: it is encoded as "\300\200", the first escaped value. Somebody (emacs? bash? xterm?) treats those like "C0" control sequences from <a href="http://en.wikipedia.org/wiki/C0_and_C1_control_codes" rel="noreferrer">ISO 2022</a>. Starting at x=159, we switch to "C1" sequences (\301\200), which are also part of ISO 2022.</p>
<p>Trouble hits with \302 sequences, which correspond to the current x=223 limit. Several years ago I was able to extend the hack to intercept \302 and \303 sequences manually, which got past the problem. Fast forward a few years, and today I find that I'm stuck back at x=223 because Somebody is replacing those sequences with \0.</p>
<p>So, where I'd expect clicking at line 1, col 250 to produce</p>
<pre><code>ESC [ M SPC \303\207 !
ESC [ M # \303\207 !
</code></pre>
<p>Instead, emacs reports (for any col &gt; 223)</p>
<pre><code>ESC [ M SPC C-@ !
ESC [ M # C-@ !
</code></pre>
<p>I suspect that Unicode/UTF-8 support is the culprit. Some digging shows that <a href="http://unicode.org/versions/corrigendum1.html" rel="noreferrer">the Unicode standard allowed C0 and C1 sequences as part of UTF-8 until Nov 2000</a>, and I guess Somebody didn't get the memo (fortunately). However, \302\200 through \302\237 are <a href="http://www.utf8-chartable.de/" rel="noreferrer"><em>Unicode</em> control sequences</a>, so Somebody slurps them up (doing who-knows-what with them!) and returns \0 instead.</p>
<p>Some more detailed questions:<br>
- Who is this Somebody that intercepts the codes before they reach emacs' lossage buffer?<br>
- If it's really just about control sequences, how come characters after \302\237, which are UTF-8 encodings of printable Unicode characters, also come back as \0?<br>
- What makes emacs decide whether to display lossage as Unicode characters or as octal escape sequences, and why don't the two match? For example, my self-built Cygwin emacs 23.2.1 (xterm 229) reports \301\202 for column 161, but my RHEL 5.5-supplied emacs 22.3.1 (xterm 215) reports "Â" (Latin A with circumflex), which is actually \303\202 in UTF-8!</p>
<p><strong>Update:</strong></p>
<p>Here's a patch against xterm-261 that makes it emit mouse positions in UTF-8 format:</p>
<pre><code>diff -r button.c button.utf-8-fix.c
--- a/button.c	Sat Aug 14 08:23:00 2010 +0200
+++ b/button.c	Thu Aug 26 16:16:48 2010 +0200
@@ -3994,1 +3994,27 @@
-#define MOUSE_LIMIT (255 - 32)
+#define MOUSE_LIMIT (2047 - 32)
+#define MOUSE_UTF_8_START (127 - 32)
+
+static unsigned
+EmitMousePosition(Char line[], unsigned count, int value)
+{
+    /* Add pointer position to key sequence
+     *
+     * Encode large positions as two-byte UTF-8
+     *
+     * NOTE: historically, it was possible to emit 256, which became
+     * zero by truncation to 8 bits. While this was arguably a bug,
+     * it's also somewhat useful as a past-end marker so we keep it.
+     */
+    if(value == MOUSE_LIMIT) {
+        line[count++] = CharOf(0);
+    }
+    else if(value &lt; MOUSE_UTF_8_START) {
+        line[count++] = CharOf(' ' + value + 1);
+    }
+    else {
+        value += ' ' + 1;
+        line[count++] = CharOf(0xC0 + (value &gt;&gt; 6));
+        line[count++] = CharOf(0x80 + (value &amp; 0x3F));
+    }
+    return count;
+}
@@ -4001,1 +4027,1 @@
-    Char line[6];
+    Char line[9]; /* \e [ &gt; M Pb Pxh Pxl Pyh Pyl */
@@ -4021,2 +4047,0 @@
-    else if (row &gt; MOUSE_LIMIT)
-        row = MOUSE_LIMIT;
@@ -4028,1 +4052,5 @@
-    else if (col &gt; MOUSE_LIMIT)
+
+    /* Limit to representable mouse dimensions */
+    if (row &gt; MOUSE_LIMIT)
+        row = MOUSE_LIMIT;
+    if (col &gt; MOUSE_LIMIT)
@@ -4090,2 +4118,2 @@
-    line[count++] = CharOf(' ' + col + 1);
-    line[count++] = CharOf(' ' + row + 1);
+    count = EmitMousePosition(line, count, col);
+    count = EmitMousePosition(line, count, row);
</code></pre>
<p>Hopefully this (or something like it) will appear in a future version of xterm. The patch makes xterm work out of the box with emacs-23 (which assumes UTF-8 input) and also fixes the existing problems with xt-mouse.el. Using it with emacs-22 requires advising the function xt-mouse.el uses to decode mouse positions (the advised version also works with emacs-23):</p>
<pre><code>(defadvice xterm-mouse-event-read (around utf-8 compile activate)
  (setq ad-return-value
        (let ((c (read-char)))
          (cond
           ;; mouse clicks outside the encodable range produce 0
           ((= c 0) #x800)
           ;; must convert UTF-8 to Unicode ourselves
           ((and (&gt;= c #xC2) (&lt; emacs-major-version 23))
            (logior (lsh (logand c #x1F) 6) (logand (read-char) #x3F)))
           ;; normal case
           (c)))))
</code></pre>
<p>Put the advice in the .emacs on every machine you log into, and patch xterm on every machine you work from. Voila!</p>
<p><strong>WARNING:</strong> Applications that use xterm's mouse modes but do not treat their input as UTF-8 will be confused by this patch, because the mouse escape sequences get longer. However, those applications already break badly with the current xterm, because mouse positions with x &gt; 95 look like UTF-8 codes but aren't. I'd create a new mouse mode for xterm, but certain applications (GNU screen!) filter out unknown escape sequences. Emacs is the only terminal-mouse app I use, so I consider the patch a net win, but YMMV.</p>
 
