<blockquote> <p>is anybody familiar with this sort of obfuscation scheme in the Windows world?</p> </blockquote>
<p>Once you understand it correctly, it's just a trivial rotation cipher like <a href="http://en.wikipedia.org/wiki/ROT13" rel="nofollow">ROT13</a>.</p>
<p>Why would anyone use this?</p>
<p>Well, in general, this is very common. Let's say you have some data that you need to obfuscate, but the decryption algorithm and key have to be embedded in software that the viewers have. There's no point using something fancy like AES, because someone can always just dig the algorithm and key out of your code instead of cracking AES. An encryption scheme that's even marginally harder to crack than finding the hidden key is just as good as a perfect encryption scheme: good enough to deter casual viewers, and useless against serious attackers. (Often you aren't even really worried about <em>stopping</em> attacks, but about being able to prove after the fact, for contractual/legal reasons, that your attacker must have acted in bad faith.) So, you use either a simple rotation cipher or a simple xor cipher: it's fast, it's hard to get wrong and easy to debug, and if worst comes to worst you can even decrypt it manually to recover corrupted data.</p>
<p>As for the particulars:</p>
<p>If you want to handle non-ASCII characters, you pretty much have to use Unicode. If you used some fixed 8-bit charset, or the local system's OEM charset, you wouldn't be able to handle passwords from other machines.</p>
<p>A Python script would almost certainly deal in Unicode characters, because in Python you either deal in bytes in a <code>str</code>, or Unicode characters in a <code>unicode</code>. But a Windows C or .NET app would be much more likely to use UTF-16, because Windows native APIs deal in UTF-16-LE code units in a <code>WCHAR *</code> (aka a string of 16-bit words).</p>
<p>So, why 4142? Well, it really doesn't matter what the key is. 
I'm guessing some programmer suggested <a href="http://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#Answer_to_the_Ultimate_Question_of_Life.2C_the_Universe.2C_and_Everything_.2842.29" rel="nofollow">42</a>. His manager then said "That doesn't sound very secure." He sighed and said, "I already explained why no key is going to be any more secure than… you know what, forget it, what about 4142?" The manager said, "Ooh, that sounds like a really secure number!" So that's why 4142.</p>
<hr>
<blockquote> <p>If it's not a library function, can you think of a better method to de-obfuscate these values without resorting to the magic 4142 number.</p> </blockquote>
<p>You do need to resort to the magic 4142, but you can make this a lot simpler:</p>
<pre><code>import struct

def decrypt_block(block):
    return struct.pack('&gt;H', (4142 - int(block, 10)) % 65536)  # wrap like a C unsigned short
</code></pre>
<p>So, each block of 5 characters is the decimal representation of a UTF-16 code unit, subtracted from 4142, using C unsigned-short wraparound rules.</p>
<p>This would be trivial to implement in native Windows C, but it's slightly harder in Python. The best transformation function I can come up with is:</p>
<pre><code>import struct

def decrypt_block(block):
    # One 5-digit decimal block -&gt; one big-endian 16-bit code unit (2 bytes)
    return struct.pack('&gt;H', (4142 - int(block, 10)) % 65536)

def decrypt(pwd):
    # Python 2 / Jython: str is a byte string, so join the 2-byte pieces, then decode
    blocks = [pwd[i:i+5] for i in range(0, len(pwd), 5)]
    return ''.join(map(decrypt_block, blocks)).decode('utf-16-be')
</code></pre>
<p>Again, this would be far more direct in C or C#, which is probably what they implemented things in, so let me explain what I'm doing.</p>
<p>You already know how to transform the string into a sequence of 5-character blocks.</p>
<p>My <code>int(block, 10)</code> is doing the same thing as your <code>int(block.lstrip('0'))</code>, making sure that a <code>'0'</code> prefix doesn't make Python treat it as an octal numeral instead of decimal, but more explicitly. 
I don't think this is actually necessary in Jython 2.2 (it definitely isn't in more modern Python/Jython), but I left it just in case.</p>
<p>Next, in C, you'd just do <code>unsigned short x = 4142U - y;</code>, which would automatically wrap around modulo 65536. Python doesn't have <code>unsigned short</code> values, just signed <code>int</code>, so we have to do the wraparound manually with <code>% 65536</code>. (Because Python uses floored division and remainder, the result of <code>%</code> always has the same sign as the divisor, so it lands in 0..65535 even when the difference is negative; this wouldn't be true in C, at least not C99 and most platforms' C89.)</p>
<p>Then, in C, we'd just cast the unsigned short to a 16-bit "wide character"; Python doesn't have any way to do that, so we have to use <a href="http://docs.python.org/2.6/library/struct.html" rel="nofollow"><code>struct.pack</code></a>. (Note that I'm converting it to big-endian, because I think that makes this easier to debug; in C you'd convert to native-endian, and since this is Windows, that would be little-endian.)</p>
<p>So, now we've got a sequence of 2-byte UTF-16-BE code units. I just <code>join</code> them into one big string, then <code>decode</code> it as UTF-16-BE.</p>
<hr>
<p>If you really want to test that I've got this right, you'll need to find characters that aren't just non-ASCII, but non-Western. In particular, you need:</p>
<ul>
<li>A character that's &gt; U+102E (4142 in decimal) but &lt; U+10000. Most CJK ideographs, like U+7000 (瀀), fit the bill. This should appear as <code>'41006'</code>, because that's 4142-0x7000 rolled over as an unsigned short.</li>
<li>A character that's &gt;= U+10000. This includes uncommon CJK characters, specialized mathematical characters, characters from ancient scripts, etc. 
For example, the Old Italic character U+10300 (𐌀) encodes to the surrogate pair (0xd800, 0xdf00); 4142-0xd800=14382, and 4142-0xdf00=12590 (both rolled over as unsigned shorts), so you'd get <code>'1438212590'</code>.</li>
</ul>
<p>The first will be hard to find; even most Chinese- and Japanese-native programmers I've dealt with use ASCII passwords. And the second, even more so; nobody but a historical linguistics professor is likely to even think of using archaic scripts in their passwords. By Murphy's Law, if you write the correct code, it will never be used, but if you don't, such a password is guaranteed to show up as soon as you ship your code.</p>
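<p>If you'd rather check those two test cases mechanically than hunt for real passwords, here's a rough sketch. It's written for Python 3 rather than Jython 2.2, the names <code>KEY</code>, <code>encrypt</code> and <code>decrypt</code> are made up for illustration, and the encoding direction (zero-padded 5-digit blocks) is my inference from the decoding rule, not something confirmed from the original application.</p>

```python
import struct

KEY = 4142  # the magic constant from the discussion above


def encrypt(password):
    """Assumed inverse of decrypt: one zero-padded 5-digit block per UTF-16 code unit."""
    raw = password.encode('utf-16-be')
    # Split the big-endian bytes into unsigned 16-bit code units.
    units = struct.unpack('>%dH' % (len(raw) // 2), raw)
    # (KEY - u) % 65536 mimics C unsigned-short wraparound.
    return ''.join('%05d' % ((KEY - u) % 65536) for u in units)


def decrypt(obfuscated):
    """Turn each 5-digit block back into a code unit, then decode as UTF-16-BE."""
    blocks = [obfuscated[i:i + 5] for i in range(0, len(obfuscated), 5)]
    raw = b''.join(struct.pack('>H', (KEY - int(b, 10)) % 65536) for b in blocks)
    return raw.decode('utf-16-be')


print(encrypt('\u7000'))      # U+7000, a BMP character above the key: 41006
print(encrypt('\U00010300'))  # U+10300, a surrogate pair: 1438212590
print(decrypt(encrypt('p\u00e4ss\u7000\U00010300')) == 'p\u00e4ss\u7000\U00010300')
```

<p>Because the transformation is its own inverse (4142 minus the value, modulo 65536, in both directions), the same arithmetic serves for encoding and decoding; only the packing and formatting differ.</p>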