Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <h1>Overview</h1> <p>You need to wrap a bytestream and escape specific values. Also, the other way around is required: unescape control-codes and get the raw payload. You are working with sockets. The socket-commands uses string-parameters. In python, every string is basically a wrapper around a <code>char*</code>-array.</p> <h1>Naive approach</h1> <p>Its a string and we want to replace specific values with other ones. So what is the simplest way to achieve this?</p> <pre><code>def unstuff(self, s): return s.replace('\xFE\xDC', '\xFC').replace('\xFE\xDD', '\xFE').replace('\xFE\xDE', '\xFE') def stuff(self, s): return s.replace('\xFC', '\xFE\xDC').replace('\xFD', '\xFE\xDD').replace('\xFE', '\xFE\xDE') </code></pre> <p>Seems to be bad. With every replace-call, a new string-copy will be created. </p> <h1>Iterator</h1> <p>A very pythonic approach is to define an iterator for this specific problem: define the iterator to transform the input-data into the desired output. </p> <pre><code>def unstuff(data): i = iter(data) dic = {'\xDC' : '\xFC', '\xDD' : '\xFD', '\xFE' : '\xDE'} while True: d = i.next() # throws StopIteration on the end if d == '\xFE': d2 = i.next() if d2 in dic: yield dic[d2] else: yield '\xFE' yield d2 else: yield d def stuff(data): i = iter(data) dic = { '\xFC' : '\xDC', '\xFD' : '\xDD', '\xFE' : '\xDE' } while True: d = i.next() # throws StopIteration on the end if d in dic: yield '\xFE' yield dic[d] else: yield d def main(): s = 'hello\xFE\xDCWorld' unstuffed = "".join(unstuff(s)) stuffed = "".join(stuff(unstuffed)) print s, unstuffed, stuffed # also possible for c in unstuff(s): print ord(c) if __name__ == '__main__': main() </code></pre> <p><code>stuff()</code> and <code>unstuff()</code> need something iterable (list, string, ...) and return an <a href="http://docs.python.org/2/library/stdtypes.html#iterator-types" rel="nofollow noreferrer">iterator-object</a>. If you want to <code>print</code> the result or pass it into <code>socket.send</code>, you need to convert it back to a string (as shown with <code>"".join()</code>). Every unexpected data is handled somehow: <code>0xFE 0x__</code> will be returned verbatim, if it does not match any pattern.</p> <h1>RegExp</h1> <p>Another way would be to use <a href="http://docs.python.org/2/library/re.html" rel="nofollow noreferrer">regular expressions</a>. Its a big topic and a source of trouble sometimes, but we can keep it simple:</p> <pre><code>import re s = 'hello\xFE\xDCWorld' # our test-string # read: FE DC or FE DD or FE DE unstuff = re.compile('\xFE\xDC|\xFE\xDD|\xFE\xDE') # read: # - use this pattern to match against the string # - replace what you have found (m.groups(0), whole match) with # char(ord(match[1])^0x20) unstuffed = unstuff.sub(lambda m: chr(ord(m.group(0)[1])^0x20), s) # same thing, other way around stuff = re.compile('\xFC|\xFD|\xFE') stuffed = stuff.sub(lambda m: '\xFE' + chr(ord(m.group(0))^0x20), unstuffed) print s, unstuffed, stuffed </code></pre> <p>As said, you must create the new string somewhere to be able to use it with sockets. At least, this approach do not create unnecessary copies of the string like <code>s.replace(..).replace(..).replace(..)</code> would. You should keep the patterns <code>stuff</code>and <code>unstuff</code> somewhere around as building these objects is relatively expensive.</p> <h1>native C-Function</h1> <p>If some things are going to slow in python, we might want to use cpython and implement it as oure C-code. Basically, I do a first run, count how many bytes I nerd, allocate a new string and do the second run. I'm not very used to python-c-extensions and so I do not want to share this code. It just seems to work, see the next chapter</p> <h1>Comparison</h1> <p>One of the most important rules of optimization: compare! The basic setup for every test:</p> <pre><code>generate random binary data as a string while less_than_a_second: unstuff(stuff(random_data)) count += 1 return time_needed / count </code></pre> <p>I know, the setup isn't optimal. But we should get some usable result:</p> <p><a href="http://i46.tinypic.com/vdkd2e.png" rel="nofollow noreferrer">graph http://i46.tinypic.com/vdkd2e.png</a></p> <p>What do we see? Native is the fastest way to go, but only for very small strings. This is probably because of the python-interpreter: only one function-call is needed instead of three. But microseconds is fast enough the most of the times. After ~500 bytes, the timings are nearly the same with the naive approach. There must be some deep magic happening down there in the implementation. Iterators and RegExp are unacceptable compared to the effort.</p> <p>To sum things up: use the naive approach. Its hard to get something better. Also: if you simply guess about timings, you will be almost always wrong.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload