StackOverflow2013


plurals

PO
primarykey
Id
13530746
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2012-11-23T14:09:07.143
FavoriteCount
0
LastActivityDate
2012-11-24T13:03:54.627
LastEditDate
2012-11-24T13:03:54.627
LastEditorUserId
1311571
OwnerUserId
1311571
ParentId
13530087
PostTypeId
2
Score
6
ViewCount
0
LastEditorDisplayName
text
Body
<h1>Overview</h1> <p>You need to wrap a byte stream and escape specific values. The reverse is also required: unescape the control codes and recover the raw payload. You are working with sockets, and the socket functions take string parameters. In Python, every string is basically a wrapper around a <code>char*</code> array.</p> <h1>Naive approach</h1> <p>It's a string and we want to replace specific values with other ones. So what is the simplest way to achieve this?</p> <pre><code>def unstuff(self, s):
    # restore 0xFE last, or a raw 0xFE + 0xDC/0xDD becomes a bogus escape
    return s.replace('\xFE\xDC', '\xFC').replace('\xFE\xDD', '\xFD').replace('\xFE\xDE', '\xFE')

def stuff(self, s):
    # escape 0xFE first, or the 0xFE inserted for 0xFC/0xFD gets escaped again
    return s.replace('\xFE', '\xFE\xDE').replace('\xFC', '\xFE\xDC').replace('\xFD', '\xFE\xDD')
</code></pre> <p>This looks wasteful: every <code>replace()</code> call creates a new copy of the string.</p> <h1>Iterator</h1> <p>A very Pythonic approach is to define an iterator for this specific problem, one that transforms the input data into the desired output.</p> <pre><code>def unstuff(data):
    i = iter(data)
    dic = {'\xDC': '\xFC', '\xDD': '\xFD', '\xDE': '\xFE'}
    for d in i:
        if d == '\xFE':
            d2 = next(i, None)  # None if the data ends right after 0xFE
            if d2 in dic:
                yield dic[d2]
            else:
                # unknown escape: pass it through verbatim
                yield '\xFE'
                if d2 is not None:
                    yield d2
        else:
            yield d

def stuff(data):
    dic = {'\xFC': '\xDC', '\xFD': '\xDD', '\xFE': '\xDE'}
    for d in data:
        if d in dic:
            yield '\xFE'
            yield dic[d]
        else:
            yield d

def main():
    s = 'hello\xFE\xDCWorld'
    unstuffed = "".join(unstuff(s))
    stuffed = "".join(stuff(unstuffed))
    print s, unstuffed, stuffed
    # also possible
    for c in unstuff(s):
        print ord(c)

if __name__ == '__main__':
    main()
</code></pre> <p><code>stuff()</code> and <code>unstuff()</code> need something iterable (list, string, ...) and return an <a href="http://docs.python.org/2/library/stdtypes.html#iterator-types" rel="nofollow noreferrer">iterator object</a>.
If you want to <code>print</code> the result or pass it to <code>socket.send</code>, you need to convert it back to a string (as shown with <code>"".join()</code>). Unexpected data is passed through: an escape sequence <code>0xFE 0x__</code> that does not match any pattern is returned verbatim.</p> <h1>RegExp</h1> <p>Another way would be to use <a href="http://docs.python.org/2/library/re.html" rel="nofollow noreferrer">regular expressions</a>. It's a big topic and sometimes a source of trouble, but we can keep it simple. Every escape code is the escaped byte XOR <code>0x20</code> (<code>0xFC</code> and <code>0xDC</code>, <code>0xFD</code> and <code>0xDD</code>, <code>0xFE</code> and <code>0xDE</code>), so the replacement can be computed from the match:</p> <pre><code>import re

s = 'hello\xFE\xDCWorld'  # our test string

# read: FE DC or FE DD or FE DE
unstuff = re.compile('\xFE\xDC|\xFE\xDD|\xFE\xDE')
# read:
# - use this pattern to match against the string
# - replace what you have found (m.group(0), the whole match) with
#   chr(ord(match[1]) ^ 0x20)
unstuffed = unstuff.sub(lambda m: chr(ord(m.group(0)[1]) ^ 0x20), s)

# same thing, the other way around
stuff = re.compile('\xFC|\xFD|\xFE')
stuffed = stuff.sub(lambda m: '\xFE' + chr(ord(m.group(0)) ^ 0x20), unstuffed)

print s, unstuffed, stuffed
</code></pre> <p>As said, you must create the new string somewhere to be able to use it with sockets. At least this approach does not create the unnecessary intermediate copies that <code>s.replace(..).replace(..).replace(..)</code> does. You should keep the compiled patterns <code>stuff</code> and <code>unstuff</code> around, because building these objects is relatively expensive.</p> <h1>Native C function</h1> <p>If something is too slow in Python, we can implement it in C as a CPython extension. Basically, I do a first pass to count how many bytes I need, allocate a new string and do a second pass. I'm not very used to Python C extensions, so I do not want to share this code. It seems to work; see the next section.</p> <h1>Comparison</h1> <p>One of the most important rules of optimization: compare!
The basic setup for every test:</p> <pre><code>import os, time
def measure(stuff, unstuff, size):  # stuff/unstuff must return strings
    data = os.urandom(size)  # random binary data as a string
    count, start = 0, time.time()
    while time.time() - start &lt; 1.0:  # repeat for about one second
        unstuff(stuff(data))
        count += 1
    return (time.time() - start) / count  # mean time per round trip
</code></pre> <p>I know the setup isn't optimal, but it should give us some usable results:</p> <p><a href="http://i46.tinypic.com/vdkd2e.png" rel="nofollow noreferrer">graph</a></p> <p>What do we see? Native is the fastest way to go, but only for very small strings. This is probably due to the Python interpreter: only one function call is needed instead of three. But microseconds are fast enough most of the time. Beyond ~500 bytes, the timings are nearly the same as for the naive approach, presumably because <code>str.replace</code> is itself implemented in C. The iterator and RegExp approaches perform unacceptably, given the extra effort they take.</p> <p>To sum things up: use the naive approach. It's hard to beat. Also: if you only guess at timings, you will almost always be wrong.</p>
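<p>The answer's code is Python 2, where <code>str</code> holds raw bytes. In Python 3, <code>socket.send</code> takes <code>bytes</code>, so a port of the recommended naive approach could look like the following minimal sketch. It assumes the same escape table as above: <code>0xFC</code>, <code>0xFD</code> and <code>0xFE</code> become <code>0xFE 0xDC</code>, <code>0xFE 0xDD</code> and <code>0xFE 0xDE</code>. It keeps the replacement order that makes the chain correct. <code>0xFE</code> is escaped first and restored last.</p>

```python
import os

# Python 3 sketch (assumption: same escape table as the answer):
# 0xFC -> FE DC, 0xFD -> FE DD, 0xFE -> FE DE


def stuff(data):
    """Escape 0xFC, 0xFD and 0xFE in a bytes object."""
    # 0xFE has to be escaped first; otherwise the 0xFE bytes inserted
    # for 0xFC / 0xFD would be escaped a second time.
    return (data.replace(b"\xFE", b"\xFE\xDE")
                .replace(b"\xFC", b"\xFE\xDC")
                .replace(b"\xFD", b"\xFE\xDD"))


def unstuff(data):
    """Undo stuff(): turn each 0xFE escape sequence back into one byte."""
    # 0xFE has to be restored last; restoring it first could merge a
    # raw 0xFE with a following 0xDC / 0xDD into a bogus escape sequence.
    return (data.replace(b"\xFE\xDC", b"\xFC")
                .replace(b"\xFE\xDD", b"\xFD")
                .replace(b"\xFE\xDE", b"\xFE"))


if __name__ == "__main__":
    raw = b"hello\xFE\xDCWorld"
    print(stuff(raw))  # b'hello\xfe\xde\xdcWorld'
    # Round-trip check on random binary data.
    for _ in range(1000):
        sample = os.urandom(64)
        assert unstuff(stuff(sample)) == sample
    print("round trip OK")
```

<p>The result can be passed directly to <code>socket.send</code> or <code>socket.sendall</code>, with no <code>"".join()</code> step.</p>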
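<p>To repeat the comparison on a current interpreter, a rough sketch with the standard <code>timeit</code> module could look like this. It is a stand-in for the author's setup and is not the code behind the graph. It compares only the naive approach with a single-pass <code>re.sub</code> variant on <code>bytes</code>; the iterator and C versions are left out. The input sizes are arbitrary, and the names (<code>stuff_re</code>, <code>seconds_per_round_trip</code>, ...) are hypothetical. It only prints timings and claims no particular result, because the numbers depend on the machine and the Python version.</p>

```python
import os
import re
import timeit

# Hypothetical benchmark sketch, not the author's harness.

# Naive approach (escape 0xFE first, restore it last).
def stuff_naive(data):
    return data.replace(b"\xFE", b"\xFE\xDE").replace(b"\xFC", b"\xFE\xDC").replace(b"\xFD", b"\xFE\xDD")

def unstuff_naive(data):
    return data.replace(b"\xFE\xDC", b"\xFC").replace(b"\xFE\xDD", b"\xFD").replace(b"\xFE\xDE", b"\xFE")

# Single-pass regex approach: each escape code is the escaped byte XOR 0x20.
STUFF_RE = re.compile(b"[\xFC-\xFE]")
UNSTUFF_RE = re.compile(b"\xFE[\xDC-\xDE]")

def stuff_re(data):
    # m.group(0)[0] is the matched byte as an int (0xFC..0xFE)
    return STUFF_RE.sub(lambda m: bytes((0xFE, m.group(0)[0] ^ 0x20)), data)

def unstuff_re(data):
    # m.group(0)[1] is the escape code (0xDC..0xDE)
    return UNSTUFF_RE.sub(lambda m: bytes((m.group(0)[1] ^ 0x20,)), data)

def seconds_per_round_trip(stuff, unstuff, data, number=200):
    """Average seconds for one unstuff(stuff(data)) call, measured with timeit."""
    return timeit.timeit(lambda: unstuff(stuff(data)), number=number) / number

if __name__ == "__main__":
    for size in (16, 512, 16384):
        data = os.urandom(size)
        # both variants must round-trip before their timings mean anything
        assert unstuff_re(stuff_re(data)) == data == unstuff_naive(stuff_naive(data))
        naive = seconds_per_round_trip(stuff_naive, unstuff_naive, data)
        regex = seconds_per_round_trip(stuff_re, unstuff_re, data)
        print(f"{size:6d} bytes: naive {naive * 1e6:8.2f} us, regex {regex * 1e6:8.2f} us")
```

<p>Using <code>timeit</code> with a fixed repetition count avoids writing a hand-made timing loop. Treat the printed numbers as a local measurement only.</p>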
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PO Octet (byte) stuffing and unstuffing, i.e. replacing one byte with two or v.v
  singulars
  PostTypePostTypeId
  PT Question
PostTypePostTypeId
1. PT Answer
UserLastEditorUserId
1. US Peter Schneider
UserOwnerUserId
1. US Peter Schneider
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. PO Octet (byte) stuffing and unstuffing, i.e. replacing one byte with two or v.v
  singulars
  PostTypePostTypeId
  PT Question
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VT UpMod
2. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VT UpMod
3. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VT UpMod
CommentsPostId
