StackOverflow2013


plurals

PO
primarykey
Id
3800233
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2010-09-26T23:12:12.823
FavoriteCount
0
LastActivityDate
2010-09-26T23:12:12.823
LastEditDate
LastEditorUserId
0
OwnerUserId
192812
ParentId
3799407
PostTypeId
2
Score
0
ViewCount
0
LastEditorDisplayName
text
Body
I'm not sure how easy this is, since it does make use of some more advanced concepts like generators, but it's at least robust and well-documented. The actual code is at the bottom and is fairly concise.

The basic idea is that the function <code>iter_delim_sets</code> returns an iterator over (aka a sequence of) tuples containing the line number, the set of indices in the "expected" string where the delimiter was found, and a similar set for the "actual" string. There's one such tuple generated for each pair of (expected, result) lines. Those tuples are succinctly formalized into a <code>collections.namedtuple</code> type called <code>DelimLocations</code>.

Then the function <code>analyze</code> just returns higher-level information based on such a data set, stored in a <code>DelimAnalysis</code> <code>namedtuple</code>. This is done using basic set algebra.

<pre><code>"""Compare two sequences of strings.

Test data:
>>> from pprint import pprint
>>> delimiter = '||'
>>> expected = (
...     delimiter.join(("one", "fish", "two", "fish")),
...     delimiter.join(("red", "fish", "blue", "fish")),
...     delimiter.join(("I do not like them", "Sam I am")),
...     delimiter.join(("I do not like green eggs and ham.",)))
>>> actual = (
...     delimiter.join(("red", "fish", "blue", "fish")),
...     delimiter.join(("one", "fish", "two", "fish")),
...     delimiter.join(("I do not like spam", "Sam I am")),
...     delimiter.join(("I do not like", "green eggs and ham.")))

The results:
>>> pprint([analyze(v) for v in iter_delim_sets(delimiter, expected, actual)])
[DelimAnalysis(index=0, correct=2, incorrect=1, count_diff=0),
 DelimAnalysis(index=1, correct=2, incorrect=1, count_diff=0),
 DelimAnalysis(index=2, correct=1, incorrect=0, count_diff=0),
 DelimAnalysis(index=3, correct=0, incorrect=1, count_diff=1)]

What they mean:
>>> pprint(delim_analysis_doc)
(('index',
  ('The number of the lines from expected and actual',
   'used to perform this analysis.')),
 ('correct',
  ('The number of delimiter placements in ``actual``',
   'which were correctly placed.')),
 ('incorrect', ('The number of incorrect delimiters in ``actual``.',)),
 ('count_diff',
  ('The difference between the number of delimiters',
   'in ``expected`` and ``actual`` for this line.')))

And a trace of the processing stages:
>>> def dump_it(it):
...     '''Wraps an iterator in code that dumps its values to stdout.'''
...     for v in it:
...         print v
...         yield v

>>> for v in iter_delim_sets(delimiter,
...                          dump_it(expected), dump_it(actual)):
...     print v
...     print analyze(v)
...     print '======'
one||fish||two||fish
red||fish||blue||fish
DelimLocations(index=0, expected=set([9, 3, 14]), actual=set([9, 3, 15]))
DelimAnalysis(index=0, correct=2, incorrect=1, count_diff=0)
======
red||fish||blue||fish
one||fish||two||fish
DelimLocations(index=1, expected=set([9, 3, 15]), actual=set([9, 3, 14]))
DelimAnalysis(index=1, correct=2, incorrect=1, count_diff=0)
======
I do not like them||Sam I am
I do not like spam||Sam I am
DelimLocations(index=2, expected=set([18]), actual=set([18]))
DelimAnalysis(index=2, correct=1, incorrect=0, count_diff=0)
======
I do not like green eggs and ham.
I do not like||green eggs and ham.
DelimLocations(index=3, expected=set([]), actual=set([13]))
DelimAnalysis(index=3, correct=0, incorrect=1, count_diff=1)
======
"""

from collections import namedtuple


# Data types

## Here ``expected`` and ``actual`` are sets
DelimLocations = namedtuple('DelimLocations', 'index expected actual')

DelimAnalysis = namedtuple('DelimAnalysis',
                           'index correct incorrect count_diff')

## Explanation of the elements of DelimAnalysis.
## There's no real convenient way to add a docstring to a variable.
delim_analysis_doc = (
    ('index', ("The number of the lines from expected and actual",
               "used to perform this analysis.")),
    ('correct', ("The number of delimiter placements in ``actual``",
                 "which were correctly placed.")),
    ('incorrect', ("The number of incorrect delimiters in ``actual``.",)),
    ('count_diff', ("The difference between the number of delimiters",
                    "in ``expected`` and ``actual`` for this line.")))


# Actual functionality

def iter_delim_sets(delimiter, expected, actual):
    """Yields a DelimLocations tuple for each pair of strings.

    ``expected`` and ``actual`` are sequences of strings.
    """
    from re import escape, compile as compile_
    from itertools import count, izip
    index = count()

    re = compile_(escape(delimiter))

    def delimiter_locations(string):
        """Set of the locations of matches of ``re`` in ``string``."""
        return set(match.start() for match in re.finditer(string))

    string_pairs = izip(expected, actual)

    return (DelimLocations(index=index.next(),
                           expected=delimiter_locations(e),
                           actual=delimiter_locations(a))
            for e, a in string_pairs)


def analyze(locations):
    """Returns an analysis of a DelimLocations tuple.

    ``locations.expected`` and ``locations.actual`` are sets.
    """
    return DelimAnalysis(
        index=locations.index,
        correct=len(locations.expected & locations.actual),
        incorrect=len(locations.actual - locations.expected),
        count_diff=(len(locations.actual) - len(locations.expected)))
</code></pre>
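The listing above targets Python 2 (<code>print</code> statements, <code>itertools.izip</code>, <code>index.next()</code>). For anyone on Python 3, here is a minimal sketch of the same idea. It is not part of the original answer: it keeps the same names for the two functions and the two namedtuples, but swaps <code>count()</code>/<code>izip</code> for the built-in <code>enumerate</code>/<code>zip</code> and uses a local name <code>pattern</code> (my choice) for the compiled regex.

```python
import re
from collections import namedtuple

# Same record types as in the answer; ``expected`` and ``actual`` are sets.
DelimLocations = namedtuple('DelimLocations', 'index expected actual')
DelimAnalysis = namedtuple('DelimAnalysis',
                           'index correct incorrect count_diff')


def iter_delim_sets(delimiter, expected, actual):
    """Yield a DelimLocations tuple for each pair of strings."""
    # Escape the delimiter so characters like '|' are matched literally.
    pattern = re.compile(re.escape(delimiter))

    def delimiter_locations(string):
        """Set of start offsets of the delimiter in ``string``."""
        return {match.start() for match in pattern.finditer(string)}

    # zip is lazy in Python 3, so the inputs are still consumed pairwise.
    for index, (e, a) in enumerate(zip(expected, actual)):
        yield DelimLocations(index=index,
                             expected=delimiter_locations(e),
                             actual=delimiter_locations(a))


def analyze(locations):
    """Summarize one DelimLocations tuple using set algebra."""
    return DelimAnalysis(
        index=locations.index,
        correct=len(locations.expected & locations.actual),
        incorrect=len(locations.actual - locations.expected),
        count_diff=len(locations.actual) - len(locations.expected))


if __name__ == '__main__':
    expected = ['one||fish||two||fish', 'I do not like green eggs and ham.']
    actual = ['red||fish||blue||fish', 'I do not like||green eggs and ham.']
    for locations in iter_delim_sets('||', expected, actual):
        print(analyze(locations))
```

Note that sets print as <code>{3, 9, 14}</code> in Python 3 rather than <code>set([9, 3, 14])</code>, so the doctest output in the original docstring would need updating if you port it as well.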
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POPython - Best way to compare two strings, record stats comparing serial position of particular item?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USintuited
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POPython - Best way to compare two strings, record stats comparing serial position of particular item?
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId
1. This table or related slice is empty.
