StackOverflow2013

<p>Okay, so I found your question to be a fascinating puzzle. I've left how to "compress" the numeric ranges up to you (marked as a TODO), as there are different ways to accomplish that depending on how you like it formatted and whether you want the minimum number of elements or the minimum string description length.</p>
<p>This solution uses a simple regular expression (runs of digits) to split each string into two kinds of section: static and variable. After the data is classified, I use itertools.groupby to collect consecutive entries with identical static text into the longest possible groups, which produces the summary effect. Because groupby only merges adjacent items, the input needs to be sorted first. I mix integer index sentinels into the group key (in matchGrouper) so I can re-select the varying parts from every element of a group (in unpack).</p>
<pre><code>import re
import glob
from itertools import groupby

def classifyGroups(iterable, reObj=re.compile(r'\d+')):
    """Yields successive match lists, where each item in the list is either
    static text content, or a list of matching values.

     * `iterable` is a sorted list of strings, such as sorted(glob('images/*'))
     * `reObj` is a compiled regular expression that describes the
               variable section of the iterable you want to match and classify
    """

    def classify(text, pos=0):
        """Use a regular expression object to split the text into match and non-match sections"""
        r = []
        for m in reObj.finditer(text, pos):
            m0 = m.start()
            r.append((False, text[pos:m0]))
            pos = m.end()
            r.append((True, text[m0:pos]))
        r.append((False, text[pos:]))
        return r

    def matchGrouper(each):
        """Returns index of matches or original text for non-matches"""
        return [(i if t else v) for i, (t, v) in enumerate(each)]

    def unpack(k, matches):
        """If the key is an integer, unpack the value array from matches"""
        if isinstance(k, int):
            k = [m[k][1] for m in matches]
        return k

    # classify each item into matches
    matchLists = (classify(t) for t in iterable)

    # group the matches by their static content
    # (groupby only merges adjacent items, so the input should be sorted)
    for key, matches in groupby(matchLists, matchGrouper):
        matches = list(matches)
        # Yield a list of content matches.  Each entry is either text
        # from static content, or a list of matches
        yield [unpack(k, matches) for k in key]
</code></pre>
<p>Finally, we add just enough logic to pretty-print the output and run an example.</p>
<pre><code>def makeResultPretty(res):
    """Formats data somewhat like the question"""
    r = []
    for e in res:
        if isinstance(e, list):
            # TODO: collapse and simplify ranges as desired here
            if len(set(e))<=1:
                # it's a list of the same element
                e = e[0]
            else:
                # prettify the list
                e = '[' + ' '.join(e) + ']'
        r.append(e)
    return ''.join(r)

fnList = sorted(glob.glob('images/*'))
re_digits = re.compile(r'\d+')

for res in classifyGroups(fnList, re_digits):
    print(makeResultPretty(res))
</code></pre>
<p>My directory of images was created from your example. For testing, you can replace fnList with the following (already sorted) list:</p>
<pre><code>fnList = [
    'images/image_0001.jpg',
    'images/image_0002.jpg',
    'images/image_0003.jpg',
    'images/image_0010.jpg',
    'images/image_0011-1.jpg',
    'images/image_0011-2.jpg',
    'images/image_0011-3.jpg',
    'images/image_0011.jpg',
    'images/image_9999.jpg']
</code></pre>
<p>When I run it against this directory, the output looks like this:</p>
<pre><code>StackOverflow/3926936% python classify.py
images/image_[0001 0002 0003 0010].jpg
images/image_0011-[1 2 3].jpg
images/image_[0011 9999].jpg
</code></pre>
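<p>To make the intermediate data concrete, here is a minimal standalone sketch of the classify and matchGrouper steps applied to a single filename. It is a trimmed-down copy pulled out of classifyGroups for illustration (the module-level DIGITS name is my own), not a replacement for the code above.</p>

```python
import re

# Hypothetical module-level pattern; classifyGroups receives this as its reObj argument.
DIGITS = re.compile(r'\d+')


def classify(text, reObj=DIGITS):
    """Split text into (is_match, substring) pairs: static text and digit runs."""
    segments = []
    pos = 0
    for m in reObj.finditer(text):
        segments.append((False, text[pos:m.start()]))  # static text before the run
        segments.append((True, m.group()))             # the variable digit run
        pos = m.end()
    segments.append((False, text[pos:]))               # trailing static text
    return segments


def matchGrouper(segments):
    """Group key: static text kept verbatim, each digit run replaced by its index."""
    return [i if is_match else value for i, (is_match, value) in enumerate(segments)]


segments = classify('images/image_0011-2.jpg')
print(segments)
# -> [(False, 'images/image_'), (True, '0011'), (False, '-'), (True, '2'), (False, '.jpg')]
print(matchGrouper(segments))
# -> ['images/image_', 1, '-', 3, '.jpg']

# Names that differ only in their digits share a key, so groupby puts them together.
print(matchGrouper(classify('images/image_0011-3.jpg')) == matchGrouper(segments))  # -> True
```

<p>Filenames that differ only in their digit runs produce the same key, which is what lets groupby collect them into one group; a name with a different number of digit runs (such as images/image_0011.jpg) gets a different key and starts a new group.</p>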
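<p>For the TODO in makeResultPretty, one possible approach (a sketch, not the only way) is to collapse runs of consecutive integers into first-last ranges. The helper name collapseRanges is hypothetical, and it assumes every value is a string of decimal digits, which the digit pattern above guarantees.</p>

```python
def collapseRanges(values):
    """Hypothetical helper: join runs of consecutive integers as 'first-last'.

    Assumes each value is a string of decimal digits. Duplicates are dropped,
    values are ordered numerically, and each range keeps the original
    spelling of its endpoints, so zero padding survives.
    """
    # sort by numeric value; the string itself breaks ties (e.g. '1' vs '01')
    ordered = sorted(set(values), key=lambda s: (int(s), s))
    runs = []  # each entry is [first, last], using the original spellings
    for v in ordered:
        if runs and int(v) == int(runs[-1][1]) + 1:
            runs[-1][1] = v        # v continues the current run
        else:
            runs.append([v, v])    # v starts a new run
    return ' '.join(first if first == last else first + '-' + last
                    for first, last in runs)


print(collapseRanges(['0001', '0002', '0003', '0010']))  # -> 0001-0003 0010
print(collapseRanges(['1', '2', '3']))                    # -> 1-3
print(collapseRanges(['0011', '9999']))                   # -> 0011 9999
```

<p>To use it, replace the else branch in makeResultPretty with e = '[' + collapseRanges(e) + ']'. With the sample file list, the first output line would then read images/image_[0001-0003 0010].jpg. This favors fewer elements; if you care more about the shortest string, you might only collapse runs of three or more.</p>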