StackOverflow2013


plurals

PODifferences in re.findall and re.finditer -- bug in Python 2.7 re module?
primarykey
Id
20424661
data
AcceptedAnswerId
20424721
AnswerCount
1
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2013-12-06T13:05:42.617
FavoriteCount
0
LastActivityDate
2013-12-06T13:08:56.740
LastEditDate
LastEditorUserId
0
OwnerUserId
2022326
ParentId
0
PostTypeId
1
Score
2
ViewCount
282
LastEditorDisplayName
text
Body
While demonstrating Python's regex functionality, I wrote a small program to compare the return values of <code>re.search()</code>, <code>re.findall()</code> and <code>re.finditer()</code>. I'm aware that <code>re.search()</code> will only find the first match in each line and that <code>re.findall()</code> only returns the matched substring(s) and not any location information. However, I was surprised to see that the matched substring can differ between the three functions.

Code (<a href="https://gist.github.com/palday/7805267" rel="nofollow">available on GitHub</a>):

<pre><code>#! /usr/bin/env python
# -*- coding: utf-8 -*-
# License: CC-BY-NC-SA 3.0

import re
import codecs

# download kate_chopin_the_awakening_and_other_short_stories.txt
# from Project Gutenberg:
# http://www.gutenberg.org/ebooks/160.txt.utf-8
# with wget:
# wget http://www.gutenberg.org/ebooks/160.txt.utf-8 -O kate_chopin_the_awakening_and_other_short_stories.txt

# match for something o'clock, with valid numerical time or
# any English word with proper capitalization
oclock = re.compile(r"""
    (
        [A-Z]?[a-z]+   # word with at most 1 capital letter
      | 1[012]         # 10,11,12
      | [1-9]          # 1,2,3,4,5,6,7,8,9
    )
    \s o'clock""", re.VERBOSE)

path = "kate_chopin_the_awakening_and_other_short_stories.txt"

# re.search(): first match in each line, with positions
print
print "re.search()"
print
print u"{:>6} {:>6} {:>6}\t{}".format("Line","Start","End","Match")
print u"{:=>6} {:=>6} {:=>6}\t{}".format('','','','=====')

with codecs.open(path,mode='r',encoding='utf-8') as f:
    for lineno, line in enumerate(f):
        atime = oclock.search(line)
        if atime:
            print u"{:>6} {:>6} {:>6}\t{}".format(lineno, atime.start(), atime.end(), atime.group())

# re.findall(): list of strings per line, no positions
print
print "re.findall()"
print
print u"{:>6} {:>6} {:>6}\t{}".format("Line","Start","End","Match")
print u"{:=>6} {:=>6} {:=>6}\t{}".format('','','','=====')

with codecs.open(path,mode='r',encoding='utf-8') as f:
    for lineno, line in enumerate(f):
        times = oclock.findall(line)
        if times:
            print u"{:>6} {:>6} {:>6}\t{}".format(lineno, '', '', ' '.join(times))

# re.finditer(): one match object per match, with positions
print
print "re.finditer()"
print
print u"{:>6} {:>6} {:>6}\t{}".format("Line","Start","End","Match")
print u"{:=>6} {:=>6} {:=>6}\t{}".format('','','','=====')

with codecs.open(path,mode='r',encoding='utf-8') as f:
    for lineno, line in enumerate(f):
        times = oclock.finditer(line)
        for m in times:
            print u"{:>6} {:>6} {:>6}\t{}".format(lineno, m.start(), m.end(), m.group())
</code></pre>

and output (tested on Python 2.7.3 and 2.7.5):

<pre><code>re.search()

  Line  Start    End    Match
====== ====== ======    =====
   248      7     21    eleven o'clock
  1520     24     35    one o'clock
  1975     21     33    nine o'clock
  2106      4     16    four o'clock
  4443     19     30    ten o'clock

re.findall()

  Line  Start    End    Match
====== ====== ======    =====
   248                  eleven
  1520                  one
  1975                  nine
  2106                  four
  4443                  ten

re.finditer()

  Line  Start    End    Match
====== ====== ======    =====
   248      7     21    eleven o'clock
  1520     24     35    one o'clock
  1975     21     33    nine o'clock
  2106      4     16    four o'clock
  4443     19     30    ten o'clock
</code></pre>

What am I missing here? Why doesn't <code>re.findall()</code> return the <code>o'clock</code> bit?
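A minimal sketch of the behaviour in question, assuming the difference comes from the capturing group in the pattern: as documented for the `re` module, `findall()` returns the contents of the group(s) rather than the whole match when the pattern contains one or more groups. The sample sentence and the `non_capturing` variant are illustrative assumptions and are not part of the original program; the snippet should run on both Python 2.7 and Python 3.

```python
import re

# Illustrative input, not taken from the Project Gutenberg text.
line = "They met at eleven o'clock and left at 2 o'clock."

# Same shape as the pattern in the question: one capturing group, then o'clock.
capturing = re.compile(r"""
    (
        [A-Z]?[a-z]+   # word with at most one capital letter
      | 1[012]         # 10, 11, 12
      | [1-9]          # 1 through 9
    )
    \s o'clock""", re.VERBOSE)

# Identical, except that the group is non-capturing: (?: ... )
non_capturing = re.compile(r"""
    (?:
        [A-Z]?[a-z]+
      | 1[012]
      | [1-9]
    )
    \s o'clock""", re.VERBOSE)

# With a capturing group, findall() yields only what the group matched.
group_only = capturing.findall(line)

# Without a capturing group, findall() yields the whole match.
whole_match = non_capturing.findall(line)

# finditer() yields match objects; group() with no argument is the whole match.
via_finditer = [m.group() for m in capturing.finditer(line)]

print(group_only)
print(whole_match)
print(via_finditer)
```

If the group contents are also needed, `m.group(1)` on the match objects from `finditer()` gives the same strings that `findall()` returns for the capturing pattern.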
Tags
<python><regex><python-2.7>
Title
Differences in re.findall and re.finditer -- bug in Python 2.7 re module?
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USLivius
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PODifferences in re.findall and re.finditer -- bug in Python 2.7 re module?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PODifferences in re.findall and re.finditer -- bug in Python 2.7 re module?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
