<p>This will work (Python 3):</p>
<pre><code>&gt;&gt;&gt; import re
&gt;&gt;&gt; rx_sequence = re.compile(r"^(.+?)\n\n((?:[A-Z]+\n)+)", re.MULTILINE)
&gt;&gt;&gt; rx_blanks = re.compile(r"\W+")  # to remove blanks and newlines
&gt;&gt;&gt; text = """Some varying text1
...
... AAABBBBBBCCCCCCDDDDDDD
... EEEEEEEFFFFFFFFGGGGGGG
... HHHHHHIIIIIJJJJJJJKKKK
...
... Some varying text 2
...
... LLLLLMMMMMMNNNNNNNOOOO
... PPPPPPPQQQQQQRRRRRRSSS
... TTTTTUUUUUVVVVVVWWWWWW
... """
&gt;&gt;&gt; for match in rx_sequence.finditer(text):
...     title, sequence = match.groups()
...     title = title.strip()
...     sequence = rx_blanks.sub("", sequence)
...     print("Title:", title)
...     print("Sequence:", sequence)
...     print()
...
Title: Some varying text1
Sequence: AAABBBBBBCCCCCCDDDDDDDEEEEEEEFFFFFFFFGGGGGGGHHHHHHIIIIIJJJJJJJKKKK

Title: Some varying text 2
Sequence: LLLLLMMMMMMNNNNNNNOOOOPPPPPPPQQQQQQRRRRRRSSSTTTTTUUUUUVVVVVVWWWWWW
</code></pre>
<hr>
<p>Some explanation about this regular expression might be useful: <code>^(.+?)\n\n((?:[A-Z]+\n)+)</code></p>
<ul>
<li>The first character (<code>^</code>) means "starting at the beginning of a line". This only holds because of the <code>re.MULTILINE</code> flag; without it, <code>^</code> matches only at the start of the whole string. Be aware that <code>^</code> does not match the newline itself (the same goes for <code>$</code>: it means "just before a newline", but it does not match the newline itself).</li>
<li>Then <code>(.+?)\n\n</code> means "match as few characters as possible (any character except a newline, since <code>.</code> does not match <code>\n</code> unless <code>re.DOTALL</code> is set) until you reach two newlines". The result (without the newlines) is put in the first group.</li>
<li><code>[A-Z]+\n</code> means "match as many upper-case letters as possible until you reach a newline". This defines what I will call a <em>textline</em>.</li>
<li><code>((?:</code><em>textline</em><code>)+)</code> means "match one or more <em>textlines</em>". The inner <code>(?:...)</code> is a non-capturing group, so each line does not get a group of its own; instead, the outer parentheses put <strong>all</strong> the <em>textlines</em> in one group (the second one).</li>
<li>You could add a final <code>\n</code> to the regular expression if you want to enforce a double newline (a blank line) after the last <em>textline</em>.</li>
<li>Also, if you are not sure which type of newline you will get (<code>\n</code>, <code>\r</code> or <code>\r\n</code>), fix the regular expression by replacing every occurrence of <code>\n</code> with <code>(?:\n|\r\n?)</code>. That covers <code>\n</code> and <code>\r\n</code>; for a lone <code>\r</code> it is not enough, because with <code>re.MULTILINE</code> the <code>^</code> anchor only matches after <code>\n</code>, so converting the line endings to <code>\n</code> before matching is simpler.</li>
</ul>
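To see how the bullet points map onto the pattern, here is the same regular expression rewritten with <code>re.VERBOSE</code>, which lets each piece carry its own comment. This is a minimal sketch assuming Python 3; the `parse_blocks` name and the short sample text are made up for illustration.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE: unescaped whitespace
# in the pattern is ignored and "#" starts a comment.
RX_SEQUENCE = re.compile(
    r"""
    ^               # start of a line (needs re.MULTILINE)
    (.+?)           # group 1: the title, as few characters as possible
    \n\n            # followed by a blank line
    (               # group 2: all textlines together
      (?:[A-Z]+\n)+ # one or more lines of upper-case letters (non-capturing)
    )
    """,
    re.MULTILINE | re.VERBOSE,
)
RX_BLANKS = re.compile(r"\W+")  # runs of non-word characters, here the newlines


def parse_blocks(text):
    # Hypothetical helper: returns a list of (title, sequence) tuples.
    blocks = []
    for match in RX_SEQUENCE.finditer(text):
        title, sequence = match.groups()
        blocks.append((title.strip(), RX_BLANKS.sub("", sequence)))
    return blocks


sample = "Some varying text1\n\nAAAB\nCCCD\n\nSome varying text 2\n\nEEEF\nGGGH\n"
for title, sequence in parse_blocks(sample):
    print("Title:", title)
    print("Sequence:", sequence)
```

The verbose form matches exactly the same strings as the compact one; it is only easier to read and to modify.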
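The remark about <code>^</code> depends on the <code>re.MULTILINE</code> flag. A small check, assuming Python 3 and a made-up sample string:

```python
import re

text = "title\n\nABC\nDEF\n"

# Without re.MULTILINE, ^ only matches at the very start of the string,
# and this string starts with a lower-case "t".
without_flag = re.findall(r"^[A-Z]+", text)
# With re.MULTILINE, ^ also matches right after every "\n".
with_flag = re.findall(r"^[A-Z]+", text, re.MULTILINE)
# ^ and $ are zero-width, so the newlines are never part of the match.
anchored = re.findall(r"^[A-Z]+$", text, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['ABC', 'DEF']
print(anchored)      # ['ABC', 'DEF']
```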
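Here is what the optional final <code>\n</code> changes: with it, a block only matches when a blank line follows its last textline, so a block at the very end of a text that ends with a single newline is skipped. The `loose`/`strict` names and the sample text are only for illustration (Python 3 assumed).

```python
import re

# Pattern from the answer: the last textline only needs its own "\n".
loose = re.compile(r"^(.+?)\n\n((?:[A-Z]+\n)+)", re.MULTILINE)
# Extra "\n" at the end: the block must be followed by a blank line.
strict = re.compile(r"^(.+?)\n\n((?:[A-Z]+\n)+)\n", re.MULTILINE)

# The second block is NOT followed by a blank line.
text = "Title 1\n\nAB\nCD\n\nTitle 2\n\nEF\n"

print([title for title, _ in loose.findall(text)])   # ['Title 1', 'Title 2']
print([title for title, _ in strict.findall(text)])  # ['Title 1']
```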
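A sketch of the newline-agnostic variant, assuming Python 3: every <code>\n</code> in the pattern is replaced by <code>(?:\n|\r\n?)</code>. The helper names `parse_blocks_any_newline` and `normalize_newlines` are hypothetical. Since <code>^</code> under <code>re.MULTILINE</code> only starts a line after <code>\n</code>, input with lone <code>\r</code> line endings yields just its first block, which is why the sketch also converts line endings for that case.

```python
import re

# Every "\n" of the original pattern replaced by (?:\n|\r\n?),
# which matches "\n", "\r\n" or a lone "\r".
NL = r"(?:\n|\r\n?)"
RX_SEQUENCE_ANY_NL = re.compile(
    r"^(.+?)" + NL + NL + r"((?:[A-Z]+" + NL + r")+)",
    re.MULTILINE,
)
RX_BLANKS = re.compile(r"\W+")  # drops the line breaks inside a sequence


def parse_blocks_any_newline(text):
    # Hypothetical helper: (title, sequence) tuples for "\n" or "\r\n" input.
    return [
        (title.strip(), RX_BLANKS.sub("", sequence))
        for title, sequence in RX_SEQUENCE_ANY_NL.findall(text)
    ]


def normalize_newlines(text):
    # Hypothetical helper: turns "\r\n" and lone "\r" into "\n", because
    # ^ with re.MULTILINE only matches after "\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")


windows = "Some varying text1\r\n\r\nAAAB\r\nCCCD\r\n\r\nSome varying text 2\r\n\r\nEEEF\r\n"
print(parse_blocks_any_newline(windows))
```

If you control the input, normalizing first and keeping the original, simpler pattern is arguably the more robust choice.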