<p>I believe this is a solution that uses some of the essentials of your approach. When it recognises a date, it lops it off the beginning of the line and saves it for use on subsequent lines that have no date of their own. Similarly, it lops numeric items off the right-hand end of each line when they are present, leaving the unstructured text.</p>
<pre><code># Sample input: one record per line. A line that does not start with a
# day and month inherits the date of the line before it.
lines = '''\
20 Sep This is the first record, bla bla bla 10.45
Text unstructured of the second record bla bla 406.25 10001
6 Oct Text of the third record thatspans on many lines bla bla bla 60
28 Nov Fourth record 27.43
Second record of the day/month BUT the fifth record of the file 500 90.25'''

days_in_month = [str(day) for day in range(1, 32)]  # '1' to '31'
months_in_year = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

lines = [line.strip() for line in lines.split('\n') if line.strip()]

previous_date = None
previous_month = None
for line in lines:
    item = line.split()
    # A leading day and month starts a new date; remember it for later lines.
    if len(item) &gt;= 2 and item[0] in days_in_month and item[1] in months_in_year:
        previous_date = item.pop(0)
        previous_month = item.pop(0)
    # Try to lop a number off the right-hand end of the line.
    try:
        number_2 = float(item[-1])
        item.pop()
    except (ValueError, IndexError):
        number_2 = None
    # If that worked, try to lop off a second number.
    number_1 = None
    if number_2 is not None:
        try:
            number_1 = float(item[-1])
            item.pop()
        except (ValueError, IndexError):
            number_1 = None
    # With only one number, report it as number_1.
    if number_1 is None and number_2 is not None:
        number_1, number_2 = number_2, None
    # Show whole numbers without a decimal part (60.0 prints as 60).
    if number_1 is not None and number_1.is_integer():
        number_1 = int(number_1)
    if number_2 is not None and number_2.is_integer():
        number_2 = int(number_2)
    print(previous_date, previous_month, ' '.join(item), number_1, number_2)
</code></pre>
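<p>If you would rather collect the fields than print them, the same steps could be wrapped in a function. The following is only a minimal sketch, assuming Python 3 and one record per line; <code>parse_records</code>, <code>take_number</code>, <code>DAYS</code> and <code>MONTHS</code> are hypothetical names, numbers are left as floats, and words such as <code>nan</code> or <code>inf</code> are deliberately not treated as numbers.</p>
<pre><code>import math

# Hypothetical helper names; a sketch of the line-by-line approach, not a definitive parser.
# Day numbers ('1' to '31') and month abbreviations that mark a dated line.
MONTHS = {'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'}
DAYS = {str(day) for day in range(1, 32)}


def take_number(words):
    """Pop a trailing finite number off words and return it, else return None."""
    if not words:
        return None
    try:
        value = float(words[-1])
    except ValueError:
        return None
    if not math.isfinite(value):  # keep words like 'nan' or 'inf' as text
        return None
    words.pop()
    return value


def parse_records(text):
    """Return a (day, month, text, number_1, number_2) tuple per non-blank line.

    Lines without a leading date inherit the most recent date seen.
    """
    records = []
    day = month = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        head = words[:2]
        if len(head) == 2 and head[0] in DAYS and head[1] in MONTHS:
            day, month = words.pop(0), words.pop(0)
        # Take up to two numbers off the right-hand end of the line.
        number_2 = take_number(words)
        number_1 = take_number(words) if number_2 is not None else None
        if number_1 is None:
            # Zero or one number found: report it (if any) as number_1.
            number_1, number_2 = number_2, None
        records.append((day, month, ' '.join(words), number_1, number_2))
    return records


if __name__ == '__main__':
    sample = '''20 Sep This is the first record, bla bla bla 10.45
Text unstructured of the second record bla bla 406.25 10001
28 Nov Fourth record 27.43'''
    for record in parse_records(sample):
        print(record)
</code></pre>
<p>Run as a script, it prints one tuple per line; the second sample line, for instance, comes out as <code>('20', 'Sep', 'Text unstructured of the second record bla bla', 406.25, 10001.0)</code>, with the date carried over from the first line.</p>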