Extracting data from a string where the data structure is embedded in the string itself
    <p>In a project we are working on, we encounter log files in which each line has the following structure:</p>
    <p>2012-01-02,12:50:32,658,2,1,2,0,0,0,0,1556,1555,62,60,2,3,0,0,0,0,1559,1557,1557,63,64,65,0.305,0.265,0.304,0.308,0.309</p>
    <p>The structure of the string is embedded in the string itself.</p>
    <p>First we have some metadata:</p>
    <ul>
    <li>date: 2012/01/02</li>
    <li>time: 12:50:32</li>
    <li>measurement number: 658</li>
    <li>number of measurement groups: 2</li>
    </ul>
    <p>This is then followed by the data of each group sequentially.</p>
    <ul>
    <li>Measurement group 1: 1,2,0,0,0,0,1556,1555,62,60</li>
    <li>Measurement group 2: 2,3,0,0,0,0,1559,1557,1557,63,64,65</li>
    </ul>
    <p>Group data has the following structure (measurement group 1 used below as an example):</p>
    <ul>
    <li>number of the measurement group: 1</li>
    <li>number of sensors in this group: 2</li>
    <li>control fields 1 to 4 (0 most of the time): 0,0,0,0</li>
    <li>raw values of type 1 for each sensor (&gt;1500 in the examples): 1556,1555</li>
    <li>raw values of type 2 for each sensor (~60 in the examples): 62,60</li>
    </ul>
    <p>The line continues with the calculated values for all sensors mentioned above, consecutively (i.e. no more control values or raw values).</p>
    <p>In the example, the total number of sensors = 2 + 3 = 5, so the calculated part of the line is:</p>
    <p>0.305,0.265,0.304,0.308,0.309</p>
    <p>My question is this: if we want to normalize the values for each sensor like this:</p>
    <p>date, time, number of measurement group, sensor number in group, (raw value type 1, raw value type 2, calculated value)</p>
    <p>What would be a flexible solution, given that at any date-time each variable is, well... variable (meaning that the number of measurement groups is variable, and the number of sensors in each group can also be variable)?</p>
    <p>For the example, the final output should be something like:</p>
    <ul>
    <li>2012/01/02,12:50:32,1,1,(1556,62,0.305)</li>
    <li>2012/01/02,12:50:32,1,2,(1555,60,0.265)</li>
    <li>2012/01/02,12:50:32,2,1,(1559,63,0.304)</li>
    <li>2012/01/02,12:50:32,2,2,(1557,64,0.308)</li>
    <li>2012/01/02,12:50:32,2,3,(1557,65,0.309)</li>
    </ul>
    <p>What I have done so far is to segment the measurements into cases over time and define "statically" which columns are to be inserted for a line belonging to a case, which group a sensor belongs to, what its sensor number is, and so on.</p>
    <p>This is hardly a good solution, as each change in the measurement setup results in more changes to the code.</p>
    <pre><code>line = """2012-01-02,12:50:32,658,2,1,2,0,0,0,0,1556,1555,62,60,2,3,0,0,0,0,1559,1557,1557,63,64,65,0.305,0.265,0.304,0.308,0.309"""
parts = line.split(",")
date = parts[0]

# Hard-coded layout for this one measurement setup
groupnames = [1, 1, 2, 2, 2]
sensornumbers = [1, 2, 1, 2, 3]
raw_type1_idx = [10, 11, 20, 21, 22]
raw_type2_idx = [12, 13, 23, 24, 25]
calc_idx = [26, 27, 28, 29, 30]

for i, j, k, l, m in zip(groupnames, sensornumbers, raw_type1_idx, raw_type2_idx, calc_idx):
    output_tpl = parts[k], parts[l], parts[m]
    print("%s,%s,%s,%s" % (date, i, j, output_tpl))
</code></pre>
    <p>Is there a better Python way of doing stuff like this?</p>
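One possible approach, sketched here under the assumptions stated in the question, is to walk through the fields with a running position and read each group's header to learn how many values follow. The function name `parse_line` and the constant `N_CONTROL_FIELDS` are illustrative names, not part of the original code, and the sketch assumes that every group always carries exactly four control fields. Values are kept as strings and the date is left in the log's own format.

```python
N_CONTROL_FIELDS = 4  # assumption: every group has exactly four control fields


def parse_line(line):
    """Return one (date, time, group, sensor, (raw1, raw2, calc)) tuple per sensor."""
    parts = line.strip().split(",")
    date, time = parts[0], parts[1]
    n_groups = int(parts[3])  # parts[2] is the measurement number

    pos = 4  # index of the first group header
    sensors = []  # (group number, sensor number, raw type 1, raw type 2)
    for _ in range(n_groups):
        group_no = int(parts[pos])
        n_sensors = int(parts[pos + 1])
        pos += 2 + N_CONTROL_FIELDS  # skip the header and the control fields
        raw1 = parts[pos:pos + n_sensors]
        raw2 = parts[pos + n_sensors:pos + 2 * n_sensors]
        pos += 2 * n_sensors
        for idx in range(n_sensors):
            sensors.append((group_no, idx + 1, raw1[idx], raw2[idx]))

    # The rest of the line holds one calculated value per sensor, in order.
    calc = parts[pos:]
    if len(calc) != len(sensors):
        raise ValueError("expected %d calculated values, found %d"
                         % (len(sensors), len(calc)))

    return [(date, time, g, s, (r1, r2, c))
            for (g, s, r1, r2), c in zip(sensors, calc)]


line = ("2012-01-02,12:50:32,658,2,1,2,0,0,0,0,1556,1555,62,60,"
        "2,3,0,0,0,0,1559,1557,1557,63,64,65,0.305,0.265,0.304,0.308,0.309")

for date, time, group, sensor, values in parse_line(line):
    print("%s,%s,%s,%s,(%s)" % (date, time, group, sensor, ",".join(values)))
# first line printed: 2012-01-02,12:50:32,1,1,(1556,62,0.305)
```

Because the group and sensor counts are read from the line itself, a change in the measurement setup should not require changes to the index lists. The length check at the end is there to flag lines whose layout does not match these assumptions.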