    <p>Usually <code>textscan</code> and <code>regexp</code> are the way to go when parsing string fields (as shown <a href="https://stackoverflow.com/q/17818267/1336150#17819161">here</a>):</p>
    <ol>
    <li><p>Read the input lines as strings with <code>textscan</code>:</p>
<pre><code>fid = fopen('input.px', 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
</code></pre></li>
    <li><p>Parse the header field names and values using <code>regexp</code>. Picking the right regular expression should do the trick!</p>
<pre><code>X = regexp(C{:}, '^\s*([^=\(\)]+)\s*=\s*"([^"]+)"\s*', 'tokens');
X = [X{:}];                    %// Flatten the cell array
X = reshape([X{:}], 2, []);    %// Reshape into name-value pairs
</code></pre></li>
    <li><p>The "VALUES" fields may span multiple lines, so they need to be concatenated first:</p>
<pre><code>idx_data = find(~cellfun('isempty', regexp(C{:}, '^\s*Data')), 1);
idx_values = find(~cellfun('isempty', regexp(C{:}, '^\s*VALUES')));
Y = arrayfun(@(m, n){[C{:}{m:m + n - 1}]}, ...
    idx_values(idx_values &lt; idx_data), diff([idx_values; idx_data]));
</code></pre>
    <p>... and then tokenized:</p>
<pre><code>Y = regexp(Y, '"([^,"]+)"', 'tokens');            %// Tokenize values
Y = cellfun(@(x){{x{1}{1}, {[x{2:end}]}}}, Y);    %// Group values in one array
Y = reshape([Y{:}], 2, []);                       %// Reshape into name-value pairs
</code></pre></li>
    <li><p>Make sure the field names are legal (I've decided to convert everything to lowercase and replace hyphens and any whitespace with underscores), and plug them into a struct:</p>
<pre><code>X = [X, Y];                                       %// Store all fields in one array
X(1, :) = lower(regexprep(X(1, :), '-+|\s+', '_'));
S = struct(X{:});
</code></pre></li>
    </ol>
    <p>Here's what I get for your input file (only the header fields):</p>
<pre><code>S =
    charset: 'ANSI'
    matrix: 'BE001'
    subject_code: 'BE'
    subject_area: 'Population'
    title: 'Population by region, time, marital status and sex.'
    month: {1x12 cell}
    region: {1x5 cell}
</code></pre>
    <p>As for the data itself, it needs to be handled separately:</p>
    <ol>
    <li><p>Extract the data lines after the "Data" field and replace all <code>".."</code> values with a default value (say, <code>NaN</code>):</p>
<pre><code>D = strrep(C{:}(idx_data + 1:end), '".."', 'NaN');
</code></pre>
    <p>Obviously this assumes that there are only numerical data after the "Data" field. However, this can be easily modified if this is not the case.</p></li>
    <li><p>Convert the data to a numerical matrix and add it to the structure:</p>
<pre><code>D = cellfun(@str2num, D, 'UniformOutput', false);
S.data = vertcat(D{:})
</code></pre></li>
    </ol>
    <p>And here's <code>S.data</code> for your input file:</p>
<pre><code>S.data =
        NaN        NaN        NaN        NaN        NaN
        NaN        NaN        NaN        NaN        NaN
        NaN   24.80000   34.20000   52.00000   23.00000
        NaN   32.10000   40.30000   50.70000    1.00000
        NaN   31.60000   35.00000   49.10000    2.30000
   41.20000   43.00000   50.80000   60.10000    0.00000
   50.90000   52.00000   53.90000   65.90000    0.00000
</code></pre>
    <p>Hope this helps!</p>
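    <p>As for the remark that the data step can be modified when the lines are not purely numerical: one possible variation (a minimal sketch, not tested against your file) is to convert each line token by token with <code>str2double</code>, which returns <code>NaN</code> for anything it cannot parse, including <code>".."</code>. The two sample lines below are hypothetical, and the sketch assumes whitespace-separated values with the same number of values on every line:</p>
<pre><code>% Hypothetical data lines standing in for C{:}(idx_data + 1:end)
D = {'".." 24.8 34.2'; '41.2 ".." 50.8'};

% Split each line on whitespace and convert token by token;
% str2double yields NaN for every token that is not a plain number
rows = cellfun(@(s) str2double(strsplit(strtrim(s))), D, 'UniformOutput', false);

% Stack the rows into one matrix (assumes equal row lengths)
data = vertcat(rows{:});
</code></pre>
    <p>With this variant the <code>strrep</code> step is no longer needed, at the cost of silently turning any unexpected token into <code>NaN</code>.</p>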
 
