StackOverflow2013


Parse a PC-Axis (.px) file in Matlab
<p><strong>Background</strong>: PC-Axis is a file format used for the dissemination of statistical information. It is used by a number of national statistical organisations to publish official statistics.</p>
<p>A PC-Axis file looks a little like this, although they're usually a lot longer:</p>
<pre><code>CHARSET="ANSI";
MATRIX="BE001";
SUBJECT-CODE="BE";
SUBJECT-AREA="Population";
TITLE="Population by region, time, marital status and sex.";
Data=
".." ".." ".." ".." ".."
".." ".." ".." ".." ".."
".." 24.80 34.20 52.00 23.00
".." 32.10 40.30 50.70 1.00
".." 31.60 35.00 49.10 2.30
41.20 43.00 50.80 60.10 0.00
50.90 52.00 53.90 65.90 0.00
28.90 31.80 39.60 51.00 0.00;
</code></pre>
<p>More details about PC-Axis files can be found at the <a href="http://www.scb.se/Pages/List____314011.aspx" rel="nofollow">Statistics Sweden website</a>, but the basic gist is that the metadata is positioned at the top of the file and the actual data follows "DATA=". It's also worth noting that the data is laid out like a data table rather than in columns.</p>
<p><strong>The Problem</strong>: I'd like to parse a PC-Axis file using Matlab, but I'm a little stumped as to how to go about it. Does anyone know how to parse one of these files in Matlab? Would it be easier to parse this type of file in some other language, like Perl, and then import the data into Matlab, or is Matlab a suitable tool for the job? Note that the plan is to analyze the data in Matlab after the text processing stage.</p>
<p>I've tried Matlab's text processing tools such as fgetl, textscan, fscanf, and a few others, but it's terribly tricky. Does anyone have any pointers on how to go about doing it?</p>
<p>Essentially, I'd like to store each of the keywords (CHARSET, MATRIX, etc.) and their corresponding values (ANSI, BE001, etc.) as metadata in Matlab - as a structure, perhaps.
I'd also like to have the data stored in Matlab - as a matrix, for example.</p>
<p><strong>Note</strong>: I'm aware of the <a href="http://cran.r-project.org/web/packages/pxR/index.html" rel="nofollow">pxR package (CRAN)</a> in R, which works a treat for reading .px files into the workspace as a data.frame object. There's also a Perl module called <a href="http://search.cpan.org/~fod/Data-PcAxis-0.0.6/" rel="nofollow">Data::PcAxis (CPAN)</a> that is very good, but I specifically want to know how to parse a .px file using Matlab.</p>
<p><strong>UPDATE:</strong> I should have mentioned that in addition to <em>metadata</em> and <em>data</em>, there are also <em>variables</em>. This is best explained by an example. The example PC-Axis file below is the same as the one above except that I've added two variables. They're named VALUES("Month") and VALUES("region") and are positioned <em>after</em> the metadata and <em>before</em> the data.</p>
<pre><code>CHARSET="ANSI";
MATRIX="BE001";
SUBJECT-CODE="BE";
SUBJECT-AREA="Population";
TITLE="Population by region, time, marital status and sex.";
VALUES("Month")="1976M01","1976M02","1976M03","1976M04",
"1976M05","1976M06","1976M07","1976M08",
"1976M09","1976M10","1976M11","1976M12";
VALUES("region")="Sweden","Germany","France",
"Ireland","Finland";
Data=
".." ".." ".." ".." ".."
".." ".." ".." ".." ".."
".." 24.80 34.20 52.00 23.00
".." 32.10 40.30 50.70 1.00
".." 31.60 35.00 49.10 2.30
41.20 43.00 50.80 60.10 0.00
50.90 52.00 53.90 65.90 0.00
28.90 31.80 39.60 51.00 0.00;
</code></pre>
<p>textscan works a treat when reading in each line of the text file as a string (in a cell array). However, the elements after the "=" sign for both of the variables (i.e. VALUES("Month") and VALUES("region")) span more than one line. With textscan, some of those strings would then have to be concatenated, for example to collect the full list of months (1976M01 to 1976M12).
</p>
<p><strong>Question</strong>: What's the best way to collect the values of these variables? Read the text file as a single string and then use strtok twice to extract the substring of dates? Or is there a better (more systematic) way?</p>
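<p>For illustration, the single-string idea could be sketched in Matlab roughly as follows. This is only a sketch under assumptions that are not confirmed above: the file name <code>example.px</code> is hypothetical, every statement is assumed to end with a semicolon, no quoted value is assumed to contain a semicolon, no keyword is assumed to contain an equals sign, and quotes are assumed to be straight double quotes. It uses strtok once per statement plus a regular expression rather than strtok twice, and it relies on <code>strsplit</code> and <code>matlab.lang.makeValidName</code>, which are only available in reasonably recent Matlab releases.</p>
<pre><code>% Sketch only: 'example.px' is a hypothetical file name.
txt   = fileread('example.px');           % whole file as one char vector
stmts = strtrim(strsplit(txt, ';'));      % one cell per KEYWORD=value statement
stmts = stmts(~cellfun(@isempty, stmts)); % drop the empty tail after the last ';'

meta = struct();
data = [];
for k = 1:numel(stmts)
    [key, rest] = strtok(stmts{k}, '=');  % key is the text before the first '='
    key = strtrim(key);
    val = strtrim(rest(2:end));           % drop the '=' itself
    if strcmpi(key, 'DATA')
        % Cells are separated by whitespace (spaces or line breaks).
        % ".." is not numeric, so str2double turns it into NaN.
        data = str2double(strsplit(val));
    else
        % Collect every quoted item; line breaks inside the list do not
        % matter because the text was never split into lines.
        items = regexp(val, '"([^"]*)"', 'tokens');
        items = cellfun(@(c) c{1}, items, 'UniformOutput', false);
        % Keywords such as VALUES("Month") are not valid field names,
        % so they are converted to valid identifiers first.
        meta.(matlab.lang.makeValidName(key)) = items;
    end
end
</code></pre>
<p>In this sketch the data comes back as a single row vector, so it would still need a reshape once the table dimensions are known, and each metadata field is a cell array of strings even when the keyword has only one value.</p>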
