How to remove trailing comments via regexp?
<p>For non-MATLAB-savvy readers: I'm not sure what family they belong to, but MATLAB's regexes are described <a href="http://www.mathworks.nl/help/matlab/matlab_prog/regular-expressions.html" rel="noreferrer">here</a> in full detail. MATLAB's comment character is <code>%</code> (percent) and its string delimiter is <code>'</code> (apostrophe). A string delimiter inside a string is written as a double apostrophe (<code>'this is how you write "it''s" in a string.'</code>). To complicate matters further, the matrix transpose operators are <em>also</em> apostrophes (<code>A'</code> (Hermitian) or <code>A.'</code> (regular)).</p>
<p>Now, for dark reasons (that I will <strong>not</strong> elaborate on :), I'm trying to interpret MATLAB code in MATLAB's own language.</p>
<p>Currently I'm trying to remove all trailing comments in a cell array of strings, each containing one line of MATLAB code. At first glance, this might seem simple:</p>
<pre><code>&gt;&gt; str = 'simpleCommand(); % simple trailing comment';
&gt;&gt; regexprep(str, '%.*$', '')
ans =
simpleCommand();
</code></pre>
<p>But of course, something like this might come along:</p>
<pre><code>&gt;&gt; str = ' fprintf(''%d%*c%3.0f\n'', value, args{:}); % Let''s do this! ';
&gt;&gt; regexprep(str, '%.*$', '')
ans =
 fprintf(' %// &lt;-- WRONG!
</code></pre>
<p>Obviously, we need to exclude from the match all comment characters that reside inside strings, while also taking into account that a single apostrophe (or a dot-apostrophe) directly following a statement is an <em>operator</em>, not a string delimiter.</p>
<p>Based on the assumption that the number of string opening/closing characters <em>before</em> the comment character must be <em>even</em> (which I know is incomplete, because of the matrix transpose operator), I conjured up the following dynamic regex to handle this sort of case:</p>
<pre><code>&gt;&gt; str = {
    'myFun( {''test'' ''%''}); % let''s '
    'sprintf(str, ''%*8.0f%*s%c%3d\n''); % it''s '
    'sprintf(str, ''%*8.0f%*s%c%3d\n''); % let''s '
    'sprintf(str, ''%*8.0f%*s%c%3d\n''); '
    'A = A.'';%tight trailing comment'
    };
&gt;&gt;
&gt;&gt; C = regexprep(str, '(^.*)(?@mod(sum(\1==''''''''),2)==0;)(%.*$)', '$1')
</code></pre>
<p>However,</p>
<pre><code>C =
    'myFun( {'test' '%'}); '                %// success
    'sprintf(str, '%*8.0f%*s%c%3d\n'); '    %// success
    'sprintf(str, '%*8.0f%*s%c%3d\n'); '    %// success
    'sprintf(str, '%*8.0f%*s%c'             %// FAIL
    'A = A.';'                              %// success (although I'm not sure why)
</code></pre>
<p>so I'm <em>almost</em> there, but not quite yet :)</p>
<p>Unfortunately, I've exhausted the time I can spend thinking about this and need to continue with other things, so perhaps someone with more time is friendly enough to think about these questions:</p>
<ol>
<li>Are comment characters inside strings the <em>only</em> exception I need to look out for?</li>
<li>What is the correct and/or more efficient way to do this?</li>
</ol>
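<p>To make the requirement above concrete, here is a minimal sketch of a character-by-character scanner, offered as one possible direction rather than a definitive answer. The function name <code>stripTrailingComment</code> is hypothetical, and the transpose test is an assumed heuristic: an apostrophe is treated as a transpose operator only when it directly follows an identifier character, a digit, a closing bracket, a dot, or another apostrophe. The sketch deliberately ignores double-quoted strings, <code>...</code> line continuations, block comments, and command syntax.</p>

```matlab
function out = stripTrailingComment(line)
% Heuristic sketch: cut one line of MATLAB code at the first '%' that is
% not inside a single-quoted string. Returns the line unchanged if no
% such '%' is found.
    out = line;
    inString = false;          % true while inside a '...' literal
    n = numel(line);
    k = 1;
    while k <= n
        c = line(k);
        if inString
            if c == ''''
                if k < n && line(k+1) == ''''
                    k = k + 1; % '' is an escaped apostrophe: stay in string
                else
                    inString = false;  % closing delimiter
                end
            end
        elseif c == '%'
            out = line(1:k-1); % everything before the comment character
            return
        elseif c == ''''
            % Assumed heuristic: directly after an identifier character,
            % a digit, a closing bracket, a dot or another apostrophe,
            % the apostrophe is a transpose operator, not a delimiter.
            isTranspose = k > 1 && ...
                (isstrprop(line(k-1), 'alphanum') || any(line(k-1) == '_)]}.'''));
            inString = ~isTranspose;
        end
        k = k + 1;
    end
end
```

<p>With the function saved on the path, <code>C = cellfun(@stripTrailingComment, str, 'UniformOutput', false)</code> would apply it to every line of the cell array. A scanner is used here instead of a single regex because it keeps the string/transpose state explicit, which makes each exception easy to add or adjust.</p>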
 
