Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Bruce Boughton's guess is correct. The GetChangedSpans method is not intended to be a general-purpose syntax diffing mechanism to take the difference between two syntax trees that have no shared history. Rather, it is intended to take two trees that have been produced by edits to a common tree, and determine which portions of the trees are different because of edits.</p> <p>If you had taken your first parse tree and inserted the new statement into it as an edit, then you would see a far smaller set of changes. </p> <p>It might help if I briefly describe how the Roslyn lexer and parser work, at a high level. </p> <p>The basic idea is that lexer-produced "syntax tokens" and parser-produced "syntax trees" are <em>immutable</em>. They never change. Because they never change, we can re-use parts of previous parse trees in new parse trees. (Data structures which have this property are often called "persistent" data structures.)</p> <p>Because we can re-use existing parts, we can, for example, use the same value for every instance of a given token, say <code>class</code>, that appears in the program. The length and content of every <code>class</code> token is exactly the same; the only things that distinguish two different <code>class</code> tokens are their <em>trivia</em>, (what spacing and comments surround them) and their <em>position</em>, and their <em>parent</em> -- what larger syntax node contains the token.</p> <p>When you parse a block of text we generate syntax tokens and syntax trees in a peristent, immutable form, which we call the "green" form. We then wrap up the green nodes in a "red" layer. The green layer knows nothing about position, parents, and so on. The red layer does. (The whimsical names are due to the fact that when we first drew this data structure on a whiteboard, those are the colours that we used.) When you create an edit to a given syntax tree, we look at the previous syntax tree, identify the nodes which changed, and then build new nodes <em>only on the spine of the changes</em>. All the other branches of the green tree stay the same.</p> <p>When diffing two trees, basically what we do is <em>take the set difference of the green nodes</em>. If one of the trees was produced by editing the other, then <em>almost all the green nodes will be the same</em> because only the spine was rebuilt. The tree diffing algorithm will identify the changed nodes and work out the affected spans.</p> <p>If the two trees have no history in common then the only green nodes they'll have in common are the individual tokens, which, as I said before, are re-used everywhere. Every higher-level green syntax node will be a different green node, and therefore be treated as different by the tree difference engine, even if its text is the same.</p> <p>The <em>purpose</em> of this method is to allow the editor code to rapidly make a conservative guess about what portions of a text buffer need to be, say, recolourized, after an edit, or an undo, or some such thing. The assumption is that the trees have a historical relationship. The intention is not to provide a general-purpose textual difference mechanism; there are plenty of great tools for that already.</p> <p>Imagine, for example, that you had pasted your first program into the editor, then highlighted the whole thing, then pasted the second program into the editor. One would reasonably expect that the editor would not waste time trying to figure out what portions of the pasted-down code happened to be identical with the previously-pasted code. That could be very expensive and the answer is likely to be "not much". Rather, the editor makes the conservative assumption that the entire pasted-over region is brand-new and entirely different code. It doesn't spend any time trying to make correspondences between the old code and the new code; it reparses and therefore recolourizes the whole thing. </p> <p>If, on the other hand you had just pasted in the single different statement, then the editing engine would simply insert the edit into the right place. The parse tree would be regenerated <em>re-using the existing green nodes where possible</em>, and the difference engine would identify what spans need to be re-colourized: the ones with different green nodes.</p> <p>Does that all make sense?</p> <h2>UPDATE:</h2> <p>Ha, apparently Kevin and I were both typing out the same answer at the same time, in adjoining offices. A bit of duplicated effort, but I think both answers have good perspectives on the situation. :-)</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload