<p>The first step is to write a function that generates the n-grams of a given string. One way to do this in a vectorized fashion is with some clever indexing.</p>
<pre><code>function [subStrings, counts] = n_gram(fullString, N)
  if (N == 1)
    % Simple case: transpose so that each character becomes its own row
    [subStrings, ~, index] = unique(cellstr(fullString.'));
  else
    nString = numel(fullString);
    % Each row of the index matrix holds the positions of one N-length substring
    index = hankel(1:(nString-N+1), (nString-N+1):nString);
    [subStrings, ~, index] = unique(cellstr(fullString(index)));
  end
  % Count how often each unique substring occurs
  counts = accumarray(index, 1);
end
</code></pre>
<p>This uses the function <a href="http://www.mathworks.com/help/techdoc/ref/hankel.html" rel="nofollow noreferrer">HANKEL</a> to first create a matrix of indices that selects every N-length substring of the given string. Indexing the string with this matrix produces a character array with one N-length substring per row. The function <a href="http://www.mathworks.com/help/techdoc/ref/cellstr.html" rel="nofollow noreferrer">CELLSTR</a> then places each row of the character array into its own cell of a cell array. The function <a href="http://www.mathworks.com/help/techdoc/ref/unique.html" rel="nofollow noreferrer">UNIQUE</a> removes repeated substrings, and the function <a href="http://www.mathworks.com/help/techdoc/ref/accumarray.html" rel="nofollow noreferrer">ACCUMARRAY</a> counts the occurrences of each unique substring (in case the counts are needed).</p>
<p>With the above function you can then count the number of n-grams shared between two strings using the <a href="http://www.mathworks.com/help/techdoc/ref/intersect.html" rel="nofollow noreferrer">INTERSECT</a> function:</p>
<pre><code>subStrings1 = n_gram('tool',2);
subStrings2 = n_gram('fool',2);
sharedStrings = intersect(subStrings1,subStrings2);
nShared = numel(sharedStrings);
</code></pre>
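<p>To make the indexing step concrete, here is a small worked sketch of what the index matrix looks like for the bigrams of 'tool', followed by the shared-bigram count against 'fool'. It repeats the same steps inline rather than calling the function above; the variable names (<code>word</code>, <code>subChars</code>, <code>bigrams1</code>, <code>bigrams2</code>) are only illustrative.</p>

```matlab
% Worked example: the index matrix for the bigrams (N = 2) of 'tool'.
word = 'tool';
N = 2;
nString = numel(word);

% hankel(c, r) builds a matrix with first column c and last row r,
% constant along its anti-diagonals.
index = hankel(1:(nString-N+1), (nString-N+1):nString);
% index is [1 2; 2 3; 3 4]

% Indexing the string with the matrix gives one substring per row.
subChars = word(index);
% subChars is ['to'; 'oo'; 'ol']

bigrams1 = unique(cellstr(subChars));

% Same steps for 'fool', which has the same length and so the same index matrix.
other = 'fool';
bigrams2 = unique(cellstr(other(index)));

% The two words share the bigrams 'ol' and 'oo'.
nShared = numel(intersect(bigrams1, bigrams2));
```

<p>Each row of <code>index</code> is a sliding window of N consecutive positions, which is why a single indexing operation extracts every substring at once.</p>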
 
