<p>I don't know how this is commonly done, but I can think of one crude way to define a notion of correlation that captures word adjacency.</p>
<p>Suppose the text has length N, stored as an array of words:</p>
<pre><code>text[0], text[1], ..., text[N-1]
</code></pre>
<p>Suppose the following distinct words appear in the text:</p>
<pre><code>word[0], word[1], ..., word[k]
</code></pre>
<p>For each word word[i], define a vector of length N-1</p>
<pre><code>X[i] = array(); // of length N-1
</code></pre>
<p>as follows: the jth entry of the vector is 1 if word[i] is either the jth or the (j+1)th word of the text, and 0 otherwise.</p>
<pre><code>// compute the vector X[i]
for (j = 0; j &lt; N-1; j++) {
    if (text[j] == word[i] || text[j+1] == word[i])
        X[i][j] = 1;
    else
        X[i][j] = 0;
}
</code></pre>
<p>Then you can compute the correlation coefficient between word[a] and word[b] as the dot product of X[a] and X[b] divided by the product of their lengths (strictly speaking, this is the cosine similarity of the two vectors, since they are not mean-centered). For two different words, the dot product is the number of times they are adjacent, in either order. The length of X[a] is the square root of the number of entries equal to 1, which is roughly twice the number of appearances of word[a], because each appearance falls in up to two of the two-word windows. Call this quantity COR(X[a],X[b]). Clearly COR(X[a],X[a]) = 1, and COR(X[a],X[b]) is larger if word[a] and word[b] are often adjacent.</p>
<p>This can be generalized from "adjacent" to other notions of nearness: for example, we could use blocks of 3 words (or 4, 5, etc.) instead of 2. One could also add weights, and probably do many more things as well. One would have to experiment to see which of these, if any, is useful.</p>
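<p>As a rough illustration, the construction above could be sketched in Python along the following lines. The function names <code>window_vector</code> and <code>cor</code> and the <code>block</code> parameter are hypothetical, chosen only for this sketch, and returning 0 for a word that never occurs in the text is an assumption rather than part of the definition.</p>

```python
import math


def window_vector(text, word, block=2):
    """Indicator vector X for `word`.

    Entry j is 1 if `word` occurs anywhere in text[j:j + block], else 0.
    With block=2 this is the vector of length N-1 described above.
    """
    return [1 if word in text[j:j + block] else 0
            for j in range(len(text) - block + 1)]


def cor(text, a, b, block=2):
    """Dot product of the two indicator vectors divided by the product of their lengths."""
    xa = window_vector(text, a, block)
    xb = window_vector(text, b, block)
    dot = sum(p * q for p, q in zip(xa, xb))
    # Entries are 0 or 1, so the sum of squares equals the plain sum.
    norm_a = math.sqrt(sum(xa))
    norm_b = math.sqrt(sum(xb))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: a word absent from the text scores 0.
        return 0.0
    return dot / (norm_a * norm_b)


text = "the cat sat on the mat".split()
print(cor(text, "the", "cat"))  # adjacent once, so the score is positive
print(cor(text, "cat", "mat"))  # never adjacent, so the score is zero
```

<p>Passing <code>block=3</code> (or 4, 5, etc.) gives the generalization from adjacent pairs to longer blocks of words.</p>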