Implementing a proximity matrix for clustering
    <p>I am fairly new to this field, so pardon me if the question sounds trivial or basic.</p>
    <p>I have a dataset (a bag of words, to be specific) and I need to generate a proximity matrix from the edit distance between each pair of words.</p>
    <p>However, I am confused about how to keep track of my data/strings in the matrix. I need the proximity matrix for clustering.</p>
    <p>More generally, how do you approach this kind of problem in the field? I am using Perl and R to implement this.</p>
    <p>Here is the Perl code I have written so far; it reads my bag of words from a text file:</p>
<pre><code>use strict;
use warnings;
use Text::Levenshtein qw(distance);

main();

sub main {
    my $Tokenfile      = 'TokenDistinct.txt';
    my $AppendingCount = 0;
    my @Token;

    # Read one token per line, skipping blank lines.
    open(my $fh, '&lt;', $Tokenfile) or die "Error opening $Tokenfile: $!";
    while (my $line = &lt;$fh&gt;) {
        chomp $line;
        next if $line =~ /^\s*$/;
        push @Token, $line;
    }
    close($fh);

    # Compare each token with every token in the list (itself included).
    foreach my $tokenWord (@Token) {
        my %Levcount;
        foreach my $other (@Token) {
            $Levcount{$tokenWord}{$other} = levDistance($tokenWord, $other);
        }
        StoreSortedValues(\%Levcount, $tokenWord, $AppendingCount);
        $AppendingCount++;
    }
}

sub levDistance {
    my ($string1, $string2) = @_;
    return distance($string1, $string2);
}

# Write one token's distances to LevResult.txt, nearest tokens first.
sub StoreSortedValues {
    my ($Levcount, $tokenWordTopMost, $j) = @_;
    my $Tokenfile = 'LevResult.txt';

    # Truncate the file on the first call, append on later calls.
    my $mode = $j == 0 ? '&gt;' : '&gt;&gt;';
    open(my $fh, $mode, $Tokenfile) or die "Error opening $Tokenfile: $!";

    print "$tokenWordTopMost\n";    # progress output
    my %tokenWordMaster = %{ $Levcount-&gt;{$tokenWordTopMost} };

    # Distances are numbers, so sort with &lt;=&gt; rather than cmp.
    my @ListToken = sort { $tokenWordMaster{$a} &lt;=&gt; $tokenWordMaster{$b} }
                    keys %tokenWordMaster;

    print $fh "-------------------------- " . $tokenWordTopMost . "-------------------------------------\n";
    foreach my $tokey (@ListToken) {
        print $fh "$tokey=&gt;\t" . $tokenWordMaster{$tokey} . "\n";
    }
    close($fh) or die "Error closing $Tokenfile: $!";
}
</code></pre>
    <p>The problem is: how can I represent the proximity matrix from this and still keep track of which comparison each entry in the matrix represents?</p>
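One possible way to keep track of which cell belongs to which pair of words is to give every word a fixed position in a label list and use that same position as both the row and the column index of the matrix. The sketch below is only an illustration under stated assumptions: `edit_distance`, `proximity_matrix` and `matrix_as_tsv` are hypothetical helper names, the sample words are made up, and the distance is a plain dynamic-programming Levenshtein written out so the example has no CPAN dependency (the `distance` function from `Text::Levenshtein` could be swapped in).

```perl
use strict;
use warnings;
use List::Util qw(min);

# Plain dynamic-programming Levenshtein distance (illustrative stand-in
# for Text::Levenshtein's distance()).
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1,          # deletion
                           $cur[$j - 1] + 1,       # insertion
                           $prev[$j - 1] + $cost); # substitution or match
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Build a symmetric matrix; row i and column i both refer to the i-th word.
sub proximity_matrix {
    my @labels = @_;
    my @matrix;
    for my $i (0 .. $#labels) {
        $matrix[$i][$i] = 0;
        for my $j ($i + 1 .. $#labels) {
            my $d = edit_distance($labels[$i], $labels[$j]);
            $matrix[$i][$j] = $d;
            $matrix[$j][$i] = $d;
        }
    }
    return \@matrix;
}

# Render the matrix as tab-separated text, with the words as the header
# row and as the first column, so the labels travel with the numbers.
sub matrix_as_tsv {
    my ($labels, $matrix) = @_;
    my $out = join("\t", '', @$labels) . "\n";
    for my $i (0 .. $#$labels) {
        $out .= join("\t", $labels->[$i], @{ $matrix->[$i] }) . "\n";
    }
    return $out;
}

my @words  = qw(kitten sitting mitten);    # hypothetical sample tokens
my $matrix = proximity_matrix(@words);
print matrix_as_tsv(\@words, $matrix);
```

Because the words are written as the header row and the row names, a file produced this way should be readable in R with something like `read.table(file, header = TRUE, row.names = 1, sep = "\t")`, and the result can then be passed to `as.dist()` and on to a clustering function such as `hclust()`. Computing only the upper triangle and mirroring it halves the number of distance calls, which matters because the work grows quadratically with the number of words.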