<p>I'm guessing that new files get introduced over time, and that's how things change?</p>
<p>I reckon your best bet would be to go with something like your option 2. There's not much point pre-processing the files if all you want to do is count occurrences of keywords. I'd just go through each file once, counting each time a word in your list appears. Personally I'd do it in Ruby, but a language like Perl or Python would also make this task pretty straightforward. E.g., you could use an associative array with the keywords as the keys and a count of occurrences as the values. (But this might be too simplistic if you need to store more information about the occurrences.)</p>
<p>I'm not sure if you want to store information per file, or about the whole dataset? Either way, I guess that wouldn't be too hard to incorporate.</p>
<p>I'm not sure what to do with the data once you've got it. Exporting it to a spreadsheet would be fine, if that gives you what you need. Or you might find it easier in the long run just to write a bit of extra code that displays the data nicely for you. It depends on what you want to do with the data: if you want to produce just a few charts at the end of the exercise and put them into a report, then exporting to CSV would probably make most sense, whereas if you want to generate a new set of data every day for a year, then building a tool to do that automatically is almost certainly the best idea.</p>
<p>Edit: I just figured out that since you're studying history, the chances are your documents are not changing over time, but rather reflect a set of changes that have already happened. Sorry for misunderstanding that. Anyway, I think pretty much everything I said above still applies, but I guess you'll lean towards exporting to CSV or what have you rather than an automated display.</p>
<p>Sounds like a fun project. Good luck!</p>
<p>Ben</p>
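For what it's worth, here is a rough sketch of the counting approach described above, in Python rather than Ruby. It is only an illustration under a few assumptions: the keyword list, the function names (`count_keywords`, `count_in_files`, `write_csv`) and the idea that a "word" is a run of letters and apostrophes are all made up for the example, so adjust them to suit your documents.

```python
import csv
import re
from pathlib import Path

# Hypothetical keyword list; replace with the terms you care about.
KEYWORDS = ["parliament", "treaty", "famine"]

# Assumption: a "word" is a run of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def count_keywords(text, keywords):
    """Return a dict mapping each keyword to its number of occurrences in text."""
    counts = {k.lower(): 0 for k in keywords}
    for word in WORD_RE.findall(text.lower()):
        if word in counts:
            counts[word] += 1
    return counts


def count_in_files(paths, keywords):
    """Read each file once and return {file name: {keyword: count}}."""
    per_file = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        per_file[Path(path).name] = count_keywords(text, keywords)
    return per_file


def write_csv(per_file, keywords, out_path):
    """Write one row per file and one column per keyword."""
    keys = [k.lower() for k in keywords]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file"] + keys)
        for name, counts in sorted(per_file.items()):
            writer.writerow([name] + [counts[k] for k in keys])
```

Keeping the counts per file means you can still get totals for the whole dataset by summing the columns in the spreadsheet afterwards. Note that this only matches single words exactly, so multi-word phrases or variant spellings would need extra handling.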
 
