<p>This is how I've tried to solve the problem previously. Basically you have a producer thread, like you have here, that reads the file and puts items onto the queue, and worker threads that take items off the queue and process them. The code is below, but it looks essentially the same as what you're doing. What I found is that this gave me just about no speedup, because the processing I needed to do per line was quick relative to the disk read. If the parsing you have to do is fairly intensive, or the chunks are fairly large, you could see some speedup doing it this way. But if it's minimal, don't expect much of a performance improvement, because the process is IO bound. In that situation you would need to parallelize the disk access itself, which you can't really do on a single machine.</p> <pre><code>import java.io.*;
import java.util.*;
import java.util.concurrent.*;

public static LinkedBlockingQueue&lt;Pair&lt;String, String&gt;&gt; mappings;
public static final Pair&lt;String, String&gt; end = new Pair&lt;String, String&gt;("END", "END");
public static int num_threads;
public static NpToEntityMapping mapping;
public static Set&lt;String&gt; attested_nps;
public static Set&lt;Entity&gt; possible_entities;

public static class ProducerThread implements Runnable {
    private File f;

    public ProducerThread(File f) {
        this.f = f;
    }

    public void run() {
        try {
            BufferedReader reader = new BufferedReader(new FileReader(f));
            String line;
            while ((line = reader.readLine()) != null) {
                String entities = reader.readLine();
                String np = line.trim();
                mappings.put(new Pair&lt;String, String&gt;(np, entities));
            }
            reader.close();
            // One "end" marker per worker, so every worker sees one and exits.
            for (int i = 0; i &lt; num_threads; i++) {
                mappings.put(end);
            }
        } catch (InterruptedException e) {
            System.out.println("Producer thread interrupted");
        } catch (IOException e) {
            System.out.println("Producer thread threw IOException");
        }
    }
}

public static class WorkerThread implements Runnable {
    private Dictionary dict;
    private EntityFactory factory;

    public WorkerThread(Dictionary dict, EntityFactory factory) {
        this.dict = dict;
        this.factory = factory;
    }

    public void run() {
        try {
            while (true) {
                Pair&lt;String, String&gt; np_ent = mappings.take();
                if (np_ent == end) {
                    // This worker took its "end" marker; stop looping.
                    break;
                }
                String entities = np_ent.getRight();
                String np = np_ent.getLeft().toLowerCase();
                if (attested_nps == null || attested_nps.contains(np)) {
                    int np_index = dict.getIndex(np);
                    HashSet&lt;Entity&gt; entity_set = new HashSet&lt;Entity&gt;();
                    for (String entity : entities.split(", ")) {
                        Entity e = factory.createEntity(entity.trim());
                        if (possible_entities != null) {
                            possible_entities.add(e);
                        }
                        entity_set.add(e);
                    }
                    // mapping must be safe for concurrent puts from several workers.
                    mapping.put(np_index, entity_set);
                }
            }
        } catch (InterruptedException e) {
            System.out.println("Worker thread interrupted");
        }
    }
}
</code></pre> <p>EDIT:</p> <p>Here's the code for the main thread that starts the producer and worker threads:</p> <pre><code>Thread producer = new Thread(new ProducerThread(f), "Producer");
producer.start();

ArrayList&lt;Thread&gt; workers = new ArrayList&lt;Thread&gt;();
for (int i = 0; i &lt; num_threads; i++) {
    workers.add(new Thread(new WorkerThread(dict, factory), "Worker"));
}
for (Thread t : workers) {
    t.start();
}

try {
    producer.join();
    for (Thread t : workers) {
        t.join();
    }
} catch (InterruptedException e) {
    System.out.println("Main thread interrupted...");
}
</code></pre> <p>It should also be fine to do the producer's work directly in the main thread, removing the need to start and join with a separate producer thread. Be sure to start the worker threads before going through the file, though, and join with them after you've enqueued all the work. I'm not sure about the performance difference between that approach and the one shown here.</p>
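<p>For what it's worth, here is a minimal, self-contained sketch of that last variant, stripped of the domain-specific parsing (the class name <code>MainThreadProducer</code>, the <code>run</code> helper, and the synthetic "lines" are my own for illustration, not from the code above): the main thread plays the producer role, workers drain a <code>LinkedBlockingQueue</code>, and one end marker per worker shuts everything down.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class MainThreadProducer {
    // Poison pill; compared by reference identity, one copy queued per worker.
    private static final String END = new String("END");

    // Returns how many lines the workers processed.
    public static int run(int numThreads, int numLines) throws InterruptedException {
        final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>();
        final AtomicInteger processed = new AtomicInteger();

        // Start the workers BEFORE producing, so the queue drains as it fills.
        List<Thread> workers = new ArrayList<Thread>();
        for (int i = 0; i < numThreads; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String line = queue.take();
                        if (line == END) {
                            break; // this worker's shutdown marker
                        }
                        processed.incrementAndGet(); // real per-line parsing goes here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "Worker-" + i);
            t.start();
            workers.add(t);
        }

        // Main thread as producer: enqueue the "lines", then one END per worker.
        for (int i = 0; i < numLines; i++) {
            queue.put("line " + i);
        }
        for (int i = 0; i < numThreads; i++) {
            queue.put(END);
        }

        // Joining the workers guarantees all processing finished.
        for (Thread t : workers) {
            t.join();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 1000)); // prints 1000
    }
}
```

<p>The same caveat applies: with trivial per-item work this will be IO bound and the threads won't buy you much.</p>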