<p>Whether to use blocking or non-blocking I/O depends on your server's activity profile. E.g. if you have long-lived connections and thousands of clients, blocking I/O may become too expensive because of system resource exhaustion. However, direct (blocking) I/O that doesn't crowd out the CPU cache is faster than non-blocking I/O. There is a good article about that - <a href="http://paultyma.blogspot.com/2008/03/writing-java-multithreaded-servers.html" rel="noreferrer">Writing Java Multithreaded Servers - whats old is new</a>.</p>
<p>As for the cost of a context switch, it is a rather cheap operation. Consider the simple test below:</p>
<pre><code>package com;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class AAA {

    // Length of one measurement run: 30 seconds, in nanoseconds.
    private static final long DURATION = TimeUnit.NANOSECONDS.convert(30, TimeUnit.SECONDS);
    // Change this value to compare scenarios (e.g. 2 vs. 10 threads).
    private static final int THREADS_NUMBER = 2;

    // Per-thread count of completed doJob() iterations.
    private static final ThreadLocal&lt;AtomicLong&gt; COUNTER = new ThreadLocal&lt;AtomicLong&gt;() {
        @Override
        protected AtomicLong initialValue() {
            return new AtomicLong();
        }
    };
    // Per-thread accumulator; it only exists so the work cannot be optimized away.
    private static final ThreadLocal&lt;AtomicLong&gt; DUMMY_DATA = new ThreadLocal&lt;AtomicLong&gt;() {
        @Override
        protected AtomicLong initialValue() {
            return new AtomicLong();
        }
    };
    private static final AtomicLong DUMMY_COUNTER = new AtomicLong();
    private static final AtomicLong END_TIME = new AtomicLong(System.nanoTime() + DURATION);

    // 40 thread-local strings that every thread reads on each iteration.
    private static final List&lt;ThreadLocal&lt;CharSequence&gt;&gt; DUMMY_SOURCE = new ArrayList&lt;ThreadLocal&lt;CharSequence&gt;&gt;();
    static {
        for (int i = 0; i &lt; 40; ++i) {
            DUMMY_SOURCE.add(new ThreadLocal&lt;CharSequence&gt;());
        }
    }

    // Iteration counts reported by the threads (being a Set, equal counts are stored once).
    private static final Set&lt;Long&gt; COUNTERS = new ConcurrentSkipListSet&lt;Long&gt;();

    public static void main(String[] args) throws Exception {
        final CountDownLatch startLatch = new CountDownLatch(THREADS_NUMBER);
        final CountDownLatch endLatch = new CountDownLatch(THREADS_NUMBER);
        for (int i = 0; i &lt; THREADS_NUMBER; i++) {
            new Thread() {
                @Override
                public void run() {
                    initDummyData();
                    // Wait until every worker is initialized so that all start together.
                    startLatch.countDown();
                    try {
                        startLatch.await();
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    while (System.nanoTime() &lt; END_TIME.get()) {
                        doJob();
                    }
                    COUNTERS.add(COUNTER.get().get());
                    DUMMY_COUNTER.addAndGet(DUMMY_DATA.get().get());
                    endLatch.countDown();
                }
            }.start();
        }
        startLatch.await();
        // Restart the clock once all workers are ready.
        END_TIME.set(System.nanoTime() + DURATION);
        endLatch.await();
        printStatistics();
    }

    private static void initDummyData() {
        for (ThreadLocal&lt;CharSequence&gt; threadLocal : DUMMY_SOURCE) {
            threadLocal.set(getRandomString());
        }
    }

    private static CharSequence getRandomString() {
        StringBuilder result = new StringBuilder();
        Random random = new Random();
        for (int i = 0; i &lt; 127; ++i) {
            result.append((char) random.nextInt(0xFF));
        }
        return result;
    }

    // One unit of CPU-bound work: walk every character of every thread-local string.
    private static void doJob() {
        Random random = new Random();
        for (ThreadLocal&lt;CharSequence&gt; threadLocal : DUMMY_SOURCE) {
            for (int i = 0; i &lt; threadLocal.get().length(); ++i) {
                DUMMY_DATA.get().addAndGet(threadLocal.get().charAt(i) &lt;&lt; random.nextInt(31));
            }
        }
        COUNTER.get().incrementAndGet();
    }

    private static void printStatistics() {
        long total = 0L;
        for (Long counter : COUNTERS) {
            total += counter;
        }
        System.out.printf("Total iterations number: %d, dummy data: %d, distribution:%n", total, DUMMY_COUNTER.get());
        for (Long counter : COUNTERS) {
            System.out.printf("%f%%%n", counter * 100d / total);
        }
    }
}
</code></pre>
<p>I ran four tests each for the two-thread and ten-thread scenarios, and the performance loss is about 2.5% (78626 iterations with two threads versus 76754 with ten). System resources are used by the threads approximately equally.</p>
<p>Also, the authors of <em>'java.util.concurrent'</em> assume a context switch takes about 2000-4000 CPU cycles:</p>
<pre><code>public class Exchanger&lt;V&gt; {
    ...
    private static final int NCPU = Runtime.getRuntime().availableProcessors();
    ....
    /**
     * The number of times to spin (doing nothing except polling a
     * memory location) before blocking or giving up while waiting to
     * be fulfilled. Should be zero on uniprocessors. On
     * multiprocessors, this value should be large enough so that two
     * threads exchanging items as fast as possible block only when
     * one of them is stalled (due to GC or preemption), but not much
     * longer, to avoid wasting CPU resources. Seen differently, this
     * value is a little over half the number of cycles of an average
     * context switch time on most systems. The value here is
     * approximately the average of those across a range of tested
     * systems.
     */
    private static final int SPINS = (NCPU == 1) ? 0 : 2000;
</code></pre>
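<p>To make the spin-then-block idea from that comment concrete, here is a minimal sketch. It is not the actual <code>Exchanger</code> implementation: the class <code>SpinThenPark</code> and its methods <code>publish</code> and <code>await</code> are hypothetical names, it assumes at most one waiting thread, and it simply reuses the 2000-spin constant from the snippet above.</p>

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration only; assumes a single waiting thread.
final class SpinThenPark<V> {
    static final int NCPU = Runtime.getRuntime().availableProcessors();
    // Same constant as in the Exchanger snippet: no spinning on a uniprocessor.
    static final int SPINS = (NCPU == 1) ? 0 : 2000;

    private final AtomicReference<V> slot = new AtomicReference<>();
    private volatile Thread waiter;

    // Called by the producer: store the value, then wake the waiter if it has parked.
    void publish(V value) {
        slot.set(value);
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    // Called by the consumer: poll for a bounded number of spins, then block.
    V await() {
        V v;
        // Phase 1: cheap polling, in the hope of avoiding a context switch.
        for (int i = 0; i < SPINS; i++) {
            if ((v = slot.get()) != null) {
                return v;
            }
        }
        // Phase 2: register as the waiter and park until a value shows up.
        waiter = Thread.currentThread();
        while ((v = slot.get()) == null) {
            LockSupport.park(this); // may return spuriously, hence the loop
        }
        waiter = null;
        return v;
    }

    public static void main(String[] args) {
        SpinThenPark<String> box = new SpinThenPark<>();
        new Thread(() -> {
            try {
                Thread.sleep(50); // long enough that the consumer gives up spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            box.publish("done");
        }).start();
        System.out.println(box.await());
    }
}
```

<p>If the value arrives within the spin phase, the consumer never pays for a context switch. Otherwise it parks and stops burning CPU, which is the trade-off the <code>SPINS</code> comment describes.</p>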
