<p>Your question got me wondering about performance. Lately I've been using Guava's Splitter where possible, just because I dig the syntax. I've never measured performance, so I put together a quick test of four parsing styles. I put these together really quickly, so pardon mistakes in style and edge-case correctness. They're based on the understanding that we're only interested in the second and fourth items.</p>

<p>What I found interesting is that the "homeGrown" (really crude code) solution is the fastest when parsing a 350MB tab-delimited text file (with four columns), for example:</p>

<pre><code>head test.txt
0	0	0	0
1	2	3	4
2	4	6	8
3	6	9	12
</code></pre>

<p>When operating over 350MB of data on my laptop, I got the following results:</p>

<ul>
<li>homeGrown: 2271ms</li>
<li>guavaSplit: 3367ms</li>
<li>regex: 7302ms</li>
<li>tokenize: 3466ms</li>
</ul>

<p>Given that, I think I'll stick with Guava's Splitter for most work and consider custom code for larger data sets.</p>

<pre><code>import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The members below are meant to live inside a single class.

public static List&lt;String&gt; tokenize(String line){
    List&lt;String&gt; result = Lists.newArrayList();
    StringTokenizer st = new StringTokenizer(line, "\t");
    st.nextToken(); //get rid of the first token
    result.add(st.nextToken()); //show me the second token
    st.nextToken(); //get rid of the third token
    result.add(st.nextToken()); //show me the fourth token
    return result;
}

static final Splitter splitter = Splitter.on('\t');

public static List&lt;String&gt; guavaSplit(String line){
    List&lt;String&gt; result = Lists.newArrayList();
    int i=0;
    for(String str : splitter.split(line)){
        if(i==1 || i==3){
            result.add(str);
        }
        i++;
    }
    return result;
}

static final Pattern p = Pattern.compile("^(.*?)\\t(.*?)\\t(.*?)\\t(.*)$");

public static List&lt;String&gt; regex(String line){
    List&lt;String&gt; result = null; //stays null when the line does not match
    Matcher m = p.matcher(line);
    if(m.find()){
        if(m.groupCount()&gt;=4){
            result= Lists.newArrayList(m.group(2),m.group(4));
        }
    }
    return result;
}

public static List&lt;String&gt; homeGrown(String line){
    List&lt;String&gt; result = Lists.newArrayList();
    String subStr = line;
    int cnt = -1;
    int indx = subStr.indexOf('\t');
    while(++cnt &lt; 4 &amp;&amp; indx != -1){
        if(cnt==1||cnt==3){
            result.add(subStr.substring(0,indx));
        }
        subStr = subStr.substring(indx+1);
        indx = subStr.indexOf('\t');
    }
    if(cnt==1||cnt==3){ //last column has no trailing tab
        result.add(subStr);
    }
    return result;
}
</code></pre>

<p>Note that all of these would likely be slower with proper bounds checking and a more elegant implementation.</p>
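<p>The timing harness itself isn't shown above, so here is a minimal sketch of how such a comparison could be run. Everything in it is an assumption for illustration: the class name <code>ParseTiming</code>, the helpers <code>sampleLines</code> and <code>timeMillis</code>, and the use of in-memory rows instead of the 350MB file. It uses only the JDK, so just the tokenize style is wired in; the other three parsers could be passed to <code>timeMillis</code> the same way.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.function.Function;

public class ParseTiming {

    // JDK-only variant of the tokenize style: keep the second and fourth columns.
    public static List<String> tokenize(String line) {
        List<String> result = new ArrayList<>(2);
        StringTokenizer st = new StringTokenizer(line, "\t");
        st.nextToken();             // skip column 1
        result.add(st.nextToken()); // keep column 2
        st.nextToken();             // skip column 3
        result.add(st.nextToken()); // keep column 4
        return result;
    }

    // Hypothetical helper: builds rows shaped like the sample file (i, 2i, 3i, 4i).
    public static List<String> sampleLines(int count) {
        List<String> lines = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            lines.add(i + "\t" + (2 * i) + "\t" + (3 * i) + "\t" + (4 * i));
        }
        return lines;
    }

    // Hypothetical helper: runs the parser over every line and returns
    // the elapsed wall-clock time in milliseconds.
    public static long timeMillis(Function<String, List<String>> parser, List<String> lines) {
        long sink = 0; // consume results so the JIT cannot drop the parsing work
        long start = System.nanoTime();
        for (String line : lines) {
            sink += parser.apply(line).size();
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (sink != 2L * lines.size()) {
            throw new IllegalStateException("parser returned an unexpected number of fields");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> lines = sampleLines(1_000_000);
        timeMillis(ParseTiming::tokenize, lines); // warm-up pass, result discarded
        System.out.println("tokenize: " + timeMillis(ParseTiming::tokenize, lines) + "ms");
    }
}
```

<p>A single warm-up pass like this is crude; for numbers you intend to rely on, a dedicated benchmarking tool such as JMH is probably a safer choice than hand-rolled timing.</p>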