StackOverflow2013


plurals

PO
primarykey
Id
14149257
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2013-01-04T00:03:21.560
FavoriteCount
0
LastActivityDate
2013-01-04T00:12:12.817
LastEditDate
2013-01-04T00:12:12.817
LastEditorUserId
247533
OwnerUserId
247533
ParentId
14148296
PostTypeId
2
Score
7
ViewCount
0
LastEditorDisplayName
text
Body
An arbitrary <code>Long</code> takes about 19.5 ASCII characters (19 digits for most values, plus a minus sign for the negative half) but only 8 bytes, so writing it in binary saves you a factor of two or more. It may be that most of your values don't actually need all 8 bytes, in which case you could define a compression scheme of your own. In any case, you are probably best off writing block data using <code>java.nio.ByteBuffer</code> and friends. Binary data is most efficiently read in blocks, and you might want your file to be randomly accessible, in which case you want your data to look something like this:

<pre><code><some unique binary header that lets you check the file type>
<int saying how many records you have>
<offset of the first record>
<offset of the second record>
...
<offset of the last record>
<int><int><length of vector><long><long>...<long>
<int><int><length of vector><long><long>...<long>
...
<int><int><length of vector><long><long>...<long>
</code></pre>

This is a particularly convenient format for reading and writing with <code>ByteBuffer</code> because you know in advance how big everything is going to be. So you can write

<pre><code>val fos = new FileOutputStream(myFileName)
val fc = fos.getChannel // java.nio.channels.FileChannel
val header = ByteBuffer.allocate(28) // 24-byte header string + 4-byte record count
header.put("This is my cool header!!".getBytes)
header.putInt(data.length)
header.flip() // switch the buffer from filling to draining before writing it
fc.write(header)
val offsets = ByteBuffer.allocate(8*data.length)
data.foldLeft(28L + 8*data.length){ (n, d) =>
  offsets.putLong(n)
  n + 12 + 8L*d.vector.length // next offset: three Ints plus the Longs
}
offsets.flip()
fc.write(offsets)
...
</code></pre>

and on the way back in

<pre><code>val fis = new FileInputStream(myFileName)
val fc = fis.getChannel
val header = ByteBuffer.allocate(28)
fc.read(header)
header.flip() // switch from filling to draining before reading values out
val hbytes = new Array[Byte](24)
header.get(hbytes)
if (new String(hbytes) != "This is my cool header!!") ??? // not our file; handle the error here
val nrec = header.getInt
val offsets = ByteBuffer.allocate(8*nrec)
fc.read(offsets)
offsets.flip()
val offsetArray = offsets.getLongs(nrec) // See below!
...
</code></pre>

(A single <code>read</code> or <code>write</code> on a channel may transfer fewer bytes than the buffer holds, so real code should loop until the buffer is full or empty.)

<code>ByteBuffer</code> lacks some handy bulk methods, but you can add them with implicits (here for Scala 2.10, where an implicit class must be defined inside an object, class, or trait; with 2.9, make it a plain class, drop the <code>extends AnyVal</code>, and supply an implicit conversion from <code>ByteBuffer</code> to <code>RichByteBuffer</code>):

<pre><code>implicit class RichByteBuffer(val b: java.nio.ByteBuffer) extends AnyVal {
  def getBytes(n: Int) = { val a = new Array[Byte](n); b.get(a); a }
  def getShorts(n: Int) = { val a = new Array[Short](n); var i=0; while (i<n) { a(i)=b.getShort(); i+=1 }; a }
  def getInts(n: Int) = { val a = new Array[Int](n); var i=0; while (i<n) { a(i)=b.getInt(); i+=1 }; a }
  def getLongs(n: Int) = { val a = new Array[Long](n); var i=0; while (i<n) { a(i)=b.getLong(); i+=1 }; a }
  def getFloats(n: Int) = { val a = new Array[Float](n); var i=0; while (i<n) { a(i)=b.getFloat(); i+=1 }; a }
  def getDoubles(n: Int) = { val a = new Array[Double](n); var i=0; while (i<n) { a(i)=b.getDouble(); i+=1 }; a }
}
</code></pre>

Anyway, the reason to do things this way is that you'll end up with decent performance, which matters when you have tens of gigabytes of data (which it sounds like you have, given hundreds of thousands of vectors of length up to ten thousand). If your problem is actually much smaller, then don't worry so much about it: pack it into XML, use JSON or some custom text format, or use <code>DataOutputStream</code> and <code>DataInputStream</code> (which don't perform as well and won't give you random access). If your problem is actually bigger, you can define two lists of longs: first the ones that fit in an <code>Int</code>, say, and then the ones that actually need a full <code>Long</code> (with indices so you know where they are).
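Putting the write and read fragments together, a minimal self-contained sketch of the layout above might look like this. It assumes a hypothetical record type <code>Rec</code> with two <code>Int</code> fields and a <code>vector</code> of <code>Long</code>s (the real type of <code>data</code> isn't shown), wraps everything in a hypothetical <code>RecordFile</code> object, and loops on channel reads and writes because a single call may be partial. Reading record <code>i</code> touches only the header, one offset, and that record, which is the random access the offset table buys you.

```scala
import java.io.{FileInputStream, FileOutputStream}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Hypothetical record type: two Int fields plus a vector of Longs.
final case class Rec(a: Int, b: Int, vector: Array[Long])

object RecordFile {
  val Magic = "This is my cool header!!".getBytes("US-ASCII") // 24 bytes
  val HeaderSize = Magic.length + 4                            // magic + record count

  // Keep writing until the whole buffer has gone out; one write may be partial.
  private def writeFully(fc: FileChannel, b: ByteBuffer): Unit =
    while (b.hasRemaining) fc.write(b)

  // Keep reading until the buffer is full, failing on a premature end of file.
  private def readFully(fc: FileChannel, b: ByteBuffer): Unit =
    while (b.hasRemaining) {
      if (fc.read(b) < 0) throw new java.io.EOFException("truncated file")
    }

  def write(path: String, data: Seq[Rec]): Unit = {
    val fc = new FileOutputStream(path).getChannel
    try {
      val header = ByteBuffer.allocate(HeaderSize)
      header.put(Magic).putInt(data.length)
      header.flip()
      writeFully(fc, header)

      // Offset table: the first record starts right after header + table.
      val offsets = ByteBuffer.allocate(8 * data.length)
      data.foldLeft(HeaderSize.toLong + 8L * data.length) { (n, d) =>
        offsets.putLong(n)
        n + 12 + 8L * d.vector.length // three Ints plus the Longs
      }
      offsets.flip()
      writeFully(fc, offsets)

      for (d <- data) {
        val rec = ByteBuffer.allocate(12 + 8 * d.vector.length)
        rec.putInt(d.a).putInt(d.b).putInt(d.vector.length)
        d.vector.foreach(x => rec.putLong(x))
        rec.flip()
        writeFully(fc, rec)
      }
    } finally fc.close()
  }

  // Random access: read only record i by seeking to its stored offset.
  def read(path: String, i: Int): Rec = {
    val fc = new FileInputStream(path).getChannel
    try {
      val header = ByteBuffer.allocate(HeaderSize)
      readFully(fc, header)
      header.flip()
      val magic = new Array[Byte](Magic.length)
      header.get(magic)
      if (!java.util.Arrays.equals(magic, Magic))
        throw new java.io.IOException("not a record file")
      val nrec = header.getInt
      require(0 <= i && i < nrec, s"record $i out of range")

      // Look up this record's offset in the table, then jump there.
      val off = ByteBuffer.allocate(8)
      fc.position(HeaderSize + 8L * i)
      readFully(fc, off)
      off.flip()
      fc.position(off.getLong)

      val head = ByteBuffer.allocate(12)
      readFully(fc, head)
      head.flip()
      val (a, b, len) = (head.getInt, head.getInt, head.getInt)
      val body = ByteBuffer.allocate(8 * len)
      readFully(fc, body)
      body.flip()
      Rec(a, b, Array.fill(len)(body.getLong))
    } finally fc.close()
  }
}
```

The header string is compared byte for byte rather than through <code>new String</code>, which avoids depending on the platform's default charset.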
Data compression is a very case-specific task (assuming you don't just want to use <code>java.util.zip</code>), so without knowing a lot more about what the data looks like, it's hard to recommend anything beyond storing it as a weakly hierarchical binary file as described above.
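If you do just want <code>java.util.zip</code>, one minimal sketch (using a hypothetical <code>ZipLongs</code> helper) is to pack the <code>Long</code>s into a <code>ByteBuffer</code> and push the bytes through <code>DeflaterOutputStream</code>, reversing the process with <code>InflaterInputStream</code>. How much this gains depends entirely on the data: small or repetitive values, whose high-order bytes are mostly zero, compress well, while random 64-bit values barely compress at all.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.ByteBuffer
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

object ZipLongs {
  // Pack the Longs as raw big-endian bytes, then run them through DEFLATE.
  def compress(xs: Array[Long]): Array[Byte] = {
    val raw = ByteBuffer.allocate(8 * xs.length)
    xs.foreach(x => raw.putLong(x))
    val bos = new ByteArrayOutputStream()
    val out = new DeflaterOutputStream(bos)
    try out.write(raw.array) finally out.close() // close() finishes the deflate stream
    bos.toByteArray
  }

  // Inflate everything, then reinterpret each 8 bytes as one Long.
  def decompress(bytes: Array[Byte]): Array[Long] = {
    val in = new InflaterInputStream(new ByteArrayInputStream(bytes))
    val bos = new ByteArrayOutputStream()
    val chunk = new Array[Byte](8192)
    try {
      var k = in.read(chunk)
      while (k >= 0) { bos.write(chunk, 0, k); k = in.read(chunk) }
    } finally in.close()
    val bb = ByteBuffer.wrap(bos.toByteArray)
    Array.fill(bb.remaining / 8)(bb.getLong)
  }
}
```

For the random-access layout, you would compress each record's block of <code>Long</code>s separately and store the compressed lengths alongside the offsets, since a single deflate stream over the whole file can't be entered in the middle.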
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POStoring a sequence of Long efficiently in a file in Scala
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USRex Kerr
UserOwnerUserId
1. USRex Kerr
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POStoring a sequence of Long efficiently in a file in Scala
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId