I think your design sounds reasonable. However, there are a few things to keep in mind. First, are you sure the extra complexity is justified? That is, is the performance hit of writing out a bunch of files and reading them back in actually an important bottleneck? If the wasted time isn't significant, I'd strongly caution you against making this change: you'd be increasing the complexity of the system without much benefit. I assume you've already thought about this, but just in case you haven't, I thought I'd mention it here.

Second, have you considered using memory-mapped files via `MappedByteBuffer`? If you're dealing with huge objects that exceed the Java heap and you're willing to put in a bit of effort, you could design the objects so that they're stored in memory-mapped files, with a thin wrapper class that translates requests into operations on the mapped byte buffer. For example, to store a list of strings, the wrapper could keep them on disk separated by newlines or null terminators, then iterate over them by walking the bytes of the file and rehydrating each string. The advantage of this approach is that it offloads the caching complexity to the operating system, which has been performance-tuned for decades (assuming you're using a major OS!) to handle exactly this case. I once worked on a Java project where I built a framework to automate this, and it worked wonderfully in many cases. There's definitely a learning curve to get over, but once it works you can handle far more data than would ever fit in the Java heap. This does essentially what you proposed above, except it trades a bit of up-front implementation complexity for letting the OS handle all of the caching.

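Here's a minimal sketch of that wrapper idea, assuming newline-separated UTF-8 strings (the class name `MappedStringList` is made up for illustration):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Lazily iterates over newline-separated strings stored in a memory-mapped file. */
public class MappedStringList implements Iterable<String> {
    private final MappedByteBuffer buffer;

    public MappedStringList(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed; the OS pages
            // bytes in on demand, so only the pages we touch occupy RAM.
            this.buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < buffer.limit();
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                int start = pos;
                // Walk the mapped bytes until the next separator.
                while (pos < buffer.limit() && buffer.get(pos) != '\n') pos++;
                byte[] bytes = new byte[pos - start];
                for (int i = 0; i < bytes.length; i++) bytes[i] = buffer.get(start + i);
                pos++; // skip the newline separator
                return new String(bytes, StandardCharsets.UTF_8); // rehydrate
            }
        };
    }
}
```

Only one string is rehydrated onto the heap at a time; everything else stays in the file, and the OS cache decides which pages remain resident. One caveat: a single `FileChannel.map` call is limited to 2 GB, so truly huge files have to be mapped in chunks.
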
Third, is there a way to combine passes (1) and (2)? That is, could you generate the XML file at the same time that you're generating the database? I assume from your description that the issue is that you can't emit the XML until all of the entries are ready. However, you could write several files to disk during the pass, each holding the serialized XML for objects of one type, and at the end join them together with a standard command-line utility like `cat`. Since that's plain bulk byte concatenation rather than parsing the database contents back out, it could be much faster (and easier to implement) than your proposed approach. And if the fragment files are still hot in the OS cache (which they probably are, since you've just been writing them), it might even beat your current approach.

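If you'd rather not shell out to `cat`, the same bulk concatenation is easy to do in Java with `FileChannel.transferTo`, which on most platforms copies the bytes kernel-side instead of pulling them through the JVM. A rough sketch (you'd still need to emit the XML declaration and root element around the fragments):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class XmlJoiner {
    /** Concatenates the per-type fragment files into one output file. */
    public static void join(List<Path> fragments, Path output) throws IOException {
        try (FileChannel out = FileChannel.open(output,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path fragment : fragments) {
                try (FileChannel in = FileChannel.open(fragment, StandardOpenOption.READ)) {
                    // transferTo may copy fewer bytes than requested, so loop.
                    long pos = 0, size = in.size();
                    while (pos < size) {
                        pos += in.transferTo(pos, size - pos, out);
                    }
                }
            }
        }
    }
}
```
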
Fourth, if performance is your concern, have you considered parallelizing your code? Given staggeringly huge files to process, you could split each file into many smaller regions. Each task would then read its region of the file and distribute the pieces into the proper output files, and a final step would merge the matching output files from each task and produce the overall XML report. Since I assume this is a mostly I/O-bound operation (it's mostly just file reading), this could give you a much bigger performance win than a single-threaded approach that tries to keep everything in memory.

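A skeleton of that fan-out, assuming a fixed thread pool and byte-offset regions; the per-region logic is left as a placeholder since it's whatever your pass (1) does today. One wrinkle to plan for: offset-based splits usually land mid-record, so each task typically skips ahead to the first record boundary in its region and lets the previous task finish the straddling record.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSplitter {
    /**
     * Processes one region [start, end) of the big input file and writes its
     * pieces to a per-task output file, which the final merge step combines.
     */
    static Path processRegion(Path input, long start, long end, int taskId) throws IOException {
        Path out = Files.createTempFile("region-" + taskId + "-", ".part");
        // ... read input bytes in [start, end) and write pieces to `out` ...
        return out;
    }

    public static List<Path> run(Path input, int tasks) throws Exception {
        long size = Files.size(input);
        long chunk = (size + tasks - 1) / tasks; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                final long start = i * chunk;
                final long end = Math.min(start + chunk, size);
                futures.add(pool.submit(() -> processRegion(input, start, end, id)));
            }
            List<Path> parts = new ArrayList<>();
            for (Future<Path> f : futures) parts.add(f.get()); // propagate failures
            return parts; // hand these to the merge/concatenation step
        } finally {
            pool.shutdown();
        }
    }
}
```
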
Hope this helps!