I see a few possible problems. First of all, this:

```ruby
@doc = Nokogiri::XML(sympFile)
```

will slurp the whole XML file into memory as a libxml2 data structure, and that will probably be larger than the raw XML file.

Then you do things like this:

```ruby
@doc.xpath(...).each
```

That may not be smart enough to produce an enumerator that just maintains a pointer to the internal form of the XML; it might be producing a copy of everything when it builds the `NodeSet` that `xpath` returns. That would give you another copy of most of the expanded-in-memory version of the XML. I'm not sure how much copying and array construction happens here, but there is room for a fair bit of memory and CPU overhead even if it doesn't duplicate everything.

Then you make your own copy of what you're interested in:

```ruby
symptomsList.push([signId, name])
```

and finally iterate over that array:

```ruby
symptomsList.each do |x|
  Symptom.where(:name => x[1], :signid => Integer(x[0])).first_or_create
end
```

I find that [SAX parsers](http://nokogiri.org/Nokogiri/HTML/SAX.html) work better with large data sets, but they are more cumbersome to work with. You could try creating your own SAX parser, something like this:

```ruby
class D < Nokogiri::XML::SAX::Document
  def start_element(name, attrs = [])
    if name == 'DisorderSign'
      @data = {}
    elsif name == 'ClinicalSign'
      @key = :sign
      @data[@key] = ''
    elsif name == 'SignFreq'
      @key = :freq
      @data[@key] = ''
    elsif name == 'Name'
      @in_name = true
    end
  end

  def characters(str)
    @data[@key] += str if @key && @in_name
  end

  def end_element(name)
    if name == 'DisorderSign'
      # Dump @data into the database here.
      @data = nil
    elsif name == 'ClinicalSign'
      @key = nil
    elsif name == 'SignFreq'
      @key = nil
    elsif name == 'Name'
      @in_name = false
    end
  end
end
```

The structure should be pretty clear: you watch for the opening of the elements you're interested in and do a bit of bookkeeping when they open, cache the strings while you're inside an element you care about, and finally clean up and process the data as the elements close. Your database work would replace the

```ruby
# Dump @data into the database here.
```

comment.

This structure also makes it pretty easy to watch for the `<Disorder id="17601">` elements so that you can keep track of how far you've gone. That way you can stop and restart the import with some small modifications to your script.
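In case the wiring isn't obvious, here's a minimal sketch of how you'd drive the class above. The file name is just a placeholder for whatever `sympFile` points at in your script; the rest is standard Nokogiri SAX plumbing rather than anything specific to your data:

```ruby
require 'nokogiri'

# Stream the XML through the SAX document defined above instead of
# loading the whole tree into memory at once.
parser = Nokogiri::XML::SAX::Parser.new(D.new)

# 'symptoms.xml' is a placeholder; substitute the actual file path or
# pass an already-open IO object straight to #parse.
File.open('symptoms.xml') { |f| parser.parse(f) }
```

Because the document object is stateful, create a fresh `D` instance for each import run.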