Import all fields (and subfields) of XML as dataframe
<p>To do some analysis, I want to import an XML file into a dataframe using R and the XML package. Example XML file:</p>
<pre><code>&lt;watchers shop_name="TEST" created_at="September 14, 2012 05:44"&gt;
  &lt;watcher channel="Site Name"&gt;
    &lt;code&gt;123456&lt;/code&gt;
    &lt;search_key&gt;TestKey&lt;/search_key&gt;
    &lt;date&gt;September 14, 2012 04:15&lt;/date&gt;
    &lt;result&gt;Found&lt;/result&gt;
    &lt;link&gt;http://www.test.com/fakeurl&lt;/link&gt;
    &lt;price&gt;100.0&lt;/price&gt;
    &lt;shipping&gt;0.0&lt;/shipping&gt;
    &lt;origposition&gt;0&lt;/origposition&gt;
    &lt;name&gt;Name Test&lt;/name&gt;
    &lt;results&gt;
      &lt;result position="1"&gt;
        &lt;c_name&gt;CTest1&lt;/c_name&gt;
        &lt;c_price&gt;599.49&lt;/c_price&gt;
        &lt;c_shipping&gt;0.0&lt;/c_shipping&gt;
        &lt;c_total_price&gt;599.49&lt;/c_total_price&gt;
        &lt;c_rating&gt;8.3&lt;/c_rating&gt;
        &lt;c_delivery/&gt;
      &lt;/result&gt;
      &lt;result position="2"&gt;
        &lt;c_name&gt;CTest2&lt;/c_name&gt;
        &lt;c_price&gt;654.0&lt;/c_price&gt;
        &lt;c_shipping&gt;0.0&lt;/c_shipping&gt;
        &lt;c_total_price&gt;654.0&lt;/c_total_price&gt;
        &lt;c_rating&gt;9.8&lt;/c_rating&gt;
        &lt;c_delivery/&gt;
      &lt;/result&gt;
      &lt;result position="3"&gt;
        &lt;c_name&gt;CTest3&lt;/c_name&gt;
        &lt;c_price&gt;654.0&lt;/c_price&gt;
        &lt;c_shipping&gt;0.0&lt;/c_shipping&gt;
        &lt;c_total_price&gt;654.0&lt;/c_total_price&gt;
        &lt;c_rating&gt;8.8&lt;/c_rating&gt;
        &lt;c_delivery/&gt;
      &lt;/result&gt;
    &lt;/results&gt;
  &lt;/watcher&gt;
&lt;/watchers&gt;</code></pre>
<p>I want each row of the dataframe to contain the following fields:</p>
<pre><code>shop_name created_at code search_key date result link price shipping origposition name position c_name c_price c_shipping c_total_price c_rating c_delivery</code></pre>
<p>This means that the child nodes must be taken into account as well, which would result in a dataframe of three rows in this example (since the results show 3 positions). The fields</p>
<pre><code>shop_name created_at code search_key date result link price shipping origposition name</code></pre>
<p>are the same for each of these rows.</p>
<p>I am able to go through the XML file, but I am unable to get a dataframe with the fields I want. When I convert the XML to a dataframe, I get the following fields:</p>
<pre><code>"code" "search_key" "date" "result" "link" "price" "shipping" "origposition" "name" "results"</code></pre>
<p>Here the fields</p>
<pre><code>shop_name created_at</code></pre>
<p>are missing at the beginning, and the nested results are concatenated into a single string under the column "results".</p>
<p>It must be possible to get the wanted dataframe, but I do not know how to do this exactly.</p>
<p><strong>UPDATE</strong></p>
<p>The solution provided by @MvG works brilliantly on the test XML file above. However, the column 'result' can also have the value "Not found". Entries with this value lack certain fields (always the same fields) and therefore yield a "number of columns of arguments do not match" error when running the solution. I would like these entries to be put in the dataframe as well, with the fields that are not present left empty. I do not understand how to incorporate this scenario.</p>
<p><strong>test.xml</strong></p>
<pre><code>&lt;watchers shop_name="TEST" created_at="September 14, 2012 05:44"&gt;
  &lt;watcher channel="Site Name"&gt;
    &lt;code&gt;123456&lt;/code&gt;
    &lt;search_key&gt;TestKey&lt;/search_key&gt;
    &lt;date&gt;September 14, 2012 04:15&lt;/date&gt;
    &lt;result&gt;Found&lt;/result&gt;
    &lt;link&gt;http://www.test.com/fakeurl&lt;/link&gt;
    &lt;price&gt;100.0&lt;/price&gt;
    &lt;shipping&gt;0.0&lt;/shipping&gt;
    &lt;origposition&gt;0&lt;/origposition&gt;
    &lt;name&gt;Name Test&lt;/name&gt;
    &lt;results&gt;
      &lt;result position="1"&gt;
        &lt;c_name&gt;CTest1&lt;/c_name&gt;
        &lt;c_price&gt;599.49&lt;/c_price&gt;
        &lt;c_shipping&gt;0.0&lt;/c_shipping&gt;
        &lt;c_total_price&gt;599.49&lt;/c_total_price&gt;
        &lt;c_rating&gt;8.3&lt;/c_rating&gt;
        &lt;c_delivery/&gt;
      &lt;/result&gt;
      &lt;result position="2"&gt;
        &lt;c_name&gt;CTest2&lt;/c_name&gt;
        &lt;c_price&gt;654.0&lt;/c_price&gt;
        &lt;c_shipping&gt;0.0&lt;/c_shipping&gt;
        &lt;c_total_price&gt;654.0&lt;/c_total_price&gt;
        &lt;c_rating&gt;9.8&lt;/c_rating&gt;
        &lt;c_delivery/&gt;
      &lt;/result&gt;
      &lt;result position="3"&gt;
        &lt;c_name&gt;CTest3&lt;/c_name&gt;
        &lt;c_price&gt;654.0&lt;/c_price&gt;
        &lt;c_shipping&gt;0.0&lt;/c_shipping&gt;
        &lt;c_total_price&gt;654.0&lt;/c_total_price&gt;
        &lt;c_rating&gt;8.8&lt;/c_rating&gt;
        &lt;c_delivery/&gt;
      &lt;/result&gt;
    &lt;/results&gt;
  &lt;/watcher&gt;
  &lt;watcher channel="Shopping"&gt;
    &lt;code&gt;12804&lt;/code&gt;
    &lt;search_key&gt;&lt;/search_key&gt;
    &lt;date&gt;&lt;/date&gt;
    &lt;result&gt;Not found&lt;/result&gt;
    &lt;link&gt;https://www.test.com/testing1323p&lt;/link&gt;
    &lt;price&gt;0.0&lt;/price&gt;
    &lt;shipping&gt;0.0&lt;/shipping&gt;
    &lt;origposition&gt;0&lt;/origposition&gt;
    &lt;name&gt;MOOVM6002020&lt;/name&gt;
    &lt;results&gt;
    &lt;/results&gt;
  &lt;/watcher&gt;
&lt;/watchers&gt;</code></pre>
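To pin down the row layout requested above, here is a minimal sketch of the flattening logic. It is written in Python with only the standard library, not in R with the XML package, so it illustrates the intended traversal rather than reproducing @MvG's solution. The function name `flatten_watchers` is hypothetical, and the sketch assumes the structure shown in test.xml: root attributes, one level of `watcher` children, and nested `results/result` entries. The `channel` attribute is skipped because it is not in the requested field list.

```python
import xml.etree.ElementTree as ET


def flatten_watchers(xml_text):
    """Return (columns, rows): one row per nested <result>, or one row
    per <watcher> when its <results> element is empty."""
    root = ET.fromstring(xml_text)
    records = []
    for watcher in root.findall("watcher"):
        # Start with the root attributes (shop_name, created_at), which
        # are shared by every row.
        base = dict(root.attrib)
        # Add the watcher's simple child fields; the nested <results>
        # container is handled separately below.
        for child in watcher:
            if child.tag != "results":
                base[child.tag] = (child.text or "").strip()
        nested = watcher.findall("results/result")
        if not nested:
            # "Not found" case: keep the watcher as a single row.
            records.append(base)
            continue
        for res in nested:
            row = dict(base)
            row.update(res.attrib)  # the position attribute
            for field in res:
                row[field.tag] = (field.text or "").strip()
            records.append(row)
    # Collect column names in order of first appearance.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # Fields that are absent from a record are left as empty strings.
    rows = [[rec.get(col, "") for col in columns] for rec in records]
    return columns, rows
```

Calling `flatten_watchers(open("test.xml").read())` on the file above should give three rows for the "Site Name" watcher and one row for the "Shopping" watcher, with `position` and the `c_*` columns left empty in the latter.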