Spring Batch: migrating 1 to n relationship where n is potentially huge
<p>I am experienced with Spring, but new to Spring Batch. Now I have the task of migrating data from a simple structure in one database to a more complex one in another. The data structure corresponds to an object hierarchy that I will name like this:</p>
<pre><code>OldParent 1 --&gt; n OldChild // old system
NewParent 1 --&gt; n NewChild // new system
</code></pre>
<p>In the old db there are only two tables. In the new system things get a lot more complex and there are 8 tables, but that is irrelevant for now.</p>
<p>Basically I would like to use a simple JDBC-based solution with row mappers reading from OldParent and converting to NewParent.</p>
<p>So here would be a basic configuration snippet:</p>
<pre><code>&lt;batch:job id="migration"&gt;
    &lt;batch:step id="convertLegacyData"&gt;
        &lt;batch:tasklet&gt;
            &lt;batch:chunk reader="parentReader" writer="parentWriter" commit-interval="200" /&gt;
        &lt;/batch:tasklet&gt;
    &lt;/batch:step&gt;
&lt;/batch:job&gt;
</code></pre>
<p>In this scenario, the parentReader would acquire and convert the OldChild objects, probably delegating to childReader / childWriter objects.</p>
<p>The problem is this: while there are several hundred thousand Parents, each Parent can have zero to several million children. A commit-interval counted in parents therefore does not bound the size of a transaction at all, and I would very much like to have a configurable commit interval that does.</p>
<p>So another solution would be to make the workflow child-based:</p>
<pre><code>&lt;batch:job id="migration"&gt;
    &lt;batch:step id="convertLegacyData"&gt;
        &lt;batch:tasklet&gt;
            &lt;batch:chunk reader="childReader" writer="childWriter" commit-interval="200" /&gt;
        &lt;/batch:tasklet&gt;
    &lt;/batch:step&gt;
&lt;/batch:job&gt;
</code></pre>
<p>In this scenario, the childReader would also have to read OldParent objects and write NewParents, delegating to parentReader and parentWriter objects. The major drawback here is that I would lose all OldParents that don't have associated OldChild objects.</p>
<p>The third possible scenario would be to have two different workflows, one for <code>OldParent -&gt; NewParent</code> and one for <code>OldChild -&gt; NewChild</code>. I would have to maintain a mapping table that stores the relationship between OldParent and NewParent ids, but I could use standard configurations including commit-interval.</p>
<p>Are there other possibilities? Which of these would you recommend as best practice?</p>
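<p>To make the third scenario concrete, here is a minimal plain-Java sketch of the two-pass idea. It only simulates the chunking and the id mapping and does not use the Spring Batch API. All names in it (<code>TwoPassMigrationSketch</code>, <code>Target</code>, <code>idMapping</code>, the id generator starting at 1000) are assumptions made up for illustration, and the in-memory map stands in for the mapping table.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of the two-pass idea; this is not Spring Batch API. */
public class TwoPassMigrationSketch {

    /** Hypothetical legacy child row: its own id plus the old parent's id. */
    public static class OldChild {
        public final long id;
        public final long oldParentId;
        public OldChild(long id, long oldParentId) {
            this.id = id;
            this.oldParentId = oldParentId;
        }
    }

    /** Hypothetical migrated child row, pointing at the new parent's id. */
    public static class NewChild {
        public final long id;
        public final long newParentId;
        public NewChild(long id, long newParentId) {
            this.id = id;
            this.newParentId = newParentId;
        }
    }

    /** Stand-in for the new database; counts simulated commits. */
    public static class Target {
        // Assumed mapping table: old parent id -> new parent id.
        public final Map<Long, Long> idMapping = new HashMap<>();
        public final List<NewChild> children = new ArrayList<>();
        public int commits = 0;
        private long nextId = 1000; // assumed id generator
    }

    /** Pass 1: every parent is migrated, including parents without children. */
    public static void migrateParents(List<Long> oldParentIds, Target target, int commitInterval) {
        for (int start = 0; start < oldParentIds.size(); start += commitInterval) {
            int end = Math.min(start + commitInterval, oldParentIds.size());
            for (Long oldId : oldParentIds.subList(start, end)) {
                target.idMapping.put(oldId, target.nextId++);
            }
            target.commits++; // one simulated commit per chunk of parents
        }
    }

    /** Pass 2: children are chunked on their own, so a chunk never exceeds commitInterval rows. */
    public static void migrateChildren(List<OldChild> oldChildren, Target target, int commitInterval) {
        for (int start = 0; start < oldChildren.size(); start += commitInterval) {
            int end = Math.min(start + commitInterval, oldChildren.size());
            for (OldChild old : oldChildren.subList(start, end)) {
                Long newParentId = target.idMapping.get(old.oldParentId);
                if (newParentId == null) {
                    throw new IllegalStateException("No migrated parent for child " + old.id);
                }
                target.children.add(new NewChild(target.nextId++, newParentId));
            }
            target.commits++; // one simulated commit per chunk of children
        }
    }

    public static void main(String[] args) {
        Target target = new Target();
        List<Long> parents = List.of(1L, 2L, 3L); // parents 2 and 3 have no children
        List<OldChild> children = new ArrayList<>();
        for (long i = 0; i < 5; i++) {
            children.add(new OldChild(i, 1L));
        }
        migrateParents(parents, target, 2);
        migrateChildren(children, target, 2);
        System.out.println("parents=" + target.idMapping.size()
                + " children=" + target.children.size()
                + " commits=" + target.commits);
    }
}
```

<p>The point of the sketch is that each pass has its own commit interval, childless parents are still migrated in the first pass, and the second pass only needs the mapping to resolve the new parent id.</p>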