Note that there are some explanatory texts on larger screens.

plurals
  1. POParallelizing massive inserts in SQL Server from C# (for better time performance)
    text
    copied!<p><strong>Problem statement : How to parallelize inserts in SQL Server (2008)</strong></p> <p>I am performing massive numeric computation for scientific research in C# multithreaded workers that basically do one thing : Test thousands of possible configurations (matrix combinations) through a time period (in days) and store the results into an SQL Server Database. <br><br>If i store the results one by one into DB (~300.000 lines per computing session * 100's of sessions), one after the other, I end up waiting for hours for the storing process to end.<br></p> <p>The database design is very simple : </p> <ul> <li><strong>Combination Sets</strong><br> CS_ID1, Value A1, Value B1, Value C1<br> CS_ID2, Value A2, Value B2, Value C2<br> <strong>.........</strong><br><br> </li> <li><strong>Results per Day</strong><br> CS_ID1, Day1,Result 1 <br> CS_ID1, Day2,Result 2 <br> CS_ID1, Day3,Result 3 <br> <strong>.........</strong><br><br> <strong>.........</strong><br> CS_ID2, Day1, Result N <br> CS_ID2, Day2, Result N+1 <br> CS_ID2, Day3, Result N+2 <br></li> </ul> <p>Each "Combination Set" is tested against sample days and its per-day results are processed in a single C# thread, where a LINQ/SQL query is generated and sent to DB just before the end of the thread. Except combination set IDs sequences, <strong>there is NO logical relation between Results</strong>. This is very important : This is why I thought of parallelizing the insert stuff as <strong><em>it basically amounts to a bulk dump of result blocks</em></strong> <br><br>Another detail that could be important is that <strong><em>it is possible to determine beforehand how much rows will be inserted into the Database</em></strong> (per block and in total). This probably could help organize table spaces, split them through pages, pre-fix id ranges in order to store blocks simultaneously, or something like that (No, i'm not "high" or something :-) )<br></p> <p>I welcome any kind of suggestions in order to make this insert time as short as possible.<br> <br>Please take into account that I am a C# developer, with very basic SQL Server knowledge and not very familiar with deep technical DBA concepts (I saw that Locking tweaks are VERY numerous, that there are multithreaded and asynchronous capabilities, too, but I have to admit I am lost alone in the forest :-) )<br><br> I Have 12 CPU Cores available, and 24Go RAM</p> <hr> <p><em>EDIT:</em> <strong>Tiebreaker</strong> <br> <em>I welcome any clever suggestion on monitoring time for the whole process : From C# threads inception/end to detailed SQl server insert reports (What happens when, how, and where).<br> I tried logging whith NLog but it drastically biases the processing time so I am looking for some smart workarounds that are pretty seamless with minimum impact. Same for the SQL server part : I know there are a couple of Logs and monitoring SP's available. I did not figure out yet which ones suit my situation.</em></p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload