
  1. Incomprehensible time consumed in using memory-mapped file
<p>I am writing a routine to compare two files using memory-mapped files. If the files are too big to be mapped in one go, I split them and map them part by part. For example, to map a 1049MB file, I split it into 512MB + 512MB + 25MB.</p>
<p>Everything works fine except one thing: it always takes much, much longer to compare the remainder (25MB in this example), though the code logic is exactly the same. Three observations:</p>
<ol>
<li>It does not matter which is compared first; whether the main part (512MB * N) or the remainder (25MB in this example) comes first, the result remains the same.</li>
<li>The extra time in the remainder seems to be spent in <strong><em>user mode</em></strong>.</li>
<li>Profiling in VS2010 beta 1 shows that the time is spent inside <code>std::_Equal()</code>, but this function is mostly (the profiler says 100%) waiting for I/O and other threads.</li>
</ol>
<p>I tried</p>
<ul>
<li>changing the VIEW_SIZE_FACTOR to another value</li>
<li>replacing the lambda functor with a member function</li>
<li>changing the file size under test</li>
<li>changing the order of execution of the remainder to before/after the loop</li>
</ul>
<p>The result was quite consistent: the remainder part takes a lot more time, and that time is spent in <strong>user mode</strong>.
</p>
<p>I suspect it has something to do with the fact that the mapped size is not a multiple of the mapping alignment (64K on my system), but I am not sure how.</p>
<p>Below is the complete code for the routine and the timing measured for a 3G file.</p>
<p>Can anyone please explain it? Thanks.</p>
<pre><code>// using memory-mapped file
template &lt;size_t VIEW_SIZE_FACTOR&gt;
struct is_equal_by_mmapT
{
public:
    bool operator()(const path_type&amp; p1, const path_type&amp; p2)
    {
        using boost::filesystem::exists;
        using boost::filesystem::file_size;

        try
        {
            if(!(exists(p1) &amp;&amp; exists(p2)))
                return false;

            const size_t segment_size = mapped_file_source::alignment() * VIEW_SIZE_FACTOR;

            // lambda
            boost::function&lt;bool(size_t, size_t)&gt; segment_compare =
            [&amp;](size_t seg_size, size_t offset)-&gt;bool
            {
                using boost::iostreams::mapped_file_source;
                boost::chrono::run_timer t;

                mapped_file_source mf1, mf2;
                mf1.open(p1, seg_size, offset);
                mf2.open(p2, seg_size, offset);

                if(! (mf1.is_open() &amp;&amp; mf2.is_open()))
                    return false;

                if(!equal (mf1.begin(), mf1.end(), mf2.begin()))
                    return false;

                return true;
            };

            boost::uintmax_t size = file_size(p1);
            size_t round = size / segment_size;
            size_t remainder = size &amp; ( segment_size - 1 );

            // compare the remainder
            if(remainder &gt; 0)
            {
                cout &lt;&lt; "segment size = " &lt;&lt; remainder &lt;&lt; " bytes for the remaining round";
                if(!segment_compare(remainder, segment_size * round))
                    return false;
            }

            //compare the main part. take much less time, even
            for(size_t i = 0; i &lt; round; ++i)
            {
                cout &lt;&lt; "segment size = " &lt;&lt; segment_size &lt;&lt; " bytes, round #" &lt;&lt; i;
                if(!segment_compare(segment_size, segment_size * i))
                    return false;
            }
        }
        catch(std::exception&amp; e)
        {
            cout &lt;&lt; e.what();
            return false;
        }

        return true;
    }
};

typedef is_equal_by_mmapT&lt;(8&lt;&lt;10)&gt; is_equal_by_mmap; // 512MB
</code></pre>
<p>output:</p>
<p>segment size = 354410496 bytes for the remaining round</p>
<p>real 116.892s, cpu 56.201s (48.1%), user 54.548s, system 1.652s</p>
<p>segment size = 536870912 bytes, round #0</p>
<p>real 72.258s, cpu 2.273s (3.1%), user 0.320s, system 1.953s</p>
<p>segment size = 536870912 bytes, round #1</p>
<p>real 75.304s, cpu 1.943s (2.6%), user 0.240s, system 1.702s</p>
<p>segment size = 536870912 bytes, round #2</p>
<p>real 84.328s, cpu 1.783s (2.1%), user 0.320s, system 1.462s</p>
<p>segment size = 536870912 bytes, round #3</p>
<p>real 73.901s, cpu 1.702s (2.3%), user 0.330s, system 1.372s</p>
<hr>
<h2>More observations after the suggestions by responders</h2>
<p>I further split the remainder into body and tail (remainder = body + tail), where</p>
<ul>
<li>body = N * alignment(), and tail &lt; 1 * alignment()</li>
<li>body = m * alignment(), and tail &lt; 1 * alignment() + n * alignment(), where m is even.</li>
<li>body = m * alignment(), and tail &lt; 1 * alignment() + n * alignment(), where m is a power of 2.</li>
<li>body = N * alignment(), and tail = remainder - body. N is random.</li>
</ul>
<p>The total time remains unchanged, but I can see that the time does not necessarily relate to the tail, but to the sizes of body and tail: the bigger part takes more time. The time is USER TIME, which is the most incomprehensible part to me.</p>
<p>I also looked at the page faults through Procexp.exe. The remainder does NOT take more faults than the main loop.
</p>
<hr>
<h2>Updates 2</h2>
<p>I've performed some tests on other workstations, and it seems the issue is related to the hardware configuration.</p>
<h2>Test Code</h2>
<pre><code>// compare the remainder, alternative way
if(remainder &gt; 0)
{
    //boost::chrono::run_timer t;
    cout &lt;&lt; "Remainder size = " &lt;&lt; remainder &lt;&lt; " bytes \n";
    size_t tail = (alignment_size - 1) &amp; remainder;
    size_t body = remainder - tail;

    {
        boost::chrono::run_timer t;
        cout &lt;&lt; "Remainder_tail size = " &lt;&lt; tail &lt;&lt; " bytes";
        if(!segment_compare(tail, segment_size * round + body))
            return false;
    }
    {
        boost::chrono::run_timer t;
        cout &lt;&lt; "Remainder_body size = " &lt;&lt; body &lt;&lt; " bytes";
        if(!segment_compare(body, segment_size * round))
            return false;
    }
}
</code></pre>
<h2>Observation:</h2>
<p>On two other PCs with the same h/w configuration as mine, the results are consistent, as follows:</p>
<p>------VS2010Beta1ENU_VSTS.iso [1319909376 bytes] ------</p>
<p>Remainder size = 44840960 bytes</p>
<p>Remainder_tail size = 14336 bytes</p>
<p>real 0.060s, cpu 0.040s (66.7%), user 0.000s, system 0.040s</p>
<p>Remainder_body size = 44826624 bytes</p>
<p>real 13.601s, <strong><em>cpu 7.731s (56.8%), user 7.481s</em></strong>, system 0.250s</p>
<p>segment size = 67108864 bytes, total round# = 19</p>
<p>real 172.476s, cpu 4.356s (2.5%), user 0.731s, system 3.625s</p>
<p>However, running the same code on a PC with a different h/w configuration yielded:</p>
<p>------VS2010Beta1ENU_VSTS.iso [1319909376 bytes] ------</p>
<p>Remainder size = 44840960 bytes</p>
<p>Remainder_tail size = 14336 bytes</p>
<p>real 0.013s, cpu 0.000s (0.0%), user 0.000s, system 0.000s</p>
<p>Remainder_body size = 44826624 bytes</p>
<p>real 2.468s, <em>cpu 0.188s (7.6%), user 0.047s</em>, system 0.141s</p>
<p>segment size = 67108864 bytes, total round# = 19</p>
<p>real 65.587s, cpu 4.578s (7.0%), user 0.844s, system 3.734s</p>
<h2>System Info</h2>
<p>My workstation yielding incomprehensible
timing:</p>
<p>OS Name: Microsoft Windows XP Professional</p>
<p>OS Version: 5.1.2600 Service Pack 3 Build 2600</p>
<p>OS Manufacturer: Microsoft Corporation</p>
<p>OS Configuration: Member Workstation</p>
<p>OS Build Type: Uniprocessor Free</p>
<p>Original Install Date: 2004-01-27, 23:08</p>
<p>System Up Time: 3 Days, 2 Hours, 15 Minutes, 46 Seconds</p>
<p>System Manufacturer: Dell Inc.</p>
<p>System Model: OptiPlex GX520</p>
<p>System type: X86-based PC</p>
<p>Processor(s): 1 Processor(s) Installed.</p>
<pre><code>[01]: x86 Family 15 Model 4 Stepping 3 GenuineIntel ~2992 Mhz
</code></pre>
<p>BIOS Version: DELL - 7</p>
<p>Windows Directory: C:\WINDOWS</p>
<p>System Directory: C:\WINDOWS\system32</p>
<p>Boot Device: \Device\HarddiskVolume2</p>
<p>System Locale: zh-cn;Chinese (China)</p>
<p>Input Locale: zh-cn;Chinese (China)</p>
<p>Time Zone: (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi</p>
<p>Total Physical Memory: 3,574 MB</p>
<p>Available Physical Memory: 1,986 MB</p>
<p>Virtual Memory: Max Size: 2,048 MB</p>
<p>Virtual Memory: Available: 1,916 MB</p>
<p>Virtual Memory: In Use: 132 MB</p>
<p>Page File Location(s): C:\pagefile.sys</p>
<p>NetWork Card(s): 3 NIC(s) Installed.</p>
<pre><code>[01]: VMware Virtual Ethernet Adapter for VMnet1
      Connection Name: VMware Network Adapter VMnet1
      DHCP Enabled: No
      IP address(es)
      [01]: 192.168.75.1
[02]: VMware Virtual Ethernet Adapter for VMnet8
      Connection Name: VMware Network Adapter VMnet8
      DHCP Enabled: No
      IP address(es)
      [01]: 192.168.230.1
[03]: Broadcom NetXtreme Gigabit Ethernet
      Connection Name: Local Area Connection 4
      DHCP Enabled: Yes
      DHCP Server: 10.8.0.31
      IP address(es)
      [01]: 10.8.8.154
</code></pre>
<p>Another workstation yielding "correct" timing:</p>
<p>OS Name: Microsoft Windows XP Professional</p>
<p>OS Version: 5.1.2600 Service Pack 3 Build 2600</p>
<p>OS Manufacturer: Microsoft Corporation</p>
<p>OS Configuration: Member Workstation</p>
<p>OS Build Type: Multiprocessor Free</p>
<p>Original Install Date: 5/18/2009, 2:28:18 PM</p>
<p>System Up Time: 21 Days, 5 Hours, 0 Minutes, 49 Seconds</p>
<p>System Manufacturer: Dell Inc.</p>
<p>System Model: OptiPlex 755</p>
<p>System type: X86-based PC</p>
<p>Processor(s): 1 Processor(s) Installed.</p>
<pre><code>[01]: x86 Family 6 Model 15 Stepping 13 GenuineIntel ~2194 Mhz
</code></pre>
<p>BIOS Version: DELL - 15</p>
<p>Windows Directory: C:\WINDOWS</p>
<p>System Directory: C:\WINDOWS\system32</p>
<p>Boot Device: \Device\HarddiskVolume1</p>
<p>System Locale: zh-cn;Chinese (China)</p>
<p>Input Locale: en-us;English (United States)</p>
<p>Time Zone: (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi</p>
<p>Total Physical Memory: 3,317 MB</p>
<p>Available Physical Memory: 1,682 MB</p>
<p>Virtual Memory: Max Size: 2,048 MB</p>
<p>Virtual Memory: Available: 2,007 MB</p>
<p>Virtual Memory: In Use: 41 MB</p>
<p>Page File Location(s): C:\pagefile.sys</p>
<p>NetWork Card(s): 3 NIC(s) Installed.</p>
<pre><code>[01]: Intel(R) 82566DM-2 Gigabit Network Connection
      Connection Name: Local Area Connection
      DHCP Enabled: Yes
      DHCP Server: 10.8.0.31
      IP address(es)
      [01]: 10.8.0.137
[02]: VMware Virtual Ethernet Adapter for VMnet1
      Connection Name: VMware Network Adapter VMnet1
      DHCP Enabled: Yes
      DHCP Server: 192.168.154.254
      IP address(es)
      [01]: 192.168.154.1
[03]: VMware Virtual Ethernet Adapter for VMnet8
      Connection Name: VMware Network Adapter VMnet8
      DHCP Enabled: Yes
      DHCP Server: 192.168.2.254
      IP address(es)
      [01]: 192.168.2.1
</code></pre>
<p>Any explanation or theory? Thanks.</p>
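For reference, the segment arithmetic used in the routine (full rounds, a remainder, then the remainder split into body and tail) can be sketched in isolation. This is only a minimal sketch under stated assumptions, not the routine itself: `SegmentPlan` and `plan_segments` are hypothetical names that do not appear in the original code, and the sketch uses 64-bit integers and `%` rather than the `&` mask, so it does not rely on the sizes being powers of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of the original routine.
struct SegmentPlan {
    std::uint64_t rounds;     // number of full-size segments
    std::uint64_t remainder;  // bytes left after the full segments
    std::uint64_t body;       // part of the remainder that is a multiple of alignment
    std::uint64_t tail;       // final part, smaller than one alignment unit
};

// Splits file_size into full segments plus a remainder, then splits the
// remainder into body + tail. Division and modulo are used instead of a
// bit mask, so segment_size and alignment need not be powers of two.
inline SegmentPlan plan_segments(std::uint64_t file_size,
                                 std::uint64_t segment_size,
                                 std::uint64_t alignment) {
    assert(segment_size > 0 && alignment > 0);
    SegmentPlan p{};
    p.rounds = file_size / segment_size;
    p.remainder = file_size % segment_size;
    p.tail = p.remainder % alignment;
    p.body = p.remainder - p.tail;
    return p;
}
```

Calling `plan_segments(1319909376, 67108864, 65536)` for the ISO used in the tests (64MB segments, 64K alignment) gives 19 rounds, a 44840960-byte remainder, a 44826624-byte body and a 14336-byte tail, which matches the sizes printed in the test output.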
 
