Benchmarks: does Python have a faster way of walking a network folder?
<p>I need to walk through a folder with approximately ten thousand files. My old VBScript is very slow at this. I have since started using Ruby and Python, so I benchmarked the three scripting languages to see which would be the best fit for this job.</p>
<p>The results of the tests below, on a subset of 4500 files on a network share, are:</p>
<pre><code>Python: 106 seconds
Ruby: 5 seconds
Vbscript: 124 seconds
</code></pre>
<p>That VBScript was the slowest is no surprise, but I can't explain the difference between Ruby and Python. Is my Python test not optimal? Is there a faster way to do this in Python?</p>
<p>The check for thumbs.db is only for this benchmark; in reality there are more checks to do.</p>
<p>I needed something that checks every file on the path and doesn't produce much output, so as not to disturb the timing. The results differ a bit on each run, but not by much.</p>
<pre><code>#python2.7.0
import os

def recurse(path):
    for (path, dirs, files) in os.walk(path):
        for file in files:
            if file.lower() == "thumbs.db":
                print (path+'/'+file)

if __name__ == '__main__':
    import timeit
    path = '//server/share/folder/'
    print(timeit.timeit('recurse("'+path+'")', setup="from __main__ import recurse", number=1))
</code></pre>
<pre class="lang-vb prettyprint-override"><code>'vbscript5.7
set oFso = CreateObject("Scripting.FileSystemObject")
const path = "\\server\share\folder"
start = Timer
myLCfilename="thumbs.db"

sub recurse(folder)
  for each file in folder.Files
    if lCase(file.name) = myLCfilename then
      wscript.echo file
    end if
  next
  for each subfolder in folder.SubFolders
    call Recurse(subfolder)
  next
end Sub

set folder = oFso.getFolder(path)
recurse(folder)
wscript.echo Timer-start
</code></pre>
<pre><code>#ruby1.9.3
require 'benchmark'

def recursive(path, bench)
  bench.report(path) do
    Dir["#{path}/**/**"].each{|file| puts file if File.basename(file).downcase == "thumbs.db"}
  end
end

path = '//server/share/folder/'
Benchmark.bm {|bench| 
  recursive(path, bench)}
</code></pre>
<p>EDIT: Since I suspected that printing caused a delay, I tested the scripts both printing all 4500 files and printing none. The difference remains: Ruby 5 s versus Python 107 s in the first case, and Ruby 4.5 s versus Python 107 s in the second.</p>
<p>EDIT2: Based on the answers and comments, here is a Python version that in some cases can run faster by skipping folders:</p>
<pre><code>import os

def recurse(path):
    for (path, dirs, files) in os.walk(path):
        for file in files:
            if file.lower() == "thumbs.db":
                print (path+'/'+file)

def recurse2(path):
    for (path, dirs, files) in os.walk(path):
        # Prune in place so os.walk does not descend into skipped folders.
        # ('comics',) is a one-element tuple; ('comics') would be a plain string,
        # and removing items from dirs while looping over it can skip entries.
        dirs[:] = [dir for dir in dirs if dir not in ('comics',)]
        for file in files:
            if file.lower() == "thumbs.db":
                print (path+'/'+file)

if __name__ == '__main__':
    import timeit
    path = 'f:/'
    print(timeit.timeit('recurse("'+path+'")', setup="from __main__ import recurse", number=1))
    #6.20102692
    print(timeit.timeit('recurse2("'+path+'")', setup="from __main__ import recurse2", number=1))
    #2.73848228
    #ruby 5.7
</code></pre>
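<p>The folder-skipping idea from EDIT2 can be sketched as a small reusable function. This is only an illustration under stated assumptions, not a measured improvement: the names <code>find_files</code>, <code>target</code> and <code>skip_dirs</code> are hypothetical and do not come from the scripts above, and the function returns the matches rather than printing them so that output does not affect timing.</p>

```python
import os


def find_files(root, target="thumbs.db", skip_dirs=("comics",)):
    """Return paths under root whose file name equals target, ignoring case.

    find_files, target and skip_dirs are hypothetical names used only for
    this sketch.
    """
    skip = {name.lower() for name in skip_dirs}
    matches = []
    for current, dirs, files in os.walk(root):
        # Slice assignment changes the very list os.walk iterates over, so
        # pruned folders are never visited. This relies on the default
        # topdown=True behaviour of os.walk.
        dirs[:] = [d for d in dirs if d.lower() not in skip]
        for name in files:
            if name.lower() == target:
                matches.append(os.path.join(current, name))
    return matches
```

<p>Calling <code>find_files('//server/share/folder/')</code> would then return the list of matching paths, which can be printed or counted after the timing has stopped.</p>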