How to delete almost duplicate files
<p>Edit 2:</p>
<p>Solved, see my answer waaaaaaay below.</p>
<p>Edit:</p>
<p>After banging my head a few times, I almost did it. Here's my code (not cleaned up; you can tell I was troubleshooting a bunch of stuff):</p>
<p><a href="http://pastebin.com/ve4Qkj2K" rel="nofollow">http://pastebin.com/ve4Qkj2K</a></p>
<p>And here's the problem: it works sometimes and other times not so much. For example, it will work perfectly with some files, then leave one of the longest codes instead of the shortest one, and for others it will delete maybe 2 out of 5 duplicates, leaving 3 behind. If it failed consistently, I might be able to fix it, but I don't understand the seemingly random behavior. Any ideas?</p>
<h2>Original Post:</h2>
<p>Just so you know, I'm just beginning with Python, and I'm using Python 3.3.</p>
<p>So here's my problem:</p>
<p>Let's say I have a folder with about 5,000 files in it. Some of these files have very similar names, but different contents and possibly different extensions. After a readable name, there is a code, always with a "(" or a "[" (no quotes) before it. The name and code are separated by a space. For example:</p>
<pre><code>something (TZA).blah
something [TZZ].another
hello (YTYRRFEW).extension
something (YJTR).another_ext
</code></pre>
<p>I'm trying to keep only one of the something's.something files and delete the others. Another fact which may be important is that there is usually more than one code, such as "something (THTG) (FTGRR) [GTGEES!#!].yet_another_random_extension", all separated by spaces. Although it doesn't matter 100%, it would be best to save the one with the fewest codes.</p>
<p>I made some (very very short) code to get a list of all files:</p>
<pre><code>import glob

# List every entry in the current directory
files = glob.glob("*")
</code></pre>
<p>but after this I'm pretty much lost. Any help would be appreciated, even if it's just pointing me in the right direction!</p>
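One way to approach this, as a minimal sketch rather than the solution the asker eventually posted (which is not shown here): group the filenames by their readable name, keep the file with the fewest codes in each group, and list the rest for deletion. The helper names `split_name`, `plan_deletions`, and `delete_files` are hypothetical, and the sketch assumes that the readable name is everything before the first " (" or " [" and that codes never contain a closing bracket.

```python
import os
import re
from collections import defaultdict

# Assumption: a code is a space followed by "(...)" or "[...]".
CODE_RE = re.compile(r" [(\[][^)\]]*[)\]]")


def split_name(filename):
    """Return (readable_name, number_of_codes) for one filename."""
    # The readable name ends at the first " (" or " [", whichever comes first.
    cut = len(filename)
    for marker in (" (", " ["):
        pos = filename.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    return filename[:cut], len(CODE_RE.findall(filename[cut:]))


def plan_deletions(filenames):
    """Keep one file per readable name (fewest codes); return the others."""
    groups = defaultdict(list)
    for name in filenames:
        readable, n_codes = split_name(name)
        # Sort key: fewest codes, then shortest filename, then alphabetical.
        groups[readable].append((n_codes, len(name), name))
    to_delete = []
    for entries in groups.values():
        entries.sort()
        to_delete.extend(name for _, _, name in entries[1:])
    return sorted(to_delete)


def delete_files(folder, names, dry_run=True):
    """Delete the planned files; with dry_run=True only print them."""
    for name in names:
        path = os.path.join(folder, name)
        if dry_run:
            print("would delete", path)
        else:
            os.remove(path)
```

A typical use would be `delete_files(".", plan_deletions(os.listdir(".")))`, which only prints the plan until `dry_run=False` is passed. Building the complete list of deletions before removing anything keeps the result independent of the order in which the files are listed, and sorting with an explicit tie-breaker makes the choice of the surviving file deterministic.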