
Check if files have same name and store line count of files with same names
<p>I'm relatively new to Python and I could really use some input.</p>

<p>I have a script running which stores files in the following format:</p>

<pre><code>201309030700__81.28.236.2.txt
201308240115__80.247.17.26.txt
201308102356__84.246.88.20.txt
201309030700__92.243.23.21.txt
201308030150__203.143.64.11.txt
</code></pre>

<p>Each file has some lines whose total I want to count and then store. For example, I want to go through these files and, if files share the same date (the first part of the file name), store their counts together in the same file in the following format:</p>

<pre><code>201309030700__81.28.236.2.txt has 10 lines
201309030700__92.243.23.21.txt has 8 lines
</code></pre>

<p>That is, create a file named after the date 20130903 (the last 4 digits are the time, which I don't want): the file 20130903.txt, which has two lines, 10 and 8.</p>

<p>I have the following code but I'm not getting anywhere, please help.</p>

<pre><code>import os, os.path

asline = []
ipasline = []

def main():
    p = './results_1/'
    np = './new/'
    fd = os.listdir(p)
    run(fd)

def writeFile(fd, flines):
    fo = np + fd + '.txt'
    with open(fo, 'a') as f:
        r = '%s\t %s\n' % (fd, flines)
        f.write(r)

def run(path):
    for root, dirs, files in os.walk(path):
        for cfile in files:
            stripFN = os.path.splitext(cfile)[0]
            fileDate = stripFN.split('_')[0]
            fileIP = stripFN.split('_')[-1]
            if cfile.startswith(fileDate):
                hp = 0
                for currentFile in files.readlines()[1:]:
                    hp += 1
                writeFile(fdate, hp)
</code></pre>

<p>I tried to play around with this script:</p>

<pre><code>if not os.path.exists(os.path.join(p, y)):
    os.mkdir(os.path.join(p, y))
    np = '%s%s' % (datetime.now().strftime(FORMAT), path)
if os.path.exists(os.path.join(p, m)):
    os.chdir(os.path.join(p, month, d))
    np = '%s%s' % (datetime.now().strftime(FORMAT), path)
</code></pre>

<p>Where FORMAT has the following value:</p>

<blockquote>
<p>20130903</p>
</blockquote>

<p>But I can't seem to get this to work.</p>

<p>EDIT: I have modified the code 
as follows, and it kind of does what I wanted, but I'm probably doing redundant things, and I still haven't taken into account that I'm processing a huge number of files, so this may not be the most efficient way. Please have a look.</p>

<pre><code>import re, os, os.path

p = './results_1/'
np = './new/'
fd = os.listdir(p)
star = "*"

def writeFile(fd, flines):
    fo = './new/' + fd + '_v4.txt'
    with open(fo, 'a') as f:
        r = '%s\n' % (flines)
        f.write(r)

for f in fd:
    pathN = os.path.join(p, f)
    files = open(pathN, 'r')
    fileN = os.path.basename(pathN)
    stripFN = os.path.splitext(fileN)[0]
    fileDate = stripFN.split('_')[0]
    fdate = fileDate[0:8]
    lnum = len(files.readlines())
    writeFile(fdate, lnum)
    files.close()
</code></pre>

<p>At the moment it is writing a new line to the file for each counted line total. HOWEVER, I have sorted this out. I would appreciate some input, thank you very much.</p>

<p>EDIT 2: Now I'm getting the output for each date, with the date as the file name. The files now appear as:</p>

<pre><code>20130813.txt
20130819.txt
20130825.txt
</code></pre>

<p>Each file now looks like:</p>

<pre><code>15
17
18
21
14
18
14
13
17
11
11
18
15
15
12
17
9
10
12
17
14
17
13
</code></pre>

<p>And it goes on for a further 200+ lines in each file. 
Ideally, knowing how many times each number occurs, sorted with the smallest number first, would be the best desired outcome.</p> <p>I have tried something like:</p>

<pre><code>import sys
from collections import Counter

p = '.txt'
d = []

with open(p, 'r') as f:
    for x in f:
        x = int(x)
        d.append(x)

d.sort()
o = Counter(d)
print o
</code></pre>

<p>Does this make sense?</p>

<p>EDIT 3:</p>

<p>I have the following script which counts the unique values for me, but I'm still unable to sort the output by the value itself.</p>

<pre><code>import os
from collections import Counter

p = './newR'
fd = os.listdir(p)

for f in fd:
    pathN = os.path.join(p, f)
    with open(pathN, 'r') as infile:
        fileN = os.path.basename(pathN)
        stripFN = os.path.splitext(fileN)[0]
        fileDate = stripFN.split('_')[0]
        counts = Counter(l.strip() for l in infile)
        for line, count in counts.most_common():
            print line, count
</code></pre>

<p>This has the following results:</p>

<pre><code>14 291
15 254
12 232
13 226
17 212
16 145
18 127
11 102
10 87
19 64
21 33
20 24
22 15
9 15
23 9
30 6
60 3
55 3
25 3
</code></pre>

<p>The output should look like:</p>

<pre><code>9 15
10 87
11 102
12 232
13 226
14 291
etc
</code></pre>

<p>What is the most efficient way of doing this?</p>
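<p>For the last step, one way (a minimal sketch in Python 3; the sample data below is hypothetical, standing in for one of the per-date files described above) is to count with <code>Counter</code> as in EDIT 3, but sort the items numerically by the value instead of calling <code>most_common()</code>, which orders by frequency:</p>

```python
from collections import Counter

# Hypothetical sample standing in for one per-date file:
# one line count per line, as in the EDIT 2 output.
lines = ["14", "15", "14", "9", "9", "15", "14"]

# Convert to int before counting so the sort is numeric
# ("9" would otherwise sort after "10" as a string).
counts = Counter(int(l.strip()) for l in lines)

# sorted() on the items orders by the value itself, smallest first;
# Counter.most_common() would order by frequency instead.
for value, count in sorted(counts.items()):
    print(value, count)
# prints:
# 9 2
# 14 3
# 15 2
```

<p>The key point is the <code>int()</code> conversion: EDIT 3 counts the stripped strings, so a plain sort would be lexicographic and put <code>10</code> before <code>9</code>.</p>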
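<p>Going back to the original grouping task at the top, the whole thing can be sketched in one pass with a <code>defaultdict</code>. This is only a sketch, not the script above: the function name <code>group_line_counts</code> is made up for illustration, and it assumes the filename layout shown at the start (date+time, two underscores, IP):</p>

```python
import os
from collections import defaultdict

def group_line_counts(src, dst):
    """Write one file per date (first 8 digits of each filename in src)
    containing the line count of every file sharing that date."""
    os.makedirs(dst, exist_ok=True)

    # Group line counts by the date part of each filename.
    counts_by_date = defaultdict(list)
    for name in sorted(os.listdir(src)):
        date = name.split('_')[0][:8]   # '201309030700__...' -> '20130903'
        with open(os.path.join(src, name)) as f:
            counts_by_date[date].append(sum(1 for _ in f))

    # Write one output file per date, one count per line.
    for date, counts in counts_by_date.items():
        with open(os.path.join(dst, date + '.txt'), 'w') as out:
            out.write('\n'.join(map(str, counts)) + '\n')
```

<p>Counting with <code>sum(1 for _ in f)</code> avoids reading each whole file into memory the way <code>len(files.readlines())</code> does, which matters with a huge number of files.</p>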