<p>Personally I wouldn't bother using <code>grep</code>; I'd simply use Python's own string filtering. However, that wasn't the question you asked.</p>
<p>Since the filenames are remote and Python sees them simply as strings, we can't use any of Python's own file manipulation routines (e.g. <code>os.path.isdir()</code>). So, I think you have three basic approaches:</p>
<ol>
<li><p>Split each string by slashes and use this to build your own representation of the filesystem tree in memory. Then, do a pass through the tree and only display leaf nodes (i.e. files).</p></li>
<li><p>If you can assume that files within a directory are always listed immediately after that directory, then you can do a quick check against the previous entries to see if this entry is a file within one of those directories.</p></li>
<li><p>Use meta-information from <code>rsync</code>.</p></li>
</ol>
<p>I would suggest the third option. My experience with <code>rsync</code> is that it usually gives you full file information like this:</p>
<pre><code>drwxr-xr-x 4096 2013/06/14 17:19:13 tmp/t
-rwxrwxr-x 14532 2013/06/14 17:17:23 tmp/t/a.out
-rwxrwxr-x 14539 2013/06/14 17:19:13 tmp/t/static-order
</code></pre>
<p>In your example I can't see any code which removes this additional information, and you could easily use this to filter out directories by looking for any line which starts with a <code>d</code> instead of a <code>-</code>.</p>
<p>If you don't have this extended information, you'll need to do one of the other two. The first option is pretty simple - just split by slashes and then descend a standard tree structure, adding entries for directories and files which haven't been seen yet.
Once all the entries have been parsed, you can traverse the tree and print out anything which is a node with no children.</p>
<p>The second option is more complicated, but more memory efficient: you maintain a list of parent directories and check whether any of them is a prefix of the current item in the list. If so, you can be sure the previous one is a directory and the current one is a file, so you can mark the previous one as something not to show. You can also throw items off this list once you've recursed "out" of that directory, provided that <code>rsync</code> returns them in a predictable order. You have to make sure you only check for prefixes at slash boundaries (so <code>foo/dir</code> is not a parent of <code>foo/dir-bar</code>, but it is a parent of <code>foo/dir/bar</code>). Generally this approach is rather fiddly, and unless you're dealing with an awfully large directory tree, one of the other approaches is probably preferable.</p>
<p>By the way, both of the pure string-based approaches also have the disadvantage that an empty directory will be indistinguishable from a file, since it's only the presence or absence of files within a directory which distinguishes them. This is another reason I suggest using the meta-information from <code>rsync</code>.</p>
<p><strong>EDIT</strong></p>
<p>As requested, an example using the <code>rsync</code> meta-data:</p>
<pre><code>import subprocess

cmdline = ["rsync", "-e", "ssh", "-r", "user@host:/dir"]
# universal_newlines=True makes stdout yield text rather than bytes on Python 3.
proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE, universal_newlines=True)
for entry in proc.stdout:
    items = entry.strip().split(None, 4)
    # Skip anything that is not a full five-field listing line.
    if len(items) != 5:
        continue
    if not items[0].startswith("d") and "." in items[4]:
        print(items[4])
</code></pre>
<p>In this example, I'm invoking <code>rsync</code> directly and having it use <code>ssh</code>, assuming that appropriate SSH keys are set up.
I would strongly suggest using SSH keys instead of the <code>sshpass</code> utility - storing your passwords in plaintext is a really bad idea from a security perspective. You can always set up your keys with no passphrase if you're not worried about them being stolen. There are lots of pages which explain how to create SSH keys (<a href="http://www.debian.org/devel/passwordlessssh" rel="nofollow">this one</a>, for example).</p>
<p>Replace <code>user</code>, <code>host</code> and <code>/dir</code> with your username on the remote machine, the hostname of the remote machine and the parent directory you wish to list on the remote machine (you can omit <code>/dir</code> if you want to list the user's home directory). Otherwise the code should run unmodified. It will print the path name of each file that it finds, skipping directories and items which don't contain a dot. If your dot filter was just an attempt to skip directories as well, you can omit <code>and "." in items[4]</code>.</p>
<p><strong>EDIT 2</strong></p>
<p>This example just prints the entries, but of course you'll presumably want to do something else. If you want to be really clever, you could write it as a generator which calls <code>yield</code> on the items as they crop up. I've got an example of this below, which also prints the items, but you can see how it can be used to do anything else. I've also added some better error handling to make sure the use of <code>subprocess</code> can't deadlock:</p>
<p><strong>EDIT 3:</strong> <em>I've updated this example to also include file size and modification time.
This is based on what I get back from my <code>rsync</code> - if yours has a different format you might need to use different members from <code>items</code> or possibly change the format string passed to <code>strptime()</code> to match the format returned by your <code>rsync</code>.</em></p>
<pre><code>from datetime import datetime
import os
import subprocess

def find_remote_files(hostspec):
    cmdline = ["rsync", "-e", "ssh", "-r", hostspec]
    with open(os.devnull, "w") as devnull:
        # universal_newlines=True makes stdout yield text rather than bytes on Python 3.
        proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE, stderr=devnull,
                                universal_newlines=True)
        try:
            for entry in proc.stdout:
                items = entry.strip().split(None, 4)
                # Skip anything that is not a full five-field listing line.
                if len(items) != 5:
                    continue
                if not items[0].startswith("d"):
                    dt = datetime.strptime(" ".join(items[2:4]), "%Y/%m/%d %H:%M:%S")
                    # Some rsync versions print sizes with thousands separators.
                    yield (int(items[1].replace(",", "")), dt, items[4])
            proc.wait()
        except:
            # On any exception, terminate process and re-raise exception.
            proc.terminate()
            proc.wait()
            raise

for filesize, filedate, filename in find_remote_files("user@host:/dir"):
    print("Filename: %s" % (filename,))
    print("(%d bytes, modified %s)" % (filesize, filedate.strftime("%Y-%m-%d")))
</code></pre>
<p>You should be able to paste the whole <code>find_remote_files()</code> function into your code and use it directly, if you like.</p>
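<p>For completeness, here is a minimal sketch of the first, pure-string approach, for the case where only bare path names are available. Instead of building an explicit tree it takes a shortcut: it collects every proper ancestor of every path into a set and treats any entry found in that set as a directory. The function name <code>leaf_paths</code> and the sample listing are made up for illustration, and the caveat about empty directories still applies: they come out looking like files.</p>

```python
def leaf_paths(paths):
    """Return the entries that are not an ancestor of any other entry."""
    parents = set()
    for path in paths:
        parts = path.strip("/").split("/")
        # Record every proper ancestor: "a/b/c" contributes "a" and "a/b".
        # Working on whole components keeps "foo/dir" from being treated
        # as a parent of "foo/dir-bar".
        for depth in range(1, len(parts)):
            parents.add("/".join(parts[:depth]))
    return [p for p in paths if p.strip("/") not in parents]


# Hypothetical listing; "foo/dir" has no children here, so it is reported
# as if it were a file.
listing = ["tmp/t", "tmp/t/a.out", "tmp/t/static-order", "foo/dir", "foo/dir-bar"]
print(leaf_paths(listing))
```

<p>This does not depend on the order in which entries arrive, at the cost of holding the set of directory names in memory.</p>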