StackOverflow2013

plurals

PO
primarykey
Id: 17423953
data
AcceptedAnswerId: 0
AnswerCount: 0
ClosedDate:
CommentCount: 6
CommunityOwnedDate:
CreationDate: 2013-07-02T11:06:43.037
FavoriteCount: 0
LastActivityDate: 2013-07-03T12:56:18.513
LastEditDate: 2013-07-03T12:56:18.513
LastEditorUserId: 1955509
OwnerUserId: 1955509
ParentId: 17417885
PostTypeId: 2
Score: 1
ViewCount: 0
LastEditorDisplayName:
Body
Personally I wouldn't bother using <code>grep</code>; I'd simply use Python's own string filtering. However, that wasn't the question you asked.

Since the filenames are remote and Python sees them simply as strings, we can't use any of Python's own file manipulation routines (e.g. <code>os.path.isdir()</code>). So, I think you have three basic approaches:
<ol>
<li>Split each string by slashes and use this to build your own representation of the filesystem tree in memory. Then, do a pass through the tree and only display leaf nodes (i.e. files).</li>
<li>If you can assume that files within a directory are always listed immediately after that directory, then you can do a quick check against the previous entries to see if this entry is a file within one of those directories.</li>
<li>Use meta-information from <code>rsync</code>.</li>
</ol>
I would suggest the third option. My experience with <code>rsync</code> is that it usually gives you full file information like this:
<pre><code>drwxr-xr-x 4096 2013/06/14 17:19:13 tmp/t
-rwxrwxr-x 14532 2013/06/14 17:17:23 tmp/t/a.out
-rwxrwxr-x 14539 2013/06/14 17:19:13 tmp/t/static-order
</code></pre>
In your example I can't see any code which removes this additional information, and you could easily use this to filter out directories by looking for any line which starts with a <code>d</code> instead of a <code>-</code>.

If you don't have this extended information, you'll need to do one of the other two. The first option is pretty simple: just split by slashes and then descend a standard tree structure, adding entries for directories and files which haven't been seen yet. Once all the entries have been parsed, you can traverse the tree and print out anything which is a node with no children.

The second option is somewhat more complicated, but more memory efficient: you maintain a list of parent directories and check whether they're a prefix of the current item in the list.
If so, you can be sure the previous one is a directory that contains the current one, so you can mark the previous one as something not to show. You can also throw items off this list once you've recursed "out" of that directory, provided that <code>rsync</code> returns them in a predictable order. You have to make sure you only check for prefixes at slash boundaries (so <code>foo/dir</code> is not a parent of <code>foo/dir-bar</code>, but it is a parent of <code>foo/dir/bar</code>). Generally this approach is rather fiddly, and unless you're dealing with an awfully large directory tree, one of the other approaches is probably preferable.

By the way, both of the pure string-based approaches also have the disadvantage that an empty directory will be indistinguishable from a file, since it's only the presence or absence of files within a directory which distinguishes them. This is another reason I suggest using the meta-information from <code>rsync</code>.

EDIT: As requested, an example using the <code>rsync</code> meta-data:
<pre><code>import subprocess

cmdline = ["rsync", "-e", "ssh", "-r", "user@host:/dir"]
# universal_newlines=True makes stdout yield text lines rather than bytes.
proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE,
                        universal_newlines=True)
for entry in proc.stdout:
    # Fields: mode, size, date, time, path (the path may contain spaces).
    items = entry.strip().split(None, 4)
    if not items[0].startswith("d") and "." in items[4]:
        print(items[4])
</code></pre>
In this example, I'm invoking <code>rsync</code> directly and having it use <code>ssh</code>, assuming that appropriate SSH keys are set up. I would strongly suggest using SSH keys instead of the <code>sshpass</code> utility: storing your passwords in plaintext is a really bad idea from a security perspective. You can always set up your keys with no passphrase if you're not worried about them being stolen. There are lots of pages which explain how to create SSH keys (<a href="http://www.debian.org/devel/passwordlessssh" rel="nofollow">this one</a>, for example).
Replace <code>user</code>, <code>host</code> and <code>/dir</code> with your username on the remote machine, the hostname of the remote machine and the parent directory you wish to list on the remote machine (you can omit <code>/dir</code> if you want to list the user's home directory). Otherwise the code should run unmodified. It will print the path name of each file that it finds, skipping directories and items which don't contain a dot. If your dot filter was just an attempt to skip directories as well, you can omit <code>and "." in items[4]</code>.

EDIT 2: This example just prints the entries, but of course you'll presumably want to do something else. If you want to be really clever, you could write it as a generator which calls <code>yield</code> on the items as they crop up. I've got an example of this below, which also prints the items, but you can see how it can be used to do anything else. I've also added some better error handling to make sure the use of <code>subprocess</code> can't deadlock.

EDIT 3: I've updated this example to also include file size and modification time. This is based on what I get back from my <code>rsync</code>. If yours has a different format, you might need to use different members from <code>items</code>, or possibly change the format string passed to <code>strptime()</code> to match the format returned by your <code>rsync</code>.
<pre><code>from datetime import datetime
import os
import subprocess

def find_remote_files(hostspec):
    cmdline = ["rsync", "-e", "ssh", "-r", hostspec]
    with open(os.devnull, "w") as devnull:
        # universal_newlines=True makes stdout yield text lines rather than bytes.
        proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE, stderr=devnull,
                                universal_newlines=True)
        try:
            for entry in proc.stdout:
                # Fields: mode, size, date, time, path.
                items = entry.strip().split(None, 4)
                if not items[0].startswith("d"):
                    dt = datetime.strptime(" ".join(items[2:4]), "%Y/%m/%d %H:%M:%S")
                    yield (int(items[1]), dt, items[4])
            proc.wait()
        except:
            # On any exception (including the caller abandoning the
            # generator), terminate process and re-raise exception.
            proc.terminate()
            proc.wait()
            raise

for filesize, filedate, filename in find_remote_files("user@host:/dir"):
    print("Filename: %s" % (filename,))
    print("(%d bytes, modified %s)" % (filesize, filedate.strftime("%Y-%m-%d")))
</code></pre>
You should be able to paste the whole <code>find_remote_files()</code> function into your code and use it directly, if you like.
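For comparison, the first string-based option described above (split each path by slashes, build a tree in memory, then report only the leaf nodes) might look roughly like the sketch below. It is only an illustration, not part of the original example: the names <code>build_tree</code> and <code>leaf_paths</code> and the sample listing are made up, and it assumes the metadata columns have already been stripped so that each entry is just a relative path. As noted above, an empty directory will be reported as if it were a file.

```python
def build_tree(paths):
    """Build a nested-dict tree from slash-separated path strings."""
    root = {}
    for path in paths:
        node = root
        # Skip empty components produced by leading, trailing or double slashes.
        for part in (p for p in path.split("/") if p):
            # Create the child node only if it has not been seen yet.
            node = node.setdefault(part, {})
    return root


def leaf_paths(tree, prefix=""):
    """Yield the full path of every node that has no children."""
    for name, children in sorted(tree.items()):
        path = prefix + "/" + name if prefix else name
        if children:
            # A node with children must be a directory: descend into it.
            for leaf in leaf_paths(children, path):
                yield leaf
        else:
            # No children: a file (or, indistinguishably, an empty directory).
            yield path


# Hypothetical listing, used purely for illustration.
listing = ["tmp/t", "tmp/t/a.out", "tmp/t/static-order", "tmp/t-old"]
for path in leaf_paths(build_tree(listing)):
    print(path)
```

Because the tree is keyed on whole path components, <code>tmp/t</code> is treated as the parent of <code>tmp/t/a.out</code> but not of <code>tmp/t-old</code>, which avoids the slash-boundary pitfall of the prefix-matching approach.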
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow to modify this rsync command to find out the directory with '.' in Python?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USCartroo
UserOwnerUserId
1. USCartroo
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POHow to modify this rsync command to find out the directory with '.' in Python?
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId
