StackOverflow2013

plurals

POSort by function using bash/coreutils instead of perl
primarykey
Id
20861194
data
AcceptedAnswerId
20864235
AnswerCount
3
ClosedDate
CommentCount
5
CommunityOwnedDate
CreationDate
2013-12-31T17:40:20.433
FavoriteCount
1
LastActivityDate
2015-01-24T21:49:35.763
LastEditDate
2014-01-02T19:36:43.550
LastEditorUserId
411316
OwnerUserId
411316
ParentId
0
PostTypeId
1
Score
8
ViewCount
396
LastEditorDisplayName
text
Body
I found out that if you sort a list of files by file extension rather than alphabetically before putting them in a tar archive, you can dramatically increase the compression ratio (especially for large source trees where you likely have lots of .c, .o, and .h files).
I couldn't find an easy way to sort files using the shell that works in every case the way I'd expect. An easy solution such as <code>find | rev | sort | rev</code> does the job, but the files appear in an odd order, and it doesn't arrange them as nicely for the best compression ratio. Other tools such as <code>ls -X</code> don't work with <code>find</code>, and <code>sort -t. -k 2,2 -k 1,1</code> messes up when files have more than one period in the filename (e.g. version-1.5.tar).
Another quick-n-dirty option uses <code>sed</code> to insert a <code>/</code> (which never occurs in a filename) before the last period, then sorts, splitting along the <code>/</code>, and finally removes the <code>/</code> again:
<pre><code>sed 's/\(\.[^.]*\)$/\/\1/' | sort -t/ -k 2,2 -k 1,1 | sed 's/\/\([^/]*\)$/\1/' </code></pre>
However, once again this doesn't work on the output of <code>find</code>, which has <code>/</code>s in the names, and all other characters (other than NUL) are allowed in filenames in *nix.
I discovered that in Perl you can write a custom comparison subroutine that returns the same values as <code>cmp</code> (similar to <code>strcmp</code> in C), and then run the perl sort function, passing your own custom comparison, which was easy to write with perl regular expressions. This is exactly what I did: I now have a perl script which calls
<pre><code>@lines = <STDIN>; print sort myComparisonFunction @lines; </code></pre>
However, perl is not as portable as bash, so I want to be able to do this with a shell script. In addition, <code>find</code> does not put a trailing / on directory names, so the script thinks directories are the same as files without an extension.
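As a rough sketch of the quick-n-dirty <code>sed</code> approach above, limited to bare file names with no directory part: the helper name <code>sort_names_by_ext</code> is made up for illustration, and <code>LC_ALL=C</code> is an added assumption so that the ordering does not depend on the locale.

```shell
# Hypothetical helper: sort bare file names (no "/" in them) by extension.
# 1. Decorate: insert "/" before the last period (version-1.5.tar -> version-1.5/.tar).
# 2. Sort on the extension (field 2) first, then on the rest of the name (field 1).
# 3. Undecorate: drop the "/" that step 1 inserted.
# LC_ALL=C is an assumption added here for a reproducible byte-wise order.
sort_names_by_ext() {
    sed 's/\(\.[^.]*\)$/\/\1/' |
    LC_ALL=C sort -t/ -k 2,2 -k 1,1 |
    sed 's/\/\([^/]*\)$/\1/'
}
```

Names without a period are left undecorated, so their second sort field is empty and they come out first. Something like <code>ls | sort_names_by_ext</code> would exercise it on a single directory.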
Ideally, I'd like to have <code>tar</code> read all the directories first, then regular files (and sort them), then symbolic links, which I can achieve via
<pre><code>cat <(find -type d) <(find -type f | perl exsort.pl) <(find -not -type d -and -not -type f) | tar --no-recursion -T - -cvf myfile.tar </code></pre>
but I still run into the issue that either I have to type this monstrosity every time, or I have both a shell script for this long line AND a perl script for sorting, and perl isn't available everywhere, so stuffing everything into one perl script isn't a great solution either. (I'm mainly focused on older computers, because nowadays all modern Linux and OSX come with a recent enough version of perl.)
I'd like to be able to put everything together into one shell script, but I don't know how to pass a custom function to the GNU sort tool. Am I out of luck, and have to use one perl script? Or can I do this with one shell script?
EDIT: Thanks for the idea of a Schwartzian Transform. I used a slightly different method, using <code>sed</code>. My final sorting routine is as follows:
<pre><code>sed 's_^\(\([^/]*/\)*\)\(.*\)\(\.[^\./]*\)$_\4/\3/\1_' | sed 's_^\(\([^/]*/\)*\)\([^\./]\+\)$_/\3/\1_' | sort -t/ -k1,1 -k2,2 -k3,3 | sed 's_^\([^/]*\)/\([^/]*\)/\(.*\)$_\3\2\1_' </code></pre>
This handles special characters (such as *) in filenames and places files without an extension first, because they are often text files (Makefile, COPYING, README, configure, etc.).
P.S. In case anyone wants my original comparison function or thinks I could improve on it, here it is:
<pre><code>sub comparison {
    my $first = $a;
    my $second = $b;
    # Directory part: everything up to and including the last slash.
    my $fdir = $first =~ s/^(([^\/]*\/)*)([^\/]*)$/$1/r;
    my $sdir = $second =~ s/^(([^\/]*\/)*)([^\/]*)$/$1/r;
    # File name: everything after the last slash.
    my $fname = $first =~ s/^([^\/]*\/)*([^\/]*)$/$2/r;
    my $sname = $second =~ s/^([^\/]*\/)*([^\/]*)$/$2/r;
    # Base: the name up to and including the last period (empty if there is none).
    my $fbase = $fname =~ s/^(([^\.]*\.)*)([^\.]*)$/$1/r;
    my $sbase = $sname =~ s/^(([^\.]*\.)*)([^\.]*)$/$1/r;
    # Extension: the text after the last period (the whole name if there is none).
    my $fext = $fname =~ s/^([^\.]*\.)*([^\.]*)$/$2/r;
    my $sext = $sname =~ s/^([^\.]*\.)*([^\.]*)$/$2/r;
    # Names without a period sort first.
    if ($fbase eq "" && $sbase ne ""){ return -1; }
    if ($sbase eq "" && $fbase ne ""){ return 1; }
    (($fext cmp $sext) or ($fbase cmp $sbase)) or ($fdir cmp $sdir)
}
</code></pre>
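For readers who want the final <code>sed</code>/<code>sort</code> routine from the EDIT as a single reusable piece, here is a minimal sketch. The function name <code>sort_by_ext</code> is hypothetical, <code>LC_ALL=C</code> is an added assumption for a locale-independent order, and <code>\{1,\}</code> is swapped in for the GNU-specific <code>\+</code> so the expressions stay within POSIX basic regular expressions. It assumes one path per line, so names containing newlines are not handled.

```shell
# Hypothetical wrapper around the decorate / sort / undecorate pipeline.
# Reads one path per line on stdin, writes the sorted paths to stdout.
# 1. Paths whose last component has an extension become  .ext/base/dir/
# 2. Paths whose last component has no period become    /name/dir/
# 3. Sort on extension, then base name, then first directory component.
# 4. Reassemble dir + base + ext.
# Assumptions (not in the original): LC_ALL=C, and \{1,\} instead of \+.
sort_by_ext() {
    sed 's_^\(\([^/]*/\)*\)\(.*\)\(\.[^./]*\)$_\4/\3/\1_' |
    sed 's_^\(\([^/]*/\)*\)\([^./]\{1,\}\)$_/\3/\1_' |
    LC_ALL=C sort -t/ -k1,1 -k2,2 -k3,3 |
    sed 's_^\([^/]*\)/\([^/]*\)/\(.*\)$_\3\2\1_'
}
```

With this defined in the same script, the regular-file part of the archive command could read <code>find . -type f | sort_by_ext</code> in place of the call to <code>perl exsort.pl</code>. Files without an extension get an empty first sort key, which is why they come out ahead of everything else.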
Tags
<regex><perl><bash><shell><sorting>
Title
Sort by function using bash/coreutils instead of perl
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USLeo Izen
UserOwnerUserId
1. USLeo Izen
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POSort by function using bash/coreutils instead of perl
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POSort by function using bash/coreutils instead of perl
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POSort by function using bash/coreutils instead of perl
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
