Writing one file per group in Pig Latin
<p><strong>The Problem:</strong> I have numerous files that contain Apache web server log entries. Those entries are not in date-time order and are scattered across the files. I am trying to use Pig to read a day's worth of files, group and order the log entries by date and time, then write them to files named for the day and hour of the entries they contain.</p> <p><strong>Setup:</strong> Once I have imported my files, I use a regex to extract the date field, then truncate it to the hour. This produces a set that has the original record in one field and the date truncated to the hour in another. From here I group on the date-hour field.</p> <p><strong>First Attempt:</strong> My first thought was to use the STORE command while iterating through my groups with a FOREACH, and I quickly found out that is not cool with Pig.</p> <p><strong>Second Attempt:</strong> My second try was to use the MultiStorage() store function in the piggybank, which worked great until I looked at the output file. The problem is that MultiStorage wants to write all fields to the file, including the field I used to group on. What I really want is just the original record written to the file.</p> <p><strong>The Question:</strong> So... am I using Pig for something it is not intended for, or is there a better way for me to approach this problem using Pig? Now that I have this question out there, I will work on a simple code example to further explain my problem. Once I have it, I will post it here. Thanks in advance.</p>
 
