<h2>Extract time components from semi-standard strings</h2>
<h3>Setup</h3>
<p>I have a column of durations stored as strings in a dataframe. I want to convert them to an appropriate time object, probably <a href="http://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html" rel="nofollow noreferrer">POSIXlt</a>. Most of the strings are easy to parse using <a href="https://stackoverflow.com/questions/9022908/convert-list-vector-of-strings-with-date-format-into-posix-date-class-with-r">this method</a>:</p>
<pre><code>&gt; data &lt;- data.frame(time.string = c(
+   "1 d 2 h 3 m 4 s",
+   "10 d 20 h 30 m 40 s",
+   "--"))
&gt; data$time.span &lt;- strptime(data$time.string, "%j d %H h %M m %S s")
&gt; data$time.span
[1] "2012-01-01 02:03:04" "2012-01-10 20:30:40" NA
</code></pre>
<p>Missing durations are coded <code>"--"</code> and need to be converted to <code>NA</code>. This already happens, and the behavior should be preserved.</p>
<p>The challenge is that <em>the string drops zero-valued elements</em>. Thus the desired value <code>2012-01-01 02:00:14</code> would be the string <code>"1 d 2 h 14 s"</code>. However, this string parses to <code>NA</code> with the simple parser:</p>
<pre><code>&gt; data2 &lt;- data.frame(time.string = c(
+   "1 d 2 h 14 s",
+   "10 d 20 h 30 m 40 s",
+   "--"))
&gt; data2$time.span &lt;- strptime(data2$time.string, "%j d %H h %M m %S s")
&gt; data2$time.span
[1] NA "2012-01-10 20:30:40" NA
</code></pre>
<h3>Questions</h3>
<ol>
<li>What is the "R Way" to handle all the possible string formats? Perhaps test for and extract each element individually, then recombine?</li>
<li>Is POSIXlt the right target class? I need a duration free from any specific start time, so the addition of false year and month data (<code>2012-01-</code>) is troubling.</li>
</ol>
<h3>Solution</h3>
<p>@mplourde definitely had the right idea with dynamic creation of a formatting string based on testing various conditions in the date format.
The addition of <code>cut(Sys.Date(), breaks='years')</code> as the baseline for the <code>difftime</code> was also good, but it failed to account for a critical quirk in <code>as.POSIXct()</code>. <em>Note: I'm using R 2.11 base; this may have been fixed in later versions.</em></p>
<p>The output of <code>as.POSIXct()</code> changes dramatically depending on whether or not a date component is included:</p>
<pre><code>&gt; x &lt;- "1 d 1 h 14 m 1 s"
&gt; y &lt;- "1 h 14 m 1 s" # Same string, no date component
&gt; format (x) # as specified below
[1] "%j d %H h %M m %S s"
&gt; format (y)
[1] "%H h %M m %S s"
&gt; as.POSIXct(x,format=format) # Including the date baselines at year start
[1] "2012-01-01 01:14:01 EST"
&gt; as.POSIXct(y,format=format) # Excluding the date baselines at today start
[1] "2012-06-26 01:14:01 EDT"
</code></pre>
<p>Thus the second argument for the <code>difftime</code> function should be:</p>
<ul>
<li>The start of the first day of the current year if the input string <em>has</em> a day component</li>
<li>The start of the <em>current</em> day if the input string <em>does not</em> have a day component</li>
</ul>
<p>This can be accomplished by changing the unit parameter on the <code>cut</code> function:</p>
<pre><code>parse.time &lt;- function (x) {
  x &lt;- as.character (x)
  break.unit &lt;- ifelse(grepl("d",x),"years","days") # chooses cut() unit
  # Build the format string from only the components present in x
  format &lt;- paste(c(if (grepl("d", x)) "%j d",
                    if (grepl("h", x)) "%H h",
                    if (grepl("m", x)) "%M m",
                    if (grepl("s", x)) "%S s"), collapse=" ")
  if (nchar(format) &gt; 0) {
    difftime(as.POSIXct(x, format=format),
             cut(Sys.Date(), breaks=break.unit),
             units="hours")
  } else {NA} # no recognizable component, e.g. "--"
}
</code></pre>
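<p>The first question floats extracting each element individually and recombining. A minimal sketch of that alternative follows; it is not the approach used in the solution above. The names <code>extract_unit</code> and <code>parse_duration</code> are hypothetical, and the sketch assumes that <code>"1 d"</code> means one full elapsed day (86400 seconds), which differs from the <code>%j</code>-based parsing, where day 1 maps to January 1. It returns a <code>difftime</code>, so no false year or month is attached.</p>

```r
# Hypothetical helper: the number preceding a unit letter, or 0 if the unit is absent.
extract_unit <- function(x, unit) {
  pattern <- paste0("([0-9]+) *", unit)
  hits <- regmatches(x, regexec(pattern, x))
  vapply(hits,
         function(hit) if (length(hit) == 2) as.numeric(hit[2]) else 0,
         numeric(1))
}

# Hypothetical vectorized parser: sums the components into seconds.
# Assumption: "1 d" counts as a full 24-hour day.
parse_duration <- function(x) {
  x <- as.character(x)
  secs <- extract_unit(x, "d") * 86400 +
          extract_unit(x, "h") * 3600 +
          extract_unit(x, "m") * 60 +
          extract_unit(x, "s")
  # Strings with no recognizable component (e.g. "--") become NA
  secs[!grepl("[0-9]+ *[dhms]", x)] <- NA
  as.difftime(secs, units = "secs")
}

durations <- parse_duration(c("1 d 2 h 14 s", "10 d 20 h 30 m 40 s", "--"))
```

<p>Because the result does not depend on <code>Sys.Date()</code>, it is unaffected by the year-start versus day-start baseline issue described above.</p>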