Note that there are some explanatory texts on larger screens.

plurals
  1. PORemove rows from data: overlapping time intervals?
    primarykey
    data
    text
    <p>Edit: I'm looking for solution for this question now also with other programming languages.</p> <p>Based on the <a href="https://stackoverflow.com/questions/9003333/how-to-count-the-number-of-concurrent-users-using-time-interval-data">other question I asked</a>, I have a dataset like this (for R users, dput for this below) which represents user computer sessions:</p> <pre><code> username machine start end 1 user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27 2 user1 D5599.domain.com 2011-01-03 09:46:29 2011-01-03 10:09:16 3 user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17 4 user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15 5 user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16 6 user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23 7 user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22 8 user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59 9 user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43 10 USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53 11 USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22 12 USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43 13 USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38 14 USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43 15 USER2 D5682.domain.com 2011-02-14 14:59:23 2011-02-14 15:14:47 &gt; </code></pre> <p>There may be several concurrent (overlapping based on time) sessions for the same username from the the same computer. How can I remove those rows so that only one sessions is left for this data? Original data set has approx. 500 000 rows.</p> <p>The expected output is (rows 2, 15 removed)</p> <pre><code> username machine start end 1 user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27 3 user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17 4 user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15 5 user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16 6 user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23 7 user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22 8 user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59 9 user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43 10 USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53 11 USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22 12 USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43 13 USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38 14 USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43 &gt; </code></pre> <p>Here is the dataset:</p> <pre><code>structure(list(username = c("user1", "user1", "user1", "user1", "user1", "user1", "user1", "user1", "user1", "USER2", "USER2", "USER2", "USER2", "USER2", "USER2" ), machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("D5599.domain.com", "D5645.domain.com", "D5682.domain.com", "D5686.domain.com", "D5694.domain.com", "D5696.domain.com", "D5772.domain.com", "D5772.domain.com", "D5847.domain.com", "D5855.domain.com", "D5871.domain.com", "D5927.domain.com", "D5927.domain.com", "D5952.domain.com", "D5993.domain.com", "D6012.domain.com", "D6048.domain.com", "D6077.domain.com", "D5688.domain.com", "D5815.domain.com", "D6106.domain.com", "D6128.domain.com" ), class = "factor"), start = structure(c(1294040658, 1294040789, 1294056456, 1294232597, 1297686819, 1298462070, 1300695018, 1307603561, 1294049025, 1294835165, 1295269579, 1295356050, 1295961655, 1297688580, 1297688363), class = c("POSIXct", "POSIXt"), tzone = ""), end = structure(c(1294040847, 1294042156, 1294059377, 1294233795, 1297687216, 1298462303, 1300696342, 1307606339, 1294050583, 1294835573, 1295271862, 1295358163, 1295961878, 1297688863, 1297689287), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("username", "machine", "start", "end"), row.names = c(NA, 15L), class = "data.frame") </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload