  1. POR subset ordered factor
    <p>I have the following data.frame which reports various data on countries over various years. The data is disaggregated by urban/rural, urban slum/urban non-slum, and capital city/other urban centres. Sadly the data is a bit patchy, so not every country has data reported for the same years or across all indicators.</p>
<p>I am trying to subset the data to produce some plots comparing the most recent data from every country. I've created a column in the data.frame called 'latest' which reports whether a row is the most recent year. However, when I try to compare, say, slum/non-slum, the data available is not always from the most recent year. I'd therefore like to create a subset which checks whether data is present in a given row and, if it isn't, selects data from the next most recent year.</p>
<p>I have a feeling this could be achieved by using the order of the factor variable 'Year', but I have no idea how to go about this. I can select only the rows with data in them, but this gives me multiple entries for each country as follows:</p>
<pre><code>fever[(fever$Non.slum!='NA'),]
</code></pre>
<p>Produces this:</p>
<pre><code>      COUNTRY   Year    Urban    Rural    Total Capital.City Other.Cities..towns Non.slum Slum latest
NA       &lt;NA&gt;   &lt;NA&gt;       NA       NA       NA           NA                  NA       NA   NA   &lt;NA&gt;
NA.1     &lt;NA&gt;   &lt;NA&gt;       NA       NA       NA           NA                  NA       NA   NA   &lt;NA&gt;
3    Ethiopia   2011 14.78709 16.03735 15.87641     11.86713            15.28213     10.6 15.4      y
4    Ethiopia   2005 16.00000 18.90000 16.90000     15.70000            16.10000     15.0 16.1      n
5    Ethiopia   2000 22.38637 25.18128 24.90000     19.49970            22.86689     19.9 22.6      n
6       Kenya 2008/9 20.71574 22.58868 22.20000     16.99561            22.39136     19.2 21.8      y
7       Kenya   2003 39.78713 40.75334 40.56866     38.45388            40.49664     31.7 42.8      n
8       Kenya   1998 41.67932 42.44481 42.30155     38.79310            43.27112     36.3 43.7      n
NA.2     &lt;NA&gt;   &lt;NA&gt;       NA       NA       NA           NA                  NA       NA   NA   &lt;NA&gt;
10    Lesotho   2009 12.93654 16.22281 17.90000     13.25208            12.69136     11.7 13.6      y
</code></pre>
<p>So what I need is a function to select only those rows where data
exists in the Slum/Non.Slum column, but only a single entry per COUNTRY based on the most recent data available.</p> <p>I've searched through the forum to try and find an answer but not getting very far:(</p> <p>Can anyone offer any handy advice?</p> <p>Thanks</p> <p>p.s. here's my data:</p> <pre><code>structure(list(COUNTRY = structure(c(1L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L), .Label = c("Comoros", "Eritrea", "Ethiopia", "Kenya", "Lesotho", "Madagascar", "Malawi", "Namibia", "Rwanda", "Swaziland", "Tanzania", "Uganda", "Zambia", "Zimbabwe"), class = "factor"), Year = structure(c(5L, 12L, 25L, 16L, 9L, 22L, 13L, 7L, 2L, 23L, 15L, 22L, 14L, 6L, 1L, 24L, 15L, 9L, 1L, 13L, 6L, 19L, 9L, 1L, 24L, 21L, 16L, 9L, 1L, 19L, 24L, 21L, 15L, 8L, 5L, 1L, 18L, 10L, 4L, 20L, 11L, 5L, 1L, 24L, 17L, 8L, 3L), .Label = c("1992", "1993", "1994", "1995", "1996", "1997", "1998", "1999", "2000", "2000/1", "2001/2", "2002", "2003", "2003/4", "2004", "2005", "2005/6", "2006", "2006/7", "2007", "2007/8", "2008/9", "2009", "2010", "2011"), class = "factor"), Urban = c(47.8, 24.2, 14.7870851371451, 16, 22.3863741902043, 20.7157413622361, 39.7871349722997, 41.6793203690612, 35.8582154033059, 12.9365423294414, 19.8478428266605, 11.9207464780274, 18.5676950229307, 27.6081260825543, 22.9, 30.7, 29.8676525754328, 28.8769863995411, 36.7350808997634, 23.6685395495197, 43.5924904552921, 15.3818829930695, 20.9, 28.4927963558185, 16.7, 18.1130004296917, 25.3, 19, 32.2, 17.6, 29.7, 20.7, 22.5, 29.6313219134818, 30.5, 31.6001273852453, 25, 32.9, 35.2, 16.3, 33.1, 38.1, 33.7, 8.65178666948846, 7.3, 22.6, 34.5), Rural = c(47.6, 32.7, 16.0373484732733, 18.9, 25.181276309133, 22.5886832681651, 40.7533401938621, 42.4448145032958, 38.5298174751626, 16.2228067346473, 26.465049129342, 8.41898094643249, 19.2257425400682, 29.635864119259, 27.7, 35.1, 
38.283749104983, 37.0204386868532, 40.4553536902836, 23.6050855848523, 38.7593744908809, 16.2668541968914, 18.7, 36.7752452450324, 15.6, 20.1615269604521, 26.4, 31, 42.1, 30.3, 21.2, 18.4, 24.9, 31.338473181485, 30.3, 26.3897272106662, 43, 45.3, 47.8, 18.5, 47.6, 41.4, 51.8, 10.1757289609584, 7.6, 27.3, 41.5), Total = c(47.6, 29.8, 15.8764113925424, 16.9, 24.9, 22.2, 40.5686598193627, 42.3015496943942, 38.2, 17.9, 25.5279161214695, 8.8, 19.1, 29.2, 27.1, 34.5, 37.1294935260729, 36, 40.0371752616418, 23.6, 39.8, 15.9, 19.4, 34.0357824734553, 15.8, 19.9, 26.2, 29.1, 41.6, 27.5, 22.9, 18.8, 24.4, 31, 30.3, 27.5, 40.9, 43.9, 46.3, 17.8, 43.1, 40.1, 43.2, 9.72279457486365, 7.5, 25.8, 39.7), Capital.City = c(62.5, 19.3, 11.8671319871973, 15.7, 19.4996995263646, 16.99560676463, 38.4538776537224, 38.7931034482758, 34.1584158415842, 13.2520773874409, 12.7659574468085, 15.6873992936943, 14.9619843565563, 20.3036710627491, 19.5, NA, 32.011454861578, 26.1111111111111, 33.2046332046333, 27.514648271213, 43.2946409100591, 17.6134198692098, 21.2, 28.4927963558185, 17.4, 16.0904522908004, 26.6, 22.7, 31.5, 13.4, NA, NA, 26.1, 29.1331564646512, 29.4, 33.3949166628871, 18.9, 29.8, 30.5, 11, 32.3, 38.7, 26.3, 7.0408031903801, 9.8, 26.7, 38.5), Other.Cities..towns = c(43.7, 27.7, 15.2821312519876, 16.1, 22.8668864784677, 22.3913621285115, 40.4966412598569, 43.2711212401314, 36.8054151596036, 12.6913635462951, 25.601742942828, 9.93629083208327, 20.5991643631925, 31.5220710266449, NA, NA, 28.6811759983293, 30.211847684812, 38.1705931383969, 22.8969512882811, 43.6420373588114, 13.7990084372002, 20.4, 28.6992700604113, NA, 19.6243357194177, 24.4, 17.5, 33, 22.1, NA, NA, 21.5, 29.7793073823722, 30.9, 31.1616047201402, 29.9, 35.6, 39.4, 18.5, 33.4, 37.7, 36.3, 10.2081343223745, 5.3, 18.9, 30), Non.slum = c(NA, NA, 10.6, 15, 19.9, 19.2, 31.7, 36.3, NA, 11.7, 18.8, NA, NA, NA, NA, NA, 24.7, 25.3, 33.7, 13.6, 35.1, 15.6, 19.4, 36.8, NA, 15.4, 21.1, NA, 21.6, 18.1, NA, NA, 21.6, 35.7, 24, 32, 
15.9, 30.8, 28.4, 13.9, 28.5, 35.7, 29.3, 7.6, 7.2, 21.8, 29.7), Slum = c(NA, NA, 15.4, 16.1, 22.6, 21.8, 42.8, 43.7, NA, 13.6, 21.3, NA, NA, NA, NA, NA, 31.4, 31, 37, 24.2, 44.8, 15.1, 23, 34, NA, 18.8, 27, NA, 34.9, 17.1, NA, NA, 22.8, 26.5, 32.3, 31.5, 28.5, 34.2, 36, 17.9, 38.6, 39.3, 35.7, 10.1, 7.4, 31.4, 41.1), latest = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("n", "y"), class = "factor")), .Names = c("COUNTRY", "Year", "Urban", "Rural", "Total", "Capital.City", "Other.Cities..towns", "Non.slum", "Slum", "latest"), row.names = c(NA, -47L), class = "data.frame") </code></pre>
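
One possible way to get a single, most recent row with slum data per country is sketched below, under a few assumptions. Rows with missing slum values are dropped with `is.na()`, a numeric year is derived from the first four characters of `Year` (so "2008/9" is treated as 2008), and the newest remaining row per country is kept. The small data frame is only a stand-in built from a few rows of the posted data, and `has_data`, `year_num` and `latest_slum` are names made up for this illustration.

```r
# Stand-in for the `fever` data frame: a few rows copied from the posted data,
# restricted to the columns needed for the example.
fever <- data.frame(
  COUNTRY  = c("Comoros", "Ethiopia", "Ethiopia", "Kenya", "Kenya", "Rwanda", "Rwanda"),
  Year     = c("1996", "2011", "2005", "2008/9", "2003", "2010", "2007/8"),
  Non.slum = c(NA, 10.6, 15.0, 19.2, 31.7, NA, 15.4),
  Slum     = c(NA, 15.4, 16.1, 21.8, 42.8, NA, 18.8),
  stringsAsFactors = TRUE
)

# Keep only rows where both slum columns are reported. is.na() is used because
# comparing a missing value with != 'NA' yields NA rather than FALSE, and
# indexing rows with NA returns all-NA rows.
has_data <- fever[!is.na(fever$Non.slum) & !is.na(fever$Slum), ]

# Assumption: the first four characters of Year give a sortable start year.
has_data$year_num <- as.integer(substr(as.character(has_data$Year), 1, 4))

# Sort by country with the newest year first, then keep the first row
# encountered for each country.
has_data <- has_data[order(has_data$COUNTRY, -has_data$year_num), ]
latest_slum <- has_data[!duplicated(has_data$COUNTRY), ]

print(latest_slum)
```

In this stand-in, Rwanda's 2010 row has no slum data, so its 2007/8 row is selected instead, and Comoros drops out because it has no slum data in any year.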