<p>This isn't an answer to your question -- so I'll make it CW -- but you might be interested in looking at the <a href="http://pandas.pydata.org/" rel="nofollow">pandas</a> library. It makes working with tabular data a lot more fun than it would be otherwise.</p>

<p>First read in the data (I'm using your <code>NewCaseFile</code> here, which seems comma-delimited, so I called it <code>ncf.csv</code>):</p>

<pre><code>&gt;&gt;&gt; import pandas as pd
&gt;&gt;&gt; df = pd.read_csv("ncf.csv")
&gt;&gt;&gt; df
&lt;class 'pandas.core.frame.DataFrame'&gt;
Int64Index: 932 entries, 0 to 931
Data columns (total 9 columns):
zcta       932  non-null values
xcoord     932  non-null values
ycoord     932  non-null values
m5064      932  non-null values
m6574      932  non-null values
m75plus    932  non-null values
f5064      932  non-null values
f6574      932  non-null values
f75plus    932  non-null values
dtypes: float64(1), int64(8)
&gt;&gt;&gt; df.head() # look at the start of the frame
    zcta    xcoord   ycoord  m5064  m6574  m75plus  f5064  f6574  f75plus
0  51062  211253.4  4733175      0      0        1      0      0        0
1  51011  212255.6  4757939      0      0        1      0      0        0
2  51109  215303.5  4721048      0      1        7      0      1        2
3  51001  215651.1  4746655      1      0        4      0      1        0
4  51103  216887.7  4713568      4      9       28      1      1        8
</code></pre>

<p>Use the x,y,zip columns as an index, and sum across the population columns:</p>

<pre><code>&gt;&gt;&gt; df = df.set_index(["zcta", "xcoord", "ycoord"])
&gt;&gt;&gt; df["total"] = df.sum(axis=1)
&gt;&gt;&gt; df.head()
                        m5064  m6574  m75plus  f5064  f6574  f75plus  total
zcta  xcoord   ycoord
51062 211253.4 4733175      0      0        1      0      0        0      1
51011 212255.6 4757939      0      0        1      0      0        0      1
51109 215303.5 4721048      0      1        7      0      1        2     11
51001 215651.1 4746655      1      0        4      0      1        0      6
51103 216887.7 4713568      4      9       28      1      1        8     51
</code></pre>

<p>Sum by the columns:</p>

<pre><code>&gt;&gt;&gt; df.sum()
m5064       981
m6574      1243
m75plus    2845
f5064      1355
f6574      1390
f75plus    1938
total      9752
dtype: int64
</code></pre>

<p>Et cetera. In particular, it makes it much easier to do many otherwise straightforward-to-explain-but-annoying-in-practice transformations. For example:</p>

<pre><code>&gt;&gt;&gt; df = pd.read_csv("ncf.csv")
&gt;&gt;&gt; d2 = pd.melt(df, id_vars=list(df.columns[:3]))
&gt;&gt;&gt; d2["sex"] = d2["variable"].str[:1]
&gt;&gt;&gt; d2["age_lower"] = d2["variable"].str[1:3].astype(float)
&gt;&gt;&gt; d2["age_upper"] = d2["variable"].str[3:].replace("plus", 100).astype(float)
&gt;&gt;&gt; del d2["variable"]
&gt;&gt;&gt; d2.rename(columns={"value": "count"}, inplace=True)
</code></pre>

<p>gives:</p>

<pre><code>&gt;&gt;&gt; d2.head()
    zcta    xcoord   ycoord  count sex  age_lower  age_upper
0  51062  211253.4  4733175      0   m         50         64
1  51011  212255.6  4757939      0   m         50         64
2  51109  215303.5  4721048      0   m         50         64
3  51001  215651.1  4746655      1   m         50         64
4  51103  216887.7  4713568      4   m         50         64
&gt;&gt;&gt; d2.groupby("sex")["count"].sum()
sex
f      4683
m      5069
Name: count, dtype: int64
</code></pre>

<p>and so on.</p>
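<p>If you don't have <code>ncf.csv</code> to hand, the same melt-and-group steps can be tried on a small in-memory frame. This is only a sketch: the three rows are copied from the <code>df.head()</code> output above, and the names <code>tidy</code>, <code>id_cols</code> and <code>by_sex</code> are made up for illustration. It also assumes, as above, that treating "plus" as an upper bound of 100 is acceptable.</p>

```python
import pandas as pd

# Three sample rows copied from the head() output; not the full 932-row file.
df = pd.DataFrame({
    "zcta": [51062, 51109, 51103],
    "xcoord": [211253.4, 215303.5, 216887.7],
    "ycoord": [4733175, 4721048, 4713568],
    "m5064": [0, 0, 4],
    "m6574": [0, 1, 9],
    "m75plus": [1, 7, 28],
    "f5064": [0, 0, 1],
    "f6574": [0, 1, 1],
    "f75plus": [0, 2, 8],
})

# Wide -> long: one row per (location, sex/age bracket) pair.
id_cols = ["zcta", "xcoord", "ycoord"]
tidy = pd.melt(df, id_vars=id_cols, var_name="variable", value_name="count")

# Column names look like "m5064" or "f75plus": sex, lower age, upper age.
tidy["sex"] = tidy["variable"].str[:1]
tidy["age_lower"] = tidy["variable"].str[1:3].astype(int)
# Assumption: the open-ended "plus" bracket is capped at 100.
tidy["age_upper"] = tidy["variable"].str[3:].replace("plus", "100").astype(int)
tidy = tidy.drop(columns="variable")

# Total count per sex across all locations and age brackets.
by_sex = tidy.groupby("sex")["count"].sum()
print(by_sex)
```

<p>Once the data is in this long shape, grouping by <code>age_lower</code> or by <code>["sex", "age_lower"]</code> works the same way.</p>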