Python - CSV Module, Getting Information From a File
<p>Here is the situation:</p>
<p>The first problem I'm having is with getting information out of a CSV file. The code I'm writing collects a bunch of information on ZCTAs (zip codes) for a number of different cohorts (six are currently being used, but the code is meant to be flexible enough to handle any number of cohorts). One file contains the population, by cohort, for each ZCTA. Another file has the number of 'cases' (cases of cancer observed) for each cohort, for each ZCTA. Another file has the crude rate for each cohort for the state of Iowa (the focus of this research), that is, the rate at which one can 'expect' to see cancer cases in a population, by cohort. There are a couple of other files, but these are the focus, as this is where my issue shows up.</p>
<p>What my code does, initially, is read the population file and get the population of each cohort by ZCTA. Each ZCTA and its information is stored in a list, and those lists are stored in a nested list of lists containing all of the ZCTAs. The code then gets the crude rate. Then each cohort's crude rate is multiplied by that cohort's population for each ZCTA, and the products are summed across all cohorts within the ZCTA, to get the total number of people we can EXPECT to see having cancer in each ZCTA. The population is also summed up. This information is stored in another list for each ZCTA, and those lists are collected in a list containing all of the ZCTAs. That list (every ZCTA with its total population and total number of expected cases) is the focus here.</p>
<p>So, the problem is that I then need to take this newly acquired list, get the number of OBSERVED cases for each cohort, sum those together, append the total to the appropriate ZCTA, and write it to a new file. I have code implemented that does this fine, EXCEPT that the bottom 22 or so ZCTAs don't get the number of observed cases.
I don't know whether the problem is in the code or somewhere else; it works for the other 906 ZCTAs but not for the bottom 22.</p>
<p>Sample data for the files I've discussed (the observed case file and the output file) is available at: <a href="https://gist.github.com/FortyLashes/c90608c3e938587ebd11" rel="nofollow">Gist</a></p>
<p>Here is the code I'm using:</p>
<pre><code>import csv

# Excerpt: zctaPop and thecasescsv come from earlier in the script.
expectedcsv = open('ExpectedCases.csv', 'w', newline='')
expectedwriter = csv.writer(expectedcsv, delimiter=',')
expectedHeader = ['zcta', 'expected', 'pop', 'observed']
thecasesreader = csv.reader(thecasescsv, delimiter=',')
for zcta in zctaPop:
    caseCounter = 0
    # Re-open the case file for every ZCTA so the reader starts from the top.
    thecasescsv = open('NewCaseFile.csv', 'r', newline='')
    thecasesreader = csv.reader(thecasescsv, delimiter=',')
    for case in thecasesreader:
        if case[0] == zcta[0]:
            # Columns 3 onward hold the per-cohort case counts.
            for i in range(3, len(case)):
                caseCounter += int(case[i])
    zcta.append(caseCounter)
    expectedwriter.writerow(zcta)
expectedcsv.close()
thecasescsv.close()
</code></pre>
<p>Something else I would like to bring up: the actual purpose of all of this, later on in the code, is to create an SMR filter for each grid point. The grid points are somewhat arbitrary; they have been placed (via coordinates) over the entire state of Iowa. The SMR is the number of observed cases divided by the number of expected cases. The threshold, that is, how many expected cases a particular filter must contain, is set by the user. So, if a user wants a filter created on 150 expected cases (for each grid point), the code goes through the ZCTAs, summing up the expected cases until more than 150 are found. The distance to this last ZCTA is the 'radius' of the filter.</p>
<p>To do this, I built a distance matrix (the distance from each grid point to every ZCTA) and then sorted it, nearest to furthest. Because of the size of the file (2300 X 930), I have to read this file line by line and get all of the information from other files.
So, starting with the nearest ZCTA, I get the population, expected cases, and observed cases (the problem with this file was discussed above) and add each to its respective counter (one for population, one for observed and one for expected). Then the code goes to the next closest ZCTA and does the same, until the threshold is exceeded.</p>
<p>The problem here is that I couldn't use the CSV Module to read these files, as I was already reading from another file and the index would be lost. So, I had to use just the regular <code>filename.read()</code>, which then required some interesting use of <code>maketrans</code> and <code>.translate</code>. I'm not sure it's efficient or works well. Everything seems to be fine, but without the above problem being fixed, it's impossible to tell. I have included the code below, but was wondering if anybody had any better ideas/suggestions?</p>
<pre><code>expectedCSV = open('ExpectedCases.csv', 'r', newline='')
# Replace carriage returns with spaces before splitting on newlines.
table = str.maketrans('\r', ' ')
content = expectedCSV.read()
expectedCSV.close()
content = content.translate(table)
content = content.split(sep='\n')
newContent = []
for item in content:
    newContent.append(item.split(sep=','))
content = ' '
# Add the totals of the matching ZCTA to the running counters.
for item in newContent:
    if item[0] == currentZcta:
        expectedTotal += float(item[1])
        totalPop += float(item[2])
        totalObservedCount += float(item[3])
</code></pre>
<p>Also, I couldn't figure out how to color the methods blue and the variables red, as some of the more awesome users of this site do. I would be very much interested in learning how to do that for future posts.</p>
<p>If anybody needs more info or anything clarified to help answer/formulate a solution, please, by all means, ask! Thanks for taking the time to read!</p>
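<p>For illustration only, here is a minimal sketch of one way the observed-case lookup and the threshold walk described above could be arranged so that each file is read once. It is not a diagnosis of the missing 22 rows. The function names <code>load_observed</code>, <code>attach_observed</code> and <code>smr_filter</code> are hypothetical, the column layout (ZCTA id in column 0, per-cohort counts from column 3 onward) is taken from the code above, and stripping whitespace from the ids is only a guess at one possible cause of mismatches.</p>

```python
import csv
import io


def load_observed(case_file, first_count_col=3):
    """Read the case file once and total the cohort counts per ZCTA.

    Assumes column 0 is the ZCTA id and the counts start at
    first_count_col, as in the question's code.
    """
    observed = {}
    for row in csv.reader(case_file):
        if not row:
            continue  # skip blank lines
        zcta = row[0].strip()  # assumption: stray whitespace may hide matches
        try:
            total = sum(int(value) for value in row[first_count_col:])
        except ValueError:
            continue  # header row or non-numeric data
        observed[zcta] = observed.get(zcta, 0) + total
    return observed


def attach_observed(zcta_rows, observed):
    """Return copies of the rows with the observed total appended.

    Also returns the ids that had no match in the case file, so that
    silently missing ZCTAs become visible.
    """
    out, missing = [], []
    for row in zcta_rows:
        key = str(row[0]).strip()
        if key not in observed:
            missing.append(key)
        out.append(list(row) + [observed.get(key, 0)])
    return out, missing


def smr_filter(nearest_first, totals, threshold):
    """Walk ZCTAs nearest-first until expected cases exceed the threshold.

    nearest_first: (zcta, distance) pairs already sorted by distance.
    totals: dict mapping zcta -> (expected, population, observed).
    Returns (smr, radius), or None if the threshold is never exceeded.
    """
    expected = observed = 0.0
    for zcta, distance in nearest_first:
        exp, _pop, obs = totals[zcta]
        expected += exp
        observed += obs
        if expected > threshold:
            return observed / expected, distance
    return None
```

<p>In a script, <code>load_observed</code> would be given the handle returned by <code>open('NewCaseFile.csv', newline='')</code>, and the <code>missing</code> list from <code>attach_observed</code> could be printed to see which ZCTA ids never matched a row in the case file.</p>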