How can I more efficiently search a large list in Python?
<p>Problem: I am working with a very large data set that I need to iterate over. Every five minutes my program adds about 1300 rows of information, each with 4 columns. Over the course of one day it therefore gathers about 374,400 rows, or 1,497,600 cells. There are 1300 rows because the program tracks 1300 items every five minutes. For example:</p>
<pre><code>Item_Name        Price  Quantity_in_Stock  Maximum_Stock_Level
----------
Soap             1.00   10                 10
Frogs            1.25   12                 16
Pickled Yogurt   1.35   7                  8
Malodorous Ooze  6.66   6                  66
</code></pre>
<p>I'm trying to tally the changes in the stock level of each unique item over the course of the day. My current technique pulls the entire data set from a MySQL server. I rely upon the item name, the stock level, the maximum stock, and the observation date:</p>
<pre><code>q = """SELECT Item_Name,Item_In_Stock,Item_Max,Observation_Date
       FROM DB
       WHERE Observation_Date&gt;DATE_ADD(curdate(),INTERVAL -1 DAY) """
try:
    x.execute(q)
    conn.commit()
    valueValue = x.fetchall()  # The entire data set
except:
    conn.rollback()
</code></pre>
<p>Then I iterate through each Item_Name, and for each item I find all matching rows:</p>
<pre><code>for item in ItemNames:
    # item[0] is an item name, i.e. Soap, Frogs, Pickled Yogurt, etc.
    matching = [s for s in valueValue if item[0] in s]
</code></pre>
<p>After that, I want to find out the number of items purchased for that day. This is tricky because items are restocked, so I have to compare each time interval against the previous one to see whether the stock level changed (I can't just compare the beginning and the end):</p>
<pre><code>for item in matching:
    if not tempValue:
        tempValue = item[1]  # for the first row, set value equal to the first row
    if tempValue &gt; item[1]:  # if the last row is greater than the current row
        buyCount = buyCount + (item[1]-tempValue)  # add the difference to the buyCount (volume sold)
    tempValue = item[1]  # set tempValue for the next row comparison
</code></pre>
<p>This method works, but it is fairly slow. I've timed the tallying iteration at about 2.2 seconds per unique item (out of the 1300), which means the entire day takes about 50 minutes to calculate. I'd like to cut down on this time if possible. What can I do to improve this searching and tallying function?</p>
<p>EDIT: I've tried letting MySQL do the work with the following code, but it is actually slower than using Python to sort through it all:</p>
<pre><code>for item in getnameValues:  # for each item name, execute the following query
    q = """SELECT Item_Name,Item_In_Stock,Item_Max,Observation_Date
           FROM DB
           WHERE Item_Name=%s and Observation_Date&gt;DATE_ADD(curdate(),INTERVAL -1 DAY) """
    try:
        x.execute(q,item[0])  # executes the query for the current item
        conn.commit()
        valueValue = x.fetchall()
    except:
        conn.rollback()
</code></pre>
<p>I'm assuming I need a way to loop through all the items within MySQL and then have it send a list of lists back to Python. Right?</p>
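<p>For illustration, the per-item scan and the interval-by-interval comparison described above can be folded into a single pass over the fetched rows, so each row is visited once instead of once per item name. The following is only a sketch under stated assumptions: the function name <code>tally_units_sold</code> is hypothetical, rows are assumed to be <code>(Item_Name, Item_In_Stock, Item_Max, Observation_Date)</code> tuples already ordered by observation date (for example via an <code>ORDER BY</code> clause, which the original query does not have), and units sold are counted as positive numbers.</p>

```python
def tally_units_sold(rows):
    """Sum every drop in stock level per item in one pass.

    Assumes `rows` is an iterable of
    (item_name, in_stock, item_max, observation_date) tuples
    sorted by observation_date (an assumption, not from the question).
    """
    last_stock = {}  # item name -> stock level seen at the previous interval
    sold = {}        # item name -> running total of units sold

    for name, in_stock, _item_max, _observed in rows:
        sold.setdefault(name, 0)
        previous = last_stock.get(name)
        # Count only decreases; an increase is treated as a restock.
        # Comparing against None (not falsiness) keeps a stock of 0 valid.
        if previous is not None and in_stock < previous:
            sold[name] += previous - in_stock
        last_stock[name] = in_stock

    return sold


# Made-up sample data; integers stand in for observation timestamps.
sample_rows = [
    ("Soap", 10, 10, 1), ("Frogs", 12, 16, 1),
    ("Soap", 7, 10, 2),  ("Frogs", 12, 16, 2),
    ("Soap", 10, 10, 3), ("Frogs", 9, 16, 3),   # Soap restocked here
    ("Soap", 8, 10, 4),
]
totals = tally_units_sold(sample_rows)
```

<p>Because the dictionaries are keyed by item name, each lookup takes roughly constant time, so the work grows with the number of rows rather than with rows multiplied by items.</p>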