Writing Efficient Queries in SAS Using Proc sql with Teradata
<p>EDIT: Here is a more complete set of code that shows exactly what's going on, per the answer below.</p>
<pre><code>libname output '/data/files/jeff';
%let DateStart = '01Jan2013'd;
%let DateEnd = '01Jun2013'd;

proc sql;
  /* Step 1: IDs with at least one sale in the flagged categories */
  CREATE TABLE output.id AS (
    SELECT DISTINCT id
    FROM mydb.sale_volume AS sv
    WHERE sv.category IN ('a', 'b', 'c')
      AND sv.trans_date BETWEEN &amp;DateStart AND &amp;DateEnd
  );

  /* Step 2: total sales across all categories for those IDs */
  CREATE TABLE output.sums AS (
    SELECT sv.id, SUM(sv.sales) AS total
    FROM mydb.sale_volume AS sv
    INNER JOIN output.id AS ids ON ids.id = sv.id
    WHERE sv.trans_date BETWEEN &amp;DateStart AND &amp;DateEnd
    GROUP BY sv.id
  );
quit;
</code></pre>
<p>The goal is simply to query the table for some IDs based on category membership, and then to sum those members' activity across all categories.</p>
<p>The above approach is far slower than:</p>
<ol>
<li>Running the first query to get the subset</li>
<li>Running a second query that sums every ID</li>
<li>Running a third query that inner joins the two result sets</li>
</ol>
<p>If I'm understanding correctly, it may be more efficient to make sure that all of my code is completely passed through to Teradata rather than cross-loading between SAS and the database.</p>
<hr>
<p>After I posted a question yesterday, a member suggested I might benefit from asking a separate question on performance that was more specific to my situation.</p>
<p>I'm using SAS Enterprise Guide to write some programs and data queries. I don't have permission to modify the underlying data, which is stored in Teradata.</p>
<p>My basic problem is writing efficient SQL queries in this environment. For example, I query a large table (with tens of millions of records) for a small subset of IDs. Then I use this subset to query the larger table again:</p>
<pre><code>proc sql;
  CREATE TABLE subset AS (
    SELECT id
    FROM bigTable
    WHERE someValue = x
      AND date BETWEEN a AND b
  );
quit;
</code></pre>
<p>This runs in a matter of seconds and returns 90k IDs. Next, I want to query this set of IDs against the big table, and problems ensue. I want to sum values over time for the IDs:</p>
<pre><code>proc sql;
  CREATE TABLE subset_data AS (
    SELECT bigTable.id, SUM(bigTable.value) AS total
    FROM bigTable
    INNER JOIN subset ON subset.id = bigTable.id
    WHERE bigTable.date BETWEEN a AND b
    GROUP BY bigTable.id
  );
quit;
</code></pre>
<p>For whatever reason, this takes a really long time. The difference is that the first query filters on 'someValue', while the second looks at all activity, regardless of what's in 'someValue'. For example, I could flag every customer who orders a pizza, and then look at every purchase made by all customers who ordered pizza.</p>
<p>I'm not overly familiar with SAS, so I'm looking for any advice on how to do this more efficiently or speed things up. I'm open to any thoughts or suggestions, and please let me know if I can offer more detail. I guess I'm just surprised the second query takes so long to process.</p>
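<p>For reference, the fully passed-through version mentioned in the edit might look roughly like the sketch below. It uses explicit pass-through (<code>CONNECT TO TERADATA</code>), so the subset and the aggregation both run inside Teradata and only the summed rows come back to SAS. The connection values (<code>myuser</code>, <code>mypass</code>, <code>mytdp</code>) are hypothetical placeholders, and it assumes <code>mydb</code> is also the name of the database on the Teradata side, which may not match the SAS libref. This has not been run against the actual tables.</p>
<pre><code>proc sql;
  /* Hypothetical credentials and server name; replace with real values */
  connect to teradata (user=myuser password=mypass server=mytdp);

  /* Everything inside the inner parentheses is sent to Teradata as written,
     so the dates use Teradata date literals, not SAS date constants.
     Assumes mydb is the Teradata database name. */
  create table output.sums as
  select * from connection to teradata (
    select sv.id, sum(sv.sales) as total
    from mydb.sale_volume sv
    where sv.trans_date between date '2013-01-01' and date '2013-06-01'
      and sv.id in (
        select id
        from mydb.sale_volume
        where category in ('a', 'b', 'c')
          and trans_date between date '2013-01-01' and date '2013-06-01'
      )
    group by sv.id
  );

  disconnect from teradata;
quit;
</code></pre>
<p>The point of writing it this way is that the join never involves a SAS-side table such as <code>output.id</code>, so there is nothing for SAS to pull across from the database before it can aggregate.</p>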