Note that there are some explanatory texts on larger screens.

plurals
  1. POHow can I speed up queries against huge data warehouse tables with effective-dated data?
    primarykey
    data
    text
    <p>So I am querying some extremely large tables. The reason they are so large is because PeopleSoft inserts new records every time a change is made to some data, rather than updating existing records. In effect, its transactional tables are also a data warehouse.</p> <p>This necessitates queries that have nested selects in them, to get the most recent/current row. They are both effective dated and within each date (cast to a day) they can have an effective sequence. Thus, in order to get the current record for <code>user_id=123</code>, I have to do this:</p> <pre><code>select * from sometable st where st.user_id = 123 and st.effective_date = (select max(sti.effective_date) from sometable sti where sti.user_id = st.user_id) and st.effective_sequence = (select max(sti.effective_sequence) from sometable sti where sti.user_id = st.user_id and sti.effective_date = st.effective_date) </code></pre> <p>There are a phenomenal number of indexes on these tables, and I can't find anything else that would speed up my queries.</p> <p>My trouble is that I often times want to get data about an individual from these tables for maybe 50 user_ids, but when I join my tables having only a few records in them with a few of these PeopleSoft tables, things just go to crap.</p> <p>The PeopleSoft tables are on a remote database that I access through a database link. My queries tend to look like this:</p> <pre><code>select st.* from local_table lt, sometable@remotedb st where lt.user_id in ('123', '456', '789') and lt.user_id = st.user_id and st.effective_date = (select max(sti.effective_date) from sometable@remotedb sti where sti.user_id = st.user_id) and st.effective_sequence = (select max(sti.effective_sequence) from sometable@remotedb sti where sti.user_id = st.user_id and sti.effective_date = st.effective_date) </code></pre> <p>Things get even worse when I have to join several PeopleSoft tables with my local table. Performance is just unacceptable.</p> <p>What are some things I can do to improve performance? I've tried query hints to ensure that my local table is joined to its partner in PeopleSoft first, so it doesn't attempt to join all its tables together before narrowing it down to the correct user_id. I've tried the <code>LEADING</code> hint and toyed around with hints that tried to push the processing to the remote database, but the explain plan was obscured and just said 'REMOTE' for several of the operations and I had no idea what was going on.</p> <p>Assuming I don't have the power to change PeopleSoft and the location of my tables, are hints my best choice? If I was joining a local table with four remote tables, and the local table joined with two of them, how would I format the hint so that my local table (which is very small -- in fact, I can just do an inline view to have my local table only be the user_ids I'm interested in) is joined first with each of the remote ones?</p> <p><strong>EDIT:</strong> The application needs real-time data so unfortunately a materialized view or other method of caching data will not suffice.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload