StackOverflow2013


plurals

POVertica and joins
primarykey
Id
13531279
data
AcceptedAnswerId
0
AnswerCount
5
ClosedDate
CommentCount
6
CommunityOwnedDate
CreationDate
2012-11-23T14:42:05.990
FavoriteCount
2
LastActivityDate
2017-10-09T21:07:00.900
LastEditDate
2014-04-03T13:14:23.437
LastEditorUserId
1504392
OwnerUserId
997904
ParentId
0
PostTypeId
1
Score
16
ViewCount
5634
LastEditorDisplayName
text
Body
I'm adapting a web analysis tool to use <code>Vertica</code> as the DB, and I'm having real problems <code>optimizing joins</code>. I tried creating pre-join projections for some of my queries, and while they made the queries blazing fast, they slowed data loading into the fact table to a crawl. A simple <code>INSERT INTO ... SELECT * FROM</code>, which we use to load data into the fact table from a staging table, went from taking ~5 seconds to taking 20+ minutes. Because of this I dropped all pre-join projections and tried using the Database Designer to design query-specific projections, but it's not enough. Even with those projections, a simple join takes ~14 seconds, compared with ~1 second with a pre-join projection. My question is this: is it normal for a pre-join projection to slow data insertion this much, and if not, what could be the culprit? If it is normal, then it's a showstopper for us; are there other techniques we could use to speed up the joins? We're running Vertica on a 5-node cluster, each node having 2 x quad-core CPUs and 32 GB of memory. The tables in my example query have 188,843,085 and 25,712,878 rows respectively.
The EXPLAIN output looks like this:
<pre><code>EXPLAIN SELECT referer_via_.url as referralPageUrl, COUNT(DISTINCT session.id) as visits
FROM owa_session as session
JOIN owa_referer AS referer_via_ ON session.referer_id = referer_via_.id
WHERE session.yyyymmdd BETWEEN '20121123' AND '20121123'
  AND session.site_id = '49'
GROUP BY referer_via_.url
ORDER BY visits DESC
LIMIT 250;

Access Path:
+-SELECT LIMIT 250 [Cost: 1M, Rows: 250 (STALE STATISTICS)] (PATH ID: 0)
| Output Only: 250 tuples
| Execute on: Query Initiator
| +---> SORT [Cost: 1M, Rows: 1 (STALE STATISTICS)] (PATH ID: 1)
| | Order: count(DISTINCT "session".id) DESC
| | Output Only: 250 tuples
| | Execute on: All Nodes
| | +---> GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 1M, Rows: 1 (STALE STATISTICS)] (PATH ID: 2)
| | | Aggregates: count(DISTINCT "session".id)
| | | Group By: referer_via_.url
| | | Execute on: All Nodes
| | | +---> GROUPBY HASH (SORT OUTPUT) (RESEGMENT GROUPS) [Cost: 1M, Rows: 1 (STALE STATISTICS)] (PATH ID: 3)
| | | | Group By: referer_via_.url, "session".id
| | | | Execute on: All Nodes
| | | | +---> JOIN HASH [Cost: 1M, Rows: 1 (STALE STATISTICS)] (PATH ID: 4) Outer (RESEGMENT)
| | | | | Join Cond: ("session".referer_id = referer_via_.id)
| | | | | Execute on: All Nodes
| | | | | +-- Outer -> STORAGE ACCESS for session [Cost: 463, Rows: 1 (STALE STATISTICS)] (PUSHED GROUPING) (PATH ID: 5)
| | | | | | Projection: public.owa_session_projection
| | | | | | Materialize: "session".id, "session".referer_id
| | | | | | Filter: ("session".site_id = '49')
| | | | | | Filter: (("session".yyyymmdd >= 20121123) AND ("session".yyyymmdd <= 20121123))
| | | | | | Execute on: All Nodes
| | | | | +-- Inner -> STORAGE ACCESS for referer_via_ [Cost: 293K, Rows: 26M] (PATH ID: 6)
| | | | | | Projection: public.owa_referer_DBD_1_seg_Potency_20121122_Potency_20121122
| | | | | | Materialize: referer_via_.id, referer_via_.url
| | | | | | Execute on: All Nodes
</code></pre>
Tags
<sql><join><vertica>
Title
Vertica and joins
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USUp_One
UserOwnerUserId
1. USuser997904
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POVertica and joins
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POVertica and joins
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POVertica and joins
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId


Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Rows can be related in a to-one or a to-many fashion. The caption of each to-many relation links to a list view that shows the related table filtered to the matching rows.
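
As a rough illustration of the two kinds of relation, here is a minimal Python sketch using an in-memory SQLite database. The table layout is a hypothetical miniature of the schema, reduced to the few columns needed; the vote row ids are made up, and the helper names to_one and to_many are illustrative only. It is not how this browser is implemented. A to-one relation resolves to at most one row (the post's type), while a to-many relation is the related table filtered by the selected row's key (the post's votes).

```python
import sqlite3

# Hypothetical miniature schema; only the columns needed for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PostTypes (Id INTEGER PRIMARY KEY, Type TEXT);
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Title TEXT,
                    PostTypeId INTEGER REFERENCES PostTypes(Id));
CREATE TABLE Votes (Id INTEGER PRIMARY KEY,
                    PostId INTEGER REFERENCES Posts(Id),
                    VoteTypeId INTEGER);
INSERT INTO PostTypes VALUES (1, 'Question'), (2, 'Answer');
INSERT INTO Posts VALUES (13531279, 'Vertica and joins', 1);
-- Vote ids are invented for illustration.
INSERT INTO Votes VALUES (1, 13531279, 2), (2, 13531279, 2), (3, 13531279, 2);
""")


def to_one(conn, post_id):
    """Follow the to-one relation Posts.PostTypeId -> PostTypes.Id."""
    row = conn.execute(
        "SELECT pt.Type FROM Posts p "
        "JOIN PostTypes pt ON pt.Id = p.PostTypeId "
        "WHERE p.Id = ?",
        (post_id,),
    ).fetchone()
    return row[0] if row else None


def to_many(conn, post_id):
    """Follow the to-many relation: Votes filtered by PostId."""
    return conn.execute(
        "SELECT Id, VoteTypeId FROM Votes WHERE PostId = ? ORDER BY Id",
        (post_id,),
    ).fetchall()


post_type = to_one(conn, 13531279)
votes = to_many(conn, 13531279)
print(post_type, len(votes))
```

The second function corresponds to what a to-many caption links to: the same filter applied to the whole related table, which may also return no rows at all.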

Try moving around until you find a non-empty to-many entry, then click its label to open the corresponding list view. You can return to the root at any time by clicking the database name in the header.