Note that there are some explanatory texts on larger screens.

plurals
  1. POMassive CROSS JOIN in SQL Server 2005
    primarykey
    data
    text
    <p>I'm porting a process which creates a MASSIVE <code>CROSS JOIN</code> of two tables. The resulting table contains 15m records (looks like the process makes a 30m cross join with a 2600 row table and a 12000 row table and then does some grouping which must split it in half). The rows are relatively narrow - just 6 columns. It's been running for 5 hours with no sign of completion. I only just noticed the count discrepancy between the known good and what I would expect for the cross join, so my output doesn't have the grouping or deduping which will halve the final table - but this still seems like it's not going to complete any time soon.</p> <p>First I'm going to look to eliminate this table from the process if at all possible - obviously it could be replaced by joining to both tables individually, but right now I do not have visibility into everywhere else it is used.</p> <p>But given that the existing process does it (in less time, on a less powerful machine, using the FOCUS language), are there any options for improving the performance of large <code>CROSS JOIN</code>s in SQL Server (2005) (hardware is not really an option, this box is a 64-bit 8-way with 32-GB of RAM)?</p> <p>Details:</p> <p>It's written this way in FOCUS (I'm trying to produce the same output, which is a CROSS JOIN in SQL):</p> <pre><code>JOIN CLEAR * DEFINE FILE COSTCENT WBLANK/A1 = ' '; END TABLE FILE COSTCENT BY WBLANK BY CC_COSTCENT ON TABLE HOLD AS TEMPCC FORMAT FOCUS END DEFINE FILE JOINGLAC WBLANK/A1 = ' '; END TABLE FILE JOINGLAC BY WBLANK BY ACCOUNT_NO BY LI_LNTM ON TABLE HOLD AS TEMPAC FORMAT FOCUS INDEX WBLANK JOIN CLEAR * JOIN WBLANK IN TEMPCC TO ALL WBLANK IN TEMPAC DEFINE FILE TEMPCC CA_JCCAC/A16=EDIT(CC_COSTCENT)|EDIT(ACCOUNT_NO); END TABLE FILE TEMPCC BY CA_JCCAC BY CC_COSTCENT AS COST CENTER BY ACCOUNT_NO BY LI_LNTM ON TABLE HOLD AS TEMPCCAC END </code></pre> <p>So the required output really is a CROSS JOIN (it's joining a blank column from each side).</p> <p>In SQL:</p> <pre><code>CREATE TABLE [COSTCENT]( [COST_CTR_NUM] [int] NOT NULL, [CC_CNM] [varchar](40) NULL, [CC_DEPT] [varchar](7) NULL, [CC_ALSRC] [varchar](6) NULL, [CC_HIER_CODE] [varchar](20) NULL, CONSTRAINT [PK_LOOKUP_GL_COST_CTR] PRIMARY KEY NONCLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] CREATE TABLE [JOINGLAC]( [ACCOUNT_NO] [int] NULL, [LI_LNTM] [int] NULL, [PR_PRODUCT] [varchar](5) NULL, [PR_GROUP] [varchar](1) NULL, [AC_NAME_LONG] [varchar](40) NULL, [LI_NM_LONG] [varchar](30) NULL, [LI_INC] [int] NULL, [LI_MULT] [int] NULL, [LI_ANLZ] [int] NULL, [LI_TYPE] [varchar](2) NULL, [PR_SORT] [varchar](2) NULL, [PR_NM] [varchar](26) NULL, [PZ_SORT] [varchar](2) NULL, [PZNAME] [varchar](26) NULL, [WANLZ] [varchar](3) NULL, [OPMLNTM] [int] NULL, [PS_GROUP] [varchar](5) NULL, [PS_SORT] [varchar](2) NULL, [PS_NAME] [varchar](26) NULL, [PT_GROUP] [varchar](5) NULL, [PT_SORT] [varchar](2) NULL, [PT_NAME] [varchar](26) NULL ) ON [PRIMARY] CREATE TABLE [JOINCCAC]( [CA_JCCAC] [varchar](16) NOT NULL, [CA_COSTCENT] [int] NOT NULL, [CA_GLACCOUNT] [int] NOT NULL, [CA_LNTM] [int] NOT NULL, [CA_UNIT] [varchar](6) NOT NULL, CONSTRAINT [PK_JOINCCAC_KNOWN_GOOD] PRIMARY KEY CLUSTERED ( [CA_JCCAC] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] </code></pre> <p>With the SQL Code:</p> <pre><code>INSERT INTO [JOINCCAC] ( [CA_JCCAC] ,[CA_COSTCENT] ,[CA_GLACCOUNT] ,[CA_LNTM] ,[CA_UNIT] ) SELECT Util.PADLEFT(CONVERT(varchar, CC.COST_CTR_NUM), '0', 7) + Util.PADLEFT(CONVERT(varchar, GL.ACCOUNT_NO), '0', 9) AS CC_JCCAC ,CC.COST_CTR_NUM AS CA_COSTCENT ,GL.ACCOUNT_NO % 900000000 AS CA_GLACCOUNT ,GL.LI_LNTM AS CA_LNTM ,udf_BUPDEF(GL.ACCOUNT_NO, CC.COST_CTR_NUM, GL.LI_LNTM, 'N') AS CA_UNIT FROM JOINGLAC AS GL CROSS JOIN COSTCENT AS CC </code></pre> <p>Depending on how this table is subsequently used, it should be able to be eliminated from the process, by simply joining to both the original tables used to build it. However, this is an extremely large porting effort, and I might not find the usage of the table for some time, so I was wondering if there were any tricks to <code>CROSS JOIN</code>ing big tables like that in a timely fashion (especially given that the existing process in FOCUS is able to do it more speedily). That way I could validate the correctness of my building of the replacement query and then later factor it out with views or whatever.</p> <p>I am also considering factoring out the UDFs and string manipulation and performing the CROSS JOIN first to break the process up a bit.</p> <p><strong>RESULTS SO FAR:</strong></p> <p>It turns out that the UDFs do contribute a lot (negatively) to the performance. But there also appears to be a big difference between a 15m row cross join and a 30m row cross join. I do not have SHOWPLAN rights (boo hoo), so I can't tell whether the plan it is using is better or worse after changing indexes. I have not refactored it yet, but am expecting the entire table to go away shortly.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload