Note that there are some explanatory texts on larger screens.

plurals
  1. PORefactoring a tsql view which uses row_number() to return rows with a unique column value
    text
    copied!<p>I have a sql view, which I'm using to retrieve data. Lets say its a large list of products, which are linked to the customers who have bought them. The view should return only one row per product, no matter how many customers it is linked to. I'm using the row_number function to achieve this. (This example is simplified, the generic situation would be a query where there should only be one row returned for each unique value of some column X. Which row is returned is not important)</p> <pre><code>CREATE VIEW productView AS SELECT * FROM (SELECT Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering, customer.Id //various other columns FROM products LEFT OUTER JOIN customer ON customer.productId = prodcut.Id //various other joins ) as temp WHERE temp.prodcut_numbering = 1 </code></pre> <p>Now lets say that the total number of rows in this view is ~1 million, and running select * from productView takes 10 seconds. Performing a query such as select * from productView where productID = 10 takes the same amount of time. I believe this is because the query gets evaluated to this</p> <pre><code>SELECT * FROM (SELECT Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering, customer.Id //various other columns FROM products LEFT OUTER JOIN customer ON customer.productId = prodcut.Id //various other joins ) as temp WHERE prodcut_numbering = 1 and prodcut.Id = 10 </code></pre> <p>I think this is causing the inner subquery to be evaluated in full each time. Ideally I'd like to use something along the following lines</p> <pre><code>SELECT Row_number() OVER(PARTITION BY products.productID ORDER BY products.productID) AS product_numbering, customer.id //various other columns FROM products LEFT OUTER JOIN customer ON customer.productId = prodcut.Id //various other joins WHERE prodcut_numbering = 1 </code></pre> <p>But this doesn't seem to be allowed. Is there any way to do something similar?</p> <p>EDIT - </p> <p>After much experimentation, the actual problem I believe I am having is how to force a join to return exactly 1 row. I tried to use outer apply, as suggested below. Some sample code.</p> <pre><code>CREATE TABLE Products (id int not null PRIMARY KEY) CREATE TABLE Customers ( id int not null PRIMARY KEY, productId int not null, value varchar(20) NOT NULL) declare @count int = 1 while @count &lt;= 150000 begin insert into Customers (id, productID, value) values (@count,@count/2, 'Value ' + cast(@count/2 as varchar)) insert into Products (id) values (@count) SET @count = @count + 1 end CREATE NONCLUSTERED INDEX productId ON Customers (productID ASC) </code></pre> <p>With the above sample set, the 'get everything' query below</p> <pre><code>select * from Products outer apply (select top 1 * from Customers where Products.id = Customers.productID) Customers </code></pre> <p>takes ~1000ms to run. Adding an explicit condition:</p> <pre><code>select * from Products outer apply (select top 1 * from Customers where Products.id = Customers.productID) Customers where Customers.value = 'Value 45872' </code></pre> <p>Takes some identical amount of time. This 1000ms for a fairly simple query is already too much, and scales the wrong way (upwards) when adding additional similar joins.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload