Note that there are some explanatory texts on larger screens.

plurals
  1. POIndexing URL's in SQL Server 2005
    text
    copied!<p>What is the best way to deal with storing and indexing URL's in SQL Server 2005? </p> <p>I have a WebPage table that stores metadata and content about Web Pages. I also have many other tables related to the WebPage table. They all use URL as a key. </p> <p>The problem is URL's can be very large, and using them as a key makes the indexes larger and slower. How much I don't know, but I have read many times using large fields for indexing is to be avoided. Assuming a URL is nvarchar(400), they are enormous fields to use as a primary key.</p> <p>What are the alternatives? </p> <p>How much pain would there likely to be with using URL as a key instead of a smaller field.</p> <p>I have looked into the WebPage table having a identity column, and then using this as the primary key for a WebPage. This keeps all the associated indexes smaller and more efficient but it makes importing data a bit of a pain. Each import for the associated tables has to first lookup what the id of a url is before inserting data in the tables.</p> <p>I have also played around with using a hash on the URL, to create a smaller index, but am still not sure if it is the best way of doing things. It wouldn't be a unique index, and would be subject to a small number of collisions. So I am unsure what foreign key would be used in this case...</p> <p>There will be millions of records about webpages stored in the database, and there will be a lot of batch updating. Also there will be a quite a lot of activity reading and aggregating the data.</p> <p>Any thoughts?</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload