Sitecore search performance when re-indexing and custom IndexingProvider
<p>We are on Sitecore 6.4 and use the shared source Advanced Search module, and we see a big degradation in site search performance when the Sitecore re-index process kicks in and applies the changes to the web database.</p>
<p>When we kick off a full site publish, the indexing manager picks up the changes and processes the history records, which in turn re-indexes each affected item. Because this happens per item, you can watch the Lucene index on disk change while looking at the directory (the number of files grows and changes as you watch).</p>
<p>If you try to search on the public website while this is happening, the search takes noticeably longer to complete; under heavy load it can take up to 15 seconds longer until the re-index process has finished.</p>
<p>I can see this process is controlled by the IndexingProvider class. Is there any way to override this class and implement our own?</p>
<p>We have looked at the searching logic and can see that an IndexSearchContext object is created each time a search is requested, which in turn creates a new IndexSearcher. We have changed some of the logic so that the IndexSearchContext is preserved as a singleton, which means that multiple requests are served by the same Lucene IndexSearcher. This has drastically reduced memory consumption, and reusing the same searcher is the usual recommendation for performance.</p>
<p>However, with this change, updates to the index are not picked up until a new IndexSearcher is created. We need a way to notify our code that the indexing process has finished so that we can reset our singleton IndexSearchContext. How might we integrate this logic into the code that Sitecore configures?</p>
<p>Rebuilding the index manually takes only about 5 seconds. That effectively deletes the index and creates it all again, so why does the item-by-item update take so much longer? Is there not a better way to perform the update without going item by item, and without affecting the public website?</p>
<p>I would have expected others to be affected by this problem, so I'm keen to hear how people have tackled it.</p>
<p><strong>EDIT - additional info from Sitecore forum</strong></p>
<p>The Sitecore.Search code seems to make heavy use of creating and disposing new Lucene objects for each single operation. That does not seem very scalable for large environments, which is why I was surprised when I saw the code, especially where the indexes are large and there are many content updates and publishes each day.</p>
<p>Looking at the classes via dotPeek, I cannot see how we would override the IndexUpdateContext, as it is created in a non-virtual method. A custom DatabaseCrawler could get some access, but only to the context object that has already been created.</p>
<p>I notice that we can define our own Index implementation in the web.config for each index. We can also re-implement the crawler (we already have the advanced crawler in place from the shared module) and perhaps gain some control over the indexing process. I would be reluctant to pull too much of the Sitecore code into our own implementation, as it may affect future updates.</p>
<p>I have one question, though, regarding the IndexingProvider, in the following method:</p>
<pre><code>private void UpdateItem(HistoryEntry entry, Database database)
{
  int count = database.Indexes.Count;
  if (count != 0 || this.OnUpdateItem != null)
  {
    Item obj = database.GetItem(entry.ItemId, entry.ItemLanguage, entry.ItemVersion);
    if (obj != null)
    {
      // Sitecore.Search path: the DatabaseCrawler is subscribed to this event
      if (this.OnUpdateItem != null)
        this.OnUpdateItem((object) this, (EventArgs) new SitecoreEventArgs("index:updateitem", new object[2] { (object) database, (object) obj }, new EventResult()));
      // Sitecore.Data.Indexing path: every index in database.Indexes updates the item as well
      for (int index = 0; index &lt; count; ++index)
        database.Indexes[index].UpdateItem(obj);
    }
  }
}
</code></pre>
<p>It fires the update event, which is handled by the DatabaseCrawler because it is attached to the IndexingProvider.OnUpdateItem event; but why does the method above also call the Sitecore.Data.Indexing.Index.UpdateItem method? I thought that namespace was being deprecated in version 6.5, so I'm surprised to see a link between the new and the old namespaces.</p>
<p>So it looks like the DatabaseCrawler handles the update, deleting the item and then adding it to the index again, and then the old Sitecore.Data.Indexing.Index also tries to update it. Surely something is wrong here? I may well be wrong, so please correct me; this is just how it looks when I trace through the decompiled code without any debugging.</p>
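<p>The singleton-with-reset idea described in the question can be sketched roughly as follows. This is a hypothetical helper, not a Sitecore or Lucene.Net API: <code>SharedSearcher&lt;T&gt;</code>, <code>Use</code> and <code>Refresh</code> are illustrative names, and <code>T</code> stands for whatever disposable object is being shared (an IndexSearchContext, for example, assuming it is disposed when no longer needed). It does not answer the open question of where to detect that indexing has finished; it assumes some hook calls <code>Refresh()</code> at that point. Searches already running keep the old searcher until they return, and the old one is disposed only when the last of them releases it.</p>

```csharp
using System;

// Hypothetical helper (not a Sitecore or Lucene.Net API): shares one searcher between
// requests and swaps in a new one when told that the index has changed.
public sealed class SharedSearcher<T> : IDisposable where T : class, IDisposable
{
    // A searcher plus the number of references currently held on it.
    private sealed class Entry
    {
        public readonly T Searcher;
        public int RefCount = 1; // starts with the holder's own reference

        public Entry(T searcher) { Searcher = searcher; }
    }

    private readonly object _sync = new object();
    private readonly Func<T> _factory;
    private Entry _current;

    public SharedSearcher(Func<T> factory)
    {
        _factory = factory;
        _current = new Entry(factory());
    }

    // Runs one search. The searcher stays open until the callback returns,
    // even if Refresh() swaps in a newer searcher in the meantime.
    public TResult Use<TResult>(Func<T, TResult> search)
    {
        Entry entry;
        lock (_sync)
        {
            if (_current == null) throw new ObjectDisposedException("SharedSearcher");
            entry = _current;
            entry.RefCount++;
        }
        try { return search(entry.Searcher); }
        finally { Release(entry); }
    }

    // Call once indexing has finished: later searches get the new searcher, and the
    // old one is disposed when its last in-flight search releases it.
    public void Refresh()
    {
        T fresh = _factory(); // built outside the lock so searches are not blocked
        Entry old;
        lock (_sync)
        {
            old = _current;
            if (old != null) _current = new Entry(fresh);
        }
        if (old == null)
        {
            fresh.Dispose();
            throw new ObjectDisposedException("SharedSearcher");
        }
        Release(old); // drop the holder's reference to the old searcher
    }

    public void Dispose()
    {
        Entry old;
        lock (_sync)
        {
            old = _current;
            _current = null;
        }
        if (old != null) Release(old);
    }

    private void Release(Entry entry)
    {
        bool last;
        lock (_sync) { last = --entry.RefCount == 0; }
        if (last) entry.Searcher.Dispose(); // no longer current, so nobody can acquire it
    }
}
```

<p>The lock only guards the reference-count bookkeeping, never a search or the construction of a new searcher, so contention should stay small. Counting references, rather than disposing the old searcher as soon as <code>Refresh()</code> runs, is what makes the swap safe under load. Note that calling <code>Refresh()</code> from a per-item event such as IndexingProvider.OnUpdateItem would build a new searcher for every updated item.</p>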
 
