StackOverflow2013


plurals

PO
primarykey
Id
3673197
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2010-09-09T01:36:56.363
FavoriteCount
0
LastActivityDate
2010-09-09T01:36:56.363
LastEditDate
LastEditorUserId
0
OwnerUserId
32187
ParentId
3672605
PostTypeId
2
Score
27
ViewCount
0
LastEditorDisplayName
text
Body
On the e-commerce site that I work for, we use Solr to provide fast faceting and searching of the product catalog. (In non-Solr geek terms, this means the "ATI Cards (34), NVIDIA (23), Intel (5)" style of navigation links that you can use to drill down through product catalogs on sites like Zappos, Amazon, NewEgg, and Lowe's.)
This is because Solr is designed to do this kind of thing fast and well, and trying to do this kind of thing efficiently in a traditional relational database is, well, not going to happen, unless you want to start adding and removing indexes on the fly and go full EAV, which is just cough Magento cough stupid. So our SQL Server database is the "authoritative" data store, and the Solr indexes are read-only "projections" of that data.
You're with me so far because it sounds like you are in a similar situation. The next step is determining whether or not it is OK that the data in the Solr index may be slightly stale. You've probably accepted the fact that it will be somewhat stale, but the next decisions are
<ul> <li>How stale is too stale?</li> <li>When do I value speed or querying features over staleness?</li> </ul>
For example, I have what I call the "Worker", which is a Windows service that uses <a href="http://quartznet.sourceforge.net/" rel="noreferrer">Quartz.NET</a> to execute C# <code>IJob</code> implementations periodically. Every 3 hours, one of the jobs that gets executed is the <code>RefreshSolrIndexesJob</code>, and all that job does is ping an <code>HttpWebRequest</code> over to <code>http://solr.example.com/dataimport?command=full-import</code>. This is because we use Solr's built-in <a href="http://wiki.apache.org/solr/DataImportHandler" rel="noreferrer">DataImportHandler</a> to actually suck in the data from the SQL database; the job just has to "touch" that URL periodically to make the sync work.
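
A job like that can stay very small. The following is only a minimal sketch under stated assumptions, not the actual Worker code: the constructor parameters, the <code>ImportUri</code> property, and the injectable <code>send</code> delegate are hypothetical and exist so the class can be exercised without a live Solr server. The Quartz.NET wiring is left out; in a real job, <code>Execute()</code> would be called from the <code>IJob</code> implementation.

```csharp
using System;
using System.Net;

// Hypothetical sketch of a job that "touches" the DataImportHandler URL.
public sealed class RefreshSolrIndexesJob
{
    private readonly Uri importUri;
    private readonly Func<Uri, HttpStatusCode> send;

    // solrBaseUrl is an assumption, e.g. "http://solr.example.com".
    // send is optional; by default a plain HttpWebRequest is used.
    public RefreshSolrIndexesJob(string solrBaseUrl, Func<Uri, HttpStatusCode> send = null)
    {
        importUri = new Uri(solrBaseUrl.TrimEnd('/') + "/dataimport?command=full-import");
        this.send = send ?? SendWithHttpWebRequest;
    }

    public Uri ImportUri { get { return importUri; } }

    // Returns true when Solr answered the import request with 200 OK.
    public bool Execute()
    {
        return send(importUri) == HttpStatusCode.OK;
    }

    private static HttpStatusCode SendWithHttpWebRequest(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return response.StatusCode;
        }
    }
}
```

On newer .NET versions <code>WebRequest.Create</code> is flagged as obsolete, so an <code>HttpClient</code>-based delegate could be passed in instead.
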
Because the DataImportHandler commits the changes periodically, this is all effectively running in the background, transparent to the users of the Web site. This does mean that information in the product catalog can be up to 3 hours stale. A user might click a link for "Medium In Stock (3)" on the catalog page (since this kind of faceted data is generated by querying Solr) but then see on the product detail page that no mediums are in stock (since on this page, the quantity information is one of the few things that is not cached and is queried directly against the database). This is annoying, but generally rare in our particular scenario (we are a reasonably small business and not that high traffic), and it will be fixed up in 3 hours anyway when we rebuild the whole index again from scratch, so we have accepted this as a reasonable trade-off.
If you can accept this degree of "staleness", then this background worker process is a good way to go. You could take the "rebuild the whole thing every few hours" approach. Or, if rebuilding the entire index from scratch periodically is not reasonable given the size or complexity of your data set, your repository could insert the ID into a table, say, <code>dbo.IdentitiesOfStuffThatNeedsUpdatingInSolr</code>, and then a background process can periodically scan through that table and update only those documents in Solr.
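
The "table of IDs that need updating" variant could be sketched as follows. This is an illustration only: <code>PendingSolrUpdates</code> is a hypothetical in-memory stand-in for the <code>dbo.IdentitiesOfStuffThatNeedsUpdatingInSolr</code> table, and <code>SolrSyncWorker</code> and the <code>reindexDocument</code> callback are assumed names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the table of IDs that need updating in Solr.
public sealed class PendingSolrUpdates
{
    private readonly object gate = new object();
    private readonly HashSet<int> ids = new HashSet<int>();

    // Called by the repository after a successful save.
    public void MarkDirty(int id)
    {
        lock (gate) { ids.Add(id); }
    }

    // Returns a sorted copy of the IDs currently waiting to be reindexed.
    public List<int> Snapshot()
    {
        lock (gate) { return ids.OrderBy(id => id).ToList(); }
    }

    // Removes an ID once Solr has been updated for it.
    public void MarkClean(int id)
    {
        lock (gate) { ids.Remove(id); }
    }
}

public static class SolrSyncWorker
{
    // One pass of the background job: reindex every pending ID and keep
    // the ones that fail so the next pass retries them.
    // Returns the number of documents that were updated.
    public static int RunOnce(PendingSolrUpdates pending, Action<int> reindexDocument)
    {
        int updated = 0;
        foreach (int id in pending.Snapshot())
        {
            try
            {
                reindexDocument(id);
                pending.MarkClean(id);
                updated++;
            }
            catch (Exception)
            {
                // Leave the ID in place; it is retried on the next pass.
            }
        }
        return updated;
    }
}
```

A real table would probably also carry a timestamp or version column, so that a document saved again while it is being reindexed is not marked clean by mistake.
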
A third approach is to have your repository spawn a background thread that updates the Solr index for the current document more or less at the same time, so the data is only stale for a few seconds:
<pre><code>class MyRepository
{
    void Save(Post post)
    {
        // the following method runs on the current thread
        SaveThePostInTheSqlDatabaseSynchronously(post);

        // the following method spawns a new thread, task,
        // QueueUserWorkItem, whatever floats our boat this week,
        // and so returns immediately
        UpdateTheDocumentInTheSolrIndexAsynchronously(post);
    }
}
</code></pre>
But if this explodes for some reason, you might miss updates in Solr, so it's still a good idea to have Solr do a periodic "blow it all away and refresh", or have a reaper background Worker-type service that checks for out-of-date data in Solr every once in a blue moon.
As for querying this data from Solr, there are a few approaches you could take. One is to hide the fact that Solr exists entirely via the methods of the Repository. I personally don't recommend this because chances are your Solr schema is going to be shamelessly tailored to the UI that will be accessing that data; we've already made the decision to use Solr to provide easy faceting, sorting, and fast display of information, so we might as well use it to its fullest extent. This means making it explicit in code when we mean to access Solr and when we mean to access the up-to-date, non-cached database object.
In my case, I end up using NHibernate to do the CRUD access (loading an <code>ItemGroup</code>, futzing with its pricing rules, and then saving it back), forgoing the repository pattern because I don't typically see its value when NHibernate and its mappings are already abstracting the database. (This is a personal choice.)
But when querying on the data, I know pretty well if I'm using it for catalog-oriented purposes (I care about speed and querying) or for displaying in a table on a back-end administrative application (I care about freshness). For querying on the Web site, I have an interface called <code>ICatalogSearchQuery</code>. It has a <code>Search()</code> method that accepts a <code>SearchRequest</code> where I define some parameters--selected facets, search terms, page number, number of items per page, etc.--and gives back a <code>SearchResult</code>--remaining facets, number of results, the results on this page, etc. Pretty boring stuff.
Where it gets interesting is that the implementation of that <code>ICatalogSearchQuery</code> is using a list of <code>ICatalogSearchStrategy</code>s underneath. The default strategy, the <code>SolrCatalogSearchStrategy</code>, hits Solr directly via a plain old-fashioned <code>HttpWebRequest</code> and parses the XML in the <code>HttpWebResponse</code> (which is much easier to use, IMHO, than some of the Solr client libraries, though they may have gotten better since I last looked at them over a year ago). If that strategy throws an exception or vomits for some reason, then the <code>DatabaseCatalogSearchStrategy</code> hits the SQL database directly--although it ignores some parameters of the <code>SearchRequest</code>, like faceting or advanced text searching, since that is inefficient to do there and is the whole reason we are using Solr in the first place.
The idea is that usually Solr is answering my search requests quickly in full-featured glory, but if something blows up and Solr goes down, then the catalog pages of the site can still function in "reduced-functionality mode" by hitting the database with a limited feature set directly. (Since we have made explicit in code that this is a search, that strategy can take some liberties in ignoring some of the search parameters without worrying about affecting clients too severely.)
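
The fallback between strategies could look roughly like this. It is a minimal sketch, not the actual implementation: the members of <code>SearchRequest</code> and <code>SearchResult</code> are heavily simplified assumptions, and <code>CatalogSearchQuery</code> and <code>FuncSearchStrategy</code> are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Simplified request/result types; real ones would carry facets, paging, etc.
public sealed class SearchRequest
{
    public string Terms { get; set; } = "";
}

public sealed class SearchResult
{
    public string Source { get; set; } = "";
    public List<string> Items { get; } = new List<string>();
}

public interface ICatalogSearchStrategy
{
    SearchResult Search(SearchRequest request);
}

public interface ICatalogSearchQuery
{
    SearchResult Search(SearchRequest request);
}

// Small adapter so a strategy can be supplied as a delegate.
public sealed class FuncSearchStrategy : ICatalogSearchStrategy
{
    private readonly Func<SearchRequest, SearchResult> search;

    public FuncSearchStrategy(Func<SearchRequest, SearchResult> search)
    {
        this.search = search;
    }

    public SearchResult Search(SearchRequest request)
    {
        return search(request);
    }
}

// Tries each strategy in order (e.g. Solr first, then the SQL database)
// and returns the first result that does not throw.
public sealed class CatalogSearchQuery : ICatalogSearchQuery
{
    private readonly IReadOnlyList<ICatalogSearchStrategy> strategies;

    public CatalogSearchQuery(IReadOnlyList<ICatalogSearchStrategy> strategies)
    {
        this.strategies = strategies;
    }

    public SearchResult Search(SearchRequest request)
    {
        var failures = new List<Exception>();
        foreach (var strategy in strategies)
        {
            try
            {
                return strategy.Search(request);
            }
            catch (Exception ex)
            {
                // Remember the failure and fall through to the next strategy.
                failures.Add(ex);
            }
        }
        throw new AggregateException("All catalog search strategies failed.", failures);
    }
}
```

Callers only see <code>ICatalogSearchQuery</code>, so the order of the strategy list is the single place where the "Solr first, database second" policy is decided.
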
Key takeaway: What is important is that the decision to perform a query against a possibly-stale data store versus the authoritative data store has been made explicit--if I want fast, possibly stale data with advanced search features, I use <code>ICatalogSearchQuery</code>. If I want slow, up-to-date data with the insert/update/delete capability, I use NHibernate's named queries (or a repository in your case). And if I make a change in the SQL database, I know that the out-of-process Worker service will update Solr eventually, making things eventually consistent. (And if something was really important, I could broadcast an event or ping the Solr store directly, telling it to update, possibly in a background thread if I had to.) Hope that gives you some insight.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POWhere / How to fit Solr into ASP.net MVC app (using nHibernate / Repository Pattern)
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USNicholas Piasecki
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POWhere / How to fit Solr into ASP.net MVC app (using nHibernate / Repository Pattern)
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COexcellent response! I use Solr indexing slightly differently in that the Solr config is set up to batch query new records at a set period. This way no code had to be written, just a change to the Solr config. Once Solr returns search matches I currently load all the data for each match from NHibernate, though I intend to change this to have all the required display data returned by Solr at some point. Never managed to get the batch import working, but need to do this soon in case the index corrupts or I change the indexed fields.
 singulars
 PostPostId
 PO
 UserUserId
 USJordan
