StackOverflow2013

plurals

How to optimize "text search" for inverted index and relational database?
primarykey
Id: 10820060
data
AcceptedAnswerId: 33154252
AnswerCount: 4
ClosedDate: 2017-02-21T23:56:34.107
CommentCount: 2
CommunityOwnedDate:
CreationDate: 2012-05-30T16:07:33.157
FavoriteCount: 0
LastActivityDate: 2015-10-15T22:08:40.097
LastEditDate: 2015-10-15T22:08:40.097
LastEditorUserId: 1063062
OwnerUserId: 1063062
ParentId: 0
PostTypeId: 1
Score: 12
ViewCount: 1146
LastEditorDisplayName:
text
Body
<h1>Update 2015-10-15</h1> Back in 2012, I was building a personal online application and deliberately wanted to reinvent the wheel. I am curious by nature, and I wanted to learn and to improve my algorithm and architecture skills. I could have used Apache Lucene or similar tools, but, as I said, I decided to build my own mini search engine. Question: is there really no way to improve this architecture other than by using existing tools such as Elasticsearch, Lucene and others? <hr> <h1>Original question</h1> I am developing a web application in which users search for specific titles (for example: book x, book y, etc.), whose data is stored in a relational database (MySQL). I follow the principle that every record fetched from the database is cached in memory, so that the application makes fewer calls to the database. I have developed my own mini search engine with the following architecture: <img src="https://i.stack.imgur.com/4ZRYv.jpg" alt="Architecture diagram"> This is how it works: <ul> <li>a) The user searches for a record name.</li> <li>b) The system checks which character the query starts with and whether the query is already there. If it is, it gets the records. If not, it adds the query and gets all matching records from the database in one of two ways: <ul> <li>Either the query is already in the table "Queries" (a sort of history table), in which case the records are fetched by their IDs (fast).</li> <li>Or, otherwise, it uses a MySQL <code>LIKE '%query%'</code> statement to get the records/IDs, and then stores the user's query in the history table "Queries" along with the IDs it maps to. 
--> Then it adds the records and their IDs to the cache, and only the IDs to the inverted index map.</li> </ul></li> <li>c) The results are returned to the UI.</li> </ul> The system works fine, but I have two main issues that I couldn't find a good solution for (I have been trying for the past month). First issue: in point (b), when no query "history" is found and the system has to fall back to the <code>LIKE '%query%'</code> statement, the process becomes time-consuming if the query matches many records in the database (instead of one or two): <ul> <li>It takes time to get the records from MySQL (this is why I added indexes on the relevant columns).</li> <li>Then it takes time to save the query history.</li> <li>Then it takes time to add the records/IDs to the cache and to the inverted index map.</li> </ul> Second issue: the application lets users add new records themselves, and these records must be immediately usable by other users logged in to the application. To achieve this, the inverted index map and the table "Queries" have to be updated so that any old query that matches a new record also maps to it. For example, if a new record "woodX" is added, the old query "wood" must still map to it. So, to re-hook the query "wood" to this new record, here is what I currently do: <ul> <li>The new record "woodX" is added to the "records" table.</li> <li>Then I run a <code>LIKE</code> statement to find which existing queries in the table "Queries" match this record (for example "wood"), and I add each such query with the new record ID as a new row: [wood, new id].</li> <li>Then, in memory, I update the value of the inverted index map's "wood" key (i.e. the list of IDs) by adding the new record ID to it.</li> </ul> --> Thus, if a remote user now searches for "wood", they get both wood and woodX from memory. The issue here is, again, the time it takes. Matching all query histories (in the table "Queries") against the newly added word takes a lot of time (the more matching queries, the longer it takes). The in-memory update then also takes a lot of time. 
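Step (b) and the re-hook procedure from the second issue can be modeled in memory. The sketch below is minimal and built on stated assumptions. It is not the author's code. The MySQL "records" table and the "Queries" history are simulated with Java maps. LIKE '%query%' is approximated by a lower-cased String.contains, which mimics a case-insensitive collation. The first-character check is omitted. The class and method names (QueryCache, search, addRecord) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of steps (a)-(c) and of the re-hook step.
// Not thread-safe: a servlet would need synchronization or concurrent maps.
class QueryCache {
    // Stand-in for the MySQL "records" table: id -> title.
    private final Map<Long, String> records = new LinkedHashMap<>();
    // Record cache: records already fetched once, by id.
    private final Map<Long, String> recordCache = new HashMap<>();
    // Inverted index map (also standing in for the "Queries" history table):
    // lower-cased query -> ids of the records it matched.
    private final Map<String, List<Long>> queryIndex = new HashMap<>();
    private long nextId = 1;

    // Steps (b) and (c): answer from the history if possible, else scan.
    List<String> search(String rawQuery) {
        String q = rawQuery.toLowerCase(Locale.ROOT);
        List<Long> ids = queryIndex.get(q);
        if (ids == null) {
            // Equivalent of WHERE title LIKE '%q%': every record is examined,
            // which is the slow path described in the first issue.
            ids = new ArrayList<>();
            for (Map.Entry<Long, String> e : records.entrySet()) {
                if (e.getValue().toLowerCase(Locale.ROOT).contains(q)) {
                    ids.add(e.getKey());
                    recordCache.put(e.getKey(), e.getValue());
                }
            }
            queryIndex.put(q, ids); // remember the query and the ids it maps to
        }
        List<String> titles = new ArrayList<>();
        for (long id : ids) {
            // Serve from the cache, loading from the "table" only on a miss.
            titles.add(recordCache.computeIfAbsent(id, records::get));
        }
        return titles;
    }

    // Second issue: store a new record and re-hook every stored query it matches.
    long addRecord(String title) {
        long id = nextId++;
        records.put(id, title);
        recordCache.put(id, title);
        String lower = title.toLowerCase(Locale.ROOT);
        // One containment check per stored query, so the cost grows with the history.
        for (Map.Entry<String, List<Long>> e : queryIndex.entrySet()) {
            if (lower.contains(e.getKey())) {
                e.getValue().add(id); // e.g. "wood" now also maps to "woodX"
            }
        }
        return id;
    }
}
```

The example makes both slow paths visible. A history miss scans every record. addRecord checks every stored query, so its cost grows with the size of the history, which matches the slowdown described in the second issue.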
To fix this time issue, I am thinking of returning the results to the user first and then letting the application send an AJAX POST with the required data to perform all these update tasks. But I am not sure whether this is bad practice or an unprofessional way of doing things. For the past month (a bit more), I have tried to find the best optimization or modification for this architecture, but I am not an expert in document retrieval (this is actually the first mini search engine I have ever built). I would appreciate any feedback or guidance on what I should do to achieve this kind of architecture. Thanks in advance. PS: <ul> <li>It's a J2EE application using servlets.</li> <li>I am using MySQL with InnoDB, so I cannot use the full-text search option (InnoDB supports FULLTEXT indexes only from MySQL 5.6 onward).</li> </ul>
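The idea of returning results first and finishing the updates later does not have to depend on a second AJAX request from the browser. One hedged alternative, swapped in here and not part of the original design, is to hand the re-hook work to a background thread inside the servlet application using java.util.concurrent.ExecutorService. This is a minimal sketch. The class and method names (DeferredIndexUpdater, remember, lookup, onRecordAdded) are hypothetical, and substring matching again stands in for LIKE.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: respond to the user first, then re-hook stored
// queries on a background thread instead of waiting for an AJAX POST.
class DeferredIndexUpdater {
    // Thread-safe map and lists: request threads read while the worker writes.
    private final Map<String, List<Long>> queryIndex = new ConcurrentHashMap<>();
    // One daemon worker serializes all index updates and never keeps the JVM alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "index-updater");
        t.setDaemon(true);
        return t;
    });

    // Store a query together with the ids it matched.
    void remember(String query, List<Long> ids) {
        queryIndex.put(query.toLowerCase(Locale.ROOT), new CopyOnWriteArrayList<>(ids));
    }

    // Ids currently mapped to a query (empty if the query was never stored).
    List<Long> lookup(String query) {
        return queryIndex.getOrDefault(query.toLowerCase(Locale.ROOT), List.of());
    }

    // Called right after the new record is saved; returns without waiting.
    Future<?> onRecordAdded(long id, String title) {
        String lower = title.toLowerCase(Locale.ROOT);
        return worker.submit(() -> {
            for (Map.Entry<String, List<Long>> e : queryIndex.entrySet()) {
                if (lower.contains(e.getKey())) {
                    e.getValue().add(id);
                }
            }
        });
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

The trade-off is eventual consistency. Until the background task finishes, a search for "wood" may not yet include the new record. In a servlet container, the executor would typically be shut down from a ServletContextListener's contextDestroyed method.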
Tags: <architecture><search-engine><inverted-index><text-search>
Title: How to optimize "text search" for inverted index and relational database?
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. Question
UserLastEditorUserId
1. shadesco
UserOwnerUserId
1. shadesco
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 How to optimize "text search" for inverted index and relational database?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 UpMod
CommentsPostId
1. This table or related slice is empty.
