Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p><code>My task is to find business objects which are exactly same as given object (ones which have exactly same set of properties with exactly same values). Performance is critical.</code></p> <p>Approach might vary depending on the average number of properties objects will typically have, a few versus dozens. </p> <p>Assuming that objects have a varying number of properties:</p> <p>I would start with a composite non-unique index on the dyad (PropertyValues.PropertyId, PropertyValues.PropertyValue) for select-performance.</p> <p>Then, given an entity ID, I would select its propertid, propertyvalue pairs into a cursor. </p> <p>[EDIT: Not sure whether (entityid, propertyid) is unique in your system or if you are allowing multiple instances of the same property id for an entity, e.g. FavoriteColors:</p> <pre><code> entityid propertyid property value 1 17 blue 1 17 dark blue 1 17 sky blue 1 17 ultramarine </code></pre> <p>You would also need either a non-unique index on the monad (PropertyValues.entityid) or a composite index on (PropertyValues.entityid,PropertyValues.propertyid); the composite index would be unique if you wanted to prevent the same propertyid from being associated with an entity more than once.</p> <p>If a property can occur multiple times, you should probably have a CanBeMultivalued flag in your Properties table. You should have a unique index on the triad (entityid, propertyid, propertyvalue) if you wanted to prevent this:</p> <pre><code> entityid propertyid property value 1 17 blue 1 17 blue </code></pre> <p>If you have this triad indexed, you would not need (entityid) index or the (entityid, propertyid) composite index in the PropertyValues table.</p> <p>[/EDIT]</p> <p>Then I would create a temp table to store matching entity ids.</p> <p>Then I would iterate my cursor above to grab the given entity's propertyid, propertyvalue pairs, one pair at a time, and issue a select statement with each iteration:</p> <pre><code> insert into temp select entityid from PropertyValues where propertyid = mycursor.propertyid and propertyvalue = mycursor.propertyvalue </code></pre> <p>At the end of the loop, you have a non-distinct set of entityids in your temp table for all entities that had <em>at least one of the properties in common with the given object</em>. But the ones you want must have <em>all</em> properties in common.</p> <p>Since you know how many properties the given object has, you can do the following to fetch only those entities that have all of the properties in common with the given object:</p> <pre><code> select entityid from temp group by entityid having count(entityid) = {the number of properties in the given object} </code></pre> <p>ADDENDUM:</p> <p>After the first property-value pair of the given object is used to select all potential matches, your temp table would not be missing any possible matches; rather it would contain entityids that were not perfect matches, which must be discarded in some manner, either by being ignored (by your group by having... clause) or by being explicitly removed from the temp table.</p> <p>Also, after the first iteration of the loop, you could explore the possibility that an inner join between the temp table and the PropertyValues table might offer some performance gain:</p> <pre><code> select entityid from propertvalues &gt;&gt; inner join temp on temp.entityid = propertyvalues.entityid &lt;&lt; where propertyid = mycursor.propertyid and propertyvalue = mycursor.propertyvalue </code></pre> <p>And you might also try removing entityids from temp after the first iteration:</p> <pre><code> delete from temp where not exists ( select entityid from propertyvalues inner join temp on temp.entityid = propertyvalues.entityid where propertyid = mycursor.propertyid and propertyvalue = mycursor.propertyvalue ) </code></pre> <p>Alternatively, it would be possible to optimize this looping approach further if you stored some metadata about property-frequency. Optimally, when looking for matches for a given entity, you'd want to begin with the <em>least frequently occuring property-value pair</em>. You could order the given object's property-value pairs by ascending frequency, so that in your loop you'd be looking for the rarest one first. That would reduce the set of potential matches to its smallest possible size on the first iteration of the loop.</p> <p>Of course, if temp were empty at any time after the given object's first property-value pair was used to look for matches, you would know that there are no matches for your given object, because you have found a property-value that no other entity possesses, and you could exit the loop and return a null set.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload