Efficient Query Generation for Permutations of Tags
<p>Here's a simplified version of a problem I'm encountering at work. The details have been changed and generalized so I can explain it more easily.</p>
<p>Let's say you have a blog engine that allows blog posts to be assigned tags when they're created. So I could write a post titled "My Vacation in Italy" and add the following tags to it: <code>has-photos</code>, <code>vacation</code>, <code>family</code>. As part of my blog engine, I can create custom actions based on groups of tags. Suppose I decided beforehand that any post with both the <code>has-photos</code> and <code>family</code> tags will be automatically shared on Facebook. When a post is first created, I then have to cross-reference all of its tags with all actions that can be triggered by combinations of those tags.</p>
<p>When the "My Vacation in Italy" post is saved, I need to look up all actions for the following groups of tags:</p>
<ul>
<li><code>has-photos</code></li>
<li><code>vacation</code></li>
<li><code>family</code></li>
<li><code>has-photos</code> &amp; <code>vacation</code></li>
<li><code>has-photos</code> &amp; <code>family</code></li>
<li><code>vacation</code> &amp; <code>family</code></li>
<li><code>has-photos</code> &amp; <code>vacation</code> &amp; <code>family</code></li>
</ul>
<p>Generating that query is trivial: I just take every non-empty combination (subset) of the post's original tag set. For <code>N</code> tags, that comes out to <code>2^N - 1</code> tag combinations.</p>
<p>The problem arises when you put this up against large datasets. Here's what we're dealing with:</p>
<ul>
<li>10,000+ "posts" arriving daily</li>
<li>20+ "tags" per "post"</li>
<li>1,000s of "actions" already existing when blog posts arrive, each triggered on a varying number of tags</li>
</ul>
<p>When a post arrives with 20 tags, that comes out to a little over a million combinations (<code>2^20 - 1 = 1,048,575</code>) I'd be generating a query for. Even if my database allowed me to send it query strings that large (hint: it doesn't), the query would still take forever to run.</p>
<p>Is there a clever solution to this that I'm not thinking of? Right now, as I see it, I'm left with one possibility:</p>
<h3>Actions use OR instead of AND</h3>
<p>I could change it so that when you create a pre-defined action, the tags it acts on are implicitly OR'ed instead of AND'ed. Then the number of tag combinations to look up drops from <code>2^N - 1</code> to just <code>N</code>. Unfortunately, this would severely limit the usefulness of the "tag action" feature.</p>
<p>Edit: I'm not necessarily looking for an answer in SQL, just a different approach to solving this problem, even if it's only a high-level description.</p>
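<p>As an illustration only, the tag-group enumeration and the AND/OR trigger semantics described above might be sketched in Python as follows. This is not the actual blog engine code: the function names <code>tag_groups</code>, <code>fires_with_and</code>, and <code>fires_with_or</code> are hypothetical, and actions are assumed to be plain collections of tag strings.</p>

```python
from itertools import combinations


def tag_groups(tags):
    """Yield every non-empty group of tags as a frozenset.

    For N distinct tags this yields 2^N - 1 groups, which is the
    number of lookups the question describes.
    """
    ordered = sorted(set(tags))
    for size in range(1, len(ordered) + 1):
        for group in combinations(ordered, size):
            yield frozenset(group)


def fires_with_and(action_tags, post_tags):
    """AND semantics: the post must carry every tag the action names."""
    return set(action_tags) <= set(post_tags)


def fires_with_or(action_tags, post_tags):
    """OR semantics: the post must carry at least one of the action's tags."""
    return not set(action_tags).isdisjoint(post_tags)


# Hypothetical example data taken from the post in the question.
post_tags = ["has-photos", "vacation", "family"]
groups = list(tag_groups(post_tags))

print(len(groups))   # 7 groups for 3 tags (2^3 - 1)
print(2 ** 20 - 1)   # 1048575 groups for a post with 20 tags
```

<p>With three tags the enumeration stays tiny, but the count doubles with every additional tag, which is why a 20-tag post produces over a million groups.</p>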