Note that there are some explanatory texts on larger screens.

plurals
  1. POEfficient Implementation of a Notification System -- Should I use or avoid JOINs?
    primarykey
    data
    text
    <h2>The Tables</h2> <p>Let us assume that we have a table of articles:</p> <pre><code>CREATE TABLE articles ( id integer PRIMARY KEY, last_update timestamp NOT NULL, ... ); </code></pre> <p>Users can bookmark articles:</p> <pre><code>CREATE TABLE bookmarks ( user integer NOT NULL REFERENCES users(id), article integer NOT NULL REFERENCES articles(id), PRIMARY KEY(user, article), last_seen timestamp NOT NULL ); </code></pre> <h2>The Feature to be Implemented</h2> <p>What I want to do now is to inform users about articles which have been updated <em>after</em> the user has last seen them. The access to the whole system is via a web interface. Whenever a page is requested, the system should check whether the user should be notified about updated articles (similar to the notification bar on the top of a page here on SO).</p> <h2>The Question</h2> <p>What is the best and most efficient implementation of such a feature, given that both tables above contain tens of millions of rows?</p> <h2>My Solution #1</h2> <p>One could do a simple join like this:</p> <pre><code>SELECT ... FROM articles, bookmarks WHERE bookmarks.user = 1234 AND bookmarks.article = articles.article AND last_seen &lt; last_update; </code></pre> <p>However, I'm worried that doing this JOIN might be expensive if the user has many bookmarked articles (which might happen more often than you think), especially if the database (in my case PostgreSQL) has to traverse the index on the primary key of <code>articles</code> for every bookmarked article. Also the <code>last_seen &lt; last_update</code> predicate can only be checked after accessing the rows on the disk.</p> <h2>My Solution #2</h2> <p>Another method is more difficult, but might be better in my case. It involves expanding the bookmarks table by a notify column:</p> <pre><code>CREATE TABLE bookmarks ( user integer NOT NULL REFERENCES users(id), article integer NOT NULL REFERENCES articles(id), PRIMARY KEY(user, article), last_seen timestamp NOT NULL, notify boolean NOT NULL DEFAULT false ); CREATE INDEX bookmark_article_idx ON bookmarks (article); </code></pre> <p>Whenever an article is updated, the update operation should trigger setting notify to true for every user who has bookmarked this article. The big disadvantage that comes to mind is that if an article has been bookmarked a lot, setting notify to true for lots of rows can be expensive. The advantage could be that checking for notifications is as simple as:</p> <pre><code>SELECT article FROM bookmarks WHERE user = 1234 AND notify = true; </code></pre> <h2>Final Thoughts</h2> <p>I think that the second method can be a lot more efficient if the number of page views (and with it the number of times the system checks for notifications) outweighs the number of updates of articles. However, this might not always be the case. There might be users with lots of bookmarked articles which log in only once a month for a couple of minutes, and others who check for updates almost every minute.</p> <p>There's also a third method that involves a notification table in which the system INSERTs notifications for every user once an article is updated. However, I consider that an inefficient variant of Method #2 since it involves saving notifications.</p> <p>What method is the most efficient when both tables contain millions of rows? Do you have another method that might be better?</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload