Note that there are some explanatory texts on larger screens.

plurals
  1. POApproximating Page Views Per Tag (or Tag Group) Per Month with Limited Data?
    text
    copied!<p>Using the Stack Overflow public data dump, I've created three simple tables:</p> <ul> <li>Questions (Question_Id, View_Count, Creation_Date)</li> <li>Tags (Tag_Name)</li> <li>QuestionTags (Question_Id, Tag_Name)</li> </ul> <p>The Questions table has hundreds of thousands of rows with Creation_Date spanning from a year ago to today. Looking over the data, there are two notable trends:</p> <ul> <li><strong>Number of Questions Increase by Period</strong> - for example, there were more questions asked this month than three months ago</li> <li><strong>Questions Views have a Long Tail</strong> - by looking at views based on weeks open, we can see that most of a question's views occur in the first week; a lesser amount in the second and third; and a long, constant tail in the following weeks</li> </ul> <p>If neither of these factors came in to play, it'd be fairly trivial to estimate traffic for a given tag (or group of tags) over a month:</p> <pre><code>SELECT YEAR(Q.Creation_Date) ,MONTH(Q.Creation_Date) ,SUM( Q.View_Count / DATEDIFF(m,Q.Creation_Date,GETDATE()) ) FROM Questions Q JOIN QuestionTags QT ON Q.Question_Id = QT.Question_Id WHERE QT.Tag_Name IN ('c#','.net', ... ) GROUP BY YEAR(Q.Creation_Date), MONTH(Q.Creation_Date) ORDER BY 1,2 </code></pre> <p>But because of aforementioned factors (especially the long tail), I'm not sure how to approximate views. My thoughts are to create a function that, using the long tail formula, will calculates views for a month based on the current number of views and weeks open.</p> <p>Here's what I came up with to find the tail:</p> <pre><code>DECLARE @SDTE DATETIME, @EDTE DATETIME SELECT @SDTE = '2009-01-11' -- after new years holiday ,@EDTE = CAST( MAX([Creation_Date]) AS INT ) FROM [Questions] SELECT [DaysOpen_Count] ,AVG( [WView_Count] ) FROM ( SELECT QT.[Tag_Name], Q.[View_Count], [DaysOpen_Count] = DATEDIFF(DAY, Q.[Creation_Date], @EDTE), [WView_Count] = CAST( Q.[View_Count] / ( DATEDIFF(DAY, Q.[Creation_Date], @EDTE) / 7.0 ) AS INT ) FROM [Questions] Q INNER JOIN [QuestionTags] QT ON Q.[Question_Id] = QT.[Question_Id] WHERE [Tag_Name] IN ('c#','.net',...) AND [Creation_Date] &lt; @EDTE ) Q GROUP BY [DaysOpen_Count] ORDER BY 1,2 </code></pre> <p>How should I proceed to create this SQL query?</p> <p>The end goal is a Stored PRocedure that inputs a CSV-string of tags and spits out the past six month's page views for those tags.</p> <p><strong>UPDATE</strong> After "earning" the tumbleweed badge, I figured it was time for a bounty!</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload