Under-reporting by the client-side setup versus server-side collection seems to be the usual outcome of these comparisons.

Here's how I've tried to reconcile the disparity when I've come across these studies:

*Data sources recorded in server-side collection but not client-side:*

- hits from **mobile devices** that don't support JavaScript (this is probably a significant source of disparity between the two collection techniques; e.g., a Jan 07 [comScore study](http://www.comscore.com/press/release.asp?press=1432) showed that 19% of UK Internet users access the Internet from a mobile device)

- hits from **spiders** and bots (which you mentioned already)

*Data sources/events that server-side collection tends to record with greater fidelity (far fewer false negatives) than JavaScript page tags:*

- hits from users behind **firewalls**, particularly corporate firewalls; firewalls can block the page tag, and some are also configured to reject/delete cookies

- hits from users who have **disabled JavaScript in their browsers**; five percent, according to the [W3C data](http://www.w3schools.com/browsers/browsers_stats.asp)

- hits from users who **exit the page before it loads**. Again, this is a larger source of disparity than you might think. The most frequently cited [study](http://www.stonetemple.com/article/analytics-report-august-2007-part2.shtml) to support this was conducted by Stone Temple Consulting: two otherwise identical sites running the same web analytics system, differing only in that the JS tracking code was placed at the **bottom** of the pages on one site and at the **top** on the other, showed a **4.3%** difference in unique visitor traffic.

---

FWIW, here's the scheme I use to remove/identify spiders, bots, etc.:

1. Monitor requests for our **robots.txt** file, then filter all other requests from the same IP address + user agent (not all spiders will request robots.txt, of course, but with minuscule error, any request for this resource is probably from a bot). See the sketch after this list.

2. Compare user agents and IP addresses against published lists: **iab.net** and **user-agents.org** publish the two lists that seem to be the most widely used for this purpose.

3. **Pattern analysis** (see the timing sketch further below): nothing sophisticated here; we look at (i) page views as a function of time (i.e., clicking a lot of links with ~200 ms on each page is probative); (ii) the path by which the 'user' traverses our site, and whether it is systematic and complete or nearly so (like following a back-tracking algorithm); and (iii) precisely timed visits (e.g., 3 am each day).
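As an illustration of steps 1 and 2, here's a minimal Python sketch that flags clients from a parsed access log. The hit-dictionary layout and the `KNOWN_BOT_AGENTS` entries are assumptions for the example, not the actual iab.net / user-agents.org data:

```python
# Hypothetical hit layout: {'ip': ..., 'user_agent': ..., 'path': ...}.
KNOWN_BOT_AGENTS = {"Googlebot", "Bingbot"}  # placeholder for the published lists

def identify_bot_clients(hits):
    """Return the set of (ip, user_agent) pairs flagged as bots."""
    bots = set()
    for hit in hits:
        client = (hit["ip"], hit["user_agent"])
        # Step 1: any client that requested robots.txt is almost
        # certainly a bot, so flag all of its traffic.
        if hit["path"] == "/robots.txt":
            bots.add(client)
        # Step 2: match the user agent against a published bot list.
        elif any(bot in hit["user_agent"] for bot in KNOWN_BOT_AGENTS):
            bots.add(client)
    return bots

def filter_human_traffic(hits):
    """Drop every request made by a flagged client."""
    bots = identify_bot_clients(hits)
    return [h for h in hits if (h["ip"], h["user_agent"]) not in bots]
```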
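And a rough sketch of the timing heuristic in step 3. The `timestamp` field, the 0.2 s threshold (mirroring the ~200 ms figure above), and the median-gap rule are illustrative assumptions rather than a tuned detector:

```python
from collections import defaultdict

def flag_rapid_clickers(hits, threshold_s=0.2, min_hits=10):
    """Flag (ip, user_agent) pairs whose median gap between page views
    is below the threshold -- far faster than a human reading pages."""
    times = defaultdict(list)
    for hit in hits:
        times[(hit["ip"], hit["user_agent"])].append(hit["timestamp"])
    flagged = set()
    for client, ts in times.items():
        if len(ts) < min_hits:  # too few hits to judge
            continue
        ts.sort()
        gaps = sorted(b - a for a, b in zip(ts, ts[1:]))
        median_gap = gaps[len(gaps) // 2]
        if median_gap < threshold_s:
            flagged.add(client)
    return flagged
```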