The measurement and comparison of web site and web application popularity is vital for any business model which depends on advertising for some or all of its income. Every advertiser wants to get the best value for money spent. In general, higher popularity implies higher value for advertising, although this can be offset by knowing more about visitors, and selecting appropriate advertising.
The World Wide Web has been around for decades. It would be nice to assume that measuring and comparing the popularity of web sites is a solved problem. It most assuredly is not. With the growth of convergent applications and the spread of application usage across an increasingly diverse range of devices, the situation can only become more complex.

There are several major ways to try to gain insight into web popularity. The oldest is to examine the server logs of the systems hosting the application. Servers typically log every page request from every user. In theory, all that is needed is to determine which requests originated from which users, and the count of distinct users will give the popularity.
Unfortunately, there are several problems with this approach. The first problem is how to determine which logged requests belong to which users. Some applications use “cookies” (small nuggets of information stored by a web browser and passed back to the server on every request). However, at best a cookie only identifies a particular web browser on a particular machine: a user who visits a site from home and from work will likely have several distinct cookies. Other problems with cookies include some visitors deliberately switching off cookies, or clearing stored cookies and so appearing to be a new visitor. In a multiple-device world, many devices simply do not support cookies.
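As a minimal sketch of cookie-based counting, the Python below tallies distinct cookie values from a handful of log lines. The log format (timestamp, path, visitor cookie, with "-" meaning no cookie was sent), the sample data, and the function name count_visitors are all assumptions made for illustration; real server log formats differ.

```python
from typing import Iterable, Tuple

# Hypothetical, simplified log lines: "<timestamp> <path> <visitor_cookie>".
# "-" in the cookie column means the request carried no cookie.
SAMPLE_LOG = [
    "2009-03-01T09:00:00 /index.html abc123",  # one person at work
    "2009-03-01T09:00:05 /about.html abc123",
    "2009-03-01T19:30:00 /index.html def456",  # possibly the same person at home
    "2009-03-01T20:00:00 /index.html -",       # cookies disabled or unsupported
    "2009-03-01T20:00:09 /news.html -",
]


def count_visitors(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of distinct cookie values, requests with no cookie)."""
    cookies = set()
    no_cookie = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        cookie = parts[2]
        if cookie == "-":
            no_cookie += 1
        else:
            cookies.add(cookie)
    return len(cookies), no_cookie


distinct, unattributed = count_visitors(SAMPLE_LOG)
print(f"distinct cookies: {distinct}, requests without a cookie: {unattributed}")
```

The count of distinct cookies is only an estimate of visitors: the two cookies here could belong to one person, and the cookie-less requests cannot be attributed to anyone at all.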
Other approaches to correlating log entries and unique visitors, for example using IP addresses or URL parameters, are even less reliable. It is also important to realise that many log entries will not be caused by real human visitors at all, but rather by “robots” such as the system Google uses to index web pages for its search engine.
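One common heuristic for discarding robot traffic is to inspect the User-Agent string sent with each request. The sketch below assumes that crawlers identify themselves with substrings such as "bot", "crawler" or "spider"; that marker list, the sample requests, and the function name is_probable_robot are illustrative assumptions, and a robot that disguises itself as a browser will slip through.

```python
# Assumed heuristic: substrings often found in crawler User-Agent strings.
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_robot(user_agent: str) -> bool:
    """Guess whether a request came from a robot, based only on its User-Agent."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


# Illustrative (path, User-Agent) pairs; the last two strings are made up.
sample_requests = [
    ("/index.html",
     "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/index.html", "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"),
    ("/about.html", "ExampleSpider/1.0"),
]

# Keep only the requests that do not look like robot traffic.
human_requests = [
    (path, ua) for path, ua in sample_requests if not is_probable_robot(ua)
]
print(f"{len(human_requests)} of {len(sample_requests)} requests look human")
```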
The single biggest problem with relying on server logs for measuring popularity is that it is almost impossible to compare different systems. Not only will the information logged often be different, but server log files are private data. Each organization will probably be able to compare the popularity of its own web sites and applications with each other, but probably not be able to compare them with competitors or other established web sites.
As we have seen, there are significant problems with server logs for popularity measurement. This has led to a range of alternative techniques and services, each of which has its own advantages and disadvantages.
An approach familiar to advertisers is the one taken by Nielsen. Just as for TV ratings, a select group of people are issued with devices to track their internet usage, and the results are treated as statistically representative of the broader population. This has the advantage of being similar to the traditional way of rating TV, but it also has the same disadvantages of group bias and generalising from a small group to a much larger one. This approach is unique among web traffic analysis in that the company knows more than just the web activity of its sample group. Nielsen collects demographic and personal data about its test panel, and also interviews members to obtain more subjective information.
Nielsen’s reputation as a source of web rankings took a significant blow in 2007, when the company arbitrarily decided to change the way it calculated overall popularity. There is also a strong argument that the statistical problems with using a small test group are magnified in the case of the web. Unlike TV, where viewers choose from at most a few hundred channels, on the web there is an almost limitless number of sites to visit. For this reason, Nielsen web figures probably only make sense for the few most popular web destinations.
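The statistical argument can be made concrete with a rough calculation. The sketch below assumes the panel is a simple random sample of the population (real panels are weighted and more complicated) and uses a hypothetical panel size of 10,000, which is not a figure taken from Nielsen. For a site visited by a fraction p of the population, the relative standard error of the panel estimate is sqrt((1 - p) / (n * p)).

```python
import math


def relative_standard_error(panel_size: int, reach: float) -> float:
    """Relative standard error of an estimated proportion, assuming a
    simple random sample: sqrt((1 - p) / (n * p))."""
    if panel_size <= 0 or not 0 < reach < 1:
        raise ValueError("panel_size must be positive and reach between 0 and 1")
    return math.sqrt((1 - reach) / (panel_size * reach))


PANEL_SIZE = 10_000  # hypothetical panel size, chosen only for illustration

# Compare a very popular site, a mid-sized site, and a niche site.
for reach in (0.2, 0.01, 0.0001):
    error = relative_standard_error(PANEL_SIZE, reach)
    print(f"reach {reach:.2%}: relative error about {error:.0%}")
```

Under these assumptions a site reaching a fifth of the population is estimated to within a few percent, while a site reaching one person in ten thousand has an error as large as the estimate itself, since only about one panel member would be expected to visit it.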
In an attempt to take a similar approach but to gather much broader statistics, Alexa have been busy for years encouraging web users to install monitoring software in the form of a “toolbar”. Alexa rankings suffer from some of the same problems as the Nielsen approach. Alexa generalises from a relatively small set of users, suffers potential bias toward the kind of people who will install their toolbar and, significantly, lacks data from anyone prevented from installing the Alexa software by company policy.
A third approach to tracking site usage and popularity is the one exemplified by Google Analytics. With this technique the site operator is required to place some special code in every page of the site. Whenever a visitor views a page, Google is notified, and compiles both specific and overall statistics. In general, this is considered a more accurate way of tracking and measuring overall site usage than the statistical approach taken by Nielsen and Alexa. However, it faces its own problems. Google’s embedded code relies on a specific browser feature (JavaScript) being present and enabled, and on the web browser being able to communicate effectively with Google’s logging servers. If either of these is not working, then no activity is recorded from that user. Most significantly, though, Google Analytics shares a problem with the traditional analysis of server logs: the data gathered for any particular site is private, and unsuitable for comparison with competitors.
So, for now, there is no single best way to measure and compare web popularity. Attempts to solve this overall problem are continuing, but I don’t hold out hope for an answer any time soon.
As for other devices and communication methods, there is next to nothing. Some web-based techniques happen to work with a small number of mobile devices, and some TV-based techniques might work with some set-top box interfaces, but for now user and popularity tracking for convergent applications is still back in the “stone age” of reading log files.
I predict that there is a major business opportunity for anyone who can crack this problem.



For years Frank Carver has been paying attention to the strange world of convergent technology. During that time he has discussed and researched broad subject areas, come to some surprising conclusions, produced and distributed digital media, scattered ideas and opinions like sparks from a firework, and above all consulted for businesses both large and small to help develop and deploy successful systems, services, and products in this highly complex arena.

