Linking to Michael yesterday set off a bell or two. Back when the referral storm hit, he noted two things: first, that he was glad he'd gone to a hosted service (he uses Typepad), for which, hoorah; and second, that he was rightly concerned about bandwidth costs. People gave immediately, however, and he was well able to bear the cost.
The fellow I linked to yesterday via Manuel, Tom, noted that his site’s unexpected traffic growth had led to worries about bandwidth costs, costs that he’s defraying via a gift from his grandma!
When a blogger’s work becomes successful enough to, for a moment, graze the underbelly of commercial publishing, it threatens the very low-cost predicate of the publication itself.
Setting aside for the moment the obvious absurdity of the situation, it seems to me that over the past few years we've seen this exact phenomenon occur over and over again. I'm guessing that, now that media people have integrated the blogosphere into their information-gathering practices, we'll see it with greater frequency and to more devastating effect over time.
Therefore, I think two things need to happen. First, there's a proactive business opportunity for the right business to defray these transient bandwidth costs, probably in the form of short-term ads on the sites that are experiencing the bolus. The obvious home for the service is in Nick Denton's portfolio or, maybe more sensibly, as a default feature for Typepad, Movable Type, and premium-style accounts at Blogger, since the free accounts already have banners.
I won’t go further down that branch at the moment, but I will note that it might even be cooler yet if this feature enabled Google keyword ads. Maybe it should be an independent service, or a program that the keyword service provides for bloggers, who are currently more or less specifically discouraged from using it.
Back to my original thought, and the second thing that needs to happen.
Can we assemble a large enough sample set to generalize about traffic spikes and retention for bloggers from the various events of the past two years? I'm guessing we'd need a sample set of twenty-four events, ranging from Michael's twenty-four hours to Mahir's moment in the sun (which is probably too long ago to get data on) to being linked from Instapundit or having your site or name mentioned in a media outlet such as the NYT.
The objective would be to develop predictive data, very generalized, allowing folks who experience such an event in the future to look at some pretty simple tables and decide what to do. I'm guessing we could establish percentile growth parameters for various kinds of events, which would allow site-maintainers to reasonably project the shape and duration of the increased traffic.
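To make that concrete, here's a minimal sketch, in Python, of what such a lookup might boil down to once the data existed; the event names and every number below are invented placeholders, not measurements.

```python
# A minimal sketch of the lookup this data could support; every event
# name and number below is an invented placeholder, not a measurement.
from statistics import quantiles

# Hypothetical records: (event_type, peak traffic as a multiple of
# baseline, days until traffic returned to baseline)
events = [
    ("instapundit_link", 14.0, 3), ("instapundit_link", 22.0, 4),
    ("instapundit_link", 9.0, 2),  ("instapundit_link", 17.0, 3),
    ("nyt_mention", 40.0, 7),      ("nyt_mention", 55.0, 9),
    ("nyt_mention", 31.0, 6),
]

def growth_parameters(event_type):
    """Rough percentile bands for one kind of traffic event."""
    peaks = sorted(m for t, m, _ in events if t == event_type)
    days = sorted(d for t, _, d in events if t == event_type)
    q1, q2, q3 = quantiles(peaks, n=4)   # quartiles of the peak multiplier
    return {"peak_multiplier_quartiles": (q1, q2, q3),
            "median_days_to_baseline": days[len(days) // 2]}

print(growth_parameters("instapundit_link"))
```

With enough real events behind it, a site-maintainer could look up their event type and get a defensible range for how bad, and how long, the spike is likely to be.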
Is this possible? What’s the best way to gather data? Should it be a data-gathering website? Or should that simply be a component?
During the dot-com boom, I saw studies on topics like this from all kinds of sources, but they were all terribly flawed, usually by the desire to predict huge market growth: to justify absurd pricing to the end user, to attract VC dough, or, post-IPO, to prop up earnings and so forth. Of course, at the same time, many of these studies were also using infinitesimal user and traffic bases to develop their growth and usage projections – sometimes smaller than the traffic bases we see for blogs in general – which suggests another set of studies. Hm.
My impression is that there's a kind of 80-20-10 on daily traffic to blogs: 80 percent or more get fewer than 100 site visitors daily; about 20 percent get between 100 and n site visitors daily; and 10 percent or fewer get n-plus site visitors daily. (The hedges overlap, so the buckets sum loosely, but you get the shape.) I also suspect the curve one could plot against this simple distribution is logarithmic, based on what I know about traffic fall-off in click-throughs.
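Purely as illustration, here's a toy Python check of that hunch; the heavy-tailed shape and every parameter in it are guesses on my part, not measurements.

```python
# A toy sanity check on the 80-20-10 hunch: draw an imaginary blog
# population from a heavy-tailed distribution and see where the buckets
# land. The lognormal shape and all parameters are guesses.
import random

random.seed(1)
N = 100_000          # imaginary blog population
n = 1_000            # my guess at the upper cutoff from the post

visitors = [random.lognormvariate(2.5, 2.0) for _ in range(N)]

small = sum(1 for v in visitors if v < 100)
mid = sum(1 for v in visitors if 100 <= v < n)
big = N - small - mid
print("under 100/day: %.0f%%   100-%d/day: %.0f%%   %d+/day: %.0f%%"
      % (100 * small / N, n, 100 * mid / N, n, 100 * big / N))
```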
Cool thoughts. Another solution would be to leverage the BitTorrent P2P model, where people who link to something also cache and serve a copy of the linked content from their host. This would provide a dynamic, distributed hosting solution for sites receiving a lot of links. The number of distributed hosts serving the linked content scales with the number of links pointing to the URI in question.
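In rough Python, purely to sketch the shape of the thing (every host and function name below is made up):

```python
# A rough sketch of the idea, not a real protocol: every blog that links
# to a URI also keeps a cached copy, so the set of mirrors for a resource
# is exactly the set of sites linking to it.
import random

mirrors = {}   # linked URI -> hosts holding a cached copy

def register_link(uri, linking_host):
    """Called when a blog links to (and caches) a resource."""
    mirrors.setdefault(uri, []).append(linking_host)

def pick_server(uri, origin_host):
    """Choose a host to serve from; load spreads as links accumulate."""
    candidates = mirrors.get(uri, [])
    return random.choice(candidates) if candidates else origin_host

register_link("http://example.org/big-post", "blog-a.example.com")
register_link("http://example.org/big-post", "blog-b.example.com")
print(pick_server("http://example.org/big-post", "example.org"))
```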
Hmm, Tom, interesting idea. How would the application functionality pair up with the blog, and who's responsible for implementing it? The server-maintainer, I'd think, right?
If that's true, does this highlight a possible problem? I think we can fairly assume that the majority of blogs are actually hosted by a fairly small number of hosting or blog-service providers.
I think the Blog Census bears a lot of that out – look at the predominance of LJ, Blogspot, and Diaryland (all of which are also first-gen providers, and look at that log curve!).
So maybe this isn't a problem for the hosted services, but only for FQDN self-hosters or remote-hosters. That wouldn't invalidate the data-gathering project by any means, though.
Another point of interest: the distributed hosting you propose is both the effective industry solution for high-demand hosting (distributed high-availability was the buzzword, as I recall), though not based on a flexible P2P model, and, in practice, already how users route around a site that's been blocked for bandwidth consumption, such as a GeoCities site or a .Mac site.
People immediately turn to the Google cache – which of course is bulletproof in terms of bandwidth demand and server reliability (thanks to all those pigeons).
So, back to thinking about a P2P widget. Should it operate under the architecture of the blog app, say as a plugin in MT?
Interestingly, MT's default support for RSS and customizable XML also provides a clue to how such a widget might operate.
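For instance (a guess at the mechanics, not MT's actual plugin API): the widget could rewrite a post's heavy media into RSS enclosures that point at torrents, roughly like so:

```python
# Guesswork, not MT's real API: emit an RSS <item> whose enclosure
# points at a torrent of the post's heavy media.
from xml.sax.saxutils import escape, quoteattr

def torrent_enclosure_item(title, post_url, media_url):
    """Build an RSS item that hands clients a torrent instead of the file."""
    return (
        "<item>"
        f"<title>{escape(title)}</title>"
        f"<link>{escape(post_url)}</link>"
        f"<enclosure url={quoteattr(media_url + '.torrent')} "
        'type="application/x-bittorrent" />'
        "</item>"
    )

print(torrent_enclosure_item("Big post", "http://example.com/big-post",
                             "http://example.com/media/movie.mov"))
```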
Or perchance as an Apache mod? Something leads me in that direction: there's a robust API that lots of people have experience with, and it could be invoked on a domain-by-domain basis – but then we're back to looking to the server-admins to implement it. That's gonna be tricky. First, we have the traditional, slothful reticence of server architects to add or change services. Second, a feature that lets users divert bandwidth presents a direct threat to billable bits, and would be perceived by management as a revenue-diminisher.
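To pin down what I'm imagining (a real mod would be C against Apache's API; this is just the decision logic, in Python, with invented thresholds):

```python
# Just the decision logic such a mod might apply, with invented numbers;
# a real implementation would live inside Apache's request cycle.
bytes_served_today = {}                 # vhost -> bytes served so far
DAILY_SOFT_LIMIT = 2 * 1024 ** 3        # say, 2 GB/day per domain (a guess)

def route_request(vhost, path, response_size):
    """Serve locally until the soft limit, then divert to the swarm."""
    used = bytes_served_today.get(vhost, 0)
    if used + response_size > DAILY_SOFT_LIMIT:
        # redirect to a torrent (or mirror) instead of burning billable bits
        return "302 -> http://%s%s.torrent" % (vhost, path)
    bytes_served_today[vhost] = used + response_size
    return "200 (served locally)"

print(route_request("example.com", "/big-video.mov", 50 * 1024 ** 2))
```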
Oh, it's a pickle. A mod is still prolly the way to go, though, because wildcatters can bear the risk and will bring the expertise with them as they seek employment stability later in life.
Now, returning to the data-gathering portion of the problem: obviously, setting up a site to poll users is going to be the best way to get this information. I wonder, also: could traffic-trackers share data without violating user agreements? Ideally, as the data piles up, it could be reported back in near real-time; the initial conclusions could then be revisited after a period to see if anything's changed, etc…
This seems like a clearly academic thing; while there is commercial opportunity, the paranoid dictates of competition would tend to keep the data secret or skew it as noted previously.
I suppose the first step is defining the data-types (and the amounts) needed. Hm.
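A first stab, sketched as a Python record; every field name and the sample values below are my guesses at what matters, open to argument:

```python
# One record per spike event; fields and sample values are guesses.
from dataclasses import dataclass
from datetime import date

@dataclass
class SpikeEvent:
    site: str                      # e.g. "example.blogspot.com"
    event_type: str                # "instapundit_link", "nyt_mention", ...
    event_date: date
    baseline_daily_visitors: int   # typical traffic before the event
    peak_daily_visitors: int       # the biggest single day
    days_to_baseline: int          # how long until traffic settled down
    retained_daily_visitors: int   # the new baseline after the dust cleared

ev = SpikeEvent("example.blogspot.com", "instapundit_link",
                date(2003, 5, 1), 80, 12_000, 4, 140)
print(ev.peak_daily_visitors / ev.baseline_daily_visitors)   # growth multiple
```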
Hey Mike, my idea has been stated elsewhere by people smarter than me. Read Don Park's rundown of BitTorrent through to the end, and you'll find this bit:
“In my opinion, flash flood nature of blogs will be well served by BitTorrent. Likewise, link-happy nature of blogs will complement BitTorrent well. Ultimately, I think a tailored variation of BitTorrent should be built into blog clients and servers for download sharing of feeds, images, enclosures, and other blog-related resources. BitTorrent will encourage media-rich blog posts without applying power-law to the bloggers’ wallet. BitTorrent means blog torrents.”
Just what I was trying to get at, with my limited vocabulary.