March 14, 2005
Sifry on the state of the Blogosphere
When I first met David Sifry, he was running Technorati on a box under his desk. Today the service tracks over 7.8M blogs and is adding about 40,000 new ones a day. The blogosphere is doubling roughly every 5 months. He recently blogged some of his analysis of this growth. I was struck by the following excerpt, specifically the comments on spam. Not only are comment spam and trackback spam a problem, but bots are now creating WHOLE BLOGS solely to create links among themselves and a host site. Everyone tries to game the gamer. So Technorati is spending an increasing amount of its time trying to purge its indexes of links to and from spam blogs. I wonder how much time Google spends on that? Or how spam links are accounted for in Google's PageRank?
We are currently seeing about 30,000 - 40,000 new weblogs being created each day, depending on the day. Compared to the past, this is well over double the rate of change in October, when there were about 15,000 new weblogs created each day. The remarkable growth over the past 3 months can be attributed to the increase in new, mainstream services such as MSN Spaces, and in increases of use of services like Blogger, AOL Journals, and LiveJournal. In addition, services outside the United States have been taking off, including a number of media sites promoting blogging, such as Le Monde in France.
There is a dark underbelly to these numbers, however: Part of the growth of new weblogs created each day is due to an increase in spam blogs - fake blogs that are created by robots in order to foster link farms, attempted search engine optimization, or drive traffic through to advertising or affiliate sites. We have been battling the spam situation in a significant way for about 2 months - prior to January, spam wasn't much of an issue. All of these charts reflect Technorati's databases after spam blogs have been removed, and we feel that we've been able to capture and identify most of the spam out there, but one should note that there is definitely blog spam that we don't catch (tell us if you see spam in the index!). I'd estimate that we currently catch about 90% of spam and remove it from the index, and notify the blog hosting operators. Most of this fake blog spam comes from hosted services or from specific IP addresses. One of the results of the extremely productive Spam Squashing Summit of a few weeks ago is the increased collaboration between services in order to report and combat this spam. Right now, about 20% of the aggregate pings Technorati receives are from spam blogs, so you won't see that in these numbers - these statistics show only "cleaned" data.


Posted by Martin at March 14, 2005 7:32 PM