Whose Government Social Media Bot Did I Trigger?

This weekend, my logs were absolutely slaughtered by three different IP addresses belonging to "PSINet," which I've traced before to Cyveillance activity (Cyveillance is the preferred tracking provider of Goldman Sachs and the NY Fed at least, as far as I can tell from my logs).

My suspicions were confirmed by a Whois search:
network:Street-Address:1555 Wilson Blvd, Suite 406
network:Org-Name:Cyveillance Inc.
network:Updated:2010-07-09 18:51:19
network:Updated-by:Michael Callender

Listen, you fuckers, I don't know what you want or why you crawled 1/3 of my site over the weekend but if you have a question, why don't you ask?

Yup, there's a convenient Cyveillance office just up the way from me in government-infested Arlington, VA.

Has the Cyveillance bot molesterbated your logs?
It is believed that Cyveillancebot crawls the Web looking to mine information about the current Web zeitgeist for corporations, as well as searching for copyrighted materials and brands and logos that may be misappropriated. The bot, according to the Cyveillance Web site and other sources, is part of a suite of technologies that feeds to a human analyst. The technology was called NetSapien in the 1999-2000 timeframe, though Cyveillance's Web site uses other terms in 2003.

Cyveillancebot uses IP addresses in the range of -, and may use others (unconfirmed). Here's a list of other 'media enforcer' bots, servers et al.

Cyveillancebot ignores robots.txt, as far as anyone can tell. Cyveillancebot spoofs its identity, naming itself various flavors of Windows browsers:

- - [02/May/2003:13:01:37 -0700] "Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 3.51)"
- - [02/May/2003:13:01:37 -0700] "Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 5.0)"
- - [02/May/2003:13:01:58 -0700] "Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 3.51)"
- - [02/May/2003:13:01:58 -0700] "Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 3.51)"
- - [02/May/2003:13:02:57 -0700] "Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 4.0)"

Whether this is code changing the ID at frequent intervals, or, say, a number of machines behind a common firewall is unknown. Cyveillancebot sometimes shows the same ID for all accesses over a given time period.
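If you want to check your own logs for the same trick, here is a rough Python sketch that counts how many different user-agent strings each client address has presented. It assumes your server writes the Apache "combined" log format; the function names and the threshold of three agents are my own placeholders, not anything Cyveillance or the write-up above specifies. As the paragraph above says, several agents from one address could also just be several machines behind one firewall, so treat a hit as a hint and nothing more.

```python
import re
from collections import defaultdict

# Assumed layout (Apache "combined"):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def agents_by_host(lines):
    """Map each client address to the set of user-agent strings it sent."""
    seen = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:  # silently skip lines that do not fit the assumed format
            seen[m.group("host")].add(m.group("agent"))
    return seen

def suspicious_hosts(lines, min_agents=3):
    """Return addresses that presented at least min_agents distinct agents."""
    return sorted(
        host for host, agents in agents_by_host(lines).items()
        if len(agents) >= min_agents
    )
```

Feed it an open log file, for example `suspicious_hosts(open("access.log"))`, where `access.log` stands in for wherever your logs actually live.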

Cyveillancebot is a bit unusual for a bot in that it includes the referrer line (Googlebot doesn't), but this may be part of the ploy to look like a browser in access logs.

Cyveillancebot doesn't, however, download graphics files, java or other page components. It does seem to download other types of binary files (perhaps it is looking for illegal mp3s etc.).
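The "claims to be a browser but never fetches the images" tell is easy to look for. The sketch below is one way to do it in Python, under some loud assumptions: the log line starts with the client address and contains a quoted `GET`/`HEAD`/`POST` request, and the extension list and `min_pages` cutoff are guesses of mine that you would tune for your own site.

```python
import re
from collections import defaultdict

# Pull the client address and the requested path out of a common/combined log line.
REQUEST_RE = re.compile(r'^(?P<host>\S+) .*?"(?:GET|HEAD|POST) (?P<path>\S+)')

# Assumed list of "page component" extensions; adjust for your own site.
ASSET_EXTS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".class")

def hosts_without_assets(lines, min_pages=5):
    """Return addresses that requested at least min_pages pages and zero assets."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    for line in lines:
        m = REQUEST_RE.match(line)
        if not m:
            continue
        # Drop any query string before looking at the extension.
        path = m.group("path").split("?", 1)[0].lower()
        if path.endswith(ASSET_EXTS):
            assets[m.group("host")] += 1
        else:
            pages[m.group("host")] += 1
    return sorted(
        host for host, count in pages.items()
        if count >= min_pages and assets[host] == 0
    )
```

Text-mode browsers and people with images switched off will show up too, so this narrows the list of suspects without proving anything.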

Cyveillancebot seems to operate in 2 modes. In mode 1, it comes in on a link and reads a single page. In mode 2, it downloads every page in a directory or even a whole web site as fast as it can. The mode 2 behavior is notoriously bad, and can amount to a DoS attack consuming all available bandwidth.
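Mode 2 should stand out in the logs as a burst of requests from one address. Here is a minimal Python sketch that flags any address exceeding some number of requests inside a sliding time window. It assumes the log lines are in chronological order and carry Apache-style timestamps like the ones quoted above; the limit of 30 requests per 60 seconds is an arbitrary number I picked, not a known figure for this bot.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Client address and bracketed timestamp at the start of a common/combined log line.
LINE_RE = re.compile(r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]')

def burst_hosts(lines, max_requests=30, window=timedelta(seconds=60)):
    """Return addresses that made more than max_requests requests within window."""
    recent = defaultdict(deque)  # per-address timestamps still inside the window
    flagged = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        q = recent[m.group("host")]
        q.append(when)
        # Discard timestamps that have fallen out of the window.
        while when - q[0] > window:
            q.popleft()
        if len(q) > max_requests:
            flagged.add(m.group("host"))
    return sorted(flagged)
```

A single mode 1 visit never trips this; a crawler pulling a whole directory as fast as it can will.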

The bot is also reported to get stuck in query loops on database-driven sites and is reported to have brought at least one server down. No one knows why this behavior is allowed to continue: the parameters and practices for good behavior are well understood. Whether this is incompetence, indifference or, perhaps, a (curious) example of music industry 'punitive' technology, is unknown.

I have seen numerous Cyveillance bots in Santa Clara, CA (another notorious "data center") but never this many in Washington DC at once. I mean they spent the entire weekend up my ass. And I wasn't the only one.

Beware 38.105.**!! Those fuckers don't play around.
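If you want to test addresses from your own logs against that block, Python's standard `ipaddress` module does it in a few lines. Big assumption on my part: I'm reading "38.105.**" as the whole 38.105.0.0/16 range. I have not confirmed that Cyveillance controls all of it, so the constant and function name below are purely illustrative.

```python
import ipaddress

# Assumption: "38.105.**" is taken to mean the entire 38.105.0.0/16 block.
SUSPECT_NET = ipaddress.ip_network("38.105.0.0/16")

def in_suspect_range(addr):
    """Return True if addr is a valid IP address inside SUSPECT_NET."""
    try:
        return ipaddress.ip_address(addr) in SUSPECT_NET
    except ValueError:
        # Hostnames or malformed fields in the log are not in the range.
        return False
```

So `in_suspect_range("38.105.1.2")` comes back true, while anything outside the block, or anything that isn't an IP address at all, comes back false.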

Who is paying these fools to prowl incendiary websites when they can't even manage to hide their IP addresses when they assrape your logs?

