Wednesday, June 18, 2008

A Better ROBOT9000

Introduction

Quite a while ago, XKCD posted an idea about an auto-moderating IRC bot to maintain some sense of order in its channel. The basis for the idea is that as social communities grow, they start to suck. In every online social network I've been involved in, this has been true -- it has also been true in part for some social activities I enjoy outside of the computer / internet realm. In this post, I'd like to touch on both the sociological and the technical aspects of this subject.


Sociological Aspects of Online Communities

It's probably universally accepted that as a community becomes larger, some form of government should be set up to manage it (anarchists aside). This is evident from the smallest of groups to the largest: we see political order in everything from clubs to our jobs to the government of our own countries.

Initially, a very small community consisting of only a few friends may function well without any intervention. After all, these are people who are familiar with each other and thus have experience interacting with each other socially. They know each other's quirks; they are aware of the boundaries of what each finds socially acceptable. A community like this is self-governing: there is no need to impose regulations on such a small community because there is little to no cause for social discord in their normal interaction.

If the idea behind the community is a good one, people who are strangers to some of the members may be invited to join. As the community grows, more nuances and social quirks are introduced. People become less familiar with one another and we start to see social discord when people (perhaps unintentionally) overstep their boundaries. This becomes more prevalent as the community continues to grow. However, the group remains closed: everyone has a common interest (which is the centerpiece of interaction in the community), and it is assumed that everyone is a friend of someone else there. It is due to this closed nature that, though social boundaries may be crossed from time to time, there is no need for regulation: the group can function normally without moderation.

Now let's examine an open group. Imagine that we have a public service: a swimming pool. In general, most of the patrons will not know each other and, while some bonds may form between them, each patron is largely unaware of (and even indifferent to) the social boundaries and expectations of the others. In this case, rules for safety are instituted along with one or more active moderators, whom we see in the form of lifeguards. Frequently, these rules are not limited to physical safety: social safety is moderated as well! (Next time you are at a public swimming facility, look at the rules board: most of the time, there are rules forbidding screaming and profanity.) From personal experience, most people are aware that as the size of the group at such a facility grows, more intervention is needed to ensure the safety of all around.

On to open anonymous groups. This is a fairly new concept (within the last several decades) made possible by advances in communication. BBSes and the Internet have made it possible for people to communicate without knowing anything about each other. It is in this form of communication that moderation is most sorely needed: since so much of how we interpret communication is non-verbal (or, in this case, non-textual), a text-only medium makes it in a sense more difficult to communicate our ideas. Additionally, due to the implied anonymity of an alias, people are more apt to take their behavior to extremes, as any consequence is fairly benign. Rules are put in place and moderation is available, but this is frequently not enough, largely due to the inconsistencies of moderators in these communities. As moderation is unpaid, consistency suffers, both in the enforcement of policies and in the actual time moderators spend moderating. Additionally, what one moderator may consider socially acceptable, another may not, leading to inconsistency between moderators. Finally, since the time moderators are willing and able to spend varies, times frequently arise in which no moderation is available, and anarchy ensues.

A lot of what ends up being socially unacceptable in a situation turns out to be repeated conversation. If experience is any indication, most things that could be considered annoying, profane, or otherwise socially harmful are indeed repetitive. Internet memes are repeated to the point of not being any kind of funny; profanities and insults -- well, there is a finite number of combinations of these. A solution proposed by XKCD for IRC chat communities is to create an auto-moderation bot that penalizes all violations in a well-defined and consistent manner.


Implementation

The original implementation of ROBOT9000 is in Perl with a MySQL back end. This implementation is extremely suboptimal: SQL is not good for text searching, and in test use, the bot proves itself to be rather unreliable. It is being run on a server for the ScoreHero community, connecting to the same MySQL database used for the site. The site has over 235,000 members at this time and, with its score management and forum features, has a very heavily loaded MySQL database. Because of this, the bot seems to lose its socket to the database after short periods of inactivity, causing it to die and require restarting. Running the bot on a different server seems somewhat counterproductive, since the channel is officially moderated by the ScoreHero staff and the bot should be run on assets belonging to the site.

Hence, I'm working on a ROBOT9000 implementation in C. Its functionality is the same, but it uses a red-black tree data structure, storing two 32-bit hashes for each unique line of text and penalizing if the hashes already exist in the tree. There are a couple of issues to hammer out regarding best practices, manual intervention, and collisions (which I'll probably leave alone). Right now, it's semi-usable. Source code is available at http://testbed.dh0.us/~dho/crobot/
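
To make the two-hash lookup concrete, here is a minimal sketch of the idea. It is not the crobot source, just an illustration under a few assumptions: the hash functions (FNV-1a and djb2) are my own picks for the example, the names line_key and line_seen_before are hypothetical, and the tree is POSIX tsearch() rather than a hand-written red-black tree (glibc happens to implement tsearch() as a red-black tree, but POSIX does not require that).

```c
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Two independent 32-bit hashes stand in for one line of chat text. */
struct line_key {
    uint32_t h1;
    uint32_t h2;
};

/* FNV-1a, 32-bit. Assumed choice for the first hash. */
static uint32_t hash_fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* djb2. Assumed choice for the second hash. */
static uint32_t hash_djb2(const char *s)
{
    uint32_t h = 5381u;
    for (; *s != '\0'; s++)
        h = h * 33u + (unsigned char)*s;
    return h;
}

/* Order keys by the first hash, then by the second. */
static int key_cmp(const void *a, const void *b)
{
    const struct line_key *x = a;
    const struct line_key *y = b;
    if (x->h1 != y->h1)
        return x->h1 < y->h1 ? -1 : 1;
    if (x->h2 != y->h2)
        return x->h2 < y->h2 ? -1 : 1;
    return 0;
}

/* Root of the tree of every hash pair recorded so far. */
static void *seen_root = NULL;

/*
 * Returns 1 if a line with the same hash pair was recorded earlier
 * (the caller would penalize), 0 if the line is new and has now been
 * recorded, and -1 if memory could not be allocated.
 */
int line_seen_before(const char *line)
{
    struct line_key *key;
    void *node;

    assert(line != NULL);

    key = malloc(sizeof *key);
    if (key == NULL)
        return -1;
    key->h1 = hash_fnv1a(line);
    key->h2 = hash_djb2(line);

    /* tsearch() inserts the key if absent, else returns the existing node. */
    node = tsearch(key, &seen_root, key_cmp);
    if (node == NULL) {
        free(key);
        return -1;
    }
    if (*(struct line_key **)node != key) {
        free(key);          /* an equal key was already stored */
        return 1;
    }
    return 0;
}
```

The bot would call line_seen_before() on each incoming message (presumably after whatever normalization it applies) and penalize the sender when it returns 1. Two different lines that happen to share both hashes would be treated as a repeat, which is the collision case mentioned above.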