Fighting Guestbook Spam

Anyone who has spent even a little time on the web has probably noticed how much spam there is in every corner of it. Spammers send their messages not only by email, but also to guest books, web forms, comment pages, basically anywhere they can post a message on a site. Anyone developing these targets probably knows how difficult it can be to keep spammers at bay while still letting users post comments in an easy, non-intrusive manner.

CAPTCHAs are becoming a popular spam-fighting technique. You know, those images of letters that you have to type into a box in order to complete a form. Sure, they can be fairly effective, but they add one more annoyance and more complexity for users trying to fill out a form. There are also projects to crack CAPTCHA images by having a program identify the letters in the image, and the results have been quite successful. I did not want to add this to my guest book to control the spam. I did not want to add another field for the user to fill out strictly to test whether they are human or not.

JavaScript is another somewhat common method of controlling spam, whether it is used that way intentionally or not. The problem with this method is that it may not work if you don't take care to code the JavaScript correctly, or if the user has JavaScript disabled. For accessibility reasons, I did not want a solution that depends on JavaScript. I wanted my guest book to be usable by those without JavaScript, since the rest of my site is.

After a long time of thinking about this problem and what I could do, I think I have finally come up with a solution that will work well, at least until spam bots become smarter. My solution is two-fold. Firstly, I use a two-step submit process in which I accept the form information, save it to the database with a temporary status, and display a confirmation/preview page. Once confirmed, I make the post permanent. Secondly, I use a tiny bit of JavaScript and a hidden input field to eliminate this second step of the submit process for users who have JavaScript enabled and are able to take advantage of it.

So let's have a look at a little code, shall we? In my database, I added an additional field named isPending to the guest book table. The field is a simple tinyint(1) field with a default value of 1. When this field is set to 1, it indicates the entry has been posted but is still awaiting confirmation. This means it will not be displayed on the guest book page, because it is unknown whether it is spam or not.
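
As a rough sketch of that column, the snippet below builds a throwaway table and shows the default taking effect. It assumes PDO with an in-memory SQLite database purely so that it can run on its own. Apart from isPending and entryid, the column layout is a guess for illustration and not necessarily the real table.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real one.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Assumed table layout; isPending defaults to 1, so new rows start out hidden.
$db->exec('CREATE TABLE guestbook (
    entryid   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    comments  TEXT NOT NULL,
    isPending TINYINT(1) NOT NULL DEFAULT 1
)');

// Insert without mentioning isPending, then read the flag back.
$stmt = $db->prepare('INSERT INTO guestbook (name, comments) VALUES (?, ?)');
$stmt->execute(array('Alice', 'Nice site!'));

$isPending = (int) $db->query('SELECT isPending FROM guestbook WHERE entryid = 1')->fetchColumn();
echo $isPending === 1 ? "entry is pending\n" : "entry is visible\n";
```

Because the default is 1, any code path that forgets to set the flag leaves the entry hidden rather than published.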

Next, I wrote the code in the PHP page to accept the form post and immediately add the post to the database with the new isPending flag equal to 1. I then take the ID of the generated row and print out a confirm page with the ID in a hidden input, but no inputs for the other data. In a sense, this confirm page is like a preview/confirm page. In pseudo-code, it looks something like this:

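<?php
// $db is assumed here to be a PDO connection; prepared statements keep user input out of the SQL.
if (count($_POST) > 0 && !isset($_POST['confirmAdd'])){
        $name = isset($_POST['name']) ? $_POST['name'] : '';
        $comments = isset($_POST['comments']) ? $_POST['comments'] : '';

        // Store the entry as pending until it is confirmed.
        $stmt = $db->prepare('INSERT INTO guestbook (name, comments, isPending) VALUES (?, ?, 1)');
        $stmt->execute(array($name, $comments));
        $id = (int) $db->lastInsertId();

        echo '<form method="post" action="guestbook.php">';
        echo 'Add entry?';
        echo '<input type="hidden" value="'.$id.'" name="confirmAdd" />';
        echo '<input type="submit" value="Yes" />';
        echo '</form>';
}
?>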

When the confirm page is submitted, I pass the ID of the added row so I can just toggle the isPending flag to zero and the entry finally looks like it was added. I do not allow any post data other than the row ID so an entry cannot be directly posted to this second page, and any direct post from the spam bots to the first page will result in the entry being hidden as the isPending flag will forever be set to 1. In pseudo-code, the next step is something like this:

<?php
if (isset($_POST['confirmAdd'])){
        $db->query('UPDATE guestbook SET isPending=0 WHERE entryid = '.intval($_POST['confirmAdd']));
}
?>

This two-step process fixes the problem of bots simply examining the form and directly posting with the fields filled in: they never click the confirm button on the second page, so the entry stays hidden. For regular users, it gives them a chance to see their post before submitting it for good, and it is not intrusive or complicated in any way.
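
The whole two-step flow, including a display query that hides pending rows, can be sketched roughly as follows. This is not the attached guest book code: the function names addPendingEntry, confirmEntry and visibleEntries are made up for illustration, and an in-memory SQLite database via PDO is assumed so the sketch runs on its own.

```php
<?php
// Sketch only: in-memory SQLite via PDO stands in for the real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE guestbook (
    entryid INTEGER PRIMARY KEY, name TEXT, comments TEXT,
    isPending TINYINT(1) NOT NULL DEFAULT 1
)');

// Step 1: store the post as pending and return its row ID (hypothetical helper).
function addPendingEntry(PDO $db, $name, $comments) {
    $stmt = $db->prepare('INSERT INTO guestbook (name, comments, isPending) VALUES (?, ?, 1)');
    $stmt->execute(array($name, $comments));
    return (int) $db->lastInsertId();
}

// Step 2: the confirm page posts back only the ID, which flips the flag (hypothetical helper).
function confirmEntry(PDO $db, $id) {
    $stmt = $db->prepare('UPDATE guestbook SET isPending = 0 WHERE entryid = ?');
    $stmt->execute(array((int) $id));
}

// The guest book page lists confirmed rows only (hypothetical helper).
function visibleEntries(PDO $db) {
    return $db->query('SELECT name FROM guestbook WHERE isPending = 0 ORDER BY entryid')
              ->fetchAll(PDO::FETCH_COLUMN);
}

$humanId = addPendingEntry($db, 'Alice', 'Nice site!');
addPendingEntry($db, 'SpamBot', 'Buy now!');   // a bot posts and never confirms
confirmEntry($db, $humanId);                    // a visitor clicks "Yes"

print_r(visibleEntries($db));                   // lists Alice only
```

The bot's row is still in the table, but because it was never confirmed it never shows up in the listing.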

For your convenience

For convenience though, I decided I wanted to provide the ability to skip this second step. The way to do that is to add a flag on the original form which confirms the post automatically. However, this re-opens the direct posting ability as a bot now can just auto-confirm their post. If you set the initial value of this confirm field to be in the confirmed state, then it is likely the bot will leave the field in that state, and the spam will be auto-confirmed, or they may change the field to another value which also happens to be seen as confirmed to your script. If you set it initially to be unconfirmed, it is possible the bot may be smart enough to change it to confirmed. This is especially true if you use a simple yes/no, 1/0 toggle and name the field with a common confirm name. What needs to be done is make it so the bot does not know about this field at all, but a traditional browser does.

To solve this, we use JavaScript to set up the field. Some browsers have problems with dynamically created hidden inputs, however, so creating the field from scratch was not really an option. What I did instead was add the field's input tag to the HTML, but leave out the name and value attributes and include only an ID attribute, which I made different from the name. On page load, I use some JavaScript to grab that input by its ID and dynamically assign the name and value to it, which allows a confirmed post from the start. As far as I know, bots do not typically understand JavaScript, so until they do and are able to process these scripts, it is a reasonably safe way to tell a browser from a bot. In pseudo-code, my form then looked a little something like this:

<form method="post" action="guestbook.php">
    Name: <input type="text" name="name" value="" /><br />
    Comments: <textarea name="comments"></textarea>
    <input type="hidden" id="confirmedAddInput" />
    <input type="submit" value="Sign guest book" />
</form>
<script type="text/javascript">
    var input = document.getElementById('confirmedAddInput');
    input.name = 'confirmedAdd';
    input.value = 1;
</script>

With the JavaScript and hidden input in place, it simply becomes a matter of adjusting the server-side code to accept this pre-confirmation and enter the data into the database as confirmed, rather than as a temporary post. So what I did was check whether the confirmedAdd field was sent, and whether its value was 1. If so, I set the pending flag to 0 rather than 1 and skip the confirm page. In pseudo-code, it looks something like this:

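<?php
// $db is assumed here to be a PDO connection; prepared statements keep user input out of the SQL.
if (count($_POST) > 0 && !isset($_POST['confirmAdd'])){
        $name = isset($_POST['name']) ? $_POST['name'] : '';
        $comments = isset($_POST['comments']) ? $_POST['comments'] : '';

        // Pre-confirmed only when the JavaScript-set field arrived with exactly the value "1".
        $confirmed = isset($_POST['confirmedAdd']) && $_POST['confirmedAdd'] === '1';

        $stmt = $db->prepare('INSERT INTO guestbook (name, comments, isPending) VALUES (?, ?, ?)');
        $stmt->execute(array($name, $comments, $confirmed ? 0 : 1));

        if (!$confirmed){
                // No pre-confirmation: fall back to the confirm page.
                $id = (int) $db->lastInsertId();

                echo '<form method="post" action="guestbook.php">';
                echo 'Add entry?';
                echo '<input type="hidden" value="'.$id.'" name="confirmAdd" />';
                echo '<input type="submit" value="Yes" />';
                echo '</form>';
        }
}
?>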
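
One detail worth spelling out is how the field value is compared. A loose == comparison treats several different strings as equal to 1, which is the "another value which also happens to be seen as confirmed" problem mentioned earlier, so a strict check is safer. A minimal sketch follows; the helper name isPreConfirmed is hypothetical.

```php
<?php
// Hypothetical helper: accept the pre-confirmation only when the field is exactly the string "1".
function isPreConfirmed(array $post) {
    return isset($post['confirmedAdd']) && $post['confirmedAdd'] === '1';
}

// Form values always arrive as strings (or arrays), never as integers.
var_dump(isPreConfirmed(array('confirmedAdd' => '1')));        // bool(true)
var_dump(isPreConfirmed(array('confirmedAdd' => '1.0')));      // bool(false), although '1.0' == 1 is true
var_dump(isPreConfirmed(array('confirmedAdd' => array('1')))); // bool(false)
var_dump(isPreConfirmed(array()));                             // bool(false)
```

Anything that fails the check simply falls through to the confirm page, so a wrong guess by a bot leaves the entry pending.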

So there you have it: a spam-fighting method that so far has been working fairly well for me on my guest book. I've been running the system for about a week and a half now, and all the spam entries I have received are just sitting in the database with the isPending flag enabled, so they are never seen by end users. If you wanted to, you could extend this to include a basic word filter to stop entries that are obviously spam before they even make it to the database. I chose not to go that far, mostly out of laziness. See the attachments for copies of my guest book code so you can see the real deal rather than the pseudo-code above.
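
For anyone who does want the word filter, a minimal sketch might look like the following. It is not part of the attached code: the function name looksLikeSpam and the word list are made up for illustration.

```php
<?php
// Hypothetical helper: true when the text contains any blocked word as a whole word.
function looksLikeSpam($text, array $blockedWords) {
    foreach ($blockedWords as $word) {
        // preg_quote keeps the word literal; \b limits matches to whole words; i ignores case.
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $text) === 1) {
            return true;
        }
    }
    return false;
}

// Example word list, made up for this sketch.
$blocked = array('viagra', 'casino', 'payday loan');

var_dump(looksLikeSpam('Cheap VIAGRA here', $blocked));    // bool(true)
var_dump(looksLikeSpam('Lovely site, thanks!', $blocked)); // bool(false)
```

An entry that trips the filter could be rejected before the INSERT runs, so it never reaches the database at all.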

Attachments

  • guestbook.php -- The main guest book file which contains the logic for displaying the page and processing post requests.
  • Guestbook.class.php -- A PHP class to wrap a guest book and provide easy to use methods for manipulating guest book entries.
  • GuestbookEntry.class.php -- A PHP class that just acts as a container for a guest book entry and its data.