Friday, October 12, 2007

Creating a Job queue in Innodb

This post is half "this is how you do it" and half "is there a better way to do it?"

So you want to build a system that runs jobs. You want jobs to run in parallel, both for speed and for redundancy. The system needs to be coordinated so that, for example, the same job isn't done twice, the status of every job is easy to see, and multiple servers can run jobs simply by querying the central source.

How can we build code around Innodb to have MySQL be our central controlling scheme?


CREATE TABLE IF NOT EXISTS job_queue (
    id int(10) NOT NULL auto_increment,

    updated timestamp NOT NULL,
    started timestamp NOT NULL,

    state ENUM('NEW', 'WORKING', 'DONE', 'ERROR') DEFAULT 'NEW',

    PRIMARY KEY (id),
    KEY (state)
) ENGINE=InnoDB;

In this schema, our job table has a unique id, a started and last-updated time, and a state. In a real system, there would probably be more metadata here about the nature of the job.
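
For example (these column names are just made up for illustration), you might hang the job's type and its payload off the same row:

ALTER TABLE job_queue
    ADD COLUMN job_type varchar(32) NOT NULL DEFAULT '',
    ADD COLUMN payload blob;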

When new jobs need to be done, rows are inserted into the table (with started=NOW(); updated changes automatically, since it's the first timestamp column in the table):

insert into job_queue set started=NOW();
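
Before getting to the Perl, here's roughly the SQL conversation each worker has with the queue (a sketch of the flow the controller below implements; the id 42 is just a placeholder):

-- Claim an unworked job and flip it to WORKING
BEGIN;
SELECT id FROM job_queue WHERE state = 'NEW' LIMIT 1 FOR UPDATE;
UPDATE job_queue SET state = 'WORKING' WHERE id = 42;
COMMIT;

-- ... do the actual work ...

-- Mark it finished
UPDATE job_queue SET state = 'DONE' WHERE id = 42;
COMMIT;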


Now, in Perl, we write our job controller. This program could be run on every server you want to run jobs on:


#!/usr/bin/perl

# Don't leave zombies around, shamelessly stolen from 'man perlipc'
use POSIX ":sys_wait_h";
sub REAPER {
    my $child;
    while (($child = waitpid(-1, WNOHANG)) > 0) {
        $Kid_Status{$child} = $?;
    }
    $SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;

use DBI;

my $dbh = DBI->connect(
    'DBI:mysql:database=test;host=127.0.0.1;port=3306',
    'test', 'test',
    { RaiseError => 1, AutoCommit => 0 }
);


while( 1 ) {
    my $row;
    eval {
        # Find an unclaimed job and lock its row
        my $sth = $dbh->prepare(
            "select id from job_queue where state='NEW' limit 1 for update" );
        $sth->execute();
        $row = $sth->fetchrow_hashref();
        $sth->finish();
    };

    if( $@ or !$row ) {
        # Couldn't lock, lock wait timeout, or nothing to do
        $dbh->commit();
        sleep 10;
        next;
    }

    # Got one, change state, commit, and fork a worker

    $dbh->do( "update job_queue set state='WORKING' where id=" . $row->{id} );
    $dbh->commit();

    # Fork a worker
    if( fork ) {
        # Parent, let the child have the old connection and reconnect to
        # the db.
        $dbh->{InactiveDestroy} = 1;
        $dbh = DBI->connect(
            'DBI:mysql:database=test;host=127.0.0.1;port=3306',
            'test', 'test',
            { RaiseError => 1, AutoCommit => 0 }
        );
    } else {
        # Child
        print "Locking\n";
        $dbh->do( "select * from job_queue where id=" . $row->{id}
                  . " for update" );

        print "Working\n";
        # Simulated work here; re-seed so children forked in the same
        # second don't all sleep the same amount
        srand( time ^ $$ );
        sleep rand 30;

        $dbh->do( "update job_queue set state='DONE' where id=" . $row->{id} );
        print "Committing\n";
        $dbh->commit();

        $dbh->disconnect();

        exit;
    }
    sleep 1;
}


Now, this isn't bad. But it has at least one thing I don't like about it. If a forked worker dies before the job finishes (and commits), there's no way for the job to be reassigned to another worker.

Ideally each worker would hold a lock on its job row inside a transaction, which it already does, but instead of the job being flipped to 'WORKING' first, I'd rather it stayed 'NEW' for the whole transaction. If that were the case and a worker died, its unfinished transaction would be rolled back, and the job would be unlocked and 'NEW' again, ready for another worker to pick up.

However, because of the way SELECT ... FOR UPDATE works, Innodb will block waiting for those locked 'NEW' rows to be released by the workers (or simply wait until they are done), racking up lock wait timeouts along the way. This is sub-optimal, since my parent process could be forking off new jobs in the meantime. What would fix this is a SELECT ... FOR UPDATE that skipped locked rows instead of blocking on them.
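
To make the problem concrete, here's a sketch of two workers under that "stay NEW until done" scheme (the job ids are placeholders):

-- Worker A: claims a job by locking it, but leaves it 'NEW'
BEGIN;
SELECT id FROM job_queue WHERE state = 'NEW' LIMIT 1 FOR UPDATE;  -- locks row 1
-- ... long-running work happens here while the transaction stays open ...

-- Worker B, meanwhile, tries to claim the next job:
BEGIN;
SELECT id FROM job_queue WHERE state = 'NEW' LIMIT 1 FOR UPDATE;
-- Blocks on worker A's row lock until A commits or rolls back
-- (or dies with a lock wait timeout); it can't skip past the locked row.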

If anyone knows a good way to achieve this, please let me know!

However, assuming we have no better alternative, we can create a reaper process like this:


#!/usr/bin/perl

use DBI;

my $dbh = DBI->connect(
    'DBI:mysql:database=test;host=127.0.0.1;port=3306',
    'test', 'test',
    { RaiseError => 1, AutoCommit => 0 }
);


while( 1 ) {
    my $row;
    eval {
        # Find a WORKING job whose row isn't locked by a live worker
        my $sth = $dbh->prepare(
            "select id from job_queue where state='WORKING' limit 1 for update" );
        $sth->execute();
        $row = $sth->fetchrow_hashref();
        $sth->finish();
    };

    if( $@ or !$row ) {
        # Couldn't lock or lock wait timeout
        $dbh->commit();
        sleep 10;
        next;
    }

    # Got one that wasn't locked, so its worker must have died.
    # Put it back to NEW so another worker can pick it up.
    $dbh->do( "update job_queue set state='NEW' where id=" . $row->{id} );
    $dbh->commit();

    sleep 1;
}


This will find the first WORKING row and try to lock it. Since a live worker holds a lock on its row and moves the job to DONE before committing, the only WORKING rows the reaper can actually lock are ones whose worker died.

However, this isn't perfect. Because SELECT ... FOR UPDATE will block trying to lock a row that's already locked, we have to wait until all the jobs ahead of our stalled job are complete, hitting lock wait timeouts along the way (be sure to set your innodb_lock_wait_timeout fairly low!). Even worse, I've seen dead worker processes leave behind idle MySQL connections that hold onto their row locks.
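
For what it's worth, you can at least check what the timeout is set to; as far as I know, on these 5.0 servers innodb_lock_wait_timeout can only be set at startup (e.g. innodb_lock_wait_timeout=5 under [mysqld] in my.cnf), not with SET GLOBAL:

-- Value is in seconds; a blocked SELECT ... FOR UPDATE eventually fails
-- with "ERROR 1205: Lock wait timeout exceeded" once this expires
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';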

Is there any smoother way to do this? I'd love to hear other people's advice.

Thursday, October 4, 2007

How safe is setting read_only?

We use a lot of dual master setups for redundancy and recoverability. Typically, most apps can't write to both masters at the same time, so for consistency we keep only one of the masters read-write at any given instant. To switch over gracefully, turn the primary read-only, wait for replication to catch up, turn the secondary read-write. Simple, eh?
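
A rough sketch of that sequence (the binlog file and position are placeholders you'd read off SHOW MASTER STATUS):

-- On the primary, which is currently read-write:
SET GLOBAL read_only = 1;
SHOW MASTER STATUS;  -- note the File and Position

-- On the secondary, wait for replication to apply everything up to that point:
SELECT MASTER_POS_WAIT('mysql-bin.000123', 456789);

-- Then open the secondary up for writes:
SET GLOBAL read_only = 0;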

Well, I've just had a report that setting the primary read-only caused a transaction to be partially replicated to a slave. Wait, what? If I'm reading the email correctly, they switched masters in the middle of a transaction. The transaction commit correctly failed on the master with a read-only error, but somehow the other master (or a slave) got part of the transaction that failed and had to be skipped.

This was an Innodb table in MySQL 5.0. My first thought was that XA was disabled (though we typically use 4.1 here, so I thought of innodb_safe_binlog). I verified that XA support appeared to be enabled correctly. My other thoughts are that maybe the binlog didn't get synced properly, or a thousand other possibilities... I'll update if I find out more.
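
The sanity check here is something like the following (this is what I mean by "XA support" being enabled, plus the binlog sync setting):

-- Should be ON for InnoDB's two-phase commit with the binlog to be in play
SHOW VARIABLES LIKE 'innodb_support_xa';

-- Is the binary log fsync'd? 1 means sync on every commit
SHOW VARIABLES LIKE 'sync_binlog';

-- And binary logging is actually enabled
SHOW VARIABLES LIKE 'log_bin';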

Now, I'm not panicking too much; this could be a non-issue, a bad report, what-have-you. However, it got me thinking: how safe *is* it to simply set read-only? Transactionally speaking, is this doing what I want, which is an atomic "deny writes from now on" change? Is this safe with binlogging?

My user claims that doing a FLUSH TABLES WITH READ LOCK before setting read-only is safer, but is it? Does setting read-only somehow muck with the binlog and prevent it from being resynced with the Innodb transactional state?
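
What the user is suggesting would look roughly like this on the primary:

-- Quiesce writes and flush first...
FLUSH TABLES WITH READ LOCK;
-- ...then flip the switch while nothing can sneak in
SET GLOBAL read_only = 1;
-- read_only keeps new writes out, so the global read lock can be released
UNLOCK TABLES;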

I figured I'd post it out here and see what people say...

UPDATE: I found out the user isn't using normal transactions, but rather AUTOCOMMIT. I'm following up to try to figure out what the exact nature of the slave error was.

However, another related question is: Is there a race condition between applying a write statement and writing the binary log? Is it possible to do one without the other if there's a sudden server crash? With Innodb, I'd expect the XA transactions to handle this event, but what about plain old MyISAM?