Fix race between stopWaitingForJobs() and getJob()

Turbo-Hipster's test suite was failing for me in a very non-deterministic
manner -- sometimes the ZuulClient would get stuck in a call to
gearman-Worker's getJob. It turned out that it was possible for the
whole worker.stopWaitingForJobs to finish before a call to
worker.getJob) gets scheduled. This meant that the stopWaitingForJobs'
logic which tried hard to interrupt any pending getJob() calls failed.

The fix is to let the getJob() check whether it missed it chance, i.e.
whether the whole worker is not supposed to be running anymore. In order
to guarantee thread safety, both setting of this variable and checking
whether it's set should happen in a synchronized manner. Stuff gets
messy here: both getJob() and stopWaitingForJobs() acquire a lock, which
means that getJob() must *not* hold the lock while it blocks (otherwise,
stopWaitingForJobs() won't be able to interrupt it because it would get
deadlocked before it gets a chance to enqueue its Nones).

It seems that there's one illusive race, though -- when thread B is
executing getJob() and gets interrupted right after the try/finally
terminates (and hence the lock is already released) and execution turns
into stopWaitingForJobs, it's quite possible that the self.running gets
unset after the getJob has already checked it. However, the lock also
protects the self.waiting_for_jobs which means that either the
stopWaitingForJobs() will see an increased waiting_for_jobs integer, or
that the getJob() will notice an updated self.running.

Change-Id: I51ec9cf06622d91ab22a4ff80630fae7913d4b5d
1 file changed