Thursday, February 3, 2011

Using Spring AOP to Retry Failed Idempotent Concurrent Operations

There are times when an operation fails because of concurrency problems. For example, take the case where an operation is currently in progress and its status is stored in a database. If you also need the ability to cancel this task while it is in progress, you would naturally implement a call that updates this status to a 'cancelled' state.

In an environment where transactions are non-blocking (for example, optimistic locking with versioned entities, as is often found in a Spring/Hibernate stack), this introduces a point of contention. If the cancel call loads and modifies the task data while the task executor is also modifying the status, there is a real chance that committing the cancel transaction will throw an exception (in this case, a ConcurrencyFailureException). The caller of the cancel call is uninterested in the internal interactions between threads and simply wants the task cancelled, so it should not be exposed to such an error.
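To make the contention concrete, here is a minimal in-memory sketch of optimistic locking in plain Java. It is not how Hibernate implements versioning; `TaskTable`, `TaskRow` and `StaleVersionException` are hypothetical names, and the exception merely stands in for Spring's ConcurrencyFailureException. A commit succeeds only if nobody else committed since the row was loaded.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for Spring's ConcurrencyFailureException. */
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

/** Immutable snapshot of a task row: its status plus a version number. */
final class TaskRow {
    final String status;
    final long version;

    TaskRow(String status, long version) {
        this.status = status;
        this.version = version;
    }
}

/** Hypothetical single in-memory "row" with optimistic locking. */
final class TaskTable {
    private final AtomicReference<TaskRow> row = new AtomicReference<>(new TaskRow("RUNNING", 0));

    TaskRow load() {
        return row.get();
    }

    void commit(TaskRow loaded, String newStatus) {
        TaskRow updated = new TaskRow(newStatus, loaded.version + 1);
        // compareAndSet compares references, so any commit since our load makes this fail
        if (!row.compareAndSet(loaded, updated)) {
            throw new StaleVersionException("row changed since version " + loaded.version);
        }
    }
}
```

If the executor commits between the cancel's load and its commit, the cancel's commit fails even though nothing is logically wrong with it. Loading the row again and committing against the fresh version succeeds, which is what makes a retry worthwhile.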

This is an example of an idempotent concurrent operation that is a good candidate for retrying upon failure. I recently encountered this very problem and will share my solution for it.
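As a toy illustration of what "idempotent" buys us here (hypothetical names, not part of the solution below): setting a status gives the same end state whether it runs once or twice, while an increment does not.

```java
import java.util.function.UnaryOperator;

/** Hypothetical demo contrasting an idempotent update with a non-idempotent one. */
final class IdempotencyDemo {

    /** Run the same operation twice, as happens when an attempt is repeated. */
    static <T> T applyTwice(T state, UnaryOperator<T> op) {
        return op.apply(op.apply(state));
    }

    /** Idempotent: cancelling an already-cancelled task leaves it cancelled. */
    static String cancel(String status) {
        return "CANCELLED";
    }

    /** Not idempotent: every application changes the result. */
    static int incrementCounter(int count) {
        return count + 1;
    }
}
```

Because repeating the cancel cannot leave the task in a different state than a single successful cancel would, retrying it is safe.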

A couple of notes. First, in this example we are dealing with a runtime exception, so calling code will likely have no knowledge (nor should it) of the possibility of a concurrency failure. Second, remember that if you are using Spring's @Transactional annotation to support your database operations, you are already using AOP: a proxy surrounds the method call and drives the transaction. The commit happens in that proxy after the method body has returned, so the method itself has no knowledge of whether the transaction succeeded and cannot react to a failure. Our best course of action, therefore, is to use AOP ourselves to wrap the call in a try/catch block and retry if appropriate. @Around advice is well suited to this sort of requirement.
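The point that the method cannot see its own commit failure can be shown with a plain-Java sketch. Here `inTransaction` is a hypothetical stand-in for the transactional proxy (it is not Spring's API), and `CommitConflictException` stands in for the commit-time ConcurrencyFailureException. The try/catch inside the business method never fires, while a wrapper outside the "transaction" can catch the failure and retry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a failure that surfaces at commit time. */
class CommitConflictException extends RuntimeException {
    CommitConflictException() {
        super("commit conflict");
    }
}

final class CommitAfterReturnDemo {
    final List<String> log = new ArrayList<>();
    private int commitsToFail;

    CommitAfterReturnDemo(int commitsToFail) {
        this.commitsToFail = commitsToFail;
    }

    /** Mimics a transactional proxy: run the body, then commit; the commit can fail after the body returned. */
    <T> T inTransaction(Supplier<T> body) {
        log.add("begin");
        T result = body.get();
        if (commitsToFail > 0) {
            commitsToFail--;
            log.add("rollback");
            throw new CommitConflictException();
        }
        log.add("commit");
        return result;
    }

    /** The business method: its own try/catch never sees the commit failure. */
    String cancelBody() {
        try {
            log.add("cancel body");
            return "CANCELLED";
        } catch (CommitConflictException e) {
            log.add("inner catch"); // never reached: the commit runs after this method returns
            return "UNREACHABLE";
        }
    }

    /** A wrapper outside the transaction (the role @Around advice plays) can catch and retry. */
    String cancelWithRetry(int retries) {
        for (int attempt = 0; ; attempt++) {
            try {
                return inTransaction(this::cancelBody);
            } catch (CommitConflictException e) {
                if (attempt >= retries) {
                    throw e;
                }
                log.add("retry");
            }
        }
    }
}
```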

Because the method itself both defines the contract (if called, I will cancel this operation) and knows of the possibility of failure, the best way to define the pointcut is via a method annotation. The example annotation here lets us specify both the exception that is pertinent to our case and the appropriate number of retries, giving us the flexibility to reuse it in multiple circumstances.

RetryConcurrentOperation.java
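import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Annotation that indicates an operation should be retried if the specified exception is encountered.
 */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@Documented
public @interface RetryConcurrentOperation {

    /**
     * Specify the exception for which the operation should be retried.
     */
    Class<? extends Throwable> exception() default Exception.class;

    /**
     * Sets the number of times to retry the operation. The default of -1 (or any value less than 1)
     * indicates we want to use whatever the global default is.
     */
    int retries() default -1;
}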
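Because the annotation is retained at runtime, its attributes can be read reflectively, and that is effectively what the advice receives as its bound parameter. This standalone sketch uses a local copy of the annotation and a hypothetical `AnnotationLookupDemo` class with a hypothetical global default of 3, and mirrors the "non-positive means use the global default" fallback.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Local copy of the annotation so this sketch compiles on its own. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RetryConcurrentOperation {
    Class<? extends Throwable> exception() default Exception.class;

    int retries() default -1;
}

final class AnnotationLookupDemo {
    /** Hypothetical global default, playing the role of the interceptor's maxRetries property. */
    static final int GLOBAL_MAX_RETRIES = 3;

    @RetryConcurrentOperation(exception = IllegalStateException.class, retries = 2)
    void explicitRetries() {
    }

    @RetryConcurrentOperation
    void defaultRetries() {
    }

    /** Reads the annotation at runtime, which only works because of RetentionPolicy.RUNTIME. */
    static RetryConcurrentOperation annotationOf(String methodName) {
        try {
            Method method = AnnotationLookupDemo.class.getDeclaredMethod(methodName);
            return method.getAnnotation(RetryConcurrentOperation.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(methodName, e);
        }
    }

    /** Same fallback as the advice: a non-positive value means "use the global default". */
    static int effectiveRetries(String methodName) {
        int retries = annotationOf(methodName).retries();
        return retries > 0 ? retries : GLOBAL_MAX_RETRIES;
    }
}
```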

We now have the ability to request that a method be retried when it fails for a given reason, but we still need advice that acts on that request. Note that the interceptor sets a default number of retries and exposes a public setter for it, the intention being that your Spring configuration can define a global default that individual method annotations can override. Likewise, it implements the Ordered interface so that you can ensure it has a higher precedence than your transaction advice. Spring treats lower order values as higher precedence, so with an order of 1 the retry advice wraps the transactional proxy, and each retry starts a new transaction (assuming the caller is not already inside one).
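Why the ordering matters can be modelled without Spring. In this sketch, `Advice`, `TxAdvice`, `RetryAdvice` and `AdviceChain` are hypothetical classes, not Spring's API; the chain places the lowest order value outermost, which is my understanding of how Spring orders advice. The transaction advice is given Integer.MAX_VALUE (the value of Spring's Ordered.LOWEST_PRECEDENCE), and its first commit fails.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical model of ordered advice: lower order values are applied outermost. */
interface Advice {
    int order();

    Object invoke(Supplier<Object> next);
}

/** Hypothetical commit-time failure. */
class ConflictException extends RuntimeException {
}

/** Stands in for transaction advice: begin, run the target, then commit (the first commit fails). */
final class TxAdvice implements Advice {
    private final List<String> log;
    private final int order;
    private boolean failNextCommit = true;

    TxAdvice(List<String> log, int order) {
        this.log = log;
        this.order = order;
    }

    public int order() {
        return order;
    }

    public Object invoke(Supplier<Object> next) {
        log.add("begin");
        Object result = next.get();
        if (failNextCommit) {
            failNextCommit = false;
            log.add("rollback");
            throw new ConflictException();
        }
        log.add("commit");
        return result;
    }
}

/** Stands in for the retry advice: retries ConflictException up to `retries` extra times. */
final class RetryAdvice implements Advice {
    private final int order;
    private final int retries;

    RetryAdvice(int order, int retries) {
        this.order = order;
        this.retries = retries;
    }

    public int order() {
        return order;
    }

    public Object invoke(Supplier<Object> next) {
        for (int attempt = 0; ; attempt++) {
            try {
                return next.get();
            } catch (ConflictException e) {
                if (attempt >= retries) {
                    throw e;
                }
            }
        }
    }
}

final class AdviceChain {
    /** Builds the call chain so the advice with the lowest order value ends up outermost. */
    static Object run(List<Advice> advices, Supplier<Object> target) {
        List<Advice> sorted = new ArrayList<>(advices);
        sorted.sort(Comparator.comparingInt(Advice::order));
        Supplier<Object> chain = target;
        for (int i = sorted.size() - 1; i >= 0; i--) {
            Advice advice = sorted.get(i);
            Supplier<Object> next = chain;
            chain = () -> advice.invoke(next);
        }
        return chain.get();
    }
}
```

With the retry advice outermost, the log reads begin, rollback, begin, commit: the failed transaction is discarded and a new one succeeds. If the retry advice is placed inside the transaction instead, it only sees the method body, and the commit failure escapes to the caller.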

ConcurrentOperationFailureInterceptor.java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.core.Ordered;

/**
 * Advice that traps exceptions out of annotated calls and retries the call if appropriate.
 */
@Aspect
public class ConcurrentOperationFailureInterceptor implements Ordered {

    private static final Logger LOG = LoggerFactory.getLogger(ConcurrentOperationFailureInterceptor.class);

    private static final int DEFAULT_MAX_RETRIES = 2;

    private int maxRetries = DEFAULT_MAX_RETRIES;
    private int order = 1;

    /**
     * Advice that traps an exception specified by an annotation so that the operation can be retried.
     *
     * @param pjp wrapper around method being executed
     * @param retryConcurrentOperation annotation indicating method should be wrapped
     * @return return value of wrapped call
     * @throws Throwable the exception configured in RetryConcurrentOperation once retries are exhausted;
     *         any other exception thrown by the wrapped call passes through immediately
     */
    @Around("@annotation(retryConcurrentOperation)")
    public Object performOperation(ProceedingJoinPoint pjp, RetryConcurrentOperation retryConcurrentOperation) throws Throwable {
        Class<? extends Throwable> exceptionClass = retryConcurrentOperation.exception();
        int retries = retryConcurrentOperation.retries();
        if (retries <= 0) {
            // no positive value on the annotation, fall back to the global default
            retries = this.maxRetries;
        }
        if (LOG.isInfoEnabled()) {
            LOG.info("Attempting operation with potential for {} with maximum {} retries", exceptionClass.getCanonicalName(), retries);
        }

        // one initial attempt plus up to 'retries' further attempts
        int numAttempts = 0;
        while (true) {
            numAttempts++;
            try {
                return pjp.proceed();
            } catch (Throwable ex) {
                // if the exception is not what we're looking for, pass it through
                if (!exceptionClass.isInstance(ex)) {
                    throw ex;
                }

                // we caught the configured exception, retry unless we've reached the maximum
                if (numAttempts > retries) {
                    LOG.warn("Caught {} and exceeded maximum retries ({}), rethrowing.", exceptionClass.getCanonicalName(), retries);
                    throw ex;
                }
                if (LOG.isInfoEnabled()) {
                    LOG.info("Caught {} and will retry, attempts: {}", exceptionClass.getCanonicalName(), numAttempts);
                }
            }
        }
    }

    @Override
    public int getOrder() {
        return order;
    }

    /**
     * Allow overriding of the default order.
     *
     * @param order aspect order
     */
    public void setOrder(int order) {
        this.order = order;
    }

    /**
     * Allow overriding of the default maximum number of retries.
     *
     * @param maxRetries maximum number of retries
     */
    public void setMaxRetries(int maxRetries) {
        this.maxRetries = maxRetries;
    }
}
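The counting and type filtering in the advice can be checked without Spring. This is a plain-Java analogue, not the advice itself: `RetryLoop` is a hypothetical helper, and it takes a Supplier and retries only RuntimeExceptions (the post's ConcurrencyFailureException is a runtime exception) so that it needs no checked-exception plumbing. With retries = 2 the call runs at most three times, and a non-matching exception is rethrown on the first attempt.

```java
import java.util.function.Supplier;

/** Hypothetical plain-Java analogue of the advice's retry loop (no Spring or AspectJ involved). */
final class RetryLoop {
    /**
     * Invokes the call up to (retries + 1) times. Only exceptions that are instances of retryOn trigger
     * another attempt; anything else propagates immediately, as in the advice.
     */
    static <T> T callWithRetries(Supplier<T> call, Class<? extends RuntimeException> retryOn, int retries) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                return call.get();
            } catch (RuntimeException ex) {
                if (!retryOn.isInstance(ex) || attempts > retries) {
                    throw ex;
                }
            }
        }
    }
}
```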
Of course, we then have to wire this advice up. Here is a snippet of an application context that does just that.

aop-context.xml
<beans xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.springframework.org/schema/beans"
    xsi:schemaLocation="http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">

    <!-- enable @AspectJ support; proxy-target-class="true" proxies classes (CGLIB) rather than interfaces -->
    <aop:aspectj-autoproxy proxy-target-class="true"/>

    <!-- global default of 3 retries, used when an annotation does not specify a positive value -->
    <bean class="com.example.project.execution.ConcurrentOperationFailureInterceptor" id="failureInterceptor">
        <property name="maxRetries" value="3"/>
    </bean>
</beans>
With this code in place, we can define our cancel operation in such a way that the calling code does not have to worry about unexpected concurrency failures.

MyService.java
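import org.springframework.dao.ConcurrencyFailureException;
import org.springframework.transaction.annotation.Transactional;

/**
 * Task management code.
 */
public class MyService {

    // ...

    /**
     * Cancel the indicated task. If it fails with a ConcurrencyFailureException, the whole method (including the load) is retried up to twice.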
     */
    @Transactional
    @RetryConcurrentOperation(exception = ConcurrencyFailureException.class, retries = 2)
    public Task cancel(long taskId) {
        Task task = taskDao.get(taskId);
        task.setStatus(TaskStatus.CANCELLED);
        taskDao.save(task);
        return task;
    }

    // ...
}
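A subtle benefit of retrying the whole method rather than just the commit is that every attempt reloads the task, so it works from the latest version. This plain-Java simulation (the hypothetical `CancelFlowDemo`, reusing the in-memory optimistic-locking idea rather than Hibernate) has the "executor" commit once between the cancel's load and commit; the second attempt reloads and succeeds.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical simulation of the cancel method under optimistic locking. */
final class CancelFlowDemo {
    /** Hypothetical stand-in for ConcurrencyFailureException. */
    static final class StaleException extends RuntimeException {
    }

    /** Immutable row snapshot: status plus version. */
    static final class Row {
        final String status;
        final long version;

        Row(String status, long version) {
            this.status = status;
            this.version = version;
        }
    }

    final AtomicReference<Row> row = new AtomicReference<>(new Row("RUNNING", 0));
    private int executorWritesPending;

    CancelFlowDemo(int executorWritesPending) {
        this.executorWritesPending = executorWritesPending;
    }

    /** One attempt: load, let the simulated executor commit in between if scheduled, then commit. */
    Row cancelOnce() {
        Row loaded = row.get(); // plays the role of taskDao.get(taskId)
        if (executorWritesPending > 0) {
            executorWritesPending--;
            Row current = row.get();
            row.set(new Row(current.status, current.version + 1)); // executor commits first
        }
        Row cancelled = new Row("CANCELLED", loaded.version + 1);
        if (!row.compareAndSet(loaded, cancelled)) {
            throw new StaleException(); // stale version detected at commit time
        }
        return cancelled;
    }

    /** Retries the whole attempt, including the load, up to `retries` extra times. */
    Row cancelWithRetries(int retries) {
        for (int attempt = 0; ; attempt++) {
            try {
                return cancelOnce();
            } catch (StaleException e) {
                if (attempt >= retries) {
                    throw e;
                }
            }
        }
    }
}
```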

Of course, in many situations there is still a chance that the configured number of retries will be exhausted and the exception will bubble up to the caller. This chance, though, should generally be minuscule; ideally, a single retry should be sufficient for any operation to complete successfully. If you are using this technique to retry several times and are still seeing failures, it is probably time to re-examine your design. Do not use AOP to mask a poor design.
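As a rough back-of-the-envelope check on "minuscule", suppose each attempt fails independently with probability p. That independence is an assumption; under sustained contention failures tend to be correlated, so the real figure can be much worse. With n configured retries there are n + 1 attempts, and the chance that the exception still reaches the caller is:

```latex
P(\text{all attempts fail}) = p^{\,n+1}, \qquad \text{e.g. } p = 0.05,\ n = 2:\quad 0.05^{3} = 1.25 \times 10^{-4}
```

If the observed failure rate is far above such an estimate, the attempts are not independent, which is exactly the situation where the design deserves another look.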

In the next post I will write some unit tests that ensure that our advice is working correctly.
