In-Depth Analysis of Seata TCC Mode (1)

January 18, 2022 · 12 min read

Zhang Chenghui

Seata currently supports AT mode, XA mode, TCC mode, and SAGA mode. Previous articles have talked more about non-intrusive AT mode. Today, we will introduce TCC mode, which is also a two-phase commit.

What is TCC

TCC is a two-phase commit protocol in distributed transactions. Its full name is Try-Confirm-Cancel. Their specific meanings are as follows:

Try: Check and reserve business resources;
Confirm: Commit the business transaction, i.e., the commit operation. If Try is successful, this step will definitely be successful;
Cancel: Cancel the business transaction, i.e., the rollback operation. This step will release the resources reserved in Try.

TCC is an intrusive distributed transaction solution. All three operations need to be implemented by the business system itself, which has a significant impact on the business system. The design is relatively complex, but the advantage is that TCC does not rely on the database. It can manage resources across databases and applications, and can implement an atomic operation for different data access through intrusive coding, better solving the distributed transaction problems in various complex business scenarios.

Seata TCC mode

Seata TCC mode follows the same principle as the general TCC mode. Let's first use Seata TCC mode to implement a distributed transaction:

Suppose there is a business that needs to use service A and service B to complete a transaction operation. We define a TCC interface for this service in service A:

public interface TccActionOne {
    @TwoPhaseBusinessAction(name = "DubboTccActionOne", commitMethod = "commit", rollbackMethod = "rollback")
    public boolean prepare(BusinessActionContext actionContext, @BusinessActionContextParameter(paramName = "a") String a);

    public boolean commit(BusinessActionContext actionContext);

    public boolean rollback(BusinessActionContext actionContext);
}

Similarly, we define a TCC interface for this service in service B:

public interface TccActionTwo {
    @TwoPhaseBusinessAction(name = "DubboTccActionTwo", commitMethod = "commit", rollbackMethod = "rollback")
    public void prepare(BusinessActionContext actionContext, @BusinessActionContextParameter(paramName = "b") String b);

    public void commit(BusinessActionContext actionContext);

    public void rollback(BusinessActionContext actionContext);
}

In the business system, we start a global transaction and execute the TCC reserve resource methods for service A and service B:

@GlobalTransactional
public String doTransactionCommit(){
    // Service A transaction participant
    tccActionOne.prepare(null,"one");
    // Service B transaction participant
    tccActionTwo.prepare(null,"two");
}

The example above demonstrates the implementation of a global transaction using Seata TCC mode. It can be seen that the TCC mode also uses the @GlobalTransactional annotation to initiate a global transaction, while the TCC interfaces of Service A and Service B are transaction participants. Seata treats a TCC interface as a Resource, also known as a TCC Resource.

TCC interfaces can be RPC or internal JVM calls, meaning that a TCC interface has both a sender and a caller identity. In the example above, the TCC interface is the sender in Service A and Service B, and the caller in the business system. If the TCC interface is a Dubbo RPC, the caller is a dubbo:reference and the sender is a dubbo:service.

When Seata starts, it scans and parses the TCC interfaces. If a TCC interface is a sender, Seata registers the TCC Resource with the TC during startup, and each TCC Resource has a resource ID. If a TCC interface is a caller, Seata proxies the caller and intercepts the TCC interface calls. Similar to the AT mode, the proxy intercepts the call to the Try method, registers a branch transaction with the TC, and then executes the original RPC call.

When the global transaction decides to commit/rollback, the TC will callback to the corresponding participant service to execute the Confirm/Cancel method of the TCC Resource using the resource ID registered by the branch.

How Seata Implements TCC Mode

From the above Seata TCC model, it can be seen that the TCC mode in Seata also follows the TC, TM, RM three-role model. How to implement TCC mode in these three-role models? I mainly summarize the implementation as resource parsing, resource management, and transaction processing.

Resource Parsing

Resource parsing is the process of parsing and registering TCC interfaces. As mentioned earlier, TCC interfaces can be RPC or internal JVM calls. In the Seata TCC module, there is a remoting module that is specifically used to parse TCC interfaces with the TwoPhaseBusinessAction annotation:

The RemotingParser interface mainly has methods such as isRemoting, isReference, isService, getServiceDesc, etc. The default implementation is DefaultRemotingParser, and the parsing of various RPC protocols is executed in DefaultRemotingParser. Seata has already implemented parsing of Dubbo, HSF, SofaRpc, and LocalTCC RPC protocols while also providing SPI extensibility for additional RPC protocol parsing classes.

During the Seata startup process, the GlobalTransactionScanner annotation is used for scanning and executes the following method:

io.seata.spring.util.TCCBeanParserUtils#isTccAutoProxy

The purpose of this method is to determine if the bean has been TCC proxied. In the process, it first checks if the bean is a Remoting bean. If it is, it calls the getServiceDesc method to parse the remoting bean, and if it is a sender, it registers the resource:

io.seata.rm.tcc.remoting.parser.DefaultRemotingParser#parserRemotingServiceInfo

public RemotingDesc parserRemotingServiceInfo(Object bean, String beanName, RemotingParser remotingParser){
    RemotingDesc remotingBeanDesc = remotingParser.getServiceDesc(bean, beanName);
    if(remotingBeanDesc == null){
    return null;
    }
    remotingServiceMap.put(beanName, remotingBeanDesc);

    Class<?> interfaceClass = remotingBeanDesc.getInterfaceClass();
    Method[] methods = interfaceClass.getMethods();
    if (remotingParser.isService(bean, beanName)) {
        try {
            //service bean, registry resource
            Object targetBean = remotingBeanDesc.getTargetBean();
            for (Method m : methods) {
                TwoPhaseBusinessAction twoPhaseBusinessAction = m.getAnnotation(TwoPhaseBusinessAction.class);
                if (twoPhaseBusinessAction != null) {
                    TCCResource tccResource = new TCCResource();
                    tccResource.setActionName(twoPhaseBusinessAction.name());
                    tccResource.setTargetBean(targetBean);
                    tccResource.setPrepareMethod(m);
                    tccResource.setCommitMethodName(twoPhaseBusinessAction.commitMethod());
                    tccResource.setCommitMethod(interfaceClass.getMethod(twoPhaseBusinessAction.commitMethod(),
                    twoPhaseBusinessAction.commitArgsClasses()));
                    tccResource.setRollbackMethodName(twoPhaseBusinessAction.rollbackMethod());
                    tccResource.setRollbackMethod(interfaceClass.getMethod(twoPhaseBusinessAction.rollbackMethod(),
                    twoPhaseBusinessAction.rollbackArgsClasses()));
                    // set argsClasses
                    tccResource.setCommitArgsClasses(twoPhaseBusinessAction.commitArgsClasses());
                    tccResource.setRollbackArgsClasses(twoPhaseBusinessAction.rollbackArgsClasses());
                    // set phase two method's keys
                    tccResource.setPhaseTwoCommitKeys(this.getTwoPhaseArgs(tccResource.getCommitMethod(),
                    twoPhaseBusinessAction.commitArgsClasses()));
                    tccResource.setPhaseTwoRollbackKeys(this.getTwoPhaseArgs(tccResource.getRollbackMethod(),
                    twoPhaseBusinessAction.rollbackArgsClasses()));
                    // registry tcc resource
                    DefaultResourceManager.get().registerResource(tccResource);
                }
            }
        } catch (Throwable t) {
            throw new FrameworkException(t, "parser remoting service error");
        }
    }
    if (remotingParser.isReference(bean, beanName)) {
        // reference bean, TCC proxy
        remotingBeanDesc.setReference(true);
    }
    return remotingBeanDesc;
    }

The above method first calls the parsing class getServiceDesc method to parse the remoting bean and puts the parsed remotingBeanDesc into the local cache remotingServiceMap. At the same time, it calls the parsing class isService method to determine if it is the initiator. If it is the initiator, it parses the content of the TwoPhaseBusinessAction annotation to generate a TCCResource and registers it as a resource.

Resource Management

1. Resource Registration

The resource for Seata TCC mode is called TCCResource, and its resource manager is called TCCResourceManager. As mentioned earlier, after parsing the TCC interface RPC resource, if it is the initiator, it will be registered as a resource:

io.seata.rm.tcc.TCCResourceManager#registerResource

public void registerResource(Resource resource){
    TCCResource tccResource=(TCCResource)resource;
    tccResourceCache.put(tccResource.getResourceId(),tccResource);
    super.registerResource(tccResource);
    }

TCCResource contains the relevant information of the TCC interface and is cached locally. It continues to call the parent class registerResource method (which encapsulates communication methods) to register with the TC. The TCC resource's resourceId is the actionName, and the actionName is the name in the @TwoParseBusinessAction annotation.

2. Resource Commit/Rollback

io.seata.rm.tcc.TCCResourceManager#branchCommit

public BranchStatus branchCommit(BranchType branchType,String xid,long branchId,String resourceId,
    String applicationData)throws TransactionException{
    TCCResource tccResource=(TCCResource)tccResourceCache.get(resourceId);
    if(tccResource==null){
    throw new ShouldNeverHappenException(String.format("TCC resource is not exist, resourceId: %s",resourceId));
    }
    Object targetTCCBean=tccResource.getTargetBean();
    Method commitMethod=tccResource.getCommitMethod();
    if(targetTCCBean==null||commitMethod==null){
    throw new ShouldNeverHappenException(String.format("TCC resource is not available, resourceId: %s",resourceId));
    }
    try{
    //BusinessActionContext
    BusinessActionContext businessActionContext=getBusinessActionContext(xid,branchId,resourceId,
    applicationData);
    // ... ...
    ret=commitMethod.invoke(targetTCCBean,args);
    // ... ...
    return result?BranchStatus.PhaseTwo_Committed:BranchStatus.PhaseTwo_CommitFailed_Retryable;
    }catch(Throwable t){
    String msg=String.format("commit TCC resource error, resourceId: %s, xid: %s.",resourceId,xid);
    LOGGER.error(msg,t);
    return BranchStatus.PhaseTwo_CommitFailed_Retryable;
    }
    }

When the TM resolves the phase two commit, the TC will callback to the corresponding participant (i.e., TCC interface initiator) service to execute the Confirm/Cancel method of the TCC Resource registered by the branch.

In the resource manager, the corresponding TCCResource will be found in the local cache based on the resourceId, and the corresponding BusinessActionContext will be found based on xid, branchId, resourceId, and applicationData, and the parameters to be executed are in the context. Finally, the commit method of the TCCResource is executed to perform the phase two commit.

The phase two rollback is similar.

Transaction Processing

As mentioned earlier, if the TCC interface is a caller, the Seata TCC proxy will be used to intercept the caller and register the branch before processing the actual RPC method call.

The method io.seata.spring.util.TCCBeanParserUtils#isTccAutoProxy not only parses the TCC interface resources, but also determines whether the TCC interface is a caller. If it is a caller, it returns true:

io.seata.spring.annotation.GlobalTransactionScanner#wrapIfNecessary

As shown in the figure, when GlobalTransactionalScanner scans the TCC interface caller (Reference), it will proxy and intercept it with TccActionInterceptor, which implements MethodInterceptor.

In TccActionInterceptor, it will also call ActionInterceptorHandler to execute the interception logic, and the transaction-related processing is in the ActionInterceptorHandler#proceed method:

public Object proceed(Method method, Object[] arguments, String xid, TwoPhaseBusinessAction businessAction,
    Callback<Object> targetCallback) throws Throwable {
    //Get action context from arguments, or create a new one and then reset to arguments
    BusinessActionContext actionContext = getOrCreateActionContextAndResetToArguments(method.getParameterTypes(), arguments);
    //Creating Branch Record
    String branchId = doTccActionLogStore(method, arguments, businessAction, actionContext);
    // ... ...
    try {
    // ... ...
    return targetCallback.execute();
    } finally {
    try {
    //to report business action context finally if the actionContext.getUpdated() is true
    BusinessActionContextUtil.reportContext(actionContext);
    } finally {
    // ... ...
    }
    }
}

In the process of executing the first phase of the TCC interface, the doTccActionLogStore method is called for branch registration, and the TCC-related information such as parameters is placed in the context. This context will be used for resource submission/rollback as mentioned above.

How to control exceptions

In the process of executing the TCC model, various exceptions may occur, the most common of which are empty rollback, idempotence, and suspense. Here I will explain how Seata handles these three types of exceptions.

How to handle empty rollback

What is an empty rollback?

An empty rollback refers to a situation in a distributed transaction where the TM drives the second-phase rollback of the participant's Cancel method without calling the participant's Try method.

How does an empty rollback occur?

As shown in the above figure, after the global transaction is opened, participant A will execute the first-phase RPC method after completing branch registration. If the machine where participant A is located crashes or there is a network anomaly at this time, the RPC call will fail, meaning that participant A's first-phase method did not execute successfully. However, the global transaction has already been opened, so Seata must progress to the final state. When the global transaction is rolled back, participant A's Cancel method will be called, resulting in an empty rollback.

To prevent empty rollback, it is necessary to identify it in the Cancel method. How does Seata do this?

Seata's approach is to add a TCC transaction control table, which contains the XID and BranchID information of the transaction. A record is inserted when the Try method is executed, indicating that phase one has been executed. When the Cancel method is executed, this record is read. If the record does not exist, it means that the Try method was not executed.

How to Handle Idempotent Operations

Idempotent operation refers to TC repeating the two-phase commit, so the Confirm/Cancel interface needs to support idempotent processing, which means that it will not cause duplicate resource submission or release.

So how does idempotent operation arise?

As shown in the above figure, after participant A completes the two phases, network jitter or machine failure may cause TC not to receive the return result of participant A's execution of the two phases. TC will continue to make repeated calls until the two-phase execution result is successful.

How does Seata handle idempotent operations?

Similarly, a status field is added to the TCC transaction control table. This field has 3 values:

tried: 1
committed: 2
rollbacked: 3

After the execution of the two-phase Confirm/Cancel method, the status is changed to committed or rollbacked. When the two-phase Confirm/Cancel method is called repeatedly, checking the transaction status can solve the idempotent problem.

How to Handle Suspend

Suspension refers to the two-phase Cancel method being executed before the phase Try method, because empty rollback is allowed. After the execution of the two-phase Cancel method, directly returning success, the global transaction has ended. However, because the Try method is executed later, this will cause the resources reserved by the phase Try method to never be committed or released.

So how does suspension arise?

As shown in the above figure, when participant A's phase Try method is executed, network congestion occurs, and due to Seata's global transaction timeout limit, after the Try method times out, TM resolves to roll back the global transaction. After the rollback is completed, if the RPC request arrives at participant A at this time and the Try method is executed to reserve resources, it will cause suspension.

How does Seata handle suspension?

Add a status to the TCC transaction control table:

suspended: 4

When the two-phase Cancel method is executed, if it is found that there is no related record in the TCC transaction control table, it means that the two-phase Cancel method is executed before the phase Try method. Therefore, a record with status=4 is inserted. Then, when the phase Try method is executed, if status=4 is encountered, it means that the two-phase Cancel has been executed, and false is returned to prevent the phase Try method from succeeding.

Author Introduction

Zhang Chenghui, currently working at Ant Group, loves to share technology. He is the author of the WeChat public account "Advanced Backend," the author of the technical blog (https://objcoding.com/), and his GitHub ID is: objcoding.

In-Depth Analysis of Seata AT Mode Transaction Isolation Levels and Global Lock Design

January 12, 2022 · 8 min read

chenghui.zhang

Seata AT mode is a non-intrusive distributed transaction solution. Seata internally implements a proxy layer for database operations. When using Seata AT mode, we actually use the built-in data source proxy DataSourceProxy provided by Seata. Seata adds a lot of logic in this proxy layer, such as inserting rollback undo_log records and checking global locks.

Why check global locks? This is because the transaction isolation of Seata AT mode is based on the local isolation level of supporting transactions. Under the premise of database local isolation level of read committed or above, Seata designs a global write exclusive lock maintained by the transaction coordinator to ensure write isolation between transactions. At the same time, global transactions are by default defined at the read uncommitted isolation level.

Understanding Seata Transaction Isolation Levels

Before discussing Seata transaction isolation levels, let's review the isolation levels of database transactions. Currently, there are four types of database transaction isolation levels, from lowest to highest:

Read uncommitted
Read committed
Repeatable read
Serializable

The default isolation level for databases is usually read committed, such as Oracle, while some databases default to repeatable read, such as MySQL. Generally, the read committed isolation level of databases can satisfy the majority of business scenarios.

We know that a Seata transaction is a global transaction, which includes several local transaction branches. During the execution of a global transaction (before the global transaction is completed), if a local transaction commits and Seata does not take any measures, it may lead to reading of committed local transactions, causing dirty reads. If a local transaction that has been committed before the global transaction commits is modified, it may cause dirty writes.

From this, we can see that traditional dirty reads involve reading uncommitted data, while Seata's dirty reads involve reading data that has not been committed under the global transaction, where the global transaction may include multiple local transactions. The fact that one local transaction commits does not mean that the global transaction commits.

Working under the read committed isolation level is fine for the vast majority of applications. In fact, the majority of scenarios that work under the read uncommitted isolation level also work fine.

In extreme scenarios, if an application needs to achieve global read committed, Seata also provides a global lock mechanism to implement global transaction read committed. However, by default, Seata's global transactions work under the read uncommitted isolation level to ensure efficiency in the majority of scenarios.

Implementation of Global Locks

In AT mode, Seata uses the internal data source proxy DataSourceProxy, and the implementation of global locks is hidden within this proxy. Let's see what happens during the execution and submission processes.

1. Execution Process

The execution process is in the StatementProxy class. During execution, if the executed SQL is select for update, the SelectForUpdateExecutor class is used. If the executed method is annotated with @GlobalTransactional or @GlobalLock, it checks if there is a global lock. If a global lock exists, it rolls back the local transaction and continuously competes to obtain local and global locks through a while loop.

io.seata.rm.datasource.exec.SelectForUpdateExecutor#doExecute

public T doExecute(Object... args) throws Throwable {
    Connection conn = statementProxy.getConnection();
    // ... ...
    try {
        // ... ...
        while (true) {
            try {
                // ... ...
                if (RootContext.inGlobalTransaction() || RootContext.requireGlobalLock()) {
                    // Do the same thing under either @GlobalTransactional or @GlobalLock, 
                    // that only check the global lock  here.
                    statementProxy.getConnectionProxy().checkLock(lockKeys);
                } else {
                    throw new RuntimeException("Unknown situation!");
                }
                break;
            } catch (LockConflictException lce) {
                if (sp != null) {
                    conn.rollback(sp);
                } else {
                    conn.rollback();
                }
                // trigger retry
                lockRetryController.sleep(lce);
            }
        }
    } finally {
        // ...
    }

2. Submission Process

The submission process occurs in the doCommit method of ConnectionProxy.

If the executed method is annotated with @GlobalTransactional, it will acquire the global lock during branch registration:

Requesting TC to register a branch

io.seata.rm.datasource.ConnectionProxy#register

private void register() throws TransactionException {
    if (!context.hasUndoLog() || !context.hasLockKey()) {
        return;
    }
    Long branchId = DefaultResourceManager.get().branchRegister(BranchType.AT, getDataSourceProxy().getResourceId(),
                                                                null, context.getXid(), null, context.buildLockKeys());
    context.setBranchId(branchId);
}

When a TC registers a branch, it obtains a global lock

io.seata.server.transaction.at.ATCore#branchSessionLock

protected void branchSessionLock(GlobalSession globalSession, BranchSession branchSession) throws TransactionException {
    if (!branchSession.lock()) {
        throw new BranchTransactionException(LockKeyConflict, String
                                             .format("Global lock acquire failed xid = %s branchId = %s", globalSession.getXid(),
                                                     branchSession.getBranchId()));
    }
}

2）If the execution method has a '@GlobalLock' annotation, the global lock is checked for existence before committing, and if it does, an exception is thrown:

io.seata.rm.datasource.ConnectionProxy#processLocalCommitWithGlobalLocks

private void processLocalCommitWithGlobalLocks() throws SQLException {
    checkLock(context.buildLockKeys());
    try {
        targetConnection.commit();
    } catch (Throwable ex) {
        throw new SQLException(ex);
    }
    context.reset();
}

GlobalLock Annotation Explanation

From the execution process and submission process, it can be seen that since opening a global transaction with the @GlobalTransactional annotation can check if the global lock exists before transaction submission, why does Seata still provide a @GlobalLock annotation?

This is because not all database operations require opening a global transaction, and opening a global transaction is a relatively heavy operation that involves initiating RPC processes to TC. The @GlobalLock annotation only checks the existence of the global lock during the execution process and does not initiate a global transaction. Therefore, when there is no need for a global transaction but the global lock needs to be checked to avoid dirty reads and writes, using the @GlobalLock annotation is a lighter operation.

How to Prevent Dirty Writes

Let's first understand how dirty writes occur when using Seata AT mode:

Note: Other processes in the branch transaction execution are omitted.

When Business One starts a global transaction containing branch transaction A (modifying A) and branch transaction B (modifying B), Business Two modifies A. Business One's branch transaction A obtains a local lock before Business Two, waiting for Business One to complete the execution of branch transaction A. Business Two then obtains the local lock, modifies A, and commits it to the database. However, Business One encounters an exception during the execution of branch transaction A. Since the data of branch transaction A has been modified by Business Two, Business One's global transaction cannot be rolled back.

How to prevent dirty writes?

Business Two uses @GlobalTransactional annotation:

Note: Other processes in the branch transaction execution are omitted.

During the execution of the global transaction by Business Two, when registering the branch transaction before the submission of branch transaction A and acquiring the global lock, it finds that Business One's global lock has not been released yet. Therefore, Business Two cannot commit and throws an exception to roll back, thus preventing dirty writes.

Business Two uses @GlobalLock annotation:

Note: Other processes in the branch transaction execution are omitted.

Similar to the effect of @GlobalTransactional annotation, but without the need to open a global transaction, it only checks the existence of the global lock before local transaction submission.

Business Two uses @GlobalLock annotation + select for update statement:

If a select for update statement is added, it checks the existence of the global lock before the update operation. Business Two can only execute the updateA operation after the global lock is released.

If only @Transactional is used, there is a possibility of dirty writes. The fundamental reason is that without the GlobalLock annotation, the global lock is not checked, which may lead to another global transaction finding that a branch transaction has been modified when rolling back. Therefore, adding select for update also has a benefit, which is that it allows for retries.

How to Prevent Dirty Reads

Dirty reads in Seata AT mode refer to the scenario where data from a branch transaction that has been committed is read by another business before the global transaction is committed. Essentially, this is because Seata's default global transaction isolation level is read uncommitted.

So how to prevent dirty reads?

Business Two queries A with @GlobalLock annotation + select for update statement:

Adding the select for update statement checks the existence of the global lock before executing the SQL. The SQL can only be executed after the global lock is acquired, thus preventing dirty reads.

Author Bio:

Zhang Chenghui currently works at Ant Group and is passionate about sharing technology. He is the author of the WeChat public account "后端进阶" (Backend Advancement) and the owner of the technical blog (https://objcoding.com/). He is also a Seata Contributor with GitHub ID: objcoding.

Q&A on the New Version of Snowflake Algorithm

June 21, 2021 · 8 min read

selfishlover

In the previous analysis of the new version of the Snowflake algorithm, we mentioned two changes made in the new version:

The timestamp no longer constantly follows the system clock.
The exchange of positions between node ID and timestamp. From the original: to:

A careful student raised a question: In the new version, the algorithm is indeed monotonically increasing within a single node, but in a multi-instance deployment, it is no longer globally monotonically increasing! Because it is obvious that the node ID is in the high bits, so the generated ID with a larger node ID will definitely be greater than the ID with a smaller node ID, regardless of the chronological order. In contrast, the original algorithm, with the timestamp in the high bits and always following the system clock, can ensure that IDs generated earlier are smaller than those generated later. Only when two nodes happen to generate IDs at the same timestamp, the order of the two IDs is determined by the node ID. So, does it mean that the new version of the algorithm is wrong?

This is a great question! The fact that students can raise this question indicates a deep understanding of the essential differences between the standard Snowflake algorithm and the new version. This is commendable! Here, let's first state the conclusion: indeed, the new version of the algorithm does not possess global monotonicity, but this does not affect our original intention (to reduce database page splits). This conclusion may seem counterintuitive but can be proven.

Before providing the proof, let's briefly review some knowledge about page splits in databases. Taking the classic MySQL InnoDB as an example, InnoDB uses a B+ tree index where the leaf nodes of the primary key index also store the complete records of data rows. The leaf nodes are linked together in the form of a doubly linked list. The physical storage form of the leaf nodes is a data page, and each data page can store up to N rows of records (where N is inversely proportional to the size of each row). As shown in the diagram: The characteristics of the B+ tree require that the left node should be smaller than the right node. What happens if we want to insert a record with an ID of 25 at this point (assuming each data page can only hold 4 records)? The answer is that it will cause a page split, as shown in the diagram: Page splits are unfriendly to I/O, requiring the creation of new data pages, copying and transferring part of the records from the old data page, etc., and should be avoided as much as possible.

Ideally, the primary key ID should be sequentially increasing (for example, setting the primary key as auto_increment). This way, a new page will only be needed when the current data page is full, and the doubly linked list will always grow sequentially at the tail, avoiding any mid-node splits.

In the worst-case scenario, if the primary key ID is randomly generated and unordered (for example, a UUID string in Java), new records will be randomly assigned to any data page. If the page is already full, it will trigger a page split.

If the primary key ID is generated by the standard Snowflake algorithm, in the best-case scenario, only one node is generating IDs within each timestamp. In this case, the algorithm's effect is equivalent to the ideal situation of sequential incrementation, similar to auto_increment. In the worst-case scenario, all nodes within each timestamp are generating IDs, and the algorithm's effect is close to unordered (but still much better than completely unordered UUIDs, as the workerId with only 10 bits limits the nodes to a maximum of 1024). In actual production, the algorithm's effectiveness depends on business traffic, and the lower the concurrency, the closer the algorithm is to the ideal scenario.

So, how does it fare with the new version of the algorithm?

The new version of the algorithm, from a global perspective, produces IDs in an unordered manner. However, for each workerId, the generated IDs are strictly monotonically increasing. Additionally, since workerId is finite, it can divide into a maximum of 1024 subsequences, each of which is monotonically increasing.

For a database, initially, the received IDs may be unordered, coming from various subsequences, as illustrated here: Initial State

If, at this point, a worker1-seq2 arrives, it will clearly cause a page split: First Split

However, after the split, interesting things happen. For worker1, subsequent seq3, seq4 will not cause page splits anymore (because there is still space), and seq5 only needs to link to a new page for sequential growth (the difference is that this new page is not at the tail of the doubly linked list). Note that the subsequent IDs of worker1 will not be placed after any nodes from worker2 or beyond (thus avoiding page splits for later nodes) because they are always smaller than the IDs of worker2; nor will they be placed before the current node of worker1 (thus avoiding page splits for previous nodes) because the subsequences of worker1 are always monotonically increasing. Here, we refer to such subsequences as reaching a steady state, meaning that the subsequence has "stabilized," and its subsequent growth will only occur at the end of the subsequence without causing page splits for other nodes.

The same principle can be extended to all subsequences. Regardless of how chaotic the IDs received by the database are initially, after a finite number of page splits, the doubly linked list can always reach a stable state: Steady State

After reaching the steady state, subsequent IDs will only grow sequentially within their respective subsequences, without causing page splits. The difference between this sequential growth and the sequential growth of auto_increment is that the former has 1024 growth points (the ends of various subsequences), while the latter only has one at the end.

At this point, we can answer the question posed at the beginning: indeed, the new algorithm is not globally monotonically increasing, but the algorithm converges. After reaching a steady state, the new algorithm can achieve the same effect as global sequential incrementation.

Further Considerations

The discussion so far has focused on the continuous growth of sequences. However, in practical production, there is not only the insertion of new data but also the deletion of old data. Data deletion may lead to page merging (InnoDB, if it finds that the space utilization of two adjacent data pages is both less than 50%, it will merge them). How does this affect the new algorithm?

As we have seen in the above process, the essence of the new algorithm is to utilize early page splits to gradually separate different subsequences, allowing the algorithm to continuously converge to a steady state. Page merging, on the other hand, may reverse this process by merging different subsequences back into the same data page, hindering the convergence of the algorithm. Especially in the early stages of convergence, frequent page merging may even prevent the algorithm from converging forever (I just separated them, and now I'm merging them back together, back to square one~)! However, after convergence, only page merging at the end nodes of each subsequence has the potential to disrupt the steady state (merging the end node of one subsequence with the head node of the next subsequence). Merging on the remaining nodes of the subsequence does not affect the steady state because the subsequence remains ordered, albeit with a shorter length.

Taking Seata's server as an example, the data in the three tables of the server has a relatively short lifecycle. After a global transaction ends, the data is cleared. This is not friendly to the new algorithm, as it does not provide enough time for convergence. However, there is already a pull request (PR) for delayed deletion in the review process, and with this PR, the effect will be much better. For example, periodic weekly cleanup allows sufficient time for the algorithm to converge in the early stages, and for most of the time, the database can benefit from it. At the time of cleanup, the worst-case result is that the table is cleared, and the algorithm starts from scratch.

If you wish to apply the new algorithm to a business system, make sure to ensure that the algorithm has time to converge. For example, for user tables or similar, where data is intended to be stored for a long time, the algorithm can naturally converge. Alternatively, implement a mechanism for delayed deletion, providing enough time for the algorithm to converge.

If you have better opinions and suggestions, feel free to contact the Seata community!

Analysis of Seata's Distributed UUID Generator Based on Improved Snowflake Algorithm

May 8, 2021 · 6 min read

selfishlover

Seata incorporates a distributed UUID generator to assist in generating global transaction IDs and branch transaction IDs. The desired characteristics for this generator include high performance, global uniqueness, and trend incrementation.

High performance is self-explanatory, and global uniqueness is crucial to prevent confusion between different global transactions or branch transactions. Additionally, trend incrementation is valuable for users employing databases as the storage tool for TC clusters, as it can reduce the frequency of data page splits, thereby minimizing database IO pressure (the branch_table table uses the branch transaction ID as the primary key).

In the older version of Seata (prior to 1.4), the implementation of this generator was based on the standard version of the Snowflake algorithm. The standard Snowflake algorithm has been well-documented online, so we won't delve into it here. If you're unfamiliar with it, consider referring to existing resources before continuing with this article.

Here, we discuss some drawbacks of the standard Snowflake algorithm:

Clock Sensitivity: Since ID generation is always tied to the current operating system's timestamp (leveraging the monotonicity of time), a clock rollback may result in repeated IDs. Seata's strategy to handle this is by recording the last timestamp and rejecting service if the current timestamp is less than the recorded value (indicating a clock rollback). The service waits until the timestamp catches up with the recorded value. However, this means the TC will be in an unavailable state during this period.
Burst Performance Limit: The standard Snowflake algorithm claims a high QPS, approximately 4 million/s. However, strictly speaking, this is a bit misleading. The timestamp unit of the algorithm is milliseconds, and the bit length allocated to the sequence number is 12, allowing for 4096 sequence spaces per millisecond. So, a more accurate description would be 4096/ms. The distinction between 4 million/s and 4096/ms lies in the fact that the former doesn't require every millisecond's concurrency to be below 4096. Seata also adheres to this limitation. If the sequence space for the current timestamp is exhausted, it will spin-wait for the next timestamp.

In newer versions (1.4 and beyond), the generator has undergone optimizations and improvements to address these issues effectively. The core idea of the improvement is to decouple from the operating system's timestamp, with the generator obtaining the system's current timestamp only during initialization as the initial timestamp. Subsequently, it no longer synchronizes with the system timestamp. The incrementation is solely driven by the incrementation of the sequence number. For example, when the sequence number reaches its maximum value (4095), the next request causes an overflow of the 12-bit space. The sequence number resets to zero, and the overflow carry is added to the timestamp, incrementing it by 1. Thus, the timestamp and sequence number can be considered as a single entity. In practice, we adjusted the bit allocation strategy for the 64-bit ID, swapping the positions of the timestamp and node ID for easier handling of this overflow carry:

Original Bit Allocation Strategy:

Modified Bit Allocation Strategy (swapping timestamp and node ID):

This arrangement allows the timestamp and sequence number to be contiguous in memory, making it easy to use an AtomicLong to simultaneously store them.

/**
 * timestamp and sequence mix in one Long
 * highest 11 bit: not used
 * middle  41 bit: timestamp
 * lowest  12 bit: sequence
 */
private AtomicLong timestampAndSequence;

The highest 11 bits can be determined during initialization and remain unchanged thereafter:

/**
 * business meaning: machine ID (0 ~ 1023)
 * actual layout in memory:
 * highest 1 bit: 0
 * middle 10 bit: workerId
 * lowest 53 bit: all 0
 */
private long workerId;

Producing an ID is then straightforward：

public long nextId() {
   // Obtain the incremented timestamp and sequence number
   long next = timestampAndSequence.incrementAndGet();
   // Extract the lowest 53 bits
   long timestampWithSequence = next & timestampAndSequenceMask;
   // Perform a bitwise OR operation with the previously saved top 11 bits
   return workerId | timestampWithSequence;
}

At this point, we can observe the following:

The generator no longer has a burst performance limit of 4096/ms. If the sequence number space for a timestamp is exhausted, it will directly advance to the next timestamp, "borrowing" the sequence number space of the next timestamp (there is no need to worry about serious consequences of this "advance consumption," as the reasons will be explained below).
The generator has a weak dependency on the operating system clock. During runtime, the generator is not affected by clock backtracking (whether it is manually backtracked or due to machine clock drift) because the generator only fetches the system clock once at startup, and thereafter, they no longer stay synchronized. The only possible scenario for duplicate IDs is a significant clock backtracking during restart (either deliberate human backtracking or modification of the operating system time zone, such as changing Beijing time to London time~ Machine clock drift is typically in the millisecond range and won't have such a large impact).
Will continuous "advance consumption" cause the generator's timestamps to be significantly ahead of the system timestamps, resulting in ID duplicates upon restart? In theory, yes, but practically almost impossible. To achieve this effect, it would mean that the generator's QPS received must be consistently stable at over 400w/s~ To be honest, even TC can't handle such high traffic, so, the bottleneck is definitely not in the generator.

In addition, we also adjusted the strategy for generating node IDs. In the original version, when the user did not manually specify a node ID, it would take the low 10 bits of the local IPv4 address as the node ID. In practical production, it was found that there were occasional occurrences of duplicate node IDs (mostly users deploying with k8s). For example, the following IPs would result in duplicates:

192.168.4.10
192.168.8.10

Meaning, as long as the low 2 bits of the fourth byte and the third byte of the IP are the same, duplicates would occur. The new version's strategy is to prioritize taking the low 10 bits from the MAC address of the local network card. If the local machine does not have a valid network card configuration, it randomly picks one from [0, 1023] as the node ID. After this adjustment, it seems that new version users are no longer reporting the same issue (of course, it remains to be tested over time, but in any case, it won't be worse than the IP extraction strategy).

The above is a brief analysis of Seata's distributed UUID generator. If you find this generator useful, you can directly use it in your project. Its class declaration is public, and the full class name is: io.seata.common.util.IdWorker

Of course, if you have better ideas, you are also welcome to discuss them with the Seata community.

Seata New Feature Support -- Undo_Log Compression

May 7, 2021 · 4 min read

chd

Current Situation & Pain Points

For Seata, it records the before and after data of DML operations to perform possible rollback operations, and stores this data in a blob field in the database. For batch operations such as insert, update, delete, etc., the number of affected rows may be significant, concatenated into a large field inserted into the database, which may lead to the following issues:

Exceeding the maximum write limit for a single database operation (such as the max_allowed_package parameter in MySQL).
Significant network IO and database disk IO overhead due to a large amount of data.

Brainstorming

For the first issue, the max_allowed_package parameter limit can be increased based on the actual situation of the business to avoid the "query is too large" problem. For the second issue, increasing bandwidth and using high-performance SSD as the database storage medium can help.

The above solutions involve external or costly measures. Is there a framework-level solution to address the pain points mentioned above?

Considering the root cause of the pain points mentioned above, the problem lies in the generation of excessively large data fields. Therefore, if the corresponding data can be compressed at the business level before data transmission and storage, theoretically, it can solve the problems mentioned above.

Feasibility Analysis

Combining the brainstorming above, in practical development, when large batch operations are required, they are often scheduled during periods of relatively low user activity and low concurrency. At such times, CPU and memory resources can be relatively more utilized to quickly complete the corresponding operations. Therefore, by consuming CPU and memory resources to compress rollback data, the size of data transmission and storage can be reduced.

At this point, two things need to be demonstrated:

After compression, it can reduce the pressure on network IO and database disk IO. This can be measured by the total time taken for data compression + storage in the database.
After compression, the efficiency of compression compared to the original data size. This can be measured by the data size before and after compression.

Testing the time spent on compressing network usage:

Compression Ratio Test:

The test results clearly indicate that using gzip or zip compression can significantly reduce the pressure on the database and network transmission. At the same time, it can substantially decrease the size of the stored data.

Implementation

Implementation Approach

Compression

Partial Code

# Whether to enable undo_log compression, default is true
seata.client.undo.compress.enable=true

# Compressor type, default is zip, generally recommended to be zip
seata.client.undo.compress.type=zip

# Compression threshold for enabling compression, default is 64k
seata.client.undo.compress.threshold=64k

Determining Whether the Undo_Log Compression Feature is Enabled and if the Compression Threshold is Reached

protected boolean needCompress(byte[] undoLogContent) {
// 1. Check whether undo_log compression is enabled (1.4.2 Enabled by Default).
// 2. Check whether the compression threshold has been reached (64k by default).
// If both return requirements are met, the corresponding undoLogContent is compressed
    return ROLLBACK_INFO_COMPRESS_ENABLE 
        && undoLogContent.length > ROLLBACK_INFO_COMPRESS_THRESHOLD;
}

Initiating Compression for Undo_Log After Determining the Need

// If you need to compress, compress undo_log
if (needCompress(undoLogContent)) {
    // Gets the compression type, default zip
    compressorType = ROLLBACK_INFO_COMPRESS_TYPE;
    //Get the corresponding compressor and compress it
    undoLogContent = CompressorFactory.getCompressor(compressorType.getCode()).compress(undoLogContent);
}
// else does not need to compress and does not need to do anything

Save the compression type synchronously to the database for use when rolling back:

protected String buildContext(String serializer, CompressorType compressorType) {
    Map<String, String> map = new HashMap<>();
    map.put(UndoLogConstants.SERIALIZER_KEY, serializer);
    // Save the compression type to the database
    map.put(UndoLogConstants.COMPRESSOR_TYPE_KEY, compressorType.name());
    return CollectionUtils.encodeMap(map);
}

Decompress the corresponding information when rolling back:

protected byte[] getRollbackInfo(ResultSet rs) throws SQLException  {
    // Gets a byte array of rollback information saved to the database
    byte[] rollbackInfo = rs.getBytes(ClientTableColumnsName.UNDO_LOG_ROLLBACK_INFO);
    // Gets the compression type
    // getOrDefault uses the default value CompressorType.NONE to directly upgrade 1.4.2+ to compatible versions earlier than 1.4.2
    String rollbackInfoContext = rs.getString(ClientTableColumnsName.UNDO_LOG_CONTEXT);
    Map<String, String> context = CollectionUtils.decodeMap(rollbackInfoContext);
    CompressorType compressorType = CompressorType.getByName(context.getOrDefault(UndoLogConstants.COMPRESSOR_TYPE_KEY,
    CompressorType.NONE.name()));
    // Get the corresponding compressor and uncompress it
    return CompressorFactory.getCompressor(compressorType.getCode())
        .decompress(rollbackInfo);
}

peroration

By compressing undo_log, Seata can further improve its performance when processing large amounts of data at the framework level. At the same time, it also provides the corresponding switch and relatively reasonable default value, which is convenient for users to use out of the box, but also convenient for users to adjust according to actual needs, so that the corresponding function is more suitable for the actual use scenario.

Seata Deadlock Issue Caused by ConcurrentHashMap

March 13, 2021 · 0 min read

xiaoyong.luo

Seata Application-Side Startup Process Analysis — Registry and Configuration Module

March 4, 2021 · 0 min read

booogu

Seata Application-Side Startup Process Analysis — How RM & TM Connect with TC

February 28, 2021 · 0 min read

booogu

Integration of Spring Cloud with Seata for Distributed Transaction - TCC Mode

January 23, 2021 · 6 min read

gongxing(zhijian.tan)

This article will introduce how to integrate Seata (1.4.0) with Spring Cloud and Feign using the TCC mode. In practice, Seata's AT mode can meet about 80% of our distributed transaction needs. However, when dealing with operations on databases and middleware (such as Redis) that do not support transactions, or when using databases that are not currently supported by the AT mode (currently AT supports MySQL, Oracle, and PostgreSQL), cross-company service invocations, cross-language application invocations, or the need for manual control of the entire two-phase commit process, we need to combine the TCC mode. Moreover, the TCC mode also supports mixed usage with the AT mode.

一、The concept of TCC mode

In Seata, a distributed global transaction follows a two-phase commit model with a Try-[Confirm/Cancel] pattern. Both the AT (Automatic Transaction) mode and the TCC (Try-Confirm-Cancel) mode in Seata are implementations of the two-phase commit. The main differences between them are as follows:

AT mode is based on relational databases that support local ACID transactions (currently supporting MySQL, Oracle, and PostgreSQL):

The first phase, prepare: In the local transaction, it combines the submission of business data updates and the recording of corresponding rollback logs. The second phase, commit: It immediately completes successfully and automatically asynchronously cleans up the rollback logs. The second phase, rollback: It automatically generates compensation operations through the rollback logs to complete data rollback.

On the other hand, TCC mode does not rely on transaction support from underlying data resources:

The first phase, prepare: It calls a custom-defined prepare logic. The second phase, commit: It calls a custom-defined commit logic. The second phase, rollback: It calls a custom-defined rollback logic.

TCC mode refers to the ability to include custom-defined branch transactions in the management of global transactions.

In summary, Seata's TCC mode is a manual implementation of the AT mode that allows you to define the processing logic for the two phases without relying on the undo_log used in the AT mode.

二、prepare

regist center nacos
seata server(TC）

三、Building TM and TCC-RM

This chapter focuses on the implementation of TCC using Spring Cloud + Feign. For the project setup, please refer to the source code (this project provides demos for both AT mode and TCC mode).

DEMO

3.1 build seata server

build server doc

3.2 build TM

service-tm

3.3 build RM-TCC

3.3.1 Defining TCC Interface

Since we are using Spring Cloud + Feign, which relies on HTTP for communication, we can use @LocalTCC here. It is important to note that @LocalTCC must be annotated on the interface. This interface can be a regular business interface as long as it implements the corresponding methods for the two-phase commit in TCC. The TCC-related annotations are as follows:

@LocalTCC: Used for TCC in the Spring Cloud + Feign mode.
@TwoPhaseBusinessAction: Annotates the try method. The name attribute represents the bean name of the current TCC method, which can be the method name (globally unique). The commitMethod attribute points to the commit method, and the rollbackMethod attribute points to the transaction rollback method. After specifying these three methods, Seata will automatically invoke the commit or rollback method based on the success or failure of the global transaction.
@BusinessActionContextParameter: Annotates the parameters to be passed to the second phase (commitMethod/rollbackMethod) methods.
BusinessActionContext: Represents the TCC transaction context.

Here is an example:

/**
 * Here we define the TCC interface.
 * It must be defined on the interface.
 * We are using Spring Cloud for remote invocation.
 * Therefore, we can use LocalTCC here.
 *
 */
@LocalTCC
public interface TccService {
 
    /**
     * Define the two-phase commit.
     * name = The bean name of this TCC, globally unique.
     * commitMethod = The method for the second phase confirmation.
     * rollbackMethod = The method for the second phase cancellation.
     * Use the BusinessActionContextParameter annotation to pass parameters to the second phase.
     *
     * @param params  
     * @return String
     */
    @TwoPhaseBusinessAction(name = "insert", commitMethod = "commitTcc", rollbackMethod = "cancel")
    String insert(
            @BusinessActionContextParameter(paramName = "params") Map<String, String> params
    );
 
    /**
     *  The confirmation method can be named differently, but it must be consistent with the commitMethod.
     *  The context can be used to pass the parameters from the try method.
     * @param context 
     * @return boolean
     */
    boolean commitTcc(BusinessActionContext context);
 
    /**
     * two phase cancel
     *
     * @param context 
     * @return boolean
     */
    boolean cancel(BusinessActionContext context);
}

3.3.2 Business Implementation of TCC Interface

To keep the code concise, we will combine the routing layer with the business layer for explanation here. However, in actual projects, this may not be the case.

Using @Transactional in the try method allows for direct rollback of operations in relational databases through Spring transactions. The rollback of operations in non-relational databases or other middleware can be handled in the rollbackMethod.
By using context.getActionContext("params"), you can retrieve the parameters defined in the try phase and perform business rollback operations on these parameters in the second phase.
Note 1: It is not advisable to catch exceptions here (similarly, handle exceptions with aspects), as doing so would cause TCC to recognize the operation as successful, and the second phase would directly execute the commitMethod.
Note 2: In TCC mode, it is the responsibility of the developer to ensure idempotence and transaction suspension prevention.

@Slf4j
@RestController
public class TccServiceImpl implements  TccService {
 
    @Autowired
    TccDAO tccDAO;
 
    /**
     * tcc t（try）method
     * Choose the actual business execution logic or resource reservation logic based on the actual business scenario.
     *
     * @param params - name
     * @return String
     */
    @Override
    @PostMapping("/tcc-insert")
    @Transactional(rollbackFor = Exception.class, propagation = Propagation.REQUIRED)
    public String insert(@RequestBody Map<String, String> params) {
        log.info("xid = " + RootContext.getXID());
        //todo Perform actual operations or operations on MQ, Redis, etc.
        tccDAO.insert(params);
        //Remove the following annotations to throw an exception
        //throw new RuntimeException("服务tcc测试回滚");
        return "success";
    }
 
    /**
     * TCC service confirm method
     * If resource reservation is used in the first phase, the reserved resources should be committed during the second phase confirmation
     * @param context 
     * @return boolean
     */
    @Override
    public boolean commitTcc(BusinessActionContext context) {
        log.info("xid = " + context.getXid() + "提交成功");
        //todo If resource reservation is used in the first phase, resources should be committed here.
        return true;
    }
 
    /**
     * tcc  cancel method
     *
     * @param context 
     * @return boolean
     */
    @Override
    public boolean cancel(BusinessActionContext context) {
        //todo Here, write the rollback operations for middleware or non-relational databases.
        System.out.println("please manually rollback this data:" + context.getActionContext("params"));
        return true;
    }
}

3.3.3 Starting a Global Transaction in TM and Invoking RM-TCC Interface

Please refer to the project source code in section 3.2.

With this, the integration of TCC mode with Spring Cloud is complete.

Analysis of Seata Configuration Management Principles

January 10, 2021 · 0 min read

xiaoyong.luo

What is TCC

Seata TCC mode

How Seata Implements TCC Mode

Resource Parsing​

Resource Management​

Transaction Processing​

How to control exceptions

How to handle empty rollback​

How to Handle Idempotent Operations​

How to Handle Suspend​

Author Introduction

Understanding Seata Transaction Isolation Levels

Implementation of Global Locks

1. Execution Process​

2. Submission Process​

GlobalLock Annotation Explanation​

How to Prevent Dirty Writes

How to Prevent Dirty Reads

Author Bio:

Further Considerations​

Current Situation & Pain Points​

Brainstorming​

Feasibility Analysis​

Compression Ratio Test:​

Implementation​

Implementation Approach​

Partial Code​

peroration​

一、The concept of TCC mode

二、prepare

三、Building TM and TCC-RM

3.1 build seata server​

3.2 build TM​

3.3 build RM-TCC​

3.3.1 Defining TCC Interface​

3.3.2 Business Implementation of TCC Interface​

3.3.3 Starting a Global Transaction in TM and Invoking RM-TCC Interface​

Resource Parsing

Resource Management

Transaction Processing

How to handle empty rollback

How to Handle Idempotent Operations

How to Handle Suspend

1. Execution Process

2. Submission Process

GlobalLock Annotation Explanation

Further Considerations

Current Situation & Pain Points

Brainstorming

Feasibility Analysis

Compression Ratio Test:

Implementation

Implementation Approach

Partial Code

peroration

3.1 build seata server

3.2 build TM

3.3 build RM-TCC

3.3.1 Defining TCC Interface

3.3.2 Business Implementation of TCC Interface

3.3.3 Starting a Global Transaction in TM and Invoking RM-TCC Interface