majority write concern to roll back.

Our main goal is to modify the election protocol to make TokuMX a true CP system. That is, in the face of network partitions, TokuMX will remain consistent. To do so means ensuring that any write that is successfully acknowledged with majority write concern is never lost in the face of a network partition. This is not currently the case for TokuMX and MongoDB.

The secondary issue that Tokutek draws attention to is one of availability. In the accompanying tech report Zardosht and coauthor Leif Walsh explain that it is possible for a MongoDB replica set to be unavailable for 30 seconds or more during failover.

MongoDB’s election protocol requires that a member may not vote “yes” in more than one election in any 30-second period. … [T]his 30 second threshold can be problematic in practice, especially if an election fails: this necessarily makes the set unavailable for at least 30 seconds, maybe more if successive elections fail.

Ark addresses these flaws by exploiting the structure of the TokutekDB global transaction identifier (GTID). The GTID consists of a pair of 64-bit integers, (term, opid), where opid is incremented each time an operation commits on the primary, and the term is incremented each time a new primary is elected, and at this point the opid is reset to 0. The term in the GTID serves the same purpose as the term concept in the Raft protocol and that similarity allows Ark to employ many of the same solutions that Raft uses to provide its strong consistency guarantees.

While Ark is an implementation of a consensus protocol that works in a real database system, it is also evidence of the flexibility in the Raft consensus algorithm. It was relatively straightforward to tweak Raft in safe ways to make it fit the MongoDB architecture and programming model, and we think this is an important feature of Raft.
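
To make the (term, opid) structure of the GTID concrete, here is a minimal Python sketch. It is not code from TokuMX or Ark: the class name `GTID` and the helpers `after_commit` and `after_election` are hypothetical, and the lexicographic ordering (term first, then opid) is an assumption based on the term playing the same role as a Raft term. Python integers are unbounded, so the 64-bit width is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class GTID:
    """Illustrative (term, opid) pair; order=True compares term first, then opid."""
    term: int
    opid: int


def after_commit(gtid: GTID) -> GTID:
    # An operation commits on the primary: opid is incremented, term is unchanged.
    return GTID(gtid.term, gtid.opid + 1)


def after_election(gtid: GTID) -> GTID:
    # A new primary is elected: term is incremented and opid is reset to 0.
    return GTID(gtid.term + 1, 0)


if __name__ == "__main__":
    g = GTID(term=3, opid=41)
    committed = after_commit(g)
    elected = after_election(committed)
    print(committed, elected)
    # Under the assumed ordering, any GTID from a newer term sorts after
    # every GTID from an older term, regardless of opid.
    print(elected > committed)
```

Under this assumed ordering, comparing two GTIDs is enough to tell which one belongs to the more recent primary, which is the property that lets the term act like a Raft term.
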
There is an Ark development branch available and Tokutek is actively soliciting feedback on both the design and the implementation.