Last Updated 12 July 2008.
1.6 Goals:
Check trunk regularly for updates. For Maven users (of which ehcache is one), a SNAPSHOT repository has been added. See http://ehcache.sf.net/snapshotrepository
A common scenario is where each cache in a cluster replicates to others. We usually want each cache to be coherent.
Following is a discussion of a potential design for both synchronous and asynchronous replication:
If an Exception is thrown during replication to any peer, then the caches have become incoherent.
The effect of the operation is not available in any cache until the transaction has been committed. This is done with a commit(messageId) message from the originating cache sent to each cache peer. If an Exception occurs during the operation, the commit message is not sent. On the originating cache a rollback occurs. This is not propagated to the peers. Instead, if the commit does not come within the TRANSACTION_TIMEOUT period, the operation is discarded in each cache.
If a commit fails on one cache then data can still be inconsistent. This is the classic problem with 2-phase commit transaction systems. You always need an extra phase to be sure the previous phase did not fail. (See Page 267 of Developing Enterprise Web Services, by Sandeep Chatterjee and James Webber). The originating cache commits last.
If one or more commits fail, the caches will be incoherent. However, we will throw an IncoherentTransactionException.
This then provides the option to the caller to reattempt the operation. Given that cache operations are simple and idempotent (other than the lastUpdated and version statistics), this should be a practical option.
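The synchronous flow described above can be sketched roughly as follows. This is only an illustration of the proposed design, not existing ehcache code: PeerCache, OriginatingCache, prepare() and the failOnCommit flag are hypothetical names, and IncoherentTransactionException is modelled as a plain RuntimeException. The TRANSACTION_TIMEOUT discard on the peers is not modelled, and the sketch assumes the originating cache still commits locally before reporting the incoherence.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical exception named in the design; not a released ehcache class. */
class IncoherentTransactionException extends RuntimeException {
    IncoherentTransactionException(String message) {
        super(message);
    }
}

/** Hypothetical peer: holds operations as pending until a commit arrives. */
class PeerCache {
    final Map<String, String> store = new HashMap<>();
    private final Map<Long, String[]> pending = new HashMap<>();
    boolean failOnCommit = false; // hook to simulate a peer that cannot commit

    // Phase 1: record the operation without making it visible.
    void prepare(long messageId, String key, String value) {
        pending.put(messageId, new String[] {key, value});
    }

    // Phase 2: commit(messageId) makes the operation visible.
    void commit(long messageId) {
        if (failOnCommit) {
            throw new IllegalStateException("peer unreachable");
        }
        String[] op = pending.remove(messageId);
        if (op != null) {
            store.put(op[0], op[1]);
        }
    }
}

/** Hypothetical originating cache driving the two phases. */
class OriginatingCache {
    final Map<String, String> store = new HashMap<>();
    private final List<PeerCache> peers;
    private long nextMessageId = 0;

    OriginatingCache(List<PeerCache> peers) {
        this.peers = peers;
    }

    void put(String key, String value) {
        long messageId = nextMessageId++;
        // Phase 1: replicate to every peer. If this throws, no commit is
        // sent and nothing becomes visible anywhere.
        for (PeerCache peer : peers) {
            peer.prepare(messageId, key, value);
        }
        // Phase 2: send commit(messageId) to each peer, counting failures.
        int failedCommits = 0;
        for (PeerCache peer : peers) {
            try {
                peer.commit(messageId);
            } catch (RuntimeException e) {
                failedCommits++;
            }
        }
        // The originating cache commits last (assumed to happen even if
        // some peer commits failed).
        store.put(key, value);
        if (failedCommits > 0) {
            throw new IncoherentTransactionException(
                    failedCommits + " peer(s) failed to commit message " + messageId);
        }
    }
}
```

Because a put is idempotent, a caller that catches IncoherentTransactionException can simply call put() again with the same key and value once the failing peer is reachable.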
The number of messages required is doubled; enabling transactions will therefore halve synchronous replication speed.
This works the same way as synchronous replication, except that the caller is not informed if the transaction fails.
The originating cache records the async notifications as they occur. Once every peer has received the operation, the originating cache is in a position to commit, and the transaction is marked committable. If a failure occurs, no commit message is sent and the transaction is placed in the failed transactions list.
Instead, failed transactions are placed in a list which can be accessed using cache.getFailedTransactions(). The list contains information on the transaction id, the cache operation, and the key used. The data is not retained. By default 10,000 failures may be retained. Eviction is on a FIFO basis.
This enables the caller to periodically monitor for failed transactions and reattempt. The most probable cause of a failure is that a peer has dropped off the network (or the network has failed), in which case redoing the transaction will ensure the remaining cache peers are coherent.
Because asynchronous replication uses batches in a single RMI call, all that will happen is that there will be a small extra commit message for each other message. Performance should only be mildly degraded.
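A minimal sketch of the failed transactions list might look like the following. The class and field names (FailedTransaction, FailedTransactionLog, record) are hypothetical; only the recorded fields, the default limit of 10,000 and the FIFO eviction come from the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical record of a failure: id, operation and key, but not the data. */
final class FailedTransaction {
    final long transactionId;
    final String operation; // for example "put" or "remove"
    final Object key;

    FailedTransaction(long transactionId, String operation, Object key) {
        this.transactionId = transactionId;
        this.operation = operation;
        this.key = key;
    }
}

/** Hypothetical bounded list of failures with FIFO eviction. */
final class FailedTransactionLog {
    static final int DEFAULT_CAPACITY = 10_000;

    private final int capacity;
    private final Deque<FailedTransaction> failures = new ArrayDeque<>();

    FailedTransactionLog() {
        this(DEFAULT_CAPACITY);
    }

    FailedTransactionLog(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Adds a failure, evicting the oldest entry first when the log is full.
    synchronized void record(FailedTransaction failure) {
        if (failures.size() == capacity) {
            failures.removeFirst();
        }
        failures.addLast(failure);
    }

    // Returns a snapshot, oldest first, so callers can iterate and reattempt.
    synchronized List<FailedTransaction> getFailedTransactions() {
        return new ArrayList<>(failures);
    }
}
```

A monitoring thread could poll getFailedTransactions() periodically and reissue each operation for its key, reading the current value from the local cache since the original data is not retained.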
Now looking to finalise JCache with ehcache as the RI. This will probably be maintained in a branch until the JSR process is concluded. The net.sf.jsr107cache will be updated in the meantime.
1.3.0-beta contained a couple of tweaks to allow programmatic replacement of the DiskStore with a separate implementation. Hopefully a code donation will provide a JDBCDiskStore implementation which will allow Derby and other databases to be plugged in. This supplements the built-in DiskStore.
Ehcache supports LRU, LFU and FIFO. There are plenty more eviction policies. An extension mechanism should be added to support these.
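One possible shape for such an extension mechanism is sketched below. The EvictionPolicy interface, the EntryMetadata class and the MruPolicy example are all hypothetical and are not part of the ehcache API; the sketch only shows how a policy beyond the built-in three could be plugged in.

```java
import java.util.Collection;

/** Hypothetical per-entry statistics a policy could base its choice on. */
final class EntryMetadata {
    final Object key;
    final long creationTime;
    final long lastAccessTime;
    final long hitCount;

    EntryMetadata(Object key, long creationTime, long lastAccessTime, long hitCount) {
        this.key = key;
        this.creationTime = creationTime;
        this.lastAccessTime = lastAccessTime;
        this.hitCount = hitCount;
    }
}

/** Hypothetical extension point: the store asks the policy which entry to evict. */
interface EvictionPolicy {
    /** Returns the entry to evict, or null if there are no candidates. */
    EntryMetadata selectVictim(Collection<EntryMetadata> candidates);
}

/** Example custom policy: evict the most recently used entry (MRU). */
final class MruPolicy implements EvictionPolicy {
    @Override
    public EntryMetadata selectVictim(Collection<EntryMetadata> candidates) {
        EntryMetadata victim = null;
        for (EntryMetadata candidate : candidates) {
            if (victim == null || candidate.lastAccessTime > victim.lastAccessTime) {
                victim = candidate;
            }
        }
        return victim;
    }
}
```

Under this assumption, LRU, LFU and FIFO would simply be three more implementations of the same interface, keyed on lastAccessTime, hitCount and creationTime respectively.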
A MemoryStore built using SoftReferences. This would enable potentially much larger MemoryStores with no risk of OutOfMemory errors. SoftReferences are already used in ehcache for the asynchronous replication spool.
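The idea can be sketched as below, using the standard java.lang.ref.SoftReference class. SoftMemoryStore is a hypothetical name and this is not how ehcache's MemoryStore is implemented; it only shows values being held softly so that the garbage collector may reclaim them under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical store whose values can be reclaimed by the garbage collector. */
final class SoftMemoryStore<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // Returns the value, or null if it was never stored or has been reclaimed.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The referent was collected; drop the stale mapping, but only
            // if it has not been replaced by a newer put in the meantime.
            map.remove(key, ref);
        }
        return value;
    }

    // May count entries whose values have already been reclaimed.
    int size() {
        return map.size();
    }
}
```

The JVM clears all softly reachable referents before it throws an OutOfMemoryError, which is what makes a larger store safe; the trade-off is that a get() can miss on an entry that was never explicitly evicted.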
Code has been donated for a Derby DiskStore implementation.
Please add your feature suggestions to Feature Requests.