| com.sleepycat.je.cleaner.SR12885Test
SR12885Test | public class SR12885Test extends TestCase | | Reproduces a problem found in SR12885 where we failed to migrate a pending
LN if the slot was reused by an active transaction and that transaction was
later aborted.
This bug can manifest as a LogFileNotFoundException. However, there was another
bug that caused this bug to manifest sometimes as a NOTFOUND return value.
This secondary problem -- more sloppiness than a real bug -- was that the
PendingDeleted flag was not cleared during an abort. If the PendingDeleted
flag is set, the low-level fetch method will return null rather than
throwing a LogFileNotFoundException. This caused a NOTFOUND in some cases.
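The effect of a stale PendingDeleted flag on a fetch can be sketched as follows. This is a toy model, not JE code: SlotSketch, FetchSketch, and MissingLogFileException are hypothetical names, and the exception only stands in for JE's LogFileNotFoundException.

```java
import java.util.Set;

/** Toy model of a BIN slot; the fields are illustrative, not JE's real ones. */
final class SlotSketch {
    final long fileNumber;   // log file that holds the slot's LN
    boolean pendingDeleted;  // stands in for the PendingDeleted flag

    SlotSketch(long fileNumber, boolean pendingDeleted) {
        this.fileNumber = fileNumber;
        this.pendingDeleted = pendingDeleted;
    }
}

final class FetchSketch {
    /** Hypothetical stand-in for JE's LogFileNotFoundException. */
    static final class MissingLogFileException extends RuntimeException {
        MissingLogFileException(long fileNumber) {
            super("log file " + fileNumber + " no longer exists");
        }
    }

    /**
     * Returns a placeholder payload if the log file exists. If the file is
     * gone, returns null when the flag is set (the caller would report
     * NOTFOUND) and throws otherwise.
     */
    static String fetch(SlotSketch slot, Set<Long> existingFiles) {
        if (existingFiles.contains(slot.fileNumber)) {
            return "LN@" + slot.fileNumber;
        }
        if (slot.pendingDeleted) {
            return null; // a stale flag hides the missing file
        }
        throw new MissingLogFileException(slot.fileNumber);
    }
}
```

With the flag left set after an abort, the same missing file yields a quiet null instead of an exception, which is why the primary bug sometimes surfaced as NOTFOUND.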
The sequence that causes the bug is:
1) The cleaner processes a file containing LN-A (node A) for key X. The LN
for key X is not deleted.
2) The cleaner sets the migrate flag on the BIN entry for LN-A.
3) In transaction T-1, LN-A is deleted and replaced by LN-B with key X,
reusing the same slot but assigning a new node ID. At this point both node
IDs (LN-A and LN-B) are locked.
4) The cleaner (via a checkpoint or eviction that logs the BIN) tries to
migrate LN-B, the current LN in the BIN, but finds it locked. It adds LN-B
to the pending LN list.
5) T-1 aborts, putting the LSN of LN-A back into the BIN slot.
6) In transaction T-2, LN-A is deleted and replaced by LN-C with key X,
reusing the same slot but assigning a new node ID. At this point both node
IDs (LN-A and LN-C) are locked.
7) The cleaner (via a checkpoint or wakeup) processes the pending LN-B. It
first gets a lock on node B, then does the tree lookup. It finds LN-C in
the tree, but it doesn't notice that it has a different node ID than the
node it locked.
8) The cleaner sees that LN-C is deleted, and therefore no migration is
necessary -- this is incorrect. It removes LN-B from the pending list,
allowing the cleaned file to be deleted.
9) T-2 aborts, putting the LSN of LN-A back into the BIN slot.
10) A fetch of key X will fail, since the file containing the LSN for LN-A
has been deleted. If the PendingDeleted flag was not cleared during the
abort, the fetch returns NOTFOUND instead of throwing a
LogFileNotFoundException.
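The ten steps above can be replayed in a small self-contained simulation. This is a minimal sketch under stated assumptions, not JE's cleaner: Sr12885Sketch, its constants, and the compareNodeIds switch are hypothetical. The switch models one conservative guard (keep the pending entry when the locked node ID differs from the one found in the tree); the source does not say that this is how the bug was actually fixed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class Sr12885Sketch {
    static final long NODE_A = 1, NODE_B = 2, NODE_C = 3;
    static final long CLEANED_FILE = 10, NEW_FILE = 11;

    /**
     * Replays steps 1-10 and returns true if the final fetch of key X
     * still finds the log file that its slot points to.
     */
    static boolean finalFetchSucceeds(boolean compareNodeIds) {
        Set<Long> files = new HashSet<>();
        files.add(CLEANED_FILE);
        files.add(NEW_FILE);
        Deque<Long> pending = new ArrayDeque<>();

        // Steps 1-2: the slot holds LN-A, stored in the file being cleaned.
        long slotNode = NODE_A;
        long slotFile = CLEANED_FILE;

        // Step 3: T-1 replaces LN-A with LN-B in the same slot.
        slotNode = NODE_B;
        slotFile = NEW_FILE;

        // Step 4: LN-B is locked, so the cleaner defers it.
        pending.add(NODE_B);

        // Step 5: T-1 aborts; the slot points at LN-A again.
        slotNode = NODE_A;
        slotFile = CLEANED_FILE;

        // Step 6: T-2 replaces LN-A with LN-C in the same slot.
        slotNode = NODE_C;
        slotFile = NEW_FILE;
        boolean slotLnDeleted = true; // what the cleaner observes in step 8

        // Steps 7-8: the cleaner locks node B, then looks up the slot.
        long lockedNode = pending.peek();
        boolean mismatchNoticed = compareNodeIds && lockedNode != slotNode;
        if (!mismatchNoticed && slotLnDeleted) {
            pending.remove(); // flawed path: "deleted, so nothing to migrate"
        }
        if (pending.isEmpty()) {
            files.remove(CLEANED_FILE); // the cleaned file may now be deleted
        }

        // Step 9: T-2 aborts; the slot points at LN-A again.
        slotNode = NODE_A;
        slotFile = CLEANED_FILE;

        // Step 10: the fetch needs the file that holds LN-A.
        return files.contains(slotFile);
    }
}
```

Calling `Sr12885Sketch.finalFetchSucceeds(false)` follows the flawed path and returns false, because the cleaned file is removed while the slot can still revert to LN-A. With `true`, the pending entry is retained, the file survives, and the fetch succeeds.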
|
SR12885Test | public SR12885Test() | | |
|
|