@benas @mminella
When we enabled `faultTolerant()` on the `StepBuilder`, we noticed that the `spring_batch_chunk_write_seconds_count` metric stopped showing up in Grafana.
Looking at the source code, we saw that `FaultTolerantChunkProcessor` indeed does not collect this metric.
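For context, our step is defined roughly as follows (a minimal sketch; the bean names, item types, chunk size, and retry settings are placeholders, assuming Spring Batch 4.x, where the Micrometer metrics were introduced in 4.2):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

public class MyStepConfiguration {

    @Bean
    public Step myStep(StepBuilderFactory stepBuilderFactory,
            ItemReader<String> reader, ItemWriter<String> writer) {
        // Without faultTolerant(), this step uses SimpleChunkProcessor and the
        // spring.batch.chunk.write timer is recorded (exported to Prometheus as
        // spring_batch_chunk_write_seconds_*). With faultTolerant(), the
        // FaultTolerantChunkProcessor takes over and the timer disappears.
        return stepBuilderFactory.get("myStep")
                .<String, String>chunk(10)
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .retryLimit(3)
                .retry(Exception.class)
                .build();
    }
}
```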
SimpleChunkProcessor.java
```java
protected void write(StepContribution contribution, Chunk<I> inputs, Chunk<O> outputs) throws Exception {
    Timer.Sample sample = BatchMetrics.createTimerSample();
    String status = BatchMetrics.STATUS_SUCCESS;
    try {
        doWrite(outputs.getItems());
    }
    catch (Exception e) {
        /*
         * For a simple chunk processor (no fault tolerance) we are done
         * here, so prevent any more processing of these inputs.
         */
        inputs.clear();
        status = BatchMetrics.STATUS_FAILURE;
        throw e;
    }
    finally {
        stopTimer(sample, contribution.getStepExecution(), "chunk.write", status, "Chunk writing");
    }
    contribution.incrementWriteCount(outputs.size());
}
```
FaultTolerantChunkProcessor.java
```java
protected void write(final StepContribution contribution, final Chunk<I> inputs, final Chunk<O> outputs)
        throws Exception {
    @SuppressWarnings("unchecked")
    final UserData<O> data = (UserData<O>) inputs.getUserData();
    final AtomicReference<RetryContext> contextHolder = new AtomicReference<>();

    RetryCallback<Object, Exception> retryCallback = new RetryCallback<Object, Exception>() {
        @Override
        public Object doWithRetry(RetryContext context) throws Exception {
            contextHolder.set(context);
            if (!data.scanning()) {
                chunkMonitor.setChunkSize(inputs.size());
                try {
                    doWrite(outputs.getItems());
                }
                catch (Exception e) {
                    if (rollbackClassifier.classify(e)) {
                        throw e;
                    }
                    /*
                     * If the exception is marked as no-rollback, we need to
                     * override that, otherwise there's no way to write the
                     * rest of the chunk or to honour the skip listener
                     * contract.
                     */
                    throw new ForceRollbackForWriteSkipException(
                            "Force rollback on skippable exception so that skipped item can be located.", e);
                }
                contribution.incrementWriteCount(outputs.size());
            }
            else {
                scan(contribution, inputs, outputs, chunkMonitor, false);
            }
            return null;
        }
    };

    if (!buffering) {

        RecoveryCallback<Object> batchRecoveryCallback = new RecoveryCallback<Object>() {

            @Override
            public Object recover(RetryContext context) throws Exception {

                Throwable e = context.getLastThrowable();
                if (outputs.size() > 1 && !rollbackClassifier.classify(e)) {
                    throw new RetryException("Invalid retry state during write caused by "
                            + "exception that does not classify for rollback: ", e);
                }

                Chunk<I>.ChunkIterator inputIterator = inputs.iterator();
                for (Chunk<O>.ChunkIterator outputIterator = outputs.iterator(); outputIterator.hasNext();) {

                    inputIterator.next();
                    outputIterator.next();

                    checkSkipPolicy(inputIterator, outputIterator, e, contribution, true);
                    if (!rollbackClassifier.classify(e)) {
                        throw new RetryException(
                                "Invalid retry state during recovery caused by exception that does not classify for rollback: ",
                                e);
                    }

                }

                return null;

            }

        };

        batchRetryTemplate.execute(retryCallback, batchRecoveryCallback,
                BatchRetryTemplate.createState(getInputKeys(inputs), rollbackClassifier));

    }
    else {

        RecoveryCallback<Object> recoveryCallback = new RecoveryCallback<Object>() {

            @Override
            public Object recover(RetryContext context) throws Exception {
                /*
                 * If the last exception was not skippable we don't need to
                 * do any scanning. We can just bomb out with a retry
                 * exhausted.
                 */
                if (!shouldSkip(itemWriteSkipPolicy, context.getLastThrowable(), -1)) {
                    throw new ExhaustedRetryException(
                            "Retry exhausted after last attempt in recovery path, but exception is not skippable.",
                            context.getLastThrowable());
                }
                inputs.setBusy(true);
                data.scanning(true);
                scan(contribution, inputs, outputs, chunkMonitor, true);
                return null;
            }

        };

        if (logger.isDebugEnabled()) {
            logger.debug("Attempting to write: " + inputs);
        }
        try {
            batchRetryTemplate.execute(retryCallback, recoveryCallback,
                    new DefaultRetryState(inputs, rollbackClassifier));
        }
        catch (Exception e) {
            RetryContext context = contextHolder.get();
            if (!batchRetryTemplate.canRetry(context)) {
                /*
                 * BATCH-1761: we need advance warning of the scan about to
                 * start in the next transaction, so we can change the
                 * processing behaviour.
                 */
                data.scanning(true);
            }
            throw e;
        }
    }

    callSkipListeners(inputs, outputs);

}
```
Before this commit, metrics were not collected in a fault-tolerant step.
This commit updates the FaultTolerantChunkProcessor to collect metrics.
For the record, chunk scanning is not covered for two reasons:
1. When scanning a chunk, there is a single item in each write operation,
so it would be incorrect to report a metric called "chunk.write" for a
single item. We could argue that it is a singleton chunk, but still:
if we want to time scanned (that is, individual) items, we need a more
fine-grained timer called "scanned.item.write", for example.
2. The end result can be confusing and might distort the overall metrics
view in case of errors (because of the noisy metrics from the additional
transactions for individual items).
As a reminder, the goal of the "chunk.write" metric is to give an overview
of the write operation time of the whole chunk and not to time each item
individually (this could be done using an `ItemWriteListener` if needed).
Resolves #3664
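For anyone who needs per-item write timings in the meantime, a custom `ItemWriteListener` registered on the step can record them. Below is a minimal sketch (the listener class, the "scanned.item.write" timer name, and the injected `MeterRegistry` are illustrative, assuming the Spring Batch 4.x listener signatures and a single-threaded step):

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.batch.core.ItemWriteListener;

// Illustrative listener that times each write call individually. The "start"
// field assumes a single-threaded step; use a ThreadLocal otherwise.
public class ItemWriteTimingListener implements ItemWriteListener<Object> {

    private final MeterRegistry registry;

    private long start;

    public ItemWriteTimingListener(MeterRegistry registry) {
        this.registry = registry;
    }

    @Override
    public void beforeWrite(List<? extends Object> items) {
        this.start = System.nanoTime();
    }

    @Override
    public void afterWrite(List<? extends Object> items) {
        record("SUCCESS");
    }

    @Override
    public void onWriteError(Exception exception, List<? extends Object> items) {
        record("FAILURE");
    }

    private void record(String status) {
        // "scanned.item.write" is the hypothetical name suggested above.
        registry.timer("scanned.item.write", "status", status)
                .record(System.nanoTime() - this.start, TimeUnit.NANOSECONDS);
    }
}
```

It can then be attached with `.listener(new ItemWriteTimingListener(registry))` on the step builder.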