Timeout exception on IWorkflowService#ResetWorkflowExecution #562

Open
polyansky-syberry opened this issue Nov 5, 2020 · 12 comments

@polyansky-syberry

Code (modified from the samples):

    public static void main(String[] args) throws TException, IOException {
        IWorkflowService cadenceService = new WorkflowServiceTChannel(
            "127.0.0.1",
            7933,
            new WorkflowServiceTChannel.ClientOptions.Builder()
                .setRpcTimeout(1_000_000L)
                .setListArchivedWorkflowRpcTimeout(1_000_000_000L)
                .setQueryRpcTimeout(1_000_000_000L)
                .setRpcLongPollTimeout(1_000_000_000L)
                .build()
        );
        System.out.println("---------------------------------------------------------------");
        System.out.println("Run for " + 4);
        ResetWorkflowExecutionRequest request = new ResetWorkflowExecutionRequest();
        request.setWorkflowExecution(
            new WorkflowExecution()
                .setWorkflowId("f5e392e2-20ed-4239-9633-65a352fbd202")
                .setRunId("5115e281-f48b-4f51-a3de-f1b9880677a3")
        );
        request.setDomain("DOMAIN");
        request.setDecisionFinishEventId(4);
        try {
            cadenceService.ResetWorkflowExecution(request);
            System.out.println("Success");
        } catch (Exception e) {
            LoggerFactory.getLogger("Logger").error("Error", e);
        }

        System.exit(0);
    }

What I get:

09:06:20.822 [main] ERROR Logger - Error
org.apache.thrift.transport.TTransportException: timeout
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.throwOnRpcError(WorkflowServiceTChannel.java:546)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.doRemoteCall(WorkflowServiceTChannel.java:519)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.resetWorkflowExecution(WorkflowServiceTChannel.java:1597)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$ResetWorkflowExecution$25(WorkflowServiceTChannel.java:1586)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:569)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.ResetWorkflowExecution(WorkflowServiceTChannel.java:1585)
	at com.uber.cadence.samples.common.RegisterDomain.main(RegisterDomain.java:65)

Through the CLI, everything works.
To anticipate questions: I need to be able to rerun workflows programmatically so that I can do it from a Spring application.

It seems the Cadence server stops processing because a timeout is not configured (the CLI has the --context_timeout option for that), but I'm not sure that's true.

Can you help me with that?

@polyansky-syberry
Author

Docker Compose file:

version: '3.2'
services:
  cassandra:
    image: cassandra:3.11
    restart: unless-stopped
    networks:
      - cross-comms
    volumes:
      - type: volume
        source: mycassandrastore
        target: /var/lib/cassandra
    ports:
      - "${CASSANDRA_PORT}:${CASSANDRA_PORT}"
  statsd:
    image: graphiteapp/graphite-statsd
    restart: unless-stopped
    networks:
      - cross-comms
    ports:
      - "8080:80"
      - "2003:2003"
      - "8125:8125"
      - "8126:8126"
  cadence:
    image: ubercadence/server:master-auto-setup
    restart: unless-stopped
    networks:
      - cross-comms
    ports:
      - "${CADENCE_PORT}:${CADENCE_PORT}"
      - "7934:7934"
      - "7935:7935"
      - "7939:7939"
    environment:
      - "CASSANDRA_SEEDS=cassandra"
      - "STATSD_ENDPOINT=statsd:8125"
      - "DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development.yaml"
      - "CADENCE_CONTEXT_TIMEOUT=600"
    depends_on:
      - cassandra
      - statsd
  cadence-web:
    image: ubercadence/web:latest
    restart: unless-stopped
    networks:
      - cross-comms
    environment:
      - "CADENCE_TCHANNEL_PEERS=cadence:${CADENCE_PORT}"
    ports:
      - "${CADENCE_WEB_PORT}:${CADENCE_WEB_PORT}"
    depends_on:
      - cadence
  cadence-cli-shell:
    image: crux-cadence-cli-shell:latest
    restart: unless-stopped
    networks:
      - cross-comms
    environment:
      - "CADENCE_HOST=cadence"
      - "CADENCE_PORT=${CADENCE_PORT}"
      - "CADENCE_DOMAIN=${CADENCE_DOMAIN}"
    depends_on:
      - cadence
    volumes:
      - cadencedata:/var/lib/cadencedata

volumes:
  mycassandrastore:
  cadencedata:

networks:
  cross-comms:
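The compose file sets `CADENCE_CONTEXT_TIMEOUT=600`, while the Java snippet at the top of the issue calls `setRpcTimeout(1_000_000L)`. The two numbers are in different units, so a direct comparison is misleading. A small stdlib-only sketch that normalizes them, assuming the environment variable follows the CLI's `--context_timeout` convention (seconds) and that `ClientOptions.Builder#setRpcTimeout` takes milliseconds:

```java
import java.time.Duration;

/**
 * Compares the two timeout values quoted in this issue.
 * Assumption: CADENCE_CONTEXT_TIMEOUT is in seconds (CLI --context_timeout style)
 * and ClientOptions.Builder#setRpcTimeout takes milliseconds.
 */
final class TimeoutUnits {
    // From the compose file: CADENCE_CONTEXT_TIMEOUT=600 (assumed seconds).
    static final long CONTEXT_TIMEOUT_SECONDS = 600L;
    // From the Java snippet: setRpcTimeout(1_000_000L) (assumed milliseconds).
    static final long RPC_TIMEOUT_MILLIS = 1_000_000L;

    static long contextTimeoutMillis() {
        return Duration.ofSeconds(CONTEXT_TIMEOUT_SECONDS).toMillis();
    }

    static long rpcTimeoutSeconds() {
        // getSeconds() instead of toSeconds() keeps this compatible with Java 8.
        return Duration.ofMillis(RPC_TIMEOUT_MILLIS).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println("context timeout: " + contextTimeoutMillis() + " ms");
        System.out.println("Java RPC timeout: " + rpcTimeoutSeconds() + " s");
        System.out.println("Java RPC timeout is longer: "
            + (RPC_TIMEOUT_MILLIS > contextTimeoutMillis()));
    }
}
```

Under these assumptions the Java client already allows about 1,000 s per call, more than 600 s, which hints that the `TTransportException: timeout` may not come from a client-side timeout that is too short.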

@polyansky-syberry
Author

polyansky-syberry commented Nov 5, 2020

cadence --domain DOMAIN --address host.docker.internal:7933 workflow reset -w f5e392e2-20ed-4239-9633-65a352fbd202 -r 5115e281-f48b-4f51-a3de-f1b9880677a3 --event_id 4 --reason "<Some string>"

This works fine.

@polyansky-syberry
Author

If I set the event ID to 5, it returns this error:

19:29:50.332 [main] ERROR Logger - Error
org.apache.thrift.TException: Rpc error:<ErrorResponse id=5 errorType=UnexpectedError message=cadence internal error, msg: nDCStateRebuilder unable to rebuild mutable state to event ID: 4, version: -24>
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.throwOnRpcError(WorkflowServiceTChannel.java:548)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.doRemoteCall(WorkflowServiceTChannel.java:519)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.resetWorkflowExecution(WorkflowServiceTChannel.java:1597)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$ResetWorkflowExecution$25(WorkflowServiceTChannel.java:1586)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:569)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.ResetWorkflowExecution(WorkflowServiceTChannel.java:1585)
	at com.uber.cadence.samples.common.RegisterDomain.main(RegisterDomain.java:65)

@polyansky-syberry
Author

[Screenshot: event types and their IDs around events 4 and 5]

@polyansky-syberry
Author

polyansky-syberry commented Nov 12, 2020

@sokada1221
@meiliang86
@mfateev
Guys, please help me with this.

@longquanzheng
Collaborator

@polyansky-syberry, sorry for the late response. Were you able to resolve the issue in the end? Basically, reset is only allowed at a DecisionTask boundary (DecisionTaskCompleted/Failed/TimedOut events; in newer server versions, we also support Scheduled/Started).
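The rule in the comment above can be expressed as a simple event-type check. The sketch below is stdlib-only and illustrative: the enum lists a subset of Cadence history event type names, and the `newerServer` flag is a hypothetical switch standing in for "newer server versions", not a real client option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of the reset-point rule: reset only at DecisionTask boundaries.
 * `newerServer` is a hypothetical flag, not part of any Cadence API.
 */
final class ResetPoints {
    enum EventType {
        WorkflowExecutionStarted,
        DecisionTaskScheduled,
        DecisionTaskStarted,
        DecisionTaskCompleted,
        DecisionTaskFailed,
        DecisionTaskTimedOut,
        ActivityTaskScheduled
    }

    // Allowed on all server versions, per the comment above.
    private static final Set<EventType> DECISION_BOUNDARY = EnumSet.of(
        EventType.DecisionTaskCompleted, EventType.DecisionTaskFailed, EventType.DecisionTaskTimedOut);
    // Additionally allowed on newer servers.
    private static final Set<EventType> NEWER_SERVER_EXTRA = EnumSet.of(
        EventType.DecisionTaskScheduled, EventType.DecisionTaskStarted);

    static boolean isResetPoint(EventType type, boolean newerServer) {
        return DECISION_BOUNDARY.contains(type)
            || (newerServer && NEWER_SERVER_EXTRA.contains(type));
    }

    /** Returns the event IDs (1-based, in history order) whose type qualifies as a reset point. */
    static List<Long> resetPointIds(List<EventType> history, boolean newerServer) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < history.size(); i++) {
            if (isResetPoint(history.get(i), newerServer)) {
                ids.add((long) (i + 1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // A typical opening sequence (illustrative, not taken from the screenshots).
        List<EventType> history = Arrays.asList(
            EventType.WorkflowExecutionStarted,
            EventType.DecisionTaskScheduled,
            EventType.DecisionTaskStarted,
            EventType.DecisionTaskCompleted,
            EventType.ActivityTaskScheduled);
        System.out.println("older server: " + resetPointIds(history, false)); // [4]
        System.out.println("newer server: " + resetPointIds(history, true));  // [2, 3, 4]
    }
}
```

Passing this type check is not sufficient on its own: later in this thread, a reset at event 2 (DecisionTaskScheduled) is still rejected by the server.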

@avitkovskaya-syberry

@longquanzheng Hi! You mentioned that it's now possible to reset a workflow execution from a DecisionTaskScheduled event.
I have a timed-out execution with this event history:
[Screenshot: event history of the timed-out execution]

I tried to reset the execution from event 2 (using java-client 3.6.1 against servers v0.23.2 and v0.22.4):

    public String resetWorkflow() {
        var request = new ResetWorkflowExecutionRequest();
        var workflowExecution = new WorkflowExecution()
            .setRunId(runId)
            .setWorkflowId(workflowId);
        request.setWorkflowExecution(workflowExecution);
        request.setDecisionFinishEventId(2);
        request.setDomain(domain);

        try {
            return cadenceService.ResetWorkflowExecution(request).getRunId();
        } catch (TException ex) {
            throw new CadenceServiceException("Couldn't reset workflow execution", ex);
        }
    }

But it throws an exception while resetting the execution:

Caused by: com.uber.cadence.BadRequestError: nDCStateRebuilder unable to rebuild mutable state to event ID: 1, version: -24, baseLastEventID + baseLastEventVersion is not the same as the last event of the last batch, event ID: 2, version :-24 ,typically because of attemptting to rebuild to a middle of a batch
	at com.uber.cadence.WorkflowService$ResetWorkflowExecution_result$ResetWorkflowExecution_resultStandardScheme.read(WorkflowService.java:38530) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.WorkflowService$ResetWorkflowExecution_result$ResetWorkflowExecution_resultStandardScheme.read(WorkflowService.java:38507) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.WorkflowService$ResetWorkflowExecution_result.read(WorkflowService.java:38406) ~[cadence-client-3.6.1.jar:na]
	at org.apache.thrift.TDeserializer.deserialize(TDeserializer.java:81) ~[libthrift-0.9.3.jar:0.9.3]
	at org.apache.thrift.TDeserializer.deserialize(TDeserializer.java:67) ~[libthrift-0.9.3.jar:0.9.3]
	at com.uber.tchannel.messages.ThriftSerializer.decodeBody(ThriftSerializer.java:101) ~[tchannel-core-0.8.30.jar:na]
	at com.uber.tchannel.messages.Serializer.decodeBody(Serializer.java:49) ~[tchannel-core-0.8.30.jar:na]
	at com.uber.tchannel.messages.EncodedResponse.getBody(EncodedResponse.java:85) ~[tchannel-core-0.8.30.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.resetWorkflowExecution(WorkflowServiceTChannel.java:1490) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$ResetWorkflowExecution$27(WorkflowServiceTChannel.java:1477) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCallWithTags(WorkflowServiceTChannel.java:374) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:362) ~[cadence-client-3.6.1.jar:na]
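Both rejections in this thread name the event just before the requested one (reset at 5 reports "event ID: 4"; reset at 2 reports "event ID: 1"), and the newer message says the target must be the last event of a batch. A stdlib-only sketch of that check, under assumptions inferred from the error text rather than from Cadence's source; the batch layout in particular is illustrative and the class is hypothetical:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of the "middle of a batch" rejection in the error above.
 * Assumptions (inferred from the error text, not from Cadence source):
 *  - history is stored as batches of consecutive event IDs;
 *  - the server rebuilds state up to decisionFinishEventId - 1;
 *  - that target event must be the last event of some batch.
 */
final class BatchBoundary {
    /** Illustrative batches: events 1 and 2 written together, then 3, then 4 and 5. */
    static List<List<Long>> sampleBatches() {
        return Arrays.asList(
            Arrays.asList(1L, 2L),
            Arrays.asList(3L),
            Arrays.asList(4L, 5L));
    }

    /** True if eventId is the final event of one of the batches. */
    static boolean endsABatch(List<List<Long>> batches, long eventId) {
        for (List<Long> batch : batches) {
            if (!batch.isEmpty() && batch.get(batch.size() - 1) == eventId) {
                return true;
            }
        }
        return false;
    }

    /** True if resetting at decisionFinishEventId would rebuild to a batch boundary. */
    static boolean rebuildsToBatchEnd(List<List<Long>> batches, long decisionFinishEventId) {
        return endsABatch(batches, decisionFinishEventId - 1);
    }

    public static void main(String[] args) {
        List<List<Long>> batches = sampleBatches();
        for (long eventId = 2; eventId <= 5; eventId++) {
            System.out.println("reset at " + eventId + " -> rebuild to " + (eventId - 1)
                + ", batch end: " + rebuildsToBatchEnd(batches, eventId));
        }
    }
}
```

If events 1 and 2 really are written in one batch, this model would explain why a reset at event 2 is rejected while the CLI accepts a reset at event 3 later in this thread.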

If I try to reset from event 3 programmatically, it throws this exception:

Caused by: org.apache.thrift.TException: Rpc error:<ErrorResponse id=6 errorType=UnexpectedError message=cadence internal error, msg: CreateWorkflowExecution operation failed. Error: invalid UUID "">
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.throwOnRpcError(WorkflowServiceTChannel.java:345) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.doRemoteCall(WorkflowServiceTChannel.java:316) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.resetWorkflowExecution(WorkflowServiceTChannel.java:1488) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$ResetWorkflowExecution$27(WorkflowServiceTChannel.java:1477) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCallWithTags(WorkflowServiceTChannel.java:374) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:362) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.ResetWorkflowExecution(WorkflowServiceTChannel.java:1476) ~[cadence-client-3.6.1.jar:na]

If I reset this execution via the CLI from event 3, it is reset successfully:

cadence --domain WORKFLOWS_PRIMARY --address host.docker.internal:7933 workflow reset -w timeout_test_with_childWF.2022-01-19T11:05:06Z -r 15b64382-28ef-4c03-8bfe-5be59ac4b390 --event_id 3 --reason "<Reset>"
{
  "runId": "2d6caf81-e780-4eed-a117-d167dd5d0c92"
}

But how can such an execution be reset programmatically? Can the whole workflow be reset from the beginning?
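One hedged reading of the `invalid UUID ""` error: the request reached the server with an empty request ID. The Java snippets in this thread never set one, and they also omit the reason that the CLI call passes via `--reason`; the CLI is believed (an assumption worth confirming against the CLI source) to generate a fresh UUID per reset call. The stdlib-only sketch below shows how such values could be produced and checked; the class and method names are hypothetical and not part of cadence-client.

```java
import java.util.UUID;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of cadence-client) for two values the Java
 * snippets leave unset: a request ID (assumed to be generated by the CLI) and
 * a reason (passed via --reason in the CLI calls above).
 */
final class ResetRequestDefaults {
    // Canonical 8-4-4-4-12 hex layout; UUID.fromString alone accepts looser input.
    private static final Pattern CANONICAL_UUID = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    private ResetRequestDefaults() {}

    /** A fresh random (version 4) UUID string to use as the request ID. */
    static String newRequestId() {
        return UUID.randomUUID().toString();
    }

    /** True if the value has the canonical UUID layout; rejects null and "". */
    static boolean isValidRequestId(String requestId) {
        return requestId != null && CANONICAL_UUID.matcher(requestId).matches();
    }

    /** Uses the given reason, or a placeholder when it is missing or blank. */
    static String reasonOrDefault(String reason) {
        return (reason == null || reason.trim().isEmpty()) ? "reset via Java client" : reason;
    }

    public static void main(String[] args) {
        String requestId = newRequestId();
        System.out.println("requestId: " + requestId + " valid=" + isValidRequestId(requestId));
        System.out.println("empty valid=" + isValidRequestId(""));
        System.out.println("reason: " + reasonOrDefault(null));
    }
}
```

If the assumption holds, the Thrift-generated `ResetWorkflowExecutionRequest` should expose setters named after its fields, for example `request.setRequestId(ResetRequestDefaults.newRequestId())` and `request.setReason(...)`; check the exact names against the client version in use.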

@longquanzheng
Collaborator

longquanzheng commented Feb 11, 2022 via email

@avitkovskaya-syberry

Hi, @longquanzheng
I have this event history in a workflow run:
[Screenshot: event history of the workflow run]
I reset the execution from event 3, and the Java client returns this error:
Caused by: org.apache.thrift.TException: Rpc error:<ErrorResponse id=7 errorType=UnexpectedError message=cadence internal error, msg: CreateWorkflowExecution operation failed. Error: invalid UUID "">
Is this a server-side error? The same workflow can be reset via the CLI.
How can such a workflow be reset using the Java client?

@longquanzheng
Copy link
Collaborator

longquanzheng commented Mar 2, 2022 via email

@avitkovskaya-syberry

avitkovskaya-syberry commented Mar 3, 2022

Hey, @longquanzheng
Resetting from event 2 fails from both the Java client and the CLI:
Error: reset failed Error Details: BadRequestError{Message: nDCStateRebuilder unable to rebuild mutable state to event ID: 1, version: -24, baseLastEventID + baseLastEventVersion is not the same as the last event of the last batch, event ID: 2, version :-24 ,typicaly because of attemptting to rebuild to a middle of a batch} ('export CADENCE_CLI_SHOW_STACKS=1' to see stack traces)

@avitkovskaya-syberry
Copy link

@longquanzheng, hi! Could you please explain how to reset such executions?
