gRPC error on get_online_features (NOAUTH authentication required) #2285

Closed
andrijaperovic opened this issue Feb 5, 2022 · 6 comments

Expected Behavior

get_online_features should return the correct result:

features = client.get_online_features(
    feature_refs=["driver_statistics:avg_daily_trips"],
    entity_rows=entities_with_timestamp[["driver_id"]].T.to_dict().values()).to_dict()

print("\nFeatures Output:\n")
print(pd.DataFrame.from_dict(features))

Current Behavior

Getting a gRPC error when calling get_online_features, which points to an issue in feast-online-serving.
Feature materialization has already completed and the keys are present in the Redis store (offline_to_online_ingestion).
Calling get_online_features works right after offline_to_online_ingestion, but fails unpredictably when run from a standalone script.

Error:

details = "Unexpected error when pulling data from from Redis."
debug_error_string = "{"created":"@1644025055.432554000","description":"Error received from peer ipv6:[::1]:6566","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Unexpected error when pulling data from from Redis.","grpc_status":2}"

Traceback (most recent call last):
  File "test_ingestion_and_get_features.py", line 220, in <module>
    features = client.get_online_features(
  File "/usr/local/anaconda3/envs/feast_poc/lib/python3.8/site-packages/feast/client.py", line 1022, in get_online_features
    raise grpc.RpcError(e.details())
grpc.RpcError: Unexpected error when pulling data from from Redis.

Error in feast-online-serving pod logs:

2022-02-04 16:35:24.823 WARN feast-release-feast-online-serving-7f65995d4d-7mghv --- [lt-executor-203] f.s.c.ServingServiceGRpcController : Failed to get Online Features
io.grpc.StatusRuntimeException: UNKNOWN: Unexpected error when pulling data from from Redis.
at io.grpc.Status.asRuntimeException(Status.java:524)
at feast.storage.connectors.redis.retriever.OnlineRetriever.lambda$getFeaturesFromRedis$1(OnlineRetriever.java:114)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at feast.storage.connectors.redis.retriever.OnlineRetriever.getFeaturesFromRedis(OnlineRetriever.java:101)
at feast.storage.connectors.redis.retriever.OnlineRetriever.getOnlineFeatures(OnlineRetriever.java:56)
at feast.serving.service.OnlineServingServiceV2.getOnlineFeatures(OnlineServingServiceV2.java:126)
at feast.serving.controller.ServingServiceGRpcController.getOnlineFeaturesV2(ServingServiceGRpcController.java:96)
at feast.proto.serving.ServingServiceGrpc$MethodHandlers.invoke(ServingServiceGrpc.java:314)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.opentracing.contrib.grpc.TracingServerInterceptor$2.onHalfClose(TracingServerInterceptor.java:235)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:820)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.ExecutionException: io.lettuce.core.RedisCommandExecutionException: NOAUTH Authentication required.
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at feast.storage.connectors.redis.retriever.OnlineRetriever.lambda$getFeaturesFromRedis$1(OnlineRetriever.java:104)
... 29 more
Caused by: io.lettuce.core.RedisCommandExecutionException: NOAUTH Authentication required.
at io.lettuce.core.ExceptionFactory.createExecutionException(ExceptionFactory.java:135)
at io.lettuce.core.ExceptionFactory.createExecutionException(ExceptionFactory.java:108)
at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:120)
at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:111)
at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:59)
at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:654)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:614)
at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:565)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more

Steps to reproduce

Install helm chart on AKS v1.22:
https://github.com/Azure/feast-azure/blob/rijai/feastchart/cluster/setup/feast-0.9.5-helmchart/README.md

Update the feast-online-serving deployment image to use the latest image tag that supports auth (0.26.3):
feast-dev/feast-java-old#43

Set feast-release-feast-online-serving configmap to use external redis:

feast:
  core-host: "feast-release-feast-core"
  core-grpc-port: 6565
  active_store: online
  stores:
    - name: online
      type: REDIS
      config:
        host: "external-redis.net"
        port: 6380
        ssl: true
        password: "password"
      subscriptions:
        - name: "*"
          project: "*"
grpc:
  server:
    port: 6566
server:
  port: 18099

Define FeatureTable and run offline_to_online_ingestion, then in a separate script run get_online_features.
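
One way to narrow this down is to check, outside of Feast, whether the external Redis accepts the password from the configmap. The sketch below is only an illustration and not part of the original report: it uses Python's standard library to encode AUTH and PING in the Redis wire protocol (RESP) and send them over TLS. The helper names `encode_command` and `probe_redis` are made up for this example, and the host, port and password are the placeholders from the config above.

```python
import socket
import ssl


def encode_command(*args: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def probe_redis(host: str, port: int, password: str, timeout: float = 5.0) -> list:
    """Send AUTH then PING over TLS and return the first line of each reply."""
    context = ssl.create_default_context()
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as conn:
            reader = conn.makefile("rb")
            for command in (("AUTH", password), ("PING",)):
                conn.sendall(encode_command(*command))
                # Simple-string (+...) and error (-...) replies fit on one line.
                replies.append(reader.readline().decode().rstrip("\r\n"))
    return replies
```

Calling `probe_redis("external-redis.net", 6380, "password")` against a reachable server returns the two raw reply lines. A reply starting with `-NOAUTH` to the PING would suggest the AUTH did not take effect on that connection, which is the same error the serving pod logs.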

Specifications

  • Version: 0.9.5
  • Platform: Azure Kubernetes Service (v1.22.4)
  • Subsystem:
  • Redis version: 4.0.14

Possible Solution

Instead of relying on the protocol auto-discovery feature of the Lettuce Redis client, use the RESP2 protocol explicitly on lower versions of Redis.
redis/lettuce#1543 (comment)

Tried making a change in RedisClient.java:

  private RedisClient(StatefulRedisConnection<byte[], byte[]> connection) {
    this.asyncCommands = connection.async();

    // Disable auto-flushing
    this.asyncCommands.setAutoFlushCommands(false);
  }

  public static RedisClientAdapter create(RedisStoreConfig config) {

    RedisURI uri = RedisURI.create(config.getHost(), config.getPort());

    if (config.getSsl()) {
      uri.setSsl(true);
    }

    if (!config.getPassword().isEmpty()) {
      uri.setPassword(config.getPassword());
    }

    io.lettuce.core.RedisClient redisClient = io.lettuce.core.RedisClient.create(uri);
    redisClient.setOptions(
        io.lettuce.core.ClientOptions.builder()
            .protocolVersion(ProtocolVersion.RESP2)
            .build());

    StatefulRedisConnection<byte[], byte[]> connection = redisClient.connect(new ByteArrayCodec());

    return new RedisClient(connection);
  }

However, I run into the following issue at runtime:

Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [feast.serving.service.ServingServiceV2]: Factory method 'servingServiceV2' threw exception; nested exception is java.lang.NoClassDefFoundError: io/lettuce/core/protocol/ProtocolVersion
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:185)
at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:650)

andrijaperovic (Author) commented Feb 5, 2022

Also probably worth mentioning that I still get this error with Redis version 6.0.
Initially get_online_features works fine but then fails after some time; it appears to work again after restarting the feast-release-feast-online-serving pod.

EDIT:
May be correlated with primary-secondary failover as I see two failovers in the cache:
redis/lettuce#338

adchia (Collaborator) commented Feb 7, 2022

any reason you're using the older version of Feast? Is it because you want a Spark based ingestion?

andrijaperovic (Author) commented Feb 7, 2022

> any reason you're using the older version of Feast? Is it because you want a Spark based ingestion?

@adchia That's correct. Also, the Helm chart we use to install Feast on AKS (Azure Kubernetes Service) relies on a deprecated installation strategy, unless there is a recommended approach for installing on Kubernetes providers other than Google and AWS. We also run Kubeflow on our cluster, so we would like Feast to be accessible internally for retrieving historical and online features.

andrijaperovic (Author) commented Feb 8, 2022

I was able to make some headway by specifying REDIS_CLUSTER in the config and porting the auth changes from @xiaoyongzhu (feast-dev/feast-java-old#43) into the create method of RedisClusterClient.java, so that the password is appended to each RedisURI in redisURIList.
I then simulated a failover event by force-rebooting the cluster primary, and we seem to hit a different problem:

Caused by: java.util.concurrent.ExecutionException: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 500 millisecond(s)
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
	at feast.storage.connectors.redis.retriever.OnlineRetriever.lambda$getFeaturesFromRedis$1(OnlineRetriever.java:104)
	... 29 more

Not sure if the adaptive refresh is supposed to handle a new primary with a new IP in this scenario.
++ @pyalex

EDIT:

Looks like feast-online-serving eventually recovered after about 15 minutes.
It turns out that Lettuce does not handle dropped connections by default unless TCP keepalive is configured via netty epoll:
redis/lettuce#1428 (comment)
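
For context, TCP keepalive on Linux is governed by three per-socket parameters: how long a connection may sit idle before the first probe, the interval between probes, and how many unanswered probes are tolerated. The sketch below is a rough illustration in plain Python rather than the Lettuce/netty configuration itself. The function names and the example values are assumptions made for this example, and the detection-time formula is the usual approximation, not a measured figure.

```python
import socket


def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate seconds before an idle connection to a dead peer is dropped."""
    return idle + interval * count


def enable_keepalive(sock: socket.socket, idle: int, interval: int, count: int) -> None:
    """Turn on TCP keepalive and set the tunables where the platform defines them."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tunables = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count))
    for name, value in tunables:
        option = getattr(socket, name, None)
        if option is not None:  # these constants are not defined on every platform
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


# Hypothetical values, not the ones used in the actual fix.
print(keepalive_detection_seconds(idle=15, interval=5, count=3))  # 15 + 5 * 3 = 30
```

Without keepalive, an idle connection whose peer has silently disappeared can look healthy to the client for a long time, which would be consistent with the slow recovery described above.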

After configuring TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT in the channel options and setting the timeout option appropriately in the Feast serving config properties, get_online_features behaves much more predictably (still observing ~1 minute of downtime with a StatusCode.DEADLINE_EXCEEDED error).
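
For the remaining window of downtime, a caller could wrap the request in a retry with exponential backoff. This is a minimal client-side sketch and not part of the fix described here: `call_with_retry` is a hypothetical helper, and the attempt count and delays are arbitrary values that would need tuning against the observed failover time.

```python
import time


def call_with_retry(fn, attempts=5, base_delay=0.5, max_delay=8.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each time, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

With the client from the script above, usage might look like `call_with_retry(lambda: client.get_online_features(...), retry_on=(grpc.RpcError,))`, since the traceback shows the Feast client raising `grpc.RpcError`. With the default values the total wait is only a few seconds, so covering a ~1 minute outage would need more attempts or longer delays.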

adchia (Collaborator) commented Feb 17, 2022

This was merged in. Mind adding a PR that ports the fix into the main repo too?
