Bytewax materialization engine fails when loading feature_store.yaml #3893

Closed
gterziysky opened this issue Jan 18, 2024 · 0 comments · Fixed by #3912

Expected Behavior

Loading the feature_store.yaml file from within a Bytewax pod should work.

Current Behavior

yaml.safe_load() raises a ConstructorError because it cannot reconstruct a serialized pathlib.PosixPath object found in the file.

The error occurs while running materialization using Bytewax at the point where the feature_store.yaml is loaded. The code where this happens is in sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py. Below is an excerpt:

# ...
with open("/var/feast/feature_store.yaml") as f:
    feast_config = yaml.safe_load(f)  # <---- yaml.safe_load() fails
# ...

The full output is:

Defaulted container "process" out of: process, init-hostfile (init)
Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
Traceback (most recent call last):
  File "/bytewax/dataflow.py", line 15, in <module>
    feast_config = yaml.safe_load(f)
  File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 143, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/apply:pathlib.PosixPath'
  in "/var/feast/feature_store.yaml", line 119, column 12

Interestingly, the _create_configuration_map() method of BytewaxMaterializationEngine uses yaml.dump() rather than yaml.safe_dump() to write the config in the first place:

    # ...
    def _create_configuration_map(self, job_id, paths, feature_view, namespace):
        """Create a Kubernetes configmap for this job"""

        feature_store_configuration = yaml.dump(self.repo_config.dict())
    # ...
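
The mismatch between the writer and the reader can be reproduced outside Kubernetes. The snippet below is a minimal sketch, not the Feast code: the dictionary and its repo_path key are made up for illustration, and only PyYAML is assumed to be installed. It shows that yaml.dump() emits a Python-specific tag for a pathlib path and that yaml.safe_load() then refuses to read it.

```python
from pathlib import Path

import yaml

# Hypothetical stand-in for the dict returned by repo_config.dict().
config = {"repo_path": Path("/var/feast")}

# yaml.dump() falls back to a python/object/apply tag for the Path value.
dumped = yaml.dump(config)
print(dumped)

# yaml.safe_load() has no constructor for that tag and raises.
try:
    yaml.safe_load(dumped)
except yaml.constructor.ConstructorError as exc:
    error = exc
    print(f"safe_load failed: {exc.problem}")
else:
    error = None
```

The exact module name inside the tag can differ between Python versions, so the sketch does not rely on it.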

When I tried replacing yaml.dump() with yaml.safe_dump(), I got the following error:

yaml.representer.RepresenterError: ('cannot represent an object', <RedisType.redis: 'redis'>)

It appears that yaml.SafeDumper has no representer for RedisType.redis and yaml.SafeLoader has no constructor for pathlib.PosixPath. Perhaps those objects do not have corresponding to_yaml() and from_yaml() methods.
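
One possible direction on the writer side is to convert the config to plain built-in types before dumping it, so that yaml.safe_dump() and yaml.safe_load() both work. The following is a sketch under assumptions, not the fix that was merged: to_plain() and StoreType are hypothetical names, with StoreType standing in for Feast's RedisType enum, and the sample config is invented.

```python
from collections.abc import Mapping
from enum import Enum
from pathlib import PurePath, PurePosixPath

import yaml


class StoreType(str, Enum):
    """Hypothetical stand-in for an enum such as RedisType."""

    redis = "redis"


def to_plain(value):
    """Recursively replace enums and paths with plain built-in values."""
    if isinstance(value, Enum):  # check before anything else: str enums are also str
        return to_plain(value.value)
    if isinstance(value, PurePath):
        return str(value)
    if isinstance(value, Mapping):
        return {to_plain(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_plain(v) for v in value]
    return value


# Invented config containing both problematic value types.
config = {
    "online_store": {"type": StoreType.redis},
    "repo_path": PurePosixPath("/var/feast"),
}

text = yaml.safe_dump(to_plain(config))  # no RepresenterError
restored = yaml.safe_load(text)          # no ConstructorError
print(restored)
```

The trade-off is that the reader gets strings back instead of Path or enum objects, so whatever consumes the loaded config has to accept those.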

Steps to reproduce

Run the materialization:

feast materialize  --views "EXAMPLE_FEATURE_VIEW" '2023-10-30T00:00:00' '2023-10-30T23:59:59'

Give it some time and check the pods:

kubectl get pods -n bytewax
NAME                                                    READY   STATUS   RESTARTS   AGE
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-0-9kxgt   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-1-d8n4r   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-2-wmmsd   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-3-c8gn7   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-4-hgfbn   0/1     Error    0          25s

Then upon inspecting the logs, I see the error from above:

kubectl logs -n bytewax dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-4-hgfbn

Specifications

Possible Solution

I was able to make it work by modifying sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py to use yaml.load() instead of yaml.safe_load() and rebuilding the Bytewax docker image:

    with open("/var/feast/feature_store.yaml") as f:
        #feast_config = yaml.safe_load(f)
        feast_config = yaml.load(f, Loader=yaml.Loader)

        with open("/var/feast/bytewax_materialization_config.yaml") as b:
            # I did not test if yaml.safe_load() works for the bytewax config, but just went ahead and replaced it too 
            #bytewax_config = yaml.safe_load(b)
            bytewax_config = yaml.load(b, Loader=yaml.Loader)
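
Since yaml.Loader can construct arbitrary Python objects, a narrower reader-side alternative might be a SafeLoader subclass that accepts only the pathlib tags and turns them into strings. This is a sketch I have not tried against the Bytewax image: PathTolerantLoader and construct_path_as_str are hypothetical names, and the sample data is invented. It also would not cover enum values such as RedisType.redis, which would need their own handling.

```python
from pathlib import PurePosixPath

import yaml

# Prefix shared by the tags yaml.dump() emits for pathlib path objects.
PATHLIB_TAG_PREFIX = "tag:yaml.org,2002:python/object/apply:pathlib."


class PathTolerantLoader(yaml.SafeLoader):
    """SafeLoader subclass (hypothetical name) that also accepts pathlib tags."""


def construct_path_as_str(loader, tag_suffix, node):
    # yaml.dump() stores the path's constructor arguments as a YAML sequence.
    parts = loader.construct_sequence(node)
    return str(PurePosixPath(*parts))


# Registered on the subclass only, so yaml.SafeLoader itself is unchanged.
PathTolerantLoader.add_multi_constructor(PATHLIB_TAG_PREFIX, construct_path_as_str)

# Simulate what the writer side produces today, then read it back.
written = yaml.dump({"repo_path": PurePosixPath("/var/feast")})
loaded = yaml.load(written, Loader=PathTolerantLoader)
print(loaded)
```

Any other python/object tag still raises ConstructorError with this loader, which is the point of not switching to yaml.Loader wholesale.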