
Remove dynamic quantization option for PyTorch models at upload #594

Open
joshdevins opened this issue Sep 7, 2023 · 2 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers), topic:NLP (Issue or PR about NLP model support and eland_import_hub_model)

Comments

@joshdevins (Member)

Dynamic quantization of PyTorch models has proven to be a challenge for two reasons.

(1) Dynamic quantization ties the traced TorchScript model to a particular CPU architecture and makes it non-portable. For example, tracing the model (via the upload CLI) on an ARM-based M-series Apple processor produces a model that will not run on an Intel CPU, and vice versa. Tracing this way also means that Intel-specific optimisations cannot be used. Best practice is to trace the model on the same CPU architecture as the target inference processors. GPU support would add further complexity, and eland currently cannot trace with a GPU at all.

(2) "Blind" dynamic quantization at upload time could also be considered as an anti-pattern/not a best practice. Quantization can often damage the accuracy of a model and doing quantization blindly, without evaluating the model afterwards, can produce surprising results at inference.

For these reasons, we believe it is safest to remove dynamic quantization as an option. Users who want quantized models can quantize them in PyTorch or transformers directly and upload the result with eland's Python methods (as opposed to the CLI).
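For illustration, here is a minimal sketch of that alternative path: quantize and evaluate the model yourself, then upload with eland's Python API. This is not an eland-documented workflow; class and argument names follow the 8.x Python API but may vary by version, and the Elasticsearch URL and credentials are placeholders.

    # Hedged sketch: quantize and evaluate yourself, then upload via the
    # eland Python API instead of the CLI. Signatures vary by eland version.
    import tempfile

    import torch
    from elasticsearch import Elasticsearch
    from transformers import AutoModel, AutoTokenizer
    from eland.ml.pytorch import PyTorchModel
    from eland.ml.pytorch.transformers import TransformerModel

    MODEL_ID = "sentence-transformers/msmarco-MiniLM-L-12-v3"

    # 1. Quantize on the same CPU architecture that will serve inference.
    model = AutoModel.from_pretrained(MODEL_ID, torchscript=True)
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    # ... evaluate `quantized` on your own data before deciding to upload ...

    # 2. Re-trace the quantized model to TorchScript.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    inputs = tokenizer("example input", return_tensors="pt")
    traced = torch.jit.trace(
        quantized, (inputs["input_ids"], inputs["attention_mask"])
    )

    # 3. Let eland generate the config and vocab files, overwrite its trace
    #    with the locally quantized one, and import the result.
    tm = TransformerModel(model_id=MODEL_ID, task_type="text_embedding")
    tmp_dir = tempfile.mkdtemp()
    model_path, config, vocab_path = tm.save(tmp_dir)
    torch.jit.save(traced, model_path)  # swap in the quantized trace

    es = Elasticsearch("https://localhost:9200",
                       basic_auth=("elastic", "<password>"))
    PyTorchModel(es, tm.elasticsearch_model_id()).import_model(
        model_path=model_path, config_path=None,
        vocab_path=vocab_path, config=config,
    )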

@joshdevins added the bug, good first issue and topic:NLP labels on Sep 7, 2023
@davidkyle (Member)

Dynamic quantisation is controlled by the --quantize parameter of the eland_import_hub_model script. It has always been considered an advanced option and should now be deprecated. The script should emit a warning when the option is used, describing the hardware incompatibility problem.
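A minimal sketch of what that warning could look like, assuming the script's existing argparse handling (`args.quantize`); the message wording is illustrative, not final:

    import warnings

    # Proposed deprecation warning for the --quantize flag; `args` comes
    # from the script's existing argparse parser.
    if args.quantize:
        warnings.warn(
            "--quantize is deprecated and will be removed in a future "
            "release. Dynamically quantized TorchScript models only run "
            "correctly on the CPU architecture they were traced on; "
            "quantize with PyTorch directly and upload via eland's Python "
            "API instead.",
            DeprecationWarning,
            stacklevel=2,
        )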

@davidkyle (Member)

To understand exactly what happens when quantising on a different architecture from the one used at evaluation, I used the eland_import_hub_model script to trace a quantised model on an M1 Mac and upload it to an x86 Linux server for evaluation.

Tracing the model with the --quantize option fails on an M1 Mac with the error:

RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine
Full stack trace:

    Traceback (most recent call last):
      File "/usr/local/bin/eland_import_hub_model", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.9/dist-packages/eland/cli/eland_import_hub_model.py", line 235, in main
        tm = TransformerModel(
      File "/usr/local/lib/python3.9/dist-packages/eland/ml/pytorch/transformers.py", line 630, in __init__
        self._traceable_model.quantize()
      File "/usr/local/lib/python3.9/dist-packages/eland/ml/pytorch/traceable_model.py", line 43, in quantize
        torch.quantization.quantize_dynamic(
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/quantization/quantize.py", line 450, in quantize_dynamic
        convert(model, mapping, inplace=True)
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/quantization/quantize.py", line 535, in convert
        _convert(
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/quantization/quantize.py", line 573, in _convert
        _convert(mod, mapping, True,  # inplace
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/quantization/quantize.py", line 573, in _convert
        _convert(mod, mapping, True,  # inplace
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/quantization/quantize.py", line 573, in _convert
        _convert(mod, mapping, True,  # inplace
      [Previous line repeated 3 more times]
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/quantization/quantize.py", line 575, in _convert
        reassign[name] = swap_module(mod, mapping, custom_module_class_mapping)
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/quantization/quantize.py", line 608, in swap_module
        new_mod = qmod.from_float(mod)
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/nn/quantized/dynamic/modules/linear.py", line 111, in from_float
        qlinear = cls(mod.in_features, mod.out_features, dtype=dtype)
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/nn/quantized/dynamic/modules/linear.py", line 35, in __init__
        super(Linear, self).__init__(in_features, out_features, bias_, dtype=dtype)
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/nn/quantized/modules/linear.py", line 150, in __init__
        self._packed_params = LinearPackedParams(dtype)
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/nn/quantized/modules/linear.py", line 27, in __init__
        self.set_weight_bias(wq, None)
      File "/usr/local/lib/python3.9/dist-packages/torch/ao/nn/quantized/modules/linear.py", line 32, in set_weight_bias
        self._packed_params = torch.ops.quantized.linear_prepack(weight, bias)
      File "/usr/local/lib/python3.9/dist-packages/torch/_ops.py", line 442, in __call__
        return self._op(*args, **kwargs or {})
    RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine
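For context, the NoQEngine failure suggests the PyTorch build in the image has no quantization engine available for this op on Apple silicon. That can be checked with standard PyTorch introspection (the output shown is illustrative):

    import torch

    # List the quantization engines this PyTorch build supports and the one
    # currently selected; a build without a usable engine for the target
    # CPU fails linear_prepack with NoQEngine, as above.
    print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'qnnpack']
    print(torch.backends.quantized.engine)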

Two models were tested: sentence-transformers/msmarco-MiniLM-L-12-v3 and dslim/bert-base-NER.

docker run -it --rm elastic/eland \
    eland_import_hub_model \
      --cloud-id $CLOUD_ID \
      -u elastic -p $CLOUD_PWD \
      --hub-model-id sentence-transformers/msmarco-MiniLM-L-12-v3 \
      --task-type text_embedding \
      --quantize


docker run -it --rm elastic/eland \
    eland_import_hub_model \
      --cloud-id $CLOUD_ID \
      -u elastic -p $CLOUD_PWD \
      --hub-model-id dslim/bert-base-NER \
      --task-type ner \
      --quantize

The 8.9 Docker image, which ships PyTorch 1.13.1, was used in this test.
