
BUG: Fix model serving to fail with HTTP 400 on bad requests. #5003

Merged
merged 12 commits into master from issue#4897 on Dec 3, 2021

Conversation

abatomunkuev
Contributor

@abatomunkuev abatomunkuev commented Nov 4, 2021

Signed-off-by: Andrei Batomunkuev <andreibatomunkuev@gmail.com>

What changes are proposed in this pull request?

Solves #4897.

Please refer to my root cause analysis of the issue. As explained there, the root cause is most likely that MALFORMED_REQUEST is passed as the error_code argument to the _handle_serving_error function.

To fix this, I have changed the error_code argument from MALFORMED_REQUEST to BAD_REQUEST.
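
For illustration, the fix is essentially a one-line change of the error-code constant at the call site. The following is a minimal sketch of the pattern, assuming the call shape described above; the exact signature in mlflow/pyfunc/scoring_server/__init__.py may differ:

from mlflow.protos.databricks_pb2 import BAD_REQUEST

# Previously error_code=MALFORMED_REQUEST, which has no HTTP status mapping
# and therefore fell back to 500 INTERNAL SERVER ERROR.
_handle_serving_error(
    error_message=(
        "Failed to parse input from JSON. Ensure that input is a valid "
        "JSON formatted string."
    ),
    error_code=BAD_REQUEST,  # maps to HTTP 400
)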

The HyperText Transfer Protocol (HTTP) 500 Internal Server Error server error response code indicates that the server encountered an unexpected condition that prevented it from fulfilling the request. (Source: MDN Web Docs)

The HyperText Transfer Protocol (HTTP) 400 Bad Request response status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing). (Source: MDN Web Docs)

To reproduce the error, let's run the following command from the issue:

curl -i -X POST -d "{\"data\":0.0199132142]}" -H "Content-Type: application/json" http://localhost:5000/invocations

In this case, the response should be 400 Bad Request, since we are passing a JSON object with invalid syntax, and the error message should reflect that.

How is this patch tested?

Currently, the unit tests in test_scoring_server.py fail since we have updated the value to BAD_REQUEST, so we need to either fix those existing tests or write new ones. I will need some help fixing the unit tests; I am a little confused about which unit tests belong to this issue.
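
To run the affected tests locally, something like the following should work (the module path is an assumption based on the test file name mentioned above):

pytest tests/pyfunc/test_scoring_server.py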

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

When a user sends a JSON object with invalid syntax to the server, the server should return the following response:

HTTP/1.1 400 BAD REQUEST
Server: gunicorn
Date: Thu, 04 Nov 2021 06:29:13 GMT
Connection: close
Content-Type: application/json
Content-Length: 1030

{"error_code": "BAD_REQUEST", "message": "Failed to parse input from JSON. Ensure that input is a valid JSON formatted string.",....

Example of a request with invalid JSON syntax:

curl -i -X POST -d "{\"data\":0.0199132142]}" -H "Content-Type: application/json" http://localhost:5000/invocations

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions

github-actions bot commented Nov 4, 2021

@abatomunkuev Thanks for the contribution! The DCO check failed. Please sign off your commits by following the instructions here: https://github.com/mlflow/mlflow/runs/4102308049. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.rst#sign-your-work for more details.

@github-actions github-actions bot added the area/scoring (MLflow Model server, model deployment tools, Spark UDFs) and rn/bug-fix (Mention under Bug Fixes in Changelogs) labels on Nov 4, 2021
@abatomunkuev abatomunkuev force-pushed the issue#4897 branch 2 times, most recently from fdb9b38 to a6340c8 on November 4, 2021 11:41
Signed-off-by: Andrei Batomunkuev <abatomunkuev@myseneca.ca>
@dbczumar
Collaborator

dbczumar commented Nov 4, 2021

Hi @abatomunkuev, the failures in https://github.com/mlflow/mlflow/runs/4104942369?check_suite_focus=true appear to be caused by test cases asserting that the response of certain scoring operations with bad input is 500. Now that we've changed these to 400, can we update the test cases as well?

The failures in https://github.com/mlflow/mlflow/runs/4104942145?check_suite_focus=true appear to be unrelated. I'm investigating the cause.

@dbczumar
Copy link
Collaborator

dbczumar commented Nov 4, 2021

The flavors tests have been fixed via https://github.com/tensorflow/tensorflow/pull/52927/files, which was just released with TensorFlow 2.6. I've restarted those tests. We'll still need to address the other failures noted above.

@abatomunkuev
Contributor Author

@dbczumar Hello! I am going to work on updating the tests.

Signed-off-by: Andrei Batomunkuev <abatomunkuev@myseneca.ca>
@abatomunkuev
Contributor Author

@dbczumar Hello! I have made some changes to the following test case in tests/models/test_cli.py:

def _validate_with_rest_endpoint(scoring_proc, host_port, df, x, sk_model, enable_mlserver=False):

I'm wondering: should we update the following line?

assert scoring_response_dict["error_code"] == ErrorCode.Name(MALFORMED_REQUEST)

Should we change ErrorCode.Name(MALFORMED_REQUEST) to ErrorCode.Name(BAD_REQUEST)?

@dbczumar
Collaborator


Hi @abatomunkuev, yes, let's change MALFORMED_REQUEST to BAD_REQUEST.
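
For reference, the updated assertion would look roughly like this (a sketch; the import assumes the error-code constants come from mlflow.protos.databricks_pb2, as elsewhere in the codebase):

from mlflow.protos.databricks_pb2 import BAD_REQUEST, ErrorCode

# Bad client input should now surface as a 400-class error code.
assert scoring_response_dict["error_code"] == ErrorCode.Name(BAD_REQUEST)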

Signed-off-by: Andrei Batomunkuev <abatomunkuev@myseneca.ca>
@abatomunkuev
Contributor Author

@dbczumar done!

Collaborator

@dbczumar dbczumar left a comment

LGTM! Thanks, @abatomunkuev!

@dbczumar
Collaborator

dbczumar commented Nov 15, 2021

@abatomunkuev Almost there! Looks like test_build_docker and test_build_docker_with_env_override are still expecting 500 responses, so the tests are failing. Can you update them?

@dbczumar
Collaborator

@abatomunkuev Apologies for misinterpreting the test output. It appears that the server is still responding with 500s where 400s are now expected.

@abatomunkuev
Contributor Author

@dbczumar Should we update the error codes in pyfunc/scoring_server/__init__.py?

In some functions we are still returning MALFORMED_REQUEST.

https://github.com/abatomunkuev/mlflow/blob/18e0d2caa8ba7849f4f53ec6c08ce4eb3fe1a1f7/mlflow/pyfunc/scoring_server/__init__.py#L74-L177

I am not sure if it's a good way to solve the issue.

Alternatively, we could update the _validate_with_rest_endpoint() test case. What do you think?

@dbczumar
Collaborator


@abatomunkuev I think it's best to change other instances to BAD_REQUEST. MALFORMED_REQUEST currently produces incorrect behavior, as the correct response type to invalid inputs is a 400 error.
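
To illustrate the pattern being changed throughout the file (a sketch of the idea, not the exact diff), each remaining place that reports MALFORMED_REQUEST would switch to BAD_REQUEST:

from mlflow.exceptions import MlflowException
from mlflow.protos.databricks_pb2 import BAD_REQUEST

# Invalid client input should surface as a 400-class error rather than a 500.
raise MlflowException(
    message=(
        "Failed to parse input from JSON. Ensure that input is a valid "
        "JSON formatted string."
    ),
    error_code=BAD_REQUEST,
)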

@abatomunkuev
Contributor Author

Okay, I am going to update the code in pyfunc/scoring_server/__init__.py

@dbczumar
Collaborator


Thank you!

@dbczumar
Collaborator

@abatomunkuev Looks like there are a few more small failures in test cases that expect MALFORMED_REQUEST instead of BAD_REQUEST. Can we update the test cases?

@abatomunkuev
Contributor Author

@dbczumar Sure, I just need to find which test cases fail.

@abatomunkuev
Contributor Author

@dbczumar Thanks!

Signed-off-by: Andrei Batomunkuev <abatomunkuev@myseneca.ca>
@abatomunkuev
Contributor Author

abatomunkuev commented Nov 23, 2021

@dbczumar The tests were updated; that should resolve the issue in the flavors tests.

However, in the models tests we got 2 failures. It seems to me that we are still getting 500 INTERNAL SERVER ERROR.

[screenshot: failing model tests showing 500 INTERNAL SERVER ERROR responses]

@dbczumar
Collaborator

@abatomunkuev Perhaps there's something special about the empty string case being tested here?

Signed-off-by: Andrei Batomunkuev <abatomunkuev@myseneca.ca>
Signed-off-by: Andrei Batomunkuev <abatomunkuev@myseneca.ca>
@abatomunkuev
Contributor Author

@dbczumar Could you review the changes? I hope this time it will work.

I am wondering whether we should leave MALFORMED_REQUEST in this case:
6677ca4#diff-e607d3e6aefa036b1bfe9416fd1aa8b0934120f458cf263ddebab224f832e4a5R99

@dbczumar
Collaborator

Thanks @abatomunkuev. Your change looks good - let's get rid of MALFORMED_REQUEST in this file. Can we also use BAD_REQUEST here? https://github.com/mlflow/mlflow/pull/5003/files#diff-e607d3e6aefa036b1bfe9416fd1aa8b0934120f458cf263ddebab224f832e4a5R99

@abatomunkuev
Contributor Author

@dbczumar Sure!

Signed-off-by: Andrei Batomunkuev <abatomunkuev@myseneca.ca>
@dbczumar
Collaborator

@abatomunkuev Can we resolve conflicts with the master branch for mlflow/pyfunc/scoring_server/__init__.py?

@abatomunkuev
Contributor Author

abatomunkuev commented Nov 23, 2021

@dbczumar Yes.

[screenshot: merge conflict shown in mlflow/pyfunc/scoring_server/__init__.py]

What do I need to do to resolve this conflict?

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
@abatomunkuev
Contributor Author

abatomunkuev commented Nov 23, 2021

@dbczumar
https://github.com/abatomunkuev/mlflow/blob/0439bfb25e94d44814c6cd3af3c4a84af368ee3f/mlflow/pyfunc/scoring_server/__init__.py#L116-L136

I think this portion of code causes the error, but I don't know why, since we changed all MALFORMED_REQUEST instances to BAD_REQUEST.

Or maybe we should update the value of the data argument.
https://github.com/abatomunkuev/mlflow/blob/0439bfb25e94d44814c6cd3af3c4a84af368ee3f/tests/models/test_cli.py#L480

@dbczumar
Collaborator

@abatomunkuev Let me know if you need help investigating this.

@abatomunkuev
Contributor Author

@dbczumar I would like to run the tests locally. However, I got an error building a Docker image to serve the model.

I'm running a test: pytest tests/models/test_cli.py --large

[screenshot: Docker image build error]

@abatomunkuev
Contributor Author

I am going to update the value of the data argument.
https://github.com/abatomunkuev/mlflow/blob/0439bfb25e94d44814c6cd3af3c4a84af368ee3f/tests/models/test_cli.py#L480
Maybe something like this:

scoring_response = endpoint.invoke(data="this is,,,, not valid json", content_type=content_type)

Signed-off-by: Andrei Batomunkuev <abatomunkuev@myseneca.ca>
@dbczumar
Collaborator

@abatomunkuev Did we identify an issue with the empty string?

@abatomunkuev
Contributor Author

@dbczumar Not completely sure, but it might be the cause.

Also, I have a question: how can I test the cases that involve Docker containers and serving the model on my local machine? I am not able to do that; it gives me an error.
[screenshot: error when running Docker-based serving tests locally]

@dbczumar
Collaborator

@abatomunkuev Do you have Docker installed on your machine?

@abatomunkuev
Contributor Author

@dbczumar Yes, I do.

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
@harupy
Member

harupy commented Dec 2, 2021

@dbczumar @abatomunkuev I pushed a commit to fix the test failures: 4308be6

It looks like mlserver doesn't respect the status code of MlflowException.
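
To make that concrete (an illustrative sketch only; the actual fix is in commit 4308be6): an MlflowException carries both an error code and a JSON serialization, and the serving layer is expected to translate the error code into the HTTP status:

from mlflow.exceptions import MlflowException
from mlflow.protos.databricks_pb2 import BAD_REQUEST

try:
    raise MlflowException("Failed to parse input from JSON.", error_code=BAD_REQUEST)
except MlflowException as e:
    print(e.error_code)           # "BAD_REQUEST"
    print(e.serialize_as_json())  # the JSON error body returned to the client

mlserver apparently returned its own status code instead of deriving it from the exception's error code.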

@dbczumar
Collaborator

dbczumar commented Dec 3, 2021

Fantastic! Thank you so much for your contribution, @abatomunkuev , and thank you for identifying the cause of the failure, @harupy !
