Have you looked at @withsmilo's post in #1260? I think the idea is to provide a Router concept in BentoML that lets users compose results from multiple bento services. It is similar to your 2nd option, but we will provide some APIs to make it easier to build. Another very common approach is to implement a simple "backend-for-frontend" layer: essentially a standalone API server that calls X, Y and Z under the hood. GraphQL is one of the popular ways to do this these days. If your team already has something like that, you can try to reuse it, assuming the number of models and the overall graph do not change frequently.
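For illustration, the fan-out part of such a "backend-for-frontend" layer could look roughly like the sketch below. It uses only the Python standard library and is not a BentoML API. The `SERVICE_X_URL`-style environment variable names, the default ports, the `/predict` path and the function names are all assumptions made up for this example; the point is that the locations of X, Y and Z are resolved at run time instead of being hard-coded.

```python
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical defaults; real hosts, ports and paths depend on your deployment.
DEFAULT_URLS = {
    "X": "http://localhost:5000/predict",
    "Y": "http://localhost:5001/predict",
    "Z": "http://localhost:5002/predict",
}


def load_endpoints(env=os.environ):
    """Read each URL from SERVICE_<NAME>_URL (assumed naming), else use the default."""
    return {
        name: env.get(f"SERVICE_{name}_URL", url)
        for name, url in DEFAULT_URLS.items()
    }


def post_json(url, payload, timeout=10.0):
    """POST a JSON payload to one service and decode its JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def predict_all(payload, endpoints, call=post_json):
    """Send the same payload to every service concurrently; merge results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        futures = {
            name: pool.submit(call, url, payload)
            for name, url in endpoints.items()
        }
        # result() re-raises any exception from the corresponding call.
        return {name: future.result() for name, future in futures.items()}
```

Calling `predict_all(payload, load_endpoints())` returns one dictionary keyed by service name. The `call` parameter exists so the transport can be swapped out, for example for a stub in tests or for a client with retries.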
Suppose I have services X, Y and Z. They each operate on the same type of input (e.g. a DataFrame or an Image) and they each produce the same type of output (e.g. JSON).
Now, suppose that these services are often (but not always) called together - i.e. most times, we want the results of X, Y and Z together. Of course, I could simply ask clients of the service to call all three separately and be done with it. However, if I wanted to provide a service that did this for them, what would be the best way to do this?
The options I've thought of are:
1. A new bento service XYZ that basically just reproduces X, Y and Z in one service. This is obviously a bad idea for a lot of reasons, such as code duplication and resource duplication (i.e. the same model running in many places). It has the advantage of being simple to implement and of reducing duplicated preprocessing (e.g. if we preprocess the image in each case, then we can do this just once).
2. A bento service XYZ that just calls X, Y and Z. In this case, I would essentially be implementing the same code the client would: take the input, send it to each service and combine the outputs to send back. The main downside is that its configuration requires some hard-coding of where it expects X, Y and Z to be available, e.g. there is no way to specify at run time the ports I expect to see X, Y and Z on. I'm also not sure how things like batching would be handled.
3. Create the XYZ service and within it create endpoints for X, Y and Z separately (e.g. predict_X, predict_Y). This would eliminate code duplication and potentially resource duplication (if two endpoints use the same artifact, I'm hoping it is only loaded once). The main downside would be coupling X, Y and Z.
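To make the last option more concrete, here is a rough sketch in plain Python of what I have in mind. It deliberately avoids any BentoML API: `CombinedService`, `predict_one` and `predict_many` are hypothetical names, and the models are assumed to be plain callables. It only shows the sharing I am hoping for, with each model held once and preprocessing run once per combined request.

```python
class CombinedService:
    """Hypothetical stand-in for the XYZ service; not a BentoML class."""

    def __init__(self, preprocess, models):
        self._preprocess = preprocess
        # Each model object is stored once and shared by every endpoint.
        self._models = dict(models)

    def predict_one(self, name, raw):
        """Per-model endpoint, e.g. what predict_X or predict_Y would wrap."""
        return self._models[name](self._preprocess(raw))

    def predict_many(self, raw, names=None):
        """Combined endpoint: preprocess once, then run the selected models."""
        features = self._preprocess(raw)
        selected = list(self._models) if names is None else list(names)
        return {name: self._models[name](features) for name in selected}
```

Passing `names` covers the case where only some of X, Y and Z are wanted; leaving it out runs all of them on the same preprocessed input.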
Any thoughts/suggestions on best practice for such a problem?