Advanced Search #18501

Open
1 of 2 tasks
nathansandi opened this issue May 14, 2024 · 21 comments

@nathansandi
Contributor

nathansandi commented May 14, 2024

Description

The business problem that we are trying to solve is to have an easy yet expressive way for users to search/filter Camunda entities, allowing them, for example, to search using logical (AND, OR, IN) and comparison operators (gt, lt, gte, lte).

This allows our Search endpoints to build custom filters that support defining fine-grained, effective result sets. Plus, other components can reuse this if they need an API for advanced search.

Final proposal

We had three proposals listed in this document: https://docs.google.com/document/d/1k_xze5QLnKdNhF2ecwrzhawMscVi1-B-irE1GgwwUnM/edit. This issue describes the final proposal, which is the outcome of the user interviews.

Main points extracted from the interviews:

  • Users think Proposal 3 is easiest to understand, also considering it is similar to C7.
  • Users would like to have AND operations inside ORs. This comes close to the concept of a full REST query language (Proposal 4 in the Google document linked above, which was not approved). It would considerably increase implementation complexity and maintenance effort. Also, most cases can be covered by combining $in and AND queries with ORs.

More details on the interview feedback:
https://docs.google.com/document/d/1ZLlwESNSkZtzUcRFuJVXjbO162xIIhnW2RF0Egi8-GU/edit?usp=sharing

The goal of this issue is to follow up on Proposal 3, with concise operator definitions that read similarly to C7 and allow arbitrary filter options.

Example:

{
  "filter": {
    "assignee": {"$eq": "demo"}, // explicit equality operator
    "candidateGroups": { "$in": ["groupA", "groupB"] },
    "taskVariables.orderVolume": {"$eq": 10000}, // path access, allowed for a defined set of objects like variables
    "$or": [
       {"priority": {"$eq": "high"}},
       {"dueDate": { "$lt": "<date>" }}
    ]
  }
}

Task Variable Search

The taskVariables notation provides a method to search and filter tasks based on their associated variables, enabling a more granular and precise query. Variables in this context refer to custom data points or attributes assigned to tasks within the Camunda system. By using the taskVariables.<variableName> format, users can specify which variable they want to filter on.

Example

{
  "taskVariables.orderVolume": {"$eq": 10000}
}
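A minimal sketch of how the path notation could be resolved, assuming a task is held as a plain dictionary with a `taskVariables` mapping. The helper `resolve_field`, the `PATH_PREFIXES` set, and the task shape are hypothetical and only illustrate the idea; they are not part of the proposal.

```python
from typing import Any

# Hypothetical: only these prefixes allow path access, mirroring the
# "defined set of objects like variables" mentioned in the example above.
PATH_PREFIXES = {"taskVariables"}


def resolve_field(task: dict[str, Any], field: str) -> Any:
    """Return the value addressed by a filter key, or None if it is absent."""
    prefix, sep, name = field.partition(".")
    if sep and prefix in PATH_PREFIXES:
        # Path access: look the variable up inside the nested mapping.
        return task.get(prefix, {}).get(name)
    # Plain field: look it up directly on the task.
    return task.get(field)


task = {"assignee": "demo", "taskVariables": {"orderVolume": 10000}}
print(resolve_field(task, "taskVariables.orderVolume"))
print(resolve_field(task, "assignee"))
```

Restricting path access to an allow-list keeps arbitrary dotted keys from reaching into other nested objects.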

Conditional Operations

Equality Check

| Operator | Syntax | Description |
| --- | --- | --- |
| $eq | field: { "$eq": value } | Filters tasks where field is equal to value. |
| $neq | field: { "$neq": value } | Filters tasks where field is not equal to value. |

Comparison Operations

The comparison operations will support the following data types:

  • Date
  • Int/Long/Decimal

| Operator | Syntax | Description |
| --- | --- | --- |
| $gte | field: { "$gte": value } | Filters tasks where field is greater than or equal to value. |
| $gt | field: { "$gt": value } | Filters tasks where field is greater than value. |
| $lte | field: { "$lte": value } | Filters tasks where field is less than or equal to value. |
| $lt | field: { "$lt": value } | Filters tasks where field is less than value. |
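As a rough illustration of the equality and comparison operators, the sketch below maps each operator name to a Python comparison function. The `OPERATORS` table and the `matches` helper are hypothetical names, and allowing several operators on one field (for a range check) is an assumption that the proposal does not state.

```python
import operator
from datetime import date
from typing import Any

# Operator names as listed in the proposal, mapped to Python comparisons.
OPERATORS = {
    "$eq": operator.eq,
    "$neq": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(value: Any, condition: dict[str, Any]) -> bool:
    """True if value satisfies every operator given in condition."""
    return all(OPERATORS[op](value, operand) for op, operand in condition.items())


# Numeric range check (assumes multiple operators per field are allowed).
print(matches(10000, {"$gte": 5000, "$lt": 20000}))
# Date comparison, one of the supported data types.
print(matches(date(2024, 5, 14), {"$lt": date(2024, 6, 1)}))
```

An unknown operator raises `KeyError` here; a real endpoint would more likely reject the request with a validation error.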

Logical Operations

All filter options in the root element, outside of $or elements, are connected as AND.

| Operator | Syntax | Description |
| --- | --- | --- |
| $or | "$or": [ {condition1}, {condition2} ] | Filters tasks where at least one of the conditions is true. |
| and (implicit) | field: value | All conditions outside of $or are combined with AND. |
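The combination of implicit AND at the root with a single `$or` list can be sketched as follows. This is an illustration only: the function names are hypothetical, only `$eq` is evaluated to keep it short, and the rule that each `$or` alternative holds exactly one field condition is an assumption drawn from the decision not to support AND inside OR.

```python
from typing import Any


def field_matches(task: dict[str, Any], field: str, condition: dict[str, Any]) -> bool:
    # Only $eq is handled here to keep the sketch short.
    return task.get(field) == condition["$eq"]


def alternative_matches(task: dict[str, Any], alternative: dict[str, Any]) -> bool:
    # Assumption: one plain field condition per alternative, no nested operators.
    if len(alternative) != 1 or next(iter(alternative)).startswith("$"):
        raise ValueError("each $or alternative must hold exactly one field condition")
    field, condition = next(iter(alternative.items()))
    return field_matches(task, field, condition)


def matches_filter(task: dict[str, Any], flt: dict[str, Any]) -> bool:
    for key, condition in flt.items():
        if key == "$or":
            # At least one alternative must match.
            if not any(alternative_matches(task, alt) for alt in condition):
                return False
        elif not field_matches(task, key, condition):
            # Root-level conditions are connected as AND.
            return False
    return True


flt = {
    "assignee": {"$eq": "demo"},
    "$or": [{"priority": {"$eq": "high"}}, {"dueDate": {"$eq": "2024-06-01"}}],
}
task = {"assignee": "demo", "priority": "low", "dueDate": "2024-06-01"}
print(matches_filter(task, flt))
```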

IN

The $in operator will support only string values.

| Syntax | Description |
| --- | --- |
| field: { "$in": [value1, value2] } | Filters tasks where field is one of the values in the provided array. |
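A small sketch of the string-only restriction on `$in`, assuming the operand arrives as an already parsed JSON array. The helper name `matches_in` is hypothetical, and matching against array-valued fields such as `candidateGroups` is left out because the proposal does not define it.

```python
from typing import Any


def matches_in(value: Any, candidates: Any) -> bool:
    """True if value is one of the candidates; rejects non-string operands."""
    if not isinstance(candidates, list) or not all(isinstance(c, str) for c in candidates):
        raise ValueError("$in expects an array of strings")
    return value in candidates


print(matches_in("groupA", ["groupA", "groupB"]))
print(matches_in("groupC", ["groupA", "groupB"]))
```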

Acknowledgment:

@nathansandi nathansandi added component/c8-api All things unified C8 API, e.g. C8 REST kind/proposal Change proposal labels May 14, 2024
@tmetzke
Member

tmetzke commented May 15, 2024

Thanks, @nathansandi, for this proposal 👍
If @aleksander-dytko considers this a worthwhile addition, I'd ask you to turn this into a fully-fledged technical proposal, adding the following:

  • The set of supported operators, e.g. $lt, $lte, $gt, $gte, $in, $or.
  • The values each operator accepts, e.g.
    • $lt accepts all primitive types or just numeric values?
    • $in accepts an array of primitive types, but can element types be mixed, or must all elements be of the same type?
    • $or accepts an array of filter objects, but can $and be nested inside? If so, how deep can the nesting go?

You can have a look at https://restdb.io/docs/querying-with-the-api as an example. If you're unsure about any of those things, we can use this proposal space to discuss it or use any other document.

Please also clarify where you'd like such discussions to happen, here in the issue or in the Google Doc.
This helps all interested parties to stay up-to-date with one source of truth.

Thank you!

p.s.: I adjusted the proposal description to separate background information from the final proposal for the API. You can also use a slide deck to describe the operators if you like, for example.

@nathansandi
Contributor Author

Thank you :) I am definitely going to work on this ASAP, @tmetzke, in as much detail as possible.

@nathansandi
Contributor Author

@tmetzke I have added the information. From now on, this issue focuses only on the chosen proposal, to keep the focus on refining Proposal 3 from the document. That is why I created the issue here.

So I would like to focus the discussion here, since the document compared several proposals, and here I am presenting the final one.

@tmetzke
Member

tmetzke commented May 17, 2024

Thanks, @nathansandi, reads a lot better now and is clearer, nice job 👍
Good job with the operators, I like the overview you created ⭐

I noticed the following things:

  • We only have two groups of operators: logical and conditional operators. Let's focus on those, this makes it easier to follow.
    • Logical operators are $in, $or, and and (only implicit, we could introduce $and as well).
    • The remaining ones are conditional operators, i.e. $lt, $lte, $gt, $gte, and equality (only implicit using <field>: <value>, we could introduce $eq as well).
  • You have not described the path access taskVariables.orderVolume in any way but the example uses this.
  • If we don't want to support nesting $and in $or yet, only mention it here and create a follow-up proposal. Feel free to link it from here if you like.

Again, really good direction already 👍

@nathansandi
Contributor Author

Thanks, @nathansandi, reads a lot better now and is clearer, nice job 👍 Good job with the operators, I like the overview you created ⭐

I noticed the following things:

  • We only have two groups of operators: logical and conditional operators. Let's focus on those, this makes it easier to follow.

    • Logical operators are $in, $or, and and (only implicit, we could introduce $and as well).
    • The remaining ones are conditional operators, i.e. $lt, $lte, $gt, $gte, and equality (only implicit using <field>: <value>, we could introduce $eq as well).
  • You have not described the path access taskVariables.orderVolume in any way but the example uses this.

  • If we don't want to support nesting $and in $or yet, only mention it here and create a follow-up proposal. Feel free to link it from here if you like.

Again, really good direction already 👍

I agree with the suggestions and have updated the description. I prefer not to add $eq to make equality explicit, because many users are familiar with the traditional way of querying, and an explicit equality operator would force them to adapt to a different way of setting up the API. What do you think?

@tmetzke
Member

tmetzke commented May 21, 2024

Thanks for updating, @nathansandi 👍

I prefer not to add $eq to make equality explicit, because many users are familiar with the traditional way of querying, and an explicit equality operator would force them to adapt to a different way of setting up the API.

The explicit operator $eq as another option (allowing both explicit and implicit EQ) could allow for easier query generation, e.g. from frontend filter definitions. It's always the same schema then: <field>: { <operator>: <value> }.
But no strong opinion here, let's see what the peer review phase will bring 🙂
I'm fine with just having the implicit version you currently documented 👍

The other definitions look good to me so far 👍
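To illustrate the point about a uniform `<field>: { <operator>: <value> }` schema, here is a hedged sketch of normalizing implicit equality into explicit `$eq`. The function `normalize` is hypothetical, and it assumes that any object-valued field is already an operator object, so equality against object values is not covered.

```python
from typing import Any


def normalize(flt: dict[str, Any]) -> dict[str, Any]:
    """Rewrite implicit equality (<field>: <value>) as <field>: {"$eq": <value>}."""
    normalized: dict[str, Any] = {}
    for key, value in flt.items():
        if key == "$or":
            # Normalize each alternative in the $or list.
            normalized[key] = [normalize(alt) for alt in value]
        elif isinstance(value, dict):
            # Assumed to be an operator object already.
            normalized[key] = value
        else:
            normalized[key] = {"$eq": value}
    return normalized


print(normalize({"assignee": "demo", "$or": [{"priority": "high"}]}))
```

With such a step in front, both notations could be accepted while the rest of the backend only sees the explicit form.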

@steff46

steff46 commented May 22, 2024

@nathansandi As we briefly discussed on zoom last week, I am wondering about the impact of this proposal. Is this only meant to apply to the task search endpoint in our new API or are we aiming at having this for all other search endpoints as well? I believe it was meant for the task endpoint only, but would love to clear that up to be sure it's not supposed to be part of every search endpoint.

In the back of my head I'm thinking about the identity endpoints we would like to make compliant with the POST /v2/<resource>/search style, but it would require lots of work (if reasonably possible at all) to implement these logical operators.

@tmetzke
Member

tmetzke commented May 22, 2024

@steff46, this is a general proposal for all C8 REST endpoints. If accepted and documented in the guidelines, this will be the standard for all search endpoints offering advanced search.

However, how we execute this new guideline is a different topic. We can iteratively roll this out to specific endpoints, for example. Having the guideline does not imply all search endpoints must implement this immediately. In the long run, though, all search endpoints should support this unless it doesn't make sense for them.

@tmetzke
Member

tmetzke commented May 22, 2024

The exact execution also is a product decision, driven by @aleksander-dytko as the general DRI for the C8 REST API. Having the guideline in 8.6, for example, does not imply implementation in 8.6 as well.

@steff46

steff46 commented May 22, 2024

Thanks @tmetzke ... I thought we were aiming at having the API in place in 8.6 fully compliant with ... or better fully implementing the guideline. Sorry for the misunderstanding.

@tmetzke
Member

tmetzke commented May 22, 2024

All good, that is the goal for the current state of the guidelines. We need to make clear what needs to be considered for the endpoints for 8.6 if we update the guidelines in between 👍

@nathansandi
Contributor Author

nathansandi commented May 22, 2024

@tmetzke Just an additional point: I was having a chat with @steff46, and maybe we should also take into consideration whether all search endpoints really require advanced search mechanisms, depending on the use case. We know users ask a lot for this in Task Search (from the Tasklist perspective), but maybe not for the Users endpoint. So if the proposal advances, we do not need to change all search endpoints at once but can do it incrementally (at least from my perspective), giving priority to the Task API, since there is a high number of requests from the community for these advanced capabilities for tasks. But I agree this decision is up to the DRI. cc: @aleksander-dytko

@tmetzke
Member

tmetzke commented May 29, 2024

We always strive to build things incrementally. I consider this a base value in the engineering department. Thus, adopting this with priority in one endpoint first (e.g. task search) makes sense.
However, we should strive to build advanced search generically in the backend so other search endpoints can easily adopt and support it. We'll need to consider this when we start implementing it.

@marcosgvieira
Contributor

marcosgvieira commented May 29, 2024

Good work, @nathansandi. I see Advanced Queries potentially taking advantage of the streamlining project, with a "helper" that makes the implementation easier (based on configuration, class abstraction, etc.), so that adoption after the initial implementation should be smooth for any other method that needs it. We already talked a lot about this topic, so from my side, I'm OK with the output :)

@aleksander-dytko

Hi @nathansandi, I see a huge value in providing the Advanced Search Query capabilities in our C8 API. Great job on the proposal and the whole research project! ⭐

I just have a few minor questions/suggestions:

  1. I would propose to make it more clear in the description that this is a proposal for C8 API, not only Tasklist endpoints. We can use Tasklist as the example in the business problem space.

  2. This one is not clear to me - shouldn't it be "taskVariables.variableName": {"$operator": variableValue} ?

By using the taskVariables.<variableName> format, users can specify which variable they want to filter on.

  3. Operators: What about neq (not equal to) operator? I believe it would be valuable to have this to just complete a list here. I agree with @tmetzke who wrote:

The explicit operator $eq as another option (allowing both explicit and implicit EQ) could allow for easier query generation.

  4. I strongly believe we should make Advanced Query a standard when implementing C8 API. I see a huge value added for (current) Tasklist and Operate endpoints - specifically Tasks, Variables, Instances (Process, Decision, Flownode) and Incidents. I would propose to implement this by default for any new endpoint, but have an option if it doesn't provide value / the effort is too big compared with value, etc. If we decide not to implement this for a specific endpoint, the decision should be documented in the epic to make it explicit.

@nathansandi
Contributor Author

Thanks for the review and comments, @aleksander-dytko. Let's go through the questions.

  1. I would propose to make it more clear in the description that this is a proposal for C8 API, not only Tasklist endpoints. We can use Tasklist as the example in the business problem space.

We can follow this as a standard. My concern here is whether, besides Tasklist, there is a use case where users need advanced queries, for processes for example. Again, I am not against it; I am just considering the implementation complexity for endpoints that may not really need a mechanism as complex as the one for tasks. Apart from that, I am going to update the description to make it clearer. I think this also addresses the point raised in question (4).

  2. This one is not clear to me - shouldn't it be "taskVariables.variableName": {"$operator": variableValue} ?

I haven't added more operators for variables because we have a limitation on the data types. In the current state, variables are saved as strings only. To make this work, we would need to introduce data types for variables. We can still infer them using JSON stringify, but this may lead to imprecision. I would suggest introducing complex operators for variables in a different iteration; at the start, we can support only equals and not equals.

  3. Operators: What about neq (not equal to) operator? I believe it would be valuable to have this to just complete a list here. I agree with @tmetzke who wrote:

I have no strong preference on that. It is more about keeping users close to what they already have today. But with this solution we are heading towards a complexity similar to a REST API query language. I will update the proposal to include equals and not equals. cc: @tmetzke
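On the point about variables being saved as strings only, a short sketch can show why equality is workable while ordering is not. It assumes, hypothetically, that each stored variable value is a JSON-serialized string; `variable_equals` is an illustrative helper, not an existing API.

```python
import json
from typing import Any


def variable_equals(stored: str, value: Any) -> bool:
    """Compare a stored (stringified) variable with a filter value."""
    return stored == json.dumps(value)


print(variable_equals("10000", 10000))
# Imprecision: the float serializes as "10000.0", so it no longer matches.
print(variable_equals("10000", 10000.0))
print(variable_equals('"demo"', "demo"))
# Ordering on the raw strings is lexicographic, not numeric.
print("9" < "10000")
```

This is one way to read the suggestion to start with equals and not equals only and to defer the comparison operators until variables carry data types.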

@aleksander-dytko

Thanks for considering the comments @nathansandi !

We can follow this as a standard. My concern here is whether, besides Tasklist, there is a use case where users need advanced queries, for processes for example. Again, I am not against it; I am just considering the implementation complexity for endpoints that may not really need a mechanism as complex as the one for tasks. Apart from that, I am going to update the description to make it clearer. I think this also addresses the point raised in question (4).

That's a good point. When thinking about this one again, we should explicitly add this to the API guidelines and use it when the endpoint requires advanced search. So, by default, we don't implement advanced search but rather when there is a specific need for it. With 8.6, we can provide this for specified endpoints (like Tasks, Variables, process instances) and then later act on the feedback.

We would need to mention this rule in our docs to create clear expectations and guarantees for customers.

I haven't added more operators for variables because we have a limitation on the data types. In the current state, variables are saved as strings only. To make this work, we would need to introduce data types for variables. We can still infer them using JSON stringify, but this may lead to imprecision. I would suggest introducing complex operators for variables in a different iteration; at the start, we can support only equals and not equals.

My concern was more about the taskVariables.<variableName> itself, but I see it's changed now, so everything is clear. Thanks!

Next steps:

I would ask @romansmirnov for the last round of review to double-check if we haven't missed anything important. After that is done, @nathansandi could you contribute this to the guideline document?

@dlavrenuek
Contributor

dlavrenuek commented May 30, 2024

Thank you for the proposal. From my point of view, the proposal is missing filtering string values by like or includes. An example of why we need it is the search in our applications' UI.

@tmetzke
Member

tmetzke commented May 30, 2024

Good ideas, @dlavrenuek 👍
Would like and includes be synonyms here or refer to two separate additional filter options?

Regarding, like, do you have a syntax proposal in mind? Things that come to my mind here:

  • name: { "$like": "%na" } <-- SQL-like syntax
  • name: { "$like": ".*na" } <-- Regex syntax

@dlavrenuek
Contributor

What WE need is the ability to run a case-insensitive "includes" search, which can be achieved with { "$like": "na" } while we define the database query as like "%na%". Our users/customers will definitely want to use placeholders, so allowing a placeholder, as you suggest, makes sense. Although Postgres seems to support regex, I would be very careful with user-provided regex. It opens the possibility for users to create very inefficient queries by using inefficient regex, and we should prevent inefficiency by design. Therefore I suggest allowing a placeholder, which can be the SQL-like % or just an implementation-agnostic wildcard like * (this is my preference).

So my suggestion is to allow the following syntax with * as a placeholder:

name: { "$like": "*na" }

This should also not block us from adding regex support in the future if we want to. This could be done with an additional query, name: { "$regex": ".*(super)+" }, which explicitly shows that it is a regex comparison.
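A minimal sketch of how a `*` placeholder could be translated into a SQL LIKE pattern, assuming the backend issues `LIKE ... ESCAPE '\'`. The function `like_to_sql` is hypothetical; it neither wraps the pattern in `%` automatically nor handles case-insensitivity, both of which would be separate decisions.

```python
def like_to_sql(pattern: str, escape: str = "\\") -> str:
    """Translate a `*`-wildcard pattern into a SQL LIKE pattern."""
    out = []
    for ch in pattern:
        if ch == "*":
            # The API placeholder becomes the SQL multi-character wildcard.
            out.append("%")
        elif ch in ("%", "_", escape):
            # Escape characters that LIKE would otherwise treat specially.
            out.append(escape + ch)
        else:
            out.append(ch)
    return "".join(out)


print(like_to_sql("*na"))
print(like_to_sql("50%_*"))
```

Escaping `%` and `_` keeps user input from introducing wildcards that the API did not intend to expose.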

@tmetzke
Member

tmetzke commented May 30, 2024

What WE need is the ability to run a case-insensitive "includes" search, which can be achieved with { "$like": "na" } while we define the database query as like "%na%". Our users/customers will definitely want to use placeholders, so allowing a placeholder, as you suggest, makes sense. Although Postgres seems to support regex, I would be very careful with user-provided regex. It opens the possibility for users to create very inefficient queries by using inefficient regex, and we should prevent inefficiency by design. Therefore I suggest allowing a placeholder, which can be the SQL-like % or just an implementation-agnostic wildcard like * (this is my preference).

So my suggestion is to allow the following syntax with * as a placeholder:

name: { "$like": "*na" }

This should also not block us from adding regex support in the future if we want to. This could be done with an additional query, name: { "$regex": ".*(super)+" }, which explicitly shows that it is a regex comparison.

Makes loads of sense, I like the * placeholder suggestion for a new operation $like 👍
I was also thinking that regex support might be a bit too error-prone/calling for inefficient user queries.

Great idea also to add a separate operation $regex in the future if there is user demand for it 👍
