Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Subqueries and Quantifiers in Comparison, and Exists #3465

Merged
merged 1 commit into from
Nov 2, 2020

Conversation

GabrielAlacchi
Copy link
Contributor

@GabrielAlacchi GabrielAlacchi commented Nov 2, 2020

This is my first contribution to the elixir ecosystem as a whole, and it likely won't be my last!

This work builds on #3264 which introduced the expr IN (subquery) syntax to Ecto.Query. This laid out the groundwork for also supporting comparison operators with subqueries, including ANY and ALL quantifiers. I will also submit a PR to ecto_sql to implement the SQL in the adapters if this is merged.

Postgres, MySQL, and most others (I believe) support these operations.

My Proposed DSL Syntax

Currently we can perform an IN subquery operation as follows

from(p in Post, where: p.id in subquery(some_query_expression))

I simply implemented comparison with quantifiers as

from(p in Post, where: p.rating >= all(some_query_expression))
from(p in Post, where: p.rating >= any(some_query_expression))

and of course >= can be substituted for any comparison operator! You can also re-use the same subquery syntax if you are expecting a single row to be returned from the subquery

avg_subquery = from(p in Post, select: avg(p.rating))
from(p in Post, where: p.rating >= subquery(avg_subquery))

Implementation

I implemented this by including a :quantifier field in the Ecto.SubQuery struct. This signals to the adapter that it should include this quantifier when converting the subquery to SQL, and if the underlying database doesn't support quantifiers then it can raise an exception.

@GabrielAlacchi GabrielAlacchi changed the title Add Support for Subqueries and Quantifier in Comparison Add Support for Subqueries and Quantifiers in Comparison Nov 2, 2020
@josevalim
Copy link
Member

Hi @GabrielAlacchi! Welcome and thanks for the PR!

I think the PR is moving in the correct direction but I don't think we need to make the quantifier part of the subquery. Instead, you want to do something similar to :in, where the quantifier is kept in the AST:

  def escape({quantifier, _, [subquery]} = expr, type, params_acc, vars, env) when quantifier in [:all, :any] do
    {subquery, params_acc} = escape_subquery(right, type, params_acc, vars, env)
    {{:{}, [], [quantifier, [], [subquery]]}, params_acc}
  end

This should be enough (or close enough) to make it work! Plus you don't have to worry about validating the left in any(...) syntax because the database will complain anyway. If you want to validate it, you can validate it in the builder, while we are building the query, but IMO you can skip it.

Once you do this, we just need docs on Ecto.Query.API and the second test you added to filter_test.exs :)

@GabrielAlacchi GabrielAlacchi force-pushed the quantifier branch 2 times, most recently from 7053284 to a5b8c0a Compare November 2, 2020 17:57
@GabrielAlacchi GabrielAlacchi changed the title Add Support for Subqueries and Quantifiers in Comparison Add Support for Subqueries and Quantifiers in Comparison, and Exists Nov 2, 2020
@GabrielAlacchi GabrielAlacchi force-pushed the quantifier branch 3 times, most recently from 59516d2 to fd1fbfb Compare November 2, 2020 18:28
@GabrielAlacchi
Copy link
Contributor Author

Hey @josevalim thanks for the follow up. It definitely comes down to the philosophy of whether you'd like Ecto to enforce correct queries at compile time or just let the database complain if any of these statements are used in the wrong place. I was calling escape_quantifier_or_subquery from the escape clause in builder for comparison operators. This enforced correct usage of any and all at compile time, but if this is not a concern simply treated them as functions and putting them directly into the AST actually avoids us needing to change the adapters. This also made it trivial to also support exists which means this also might be enough to close #1479.

I still needed to modify prewalk in planner since the adapters don't seem to handle {:subquery, index} in the AST, but rather expect an Ecto.SubQuery, so I made sure that these also get replaced when prewalk reaches an all, any or exists node. I also took the chance to validate that there's only 1 field in the query at that point, to be consistent with the implementation of left in subquery.

@@ -281,7 +281,7 @@ defmodule Ecto.Query.Builder do
rtype = quoted_type(left, vars)

{left, params_acc} = escape(left, ltype, params_acc, vars, env)
{right, params_acc} = escape(right, rtype, params_acc, vars, env)
{right, params_acc} = escape_subquery(right, rtype, params_acc, vars, env)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this change. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to support the case where we don't want to use a quantifier. For example:

avg_rating = from(p in Post, select: avg(p.rating))
from(p in Post, where: p.rating > subquery(avg_rating))

Without this we get

** (Ecto.Query.CompileError) `subquery(query)` is not a valid query expression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you also want to support row constructors. I see. :) I would skip it for now since we don't support the ROW operation built-in. We can always add it later in the future though if someone has a proper use case. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're referring too, yes row constructors are related to this work. I didn't know you could do this type of comparison with multiple fields using row constructors https://dev.mysql.com/doc/refman/8.0/en/row-subqueries.html

For now I'm just dealing with the scalar case where the subquery must return a single column and be compared to a scalar expression. I use this a lot for use cases like finding "above average" rows, rows that are in certain percentiles and stuff like that.

By the way, I also tested all these queries locally with an old PHX project and they generate valid SQL without any adapter changes =).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't want to work on supporting ROW constructors yet, I've never seen them used constructively in this context so it's best to just use a fragment for that kind of thing, but the scalar comparison is not too difficult to support with the groundwork that's already been laid out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If those queries have to return a single column and a single row, then it is equivalent foo >= all(sub) anyway, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I guess databases support both syntaxes for backwards compatibility, we don't necessarily need to. Since it's one less thing to maintain and worry about I will drop the >= subquery syntax.

@@ -1105,16 +1105,47 @@ defmodule Ecto.Query.Planner do
{{:in, in_meta, [left, right]}, acc}
end

defp prewalk({:in, in_meta, [left, {:subquery, i}]}, kind, query, expr, acc, adapter) do
# Case when a subquery is at the Right Hand Side of either an in or quantifier based expression
defp prewalk({in_or_comp, op_meta, [left, {:subquery, i}]}, kind, query, expr, acc, adapter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can reverse these changes too to what they were! (i.e. check only for :in)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also letting comparison operators go into this block for the same reasons as above. It's not necessarily to use a quantifier when comparing with a subquery. That's why I'm handling the case where a subquery without a quantifier is at the RHS of an "in" or comparison operator the same way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the above. I missed you wanted to support row constructors too but I would like to skip the row constructors one for now. :)

Comment on lines 1287 to 1293
case where.expr do
{:>=, _, [_, {:all, _, [%Ecto.SubQuery{}] }]} ->
:ok

_ ->
raise "planner did not replace {:subquery, index} with an Ecto.SubQuery struct in quantified comparison"
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case where.expr do
{:>=, _, [_, {:all, _, [%Ecto.SubQuery{}] }]} ->
:ok
_ ->
raise "planner did not replace {:subquery, index} with an Ecto.SubQuery struct in quantified comparison"
end
assert {:>=, _, [_, {:all, _, [%Ecto.SubQuery{}] }]} = where.expr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do similar change below. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know about this syntax, thanks!

end

assert_raise Ecto.QueryError, fn ->
from(p in Post, where: p.id > subquery(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from(p in Post, where: p.id > subquery(s))
from(p in Post, where: p.id > any(s))

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is valid SQL assuming the subquery returns a single column and a single row. If the database finds that the subquery returns more than 1 row it'll throw a cardinality exception. I'm simply testing that this syntax correctly throws a QueryError in the case that there's >1 field in the select.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, in this case I would rather have folks using all/any with proper limit, rather than support a syntax designed for row constructs which we don't properly handle. :D

@GabrielAlacchi
Copy link
Contributor Author

GabrielAlacchi commented Nov 2, 2020

@josevalim Ok I dropped the row constructor syntax. I will add documentation next and squash the branch. Should the docs for any, all and exists be added to Ecto.Query.API?

@josevalim
Copy link
Member

I will add documentation next and squash the branch. Should the docs for any, all and exists be added to Ecto.Query.API?

Yes, and exists too!

@GabrielAlacchi
Copy link
Contributor Author

Added some docs and squashed.

@GabrielAlacchi GabrielAlacchi force-pushed the quantifier branch 3 times, most recently from 038d27c to 74608a0 Compare November 2, 2020 20:46
@josevalim
Copy link
Member

Awesome job as your first contribution @GabrielAlacchi, thank you! ❤️

@josevalim josevalim merged commit 49fd4d5 into elixir-ecto:master Nov 2, 2020
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@GabrielAlacchi
Copy link
Contributor Author

Obrigado @josevalim! You went to USP right? My GF is from Brazil and studied CS there =)

@josevalim
Copy link
Member

Yes, I am from Poli-USP! Please send her regards. :)

@GabrielAlacchi
Copy link
Contributor Author

GabrielAlacchi commented Nov 2, 2020

She sends regards as well and respectfully says #racacaaso #chupapoli Whatever that means?

@augnustin
Copy link

augnustin commented Apr 26, 2021

@GabrielAlacchi Thanks for the PR.

I'm trying to use the exists method, without success. :(

I would love to write:

from(m in Model, where: exists(from a in assoc(m, :assoc))

but I can't, can I?

I also tried:

from(m in Model, where: exists(from a in Assoc, where: a.model_id == parent_as(:assoc).id))

but I get:

** (Ecto.SubQueryError) the following exception happened when compiling a subquery.

    ** (Ecto.QueryError) could not find named binding `parent_as(:assoc)` in query:
    
    from u0 in MyApp.Assoc,
      where: u0.model_id == parent_as(:assoc).id,
      select: u0
    

The subquery originated from the following query:

from d0 in MyApp.Model,
  where: exists(subquery(from u0 in MyApp.Assoc,
  where: u0.model_id == parent_as(:assoc).id,
  select: u0)),
  select: d0

    (elixir) lib/enum.ex:1940: Enum."-reduce/3-lists^foldl/2-0-"/3
    (elixir) lib/enum.ex:1940: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ecto) lib/ecto/repo/queryable.ex:208: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
    (stdlib) erl_eval.erl:680: :erl_eval.do_apply/6
    (iex) lib/iex/evaluator.ex:257: IEx.Evaluator.handle_eval/5
    (iex) lib/iex/evaluator.ex:237: IEx.Evaluator.do_eval/3
    (iex) lib/iex/evaluator.ex:215: IEx.Evaluator.eval/3
    (iex) lib/iex/evaluator.ex:103: IEx.Evaluator.loop/1

Can I get more hints on this?

Cheers

@josevalim
Copy link
Member

Where have you defined the named binding? See more info here: https://hexdocs.pm/ecto/Ecto.Query.html#module-named-bindings

@augnustin
Copy link

Thanks @josevalim for your reactivity!

Indeed it does work with:

from(m in Model, as: :model, where: exists(from a in Assoc, where: a.model_id == parent_as(:model).id))

Any chance to make it work with Ecto.assoc?

Otherwise I need to fetch association module, foreign and related key, which doesn't feel DRY at all...

Cheers

@augnustin
Copy link

augnustin commented Apr 26, 2021

Besides, I'm trying to pass the :model atom as a parameter in a function and I get:

  def get_assoc_data(query, assoc) do
    case query_to_module(query).__schema__(:association, assoc) do
      %{
        queryable: queryable,
        related_key: related_key,
        owner_key: owner_key
      } -> {queryable, owner_key, related_key}
    end
  end

  def build_exists_subquery(query, assoc, as) do
    {assoc_struct, owner_key, foreign_key} = get_assoc_data(query, assoc)
    from(a in assoc_struct, where: field(a, ^foreign_key) == field(parent_as(as), ^owner_key))
  end


# usage:

    from(
      m in query,
      as: :query,
      where: not exists(build_exists_subquery(Model, :assoc, :query))
    )
== Compilation error in file lib/utils.ex ==
** (Protocol.UndefinedError) protocol String.Chars not implemented for {:^, [line: 24], [{:owner_key, [line: 24], nil}]}
    (elixir) lib/string/chars.ex:3: String.Chars.impl_for!/1
    (elixir) lib/string/chars.ex:22: String.Chars.to_string/1
    lib/ecto/query/builder.ex:588: Ecto.Query.Builder.escape_field!/3
    lib/ecto/query/builder.ex:75: Ecto.Query.Builder.escape/5
    lib/ecto/query/builder.ex:291: Ecto.Query.Builder.escape/5
    lib/ecto/query/builder/filter.ex:67: Ecto.Query.Builder.Filter.build/6
    expanding macro: Ecto.Query.where/3
    lib/utils.ex:24: ExQueb.Utils.build_exists_subquery/3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants