Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lines duplicated when ordering on a n-n join table using find() method in the DAO #249

Open
ThibBal opened this issue Feb 10, 2021 · 2 comments

Comments

@ThibBal
Copy link
Contributor

ThibBal commented Feb 10, 2021

Data model :

  • admin_users table (extends users table)
  • zones table
  • admin_users_zones table to store admin_user_id,zone_id

Problem encountered: when listing admin_users (using find() method in UserDao.php) with their zones and ordering on them, my lines are duplicated. But the "count" method still returns the correct number.

Capture d’écran 2021-02-09 à 18 22 54

Here is the SQL request generated by TDBM:

SELECT DISTINCT 
  `users`.`id` AS `users____id`, `users`.`email` AS `users____email`, 
 `zones`.`id` AS `zones____id`,  `zones`.`label` AS `zones____label`
FROM   `admin_users`   
  LEFT JOIN `users` ON (    `admin_users`.`id` =   `users`.`id`)   
  LEFT JOIN `admin_users_zones` ON (    `admin_users_zones`.`admin_user_id` =   `admin_users`.`id`)   
  LEFT JOIN `zones` ON (    `admin_users_zones`.`zone_id` =   `zones`.`id`)   
  LEFT JOIN `admin_users_poles` ON (    `admin_users`.`pole_id` =   `admin_users_poles`.`id`)   
  LEFT JOIN `admin_users_roles` ON (    `admin_users`.`role_id` =   `admin_users_roles`.`id`)   
  LEFT JOIN `user_status` ON (    `users`.`status_id` =   `user_status`.`id`)
WHERE   (  (  (    `users`.`lastname` LIKE   '%%')
  OR (    `users`.`firstname` LIKE   '%%')))
  AND (  (    `users`.`email` LIKE   '%admin@mail.com%'))
ORDER BY   `zones`.`label` ASC LIMIT 10

Data model creation:

create users (id, email)
create roles (id, label)
create users_roles (id_user, id_role)

insert into roles (1, 'A'), (2, 'b')
insert into users (1, 'admin')
insert into users_roles (1, 1), (1, 2)

Method called: ->find('', [], 'roles.label ASC')

@homersimpsons
Copy link
Collaborator

Overview

Thank you for your report !

This problem is kind of tricky as we may not reflect the correct user intent.

Reproducibility

I will use the following schema for this comment:

Database Setup

Here is a MySQL script to set up the issue

CREATE TABLE `users` (
`id` INT NOT NULL,
`email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`));

CREATE TABLE `roles` (
`id` INT NOT NULL,
`label` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`));

CREATE TABLE `users_roles` (
`id_user` INT NOT NULL,
`id_role` INT NOT NULL,
PRIMARY KEY (`id_user`, `id_role`),
CONSTRAINT `fk_id_user__users` FOREIGN KEY (`id_user`) REFERENCES `users` (`id`),
CONSTRAINT `fk_id_role__roles` FOREIGN KEY (`id_role`) REFERENCES `roles` (`id`));

INSERT INTO `users` (`id`, `email`) VALUES ('1', 'admin');
INSERT INTO `roles` (`id`, `label`) VALUES ('1', 'a');
INSERT INTO `roles` (`id`, `label`) VALUES ('2', 'b');
INSERT INTO `users_roles` (`id_user`, `id_role`) VALUES ('1', '1');
INSERT INTO `users_roles` (`id_user`, `id_role`) VALUES ('1', '2');

Using the schema and running the generated SQL query leads to the aforementioned issue:

SELECT DISTINCT users.*, roles.label
  FROM users
  JOIN users_roles ON users_roles.id_user = users.id
  JOIN roles ON roles.id = users_roles.id_role
  ORDER BY roles.label ASC;
users.id users.email roles.label
1 admin a
1 admin b

A Solution

I guess the best we can try is to use a GROUP BY in conjunction with MIN and MAX aggregate that would depend on the expected order.

This would lead to generate something like

SELECT users.*, MIN(roles.label) as order_0 -- Here we use `MIN` as the order is `ASC`
  FROM users
  JOIN users_roles ON users_roles.id_user = users.id
  JOIN roles ON roles.id = users_roles.id_role
  GROUP BY users.id
  ORDER BY order_0 ASC;
SELECT users.*, MAX(roles.label) as order_0 -- Here we use `MAX` as the order is `DESC`
  FROM users
  JOIN users_roles ON users_roles.id_user = users.id
  JOIN roles ON roles.id = users_roles.id_role
  GROUP BY users.id
  ORDER BY order_0 DESC;

Notice that we can easily extend this to more order clauses, thus we could have something like:

SELECT users.*, MIN(roles.label1) as order_0, MAX(roles.label2) as order_1
  FROM users
  JOIN users_roles ON users_roles.id_user = users.id
  JOIN roles ON roles.id = users_roles.id_role
  GROUP BY users.id
  ORDER BY order_0 ASC, order_1 DESC;

We should care about the order_* naming as this can conflict with a retrieved column

Additional notes

Also, PostgreSQL seems to have a DISTINCT ON keyword to retrieve distinct results on a subset of fields. The doc states the following:

Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first

So under the hood it might act the same as the proposed implementation.

@moufmouf
Copy link
Member

Hey @homersimpsons , hey @ThibBal ,

Ok, @homersimpsons , your comment is spot on.
Now, I have a real question. We are trying to order a row by a column that can have many values (we don't even need a many to many relationship... a one to many relationship has the same problem). As far as I can tell, this does not make any sense. If we choose "MIN" or "MAX" arbitrarily, we might actually confuse the user that is maybe expecting the opposite result.

Wouldn't the right thing to do be to throw an exception instead?
We could detect we try to order an a column that is related to the main bean in a 1 to many or many to many way, and then throw an exception, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants