Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 105 exclude candidates from cluster #6448

Merged

Conversation

zyadtaha
Copy link
Contributor

@zyadtaha zyadtaha commented Mar 14, 2024

Fixes #105

Changes proposed in this pull request:

  • Changed the layout according to importance hierarchy.
  1. Merge checkbox
  2. Individual candidates with checkbox
  3. New cell value
  4. Cluster size
  5. Row count
  • Changed bullet points into checkboxes which is disabled if the corresponding merge option has not been selected.
  • On selecting merge option for a particular cluster, all individual candidates in the cluster get selected by default. Then the user can deselect an individual candidate to exclude it from being a part of the cluster.
  • The three buttons Export clusters, Merge selected & re-cluster and Merge selected & Close will include only checked candidates in the cluster.
  • If the user changes the number of candidates included in the cluster, the values in Cluster size, Row count will change dynamically.
  • When Browse this cluster button is clicked, the unchecked candidates will not be included.
  • Added alert if the Browse this cluster button is clicked and no candidates are checked.

@zyadtaha zyadtaha force-pushed the issue-105-exclude-candidates-from-cluster branch from 9ebdbec to 2061660 Compare March 15, 2024 05:40
@zyadtaha zyadtaha force-pushed the issue-105-exclude-candidates-from-cluster branch from 26cd6ea to ed9cd64 Compare March 15, 2024 16:55
@zyadtaha zyadtaha marked this pull request as ready for review March 15, 2024 16:56
@zyadtaha
Copy link
Contributor Author

Hi @wetneb,
The issue is almost done, there is a one test which fails and i am trying to get familiar with Cypress to solve this problem.
In the mean time, could you please review my changes ?
Very excited to hear your feedback!
clustring-dialog

@zyadtaha
Copy link
Contributor Author

I had solved the problem!
Excited for your review and feedback @wetneb

@github-actions github-actions bot added Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements. Priority: Medium Represents important issues that need to be addressed but are not urgent Theme: UX/Usability Focuses on issues related to improving the overall user experience and interaction flow. logic Changes to the data model, to the way operations, expressions work Module: Frontend These issues involve working on HTML, CSS, and JavaScript code that affects the user interface. clustering Issues related to the clustering operation, to merge similar values in a text column labels Mar 17, 2024
@wetneb
Copy link
Sponsor Member

wetneb commented Mar 17, 2024

Looking great! I will try to have a look soon. In the meantime tagging @cooperzoe too, as this involves changes to the interface. @cooperzoe what do you think of the screenshot above, where the "merge" checkbox has been moved to the first column and the bullet points have been turned into checkboxes? See the original discussion in #105 to motivate those changes.

(P.S.: @zyadtaha I changed your issue description so that it includes "Fixes #105" instead of a more complicated link to the issue. This syntax is required for GitHub to properly link the pull request to the issue on the right-hand side)

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 19, 2024

I took the time to try it out and I really like it!

I noticed that you took care of updating the cluster size and row counts depending on which choices in a cluster are selected. And the "Browse this cluster" link too. While I appreciate the care, I wonder if it's really the ideal behavior.
Consider the state in which we arrive just after having computed clusters, where no clusters will be selected.
The "row count" and "cluster size" columns contain only zeros, which is not super informative. If you try using the facets on the right-hand side to filter for clusters of a certain size, you get no results. For it to work, you'd need to first select all clusters, which sort of implies that you consider them all good without having looked at them, in a sense.

For those reasons, I'd vote for a simpler option: keep the "Browse this cluster", "Cluster size" and "Row count" features static (always encompassing all choices in the cluster, without adapting to their checkboxes). But that's just my opinion!
Maybe @ostephens @thadguidry @Critic-A would have thoughts on the matter?

@thadguidry
Copy link
Member

@wetneb I am OK for dynamic changes based on the checkboxes per cluster value and like that idea. What was your worry here about dynamic updates on each cluster as a user unchecks each or instead massively changes all using the Select All / Deselect All buttons? Also, if unchecking a few values in a single cluster, should be fast enough I think regardless if users have 200k cluster values in that single cluster, yeah?

The labeling "Cluster size" never did make sense to me and often confused folks in training, maybe it's better to relabel with "Cluster value count". The feedback I'd heard was that they thought "size" was something about the length of cluster strings. Other than that, no further opinion.

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 19, 2024

I am not worried by performance at all, just about usability. If my comment above does not make it clear, it might be easier to understand my concern by trying out the current prototype. Just run clustering, use the right-hand side filters and see what happens.

@zyadtaha
Copy link
Contributor Author

Hi @wetneb,
Thanks for your feedback!
so currently I should keep "Browse this cluster", "Cluster size" and "Row count" features static, right?
I also agree with changing "Cluster size" labelling to "Cluster value count" as @thadguidry mentioned or another clearer name if you have any suggestions.

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 21, 2024

@zyadtaha I'll just double-check with others if they agree, to make sure I don't make you do unnecessary work.

@thadguidry do you agree with making the cluster fields static again or should I describe my concerns with them being dynamic in another form (with screenshots?)

@ostephens @Critic-A any thoughts?

@thadguidry
Copy link
Member

@wetneb screenshots please. Env. not in a good state (harddrive again)

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 25, 2024

@thadguidry here is my explanation, with added screenshots. Any clearer?

Consider the state in which we arrive just after having computed clusters, where no clusters will be selected.

image

The "row count" and "cluster size" columns contain only zeros, which is not super informative.
If you try using the facets on the right-hand side to filter for clusters of a certain size, you get no results.

image

This is confusing since the facet histograms do show that some clusters match the filters I have set up.

For it to work, you'd need to first select all clusters, which sort of implies that you consider them all good without having looked at them, in a sense.

@thadguidry
Copy link
Member

@wetneb Hmm, it's almost like we shouldn't show any data in the range sliders until at least 1 cluster is selected? If I look at the descriptions above the range sliders...they say "... in cluster". That almost leads me to believe that maybe we word that as "...in selected clusters" above the range sliders? Dunno.

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 25, 2024

To me, requiring clusters to be selected for them to appear in the filters goes against the purpose of those filters, which is (in my opinion) to help you review subsets of clusters and mark them as mergeable. That's why I am proposing to keep the "Browse this cluster", "Cluster size" and "Row count" features static, independent of the selection.

@thadguidry
Copy link
Member

Ah, those sliders always help you find clusters to merge and not change or modify them, right? In that case, I also agree with you.

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 28, 2024

@zyadtaha then we could indeed go for keeping the "Browse this cluster", "Cluster size" and "Row count" features static.

@zyadtaha
Copy link
Contributor Author

@wetneb The "Browse this cluster", "Cluster size" and "Row count" features are now static.

@zyadtaha
Copy link
Contributor Author

Hi @wetneb,
Are there additional requirements for this PR?
If not, I am excited to merge this cool feature into the codebase!

@tfmorris
Copy link
Member

tfmorris commented Apr 1, 2024

@zyadtaha it is a holiday weekend in many areas and European countries, including where @wetneb is based, also have Monday off, so it might be a day or two before you get a reply.

Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed the code changes super closely yet, but testing this interactively it looks like we have a small regression in this use case:

  • run clustering
    image
  • tick the checkbox of one cluster (which ticks the checkboxes in all its values)
    image
  • use the first slider on the right-hand side to change the filtering settings
    image

-> with this new code, no clusters are returned, whereas the histogram in the filters indicate that some should be shown.

I have checked that this bug isn't present on the master branch (clusters are shown after following this workflow).

The dataset on which I tried this can be downloaded here (on the "Producteur" column).

@zyadtaha
Copy link
Contributor Author

Hi @wetneb,
I reproduced the issue, then tried to debug and find the cause of the problem in the clustering-dialog.js, but I am currently stuck.
Could you please give me a hint on how to solve the issue or where to start?

@zyadtaha zyadtaha requested a review from wetneb April 25, 2024 00:49
Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works as expected on my side!
I only have a couple of nitpicks on the code left.
Perhaps it could be worth having a new Cypress test case to demonstrate the functionality, if you feel like it.

@zyadtaha zyadtaha requested a review from wetneb May 19, 2024 03:42
@zyadtaha
Copy link
Contributor Author

Hi @wetneb,

I made the changes you suggested and added a test case with Cypress.
I am looking forward to merge this because I'm worried it might be forgotten.

Thanks for your feedback!

Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@wetneb wetneb merged commit a296552 into OpenRefine:master May 20, 2024
9 checks passed
@wetneb
Copy link
Sponsor Member

wetneb commented May 20, 2024

Congratulations @zyadtaha for fixing such an old issue! Your contribution will be appreciated by a lot of users :)

@zyadtaha zyadtaha deleted the issue-105-exclude-candidates-from-cluster branch May 20, 2024 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clustering Issues related to the clustering operation, to merge similar values in a text column logic Changes to the data model, to the way operations, expressions work Module: Frontend These issues involve working on HTML, CSS, and JavaScript code that affects the user interface. Priority: Medium Represents important issues that need to be addressed but are not urgent Theme: UX/Usability Focuses on issues related to improving the overall user experience and interaction flow. Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Option to exclude some candidates in a cluster from being merged
4 participants