Option to exclude some candidates in a cluster from being merged #105

Closed
tfmorris opened this issue Oct 15, 2012 · 17 comments · Fixed by #6448
Labels
  • clustering (Issues related to the clustering operation, to merge similar values in a text column)
  • imported from old code repo (Issue imported from Google Code in 2010)
  • logic (Changes to the data model, to the way operations, expressions work)
  • Module: Frontend (These issues involve working on HTML, CSS, and JavaScript code that affects the user interface.)
  • Priority: Medium (Represents important issues that need to be addressed but are not urgent)
  • Theme: UX/Usability (Focuses on issues related to improving the overall user experience and interaction flow.)
  • Type: Feature Request (Identifies requests for new features or enhancements. These involve proposing new improvements.)

Comments

@tfmorris
Member

tfmorris commented Oct 15, 2012

Original author: iainsproat (June 30, 2010 19:18:19)

In the cell cluster & merge dialog, it should be possible to exclude individual candidates from being part of a single cluster.

An example I came across was the following cluster:
"Paris 1, Paris 2, Paris 3, Piraeus, Paris 4....".
If I merge this then Piraeus is erroneously changed to Paris.
I'd like an option to exclude Piraeus from being merged.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=105
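To make the requested behaviour concrete, here is a minimal Python sketch (not OpenRefine's actual code) of a merge that leaves excluded candidates untouched. The function name `merge_cluster` and its parameters are hypothetical, and a column is modelled simply as a list of cell strings.

```python
from typing import Iterable, List


def merge_cluster(
    cells: List[str],
    candidates: Iterable[str],
    new_value: str,
    excluded: Iterable[str] = (),
) -> List[str]:
    """Hypothetical helper: rewrite every included candidate to new_value.

    Candidates listed in `excluded` keep their original value, and values
    outside the cluster are never touched.
    """
    excluded_set = set(excluded)
    included = {c for c in candidates if c not in excluded_set}
    return [new_value if cell in included else cell for cell in cells]


column = ["Paris 1", "Piraeus", "Paris 2", "Lyon", "Paris 3"]
cluster = ["Paris 1", "Paris 2", "Paris 3", "Piraeus"]
print(merge_cluster(column, cluster, "Paris", excluded=["Piraeus"]))
# ['Paris', 'Piraeus', 'Paris', 'Lyon', 'Paris']
```

Without the `excluded` argument, Piraeus would also become "Paris", which is the error described above.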

@tfmorris
Member Author

From stefano.mazzocchi@gmail.com on July 01, 2010 07:55:20:
hmm, what about just using external recon to do this?

@tfmorris
Member Author

From iainsproat on July 01, 2010 08:48:54:
Using external recon loses all the benefits of doing merge & cluster.

If the reconciliation engine doesn't give a high score to the cell value, e.g. 'Paris 1' is not given a high match as being the city of Paris, I'd have to go through each of the Paris cells and ensure it gets reconciled to the correct Paris topic. Not ideal.

@thadguidry added the clustering label on Apr 18, 2018
@wetneb added the "design proposal needed" label (a precise proposal for a user interface for this feature would be welcome) on Mar 7, 2023
@Critic-A

Critic-A commented Mar 8, 2023

Hi @wetneb
I am Kritika, an Outreachy applicant. I explored the OpenRefine interface, analysed its various features and UX, and would like to contribute to this issue.

Issue: Individual candidates cannot be excluded from being part of a single cluster under the current UX
(I'm attaching a mockup of the example mentioned in the earlier comment)
[Mockup image: Group 1]

Suggestion: When the user selects the merge option for a particular cluster, all individual values in the cluster should be selected for merging by default. The user can then deselect an individual candidate to exclude it from the cluster.

So this is how the new flow would look:

STEP 1: User selects merge option
[Mockup image: Group 2]

STEP 2: User deselects the individual candidate to be excluded from the cluster
[Mockup image: Group 3]

STEP 3: User executes the merge operation

Link to Figma file: https://www.figma.com/file/IaIjW1z4YtAgE4Vr6LGcMR/Outreachy%2FOpenRefine%2F%23105?node-id=0%3A1&t=63ibwTqhEIlCNR6C-1

I'd really like to hear your input on this suggestion.
Cheers✌️

@wetneb removed the "design proposal needed" label on Mar 8, 2023
@Critic-A

Critic-A commented Mar 8, 2023

@lozanaross @wetneb
This is in continuation of the previous comment and the "Welcome and guidelines to the Outreachy UX design internship applicants" discussion on the OpenRefine Discourse Forum.

I have recorded and posted some of my work on this issue. I would like to request that this issue be assigned to me.
Or if you think I'd be able to contribute on some other issue, I'll be glad to know that too.
Thank you

@lozanaross added the "outreachy design" label (New or existing issues associated with the Outreachy design internship) on Mar 8, 2023
@Critic-A

Hi @lozanaross @wetneb
If you could please spare some time and review my first comment in this thread, that'll be great and I'll work on improving it.
P.S. I understand that you already have a lot on your plate and feedback takes time, so thanks in advance :P

@wetneb
Sponsor Member

wetneb commented Mar 13, 2023

@Critic-A I like it! It feels very natural to me.

One slight worry is that we need to dynamically change bullet points into checkboxes, which is perhaps not so elegant because the bullet points and the checkboxes might not take the exact same space, so it is likely that it will change the layout of the content slightly. I wonder if it would be an option to only have checkboxes? They could be disabled if the corresponding merge option has not been selected yet.

Also, because the user first needs to click the Merge checkbox, for me it would seem a bit more natural to have it on the left-hand side of the cluster candidates (because I read left to right). I am not sure if that should be swapped for right-to-left languages (I do not think we do this sort of customization anywhere in OpenRefine, but maybe we should?)

@Critic-A

@wetneb
Yes, having disabled check boxes instead of bullets would be great.

Regarding having the merge checkbox on the left, I agree with your suggestion.

In fact, I was thinking that we could change the layout a bit. The primary focus here is on the individual candidates and what they will look like once merged. The cluster size and row count come after them in the importance hierarchy.

So the layout could look something like (from left to right)

  • Merge checkbox
  • Individual candidates with checkbox
  • New cell value
  • Cluster size
  • Row count

Would love to know your views. I'll post the wireframe iterations in a bit.

@Critic-A

Critic-A commented Mar 13, 2023

Hi @wetneb
Here's a draft of the iterations I tried

As I mentioned in my previous comment, I tried changing the layout and introduced checkboxes instead of bullet points.

All fields requiring click-based interaction from the user's end are placed towards the left end.
The user flow looks as follows:

Selecting the 'merge' checkbox
🔽
All the previously disabled checkboxes of individual cluster values get selected
🔽
Unselect the individual values you do not wish to include
🔽
The value of cluster size and row count changes
🔽
Enter new cell value
🔽
Execute the cluster function

Also, if the user unselects an individual candidate and then unselects the merge option, the selection of individual candidates remains intact. E.g., if I unselect Paris 1 and then unselect Merge for that cluster, Paris 1 remains unselected until a further operation is performed.

Here are the two iterations. They differ in how a disabled checkbox looks when the merge option is not yet selected. I feel that if the checkboxes are already checked in the default state, they catch the user's eye and create an unnecessary visual monotony.

[Mockup image: g1]

[Mockup image: g2]
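The flow above, including the rule that a candidate's selection survives unchecking Merge, can be modelled as a small state object. This is a hypothetical Python sketch: `ClusterRow` and its methods are illustrative names, not OpenRefine code. It assumes that re-checking Merge restores the retained selection, and that disabled checkboxes are drawn unchecked, as preferred in the second iteration.

```python
from typing import Dict, Iterable, List, Tuple


class ClusterRow:
    """Hypothetical UI state for one cluster row in the cluster & merge dialog."""

    def __init__(self, candidates: Iterable[str]) -> None:
        self.candidates: List[str] = list(candidates)
        self.merge = False
        # Every candidate starts selected, so checking Merge includes them all.
        self.selected = set(self.candidates)

    def toggle_merge(self) -> None:
        # Unchecking Merge only disables the candidate checkboxes; the
        # selection is kept, so re-checking Merge restores it.
        self.merge = not self.merge

    def toggle_candidate(self, value: str) -> None:
        if not self.merge:
            return  # disabled checkboxes ignore clicks
        if value in self.selected:
            self.selected.discard(value)
        else:
            self.selected.add(value)

    def checkbox_states(self) -> Dict[str, Tuple[bool, bool]]:
        """Map each candidate to (enabled, drawn_checked).

        Disabled checkboxes are drawn unchecked, as in the preferred iteration.
        """
        return {c: (self.merge, self.merge and c in self.selected) for c in self.candidates}

    def values_to_merge(self) -> List[str]:
        """Candidates that would be rewritten to the new cell value."""
        if not self.merge:
            return []
        return [c for c in self.candidates if c in self.selected]


row = ClusterRow(["Paris 1", "Paris 2", "Piraeus"])
row.toggle_merge()               # all three candidates selected
row.toggle_candidate("Piraeus")  # exclude Piraeus
row.toggle_merge()               # Merge off: checkboxes disabled, selection kept
row.toggle_merge()               # Merge on again
print(row.values_to_merge())     # ['Paris 1', 'Paris 2']
```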

@wetneb
Sponsor Member

wetneb commented Mar 13, 2023

Thanks a lot! It looks good to me. I agree with your assessment on the default state of checkboxes.
Let's see what @lozanaross thinks of the whole proposal.

@ostephens
Sponsor Member

Generally this looks good to me but I have a couple of questions:

  • Should the Row Count and Cluster size change as the user changes the values included in the cluster?
  • If the user unselects some values in the cluster then uses the "Browse this cluster" option, are the unchecked values included?

If the answer to these is No then I wonder if we can make it clearer that the selections are about whether the item is included in the Merge, rather than whether it is included in the Cluster?

@Critic-A

Critic-A commented Mar 14, 2023

@ostephens @lozanaross @wetneb
Cluster size represents the number of individual values in the cluster, and the row count represents the sum of the rows in each value in the cluster.
So as the user changes the number of values to be included in the cluster, in my opinion, it would be helpful to update the values in these two columns, since they represent the impact that the cluster operation will have when executed. Retaining the original values would not do much to enhance the UX.

Similarly, if the user unselects some values in the cluster and then uses the "Browse this cluster" option, the unchecked values should not be included.
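Recomputing the two columns and the browse filter only needs the per-value counts the dialog already shows. A minimal Python sketch follows; `cluster_summary` and `browse_filter` are hypothetical names, and each cluster is assumed to be a mapping from candidate value to its row count.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_summary(counts: Dict[str, int], selected: Iterable[str]) -> Tuple[int, int]:
    """Return (cluster size, row count) computed over the selected candidates only."""
    chosen = set(selected)
    size = sum(1 for value in counts if value in chosen)
    rows = sum(n for value, n in counts.items() if value in chosen)
    return size, rows


def browse_filter(counts: Dict[str, int], selected: Iterable[str]) -> List[str]:
    """Values to show when browsing the cluster; unchecked candidates are left out."""
    chosen = set(selected)
    return [value for value in counts if value in chosen]


counts = {"Paris 1": 3, "Paris 2": 2, "Paris 3": 1, "Piraeus": 4}
selected = ["Paris 1", "Paris 2", "Paris 3"]
print(cluster_summary(counts, selected))  # (3, 6)
print(browse_filter(counts, selected))    # ['Paris 1', 'Paris 2', 'Paris 3']
```

With every candidate selected, the same function returns the original figures (4 values, 10 rows in this made-up example).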

@lozanaross

Thanks @Critic-A, I like the design proposal above, it is logical and looks fitting. I would also recommend the behaviour you describe below as this is what I would also expect as a user.

@ostephens @lozanaross @wetneb Cluster size represents the number of individual values in the cluster, and the row count represents the sum of the rows in each value in the cluster. So as the user changes the number of values to be included in the cluster, in my opinion, it would be helpful to update the values in these two columns as it represents the impact that the cluster operation will have when executed. Also, retaining the original values does not contribute much to enhancing the UX.

Similarly, if the user unselects some values in the cluster and then uses the "Browse this cluster" option, the unchecked values should not be included.

@wetneb @ostephens do you foresee any problems with implementing this type of dynamic changing of values to be included in the cluster?

@wetneb
Sponsor Member

wetneb commented Mar 15, 2023

It sounds intuitively doable.

@ostephens
Sponsor Member

From a behaviour perspective, the proposal makes sense to me.

From a practical perspective I think it should be possible although I'm slightly concerned that it could have a performance impact (I'm not sure if you'd have to recalculate all the clusters, or if you could just update the single cluster that has been changed by the selection)
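One way to limit the performance cost, sketched under the assumption that deselecting a candidate changes which values are merged but not which values the clustering method grouped together, is to keep running totals per cluster and adjust them in constant time on each toggle, without re-running the clustering or touching other clusters. `ClusterTotals` is a hypothetical name used for illustration only.

```python
from typing import Dict


class ClusterTotals:
    """Hypothetical running totals for one cluster, updated per checkbox toggle."""

    def __init__(self, counts: Dict[str, int]) -> None:
        self.counts = dict(counts)   # candidate value -> row count
        self.selected = set(counts)  # all candidates start selected
        self.size = len(counts)
        self.rows = sum(counts.values())

    def set_selected(self, value: str, on: bool) -> None:
        # O(1) update: only this cluster's numbers change; nothing is re-clustered.
        if on == (value in self.selected):
            return  # state unchanged, nothing to update
        sign = 1 if on else -1
        if on:
            self.selected.add(value)
        else:
            self.selected.discard(value)
        self.size += sign
        self.rows += sign * self.counts[value]


totals = ClusterTotals({"Paris 1": 3, "Paris 2": 2, "Piraeus": 4})
totals.set_selected("Piraeus", False)
print(totals.size, totals.rows)  # 2 5
```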

@Critic-A

Hi @ostephens @wetneb @lozanaross
Does this issue require more contribution from my side?
If not, I'd be glad to be assigned to any other issue which you think I'd be able to contribute to
Cheers!

@wetneb
Sponsor Member

wetneb commented Mar 16, 2023

Yes, I think this is a consensual and actionable design proposal, thank you very much!

@wetneb removed the "outreachy design" label on Mar 12, 2024
@zyadtaha
Contributor

zyadtaha commented Mar 12, 2024

Could you assign this to me, @wetneb?


7 participants