Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds command for adding rows #6461

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

steve-kasica
Copy link

@steve-kasica steve-kasica commented Mar 18, 2024

Fixes #1855

Changes proposed in this pull request:

  • Adds a command for adding rows by prepending rows to an existing project.
  • Adds a new model change for inserting new rows into an existing project.

@wetneb wetneb requested a review from tfmorris March 19, 2024 09:27
@wetneb
Copy link
Sponsor Member

wetneb commented Mar 19, 2024

Exciting! Congratulations on your first PR to OpenRefine :) Since that's part of a conversation with @tfmorris I'll let him review first. I am withdrawing the corresponding GSoC project proposal since you are quite close to having completed it already. It was probably a bit too light for GSoC anyway.

@tfmorris tfmorris self-assigned this Mar 19, 2024
Copy link
Member

@tfmorris tfmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Thank you very much.

I've made a few comments and suggestions, but my biggest question concerns the front end. Do you also have the UI pieces for this or should we create a separate issue to cover those?

For the backend code that's here, most of my comments are minor, but probably the most important is test coverage for the new command.

@steve-kasica
Copy link
Author

I don't have UI code written yet, but I'm happy to work on it since I've already started thinking about some edge cases to handle w.r.t the UI. I don't mind keeping the UI work in this PR, and not creating a separate issue since it goes hand-in-hand.

@Ahmed-Elgamel
Copy link
Contributor

Ahmed-Elgamel commented Mar 25, 2024

Hi @steve-kasica ,I have the UI finished if you want I can push it to this pr. Also, after I added the UI it seems that whatever number of rows I give to the server it only add one row and it adds it to the beginning of the data not at the end.Is this behavior intended?

@tfmorris
Copy link
Member

tfmorris commented Apr 1, 2024

You can address the linter failures with the command ./refine lint.

For the UI, let's keep things simple to start and just do "Add N rows at the beginning" (which was the original use case) and perhaps "Add N rows at the end." More complicated scenarios (other insertion points, non-blank lines, etc) we can leave for future enhancements.

@steve-kasica
Copy link
Author

Sorry I must have my GitHub notification settings not setup correctly because I wasn't aware of the activity on this PR from the last couple weeks. I was held up on trying to support multiple cell types in the UI, but @wetneb convinced me to ship what I have at the Meetup today. Anyway, I've pushed those UI changes and will tackle the remaining unresolved changes from @tfmorris tomorrow.

@steve-kasica
Copy link
Author

Hi @steve-kasica ,I have the UI finished if you want I can push it to this pr. Also, after I added the UI it seems that whatever number of rows I give to the server it only add one row and it adds it to the beginning of the data not at the end.Is this behavior intended?

@Ahmed-Elgamel Sorry I didn't see this until now. Is there somewhere I can see the UI you've built? I didn't run into those issues with the UI I just pushed, but I bet they stem from serializing new row data from the client and server.

@steve-kasica
Copy link
Author

Here's an example for adding blank rows, with the updated input suggested by @magdmartin during the meetup today.

add-blank-rows.mp4

@wetneb
Copy link
Sponsor Member

wetneb commented Apr 11, 2024

As I mentioned in the community call yesterday, given that it's your first PR in the project I'd intuitively recommend aiming for something simpler first. The scope proposed by @tfmorris above seemed like a good aim to have:

For the UI, let's keep things simple to start and just do "Add N rows at the beginning" (which was the original use case) and perhaps "Add N rows at the end."

which I understood as "let's not let users input any cell data in this dialog for now".

We can try to work on getting your more ambitious proposal merged but I can imagine that the scale of the review comments and changes required can become too much to handle. To summarize briefly what I'd ask for:

  • the UI would need at least styling work to make sure it follows the rest of the look-and-feel of the application, for consistency, and I wouldn't be surprised if more involved design changes come up
  • The java tests are quite minimal as it stands, they should do more assertions on the project data to check its contents (beyond looking at row counts)
  • this is all without having looked at the actual code in detail, where I expect some changes to be necessary
  • nor have I tried it out interactively yet, which generally brings up other sorts of suggested changes :)

If people prefer to stick to the current proposal of letting users input data directly in this dialog, then I can have a go at reviewing this, but want to manage expectations about the follow-up work required that can be significant.

@wetneb
Copy link
Sponsor Member

wetneb commented Apr 15, 2024

Hi @steve-kasica, I wonder if your "thumbs up" reaction on the post above means a preference for:

  1. sticking to the current scope of the PR (including editing), in which case you would like us to continue reviewing it in this state

OR

  1. scaling down to only adding blank rows?

@steve-kasica
Copy link
Author

👍 to rolling back to only adding blank rows.

@steve-kasica
Copy link
Author

I've added a UI for adding blank rows and finished increasing test coverage. I found a bug when reverting changes when row are appended to the project.

Here's a short demo: https://youtu.be/j4iEtsR0JdM

Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! That looks a lot more manageable to get merged.

Here are a few screenshots to make it easier for others to chime in about the UI:
image
image

For the dialog, I'd change the "Submit Query" button to "OK", marking it as primary by adding the button-primary class and position it next to "Cancel" in the same way that we have normalized other dialogs in #6482 (it might be sufficient for that to just rebase your branch on top of the latest master branch).

class Row {
#isFlagged;
#isStarred;
#cells;
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this class is necessary because since we are only supporting adding blank rows, we should be able to only send the number of blank rows (and not their JSON serialization) to the backend, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we could do that, but I think in the future we would wish that designed this feature from the beginning to send rows that are blank, as opposed to the number of blank rows, see #6461 (comment).

private static String INDEX_FIELD = "index"; // Field name for index to insert new rows in `change.txt`
private static String EOC = "/ec/"; // end of change marker, not end of file, in `change.txt`

public RowAdditionChange(List<Row> additionalRows, int insertionIndex) {
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make sense to only store the number of additional rows, since they are known to be blank. That would be simpler and would save memory.

On the other hand you could say that it's useful to make it possible to support non-blank rows at the change level for later, so that projects created with later versions of OpenRefine could also be read by earlier versions, but that's generally something that's too hard to support reliably to advertise it to users. Also, if we are to support arbitrary rows, we would probably need to serialize the recon pool too, so that rows can contain recon values.

No strong feelings either way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have strong feelings in favor of the current implementation, sending the server rows that are blank. This design decision setups future development for supporting non-blank rows. I think adding rows of new data into an existing project is the primary task that this feature will address and padding a table with blank rows is an edge case, although supporting expressive use of data tables is still important and I understand the pragmatic importance of incremental enhancements in a widely used application such as OpenRefine.

I also think OpenRefine benefits from a new model change for a small subset of new rows, as per this TODO item in MassRowChange. RowAdditionChange begins to address it.

// TODO: This replaces all rows in the project with a new set of rows, but if only a small percentage of the rows
// are being changed, it'd be much more efficient to apply a change list of new/modified rows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with row data which is required to be blank as long we we're clear about that restriction in the code comments and that the tests match whatever we're claiming to support (I haven't reviewed the tests yet).

@github-actions github-actions bot added Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements. extension Making it easier to extend OpenRefine's UI and backend gsoc/outreachy Projects proposed for internships. Please hold back from these tasks if you are not elligible. labels Apr 24, 2024
@tfmorris
Copy link
Member

The dialog button changes look good. I've only done cursory testing so far, but my first attempt generated something weird which I wanted to report. I did: 1) prepend, 2) append, 3) append and got the following change history:

[
  {
    "op": "core/row-addition",
    "rows": [
      {
        "starred": false,
        "flagged": false,
        "cells": []
      }
    ],
    "index": 0,
    "description": "Add rows"
  },
  {
    "op": "core/row-addition",
    "rows": [
      {
        "starred": false,
        "flagged": false,
        "cells": []
      }
    ],
    "index": 1623,
    "description": "Add rows"
  },
  {
    "op": "core/row-addition",
    "rows": [
      {
        "starred": false,
        "flagged": false,
        "cells": []
      }
    ],
    "index": 1629,
    "description": "Add rows"
  }
]

I suspect that this was because I was initially in record mode and switched to row mode between steps 2 & 3, since the project ended up with 1625 records and 1630 rows. Obviously append should use the row count, not the record count.

I'll do some more testing and code review shortly...

@steve-kasica
Copy link
Author

steve-kasica commented Apr 25, 2024

Thanks for catching this bug @tfmorris. I can reproduce it. I assumed theProject.rowModel.total would always provide the total rows in a project, even in records mode. The food dataset I was testing on before isn't a good example records-mode data.

I also found another related issue. If I append a row in records mode, it inserts it in the middle of project because it's using record count instead of row count.

image

If I just replace theProject.rowModel.total with theProject.metadata.rowCount, would that fix that issue?

steve-kasica and others added 8 commits April 25, 2024 13:36
Adds a new command, operation, and model change to support prepending a
contiguious block of new rows to an existing project.
Fixes bug where this operation was not being serialized correctly and adds test to
check for this issue.
Adds tests for AddRowsCommand and RowAdditionChange classes
Updates code style of row addition feature classes
with the `./refine lint` command
Changes AddRowsCommand to extend Command rather than
EngineDependentCommand for consistency with existing
core commands.
Adds a basic UI for adding new rows to a project with
Cypress e2e tests.
Co-authored-by: Tom Morris <tfmorris@gmail.com>
Co-authored-by: Tom Morris <tfmorris@gmail.com>
Updates comments and reformats code in add rows dialog JavaScript files.
Modify row addition source code and tests to follow organization
conventions: adds copyright header and updates static variables to
follow naming convention.
Exposes the insertion index parameter for adding a new block of rows at
the operation and command level.
Solves off-by-one error when adding new rows to the
end of the project and also changes exception thrown
when insertion index is out of bounds to IndexOutOfBoundsException
Fixes bug causing throw exception when reverting changes to additional rows
appended to the end of the project and adds tests to catch this bug.
Updates add rows dialog HTML, JavaScript, and Cypress tests to
allow users to add any number of blank rows to the beginning or
end of a project.
Changes dialog footer buttons to match dialog style in OpenRefine#6482
Merges two tests for the response body of a successful request to
the add rows command into one test.
Adds rows and insertion index parameters to serialization and adds tests for both
serialization and deserialization.
Uses the modelChanged parameter to updateOptions argument in `Refine.postCoreProcess`
to trigger UI update, utilizing existing client-side utilities. Also adds e2e test
to test for facet update after adding new rows.
Modifies JavaScript to use `theProject.metadata.rowCount` to specify the last
row index in a project, instead of `theProject.rowModel.total`. Also adds
Cypress e2e test to catch bug when appending rows in records mode caused by
this issue.
@wetneb
Copy link
Sponsor Member

wetneb commented May 27, 2024

My understanding of the status of this PR is that @steve-kasica has responded to reviews. @tfmorris do you still feel responsible for reviewing this or should someone take over?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension Making it easier to extend OpenRefine's UI and backend gsoc/outreachy Projects proposed for internships. Please hold back from these tasks if you are not elligible. Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement "Prepend rows" command from gokbutils
4 participants