Solve Excel importing from URL problem Fixes #6418 #6537

Kurocifer · 2024-04-15T00:59:37Z

Changes proposed in this pull request:

In the getFileSource method of ImportingUtilities, use fileName instead of url as the key passed to the getString method of JsonUtilities, so that it returns the name of the file rather than its download URL.

When the Excel importer is invoked after downloading from the URL, the sheet selection logic will then compare against the local file name rather than against the original URL, as it had been doing.
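The following minimal Java sketch shows why the sheet selection failed. It rests on several assumptions. The fileRecord is modelled as a plain `Map<String, String>` rather than a Jackson `ObjectNode`. `getString` is a hypothetical stand-in for the JsonUtilities helper. The "before" lookup (URL first, falling back to the file name) is an assumed reconstruction. Each selected sheet is assumed to be keyed as `<fileName>#<sheetIndex>`. Under those assumptions, a sheet key built from the local file name can never match the download URL, but it does match the file name.

```java
import java.util.Map;

final class FileSourceSketch {

    // Hypothetical stand-in for JsonUtilities.getString(record, key, default):
    // returns the value stored under key, or def when the key is absent.
    static String getString(Map<String, String> record, String key, String def) {
        String value = record.get(key);
        return value != null ? value : def;
    }

    // Assumed behaviour before the change: prefer the download URL,
    // falling back to the file name.
    static String fileSourceBefore(Map<String, String> fileRecord) {
        return getString(fileRecord, "url", getString(fileRecord, "fileName", "unknown"));
    }

    // Behaviour proposed in this PR: look up the file name instead of the URL.
    static String fileSourceAfter(Map<String, String> fileRecord) {
        return getString(fileRecord, "fileName", "unknown");
    }

    // Hypothetical simplification of the sheet selection check: a selected sheet
    // is keyed as "<fileName>#<sheetIndex>" and only matches when its file part
    // equals the value handed to the importer.
    static boolean sheetBelongsTo(String sheetKey, String fileSource) {
        int hash = sheetKey.lastIndexOf('#');
        return hash >= 0 && sheetKey.substring(0, hash).equals(fileSource);
    }

    public static void main(String[] args) {
        // Example record for a file downloaded from a URL (illustrative values).
        Map<String, String> record = Map.of(
                "url", "https://example.com/data/report.xlsx",
                "fileName", "report.xlsx");
        String sheetKey = "report.xlsx#0";

        System.out.println(sheetBelongsTo(sheetKey, fileSourceBefore(record))); // false
        System.out.println(sheetBelongsTo(sheetKey, fileSourceAfter(record)));  // true
    }
}
```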

wetneb

Thanks a lot for the in-depth debugging this probably required, and welcome to the project!

I would normally ask for a test but in this case I suspect it would be a fairly involved and heavy integration test if we want to test the whole thing.

Or you could perhaps add a simple unit test just checking that getFileSource behaves as expected on a supplied ObjectNode? That should intuitively be a lot easier and still be useful, no?

tfmorris

Thanks for working on this. As I feared, this causes the provenance information to be lost. The import metadata ends up looking like this:

when the fileSource field should have the URL in it, like this:

In all other cases, we are able to make this single variable do double duty, but in this case we need two different values, so things will likely have to be refactored. Unfortunately, I haven't had time to look into how extensive that refactoring will need to be.

Feel free to continue to investigate if you want, but we'll also totally understand if you want to pick an easier task to work on as your first contribution.

Kurocifer · 2024-04-20T14:54:12Z

Thanks for the review. I now understand the problem these changes cause.

Yes, I would like to keep working on this issue, but given how deep the refactoring may go, it would be better to do it alongside someone who knows the code base better.

In this case, I think the test can wait, right?

wetneb · 2024-04-23T07:55:30Z

For what it's worth, I don't find it critical that the fileSource field contains the full URL. As far as I know, we don't have any machinery in place to let people really reuse those import metadata objects; we just awkwardly expose them as JSON in the project metadata dialog. For instance, when creating a project from a local file, we will also not have the full path to the file. The clipboard import is similarly unreproducible, of course. So in a sense, importing via URLs is the odd one out rather than the norm. Currently, users cannot really count on this field to locate the file from which the project was created.

Because making this import metadata really contribute to reproducibility is a whole topic on its own and will require larger changes, I don't think it would hurt to merge this as it stands, since it solves the bug as far as I can tell (I was able to reproduce the bug on master a few days ago and can confirm this PR fixes it).

But I am not opposed to looking for other solutions if we want to preserve the exact current behaviour of this field.

Add a getFileName method to ImportingUtilities so it can be used to get the name of the file from the fileRecord.

Add a new variable (fileName) in ImportingParserBase that holds the return value of the new getFileName method of ImportingUtilities, so that it can be passed to the parseOneFile method instead of fileSource. This keeps the provenance information from being lost.
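The idea of keeping two separate values can be sketched as follows. This is a simplified illustration, not the actual OpenRefine code: the fileRecord is modelled as a `Map<String, String>`, `getString` is a hypothetical stand-in for the JsonUtilities helper, and `parseOneFile` and `parseAll` are hypothetical simplified versions whose real signatures take many more parameters. The point is that fileSource (the URL, when present) still feeds the import metadata, while fileName is what the parser receives for sheet matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FileNameSketch {

    // Hypothetical stand-in for JsonUtilities.getString(record, key, default).
    static String getString(Map<String, String> record, String key, String def) {
        String value = record.get(key);
        return value != null ? value : def;
    }

    // Provenance value for the import metadata: the URL when present,
    // otherwise the file name (assumed existing behaviour).
    static String getFileSource(Map<String, String> fileRecord) {
        return getString(fileRecord, "url", getString(fileRecord, "fileName", "unknown"));
    }

    // Sketch of the proposed helper: the local file name only.
    static String getFileName(Map<String, String> fileRecord) {
        return getString(fileRecord, "fileName", "unknown");
    }

    // Hypothetical simplified parser hook that receives the name
    // it should match sheet selections against.
    static String parseOneFile(String fileName) {
        return "parsed " + fileName;
    }

    // Hypothetical simplified loop over the file records of one import job.
    static List<String> parseAll(List<Map<String, String>> fileRecords, List<String> metadataSources) {
        List<String> results = new ArrayList<>();
        for (Map<String, String> fileRecord : fileRecords) {
            String fileSource = getFileSource(fileRecord); // kept for provenance
            String fileName = getFileName(fileRecord);     // used for sheet matching
            metadataSources.add(fileSource);
            results.add(parseOneFile(fileName));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> record = Map.of(
                "url", "https://example.com/data/report.xlsx",
                "fileName", "report.xlsx");
        List<String> sources = new ArrayList<>();
        List<String> parsed = parseAll(List.of(record), sources);
        System.out.println(sources); // [https://example.com/data/report.xlsx]
        System.out.println(parsed);  // [parsed report.xlsx]
    }
}
```

Splitting the lookup into two helpers avoids asking one variable to do double duty: the metadata keeps the URL, and only the parser's view changes.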

Kurocifer · 2024-04-30T15:53:31Z

Okay, I was away for a while. I have just made some changes; I'm not sure whether this solution is better, but it does preserve the provenance information.
The recent commits undo the changes from the previous approach and propose a different solution.

Use a new variable "fileName" in the ImportingParserBase file, whose value comes from the new getFileName method of ImportingUtilities, and pass it to the parseOneFile method in ImportingParserBase instead of fileSource (that variable is still there). This way the Excel importer can use the name of the file for the comparison.
Add a unit test for the getFileName method.

If it is still necessary, I can add the unit test for the getFileSource method.
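A test covering both methods could check that they agree for a local upload and diverge for a URL download. The sketch below is a hypothetical standalone version: it uses plain assertions instead of the project's test framework so it runs on its own, models the fileRecord as a `Map<String, String>`, and re-declares assumed versions of getFileSource and getFileName (with a `"unknown"` default) rather than importing the real ones.

```java
import java.util.Map;

final class ImportingUtilitiesSketchTest {

    // Hypothetical stand-in for JsonUtilities.getString(record, key, default).
    static String getString(Map<String, String> record, String key, String def) {
        String value = record.get(key);
        return value != null ? value : def;
    }

    // Assumed behaviour: URL first, then file name.
    static String getFileSource(Map<String, String> fileRecord) {
        return getString(fileRecord, "url", getString(fileRecord, "fileName", "unknown"));
    }

    // Assumed behaviour of the new helper: file name only.
    static String getFileName(Map<String, String> fileRecord) {
        return getString(fileRecord, "fileName", "unknown");
    }

    static void assertEquals(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
        }
    }

    public static void main(String[] args) {
        // File downloaded from a URL: provenance keeps the URL,
        // while the parser gets the local file name.
        Map<String, String> downloaded = Map.of(
                "url", "https://example.com/data/report.xlsx",
                "fileName", "report.xlsx");
        assertEquals("https://example.com/data/report.xlsx", getFileSource(downloaded));
        assertEquals("report.xlsx", getFileName(downloaded));

        // Locally uploaded file: there is no URL, so both methods agree.
        Map<String, String> uploaded = Map.of("fileName", "local.xlsx");
        assertEquals("local.xlsx", getFileSource(uploaded));
        assertEquals("local.xlsx", getFileName(uploaded));

        System.out.println("all checks passed");
    }
}
```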

tfmorris · 2024-05-03T04:22:01Z

Thanks for sticking with it! I'll try to review before the weekend.

Kurocifer · 2024-05-06T19:09:06Z

How can I fix the checks that are failing?

tfmorris · 2024-05-10T04:26:16Z

Sorry for the delay. You can lint your changes so that they pass the formatting check by doing:

./refine lint

(or refine.bat lint)

and then committing the changes that it makes.

Solve Excel importing from URL problem Fixes OpenRefine#6418

3c96ac9

github-actions bot added the labels "Type: Bug" (Issues related to software defects or unexpected behavior, which require resolution.) and "XLS(X)" (About the Excel import / export functionality) Apr 15, 2024

wetneb reviewed Apr 18, 2024

tfmorris requested changes Apr 18, 2024

Kurocifer added 3 commits April 30, 2024 15:22

fix: issue OpenRefine#6418. Add a getFileName method

ab51969

Add unit test for getFileName method

d5e28b6

Format changes properly to pass formatting check.

791e7cc
