Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve Excel importing from URL problem Fixes #6418 #6537

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Kurocifer
Copy link

Fixes #6418

Changes proposed in this pull request:

  • In the getFileSource method of the ImportingUtitilities file, use the fileName as the key to the getString method of the JsonUtilities instead of the url, so it returns the name of the file to be used instead of it's download url.

When the Excel importer will be invoked after downloading from the URL, the sheet selection logic will compare against the local filename rather than comparing against the original URL as it had been doing.

@github-actions github-actions bot added Type: Bug Issues related to software defects or unexpected behavior, which require resolution. XLS(X) About the Excel import / export functionality labels Apr 15, 2024
Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the in-depth debugging this probably required, and welcome to the project!

I would normally ask for a test but in this case I suspect it would be a fairly involved and heavy integration test if we want to test the whole thing.

Or you could perhaps add a simple unit test just checking that getFileSource behaves as expected on a supplied ObjectNode? That should intuitively be a lot easier and still be useful, no?

Copy link
Member

@tfmorris tfmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. As I was afraid of, this causes the provenance information to be lost. The import metadata ends up looking like this:
Screenshot 2024-04-18 at 1 49 15 PM
when the fileSource field should have the URL in it, like this:
Screenshot 2024-04-18 at 1 52 20 PM

In all other cases, we are able to make this single variable do double duty, but in this case we need to different values, so things are likely going to have to be refactored. Unfortunately, I haven't had the time to look into how extensive that refactoring will need to be.

Feel free to continue to investigate if you want, but we'll also totally understand if you want to pick an easier task to work on as your first contribution.

@Kurocifer
Copy link
Author

Thanks for the review. I now get the problem the changes cause.

Yes I will like to keep working on this issue yet seeing how deep the refactoring may take it would be better I do it alongside someone that masters the code base better.

In this case I think the the test can wait right ?

@wetneb
Copy link
Sponsor Member

wetneb commented Apr 23, 2024

For what it's worth I don't find it critical that the fileSource field contains the full URL. As far as I know we don't have any machinery in place to let people really reuse those import metadata objects - we just awkwardly expose them as JSON in the project metadata dialog. For instance, when creating a project from a local file, we will also not have the full path to the file. The clipboard import is similarly unreproducible of course. So in a sense, importing via URLs is the odd one out there, rather than the norm. Currently, users cannot really count on this field to be able to locate back the file from which the project was created.

Because making this import metadata really contribute to reproducibility is a whole topic on its own and will require larger changes, I don't think it would hurt to merge this as it stands as it solves the bug as far as I can tell (I could reproduce the bug on master some days ago and confirm this PR fixes it).

But I am not opposed to looking for other solutions if we want to preserve the exact current behaviour of this field.

Add a getFileName method in the ImportingUtilities so it can be used to
get the name of the file from the fileRecord.
Add a new variable (fileName) in the ImportingParserBase method,that
will hold the return value of the new getFileName method of the
ImportingUtilities. So it can be parsed to the parseOneFile method
instead of the fileSource.

This is done to keep the provenance information from being lost.
@Kurocifer
Copy link
Author

Okay went off for some while. I just made some changes I don't know if this solution is better as it manages to preserve the provenance information.
The recent commits undo the changes in the previous suggestion, and proposes a different solution.

  • Use a new variable "fileName" in the ImportingParserBase file whose value is gotten from the new method getFileName of the ImportingUtitlities, and parse it to the parseOneFile method in ImportingParserBase instead of the fileSource(this variable is still there). So the excel importer can use the name of the file for the comparision.

  • A unit test for the getFileName method.

if it is still necessary, I can add the unit test for the getFileSource method.

@tfmorris
Copy link
Member

tfmorris commented May 3, 2024

Thanks for sticking with it! I'l try to review before the weekend.

@Kurocifer
Copy link
Author

Please how can I solve the problem with the checks that are not successful ?

@tfmorris
Copy link
Member

Sorry for the delay. You can lint your changes so that they pass the formatting check by doing:

./refine lint

(or refine.bat lint)

and then committing the changes that it makes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug Issues related to software defects or unexpected behavior, which require resolution. XLS(X) About the Excel import / export functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Excel importer doesn't work from URL
3 participants