Solve Excel importing from URL problem Fixes #6418 #6537
base: master
Thanks a lot for the in-depth debugging this probably required, and welcome to the project!
I would normally ask for a test but in this case I suspect it would be a fairly involved and heavy integration test if we want to test the whole thing.
Or you could perhaps add a simple unit test just checking that getFileSource behaves as expected on a supplied ObjectNode? That should intuitively be a lot easier and still be useful, no?
Thanks for working on this. As I was afraid of, this causes the provenance information to be lost. The import metadata ends up looking like this:
when the fileSource field should have the URL in it, like this:
In all other cases, we are able to make this single variable do double duty, but in this case we need two different values, so things are likely going to have to be refactored. Unfortunately, I haven't had the time to look into how extensive that refactoring will need to be.
Feel free to continue to investigate if you want, but we'll also totally understand if you want to pick an easier task to work on as your first contribution.
Thanks for the review. I now understand the problem these changes cause. Yes, I would like to keep working on this issue, but given how deep the refactoring may go, it would be better if I did it alongside someone who knows the code base better. In that case, I think the test can wait, right?
For what it's worth, I don't find it critical that the fileSource field keeps its current value. Because making this import metadata really contribute to reproducibility is a whole topic on its own and will require larger changes, I don't think it would hurt to merge this as it stands, as it solves the bug as far as I can tell (I could reproduce the bug on master some days ago and confirm this PR fixes it). But I am not opposed to looking for other solutions if we want to preserve the exact current behaviour of this field.
Add a getFileName method to ImportingUtilities so it can be used to get the name of the file from the fileRecord.
Add a new variable (fileName) in ImportingParserBase that holds the return value of the new getFileName method of ImportingUtilities, so that it can be passed to the parseOneFile method instead of fileSource. This is done to keep the provenance information from being lost.
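To illustrate the split described in these two commits, here is a minimal sketch of how a provenance value and a local file name could be read separately from one file record. It is not the code from this pull request: a plain Map stands in for the ObjectNode file record, the class name FileRecordNames is hypothetical, and the keys "url" and "fileName" and the "unknown" fallback are assumptions.

```java
import java.util.Map;

// Hypothetical stand-in for the helpers discussed above. A Map replaces the
// ObjectNode file record; the keys "url" and "fileName" are assumptions.
final class FileRecordNames {

    // Provenance value: prefer the original URL, fall back to the file name.
    static String getFileSource(Map<String, String> fileRecord) {
        String url = fileRecord.get("url");
        return url != null ? url : getFileName(fileRecord);
    }

    // Name of the local (downloaded or uploaded) file, "unknown" if absent.
    static String getFileName(Map<String, String> fileRecord) {
        return fileRecord.getOrDefault("fileName", "unknown");
    }

    public static void main(String[] args) {
        Map<String, String> record = Map.of(
                "url", "https://example.com/data.xlsx",
                "fileName", "data.xlsx");
        // The two values differ only when the file was imported from a URL.
        System.out.println(getFileSource(record));
        System.out.println(getFileName(record));
    }
}
```

With two accessors, the metadata can keep recording the URL while the parser receives the local name, which is the double duty the single variable could not perform in the URL case.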
Okay, I was away for a while. I just made some changes; I don't know if this solution is better, but it manages to preserve the provenance information.
If it is still necessary, I can add the unit test for the getFileSource method.
Thanks for sticking with it! I'll try to review before the weekend.
How can I solve the problem with the checks that are not successful?
Sorry for the delay. You can lint your changes so that they pass the formatting check by doing:
(or and then committing the changes that it makes.
Fixes #6418
Changes proposed in this pull request:
When the Excel importer is invoked after downloading a file from a URL, the sheet selection logic now compares against the local filename rather than against the original URL, as it had been doing.
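The effect of that change can be sketched roughly as follows. This is an illustration under stated assumptions, not the importer's actual code: the class SheetSelection is hypothetical, and it assumes each selected sheet is identified by a string of the form "<file name>#<sheet index>".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each selected sheet is assumed to be identified
// as "<file name>#<sheet index>".
final class SheetSelection {

    // Returns the sheet indexes whose file-name part equals fileIdentifier.
    static List<Integer> selectedSheets(List<String> selections, String fileIdentifier) {
        List<Integer> indexes = new ArrayList<>();
        for (String selection : selections) {
            int sep = selection.lastIndexOf('#');
            if (sep < 0) {
                continue; // malformed entry, skip it
            }
            String name = selection.substring(0, sep);
            if (name.equals(fileIdentifier)) {
                indexes.add(Integer.parseInt(selection.substring(sep + 1)));
            }
        }
        return indexes;
    }
}
```

Under these assumptions, calling selectedSheets with the original URL as the identifier matches no entry, because the entries carry the local file name, so no sheet would be imported. Passing the local file name selects the expected sheets.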