Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CSV max filesize check (#190) #191

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

daglueck
Copy link

@daglueck daglueck commented Oct 12, 2023

❤️ Thank you for your contribution!

Description

Fixes #190 by means of adding a file size check to invenio_previewer.extensions.csv_papaparsejs.can_preview.

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Third-party code

If you've added third-party code (copy/pasted or new dependencies), please reach out to an architect.

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

@daglueck
Copy link
Author

For the sake of simplicity I've added the file size check to can_preview rather than adding a separate function like validate_csv.

I've also used 1 * 1024 * 1024 as fallback size (rather than 10 * 1024 * 1024 like in the old CSV previewer) since that is the default value of PREVIEWER_MAX_FILE_SIZE_BYTES

@lnielsen
Copy link
Member

@psaiz Question: The new CSV file previewer is supposed to stream the CSV file so it's not loading the full file right?

@daglueck Given the answer from @psaiz I would say that the limit should be significantly higher to allow for larger CSV files to be previewed - say e.g. 100MB.

@@ -17,7 +17,14 @@

def can_preview(file):
"""Determine if the given file can be previewed."""
return file.is_local() and file.has_extensions(".csv", ".dsv")
max_file_size = current_app.config.get(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, would it make sense to have a validator function (i.e. validate_csv) to separate concerns?

This is done for other extensions (e.g. json)

@daglueck
Copy link
Author

Based on your comments, I've added a separate validation function (@alejandromumo ) and increased the max file size to 100 * 1024 * 1024 (@lnielsen ). I've added a new config variable PREVIEWER_CSV_MAX_BYTES to differentiate CSV from JSON/XML. Let me know if you had imagined this in a different way.

@lnielsen I didn't know that a new csv previewer was added. How does it work with the max file size, will it simply not load the file if it's too big (as with XML/JSON I guess) or will it truncate the file (like with TXT)? I was assuming here it won't load the file since otherwise there might be parsing issues. If on the other hand it does truncate the file, the wording of PREVIEWER_CSV_MAX_BYTES might need to be adjusted.

@daglueck daglueck removed their assignment Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PREVIEWER_MAX_FILE_SIZE_BYTES does not apply to CSV files
3 participants