Add GCS Fs implementation #331

Merged
merged 7 commits into from Dec 22, 2021


0xmichalis
Collaborator

0xmichalis commented Dec 22, 2021

Continuation of #296

Changes vs the original PR:

  • Based on the latest master
  • Uses the client as its base abstraction rather than the bucket, so the bucket becomes part of the path, as its first component (!)
  • Has the kinks and quirks of the GCS API "ironed out"
  • Has its own test suite now, based on the TarFS one, with extra parts to test the writing and folder-creation scenarios
  • The suite allows testing both with mocks and with a real bucket (see the commented-out block in the test setup)
  • File modes are exactly those provided when files or directories are opened. If no mode is provided, it defaults to 0755
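To illustrate the path convention and the mode default described above, here is a minimal sketch. The helpers `splitBucketPath` and `modeOrDefault` are hypothetical names, not taken from the PR, and treating a zero mode as "not provided" is an assumption made only for this example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// splitBucketPath is a hypothetical helper (not the PR's code). It treats the
// first path component as the bucket and the remainder as the object name.
func splitBucketPath(name string) (bucket, object string) {
	trimmed := strings.TrimLeft(name, "/")
	bucket, object, _ = strings.Cut(trimmed, "/")
	return bucket, object
}

// modeOrDefault returns the caller-supplied mode unchanged. As an assumption
// for this sketch, a zero mode means "not provided" and maps to 0755.
func modeOrDefault(perm os.FileMode) os.FileMode {
	if perm == 0 {
		return 0o755
	}
	return perm
}

func main() {
	bucket, object := splitBucketPath("my-bucket/dir/file.txt")
	fmt.Println(bucket, object)
	fmt.Printf("%o\n", modeOrDefault(0))
}
```

With this convention, a name such as `my-bucket/dir/file.txt` addresses the object `dir/file.txt` in the bucket `my-bucket`.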

Bonus:

  • Supports gs://<bucket>/<path> URLs too, even though they are explicitly excluded from the tests, since afero in general does not work with URLs.
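One way such URLs could be mapped onto the bucket-first path convention is sketched below. `normalizeName` is a hypothetical helper written for illustration; how the PR actually handles `gs://` names is not shown here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeName is a hypothetical helper (not the PR's code). It converts
// gs://<bucket>/<path> into <bucket>/<path> and leaves plain paths untouched.
func normalizeName(name string) (string, error) {
	if !strings.HasPrefix(name, "gs://") {
		return name, nil
	}
	u, err := url.Parse(name)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("missing bucket in %q", name)
	}
	// The URL host is the bucket; the URL path already starts with "/".
	return u.Host + u.Path, nil
}

func main() {
	name, err := normalizeName("gs://my-bucket/dir/file.txt")
	fmt.Println(name, err)
}
```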

The original PR notes, which still apply to this version of the implementation, plus extras for this version:

Limitations:

  • No Chmod support - The GCS ACL could probably be mapped to *nix style permissions but that would add another level of complexity and is ignored in this version.
  • No Chtimes support - Could be simulated with attributes (GCS a/m-times are set implicitly) but that is left for another version.
  • NOTE: Not thread safe - Also assumes all file operations are done through the same instance of the GcsFs. File operations between different GcsFs instances are not guaranteed to be consistent.
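Because the filesystem is not thread safe, callers that share one instance across goroutines need their own synchronization. The sketch below shows one possible approach, a mutex that serializes every call. The `store` interface and `mapStore` type are hypothetical stand-ins, not the afero or GcsFs API, and a single lock like this does not make individual open file handles safe to share.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical, minimal stand-in for a filesystem that is not
// safe for concurrent use. It is NOT the afero.Fs interface.
type store interface {
	Put(name string, data []byte) error
	Get(name string) ([]byte, error)
}

// mapStore is an in-memory store used only to make the sketch runnable.
type mapStore map[string][]byte

func (m mapStore) Put(name string, data []byte) error {
	m[name] = append([]byte(nil), data...)
	return nil
}

func (m mapStore) Get(name string) ([]byte, error) {
	data, ok := m[name]
	if !ok {
		return nil, fmt.Errorf("%s: not found", name)
	}
	return data, nil
}

// lockedStore serializes all calls to the wrapped store with one mutex.
type lockedStore struct {
	mu    sync.Mutex
	inner store
}

func (l *lockedStore) Put(name string, data []byte) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.inner.Put(name, data)
}

func (l *lockedStore) Get(name string) ([]byte, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.inner.Get(name)
}

func main() {
	s := &lockedStore{inner: mapStore{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = s.Put(fmt.Sprintf("bucket/file-%d", i), []byte("x"))
		}(i)
	}
	wg.Wait()
	fmt.Println(len(s.inner.(mapStore)))
}
```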

Performance implications

  • Sequential reads are performant.
  • Sequential writes are performant.
  • Seek + Read or ReadAt is performant after the initial seek. (produces a warning)
  • Alternating reads/writes to the same file handle are highly inefficient. To get consistent FS behavior using an API that separates readers and writers, we close any open readers before a write and close any open writers before a read (to ensure the data is committed).
  • Seek + Write operations such as WriteAt, Truncate, and Seek+Write work as expected but with significant overhead. A seek + write in effect downloads the old file/object, overlays it with the new writes and saves it back. This is done in a streaming fashion, so large files will not clog the memory, but each such operation triggers a full download and upload of the file/object.

0xmichalis changed the title from "Gcs Fs" to "Add GCS Fs implementation" on Dec 22, 2021
0xmichalis merged commit d70f944 into spf13:master on Dec 22, 2021
0xmichalis deleted the gcs branch on December 22, 2021 10:09
@drakkan
Contributor

drakkan commented Dec 27, 2021

Hi,

With this PR merged, all applications that include afero (many applications include it indirectly, for example as a viper dependency) see their binary size increase by about 3.5MB.

Could you please move all GCS-related imports to the gcsfs package, the same way you do for sftpfs? That way only the users who really want to interact with a GCS bucket will pull in these additional dependencies. Thank you.

@0xmichalis
Collaborator Author

Hi @drakkan, feel free to open a PR and I will merge it.

4 participants