Storage: Support s3 #2183

Open
emilylange opened this issue Nov 10, 2019 · 10 comments

Comments

@emilylange

It would be awesome to support S3 storage.
S3 seems to be the more common object storage, and it is much easier to set up in case someone wants to self-host the object storage backend (e.g. using MinIO).

@nono
Member

nono commented Nov 12, 2019

Yes, it would be nice. If someone wants to contribute, don't hesitate to ask me how to start.

It is not a priority for our team, as a very large majority of self-hosters don't want an object storage backend (using the local file system works for them and is less complicated).

@ramnes

ramnes commented Nov 14, 2019

I'm definitely interested in this. I'm currently looking for a "personal cloud" that I can self-host and that my family and I can easily use. Cozy ticks all the boxes except for the storage, as I definitely don't want to handle it myself. Achieving properly highly available, replicated storage is quite time-consuming; it's a job for companies, not individuals (and I guess that's a good selling point for your commercial solution, eh).

I'm thinking of using something based on FUSE to mount an S3-compliant storage as a local filesystem, but I'm concerned about the overall performance as well as the networking costs involved. Also, I'm not sure that existing tools correctly support anything other than Amazon S3 itself. Are you aware of people already doing this who have shared, or could share, their experience?

@nono
Member

nono commented Nov 14, 2019

If you want a storage managed by a company, you can use Swift. For example, OVHcloud has an offer in its public cloud for it: https://www.ovh.com/world/public-cloud/object-storage/.

But, to be honest, I don't think that having a reliable storage without the same thing for the database, the network, etc. is a big deal. And if you want all of that, it is really difficult to do as an individual. Currently, using the Cozy Cloud offers is the only solution to that, but I hope that some organizations can offer alternatives in the future: maybe non-profit organizations like Framasoft, or companies like OVHcloud, could be interested in that.

@ramnes

ramnes commented Nov 14, 2019

> If you want a storage managed by a company, you can use Swift. For example, OVHcloud has an offer in its public cloud for it: https://www.ovh.com/world/public-cloud/object-storage/.

Yes, it's exactly the kind of thing I was thinking of when writing "something based on FUSE to mount an S3-compliant storage as a local filesystem".

> But, to be honest, I don't think that having a reliable storage without the same thing for the database, the network, etc. is a big deal.

I'm not sure what you're trying to say here, but if I understand you correctly, well, I don't care about network downtime or the Cozy interface being temporarily unusable due to a database outage. What I really want is for my personal data to be safe at all costs, and ideally secure (as in encrypted).

So I want guarantees on the storage, and thus I want to offload this to a company. That's why some integration between Cozy and something like S3 makes a lot of sense to me, and probably to others. It would make all of this much simpler.

Also, I don't know how you're currently handling the storage for the hosted service, but such a feature would allow you (the Cozy company) to use something like OpenIO ("cocorico!") or Ceph internally, which are great storage systems that you might want to use if you're not already.

@nono
Member

nono commented Nov 14, 2019

For the hosted service, we are using Swift. That's why I've suggested that you can use it. The code is Open Source: https://github.com/cozy/cozy-stack/tree/master/model/vfs/vfsswift. The documentation about that is really poor, but I think it's not too complicated: you just need to configure it in the cozy.yaml configuration file: https://github.com/cozy/cozy-stack/blob/master/cozy.example.yaml#L60 and it should work.

Our devops have looked at Swift and Ceph (I think OpenIO was too young at that time), and Swift was better for our use case (eventual consistency vs strong consistency).

> So I want guarantees on the storage, and thus I want to offload this to a company. That's why some integration between Cozy and something like S3 makes a lot of sense to me, and probably to others. It would make all of this much simpler.

Well, S3 is an object storage, not a file system. It stores blobs, but there are no directories or filenames. So, if you lose the database, the file contents will still be there in S3, but it won't be easy to know that blob 959f39e0-e929-0137-a9f0-543d7eb8149c is the file /Photos/Christmas 2015/img-1234.jpg. With a few files, you can open them one by one and rename them. With a few thousand, I don't think it is a viable option.
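To make the point about object storage concrete, here is a minimal Go sketch of a flat object store plus a separate path index standing in for the database. The types `objectStore` and `index` and the functions `put`, `lookup`, and `listKeys` are hypothetical illustrations, not cozy-stack code. Once the index is dropped, the blobs are still there, but only their opaque keys remain.

```go
package main

import (
	"fmt"
	"sort"
)

// objectStore models a flat object store such as S3 or Swift:
// opaque keys map to byte blobs, with no directories or filenames.
type objectStore map[string][]byte

// index models the metadata kept in the database (hypothetical):
// it maps human-readable paths to object keys.
type index map[string]string

// put stores content under a blob key and records the path -> key mapping.
func put(store objectStore, idx index, path, key string, content []byte) {
	store[key] = content
	idx[path] = key
}

// lookup resolves a path through the index, then fetches the blob.
func lookup(store objectStore, idx index, path string) ([]byte, bool) {
	key, ok := idx[path]
	if !ok {
		return nil, false
	}
	content, ok := store[key]
	return content, ok
}

// listKeys returns what is left to inspect when the index is lost:
// only opaque keys, sorted for stable output.
func listKeys(store objectStore) []string {
	keys := make([]string, 0, len(store))
	for k := range store {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	store := objectStore{}
	idx := index{}
	put(store, idx, "/Photos/Christmas 2015/img-1234.jpg",
		"959f39e0-e929-0137-a9f0-543d7eb8149c", []byte("jpeg bytes"))

	if _, ok := lookup(store, idx, "/Photos/Christmas 2015/img-1234.jpg"); ok {
		fmt.Println("found via index")
	}

	// Simulate losing the database: the blobs survive, the names do not.
	idx = index{}
	_, ok := lookup(store, idx, "/Photos/Christmas 2015/img-1234.jpg")
	fmt.Println("found after losing index:", ok)
	fmt.Println("remaining keys:", listKeys(store))
}
```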

And, about mounting something like S3 with FUSE: you can, but it will be a lot slower, as S3 is not meant to be used like that. Maybe Ceph can be used that way: I'm not sure about the performance, but it would be a better candidate than S3, as CephFS is an official project.

@ramnes

ramnes commented Nov 14, 2019

> For the hosted service, we are using Swift. That's why I've suggested that you can use it. The code is Open Source: https://github.com/cozy/cozy-stack/tree/master/model/vfs/vfsswift. The documentation about that is really poor, but I think it's not too complicated: you just need to configure it in the cozy.yaml configuration file: https://github.com/cozy/cozy-stack/blob/master/cozy.example.yaml#L60 and it should work.

Oh, everything you said before now makes sense to me! I just didn't know that Cozy supported Swift, and thus I didn't understand why you brought it to the discussion in the first place apart from giving an S3 contender for a FUSE filesystem. Anyway, this is really great, and now I understand why S3 isn't a priority for you.

> Well, S3 is an object storage, not a file system. It stores blobs, but there are no directories or filenames. So, if you lose the database, the file contents will still be there in S3, but it won't be easy to know that blob 959f39e0-e929-0137-a9f0-543d7eb8149c is the file /Photos/Christmas 2015/img-1234.jpg. With a few files, you can open them one by one and rename them. With a few thousand, I don't think it is a viable option.

I know. My point was just that database downtime is not a problem as long as I can recover.

> And, about mounting something like S3 with FUSE: you can, but it will be a lot slower, as S3 is not meant to be used like that. Maybe Ceph can be used that way: I'm not sure about the performance, but it would be a better candidate than S3, as CephFS is an official project.

I know that too. Hence my asking, in my first comment, for feedback from people with experience.

All in all, Swift support is really nice and will probably make me join the bandwagon. But for the community, supporting the S3 protocol as well would make a lot of sense, as it's much more widely used. Half-joke: you could even replace the Swift connector with an S3 connector and then use Swift3 internally so that everyone's happy!

@gedw99

gedw99 commented Sep 11, 2023

Maybe just move to Apache Arrow and Apache Iceberg.

The files are in S3.
The metadata of the file system can be anywhere: SQLite, or manifests (which is how Iceberg does it).
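The idea of keeping the blobs in S3 and the file-system metadata in manifests can be sketched in Go as below. This is a deliberately simplified, hypothetical model loosely inspired by the snapshot/manifest approach; `Manifest`, `Table`, `Commit`, and `At` are made-up names, and this is not the Apache Iceberg format. Each commit produces a new manifest version instead of editing the previous one, which is also what makes the data multi-version.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Manifest is a hypothetical, much simplified metadata record mapping file
// paths to object keys in the blob store. It is NOT the Iceberg format.
type Manifest struct {
	Version int               `json:"version"`
	Files   map[string]string `json:"files"` // path -> object key
}

// Table keeps every committed manifest, so older versions stay readable.
type Table struct {
	snapshots []Manifest
}

// Commit copies the latest manifest, applies the changes, and appends the
// result as a new snapshot instead of mutating the previous one.
// An empty object key in changes marks the path as deleted.
func (t *Table) Commit(changes map[string]string) Manifest {
	next := Manifest{Version: len(t.snapshots) + 1, Files: map[string]string{}}
	if n := len(t.snapshots); n > 0 {
		for p, k := range t.snapshots[n-1].Files {
			next.Files[p] = k
		}
	}
	for p, k := range changes {
		if k == "" {
			delete(next.Files, p)
		} else {
			next.Files[p] = k
		}
	}
	t.snapshots = append(t.snapshots, next)
	return next
}

// At returns the manifest for a given 1-based version, if it exists.
func (t *Table) At(version int) (Manifest, bool) {
	if version < 1 || version > len(t.snapshots) {
		return Manifest{}, false
	}
	return t.snapshots[version-1], true
}

func main() {
	var t Table
	t.Commit(map[string]string{"/Photos/img-1234.jpg": "obj-001"})
	t.Commit(map[string]string{"/Photos/img-1234.jpg": "obj-002", "/notes.txt": "obj-003"})

	// Version 1 still points at the old object; version 2 sees the update.
	v1, _ := t.At(1)
	v2, _ := t.At(2)
	fmt.Println(v1.Files["/Photos/img-1234.jpg"], v2.Files["/Photos/img-1234.jpg"])

	// A manifest serialized like this could itself be stored as an object.
	out, _ := json.Marshal(v2)
	fmt.Println(string(out))
}
```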

Most of it is in Go too, so it's very easy.

It works with good old MinIO, so you can run that locally.

It can also work with a local file system using an adapter.

By the way, it is also a DB: DuckDB can use Arrow and Iceberg. Very, very fast.

DuckDB also runs in a browser as WASM. It's fully supported and fast.

This would radically simplify the architecture and unify it.

I can see that CouchDB is the rock that was chosen for storage. DuckDB has JSON transformers, and it also has search, so it might be a good match for Cozy.

So you get a file system and a DB, all based on S3, that are easy to run.

It's multi-version too :)

I would help, but I'm very bogged down with work. The project goals and code are top class, well done!

@jessebot

For implementing an S3 backend, would it just be a matter of creating another file like the one for Swift here?
https://github.com/cozy/cozy-stack/blob/master/cmd/swift.go

I can probably help with that. Is there anything else I should know?
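One common way to structure pluggable storage is to dispatch on the scheme of a configured URL. The Go sketch below is hypothetical and much smaller than cozy-stack's real VFS: the `Backend` interface, the `constructors` registry, `NewBackend`, and the `s3://endpoint/bucket` URL form are all assumptions for illustration, not the project's API or configuration format.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// Backend is a hypothetical, heavily reduced storage interface used only to
// illustrate dispatch; it is not cozy-stack's actual VFS interface.
type Backend interface {
	Describe() string
}

type localBackend struct{ root string }

func (b localBackend) Describe() string { return "local filesystem at " + b.root }

// s3Backend is hypothetical: an endpoint host plus a bucket name.
type s3Backend struct{ endpoint, bucket string }

func (b s3Backend) Describe() string { return "S3 bucket " + b.bucket + " at " + b.endpoint }

// constructors maps a URL scheme to a backend constructor, so supporting a
// new storage starts with registering one more entry.
var constructors = map[string]func(u *url.URL) (Backend, error){
	"file": func(u *url.URL) (Backend, error) {
		return localBackend{root: u.Path}, nil
	},
	"s3": func(u *url.URL) (Backend, error) {
		bucket := strings.Trim(u.Path, "/")
		if u.Host == "" || bucket == "" {
			return nil, errors.New("s3 URL needs an endpoint host and a bucket")
		}
		return s3Backend{endpoint: u.Host, bucket: bucket}, nil
	},
}

// NewBackend parses the configured storage URL and dispatches on its scheme.
func NewBackend(raw string) (Backend, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	ctor, ok := constructors[u.Scheme]
	if !ok {
		return nil, fmt.Errorf("unsupported storage scheme %q", u.Scheme)
	}
	return ctor(u)
}

func main() {
	for _, raw := range []string{
		"file:///tmp/cozy-storage",
		"s3://minio.local:9000/cozy-files",
		"ftp://example.org/",
	} {
		b, err := NewBackend(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(b.Describe())
	}
}
```

With this layout the dispatch itself stays small; in a real code base, configuration parsing and the backend's full set of file operations would still need to be written.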

@nono
Member

nono commented Dec 4, 2023

There are other files that should be added/changed:

@jessebot

jessebot commented Dec 5, 2023

Thanks, @nono for the in-depth pointers! This is very helpful :)

I will take a shot at this when I have some spare time in a couple of weeks, hopefully, but if anyone else would like to grab this before then, please feel free to.

Projects
None yet
Development

No branches or pull requests

5 participants