Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colon (:) in in object_store::path::{Path} is not handled on Windows #5592

Open
thomasfrederikhoeck opened this issue Apr 4, 2024 · 6 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@thomasfrederikhoeck
Copy link

Describe the bug
When creating a Path on Windows colon (:) is not sanitized like < and |.

To Reproduce

use object_store::path::{Path};

fn main() {
    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\file.parquet".to_string();
    // let path: Result<Path, object_store::path::Error> = Path::parse(path);
    let path_from = Path::from(path);
    println!( "{:?}", path_from);

    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\file.parquet".to_string();
    let path_parse: Result<Path, object_store::path::Error> = Path::parse(path);
    println!( "{:?}", path_parse);

    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\<file|.parquet".to_string();
    // let path: Result<Path, object_store::path::Error> = Path::parse(path);
    let path_from = Path::from(path);
    println!( "{:?}", path_from);

    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\<file|.parquet".to_string();
    let path_parse: Result<Path, object_store::path::Error> = Path::parse(path);
    println!( "{:?}", path_parse);
}
Path { raw: "C:%5Ctable%5Ctime=2021-01-02 03:04:06.000003%5Cfile.parquet" }
Ok(Path { raw: "C:\\table\\time=2021-01-02 03:04:06.000003\\file.parquet" })
Path { raw: "C:%5Ctable%5Ctime=2021-01-02 03:04:06.000003%5C%3Cfile%7C.parquet" }
Ok(Path { raw: "C:\\table\\time=2021-01-02 03:04:06.000003\\<file|.parquet" })

See SO for forbidden chars https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names
Expected behavior
I would have expected the output to be

Path { raw: "C:%5Ctable%5Ctime=2021-01-02 03%3A04%3A06.000003%5Cfile.parquet" }
Ok(Path { raw: "C:\\table\\time=2021-01-02 03:04:06.000003\\file.parquet" })
Path { raw: "C:%5Ctable%5Ctime=2021-01-02 03%3A04%3A06.000003%5C%3Cfile%7C.parquet" }
Ok(Path { raw: "C:\\table\\time=2021-01-02 03:04:06.000003\\<file|.parquet" })

Additional context
This came up here delta-io/delta-rs#2382

@thomasfrederikhoeck thomasfrederikhoeck changed the title Colon (:) in in object_store::path::{Path} is not handled in Windows Colon (:) in in object_store::path::{Path} is not handled on Windows Apr 4, 2024
@tustvold tustvold added enhancement Any new improvement worthy of a entry in the changelog and removed bug labels Apr 5, 2024
@tustvold
Copy link
Contributor

tustvold commented Apr 5, 2024

This would be a breaking change but probably makes sense on balance

@thomasfrederikhoeck
Copy link
Author

@tustvold I guess it would be breaking but would anyone rely on a path that can't exists? Or does this sanitizing happen for the same set of chars across all platforms?

@tustvold
Copy link
Contributor

tustvold commented Apr 5, 2024

It is the same set of chars across all platforms, although I suppose we could just add some special logic for windows within LocalFilesystem...

@thomasfrederikhoeck
Copy link
Author

thomasfrederikhoeck commented Apr 8, 2024

@tustvold if you think that solution is good I can give it a go. Might be a little out of my leauge.

@tustvold
Copy link
Contributor

tustvold commented Apr 8, 2024

I think probably we should add specific handling within LocalFilesystem on Windows. Just because Windows filesystems are a complete mess, I don't think that shouldn't leak out. This would also avoid a breaking change

Perhaps we could explicitly escape them or something on Windows platforms? Is there standard way the windows ecosystem works around this limitation?

@thomasfrederikhoeck
Copy link
Author

@tustvold I did a lot of googling and from what I can tell there is no default way. I think it is a good route to go to just explicitly escape them. Maybe just starting with : since I expect it to be much more heavily used for data given it is used for timestamps compared to the other chars.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

2 participants