Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding more date types #5444

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

Lordworms
Copy link
Contributor

supporting more date32 types

adding tests

Which issue does this PR close?

Closes #5442

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

supporting more date32 types

adding tests
@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 29, 2024
@Lordworms
Copy link
Contributor Author

Lordworms commented Feb 29, 2024

I choose to use regex for multi-match, could it cause much efficiency decrease?

@tustvold
Copy link
Contributor

Have you run the date parsing benchmarks against this change, I would expect the code in this PR to represent a fairly significant performance regression

@Lordworms
Copy link
Contributor Author

Have you run the date parsing benchmarks against this change, I would expect the code in this PR to represent a fairly significant performance regression

could you please tell me how to trigger it in CI/CD ?

@tustvold
Copy link
Contributor

tustvold commented Feb 29, 2024

could you please tell me how to trigger it in CI/CD ?

We don't have automated benchmarks in CI/CD at the moment, but they can be run with cargo bench.

In this case something like this should work

RUSTFLAGS="-C target-cpu=native"  cargo bench -p arrow-cast --bench parse_date

@Lordworms
Copy link
Contributor Author

could you please tell me how to trigger it in CI/CD ?

We don't have automated benchmarks in CI/CD at the moment, but they can be run with cargo bench.

In this case something like this should work

RUSTFLAGS="-C target-cpu=native"  cargo bench -p arrow-cast --bench parse_date

Got it, I will test it now, Thanks a lot.

@Lordworms
Copy link
Contributor Author

could you please tell me how to trigger it in CI/CD ?

We don't have automated benchmarks in CI/CD at the moment, but they can be run with cargo bench.

In this case something like this should work

RUSTFLAGS="-C target-cpu=native"  cargo bench -p arrow-cast --bench parse_date

Yeah, that regressed a lot, I will find another way to do it
image

@tustvold
Copy link
Contributor

Marking as a draft for now, feel free to mark ready for review when you would like me to take another look

@tustvold tustvold marked this pull request as draft February 29, 2024 20:15
return None;
}
(digits[5], digits[7] * 10 + digits[8])
use regex::Regex;
Copy link
Contributor

@Dandandan Dandandan Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to precompile / reuse the regex. That might return most of the performance (although manual implementation probably would still be more efficient).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd expect using a regex, particularly with capture groups, to be slower than something hand-rolled here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll add some hand-rolled next week

@v1gnesh
Copy link

v1gnesh commented Mar 7, 2024

@BurntSushi was looking at / thinking about datetime stuff in Rust. Tagging him in case there's room for collab / idea sharing.

Copy link

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BurntSushi was looking at / thinking about datetime stuff in Rust. Tagging him in case there's room for collab / idea sharing.

Was there something specific you had in mind?

(digits[5], digits[7] * 10 + digits[8])
use regex::Regex;
let re = Regex::new(
r"^(?:(\d{1,2})[/\-](\d{1,2})[/\-](\d{1,4})|(\d{1,4})[/\-](\d{1,2})[/\-](\d{1,2}))$",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want \d here? \d is Unicode-aware by default. So in addition to matching 3, for example, it will also match 𝟛.

return None;
}
(digits[5], digits[7] * 10 + digits[8])
use regex::Regex;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd expect using a regex, particularly with capture groups, to be slower than something hand-rolled here.

@v1gnesh
Copy link

v1gnesh commented Mar 8, 2024

Was there something specific you had in mind?

@BurntSushi Not particularly 😅
I just thought that with you mulling over date/time types in Rust, and with similar work (adding date/time types) happening in arrow-rs, there may be opportunities to reduce duplication, share ideas, implementations and such.

@BurntSushi
Copy link

Was there something specific you had in mind?

@BurntSushi Not particularly 😅 I just thought that with you mulling over date/time types in Rust, and with similar work (adding date/time types) happening in arrow-rs, there may be opportunities to reduce duplication, share ideas, implementations and such.

I'm working on a new datetime library. I'm still in the "proving it out stage" and I don't know if it will work out. But if it does, it's months away.

@BurntSushi
Copy link

I think if you have specific questions, I'd be happy to try and field them, but otherwise I don't really know anything about what y'all are doing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support more date inferences in the csv reader
5 participants