prepare fastfield format for null index #1691

Merged
merged 5 commits into from Nov 28, 2022

Conversation

PSeitz
Contributor

@PSeitz PSeitz commented Nov 22, 2022

With the upcoming null handling/sparse data, the format will change for fast fields.
This PR serializes a null index footer that reflects a Full column, to avoid compatibility issues with the next tantivy version.

@fulmicoton
Collaborator

fulmicoton commented Nov 22, 2022

I don't understand what this PR is about. Can you edit the description and the commit message? @PSeitz

@PSeitz
Contributor Author

PSeitz commented Nov 23, 2022

> I don't understand what this PR is about. Can you edit the description and the commit message? @PSeitz

I modified the commit message

Collaborator

@fulmicoton fulmicoton left a comment

I don't think this is putting us on the right track for backward compatibility.

Here are the issues I have:

  • The footer is constant length.
  • It implements stuff for one future that is not there.
  • The footer is 11 bytes long and replicated for every column. Once we have sparse columns we may end up with a lot of columns, so 11 bytes is a bit wasteful (not so terrible, but well).

I think we could...
End the fastfield file by a codec version and a magic number.

A codec object would have a fn load(&self, data: OwnedBytes) -> Result<_, _> instead of the load method in the fastfield_codecs crate.
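
A minimal sketch of the "codec version and magic number" idea, for illustration only. The constants `MAGIC_NUMBER` and `FORMAT_VERSION`, the 3-byte layout, and the function names `append_footer` and `read_footer` are all assumptions made for this example and are not taken from tantivy.

```rust
use std::io::{self, Write};

// Hypothetical values chosen for this sketch; not tantivy's actual constants.
const MAGIC_NUMBER: u16 = 4335;
const FORMAT_VERSION: u8 = 1;
// Assumed layout: 1 byte version followed by a 2 byte little-endian magic number.
const FOOTER_LEN: usize = 3;

/// Appends the version and magic number at the very end of the file.
pub fn append_footer<W: Write>(writer: &mut W) -> io::Result<()> {
    writer.write_all(&[FORMAT_VERSION])?;
    writer.write_all(&MAGIC_NUMBER.to_le_bytes())?;
    Ok(())
}

/// Splits the footer off the end of `data`, validates it, and returns
/// the remaining body together with the format version that was read.
pub fn read_footer(data: &[u8]) -> io::Result<(&[u8], u8)> {
    if data.len() < FOOTER_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "file too short to contain a footer",
        ));
    }
    let (body, footer) = data.split_at(data.len() - FOOTER_LEN);
    let magic = u16::from_le_bytes([footer[1], footer[2]]);
    if magic != MAGIC_NUMBER {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "magic number mismatch",
        ));
    }
    let version = footer[0];
    if version != FORMAT_VERSION {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unsupported format version {}", version),
        ));
    }
    Ok((body, version))
}
```

Because the footer sits at a fixed offset from the end of the file, a reader can check the version before interpreting any column data, and a future format only has to bump the version instead of reserving space in every column.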

@PSeitz
Contributor Author

PSeitz commented Nov 24, 2022

> It implements stuff for one future that is not there.

It just describes the current state, which will be compatible with the future.

> The footer is 11 bytes long and replicated for every column. Once we have sparse columns we may end up with a lot of columns. 11 bytes is a bit wasteful (not so terrible but well)

It's 18 bytes, which is ~2 KB for 100 columns. I don't think this will ever be relevant. No problem to change it though.

> I think we could...
> End the fastfield file by a codec version and a magic number.

I think that's a good idea in general and should be added.

The downside of doing that instead is that it introduces a new version, and with it all the annoying version handling, when one version will be fine.

> A codec object would have a fn load(&self, data: OwnedBytes) -> Result<_, _> instead of the load method in the fastfield_codecs crate.

We already have that:

fn open_from_bytes(mut data: OwnedBytes, header: NormalizedHeader) -> io::Result<Self::Reader>

@codecov-commenter

codecov-commenter commented Nov 25, 2022

Codecov Report

Merging #1691 (7ce6abe) into main (600548f) will decrease coverage by 0.00%.
The diff coverage is 92.07%.

@@            Coverage Diff             @@
##             main    #1691      +/-   ##
==========================================
- Coverage   94.05%   94.05%   -0.01%     
==========================================
  Files         256      258       +2     
  Lines       49253    49400     +147     
==========================================
+ Hits        46324    46461     +137     
- Misses       2929     2939      +10     
| Impacted Files | Coverage Δ |
|---|---|
| fastfield_codecs/src/format_version.rs | 59.25% <59.25%> (ø) |
| fastfield_codecs/src/null_index_footer.rs | 97.77% <97.77%> (ø) |
| common/src/serialize.rs | 86.90% <100.00%> (+0.48%) ⬆️ |
| fastfield_codecs/src/compact_space/mod.rs | 96.78% <100.00%> (+0.02%) ⬆️ |
| fastfield_codecs/src/lib.rs | 98.89% <100.00%> (+<0.01%) ⬆️ |
| fastfield_codecs/src/serialize.rs | 87.44% <100.00%> (+0.80%) ⬆️ |
| ownedbytes/src/lib.rs | 98.65% <100.00%> (+0.02%) ⬆️ |
| src/fastfield/mod.rs | 99.73% <100.00%> (ø) |
| src/schema/schema.rs | 98.79% <0.00%> (+0.15%) ⬆️ |
| src/fastfield/multivalued/mod.rs | 99.22% <0.00%> (+0.77%) ⬆️ |

self.cardinality.serialize(writer)?;
self.null_index_codec.serialize(writer)?;
VInt(self.null_index_byte_range.start).serialize(writer)?;
VInt(self.null_index_byte_range.end).serialize(writer)?;
Collaborator

Can you switch to encoding end - start instead of end?
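
A rough sketch of why encoding `end - start` helps with a variable-length integer. The varint below is a generic LEB128-style encoding written for this example; it is an assumption and not necessarily the byte layout of tantivy's `VInt`. The helper names `write_varint`, `read_varint`, `serialize_range` and `deserialize_range` are hypothetical.

```rust
use std::ops::Range;

/// LEB128-style varint: 7 payload bits per byte, with the high bit set
/// on every byte except the last one.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Reads one varint starting at `*pos` and advances `*pos` past it.
/// Returns `None` on truncated or overlong input.
fn read_varint(data: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *data.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Writes `start`, then the length `end - start` rather than the absolute `end`.
/// Assumes `range.start <= range.end`.
pub fn serialize_range(range: &Range<u64>, out: &mut Vec<u8>) {
    debug_assert!(range.start <= range.end);
    write_varint(range.start, out);
    write_varint(range.end - range.start, out);
}

/// Reads `start` and the length, and rebuilds the absolute `end`.
pub fn deserialize_range(data: &[u8]) -> Option<Range<u64>> {
    let mut pos = 0;
    let start = read_varint(data, &mut pos)?;
    let len = read_varint(data, &mut pos)?;
    Some(start..start.checked_add(len)?)
}
```

With this varint, the range `1_000_000..1_000_100` takes 4 bytes when stored as start plus length, versus 6 bytes when both absolute offsets are stored, because the length is usually much smaller than the offset.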

@fulmicoton fulmicoton merged commit 1119e59 into main Nov 28, 2022
@fulmicoton fulmicoton deleted the prepare_ff_format branch November 28, 2022 08:15
ppodolsky pushed a commit to izihawa/tantivy that referenced this pull request Dec 8, 2022
* prepare fastfield format for null index
* add format version for fastfield
* Update fastfield_codecs/src/compact_space/mod.rs
* switch to variable size footer
* serialize delta of end
This was referenced Jan 13, 2023