Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python-bindings|prover_disk|tests: Add pickle support for DiskProver #303

Closed

Conversation

xdustinface
Copy link
Contributor

This is needed for plots caching in chia-blockchain.

Based on #302

Copy link
Contributor

@arvidn arvidn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new constructor should validate its inputs, and have tests ensuring it rejects invalid inputs correctly. (I.e. we shouldn't trust the pickled file to be uncorrupted or trustworthy)

python-bindings/chiapos.cpp Show resolved Hide resolved
python-bindings/chiapos.cpp Show resolved Hide resolved
python-bindings/chiapos.cpp Outdated Show resolved Hide resolved
src/prover_disk.hpp Show resolved Hide resolved
src/prover_disk.hpp Show resolved Hide resolved
@xdustinface xdustinface force-pushed the pr-pickle-disk-prover branch 3 times, most recently from fc8b780 to df9990d Compare August 9, 2021 17:07
@arvidn
Copy link
Contributor

arvidn commented Aug 9, 2021

I think it's a bit weird to lock the mutex in the move constructor, just like in the destructor. Another thread better not have a reference to the object as it's about to be destructed. If it does, locking the murex won't help.

Some of the getters seems to copy the vectors a bit gratuitously as well.

@arvidn
Copy link
Contributor

arvidn commented Aug 9, 2021

My comment in the G1 pickle support applies here too. Can we rely on pickle being a stable format across python versions?

We need to ensure we don't accidentally break backwards compatibility.

@xdustinface
Copy link
Contributor Author

xdustinface commented Aug 9, 2021

Another thread better not have a reference to the object as it's about to be destructed.

Hmm better not yeah but how to prevent it? If we don't lock it in the move constructor we could end up moving the object while another thread has the mutex locked i.e. its still fetching a proof or so (relies on the data we are about to move), no?

My comment in the G1 pickle support applies here too. Can we rely on pickle being a stable format across python versions?

We need to ensure we don't accidentally break backwards compatibility.

The soon coming change in chia-blockchain will cache the following data structure:

@dataclass
class Cache:
    version: int
    plots: Dict[Path, PlotInfo]
    plot_filename_paths: Dict[str, Tuple[str, Set[str]]]
    failed_to_open_filenames: Dict[Path, int]
    no_key_filenames: Set[Path]

which contains the version field i guess that's good enough?

@arvidn
Copy link
Contributor

arvidn commented Aug 9, 2021

Another thread better not have a reference to the object as it's about to be destructed.

Hmm better not yeah but how to prevent it? If we don't lock it in the move constructor we could end up moving the object while another thread has the mutex locked i.e. its still fetching a proof or so (relies on the data we are about to move), no?

My point is that locking the mutex won't help in that case. We have UB either way. So, we could assert that the mutex isn't locked maybe, but fundamentally, I think we just have to assume the users of this class aren't doing shady stuff with it.

We need to ensure we don't accidentally break backwards compatibility.

which contains the version field i guess that's good enough?

If the pickle file format changes, you won't ever get to see the version number.

Using our Streamable format may be an alternative, but that would create a dependency on the chia-blockchain repo which would be a hassle.

@xdustinface
Copy link
Contributor Author

My point is that locking the mutex won't help in that case. We have UB either way. So, we could assert that the mutex isn't locked maybe, but fundamentally, I think we just have to assume the users of this class aren't doing shady stuff with it.

I mean, technically there is still a possible case where its no UB if we lock the mutex like if we have two threads and we trigger stop for both, one of them has currently the mutex locked and the other wants to move the object to a shutdown thread or whatever idea one might have, right? But if we don't lock the mutex this case would be an UB. Not that this is super likely but its there and a reason to just lock it unless i miss something.

If the pickle file format changes, you won't ever get to see the version number.

You can still always use older protocols if there will be a non-backwards compatible protocol bump at some point.

https://docs.python.org/3/library/pickle.html#module-pickle
....
The pickle serialization format is guaranteed to be backwards compatible across Python releases provided a compatible pickle protocol is chosen and pickling and unpickling code deals with Python 2 to Python 3 type differences if your data is crossing that unique breaking change language boundary.
....

Using our Streamable format may be an alternative, but that would create a dependency on the chia-blockchain repo which would be a hassle.

Thats what i started with... but not to add Streamable as dependency here but rather just to implement a __bytes__ and a from_bytes for DiskProver in the python bindings by adding serialization to the C++ part and have DiskProver serialized as Streamable in chia-blockchain then. But this also required to change a lot stuff in the Streamable implementation because it doesn't support Dict, Set and the dynamic custom types at this point. I got the custom types working kind of by prepending the size of the serialized data in the pybind implementation before returning in __bytes__ but for the Dict part i ended up in a blind end at some point where i wasn't able to de-serialize it because of some weird "isinstance can't be used for nested type annotations" thing. I even already spent quite some time implementing basic serialization for the C++ part here xdustinface@a84ad93 and for DiskProver itself here xdustinface@748c3df. Thats what i also wanted to use for the implementation of the pickle support initially but basically the moment i wanted to submit the PR for the serialization part i realized its simpler to just add another constructor for DiskProver to make it simpler and add less new code 😄

@xdustinface xdustinface marked this pull request as ready for review August 10, 2021 00:48
@lgtm-com
Copy link

lgtm-com bot commented Aug 10, 2021

This pull request introduces 1 alert when merging 44494f8 into 97aa105 - view on LGTM.com

new alerts:

  • 1 for Comparison of narrow type with wide type in loop condition

@arvidn
Copy link
Contributor

arvidn commented Aug 13, 2021

I think the EnsureValid() patch should be reverted.

Generally, classes should avoid using 2-phase initialization. If a constructor fails to establish the class invariant, it should fail. That way, all instances of a class are known good, no half-initialized objects.

It seems the reason why you want the DiskProver to have such "invalid" or half-initialized state is because the de-serialization code (de-pickling) isn't controlled by you, and it doesn't support the concept of individual items in a list failing, without failing everything.

The modern alternative to 2-phase initialization is to use std::optional<DiskProver>. If all of that is too impractical, I think the impact of the compromise to use 2-phase initialization should at least be mitigated and isolated. The EnsureValid() patch exposes the possibly-uninitialized nature of this class to all of its users, that's why I think it should be reverted.

exposing is_valid() to allow filtering out objects that failed to load immediately after de-pickling seems like a reasonable compromise.

@xdustinface xdustinface force-pushed the pr-pickle-disk-prover branch 2 times, most recently from 8f3bee4 to f01ba97 Compare August 24, 2021 02:22
@xdustinface
Copy link
Contributor Author

xdustinface commented Aug 24, 2021

@arvidn I restructured the commit history a bit and put some stuff into separate commits.

The modern alternative to 2-phase initialization is to use std::optional<DiskProver>. If all of that is too impractical, I think the impact of the compromise to use 2-phase initialization should at least be mitigated and isolated. The EnsureValid() patch exposes the possibly-uninitialized nature of this class to all of its users, that's why I think it should be reverted.

Unfortunately it seems like its not possible to use std::optional with pybind/pickle here. I tried a bit but couldn't get it working in reasonable time. If you have a solution feel free to provide a commit. Otherwise, see below.

I think the EnsureValid() patch should be reverted.

I refactored it, see b395860. Note that the alternative constructor still throws the same way like it did before but the new DiskProver::Validate is not longer enforced by all class methods, it just throws if it gets called + there is still the non-throwing DiskProver::IsValid.

and a `is_valid` binding.

Allows to validate if the prover's file exists and matches expectations.
Copy link
Contributor

@arvidn arvidn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me how or why one would use the Validate() or is_valid() functions. It would seem simpler to just attempt to look up a proof and have it fail. It's fine to defer the failure until we actually need to do the lookup, right?

@@ -47,74 +47,52 @@ struct plot_header {
// The DiskProver, given a correctly formatted plot file, can efficiently generate valid proofs
// of space, for a given challenge.
class DiskProver {
const size_t nTableBeginPointerCount{11};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to add an immutable member like this. This makes DiskProver objects not movable.
Did you mean to make this static?
Also, I don't think we should use hungarian notation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes DiskProver objects not movable.

What do you mean? Maybe not by default but its still movable 🤷‍♂️ However, i agree, it makes sense to have it static, see aa66d3a.

Also, I don't think we should use hungarian notation.

I dropped it in 96f0445 to stay consistent with the codebase, but whats your point here? I'm just curious because i'm used to always use it for primitives or standard container types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is widespread agreement that encoding type information in variable names is undesirable, as documented in the cpp core guidelines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Maybe not by default but its still movable

Right, I noticed later that you implement the move constructor explicitly, and just don't touch this member.

for (uint32_t i = 0; i < c2_entries - 1; i++) {
SafeRead(disk_file, c2_buf, c2_size);
this->C2.push_back(Bits(c2_buf, c2_size, c2_size * 8).Slice(0, k).GetValue());
uint32_t nExpectedSize = ((this->table_begin_pointers[10] - this->table_begin_pointers[9]) / (Util::ByteAlign(k) / 8)) - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 94cad1a

} else {
throw std::invalid_argument("Invalid plot file format");
if (k < kMinPlotSize || k > kMaxPlotSize) {
throw std::invalid_argument("Invalid k: " + std::to_string(k));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment above this constructor suggests the inputs are not validated, but the seem to be, at least to some extent. Are there some aspects left that aren't validated, or is the comment outdated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not say inputs are not validated. It says the inputs are not validated against the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

std::vector<uint64_t> table_begin_pointers;
std::vector<uint64_t> C2;

void Read(std::vector<uint8_t>& id_in, std::vector<uint8_t>& memo_in, uint8_t& k_in, std::vector<uint64_t>& table_begin_pointers_in, std::vector<uint64_t>& C2_in) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the names of these parameters confuse me. The _in suffix suggests they are inparameters, but they all appear to be output parameters. Did you consider returning them in a tuple, as multiple return values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like a tuple would make this more confusing than better. See a9ca083.

SafeRead(disk_file, memo_in.data(), memo_in.size());

table_begin_pointers_in = std::vector<uint64_t>(nTableBeginPointerCount, 0);
C2_in = std::vector<uint64_t>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only benefit I can think of, of passing these vectors in as out-parameters is that they can reuse memory they may already have allocated. However, when you assign them like this, rather than .resize(0), you defeat that on benefit.

I would think, though, that these allocations are cheap anyway, and that it would make sense to return them instead of taking them as out-parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use the resize approach here because i wanted to save one line 😄 (setting the contents or at least the first element of table_begin_pointers_out to zero) and it obviously doesn't really matter given the size of this vectors.

std::vector<uint64_t> table_begin_pointers;
std::vector<uint64_t> C2;

void Read(std::vector<uint8_t>& id_in, std::vector<uint8_t>& memo_in, uint8_t& k_in, std::vector<uint64_t>& table_begin_pointers_in, std::vector<uint64_t>& C2_in) const
{
std::lock_guard<std::mutex> lock(_mtx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mutex didn't used to be locked when reading this, as far as I can tell. This is only called from the filename constructor, right? I see it's also called from Validate(). It looks like this mutex is unneccessary though, but we can leave that for a future patch.

C2_in.push_back(Bits(c2_buf, c2_size, c2_size * 8).Slice(0, k).GetValue());
}

delete[] c2_buf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this leaks if SafeRead() throws an exception. This was an issue before your patch, I just want to make a note to remember to fix it later

@xdustinface
Copy link
Contributor Author

It's not clear to me how or why one would use the Validate() or is_valid() functions. It would seem simpler to just attempt to look up a proof and have it fail. It's fine to defer the failure until we actually need to do the lookup, right?

I'm using it to drop invalid provers after cache loading, see Chia-Network/chia-blockchain@3b0b971#diff-1c950ad1eb2a79b51cb4f61cf39b8fe8d09d28ad963ce7dd24107053ff316d14R116

@arvidn
Copy link
Contributor

arvidn commented Aug 24, 2021

I'm using it to drop invalid provers after cache loading,

It would seem to me that doing that would defeat the purpose of caching these entries in the first place. If we're reading it back from the files unconditionally, to validate it, couldn't we just do that to begin with, and not also read the pickled cache?

Perhaps I'm missing something

@arvidn
Copy link
Contributor

arvidn commented Aug 27, 2021

is this PR abandoned now?

@xdustinface
Copy link
Contributor Author

is this PR abandoned now?

Yes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants