CHIP-0033: Add additional partial headers #114
Thanks, @felixbrucker! This CHIP has been assigned CHIP-33. It is now a |
Let me add a few use cases where this would greatly help the pool operator day-to-day:
(If it makes sense to add any of that directly into the CHIP, anyone please feel free to re-use the above as you see fit.) |
Don't you get the IP address of the farmer when it contacts the pool - either in the originating source address or maybe X-Forwarded-For? One problem is that the harvester may be harvesting plots that are not associated with your pool at all - so the various |
We do, but that's usually not helpful, as they typically all share the same outgoing public IP.
Yes, that's true for the capacity/plot count headers and is also addressed in the CHIP: "It will make sense to also add capacities and plot counts scoped to the plotnft of the proof of the partial". It'd probably make sense only sending these stats scoped to the plotnft/launcherid of the partial. |
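To illustrate the scoping discussed above, here is a minimal Python sketch. It assumes a hypothetical `Plot` record with `launcher_id` and `size_bytes` fields and a hypothetical helper `scoped_stats`; none of these names are taken from chia-blockchain or the PR.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Plot:
    # Hypothetical minimal plot record, not the chia-blockchain data model.
    launcher_id: Optional[str]  # None for plots not tied to any plotnft
    size_bytes: int


def scoped_stats(plots: List[Plot], launcher_id: str) -> Tuple[int, int]:
    """Return (plot_count, capacity_bytes) for plots of one launcher id only."""
    matching = [p for p in plots if p.launcher_id == launcher_id]
    return len(matching), sum(p.size_bytes for p in matching)
```

With this kind of filter, plots that belong to other plotnfts or to no pool at all never contribute to the numbers sent alongside a partial.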
X-Forwarded-For should have the original internal IP address in most cases - it does for me (although in my case it is an IPv6 addy). At least this is my understanding... |
I might suggest using some RFC language - perhaps saying something like implementations SHOULD (MUST?) only send |
X-Forwarded-For is added by HTTP proxies in the chain, and they use the IP they see. If you have a private/NATed IP (as is typical for IPv4 end user sites), all a proxy (or the final HTTP server) will see is the public IP you were NATed to. (The NAT itself operates on a lower layer and doesn't add HTTP headers.) For IPv6 the situation can be slightly different, if you have a full public subnet (like a /64 or /56) delegated to your end user site and your internal devices actually use that. (I assume that your IPv6 address being visible externally falls under this case.) |
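The NAT point above can be sketched in a few lines of Python. The helper names `client_ip_from_xff` and `distinct_clients` are hypothetical, and the addresses are documentation-range placeholders.

```python
from typing import List, Set


def client_ip_from_xff(header_value: str) -> str:
    # X-Forwarded-For is a comma-separated list; the left-most entry is the
    # address the first proxy saw as the client.
    return header_value.split(",")[0].strip()


def distinct_clients(xff_values: List[str]) -> Set[str]:
    # Collect the distinct client addresses seen across several requests.
    return {client_ip_from_xff(value) for value in xff_values}


# Two farmers behind the same IPv4 NAT: both arrive with the same public IP.
behind_nat = ["203.0.113.7", "203.0.113.7, 198.51.100.20"]

# Two farmers using publicly routed IPv6 addresses from a delegated subnet.
ipv6_site = ["2001:db8:1::10", "2001:db8:1::11"]
```

Here `distinct_clients(behind_nat)` collapses to a single address, while `distinct_clients(ipv6_site)` yields two, which matches the IPv4 versus IPv6 difference described above.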
Yes, as earl already said, the farmer IP is neither suitable nor available for identifying the farmer (I'm assuming that's the point of bringing it up?) |
In the CHIP I just documented the current PR's state. Iirc, earl has code for scoped capacity/plot counts based on the PR, which can and should be integrated. Whether they are additive (both scoped and unscoped are included) or replacing (unscoped headers are not present) I'll leave up for debate; I'm fine with either. They should have distinctive names though, indicating that it is the scoped capacity/plot count (which is the case for earl's code). Maybe it makes sense to link it in here as well, and merge into the PR based on what the community prefers (see above). |
My concern here is that, by providing the farmer peer id, this proposal makes it harder for me as a farmer to protect my privacy. Not all farmers want to "show their hand", and make public the total amount of space they are farming. As it is, with the harvester id in the header, I have to run different harvesters to mask my total space: otherwise that value can be used to correlate me across different plotnfts and even across different pools. Running multiple harvesters is well-documented, so that is not an insurmountable burden, and large farmers generally do that anyway. But if I now also have to run multiple farmers to avoid being correlated, that is a whole additional level of effort. If this proposal goes through, then people running a normal single-farmer setup will suddenly be completely exposed for the amount of space they have. I think privacy should be the default. We shouldn't be causing people to automatically give out information that can be used to identify them. Maybe there are other approaches that could accomplish the same thing on an opt-in basis? Something as simple as a user-settable farmer "name" that is only returned if set? Then people can choose to allow their farmers to be identified if they want to. Ideally we'd also replace harvester id with a similar setup, but I suspect that's no longer possible at this point. |
Is this referring to the estimated space on pool A and B with different plot nfts but same farmer being able to be matched, assuming the farmer id is publicly accessible through both pools?
To be fair users of a normal setup generally do not split their harvesters to mask their capacity on different pools, and as such are already "completely exposed".
We can already identify the pseudonym by launcher id and harvester id, farmer id is just another part of the already established chain of software components in a farming setup which is not identified yet.
Having a name is confusing, because a name is not an identifier. If the concern is that the farmer id is matchable across pools, why not combine it with a part of the plotnft? Then it is scoped per plotnft but unique per farmer. We just need to make sure some part of it is still easily identifiable/matching the farmer peer id for humans, so users can match a pool-side farmer to a physical farmer and give it an appropriate name.
That would be a breaking change in the pooling protocol |
Let me recap, just to make sure I understand the scenario correctly: a person farms a certain amount of space and wants to mask the total amount. So this person splits this space over several different plotnfts (to circumvent the obvious matching by launcher id) and also uses multiple different harvesters (to circumvent matching by harvester id). That person may also distribute the different plotnfts over different pools. For this case, the proposed change would clearly be a change in identifiability. The person would have to adapt their setup to also use multiple different farmers, analogous to the harvesters, to circumvent matching by farmer id. Did I get that right? |
Yes, exactly. Thank you for being so much more succinct. |
Yes.
True, small farmers probably don't care. But multiple harvesters is very common on >100TiBe farms.
Right! The last part of that chain of software components is the node id, at which point the farmer would be completely identified. And to avoid identification, the farmer would then have to run completely separate farms, which would be a real pain.
I think the idea you're proposing is to concatenate the plotnft id and the farmer id, then hash that? I agree that would work.
I know. And that makes it impossible. I don't know what requirement prompted it to be included originally, but I would have proposed not including it, if I had been aware at the time. |
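The "concatenate the plotnft id and the farmer id, then hash that" idea from this comment could look roughly like the sketch below. This is only an illustration under stated assumptions: both ids are treated as fixed-length byte strings, SHA-256 is picked arbitrarily, and the function name `scoped_farmer_id` is hypothetical. The thread does not specify the actual construction.

```python
import hashlib


def scoped_farmer_id(launcher_id: bytes, farmer_peer_id: bytes) -> bytes:
    # Assumption: plain SHA-256 over the launcher id followed by the farmer
    # peer id. Fixed-length inputs keep the concatenation unambiguous.
    return hashlib.sha256(launcher_id + farmer_peer_id).digest()


# The same farmer yields unrelated-looking ids under two different plotnfts,
# while the id stays stable for a given (plotnft, farmer) pair.
farmer = bytes.fromhex("aa" * 32)
id_for_nft_1 = scoped_farmer_id(bytes.fromhex("01" * 32), farmer)
id_for_nft_2 = scoped_farmer_id(bytes.fromhex("02" * 32), farmer)
```

Note that a bare hash does not by itself keep any part visibly matching the farmer peer id, so the farmer software would probably have to display the scoped value for users to match it on the pool side.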
Yes, but the reasoning for multiple harvesters is generally not to mask their capacity on different pools but to split resources/hdd access |
Agreed! I think we're saying the same thing on this point. :-) |
Thanks for clarifying, @fizpawiz. So a suggested way forward:
|
I like this idea, personally. Would love to have the option to do the same with the harvester id.
I'm still confused why anyone wants the stats. Self-reported values can be manipulated, especially when the code reporting them is open-source. What happens when someone tweaks their rig to report more space than they have, then falsely claims the pool is ripping them off, pointing to the statistics as "proof"? Or tweaks to under-report space, and then claims to have found some "weird trick" to improve returns? "I'll tell you the trick for just $99!" The partials are the proof of space, which can be trusted. If the reported space is used for ranking farmers on the pool leaderboard, you can be sure people will "adjust" their reported space to move up. Now the pools will need to add code to catch people faking their reported space, and decide what to do when they detect it. And what if it is close, and so it is hard to be sure? Most pools take 1%, so a small tweak by an unscrupulous large farmer can really make the pool look bad, or good, at the whim of the farmer. And accusations would be difficult, or impossible, to prove one way or the other, making such accusations in public potentially very sticky from a PR perspective.
Yes, that would be appropriate and important if sending statistics. |
There are many reasons. For example, users want to see how their farm performs compared to its actual capacity, get notified if their plot count changes or drops, monitor (re)plotting progress, etc. If you don't need it you can disable it, not a problem.
Nothing, nothing happens
Nobody said anything of the like. As you said, it's a self-reported metric that cannot be trusted, so why would anyone use it for leaderboards?
Pools use effort to track their performance, not unverifiable user-reported metrics, so no, he cannot; he can only make his own farm look good or bad.
Nothing changes to how it is right now with the exception that the farmer in question would screenshot his pool harvester page instead of his chia gui to show his "real" capacity |
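One of the use cases mentioned in this comment, being notified when the reported plot count drops, could be sketched as follows. The function name `plot_count_dropped` and the `tolerance` parameter are hypothetical, and the input is assumed to be the self-reported count from consecutive partials.

```python
from typing import Optional


def plot_count_dropped(previous: Optional[int], current: int, tolerance: int = 0) -> bool:
    """Return True if the reported plot count fell by more than `tolerance`."""
    if previous is None:
        # First report from this harvester: nothing to compare against.
        return False
    return previous - current > tolerance
```

Since the value is self-reported, a check like this only helps a farmer watch their own setup; it says nothing the pool could rely on.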
What are the next steps for this? It has been a while since the last update. Can we review/merge the PR in chia-blockchain now? |
...
Makes sense to me. I worry about pools using this self-reported info, but that's up to them to protect from users gaming the pool by tweaking the stats. I think we landed on a plan forward, @xearl4 so clearly described above. Would love to also have the option to mask the harvester id too. I don't actually know the CHIP process very well. What is the next step, and who takes it? |
At this point it sounds like we have general consensus from the community (other than potentially masking the harvester ID), and we have an implementation in place. I'll move the CHIP to I'll leave the CHIP in Assuming no additional changes are needed after two weeks in Best case scenario, this CHIP is finalized and the implementation is in |
This CHIP is now in |
This is the corresponding CHIP for Chia-Network/chia-blockchain#17788