Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHIP-0033: Add additional partial headers #114

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

felixbrucker
Copy link

This is the corresponding CHIP for Chia-Network/chia-blockchain#17788

@danieljperry danieljperry changed the title Add additional partial headers CHIP-0033: Add additional partial headers Mar 29, 2024
@danieljperry
Copy link
Contributor

Thanks, @felixbrucker! This CHIP has been assigned CHIP-33. It is now a Draft. Feel free to leave a review here, or to discuss it in the #chips channel of our Discord.

@xearl4
Copy link

xearl4 commented Mar 30, 2024

Let me add a few use cases where this would greatly help the pool operator day-to-day:

  1. chia-harvester-version: Very recently, a third-party harvester software was discovered to be not forming blocks properly when also pooling. Users of this harvester thus performed an inadvertent dead-weight attack on the pool. As pool operators, we can currently neither support nor block these users. Exposing the harvester version would fix that. This becomes a more pressing concern once CHIP-22 adoption increases.

  2. chia-farmer-peer-id: The pool reward payout address is set by the farmer process. We regularly have support cases where users are running multiple farmer processes which have different payout addresses configured and then race to update the payout address each time they contact the pool. Users are then wondering why some of their payouts go to a seemingly wrong payout address. chia-farmer-peer-id would allow us to assist users by telling them which farmer process has the wrong payout address configured.

  3. chia-farmer-peer-id: More generally than 2., many of our users run a (maybe somewhat unusual) setup of running farmer and harvester processes on each machine with directly attached disks. For example, a farmer with 10 storage nodes uses a single head node to run a full node and runs one farmer process and one harvester process on each storage node. We currently have no way to discern between the farmer nodes, and thus can mostly have to rely on the information sent by the farmer process which contacted us most recently. With the peer ID, we'd have an identifier to associate the information sent by the farmer with, and improve the information shown to pool users. We can properly show version and payout address per farmer peer, for example. We could also aggregate pool stats upwards from harvester to farmer to launcher id.

(If it makes sense to add any of that directly into the CHIP, anyone please feel free to re-use the above as you see fit.)

@emlowe
Copy link

emlowe commented Apr 2, 2024

Don't you get the IP address of the farmer when it contacts the pool - either in the originating source address or maybe X-Forwarded-For?

One problem is that the harvester may be harvesting plots that are not associated with your pool at all - so the various chia-harvester settings may be sending your information about the farmer that is unrelated entirely to your pool.

@xearl4
Copy link

xearl4 commented Apr 2, 2024

Don't you get the IP address of the farmer when it contacts the pool - either in the originating source address or maybe X-Forwarded-For?

We do, but that's usually not helpful, as they typically all share the same outgoing public IP.

One problem is that the harvester may be harvesting plots that are not associated with your pool at all - so the various chia-harvester settings may be sending your information about the farmer that is unrelated entirely to your pool.

Yes, that's true for the capacity/plot count headers and is also addressed in the CHIP: "It will make sense to also add capacities and plot counts scoped to the plotnft of the proof of the partial". It'd probably make sense only sending these stats scoped to the plotnft/launcherid of the partial.

@emlowe
Copy link

emlowe commented Apr 2, 2024

Don't you get the IP address of the farmer when it contacts the pool - either in the originating source address or maybe X-Forwarded-For?

We do, but that's usually not helpful, as they typically all share the same outgoing public IP.

X-Forwarded-For should have the original internal IP address in most cases - it does for me (although in my case it is an IPv6 addy). At least this is my understanding...

@emlowe
Copy link

emlowe commented Apr 2, 2024

Yes, that's true for the capacity/plot count headers and is also addressed in the CHIP: "It will make sense to also add capacities and plot counts scoped to the plotnft of the proof of the partial". It'd probably make sense only sending these stats scoped to the plotnft/launcherid of the partial.

I might suggest using some RFC language - perhaps saying something like implementations SHOULD (MUST?) only send chia-harvester information for plots that are specific to this pool and not for other plots in use by the same harvester

@xearl4
Copy link

xearl4 commented Apr 3, 2024

X-Forwarded-For should have the original internal IP address in most cases - it does for me (although in my case it is an IPv6 addy). At least this is my understanding...

X-Forwarded-For is added by HTTP proxies in the chain, and they use the IP they see. If you have a private/NATed IP (as is typical for IPv4 end user sites), all a proxy (or the final HTTP server) will see is the public IP you were NATed to. (The NAT itself operates on a lower layer and doesn't add HTTP headers.)

For IPv6 the situation can be slightly different, if you have a full public subnet (like a /64 or /56) delegated to your end user site and your internal devices actually use that. (I assume that your IPv6 address being visible externally falls under this case.)

@felixbrucker
Copy link
Author

felixbrucker commented Apr 4, 2024

Yes, as early already said, the farmer ip is neither suitable nor available for identification of the farmer (i'm assuming that's the point of bringing it up?)

@felixbrucker
Copy link
Author

I might suggest using some RFC language - perhaps saying something like implementations SHOULD (MUST?) only send chia-harvester information for plots that are specific to this pool and not for other plots in use by the same harvester

In the chip i just documented the current PRs state, earl has code for scoped capacity/plot counts based on the PR iirc, which can and should be integrated. Whether they are additive (both scoped and unscoped are included) or replacing (unscoped headers are not present) i'll leave up for debate, i'm fine with either. They should have distinctive names tho, indicating that it is the scoped capacity/plot count (which is the case for earls code)

Maybe it makes sense to link it in here as well, and merge into the PR based on what the community prefers (see above).

@fizpawiz
Copy link

fizpawiz commented Apr 8, 2024

My concern here is that providing the farmer peer id, this proposal makes it harder for me as a farmer to protect my privacy. Not all farmers want to "show their hand", and make public the total amount of space they are farming. As it is, with the harvester id in the header, I have to run different harvesters to mask my total space: otherwise that value can be used to correlate me across different plotnfts and even across different pools. Running multiple harvesters is well-documented, so that is not an unsurmountable burden, and large farmers generally do that anyway. But if I now also have to run multiple farmers to avoid being correlated, that is a whole additional level of effort. If this proposal goes through, then people running a normal single-farmer setup will suddenly be completely exposed for the amount of space they have.

I think privacy should be the default. We shouldn't be causing people to automatically give out information that can can be used to identify them. Maybe there are other approaches that could accomplish the same thing on an opt-in basis? Something as simple as a user-settable farmer "name" that is only returned if set? Then people can choose to allow their farmers to be identified if they want to. Ideally we'd also replace harvester id with a similar setup, but I suspect that's no longer possible at this point.

@felixbrucker
Copy link
Author

@fizpawiz

Not all farmers want to "show their hand", and make public the total amount of space they are farming.

Is this referring to the estimated space on pool A and B with different plot nfts but same farmer being able to be matched, assuming the farmer id is publicly accessible through both pools?

If this proposal goes through, then people running a normal single-farmer setup will suddenly be completely exposed for the amount of space they have.

To be fair users of a normal setup generally do not split their harvesters to mask their capacity on different pools, and as such are already "completely exposed".

We shouldn't be causing people to automatically give out information that can can be used to identify them.

We can already identify the pseudonym by launcher id and harvester id, farmer id is just another part of the already established chain of software components in a farming setup which is not identified yet.

Maybe there are other approaches that could accomplish the same thing on an opt-in basis? Something as simple as a user-settable farmer "name" that is only returned if set?

Having a name is confusing, because a name is not an identifier. If the concern is that the farmer id is matchable across pools why not combine it with a part of the plotnft, then it is scoped per plotnft but unique per farmer. We just need to make sure some part of it is still easily identifiable/matching the farmer peer id for humans, so users can match a pool side farmer to a physical farmer and give it an appropriate name.

Ideally we'd also replace harvester id with a similar setup, but I suspect that's no longer possible at this point.

That would be a breaking change in the pooling protocol

@xearl4
Copy link

xearl4 commented Apr 8, 2024

My concern here is that providing the farmer peer id, this proposal makes it harder for me as a farmer to protect my privacy. Not all farmers want to "show their hand", and make public the total amount of space they are farming.

Let me recap, just to make sure I understand the scenario correctly: a person farms a certain amount of space and wants to mask the total amount. So this person splits this space over several different plotnfts (to circumvent the obvious matching by launcher id) and also uses multiple different harvesters (to circumvent matching by harvester id). That person may also distribute the different plotnfts over different pools.

For this case, the proposed change would clearly be a change in identifiability. The person would have to adapt their setup to also use multiple different farmers, in analogue to the harvesters, to circumvent matching by farmer id.

Did I get that right?

@fizpawiz
Copy link

Did I get that right?

Yes, exactly. Thank you for being so much more succinct.

@fizpawiz
Copy link

fizpawiz commented Apr 17, 2024

Is this referring to the estimated space on pool A and B with different plot nfts but same farmer being able to be matched, assuming the farmer id is publicly accessible through both pools?

Yes.

To be fair users of a normal setup generally do not split their harvesters to mask their capacity on different pools, and as such are already "completely exposed".

True, small farmers probably don't care. But multiple harvesters is very common on >100TiBe farms.

We can already identify the pseudonym by launcher id and harvester id, farmer id is just another part of the already established chain of software components in a farming setup which is not identified yet.

Right! The last part of that chain of software components is the node id, at which point the farmer would be completely identified. And to avoid identification, the farmer would then have to run completely separate farms, which would be a real pain.

Having a name is confusing, because a name is not an identifier. If the concern is that the farmer id is matchable across pools why not combine it with a part of the plotnft, then it is scoped per plotnft but unique per farmer. We just need to make sure some part of it is still easily identifiable/matching the farmer peer id for humans, so users can match a pool side farmer to a physical farmer and give it an appropriate name.

I think the idea you're proposing is to concatenate the plotnft id and the farmer id, then hash that? I agree that would work.

That would be a breaking change in the pooling protocol

I know. And that makes it impossible. I don't know what requirement prompted it to be included originally, but I would have proposed not including it, if I had been aware at the time.

@felixbrucker
Copy link
Author

True, small farmers probably don't care. But multiple harvesters is very common on >100TiBe farms.

Yes, but the reasoning for multiple harvesters is generally not to mask their capacity on different pools but to split resources/hdd access

@fizpawiz
Copy link

Yes, but the reasoning for multiple harvesters is generally not to mask their capacity on different pools but to split resources/hdd access

Agreed! I think we're saying the same thing on this point. :-)

@xearl4
Copy link

xearl4 commented Apr 17, 2024

Thanks for clarifying, @fizpawiz.

So a suggested way forward:

  • Send a "masked" farmer ID calculated as hash(launcher id | farmer peer id) instead of the plain farmer peer ID. It's important to make this masked ID easily retrievable via GUI and CLI to allow users to look up how local farmer processes are identifying to a pool.
  • Add a pool_send_farm_stats config option, which defaults to true. If enabled (the default), basic summary statistics are sent to the pool with each partial. If disabled, no statistics are sent with partials.
  • If statistics are sent with a partial, these statistics are scoped / limited to the harvester and plotnft of the partial.

@fizpawiz
Copy link

  • Send a "masked" farmer ID calculated as hash(launcher id | farmer peer id) instead of the plain farmer peer ID. It's important to make this masked ID easily retrievable via GUI and CLI to allow users to look up how local farmer processes are identifying to a pool.

I like this idea, personally. Would love to do have the option to do the same with the harvester id.

  • Add a pool_send_farm_stats config option, which defaults to true. If enabled (the default), basic summary statistics are sent to the pool with each partial. If disabled, no statistics are sent with partials.

I'm still confused why anyone wants the stats. Self-reported values can be manipulated, especially when the code reporting them is open-source. What happens when someone tweaks their rig to report more space than they have, then falsely claim the pool is ripping them off, pointing to the statistics as "proof"? Or tweaks to under-report space, and then claims to have found some "weird trick" to improve returns? "I'll tell you the trick for just $99!" The partials are the proof of space, which can be trusted. If the reported space is used for ranking farmers on the pool leaderboard, you can be sure people will "adjust" their reported space to move up. Now the pools will need to add code to catch people faking their reported space, and decide what to do when they detect it. And what if it is close, and so it is hard to be sure? Most pools take 1%, so a small tweak by an unscrupulous large farmer can really make the pool look bad, or good, at the whim of the farmer. And accusations would be difficult, or impossible, to prove one way or the other, making such accusations in public potentially very sticky from a PR perspective.

  • If statistics are sent with a partial, these statistics are scoped / limited to the harvester and plotnft of the partial.

Yes, that would be appropriate and important if sending statistics.

@felixbrucker
Copy link
Author

I'm still confused why anyone wants the stats

There are many reasons, for example users want to see how their farm performs compared to its actual capacity, users want to see/get notified if their plot count changes/drops, users want to monitor (re)plotting progress .. etc

If you don't need it you can disable it, not a problem.

Self-reported values can be manipulated, especially when the code reporting them is open-source. What happens when someone tweaks their rig to report more space than they have

Nothing, nothing happens
as you said its a self reported metric, purely informational to the user

If the reported space is used for ranking farmers on the pool leaderboard

Nobody said anything of the like, as you said its a self reported metric that can not be trusted, why would anyone use it for leaderboards

so a small tweak by an unscrupulous large farmer can really make the pool look bad, or good, at the whim of the farmer

pools use effort to track their performance, not unverifiable user reported metrics, so no he can not, he can only make his own farm look good or bad

And accusations would be difficult, or impossible, to prove one way or the other, making such accusations in public potentially very sticky from a PR perspective.

Nothing changes to how it is right now with the exception that the farmer in question would screenshot his pool harvester page instead of his chia gui to show his "real" capacity

@xearl4
Copy link

xearl4 commented Apr 19, 2024

I'm still confused why anyone wants the stats.

The stats can be extremely useful in assisting farmers to analyze their farms. As a matter of fact, besides the obvious thing pools do (smoothen payout, spread risk), providing farmers insights into their farm is a major feature of many Chia pools. This serves a very similar purpose to a farm monitoring setup based on Prometheus and Grafana. As such, the farmer-sent stats have purely informational value.

Basically it's the difference between showing a user solely their estimated space:
image

Or also being able to show them their actual effective space for reference:
image

We have heard from many users that they'd like to have that feature. We also heard from many former Flexpool users that that's one thing they liked in particular about Flexpool's proprietary flexfarmer.

@felixbrucker
Copy link
Author

What are the next steps for this? It has been a while since the last update. Can we review/merge the PR in chia-blockchain now?

@fizpawiz
Copy link

I'm still confused why anyone wants the stats.

The stats can be extremely useful in assisting farmers to analyze their farms. As a matter of fact, besides the obvious thing pools do (smoothen payout, spread risk), providing farmers insights into their farm is a major feature of many Chia pools. This serves a very similar purpose to a farm monitoring setup based on Prometheus and Grafana. As such, the farmer-sent stats have purely informational value.

...

We have heard from many users that they'd like to have that feature. We also heard from many former Flexpool users that that's one thing they liked in particular about Flexpool's proprietary flexfarmer.

Makes sense to me. I worry about pools using this self-reported info, but that's up to them to protect from users gaming the pool by tweaking the stats. I think we landed on a plan forward, @xearl4 so clearly described above. Would love to also have the option to mask the harvester id too. I don't actually know the CHIP process very well. What is the next step, and who takes it?

@danieljperry
Copy link
Contributor

What is the next step, and who takes it?

At this point it sounds like we have general consensus from the community (other than potentially masking the harvester ID), and we have an implementation in place. I'll move the CHIP to Review.

I'll leave the CHIP in Review for at least a week, and ask for any additional feedback from the community. If everything is looking good, I can move the CHIP into Last Call and we can work on getting the PR#17788 merged.

Assuming no additional changes are needed after two weeks in Last Call, I can move the CHIP to Final. The PR will then be included when we cut the next release branch, so it will be slated for the next release.

Best case scenario, this CHIP is finalized and the implementation is in main three weeks from now. This is subject to change pending community feedback.

@danieljperry
Copy link
Contributor

This CHIP is now in Review. Please leave your reviews here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants