CHIP-0033: Add additional partial headers #114

felixbrucker · 2024-03-29T10:22:56Z

This is the corresponding CHIP for Chia-Network/chia-blockchain#17788

danieljperry · 2024-03-29T22:40:00Z

Thanks, @felixbrucker! This CHIP has been assigned CHIP-33. It is now a Draft. Feel free to leave a review here, or to discuss it in the #chips channel of our Discord.

xearl4 · 2024-03-30T21:57:32Z

Let me add a few use cases where this would greatly help the pool operator day-to-day:

chia-harvester-version: Very recently, a third-party harvester software was discovered to be not forming blocks properly when also pooling. Users of this harvester thus performed an inadvertent dead-weight attack on the pool. As pool operators, we can currently neither support nor block these users. Exposing the harvester version would fix that. This becomes a more pressing concern once CHIP-22 adoption increases.
chia-farmer-peer-id: The pool reward payout address is set by the farmer process. We regularly have support cases where users are running multiple farmer processes which have different payout addresses configured and then race to update the payout address each time they contact the pool. Users are then wondering why some of their payouts go to a seemingly wrong payout address. chia-farmer-peer-id would allow us to assist users by telling them which farmer process has the wrong payout address configured.
chia-farmer-peer-id: More generally than 2., many of our users run a (maybe somewhat unusual) setup of running farmer and harvester processes on each machine with directly attached disks. For example, a farmer with 10 storage nodes uses a single head node to run a full node and runs one farmer process and one harvester process on each storage node. We currently have no way to discern between the farmer nodes, and thus can mostly have to rely on the information sent by the farmer process which contacted us most recently. With the peer ID, we'd have an identifier to associate the information sent by the farmer with, and improve the information shown to pool users. We can properly show version and payout address per farmer peer, for example. We could also aggregate pool stats upwards from harvester to farmer to launcher id.

(If it makes sense to add any of that directly into the CHIP, anyone please feel free to re-use the above as you see fit.)

emlowe · 2024-04-02T16:13:58Z

Don't you get the IP address of the farmer when it contacts the pool - either in the originating source address or maybe X-Forwarded-For?

One problem is that the harvester may be harvesting plots that are not associated with your pool at all - so the various chia-harvester settings may be sending your information about the farmer that is unrelated entirely to your pool.

xearl4 · 2024-04-02T17:56:31Z

Don't you get the IP address of the farmer when it contacts the pool - either in the originating source address or maybe X-Forwarded-For?

We do, but that's usually not helpful, as they typically all share the same outgoing public IP.

One problem is that the harvester may be harvesting plots that are not associated with your pool at all - so the various chia-harvester settings may be sending your information about the farmer that is unrelated entirely to your pool.

Yes, that's true for the capacity/plot count headers and is also addressed in the CHIP: "It will make sense to also add capacities and plot counts scoped to the plotnft of the proof of the partial". It'd probably make sense only sending these stats scoped to the plotnft/launcherid of the partial.

emlowe · 2024-04-02T20:45:24Z

Don't you get the IP address of the farmer when it contacts the pool - either in the originating source address or maybe X-Forwarded-For?

We do, but that's usually not helpful, as they typically all share the same outgoing public IP.

X-Forwarded-For should have the original internal IP address in most cases - it does for me (although in my case it is an IPv6 addy). At least this is my understanding...

emlowe · 2024-04-02T21:10:42Z

Yes, that's true for the capacity/plot count headers and is also addressed in the CHIP: "It will make sense to also add capacities and plot counts scoped to the plotnft of the proof of the partial". It'd probably make sense only sending these stats scoped to the plotnft/launcherid of the partial.

I might suggest using some RFC language - perhaps saying something like implementations SHOULD (MUST?) only send chia-harvester information for plots that are specific to this pool and not for other plots in use by the same harvester

xearl4 · 2024-04-03T12:52:36Z

X-Forwarded-For should have the original internal IP address in most cases - it does for me (although in my case it is an IPv6 addy). At least this is my understanding...

X-Forwarded-For is added by HTTP proxies in the chain, and they use the IP they see. If you have a private/NATed IP (as is typical for IPv4 end user sites), all a proxy (or the final HTTP server) will see is the public IP you were NATed to. (The NAT itself operates on a lower layer and doesn't add HTTP headers.)

For IPv6 the situation can be slightly different, if you have a full public subnet (like a /64 or /56) delegated to your end user site and your internal devices actually use that. (I assume that your IPv6 address being visible externally falls under this case.)

felixbrucker · 2024-04-04T16:01:09Z

Yes, as early already said, the farmer ip is neither suitable nor available for identification of the farmer (i'm assuming that's the point of bringing it up?)

felixbrucker · 2024-04-04T16:05:45Z

I might suggest using some RFC language - perhaps saying something like implementations SHOULD (MUST?) only send chia-harvester information for plots that are specific to this pool and not for other plots in use by the same harvester

In the chip i just documented the current PRs state, earl has code for scoped capacity/plot counts based on the PR iirc, which can and should be integrated. Whether they are additive (both scoped and unscoped are included) or replacing (unscoped headers are not present) i'll leave up for debate, i'm fine with either. They should have distinctive names tho, indicating that it is the scoped capacity/plot count (which is the case for earls code)

Maybe it makes sense to link it in here as well, and merge into the PR based on what the community prefers (see above).

fizpawiz · 2024-04-08T14:42:30Z

My concern here is that providing the farmer peer id, this proposal makes it harder for me as a farmer to protect my privacy. Not all farmers want to "show their hand", and make public the total amount of space they are farming. As it is, with the harvester id in the header, I have to run different harvesters to mask my total space: otherwise that value can be used to correlate me across different plotnfts and even across different pools. Running multiple harvesters is well-documented, so that is not an unsurmountable burden, and large farmers generally do that anyway. But if I now also have to run multiple farmers to avoid being correlated, that is a whole additional level of effort. If this proposal goes through, then people running a normal single-farmer setup will suddenly be completely exposed for the amount of space they have.

I think privacy should be the default. We shouldn't be causing people to automatically give out information that can can be used to identify them. Maybe there are other approaches that could accomplish the same thing on an opt-in basis? Something as simple as a user-settable farmer "name" that is only returned if set? Then people can choose to allow their farmers to be identified if they want to. Ideally we'd also replace harvester id with a similar setup, but I suspect that's no longer possible at this point.

felixbrucker · 2024-04-08T16:07:45Z

@fizpawiz

Not all farmers want to "show their hand", and make public the total amount of space they are farming.

Is this referring to the estimated space on pool A and B with different plot nfts but same farmer being able to be matched, assuming the farmer id is publicly accessible through both pools?

If this proposal goes through, then people running a normal single-farmer setup will suddenly be completely exposed for the amount of space they have.

To be fair users of a normal setup generally do not split their harvesters to mask their capacity on different pools, and as such are already "completely exposed".

We shouldn't be causing people to automatically give out information that can can be used to identify them.

We can already identify the pseudonym by launcher id and harvester id, farmer id is just another part of the already established chain of software components in a farming setup which is not identified yet.

Maybe there are other approaches that could accomplish the same thing on an opt-in basis? Something as simple as a user-settable farmer "name" that is only returned if set?

Having a name is confusing, because a name is not an identifier. If the concern is that the farmer id is matchable across pools why not combine it with a part of the plotnft, then it is scoped per plotnft but unique per farmer. We just need to make sure some part of it is still easily identifiable/matching the farmer peer id for humans, so users can match a pool side farmer to a physical farmer and give it an appropriate name.

Ideally we'd also replace harvester id with a similar setup, but I suspect that's no longer possible at this point.

That would be a breaking change in the pooling protocol

xearl4 · 2024-04-08T17:58:03Z

My concern here is that providing the farmer peer id, this proposal makes it harder for me as a farmer to protect my privacy. Not all farmers want to "show their hand", and make public the total amount of space they are farming.

Let me recap, just to make sure I understand the scenario correctly: a person farms a certain amount of space and wants to mask the total amount. So this person splits this space over several different plotnfts (to circumvent the obvious matching by launcher id) and also uses multiple different harvesters (to circumvent matching by harvester id). That person may also distribute the different plotnfts over different pools.

For this case, the proposed change would clearly be a change in identifiability. The person would have to adapt their setup to also use multiple different farmers, in analogue to the harvesters, to circumvent matching by farmer id.

Did I get that right?

fizpawiz · 2024-04-17T15:20:29Z

Did I get that right?

Yes, exactly. Thank you for being so much more succinct.

fizpawiz · 2024-04-17T15:34:07Z

Is this referring to the estimated space on pool A and B with different plot nfts but same farmer being able to be matched, assuming the farmer id is publicly accessible through both pools?

Yes.

To be fair users of a normal setup generally do not split their harvesters to mask their capacity on different pools, and as such are already "completely exposed".

True, small farmers probably don't care. But multiple harvesters is very common on >100TiBe farms.

We can already identify the pseudonym by launcher id and harvester id, farmer id is just another part of the already established chain of software components in a farming setup which is not identified yet.

Right! The last part of that chain of software components is the node id, at which point the farmer would be completely identified. And to avoid identification, the farmer would then have to run completely separate farms, which would be a real pain.

Having a name is confusing, because a name is not an identifier. If the concern is that the farmer id is matchable across pools why not combine it with a part of the plotnft, then it is scoped per plotnft but unique per farmer. We just need to make sure some part of it is still easily identifiable/matching the farmer peer id for humans, so users can match a pool side farmer to a physical farmer and give it an appropriate name.

I think the idea you're proposing is to concatenate the plotnft id and the farmer id, then hash that? I agree that would work.

That would be a breaking change in the pooling protocol

I know. And that makes it impossible. I don't know what requirement prompted it to be included originally, but I would have proposed not including it, if I had been aware at the time.

felixbrucker · 2024-04-17T15:47:47Z

True, small farmers probably don't care. But multiple harvesters is very common on >100TiBe farms.

Yes, but the reasoning for multiple harvesters is generally not to mask their capacity on different pools but to split resources/hdd access

fizpawiz · 2024-04-17T21:40:04Z

Yes, but the reasoning for multiple harvesters is generally not to mask their capacity on different pools but to split resources/hdd access

Agreed! I think we're saying the same thing on this point. :-)

xearl4 · 2024-04-17T22:58:48Z

Thanks for clarifying, @fizpawiz.

So a suggested way forward:

Send a "masked" farmer ID calculated as hash(launcher id | farmer peer id) instead of the plain farmer peer ID. It's important to make this masked ID easily retrievable via GUI and CLI to allow users to look up how local farmer processes are identifying to a pool.
Add a pool_send_farm_stats config option, which defaults to true. If enabled (the default), basic summary statistics are sent to the pool with each partial. If disabled, no statistics are sent with partials.
If statistics are sent with a partial, these statistics are scoped / limited to the harvester and plotnft of the partial.

fizpawiz · 2024-04-18T15:19:10Z

Send a "masked" farmer ID calculated as hash(launcher id | farmer peer id) instead of the plain farmer peer ID. It's important to make this masked ID easily retrievable via GUI and CLI to allow users to look up how local farmer processes are identifying to a pool.

I like this idea, personally. Would love to do have the option to do the same with the harvester id.

Add a pool_send_farm_stats config option, which defaults to true. If enabled (the default), basic summary statistics are sent to the pool with each partial. If disabled, no statistics are sent with partials.

I'm still confused why anyone wants the stats. Self-reported values can be manipulated, especially when the code reporting them is open-source. What happens when someone tweaks their rig to report more space than they have, then falsely claim the pool is ripping them off, pointing to the statistics as "proof"? Or tweaks to under-report space, and then claims to have found some "weird trick" to improve returns? "I'll tell you the trick for just $99!" The partials are the proof of space, which can be trusted. If the reported space is used for ranking farmers on the pool leaderboard, you can be sure people will "adjust" their reported space to move up. Now the pools will need to add code to catch people faking their reported space, and decide what to do when they detect it. And what if it is close, and so it is hard to be sure? Most pools take 1%, so a small tweak by an unscrupulous large farmer can really make the pool look bad, or good, at the whim of the farmer. And accusations would be difficult, or impossible, to prove one way or the other, making such accusations in public potentially very sticky from a PR perspective.

If statistics are sent with a partial, these statistics are scoped / limited to the harvester and plotnft of the partial.

Yes, that would be appropriate and important if sending statistics.

felixbrucker · 2024-04-18T16:07:04Z

I'm still confused why anyone wants the stats

There are many reasons, for example users want to see how their farm performs compared to its actual capacity, users want to see/get notified if their plot count changes/drops, users want to monitor (re)plotting progress .. etc

If you don't need it you can disable it, not a problem.

Self-reported values can be manipulated, especially when the code reporting them is open-source. What happens when someone tweaks their rig to report more space than they have

Nothing, nothing happens
as you said its a self reported metric, purely informational to the user

If the reported space is used for ranking farmers on the pool leaderboard

Nobody said anything of the like, as you said its a self reported metric that can not be trusted, why would anyone use it for leaderboards

so a small tweak by an unscrupulous large farmer can really make the pool look bad, or good, at the whim of the farmer

pools use effort to track their performance, not unverifiable user reported metrics, so no he can not, he can only make his own farm look good or bad

And accusations would be difficult, or impossible, to prove one way or the other, making such accusations in public potentially very sticky from a PR perspective.

Nothing changes to how it is right now with the exception that the farmer in question would screenshot his pool harvester page instead of his chia gui to show his "real" capacity

xearl4 · 2024-04-19T20:14:55Z

I'm still confused why anyone wants the stats.

The stats can be extremely useful in assisting farmers to analyze their farms. As a matter of fact, besides the obvious thing pools do (smoothen payout, spread risk), providing farmers insights into their farm is a major feature of many Chia pools. This serves a very similar purpose to a farm monitoring setup based on Prometheus and Grafana. As such, the farmer-sent stats have purely informational value.

Basically it's the difference between showing a user solely their estimated space:

Or also being able to show them their actual effective space for reference:

We have heard from many users that they'd like to have that feature. We also heard from many former Flexpool users that that's one thing they liked in particular about Flexpool's proprietary flexfarmer.

felixbrucker · 2024-05-13T11:19:48Z

What are the next steps for this? It has been a while since the last update. Can we review/merge the PR in chia-blockchain now?

fizpawiz · 2024-05-13T18:09:06Z

I'm still confused why anyone wants the stats.

The stats can be extremely useful in assisting farmers to analyze their farms. As a matter of fact, besides the obvious thing pools do (smoothen payout, spread risk), providing farmers insights into their farm is a major feature of many Chia pools. This serves a very similar purpose to a farm monitoring setup based on Prometheus and Grafana. As such, the farmer-sent stats have purely informational value.

...

We have heard from many users that they'd like to have that feature. We also heard from many former Flexpool users that that's one thing they liked in particular about Flexpool's proprietary flexfarmer.

Makes sense to me. I worry about pools using this self-reported info, but that's up to them to protect from users gaming the pool by tweaking the stats. I think we landed on a plan forward, @xearl4 so clearly described above. Would love to also have the option to mask the harvester id too. I don't actually know the CHIP process very well. What is the next step, and who takes it?

danieljperry · 2024-05-14T01:45:36Z

What is the next step, and who takes it?

At this point it sounds like we have general consensus from the community (other than potentially masking the harvester ID), and we have an implementation in place. I'll move the CHIP to Review.

I'll leave the CHIP in Review for at least a week, and ask for any additional feedback from the community. If everything is looking good, I can move the CHIP into Last Call and we can work on getting the PR#17788 merged.

Assuming no additional changes are needed after two weeks in Last Call, I can move the CHIP to Final. The PR will then be included when we cut the next release branch, so it will be slated for the next release.

Best case scenario, this CHIP is finalized and the implementation is in main three weeks from now. This is subject to change pending community feedback.

danieljperry · 2024-05-14T02:09:28Z

This CHIP is now in Review. Please leave your reviews here.

Add add-additional-partial-headers chip

5b289c4

felixbrucker mentioned this pull request Mar 29, 2024

Add additional partial headers Chia-Network/chia-blockchain#17788

Open

felixbrucker marked this pull request as ready for review March 29, 2024 14:54

Assign Chia-Network#33 and open as a Draft

cbfc711

danieljperry changed the title ~~Add additional partial headers~~ CHIP-0033: Add additional partial headers Mar 29, 2024

Move CHIP-33 to Review

9c8df11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CHIP-0033: Add additional partial headers #114

CHIP-0033: Add additional partial headers #114

felixbrucker commented Mar 29, 2024

danieljperry commented Mar 29, 2024

xearl4 commented Mar 30, 2024

emlowe commented Apr 2, 2024

xearl4 commented Apr 2, 2024 •

edited

emlowe commented Apr 2, 2024

emlowe commented Apr 2, 2024

xearl4 commented Apr 3, 2024

felixbrucker commented Apr 4, 2024 •

edited

felixbrucker commented Apr 4, 2024

fizpawiz commented Apr 8, 2024

felixbrucker commented Apr 8, 2024

xearl4 commented Apr 8, 2024

fizpawiz commented Apr 17, 2024

fizpawiz commented Apr 17, 2024 •

edited

felixbrucker commented Apr 17, 2024

fizpawiz commented Apr 17, 2024

xearl4 commented Apr 17, 2024

fizpawiz commented Apr 18, 2024

felixbrucker commented Apr 18, 2024

xearl4 commented Apr 19, 2024

felixbrucker commented May 13, 2024

fizpawiz commented May 13, 2024

danieljperry commented May 14, 2024

danieljperry commented May 14, 2024

CHIP-0033: Add additional partial headers #114

Are you sure you want to change the base?

CHIP-0033: Add additional partial headers #114

Conversation

felixbrucker commented Mar 29, 2024

danieljperry commented Mar 29, 2024

xearl4 commented Mar 30, 2024

emlowe commented Apr 2, 2024

xearl4 commented Apr 2, 2024 • edited

emlowe commented Apr 2, 2024

emlowe commented Apr 2, 2024

xearl4 commented Apr 3, 2024

felixbrucker commented Apr 4, 2024 • edited

felixbrucker commented Apr 4, 2024

fizpawiz commented Apr 8, 2024

felixbrucker commented Apr 8, 2024

xearl4 commented Apr 8, 2024

fizpawiz commented Apr 17, 2024

fizpawiz commented Apr 17, 2024 • edited

felixbrucker commented Apr 17, 2024

fizpawiz commented Apr 17, 2024

xearl4 commented Apr 17, 2024

fizpawiz commented Apr 18, 2024

felixbrucker commented Apr 18, 2024

xearl4 commented Apr 19, 2024

felixbrucker commented May 13, 2024

fizpawiz commented May 13, 2024

danieljperry commented May 14, 2024

danieljperry commented May 14, 2024

xearl4 commented Apr 2, 2024 •

edited

felixbrucker commented Apr 4, 2024 •

edited

fizpawiz commented Apr 17, 2024 •

edited