Multiple peers rebroadcast messages simultaneously during Pubsub Flood #9748

Open
3 tasks done
gvelez17 opened this issue Mar 24, 2023 · 2 comments
Labels
kind/bug A bug in existing code (including security flaws) need/analysis Needs further analysis before proceeding


Checklist

Installation method

ipfs-update or dist.ipfs.tech

Version

0.18.1 on most nodes, including several of the involved nodes. Some nodes on the network may be running earlier versions, since they are not all under our control.

Config

Note: this config is for one node only; however, it is the node reflected in the CPU graph below. Other nodes may have different configs.

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/0.0.0.0/tcp/5011",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/0.0.0.0/tcp/9011",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4010",
      "/ip4/0.0.0.0/tcp/4011/ws"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "accessKey": "<redacted>",
            "bucket": "ceramic-prod-cas-cpc-node",
            "keyTransform": "next-to-last/2",
            "region": "us-east-2",
            "rootDirectory": "ipfs/blocks",
            "secretKey": "<redacted>",
            "type": "s3ds"
          },
          "mountpoint": "/blocks",
          "prefix": "s3.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": [
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL"
        ],
        "ID": "QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-cas-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
        ],
        "ID": "QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-elp-1-1-external.3boxlabs.com/tcp/4011/ws/p2p/QmUiF8Au7wjhAF9BYYMNQRW5KhY7o8fq4RUozzkWvHXQrZ"
        ],
        "ID": "QmUiF8Au7wjhAF9BYYMNQRW5KhY7o8fq4RUozzkWvHXQrZ"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-elp-1-2-external.3boxlabs.com/tcp/4011/ws/p2p/QmRNw9ZimjSwujzS3euqSYxDW9EHDU5LB3NbLQ5vJ13hwJ"
        ],
        "ID": "QmRNw9ZimjSwujzS3euqSYxDW9EHDU5LB3NbLQ5vJ13hwJ"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-cas-clay-external.3boxlabs.com/tcp/4011/ws/p2p/QmbeBTzSccH8xYottaYeyVX8QsKyox1ExfRx7T1iBqRyCd"
        ],
        "ID": "QmbeBTzSccH8xYottaYeyVX8QsKyox1ExfRx7T1iBqRyCd"
      }
    ]
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Enabled": true,
    "Router": "",
    "SeenMessagesTTL": "10m"
  },
  "Reprovider": {},
  "Routing": {},
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {
      "Enabled": false
    },
    "RelayService": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  },
  "algorithm": "rsa"
}

Description

Related to the Pubsub Flood issue https://github.com/ipfs/kubo/issues/9665, this issue specifically notes that when the flood begins, multiple peers become involved simultaneously, though with different seqnos, different messages, and different originating peers.

The result of the flood is near-maximum CPU usage on our critical IPFS node for the Ceramic Anchor Service.

image

Is there some network condition that would simultaneously trigger upwards of 20 different nodes to engage in rebroadcasting of different messages? Is there a setting that would help tune this back?

image

We greatly appreciate the nonce validator added by @vyzo in https://github.com/libp2p/go-libp2p-pubsub/releases/tag/v0.9.2
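
For readers unfamiliar with the idea, here is a rough illustration of what a per-peer nonce check on seqno can look like. It assumes the seqno is treated as a strictly increasing counter per originating peer. The class and method names are hypothetical, and this is not the go-libp2p-pubsub implementation, only a sketch of the concept.

```python
class SeqnoNonceCheck:
    """Hypothetical sketch: accept a message only if its seqno is newer
    than the highest seqno already seen from the same originating peer."""

    def __init__(self):
        # origin peer ID -> highest seqno accepted so far
        self._last_seen = {}

    def accept(self, origin_peer, seqno):
        last = self._last_seen.get(origin_peer)
        if last is not None and seqno <= last:
            # Replayed or stale message: drop it instead of rebroadcasting.
            return False
        self._last_seen[origin_peer] = seqno
        return True


check = SeqnoNonceCheck()
results = [
    check.accept("peerA", 1),  # first message from peerA
    check.accept("peerA", 1),  # same seqno again (replay)
    check.accept("peerA", 2),  # newer seqno
    check.accept("peerB", 1),  # different origin, tracked separately
]
```

Unlike a time-limited seen-message cache, a check of this shape keeps rejecting an old message no matter how late the replay arrives, at the cost of storing one counter per origin.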

Noted that it does not appear to be used yet by the latest kubo https://github.com/ipfs/kubo/blob/master/go.mod#L74 , is it safe to simply include this module version in a source build?

Is there perhaps a backoff setting that should also be used when the activity is happening across the network? If a node detects that it is receiving the identical message from >3 peers, should it be pruning its peer list?
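
To make the question concrete, this is a minimal sketch of the detection step only: count how many distinct peers forwarded the same message and flag it once the count exceeds 3. Everything here (the class, the threshold constant, the message ID shape) is hypothetical; it is not an existing kubo or go-libp2p-pubsub setting, and whether pruning is the right reaction is exactly the open question.

```python
from collections import defaultdict

# Taken from the ">3 peers" question above; a hypothetical tunable.
DUPLICATE_PEER_THRESHOLD = 3


class DuplicateTracker:
    """Hypothetical sketch: track which peers forwarded each message."""

    def __init__(self, threshold=DUPLICATE_PEER_THRESHOLD):
        self.threshold = threshold
        # message ID -> set of peers the message was received from
        self._forwarders = defaultdict(set)

    def observe(self, msg_id, received_from):
        """Record one delivery. Returns True once more than `threshold`
        distinct peers have forwarded this message."""
        peers = self._forwarders[msg_id]
        peers.add(received_from)
        return len(peers) > self.threshold


tracker = DuplicateTracker()
msg_id = ("peerA", 101)  # (origin peer, seqno), an assumed ID shape
flags = [tracker.observe(msg_id, p) for p in ["p1", "p2", "p3", "p4", "p1"]]
```

A real version would need to expire old entries (for example on the same schedule as the seen-message cache), since this sketch keeps every message ID forever.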

Any advice or suggestions on what to try to stop the pubsub flood are very welcome. We can experiment with individual nodes, and if a solution is found, we can communicate with our user base to at least roll it out across much of the network.

@gvelez17 gvelez17 added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Mar 24, 2023
@gvelez17 gvelez17 changed the title Multiple peers rebroadcast messages during "flood" Multiple peers rebroadcast messages simultaneously during Pubsub Flood Mar 24, 2023
gvelez17 (Author) commented Mar 24, 2023

This may not mean anything, but an analysis of about 30 minutes of data seems to show a different pattern for the messages that begin a chain of duplicates than for other messages.

These were determined by finding message groups by seqno, then filtering for ones where the original peer (From:) matched the receivedFrom header, which we exposed in a slightly modified version of kubo just to output this field.
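
The filtering described above could be sketched roughly as follows. This is a stdlib-only approximation rather than the pandas session actually used, and the record field names and sample values are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records; field names are assumptions, not the real log schema.
messages = [
    {"seqno": 101, "from": "peerA", "received_from": "peerA", "typ": 2},
    {"seqno": 101, "from": "peerA", "received_from": "peerB", "typ": 2},
    {"seqno": 101, "from": "peerA", "received_from": "peerC", "typ": 2},
    {"seqno": 102, "from": "peerB", "received_from": "peerB", "typ": 0},
    {"seqno": 103, "from": "peerC", "received_from": "peerD", "typ": 1},
]


def chain_starters(msgs):
    """Return messages that arrived directly from their origin peer and
    whose seqno was also seen again (i.e. the message was rebroadcast)."""
    groups = defaultdict(list)
    for m in msgs:
        groups[m["seqno"]].append(m)

    starters = []
    for group in groups.values():
        if len(group) < 2:
            continue  # seen only once, so no chain of duplicates
        starters.extend(m for m in group if m["from"] == m["received_from"])
    return starters


starters = chain_starters(messages)
starter_type_counts = Counter(m["typ"] for m in starters)
```

Grouping on seqno alone follows the description above; grouping on the (origin peer, seqno) pair would be stricter if seqnos from different peers can collide.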

The messages that kick off a chain of rebroadcasts are mostly RESPONSE-type messages. (In Ceramic, messages are UPDATE, QUERY, RESPONSE or KEEPALIVE.)

# counts from the messages that are received from the original peer
# and later result in rebroadcasts
(Pdb) de.typ.value_counts()
2    3903
0     176
3     164
1       8
Name: typ, dtype: int64

# all the messages seen in 30 minutes
(Pdb) df.typ.value_counts()
2    48306
0    22566
1    21393
3     4011
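
Converting the two sets of counts above into shares makes the difference easier to see. The numbers are copied from the output above; the helper name is just for illustration.

```python
# Counts copied from the two value_counts() outputs above.
starter_counts = {2: 3903, 0: 176, 3: 164, 1: 8}
all_counts = {2: 48306, 0: 22566, 1: 21393, 3: 4011}


def shares(counts):
    """Convert raw counts into fractions of the total."""
    total = sum(counts.values())
    return {typ: n / total for typ, n in counts.items()}


starter_shares = shares(starter_counts)
all_shares = shares(all_counts)

for typ in sorted(all_counts):
    print(
        f"typ {typ}: {starter_shares[typ]:6.1%} of chain starters, "
        f"{all_shares[typ]:6.1%} of all messages"
    )
```

Type 2 (presumably RESPONSE, given the description above) makes up about 92% of the chain-starting messages but only about 50% of all messages seen in the same window.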

BigLep (Contributor) commented Mar 25, 2023

> Noted that it does not appear to be used yet by the latest kubo https://github.com/ipfs/kubo/blob/master/go.mod#L74 , is it safe to simply include this module version in a source build?

Per #9665 (comment), https://github.com/libp2p/go-libp2p-pubsub/releases/tag/v0.9.2 isn't going to make it into master.

I'm not aware of any issues if you include this module version in your own Kubo build, though. I'll let @Jorropo comment. I know we didn't bring it into Kubo master (and aren't planning to) because it breaks interop with the JS stack. We've instead deprecated the pubsub commands: #9717

@aschmahmann aschmahmann added need/analysis Needs further analysis before proceeding and removed need/triage Needs initial labeling and prioritization labels May 22, 2023
@Jorropo Jorropo removed their assignment Mar 4, 2024