Daemon memory usage grows to 8 GB after 5 hours of file adding and pinning #9437

Closed
3 tasks done
pio2398 opened this issue Nov 30, 2022 · 1 comment
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@pio2398

pio2398 commented Nov 30, 2022

Checklist

Installation method

third-party binary

Version

Kubo version: 0.16.0
Repo version: 12
System version: amd64/linux
Golang version: go1.19.3

Config

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/34538",
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": false
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "12D3KooWEHGcBURfi9hMCv8zGyX3UMpKWqGMSfSNSDjho7J4fWqo"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Methods": null,
    "Routers": null,
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "ConnMgr": {
      "GracePeriod": "20s",
      "HighWater": 900,
      "LowWater": 600,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Description

My IPFS node was unstable, so I decided to remove all config and data and start a fresh instance.
I started by adding some local content and pinning some remote content, and the IPFS daemon was killed by oomd. The next attempt also ended with more than 8 GB of RAM in use.

diag: ipfs/QmU3EWqCxYsMN3EkuuMgeMnSsvPGW55NPfn3i9jU7BAJ93

@pio2398 pio2398 added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Nov 30, 2022
@Jorropo
Contributor

Jorropo commented Nov 30, 2022

Thx, I have looked at your profile and only 3.8GiB of memory is alive on the heap.
I guess you run with the default GOGC value (100), which means Go will only attempt a GC once the heap grows to roughly 200% of (i.e. twice) the live heap left by the previous GC run; 3.8GiB * 2 = 7.6GiB, and I guess the remaining ~400MiB isn't enough to run the rest of your system.
In other words, IPFS is only using half of the RAM; the other half is dead values that haven't been reclaimed by Go yet.
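
To spell out that arithmetic, here is a minimal sketch (the 3.8GiB figure is the live heap from your profile; 100 is Go's default GOGC):

package main

import "fmt"

func main() {
	// With GOGC=100 the next collection is triggered once the heap reaches
	// roughly liveHeap * (1 + GOGC/100), i.e. about twice the live heap.
	const gogc = 100.0
	liveHeap := 3.8 // GiB alive on the heap, taken from the profile above
	target := liveHeap * (1 + gogc/100)
	fmt.Printf("next GC around %.1f GiB\n", target) // prints ~7.6 GiB
}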

Go recently introduced https://pkg.go.dev/runtime/debug#SetMemoryLimit. You can set it with GOMEMLIMIT=6GiB while starting a Go program (6GiB because you have 8GiB of RAM, so it should leave ~1GiB free for the rest of the OS); this forces a GC to happen whenever the program uses more than 6GiB. That can be a performance killer if you genuinely need more than 6GiB, because then you essentially run the GC permanently, but if you use, say, 5.5GiB it just runs the GC more often to compensate (instead of OOMing). It's like dynamically reducing GOGC as you approach the memory limit.

My test to confirm this behaviour was:

package main

import "runtime/debug"
import "runtime"
import "os"

var leak []byte // lots of memory kept alive to push the GOGC heap target higher

func main() {
	// update freeMemory to the memory on your system
	const freeMemory = 40 * 1024 * 1024 * 1024
	const target = freeMemory / 3 * 2 // try to use two thirds of the system memory (with the default GOGC it will OOM before GCing)

	const garbage = 1024 * 1024
	leak = make([]byte, target - garbage)
	for i := range leak {
		leak[i] = 1 // memset to force page commit
	}

	debug.SetMemoryLimit(freeMemory) // comment out this line to test the normal GOGC behaviour

	os.Stdout.WriteString("initial leak set up, now generating garbage!\n")

	var keepAlive []byte
	for i := freeMemory/garbage * 3; i != 0; i-- {
		// run for a while, trying to generate 3 times more garbage than the memory we have
		keepAlive = make([]byte, garbage)
		for i := range keepAlive {
			keepAlive[i] = 1 // memset to force page commit
		}
		runtime.Gosched() // simulate some IO, let the GC maybe run
	}
	leak = keepAlive
}

Using debug.SetMemoryLimit did fix OOMs in this synthetic test by running the GC more often (run with GODEBUG=gctrace=1).

For now, the mitigation I'll recommend is manually setting the GOMEMLIMIT environment variable to slightly less than the amount of free memory when starting Kubo.
In the future hopefully we can configure this automagically with #8798.
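
For example, on an 8GiB machine like the one described above, that would look something like this when launching the daemon (6GiB is just an estimate; adjust it to leave enough headroom for the rest of your system):

GOMEMLIMIT=6GiB ipfs daemon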

@Jorropo Jorropo closed this as completed Nov 30, 2022
@Jorropo Jorropo removed the need/triage Needs initial labeling and prioritization label Nov 30, 2022