Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocating too much memory failed to allocate memory (NoMemoryError) #381

Open
jondavid-schober-sp opened this issue Nov 30, 2021 · 5 comments

Comments

@jondavid-schober-sp
Copy link

jondavid-schober-sp commented Nov 30, 2021

I made sure the issue is in bootsnap

disabling bootsnap fixes the issue

Steps to reproduce

Currently unknown. The issue seems to happen intermittently with an identical build process

Expected behavior

Application should run

Actual behavior

NoMemoryError is thrown

 root@f071d6ec885a:/srv/www/ops-lib# ./ops-cli2.rb
 Traceback (most recent call last):
 bin/rails: failed to allocate memory (NoMemoryError)
 root@f071d6ec885a:/srv/www/ops-lib#

System configuration

Bootsnap version: 1.9.1

Ruby version: 2.7.2

Rails version: 6.1.3.2

I've noticed some very strange behavior regarding bootsnap and don't know where the root is. After updating from 1.5.1 to 1.9.1, and moving tmp to a symlink pointing to /tmp, we started running into an intermittent NoMemoryError upon startup.

I noticed that if I would touch the ruby file, the error would go away. I ran an strace on it, and it appeared to be allocating over 9GB of ram, causing the error to throw

Context: /srv/www/ops-lib/ops-cli2.rb is the name of the script, and contains

#!/usr/bin/env -S rails runner
if ENV['OPSLIB_USE_SERVICES'] == 'true' || ARGV.delete('--use-services')
  require_relative 'scripts/ops_cli_services'
else
  require_relative 'scripts/ops_cli_legacy'
end

/srv/www/ops-lib/tmp/cache/bootsnap/compile-cache/b6/5cc8804abfcc92

root@9b99c87f71b2:/srv/www/ops-lib# xxd /srv/www/ops-lib/tmp/cache/bootsnap/compile-cache/b6/5cc8804abfcc92
00000000: 0300 0000 f73a cf20 a7f6 8450 b3e2 a2ec  .....:. ...P....
00000010: cb00 0000 0000 0000 a8d9 9261 0000 0000  ...........a....
00000020: a900 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: d9a7 6966 2045 4e56 5b27 4f50 534c 4942  ..if ENV['OPSLIB
00000050: 5f55 5345 5f53 4552 5649 4345 5327 5d20  _USE_SERVICES']
00000060: 3d3d 2027 7472 7565 2720 7c7c 2041 5247  == 'true' || ARG
00000070: 562e 6465 6c65 7465 2827 2d2d 7573 652d  V.delete('--use-
00000080: 7365 7276 6963 6573 2729 2072 6571 7569  services') requi
00000090: 7265 5f72 656c 6174 6976 6520 2773 6372  re_relative 'scr
000000a0: 6970 7473 2f6f 7073 5f63 6c69 5f73 6572  ipts/ops_cli_ser
000000b0: 7669 6365 7327 2065 6c73 6520 7265 7175  vices' else requ
000000c0: 6972 655f 7265 6c61 7469 7665 2027 7363  ire_relative 'sc
000000d0: 7269 7074 732f 6f70 735f 636c 695f 6c65  ripts/ops_cli_le
000000e0: 6761 6379 2720 656e 64                   gacy' end
root@9b99c87f71b2:/srv/www/ops-lib#

You can see after it reads in the first line, it does an mmap of 9296302080 bytes.

After touching the file, it works correctly

strace output:

....
openat(AT_FDCWD, "/srv/www/ops-lib/ops-cli2.rb", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0775, st_size=203, ...}) = 0
openat(AT_FDCWD, "/srv/www/ops-lib/tmp/cache/bootsnap/compile-cache/b6/5cc8804abfcc92", O_RDONLY) = 6
read(6, "\3\0\0\0\367:\317 \247\366\204P\263\342\242\354\313\0\0\0\0\0\0\0\250\331\222a\0\0\0\0"..., 64) = 64
read(6, "\331\247if ENV['OPSLIB_USE_SERVICES'] "..., 169) = 169
mmap(NULL, 9296302080, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x562c291c4000)                     = 0x5629ff01f000
mmap(NULL, 9296437248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 9296302080, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x562c291c4000)                     = 0x5629ff01f000
mmap(NULL, 9296437248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
close(5)                                = 0
close(6)                                = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33[1mTraceback\33[m (most recent ca"..., 42Traceback (most recent call last):
) = 42
write(2, "bin/rails: ", 11bin/rails: )             = 11
write(2, "\33[1m", 4)                   = 4
write(2, "failed to allocate memory", 25failed to allocate memory) = 25
write(2, " (", 2 ()                       = 2
write(2, "\33[1;4m", 6)                 = 6
write(2, "NoMemoryError", 13NoMemoryError)           = 13
write(2, "\33[m", 3)                    = 3
write(2, "\33[1m", 4)                   = 4
write(2, ")", 1))                        = 1
write(2, "\33[m", 3)                    = 3
write(2, "\n", 1
)
@byroot
Copy link
Contributor

byroot commented Nov 30, 2021

So your strace suggest that it's the cache for /srv/www/ops-lib/ops-cli2.rb that somehow end up huge.

Couple questions:

  • How big is /srv/www/ops-lib/tmp/cache/bootsnap/compile-cache/b6/5cc8804abfcc92 ?
  • How big is RubyVM::InstructionSequence.compile_file("/srv/www/ops-lib/ops-cli2.rb").to_binary.bytesize ?

@jondavid-schober-sp
Copy link
Author

@byroot I posted the entire file as a hex dump in the original posting. Here is the outputs requested

 root@0311387bfc79:/srv/www/ops-lib# ls -l /srv/www/ops-lib/tmp/cache/bootsnap/compile-cache/b6/5cc8804abfcc92
 -rw-r--r-- 1 root root 233 Nov 18 23:47 /srv/www/ops-lib/tmp/cache/bootsnap/compile-cache/b6/5cc8804abfcc92
 root@0311387bfc79:/srv/www/ops-lib#

RubyVM:

 irb(main):001:0> RubyVM::InstructionSequence.compile_file("/srv/www/ops-lib/ops-cli2.rb").to_binary.bytesize
 => 520
 irb(main):002:0>

@byroot
Copy link
Contributor

byroot commented Nov 30, 2021

Hum, that's really a weird one.

So based on what you say about touching the file, when you do this you actually bypass the cache for that one file. It's regenerated but not used until a subsequent run.

No idea what the bug actually is at this stage, but a few more thing to confirm my suspicion.

First try:

RubyVM::InstructionSequence.compile_file("/srv/www/ops-lib/ops-cli2.rb").eval

Then in a new irb:

RubyVM::InstructionSequence.load_from_binary(RubyVM::InstructionSequence.compile_file("/srv/www/ops-lib/ops-cli2.rb").to_binary).eval

If I'm right either both or the last one should fail, and if so you likely ran into a Ruby bug of some sort.

Also if it does reproduce and there's nothing secret in that .rb file, I'd love to see it.

@jondavid-schober-sp
Copy link
Author

jondavid-schober-sp commented Nov 30, 2021

First try:

 root@0311387bfc79:/srv/www/ops-lib# rails console
 Loading production environment (Rails 6.1.3.2)
 irb(main):001:0> RubyVM::InstructionSequence.compile_file("/srv/www/ops-lib/ops-cli2.rb").eval
 Usage: rails [options] [--] <command> <subcommand> [additional args]
 
 Available commands:
 ...

So that works properly

Second:

 root@fb73c6cd02ad:/srv/www/ops-lib# rails c
 Loading production environment (Rails 6.1.3.2)
 irb(main):001:0> RubyVM::InstructionSequence.load_from_binary(RubyVM::InstructionSequence.compile_file("/srv/www/ops-lib/ops-cli2.rb").to_binary).eval
 Usage: rails [options] [--] <command> <subcommand> [additional args]
 
 Available commands:
 ...

So both are working....

I also posted the original rb file, its just a small bootstrapping file

 #!/usr/bin/env -S rails runner
 if ENV['OPSLIB_USE_SERVICES'] == 'true' || ARGV.delete('--use-services')
   require_relative 'scripts/ops_cli_services'
 else
   require_relative 'scripts/ops_cli_legacy'
 end

I would love to share the docker container, but there's too much secret in there to share

@byroot
Copy link
Contributor

byroot commented Nov 30, 2021

Wow, that's extremely weird, I really don't get what might be happening here :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants