
Stretch grid problem: cannot increase memory for Java
Closed, Resolved (Public)

Description

While migrating (my tool) to the Stretch job grid I ran into this problem:

I cannot start my Java process on the grid. I have tried every possible combination of arguments I can think of when referring to the (apparently) outdated documentation.

Error occurred during initialization of VM
Could not allocate metaspace: 1073741824 bytes
Error occurred during initialization of VM
Could not reserve enough space for 512000KB object heap
tools.usrd-tools@tools-sgebastion-07:~$ /usr/bin/jsub -once -mem 1024m java -jar my-bot/USRDbot.jar

I have set the Java program to directly run on login, because I have already spent hours on this and cannot throw more hours away trying to figure this out. Next time, please make this obvious in the documentation.

Event Timeline

Rschen7754 renamed this task from Stretch grid problem: (your description) to Stretch grid problem: cannot increase memory for Java. Mar 9 2019, 5:45 AM
Rschen7754 updated the task description.

> I have set the Java program to directly run on login

This is not a great idea. Your bot will probably be stopped as soon as it is noticed by a Toolforge administrator.

/usr/bin/jsub -once -mem 1024m java -jar my-bot/USRDbot.jar asks the grid for exactly 1 GB of virtual memory. There will be some amount of overhead memory needed beyond the JVM's heap. Starting a JVM without explicit -Xms and -Xmx values leaves determination of the minimum and maximum heap sizes up to the JVM itself:

$ java -XX:+PrintFlagsFinal -version | grep HeapSize
   size_t ErgoHeapSizeLimit                        = 0                                         {product} {default}
   size_t HeapSizePerGCThread                      = 43620760                                  {product} {default}
   size_t InitialHeapSize                          = 264241152                                 {product} {ergonomic}
   size_t LargePageHeapSizeThreshold               = 134217728                                 {product} {default}
   size_t MaxHeapSize                              = 4208984064                                {product} {ergonomic}
    uintx NonNMethodCodeHeapSize                   = 5835340                                {pd product} {ergonomic}
    uintx NonProfiledCodeHeapSize                  = 122911450                              {pd product} {ergonomic}
    uintx ProfiledCodeHeapSize                     = 122911450                              {pd product} {ergonomic}
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment (build 11.0.2+9-Debian-3bpo91)
OpenJDK 64-Bit Server VM (build 11.0.2+9-Debian-3bpo91, mixed mode, sharing)

This seems to indicate that the default values (at least on a Toolforge bastion server) are roughly the equivalent of using -Xms256m -Xmx4g. Hopefully your bot can actually run with less than 4 GB of RAM, but with these implicit settings the JVM won't begin to garbage collect aggressively until it gets near the expected 4 GB upper limit on heap size.
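To make the conversion behind those figures explicit, here is a minimal sketch that turns the two byte counts from the PrintFlagsFinal output above into whole mebibytes. The helper name bytes_to_mib is made up for illustration and is not part of any Toolforge or JVM tooling.

```shell
# Convert a byte count to whole mebibytes using integer division.
# bytes_to_mib is a hypothetical helper name used only in this example.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

initial_mib=$(bytes_to_mib 264241152)   # InitialHeapSize from the output above
max_mib=$(bytes_to_mib 4208984064)      # MaxHeapSize from the output above

# Prints: roughly -Xms252m -Xmx4014m
echo "roughly -Xms${initial_mib}m -Xmx${max_mib}m"
```

So the ergonomic defaults on that host work out to 252 MiB and 4014 MiB, slightly under the rounded 256m and 4g figures.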

I would recommend:

  • Always explicitly setting -Xms and -Xmx with your java grid job command
  • Setting both to the same value, the lowest one that will allow your code to run. This will likely be something between 256m and 2g depending on the amount of data that needs to be held in memory at the same time.
  • Setting the -mem value passed to jsub or jstart to a slightly larger value than the heap size. 10% larger would typically be enough, but you may need to adjust this up or down slightly.

I would suggest trying jsub -once -mem 282m java -Xms256m -Xmx256m -jar my-bot/USRDbot.jar and adjusting the values upward from there until you can find the smallest memory size you can reasonably use for your bot.
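The arithmetic behind that suggestion can be sketched as a small helper that prints (but does not submit) a jsub command line for a given heap size. The function name jsub_command and its two arguments are made up for this example. The 10% headroom is only the starting point suggested above, and a later reply in this task found that considerably more was needed in practice.

```shell
# Print a jsub command whose -mem value is the heap size plus 10%, rounded up.
# jsub_command is a hypothetical helper; it only echoes the command line.
#   $1 = heap size in MiB (used for both -Xms and -Xmx)
#   $2 = path to the jar
jsub_command() {
  heap_mib=$1
  jar=$2
  # Integer ceiling of heap_mib * 1.10
  mem_mib=$(( (heap_mib * 110 + 99) / 100 ))
  echo "jsub -once -mem ${mem_mib}m java -Xms${heap_mib}m -Xmx${heap_mib}m -jar ${jar}"
}

# Prints: jsub -once -mem 282m java -Xms256m -Xmx256m -jar my-bot/USRDbot.jar
jsub_command 256 my-bot/USRDbot.jar
```

If the job still fails to start, changing the multiplier (110 here) is the knob to turn, while keeping -Xms and -Xmx pinned to the same value.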

This doesn't seem to work. Running Java directly works:

java -Xms1G -Xmx1G -jar my-bot/USRDbot.jar

But this just causes the bot to immediately stop:

jsub -once -mem 4G java -Xms2G -Xmx2G -jar my-bot/USRDbot.jar

I understand the concern about running it directly on login, but I also can't spend hours just hoping to get the right command.

Rschen7754 claimed this task.

I got it working. I had to do the following:

jsub -once -mem 3G java -Xms512m -Xmx512m -jar my-bot/USRDbot.jar

Seems like a lot of overhead but at least it is working.
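A rough, partial way to see where some of that overhead might come from, assuming -mem limits virtual memory as the earlier reply suggests: the JVM reserves address space beyond the heap. The sketch below only adds up numbers already quoted in this task (the three code heap sizes from the PrintFlagsFinal output and the 1073741824-byte metaspace figure from the error message). It is not a full accounting, since thread stacks, garbage collector structures, and shared libraries are left out, and it does not by itself explain why the 4G attempt failed.

```shell
# All values are copied from output quoted earlier in this task.
non_nmethod=5835340            # NonNMethodCodeHeapSize
non_profiled=122911450         # NonProfiledCodeHeapSize
profiled=122911450             # ProfiledCodeHeapSize
metaspace_reserve=1073741824   # from "Could not allocate metaspace"
heap_mib=512                   # from -Xms512m -Xmx512m

mib=$(( 1024 * 1024 ))
code_cache_mib=$(( (non_nmethod + non_profiled + profiled) / mib ))
metaspace_mib=$(( metaspace_reserve / mib ))
total_mib=$(( heap_mib + code_cache_mib + metaspace_mib ))

echo "code heaps: ${code_cache_mib} MiB"                  # 240 MiB
echo "metaspace reservation: ${metaspace_mib} MiB"        # 1024 MiB
echo "heap plus these reservations: ${total_mib} MiB"     # 1776 MiB
```

Under these assumptions a 512 MiB heap already implies well over 1 GiB of reserved address space, which is at least consistent with the 1024m request failing and the 3G request succeeding.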

Thanks for your help!

Andrew triaged this task as Medium priority. Mar 9 2019, 9:33 PM