Difference between revisions of "Xen"

From Noah.org
Revision as of 16:59, 6 May 2013


a note about GRUB_CMDLINE_XEN names in /etc/default/grub

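I have seen documentation that spells the Xen command-line boot options both with and without underscores (for example, dom0_max_vcpus and dom0maxvcpus). I am not sure whether the hypervisor accepts both or whether one style is a newer convention. The Xen command-line reference spells them with underscores (dom0_mem, dom0_max_vcpus, dom0_vcpus_pin), so that is the safer choice. Beware.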

xen version headaches

Xen can be very finicky to get running. Generally later versions are preferred. This may seem obvious, but later versions are much easier to get working than earlier versions. The downside is that many tools for Xen are quite brittle and are strongly dependent on a specific version of Xen. Anything that depends on scripts in /etc/xen/scripts/ is bound to break between different versions of Xen. Unfortunately, lots of tools seem to have this weakness. One of the more popular tools for working with Xen, xen-tools, is particularly guilty of tight version coupling. It is also itself fairly buggy.

Beware.

Problem: dom0 can't handle too much memory

It may seem odd that your host can have too much RAM, but huge amounts of RAM seem to confuse the Xen dom0. In my case I was working with a server with 384 GB of RAM, which is more memory than dom0 can handle. The solution is to restrict the amount of memory the Xen dom0 can use. This is set on the Xen command line in the GRUB boot configuration.

Here is what you are likely to see. You try to boot your Xen host and it locks up during boot with a message like this:

FATAL: Error inserting dm_mod (/lib/modules/2.6.32-5-xen-amd64/kernel/drivers/md/dm-mod.ko): Cannot allocate memory
done.
Begin: Waiting for root file system ... done
Gave up waiting for root device.

You can restrict the amount of RAM for dom0 by editing grub.cfg directly or, on Debian/Ubuntu systems, by editing /etc/default/grub. I also like to pin a few cores for dom0. That is, I like to reserve a few CPU cores for dom0 use only.

The grub.cfg should have a line similar to this:

    multiboot   /xen-4.0-amd64.gz placeholder

It should be modified to something like this:

    multiboot   /xen-4.0-amd64.gz placeholder dom0_mem=8192M,max:8192M dom0_max_vcpus=4 dom0_vcpus_pin

See also: http://wiki.xen.org/wiki/Xen_Best_Practices#Xen_dom0_dedicated_memory_and_preventing_dom0_memory_ballooning and http://wiki.debian.org/Xen#Other_configuration_tweaks

The exact operations you need to update grub.cfg will vary from platform to platform. On modern Ubuntu systems you will edit /etc/default/grub then run update-grub. On an ancient Debian 6 system I did this:

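# Rename the Xen script so GRUB lists the Xen entries before the plain Linux entries.
dpkg-divert --divert /etc/grub.d/08_linux_xen --rename /etc/grub.d/20_linux_xen
# Append the hypervisor options to /etc/default/grub, then regenerate grub.cfg.
sed -i -e '$aGRUB_CMDLINE_XEN="dom0_max_vcpus=4 dom0_vcpus_pin dom0_mem=8192M,max:8192M"' /etc/default/grub
update-grub
# Turn off dom0 ballooning in xend and set dom0-min-mem to match dom0_mem.
sed -i -e 's/(enable-dom0-ballooning .*)/(enable-dom0-ballooning no)/' -e 's/(dom0-min-mem .*)/(dom0-min-mem 8192)/' /etc/xen/xend-config.sxp
reboot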
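
The sed append above adds another GRUB_CMDLINE_XEN line every time it is run. The following is a minimal sketch of an idempotent alternative, not a tested replacement. The function name set_grub_xen_cmdline is hypothetical, and it assumes the value contains no |, & or backslash characters, since those are special in a sed replacement. The option names are spelled with underscores, as in the Xen command-line documentation.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRUB or Xen): set GRUB_CMDLINE_XEN in a
# grub defaults file, replacing an existing line instead of appending a copy.
# Assumption: the value contains no '|', '&' or backslash characters.
set_grub_xen_cmdline() {
    file=$1
    value=$2
    if grep -q '^GRUB_CMDLINE_XEN=' "$file"; then
        # Rewrite the existing assignment via a temporary file.
        sed -e "s|^GRUB_CMDLINE_XEN=.*|GRUB_CMDLINE_XEN=\"$value\"|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        # No assignment yet, so append one.
        printf 'GRUB_CMDLINE_XEN="%s"\n' "$value" >> "$file"
    fi
}

# Example (as root), followed by update-grub:
# set_grub_xen_cmdline /etc/default/grub "dom0_mem=8192M,max:8192M dom0_max_vcpus=4 dom0_vcpus_pin"
```

The pattern ends at the equals sign, so a GRUB_CMDLINE_XEN_DEFAULT line in the same file is left alone.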

Problem: dom0 can't free RAM to run guests

You might see an error like this while starting a guest:

Error: Not enough free memory and enable-dom0-ballooning is False, so I cannot release any more.  I need 8421376 KiB but only have 130924.

Ballooning causes trouble in machines with lots of RAM, yet turning it off causes dom0 to take all the RAM for itself. This leaves nothing for the guests. This is another instance where Xen behaves badly on large systems.

The solution is simple. You must also set the Xen boot parameters in GRUB to limit the amount of RAM dom0 is allowed to use. See the section titled "Problem: dom0 can't handle too much memory" above.

The most annoying part about this is that part of the fix must be done in /etc/xen/xend-config.sxp and part of it must be done in the GRUB config on boot. It seems like these fundamental memory parameters should all be in one place.
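
Before starting a guest it can help to compare what the hypervisor has free with what the guest needs. The sketch below assumes that xm info (or xl info on newer toolstacks) prints a line of the form "free_memory : N" with N in MiB. The function names xen_free_mib and kib_to_mib are hypothetical. The error message reports KiB, so a second helper converts that figure for comparison.

```shell
#!/bin/sh
# Hypothetical helper: read "xm info" style output on stdin and print the
# free_memory value (assumed to be in MiB).
xen_free_mib() {
    awk -F: '$1 ~ /^free_memory[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Hypothetical helper: convert KiB (as in the error message) to whole MiB.
kib_to_mib() {
    echo $(( $1 / 1024 ))
}

# Usage on a Xen host:
#   xm info | xen_free_mib
#   kib_to_mib 8421376
```

For the error shown above, 8421376 KiB works out to 8224 MiB, while the 130924 KiB that was available is only about 127 MiB.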

Problem: scrubbing free RAM takes forever

On a host with a lot of RAM, scrubbing free RAM makes the boot take forever. The RAM scrubbing is a security strengthening step. If your host is for your sole use then this security step can probably be skipped. Add no-bootscrub to GRUB_CMDLINE_XEN to disable it. This will significantly increase the boot speed.

GRUB_CMDLINE_XEN="dom0_max_vcpus=4 dom0_mem=4G,max:4G no-bootscrub"

generic /etc/default/grub settings

This is a good starting place for values for grub in /etc/default/grub. I show only the values that I typically change. After modifying this file you need to run update-grub.

GRUB_DEFAULT=3
GRUB_HIDDEN_TIMEOUT_QUIET=false
GRUB_TIMEOUT=10
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="apparmor=0"
GRUB_DISABLE_OS_PROBER=true
GRUB_CMDLINE_XEN="dom0_max_vcpus=4 dom0_mem=4G,max:4G no-bootscrub"
GRUB_CMDLINE_XEN_DEFAULT=""

Error: Dom0 dmesg log shows 'page allocation failure' or 'Out of memory: kill process:' or 'invoked oom-killer:' messages

Yes, these are vague symptoms, but I found that setting vm.min_free_kbytes to a higher value seemed to help. This may be partly precipitated by turning off dom0 ballooning and setting a fixed amount of dedicated memory. Note that this can happen even if dom0 has free RAM and swap. If you have lots of guests I think their I/O demands (disk and/or network) cause the dom0 kernel to run out of wiggle room. Edit /etc/sysctl.conf and set the following option to reserve 128 MB for the kernel.

vm.min_free_kbytes = 131072

You can update this live with the following command.

sysctl vm.min_free_kbytes=131072
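
The setting is expressed in kilobytes, so 128 MB becomes 128 × 1024 = 131072. A small sketch of that conversion follows, in case you want to try other sizes. The helper name mb_to_kb is hypothetical, and sysctl -n simply prints the current value.

```shell
#!/bin/sh
# Hypothetical helper: convert megabytes to the kilobyte value that
# vm.min_free_kbytes expects.
mb_to_kb() {
    echo $(( $1 * 1024 ))
}

# Usage on dom0 (as root):
#   sysctl -n vm.min_free_kbytes                    # show the current value
#   sysctl vm.min_free_kbytes="$(mb_to_kb 128)"     # reserve 128 MB
```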

XENDOMAINS_SAVE

Edit /etc/default/xendomains and set XENDOMAINS_SAVE to be empty. This controls the feature that allows Xen to save each guest's running state when dom0 is shut down. I almost never need this feature. It uses a lot of disk space.

#XENDOMAINS_SAVE=/var/lib/xen/save
XENDOMAINS_SAVE=""

xend won't start

I found that this happened when my dom0 ran out of disk space. For me the solution was, "don't run out of disk space".
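
A rough way to catch this before it bites is to check how full the dom0 filesystem is. This is only a sketch. The function names and the 90 percent threshold are my own assumptions, and it relies on the POSIX output format of df -P, where the fifth column is the capacity.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage used on the filesystem holding
# the given path, without the trailing % sign.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: warn and return 1 if usage is at or above the limit.
warn_if_full() {
    path=$1
    limit=$2
    used=$(disk_used_pct "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full"
        return 1
    fi
    return 0
}

# Usage on dom0 (the threshold of 90 is an arbitrary choice):
#   warn_if_full / 90
```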

guests start to have erratic networking

I found that this happened when my dom0 ran low on disk space. I am not certain that this is the cause because there were no useful messages in dmesg or any other log files.

Also, this may begin to happen if too many guests share the same bridged interface. This happens only at very high load levels with lots of guests (around a hundred).

Error: physdev match: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.

If you see the following message in dmesg or /var/log/kern.log

Error: physdev match: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.

then you probably need to patch /etc/xen/scripts/vif-common.sh and edit the function frob_iptable() so that it looks like the function below. You need to add the --physdev-is-bridged option to the iptables commands in two places.

frob_iptable()
{
  if [ "$command" == "online" ]
  then
    local c="-I"
  else
    local c="-D"
  fi

  iptables "$c" FORWARD -m physdev --physdev-is-bridged --physdev-in "$vif" "$@" -j ACCEPT \
    2>/dev/null &&
  iptables "$c" FORWARD -m state --state RELATED,ESTABLISHED -m physdev \
    --physdev-is-bridged --physdev-out "$vif" -j ACCEPT 2>/dev/null

  if [ "$command" == "online" -a $? -ne 0 ]
  then
    log err "iptables setup failed. This may affect guest networking."
  fi
}
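
To check whether a given copy of the script already carries the patch, you can count the lines that mention the option. This is a minimal sketch. The function name count_bridged_rules is hypothetical, and it counts matching lines anywhere in the file, not only inside frob_iptable().

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a script that contain the
# --physdev-is-bridged option. The -e is needed because the pattern
# itself starts with dashes.
count_bridged_rules() {
    grep -c -e '--physdev-is-bridged' "$1" || true
}

# Usage on dom0:
#   count_bridged_rules /etc/xen/scripts/vif-common.sh
```

With the function patched as shown above and no other uses of the option in the file, the count should be 2.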

Xen on Ubuntu

aptitude -q -y install xen-hypervisor-4.1-amd64 xen-tools xen-utils-4.1 xenstore-utils  

miscellaneous problems