Rollback RHEL updates using LVM
RHEL6 has a nice new feature: LVM snapshot merging. This basically means that you can merge a snapshot back into its origin LV. RHEL installations normally have their OS installed on LVM (if yours does not, please reconsider; you don’t know what you’re missing).
So imagine you have a RHEL box running in production and you need to upgrade it from 6.0 to, say, 6.1. Your customer asks what risks are involved and what your contingency plan is if things go wrong. Of course you say you have a backup, but what if a restore takes too long?
If your OS is installed on a SAN disk, chances are good the SAN has some form of snapshotting available. If it has, use that. It is by far the better solution. However, lots of boxes have their OS installed on a local disk. If this local disk is actually a mirror of two disks behind a local RAID controller, simply remove one member from the mirror before the upgrade. When things go wrong, you can roll back using the removed member. The exact procedure depends heavily on the RAID controller in use, so RTFM.
If you do not even have a local mirror available, you can now use LVM snapshots to roll back to the previous state, independent of the filesystem in use. But wait, it gets even better: it even works for RHEL5 upgrades! So here is an example procedure I have worked out and tested together with my colleague Martijn to do just that.
Because you can have all kinds of crazy setups with LVM, I cannot predict exactly how this procedure must be done on your system. So instead, I describe it for a basic RHEL5 LVM setup:
- A single disk /dev/sda
- /boot on a small separate partition /dev/sda1 outside of LVM, because booting from LVM is not supported
- VG VolGroup00 on /dev/sda2
- root in LV VolGroup00/LogVol00
- swap in LV VolGroup00/LogVol01
If your setup is different, simply adjust the procedure to fit your LVM setup.
- Prepare a RHEL6.x installation DVD. We need its rescue environment if we have to roll back.
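- Estimate the amount of extra storage needed for the upgrade. The snapshots must be able to hold every block the upgrade changes, and a snapshot that fills up becomes invalid, so estimate generously. If this space is not available in the Volume Group(s), you need to add a physical volume to them first (pvcreate <dev>, then vgextend VolGroup00 <dev>).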
- Make a snapshot of each LV that is part of the OS: lvcreate -L<size> -s -n LogVol00-backup VolGroup00/LogVol00. You do not need to create a snapshot of any swap LVs.
- Save the content of the boot partition. This partition is not on an LV, so we cannot use LVM snapshotting to back it up. Make an exact copy of /boot in a tarball under /tmp: tar --xattrs -cspvzf /tmp/boot.tgz . (while in /boot; using . instead of * also picks up hidden files)
- upgrade your box as intended. The old content of all changed blocks will now be saved in the snapshots.
- test, test, test
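The preparation steps above can be sketched as a small script. This is only a sketch for the example layout (VolGroup00/LogVol00), not a tested tool: the DRY_RUN switch, the SNAP_MB value and the function names has_room and prepare_rollback are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the preparation steps for the example layout.
# DRY_RUN=1 (default) only prints commands; set DRY_RUN=0 and run as root to execute them.
DRY_RUN="${DRY_RUN:-1}"
VG="VolGroup00"               # volume group from the example layout
LV="LogVol00"                 # root LV; the swap LV needs no snapshot
SNAP_MB="${SNAP_MB:-2048}"    # assumed snapshot size in MiB; use your own estimate

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

has_room() {
    # $1: free space in MiB as printed by vgs (e.g. "  4096.00")
    # $2: wanted snapshot size in MiB (integer)
    free="$(echo "$1" | tr -d ' ')"
    free="${free%%.*}"        # drop the decimals
    [ "$free" -ge "$2" ]
}

prepare_rollback() {
    if [ "$DRY_RUN" != "1" ]; then
        # Lowercase "m" makes LVM report MiB; --nosuffix leaves a bare number.
        vg_free="$(vgs --noheadings --nosuffix --units m -o vg_free "$VG")"
        has_room "$vg_free" "$SNAP_MB" || {
            echo "not enough free space in $VG for a ${SNAP_MB} MiB snapshot" >&2
            return 1
        }
    fi
    # Snapshot the root LV; old blocks are copied here while the upgrade runs.
    run lvcreate -L "${SNAP_MB}m" -s -n "${LV}-backup" "${VG}/${LV}"
    # /boot lives outside LVM, so archive it separately.
    run tar --xattrs -cpszf /tmp/boot.tgz -C /boot .
}

prepare_rollback
```

With more OS LVs than just root, the lvcreate line would be repeated for each of them, each with its own size.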
If the tests are successful, you only need to clean up (after a few days):
- remove /tmp/boot.tgz
- remove the snapshots (lvremove VolGroup00/LogVol00-backup)
- remove the extra physical volume(s) from the volume groups (vgreduce VolGroup00 <dev>)
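As a rough sketch, the cleanup could look like the script below. It assumes the example layout; DRY_RUN, EXTRA_PV and the function name cleanup_after_upgrade are made up for illustration, and EXTRA_PV stays empty unless you actually added a physical volume for the snapshots.

```shell
#!/bin/sh
# Sketch of the cleanup after a successful upgrade.
# DRY_RUN=1 (default) only prints commands; set DRY_RUN=0 and run as root to execute them.
DRY_RUN="${DRY_RUN:-1}"
VG="VolGroup00"
LV="LogVol00"
EXTRA_PV="${EXTRA_PV:-}"      # hypothetical, e.g. /dev/sdb1; empty if no PV was added

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cleanup_after_upgrade() {
    # The saved copy of /boot is no longer needed.
    run rm -f /tmp/boot.tgz
    # lvremove asks for confirmation before it drops the snapshot.
    run lvremove "${VG}/${LV}-backup"
    # vgreduce only succeeds once no extents on the PV are in use anymore,
    # so it comes after the snapshot has been removed.
    if [ -n "$EXTRA_PV" ]; then
        run vgreduce "$VG" "$EXTRA_PV"
    fi
}

cleanup_after_upgrade
```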
If the tests are not successful and you need to roll back:
- Boot the server from the RHEL6.x DVD you prepared beforehand (you did, didn’t you?)
- Choose the rescue environment from the RHEL6.x GRUB menu
- When asked to enable the network interfaces choose No
- When asked to mount the installed OS beneath /mnt/sysimage choose Continue
- The rescue environment will then mount the installed OS on /mnt/sysimage
- Choose to get to the commandline
- chroot into the installed OS: chroot /mnt/sysimage
- Remove the content of /boot: rm -Rf /boot/*
- restore the old content of /boot: tar --xattrs -xpszf /tmp/boot.tgz -C /boot
- run grub install: grub-install /dev/sda
- leave the chroot: exit
- Unmount all mountpoints beneath /mnt/sysimage, because the LVs must not be in use when we merge the snapshots back into their respective LVs.
- Perform the merge on each LV: lvconvert --merge VolGroup00/LogVol00-backup -i 1. When neither the LV nor its snapshot is in use (as should be the case), the merge starts immediately, and because of the “-i 1” you get a progress indication every second. If either the LV or the snapshot is still in use somehow (it shouldn’t be, because you unmounted everything in /mnt/sysimage, right?), the merge is postponed until the next time the LVs are activated while both are unused. You can see this state with lvs: the first attribute character of the LV will then be “O”, meaning “origin with merging snapshot”. You need to wait until this first attribute is “-”. The best thing to do in this case is to reboot into the rescue environment again. You will see that the merge then completes, because the LVs are freed during the reboot. You need the LVM from RHEL6 to complete the merge in this situation, because the LVM in RHEL5 does not understand snapshot merging and gives nasty error messages.
- When all LVs are merged, reboot into the installed OS. You should now have your old OS up and running again.
- Clean up using the same steps as after a successful upgrade, except for the snapshots, because those are removed automatically by the merge.
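The unmount and merge part of the rollback can be sketched as follows, to be run in the rescue shell after /boot has been restored and the chroot has been left. This is an illustration for the example layout only: DRY_RUN and the function names sysimage_mounts, merge_state and rollback_merge are assumptions, and the attribute strings passed to merge_state are just examples of what lvs prints in its lv_attr column.

```shell
#!/bin/sh
# Sketch of the unmount and merge steps of the rollback.
# DRY_RUN=1 (default) only prints commands; set DRY_RUN=0 in the rescue shell to execute them.
DRY_RUN="${DRY_RUN:-1}"
VG="VolGroup00"
LV="LogVol00"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

sysimage_mounts() {
    # Reads /proc/mounts-style lines on stdin and prints the mount points at or
    # below /mnt/sysimage, last mounted first, so children come before parents.
    list=""
    while read -r _dev mp _rest; do
        case "$mp" in
            /mnt/sysimage|/mnt/sysimage/*) list="$mp
$list" ;;
        esac
    done
    printf '%s' "$list"
}

merge_state() {
    # $1: lv_attr of the origin LV without leading blanks, e.g. "Owi-ao".
    case "$1" in
        O*) echo "merging" ;;   # origin with merging snapshot
        -*) echo "merged" ;;    # plain LV again, the merge is complete
        *)  echo "unknown" ;;
    esac
}

rollback_merge() {
    # Free the LVs first: unmount everything beneath /mnt/sysimage.
    if [ -r /proc/mounts ]; then
        sysimage_mounts < /proc/mounts | while read -r mp; do
            run umount "$mp"
        done
    fi
    # Merge the snapshot back into its origin, reporting progress every second.
    run lvconvert --merge -i 1 "${VG}/${LV}-backup"
    # Show the attributes; feed the origin's lv_attr to merge_state to interpret it.
    run lvs --noheadings -o lv_name,lv_attr "$VG"
}

rollback_merge
```

If merge_state reports "merging" for the origin LV, the merge has been postponed and a reboot into the rescue environment is the way to let it finish.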
Have fun!
Posted by Fred on September 28, 2011 at 21:54, filed under Linux.