Monday, June 20, 2011

Unlocking LUKS encrypted LVM via SSH in Ubuntu

If you've followed the instructions in /usr/share/doc/cryptsetup/README.remote.gz on recent versions of Ubuntu, you've probably found that you are unable to log into the dropbear instance you set up. This turns out to be the result of what I assume to be a bug in the dropbear hook script in initramfs-tools. Recent versions of Ubuntu have reorganized the /lib directory and moved some files dropbear needs; without them, it can't recognize "root" as a valid user.

First, run the following command to determine where the files you need are located:

find /lib -name libnss_files.so.2

On my system, I get the following:

/lib/i386-linux-gnu/libnss_files.so.2

The part you are interested in is the "i386-linux-gnu" part.

Now, edit (as root) /usr/share/initramfs-tools/hooks/dropbear. Look around line 30 for the following:

cp /lib/libnss_* "${DESTDIR}/lib/"

Replace that with:

cp /lib/i386-linux-gnu/libnss_* "${DESTDIR}/lib/"
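
If your system uses a different multiarch directory (x86_64 machines will have something like x86_64-linux-gnu), a more portable version of that line would find the libraries wherever they live instead of hard-coding the path. Here's a rough sketch; DESTDIR is normally set by initramfs-tools when the hook runs, and the default below is only so the snippet can be tried standalone:

```shell
#!/bin/sh
# Sketch: copy the libnss libraries from whichever /lib subdirectory
# holds them, rather than hard-coding i386-linux-gnu.
# DESTDIR comes from initramfs-tools in the real hook; this default is
# just for standalone testing.
DESTDIR="${DESTDIR:-/tmp/dropbear-hook-test}"
mkdir -p "${DESTDIR}/lib"
# Copy every libnss_* file found anywhere under /lib into the staging area
find /lib -name 'libnss_*' -exec cp {} "${DESTDIR}/lib/" \;
```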

Now, as root, run the following:

update-initramfs -u

Reboot and you should now at least be able to connect to the dropbear instance with the dropbear key that was automagically generated.

Now, the next problem is plymouth. All the work-arounds I've found seem to either flat out not work as described or break plymouth such that you'll not be able to enter the unlock passphrase at the console should you choose to do so.

Use the steps below to work-around the plymouth issue (tested on Natty 11.04). This work-around will at least guarantee that plymouth is still able to unlock the LUKS root volume at the console should you choose to do so.

1) run "ps aux" and locate the process id for the /scripts/local-top/cryptroot script
2) run "kill -9 pid" replacing pid with the process id you found in step 1
3) run "ps aux" again and look for a wait-for-root script and note the timeout on the command line
4) twiddle your thumbs for that many seconds - when the timeout expires, that script will exit and start an initramfs shell
5) run "/scripts/local-top/cryptroot" and wait for it to prompt for your unlock passphrase
6) enter the unlock passphrase and wait for it to return you to the busybox shell prompt
7) run "ps aux" again and locate the process id of "/bin/sh -i"
8) run "kill -9 pid" using the process id you found from step 7
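
Condensed, the sequence above looks roughly like this from the dropbear session. This is only a sketch - it can't be run anywhere but inside the initramfs, the pids are placeholders, and process names may differ between releases, so check the "ps aux" output yourself before killing anything:

```shell
# inside the initramfs busybox shell, over the dropbear ssh session
ps aux                          # find the pid of /scripts/local-top/cryptroot (step 1)
kill -9 <cryptroot-pid>         # step 2
ps aux                          # note the timeout on wait-for-root's command line (step 3)
# wait out that timeout; an initramfs shell appears (step 4)
/scripts/local-top/cryptroot    # steps 5-6: enter the unlock passphrase when prompted
ps aux                          # find the pid of "/bin/sh -i" (step 7)
kill -9 <sh-pid>                # step 8: boot continues
```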

initramfs should be continuing the boot process at this point with your mounted root volume. You'll know this is happening because dropbear just had "/" yanked out from underneath it and you'll not be able to run any more commands in your ssh session, as /bin is no longer available. Go ahead and disconnect and wait an appropriate amount of time for your system to finish starting up. After your system has finished booting, you should be able to connect to it remotely just as if you had typed your unlock passphrase at the console.

If someone more knowledgeable than I am about the inner workings of initramfs-tools and plymouth wants to comment on this article, please feel free to. Since I don't even know whether this is an Ubuntu or an upstream problem, I haven't filed a proper bug report.

Wednesday, July 29, 2009

Postfix: Name service error resolving localhost

I thought I'd share this bit of information for those of you who use Postfix. If you want to forward your email to a smarthost that requires authentication over an SSL tunnel, you've no doubt come across many guides on the net. They'll all tell you to place some variation of "relayhost=localhost:###" in your main.cf config file, along with the necessary entries for authentication in the sasl_passwd database, and to use stunnel in client mode.

Unfortunately, it seems that the version of Postfix included on my Ubuntu 9.04 system has issues with that configuration. In fact, my mail.info logs were spammed with the following messages:

postfix/smtp: warning: relayhost configuration problem
postfix/error: [irrelevant stuff snipped] Name service error for name=localhost type=A

Even though I have a valid entry for localhost in my /etc/hosts file, the name resolution request was escaping to my ISP's DNS servers, which have no idea what localhost should resolve to and rightfully return an error. Postfix does its own DNS lookups for an unbracketed relayhost (including an MX lookup), so /etc/hosts never gets consulted.

To fix this, try one of the following variations:

relayhost=[localhost]:###
relayhost=[127.0.0.1]:###

Documentation here:
http://www.postfix.org/postconf.5.html#relayhost
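
Putting the pieces together, the relevant config fragments might look something like this. The port 8025, the smarthost name, and the sasl_passwd path are illustrative assumptions on my part, not values from any particular setup:

```
# /etc/postfix/main.cf -- relay through a local stunnel client
# (8025 is an arbitrary local port; match it to stunnel's accept line)
relayhost = [127.0.0.1]:8025
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous

; /etc/stunnel/stunnel.conf -- wrap the local port in SSL to the smarthost
client = yes
[smtp-tls-wrapper]
accept = 127.0.0.1:8025
connect = smtp.example.com:465
```

The brackets around 127.0.0.1 are what matters: they tell Postfix to skip the MX/DNS lookup and use the address literally.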

I wish there had been a clear, concise post on the solution to this problem. I spent way too much time reading dead-end posts on mailing lists before everything clicked into place. Hopefully, this will save someone else some time.

Thursday, February 14, 2008

Making a Hitachi DKR2D-J72FC work in a Sun Blade 1000 workstation

I thought I'd post my experiences in the hope that it helps someone.

I recently purchased a Hitachi DKR2D-J72FC off eBay that shows up internally as a DKR1C-J72FC with a firmware revision of D2W4, for use in a Sun Blade 1000 workstation with a QLogic 2200 FC controller. According to the seller, the drive came from a StorEdge array that was used with AIX, so the drive was formatted with a block size of 520 and would require low-level formatting to a block size of 512.

The first step in getting this drive to work involved setting a jumper on pins 13 & 14 so the drive would autospin. This allows the Sun Blade 1000 to actually see the drive. However, even though Solaris could see the drive, the Solaris format utility was unable to format/label the drive.

I solved the format problem by downloading and compiling sg3_utils and using the sg_format program, which has a convenient parameter for the block size, to low-level format the drive to a 512-byte block size. After that, despite the fact that Solaris could see the drive and could see it was formatted with 512-byte blocks, the Solaris format utility was still unable to write a partition table/label to the drive (e.g. an "error writing VTOC" error).
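
For reference, the sg_format invocation looks something like the following. The /dev/sg1 device name is illustrative - identify your drive first (e.g. with sg_scan), because this low-level format destroys all data on the disk and takes a long time on a 73 GB drive:

```shell
# reformat from 520-byte blocks to 512-byte blocks -- DESTRUCTIVE
sg_format --format --size=512 /dev/sg1
```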

After several days, I finally stumbled upon a solution and it is nothing short of bizarre. For whatever reason this drive refuses to allow writes to it unless the write cache bit has been enabled. As the Solaris format utility is unable to change this parameter, I downloaded and compiled another tool called sdparm and used the following command to enable the write cache:
sdparm --set=WCE=1 /dev/rdsk/[device]s0

After enabling the write cache, the Solaris format utility was immediately able to write a label to the drive and I was able to create a filesystem and mount it. The drive passes all read/write surface tests I've thrown at it and works normally as long as the write cache bit remains enabled. If I disable the write cache it will immediately revert to the old behavior where any attempt to write to the drive results in an I/O error.

It's almost like the firmware on the drive is using that flag as some sort of global write-protection flag. This is really strange, but since I don't require 100% data integrity I can live with the write cache being enabled. I hope this information helps anyone else who might be struggling to get one of these drives to work.