Diagnosing system issues with ssh and fs_usage

Jan 20, '03 08:59:18AM

Contributed by: Anonymous

If you're up for some fairly serious troubleshooting, the combination of ssh and fs_usage will often reveal the cause of otherwise intractable system problems.

Many times when the Quartz-Aqua layers of Mac OS X are borked for one reason or another, the underlying BSD layer is working fine. As previously mentioned in this forum, fs_usage (executed as root or with sudo) lets you see file system activity. It works over an ssh connection, so if you have two computers and one of them is not responding, you may be able to log in to it remotely to see what is going on. The machine you connect from doesn't even have to be a Mac; anything with an ssh client will do.
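As a minimal sketch of that idea, the function below opens an ssh session to the troubled Mac, runs fs_usage there, and keeps a local copy of the output. The target user@desktop.local and the file name fs_usage.log are placeholders, and saving the output with tee is an added convenience rather than part of the original hint. It assumes Remote Login is enabled on the target and that the account is allowed to use sudo.

```shell
# Run fs_usage on a remote Mac over ssh and keep a local copy of the output.
# Assumptions: sshd is running on the target, the account may use sudo,
# and fs_usage accepts -w (wide output), as it does on current macOS.
watch_remote_fs() {
    target="$1"                    # e.g. user@desktop.local (placeholder)
    capture="${2:-fs_usage.log}"   # local file for the output (placeholder name)
    # -t allocates a terminal so sudo can prompt for a password remotely
    ssh -t "$target" 'sudo fs_usage -w' | tee "$capture"
}

# Example (not run here): watch_remote_fs user@desktop.local
```

Press Control-C to stop the capture once the problem has been reproduced.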

[Editor's note: Read the rest of the article for a good general troubleshooting log that may save you some time in the event you have future system troubles...]

As an example, I recently booted my desktop machine running 10.2.3 only to find that I couldn't log in. Everything was normal until I got to the login dialog. I entered my password and got the "logging in" message, but then the login aborted and the dialog reappeared. The same thing happened when I tried to log in to my pristine test user account, so I knew the problem wasn't in my home folder. Next I tried a safe boot (holding down the Shift key). I still couldn't log in, so the cause wasn't any of my system startup items or kernel extensions.

Next I booted from a FireWire drive. This is better than booting from a CD for several reasons, one of which is that you can compare the system log with the one on the faulty boot device. I ran Disk Utility on the internal drive and repaired the disk directory (for the second time, since a safe boot includes this step automatically) and system permissions, with no unusual results.

Looking at the system log on the internal drive, I saw that it was the same as the one on the external until I came to this line:

crashdump: Crash report written to: /Library/Logs/CrashReporter/loginwindow.crash.log

So the loginwindow process was crashing after I entered my password. But why? The crash report revealed nothing, at least to me.
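For anyone repeating this step, a quick way to find such a line in the log on the faulty volume might look like the sketch below. The volume name "Internal HD" is a placeholder, and the path assumes the log lives at var/log/system.log on that volume, as it does on a standard install.

```shell
# Print the first line (with its line number) that mentions crashdump.
first_crash_line() {
    grep -n 'crashdump' "$1" | head -n 1
}

# Example (placeholder volume name, not run here):
#   first_crash_line "/Volumes/Internal HD/var/log/system.log"
```

The line number makes it easy to read the entries just before the crash and compare them with the log from the working boot drive.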

Time to boot from the internal again, this time normally, without the Shift key. That lets the ssh daemon start up. I waited for the login dialog to come up, then entered my name and password, but didn't try to log in yet. Then I got my PowerBook, started a Terminal session, and successfully opened an ssh connection to the desktop via AirPort. Now I had access to my desktop machine, even though I couldn't log in to Aqua.

I started fs_usage from the PowerBook, and hit the enter key on the desktop to log in. Again the login failed, but this time I was recording all file system activity in the scrollback buffer on the PowerBook. I searched for the first occurrence of the word "crashdump". The last file access by loginwindow prior to that line was this:
access  /Applications/Utilities/sshAskPass.app>>>>>>  0.000099  loginwindow

I tried to list this item with ls and got "No such file or directory." Here was the answer. The application 'sshAskPass' is part of the (excellent and not at all blameworthy) package 'sshLogin' by Michael K Link. During the last Aqua session, I had inadvertently moved sshAskPass out of the Utilities folder where it was originally installed. The installation of sshLogin modifies the file /etc/ttys, which is read by loginwindow, to launch this app at login. When the app wasn't found in the expected location, loginwindow crashed. Safe booting didn't prevent the crash, because /etc/ttys was still being read.
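The search described above was done by hand in Terminal's scrollback. If the fs_usage output has instead been saved to a file (a hypothetical fs_usage.log here), the same search can be sketched with awk. The sketch assumes the process name is the last field on each fs_usage line and that the first line containing "crashdump" marks the crash.

```shell
# Print the last loginwindow line that appears before the first crashdump line.
last_before_crash() {
    awk '
        /crashdump/         { if (last != "") print last; exit }  # stop at first crashdump
        $NF ~ /^loginwindow/ { last = $0 }                          # remember latest loginwindow line
    ' "$1"
}

# Example (placeholder file name, not run here): last_before_crash fs_usage.log
```

Listing the path on the line it prints, for example with ls -ld, then shows whether the file loginwindow was looking for actually exists.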

The point of the story is that ssh and fs_usage saved me from a hellish troubleshooting session that might have taken days. The usual last resort, archive & install, would have accomplished nothing, because after restoring all the system configuration files in /etc, sshAskPass.app would still have been misplaced and logging in would have failed again. The modification dates of the files would have given me no clue.
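Since the root cause was a configuration file pointing at a path that no longer existed, a rough check along the following lines could catch the same class of problem. This is only a sketch and not something done in the session above: it pulls anything that looks like an absolute path out of a file and reports the ones that are missing. It does not handle paths containing spaces, and it knows nothing about the actual syntax of /etc/ttys, so treat its output as hints to verify by hand.

```shell
# List absolute paths mentioned in a file that do not exist on this system.
# Limitation: paths containing spaces or unusual characters are not recognized.
missing_paths() {
    grep -o '/[A-Za-z0-9_./-]*' "$1" | sort -u | while IFS= read -r p; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Example (not run here): missing_paths /etc/ttys
```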


Mac OS X Hints
http://hints.macworld.com/article.php?story=2003012005591894