
Instantly restart a server or workstation each day System
Among other things, I administer two identical servers at different sites on different LANs. They're running Mac OS X 10.4.2 client with all updates, and they run these services: ipfw, nat, dhcp, dns, and ssh. Unfortunately, there seems to be a bug in Tiger which causes my servers to freeze frequently. The servers don't automatically log in (all of their services are daemons, and don't require a user account). I tried changing RAM, repairing permissions, running Disk Warrior, Drive Genius, zapping the PRAM, etc. I am fairly confident that it's not a hardware problem, because it's occurring on two separate but identically-configured servers (on two different LANs, from two different ISPs).

It has been suggested that lookupd is the culprit, and the recommended solution is the third-party unlockupd daemon. However, even running unlockupd, my servers freeze frequently. I figured that I'd try a hardware solution called Kick-Off! to restart the computers when they freeze. However, even using Kick-Off!, my servers freeze frequently.

I did notice that, most of the time, the servers last at least 24 hours between freezes, so I decided to implement a once-a-day, automatic restart. But these servers are very important, so I want them to have as little downtime as possible. So I chose the time of day with the least traffic for this daily restart.

I did not want to use the Energy Saver Preference Pane of Mac OS X to shut down then start up the computer, because I can't have the machines down for more than a minute at a time -- so I can't separate the shutdown and startup times by more than a minute. Also, I can't have the machines miss their startup time because they're still shutting down a lot of processes (so I can't separate the shutdown and startup times by less than a minute). So here's how I decided to proceed. I decided to simply add a line to the /etc/crontab file:
$ sudo nano -w /etc/crontab
Password:
# The periodic and atrun jobs have moved to launchd jobs
# See /System/Library/LaunchDaemons
#
# minute        hour    mday    month   wday    who     command
30    6    *    *    *    root    /sbin/shutdown -r now
Control-X y
It is important to note that the spaces shown between the items above (in the 30... row) are tabs. Use Control-X to exit nano, and hit y to tell nano that you want to save your changes. That tells the computers to have the root user (the root account does not need to be enabled) execute the instant restart command every day at 6:30 AM.
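For anyone who would rather script this change than edit the file by hand, here is a minimal sketch. The names add_daily_restart and CRONTAB_FILE are made up for illustration, and the script defaults to a scratch file so it can be tried safely; pointing it at /etc/crontab (as root) is left to the reader.

```shell
#!/bin/sh
# Sketch only: append the daily-restart entry to a system-style crontab
# exactly once. CRONTAB_FILE and add_daily_restart are illustrative names.
CRONTAB_FILE="${CRONTAB_FILE:-$(mktemp /tmp/crontab.XXXXXX)}"

# Fields: minute hour mday month wday who command (tab-separated).
ENTRY="$(printf '30\t6\t*\t*\t*\troot\t/sbin/shutdown -r now')"

add_daily_restart() {
    file="$1"
    touch "$file"
    # -F: match literally, -x: match the whole line, -q: no output.
    if grep -Fxq "$ENTRY" "$file"; then
        echo "entry already present in $file"
    else
        printf '%s\n' "$ENTRY" >> "$file"
        echo "entry added to $file"
    fi
}

add_daily_restart "$CRONTAB_FILE"
```

Running it a second time against the same file should leave the file unchanged, which makes it safe to re-run from a setup script.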

I understand that crontab is being deprecated; however, for the life of me, I could not get a launchd LaunchDaemon to execute at the time that I specified. I did use plutil to make sure the syntax was correct, I did set the owner and permissions properly, and I did remember to use launchctl to load the file, etc. I know the command worked because I could run sudo launchctl load /Library/LaunchDaemons/daily-restart.plist, and then manually run sudo launchctl start daily-restart, and it would restart the computer immediately. However, it never worked at the time that I specified in the plist file. Oh well.
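For reference, launchd jobs that should run at a wall-clock time normally use the StartCalendarInterval key. The sketch below generates such a plist. The label daily-restart comes from the hint, but the plist body is an assumption and not the author's actual file, and PLIST_OUT is an illustrative variable that defaults to a scratch path. Whether this fires reliably on 10.4.2 is exactly what the author could not get working, so treat it as an untested illustration.

```shell
#!/bin/sh
# Sketch only: generate a launchd job that requests a restart at 06:30.
# PLIST_OUT is an illustrative variable; the plist body is an assumption,
# not the file from the original hint.
PLIST_OUT="${PLIST_OUT:-$(mktemp /tmp/daily-restart.XXXXXX)}"

cat > "$PLIST_OUT" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>daily-restart</string>
    <key>ProgramArguments</key>
    <array>
        <string>/sbin/shutdown</string>
        <string>-r</string>
        <string>now</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>6</integer>
        <key>Minute</key>
        <integer>30</integer>
    </dict>
</dict>
</plist>
EOF

# plutil only exists on Mac OS X; skip the syntax check elsewhere.
if command -v plutil >/dev/null 2>&1; then
    plutil -lint "$PLIST_OUT"
fi
echo "wrote $PLIST_OUT"
```

To try it for real, the file would be copied to /Library/LaunchDaemons/daily-restart.plist, owned by root, and loaded with launchctl as described above.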

Comments (22)
Instantly restart a server or workstation each day
Authored by: stewarsh on Oct 17, '05 08:42:52AM

I've never heard of this happening on Tiger client, but I have had the problem on my XServe since going to Tiger server. There is a bug in lookupd that causes a lockup and thus prevents anything that needs lookupd from running. These things include authentication requests, mapping a UID to a username, DNS queries, etc.

I wrote a C program that will detect the fault and automatically restart the system when it occurs. This has minimized my downtime, but the negative effect is that the reboot can happen at any point in the day.
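As a rough illustration of the idea (this is not the C program mentioned above), a watchdog can run a lookup with a time limit and react if it hangs. Everything here is an assumption: the names run_with_timeout, check_once, PROBE_CMD, PROBE_LIMIT and ON_FAULT are made up, `id root` is just one command that needs a username lookup, and the fault action only prints what it would do.

```shell
#!/bin/sh
# Sketch only: not the C program mentioned above. Runs a probe command with
# a time limit and reacts if it fails or hangs. All names are illustrative.

# run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 1 if it had to be killed.
run_with_timeout() {
    limit="$1"; shift
    "$@" >/dev/null 2>&1 &
    probe_pid=$!
    waited=0
    while kill -0 "$probe_pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill -9 "$probe_pid" 2>/dev/null
            wait "$probe_pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$probe_pid"
}

# Assumption: a hung resolver makes a lookup block, so a lookup that does
# not return within PROBE_LIMIT seconds is treated as the fault.
PROBE_LIMIT="${PROBE_LIMIT:-10}"
PROBE_CMD="${PROBE_CMD:-id root}"
ON_FAULT="${ON_FAULT:-echo would run: /sbin/shutdown -r now}"

check_once() {
    # PROBE_CMD and ON_FAULT are left unquoted on purpose so that they
    # split into a command and its arguments.
    if run_with_timeout "$PROBE_LIMIT" $PROBE_CMD; then
        echo "probe ok"
    else
        echo "probe failed or timed out"
        $ON_FAULT
    fi
}

check_once
```

A script like this could be run every few minutes from cron. Note that if the machine is frozen hard enough that cron itself cannot start new processes, no userland watchdog will help.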

If anyone is interested in getting this program drop me an e-mail @ sstewart_at_mac_dot_com and I'll send it to you.



If lookupd is the problem...
Authored by: gfoyle on Oct 17, '05 09:32:43AM
If lookupd is the problem, have you tried just restarting it? You could use something like
kill -9 `cat /var/run/lookupd.pid`; /usr/sbin/lookupd
(I have not tested this command so I am not sure if this is the exact syntax.)

If lookupd is the problem...
Authored by: sjk on Oct 17, '05 02:36:30PM

-9 (SIGKILL) is overkill (pun intended?) and should only be tried when a process can't be killed with some other signal. What you probably want here is:

sudo killall lookupd

or:

sudo killall -HUP lookupd
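The same "gentle first, forceful last" idea can be written as a small function. This is only a sketch: stop_gently is an illustrative name, and the grace period of 5 seconds is an arbitrary default.

```shell
#!/bin/sh
# Sketch only: ask a process to exit with SIGTERM, and fall back to SIGKILL
# only if it is still alive after a grace period. stop_gently is an
# illustrative name.

# stop_gently PID [GRACE_SECONDS]
stop_gently() {
    pid="$1"
    grace="${2:-5}"
    kill -TERM "$pid" 2>/dev/null || return 0   # already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # exited after SIGTERM
        sleep 1
        grace=$((grace - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid ignored SIGTERM, sending SIGKILL"
        kill -KILL "$pid"
    fi
}
```

Assuming the pid file path from the earlier comment exists on your system, it could be called as root with stop_gently "$(cat /var/run/lookupd.pid)".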



If lookupd is the problem...
Authored by: TigerKR on Oct 17, '05 07:05:40PM

Restarting lookupd is exactly what unlockupd is supposed to do. But it doesn't appear to work all of the time. And when unlockupd doesn't work, you can't log in to a headless server to restart lookupd.



Instantly restart a server or workstation each day
Authored by: MrIso on Oct 17, '05 09:41:20AM

for the time, you don't have to use tabs, you can use spaces, so:

30 6 * * * .....

would work.



Instantly restart a server or workstation each day
Authored by: foilpan on Oct 17, '05 09:46:19AM

if this is a problem, why hasn't apple fixed it -- especially if it affects OS X server?

i haven't heard of this issue yet, but it seems like rebooting your servers is probably the brute force approach and doesn't actually solve the problem.



Instantly restart a server or workstation each day
Authored by: TigerKR on Oct 17, '05 07:13:30PM

This problem has been reported on macfixit.com, macintouch.com, xlr8yourmac.com, here at macosxhints.com, and on apple.com support and discussion forums.

Yes, this is a problem apple needs to solve. But until they do, I need to find the most-effective, least expensive solution. Since I've run the combination of unlockupd and a daily "shutdown -r now" I've had exactly 0 freezes on both of the machines.

I'm not sure if my problem has disappeared because of the daily restart, or a combination of a daily restart + the unlockupd daemon, (I do know that unlockupd alone did not do the trick) but for now, I'm going to stick with what's working (and that's the combination approach).



Instantly restart a server or workstation each day
Authored by: patpro on Oct 17, '05 10:21:27AM

If you are using a hand-made IPFW ruleset, you might want to check that you are not using keep-state rules on a dual-processor Mac. There is a nasty bug in the kernel/IPFW implementation that will make your machine freeze with that setting. (Bug #4112652 in Apple's bug reporter, flagged as closed; it should be fixed in 10.4.3.)



Instantly restart a server or workstation each day
Authored by: TigerKR on Oct 17, '05 07:16:01PM

That's interesting because I do have a hand-made IPFW2 ruleset. However, my boxes are single processor machines.

Is there any word on this bug affecting G3 towers (all G3 towers were single processor)?



Instantly restart a server or workstation each day
Authored by: patpro on Oct 18, '05 02:39:51AM

Nope, single-processor systems are not affected. In fact, on a dual-processor system, if you disable one processor using the "Processor" system preference pane, you avoid the problem completely.



Instantly restart a server or workstation each day
Authored by: stewarsh on Oct 17, '05 10:47:41PM

Interesting to note, thanks. I'll post that comment on Apple's board as it may help someone else. Though in my case all my servers are behind SonicWalls so no help, but still every thought is appreciated.



Instantly restart a server or workstation each day
Authored by: patpro on Oct 18, '05 02:45:47AM

that's a shame, back in May-to-August there was a full thread (at least 50 posts, maybe around 80) about this issue. Too bad Apple deleted it.



Instantly restart a server or workstation each day
Authored by: lurch99 on Oct 17, '05 04:53:41PM
I also administer several Tiger servers and run the same services you mention, in addition to others, and have never experienced the freeze you're talking about. I'm not even sure if it's crashed even once on any of the three Tiger machines I admin (two G4s, one xServe). These machines are all workhorses that hardly ever get a chance to catch their breath, either. You ought to investigate what's causing the problem, because restarting the machines is not a good solution. Look closely at the logs in Console and I'm sure you'll find valuable information; also, you ought to post a message on the Apple website and try to figure out what's happening. Address the problem at hand rather than simply restarting; a restart is only a half measure that will never get at the root of the problem. Cheers, Lurch

Instantly restart a server or workstation each day
Authored by: TigerKR on Oct 17, '05 07:25:14PM

I am glad that you are not having any problems. But I know that I am not alone with this problem.

I have pored over every log file that I could find, and scoured their contents whether or not I thought they were relevant. I don't even have confirmation of the freeze from any of the logs - they all just stop logging as if they are asleep until the eventual forced restart, when they start logging again. By the way, my servers are set to never sleep - neither HD nor system sleep.

I've left a terminal window open on the servers with top running to hopefully catch a process max-ing out the CPU, or maybe to see that I've run out of RAM - no dice... I've spent many sleepless nights trying to find a solution. To no avail. With my current implementation, I have 0 problems.

How are you running DHCPd on Mac OS X client? What version of DHCPd are you running?



Instantly restart a server or workstation each day
Authored by: lurch99 on Oct 18, '05 10:40:35AM

Then let me ask: what devices do you have connected to your server? Do you have an extra machine you can do a clean install on, to see if you can duplicate the problem? My guess is a hardware problem if you're truly not seeing it in the logs. I'm on the OS X Server mailing list and I've not heard of anyone else with your problem, so I would be hesitant to conclude it's a bug in the OS. Don't give up and conclude you've reached the end of the road in terms of troubleshooting; it's often something you've overlooked again and again. Have you tested your RAM? Did you buy third-party RAM from the same company and install the same chips in both machines? These are (some) of the questions you need to ask...



Instantly restart a server or workstation each day
Authored by: TigerKR on Oct 18, '05 10:21:57PM

I have previously stated that I have experienced identical behavior on two separate machines, both with identical hardware and software configurations. However, the two machines are in two separate physical locations and they are working off of two separate ISPs.

So to answer your question - yes, I have duplicated the problem. I did not clone the machines, they were wiped and installed fresh individually (albeit with the same procedures).

The only devices that I have attached to the computer(s) are a broadband modem, a 10/100 ethernet switch, an Apple keyboard, an Apple mouse, and an LCD monitor.

I also considered a hardware issue; however, everything was working fine (with both machines) before I upgraded them to Tiger. It's a little odd that both computers would suddenly develop a hardware issue exactly after being (wiped and) updated to Tiger. I wish that I could downgrade, but alas, I cannot.

Again, as I had previously stated, I tried changing RAM, repairing permissions, running Disk Warrior, Drive Genius, zapping the PRAM, etc. I have also run Techtool Pro, moved the RAM around, swapped it in and out, etc...

By the way, I am not running Mac OS X Server.

I think that I have already not only asked those questions, but also answered them too.



Instantly restart a server or workstation each day
Authored by: lurch99 on Oct 19, '05 09:52:55AM

TigerKR, I hear you but I still don't think the problem is with the Tiger OS.

Your original note says you are running server, so I'm confused. How else are you running DNS services? And DHCP? Are you building these yourself?

You need to be totally scientific about this. I admin more than 50 Tiger/Panther clients and three servers, why haven't I had this problem? Or why haven't I seen this on the Apple mailing lists, either? If you think this is a bug that others are experiencing, too, I have found zero evidence of that.

In short, either scripting or manually restarting your server on a daily or weekly basis is a totally impractical solution, and I'm sure Apple will not figure out your problem since it doesn't sound like you're terribly clear on what the problem is, and how to fix it...

Lurch



Instantly restart a server or workstation each day
Authored by: TigerKR on Oct 19, '05 10:22:49AM

From the original hint: "Among other things, I administer two identical servers located in different locations on different LANs. They're running Mac OS X 10.4.2 client w/all updates..."

DNS is handled by Mice & Men Suite 5, and DHCPD is handled by the darwin-ports build isc-dhcpd-V3.0.2. That is the only darwin-port I have installed.

Are any of the machines that you admin Blue and White G3 towers? I don't have a problem with Tiger on any of the other computers that I administer, but they're all G4s and they aren't running DNS, DHCPD, IPFW2, nor NATD.

Just yesterday I swapped out a Blue and White tower for a Quicksilver G4 (single processor). I will try removing the daily restart on the G4 tower to see if it was a problem with the OS in relation to the old hardware.



Instantly restart a server or workstation each day
Authored by: stewarsh on Oct 17, '05 10:44:09PM

All that has been done and several steps beyond that. This bug doesn't seem to affect everyone, but I don't have enough test environments to trace the exact fault. However, Apple does and I've been working with their Enterprise Support group to resolve the issue. I imagine that the fix will be included in 10.4.3, but have no solid confirmation on that.

So far the only way to get your machine back once it hits this state is to reboot, since killing and restarting lookupd has no effect. Nor does flushing the lookup cache, nor restarting DirectoryServices, nor any combination of the above that I've tried. At this point I can only wait on Apple.



Instantly restart a server or workstation each day
Authored by: mickazoid on Oct 17, '05 10:32:34PM
I found myself considering a daily restart when I found services on my main web/mysql production server Mac were becoming more and more unresponsive under heavy use (this was in the era of Jaguar). The delays and timeouts would grow so problematic, I considered and wrote a similar crontab.

However, I luckily wound up not needing to do so - I found a correction to my particular demon "bug-du-jour".

Your machine wants to be a high-uptime machine. It doesn't want to be rebooted every day. And you don't wanna do that to your nice server, do you?

I wish you all the best luck in finding, and eradicating, the bug - or somehow finding a means to avoid the bug's error condition until an appropriate Apple update is released.


Instantly restart a server or workstation each day
Authored by: stewarsh on Oct 17, '05 10:49:44PM

What, if I may ask, did you do? We are seeing problems with python slowing on one of our other XServe webservers.



Instantly restart a server or workstation each day
Authored by: mickazoid on Oct 17, '05 11:14:44PM
Well, in a nutshell, we first installed a vanilla setup from scratch on an all-new machine (to find any specific bugaboos that might have accompanied this box config in particular). Always a good idea. Then, when we found the problem was indeed replicable across vanilla installs, we made intimate use of the apache server, mysql and server monitoring capabilities (as you did) to watch individual web and server process threads in action. Then we'd recompile the various code modules for each of the relevant web, app and db products in use. When we found a combination that consistently threw errors, we recompiled the related code to test for incremental exceptions until we trapped the condition.

Sorry to be vague, but that's the most concise way to describe it :)
