WATCHDOG and RELATED



  1. Daemon

      1.1 Check NFS server status

The problem was that the daemon kept pinging the watchdog even when the file system was unreachable. The watchdog daemon now also checks that it can read a file in the /.wd_cfs/ file system, which is soft-mounted via NFS (with a soft mount, an operation that hits a major timeout returns an error instead of blocking forever). If the read gets a major timeout, the daemon waits a given number of minutes and then reboots the CCPC. If the NFS server becomes reachable again in the meantime, the daemon goes back to its normal state: pinging the watchdog forever while testing the file system.
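The behaviour described above can be sketched as a small state machine in C. This is a minimal sketch under assumptions: the names `probe_read()`, `wd_nfs_step()` and `struct wd_nfs_check` are hypothetical, the name of the file read under /.wd_cfs/ is not given here and is left to the caller, and the real wd_daemon may be organised differently. Any successful `read()` (even of zero bytes) is taken to mean that the server is reachable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sketch: names are illustrative, not wd_daemon's own. */
enum wd_state { WD_OK, WD_NFS_DOWN };

struct wd_nfs_check {
    enum wd_state state;
    time_t down_since;   /* time of the first failed read */
    long timeout_s;      /* reboot delay, from the daemon's parameter */
};

/* Returns 1 if path can be opened and read, 0 otherwise.
 * On a soft NFS mount a major timeout makes the call fail instead of hanging. */
static int probe_read(const char *path)
{
    char c;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return 0;
    n = read(fd, &c, 1);
    close(fd);
    return n >= 0;
}

/* Feed one probe result; returns 1 if the CCPC should now be rebooted. */
static int wd_nfs_step(struct wd_nfs_check *s, int read_ok, time_t now)
{
    assert(s != NULL);
    if (read_ok) {
        s->state = WD_OK;          /* server reachable: back to normal */
        return 0;
    }
    if (s->state == WD_OK) {
        s->state = WD_NFS_DOWN;    /* first failure: start the countdown */
        s->down_since = now;
        return 0;
    }
    return now - s->down_since >= s->timeout_s;
}
```

In a main loop, the daemon would call `probe_read()` on the file under /.wd_cfs/ at each cycle, pass the result to `wd_nfs_step()` together with `time(NULL)`, and reboot the CCPC when it returns 1. Keeping the decision in a pure function of (state, result, time) makes the timeout logic testable without an NFS server.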

So, to use this new feature of wd_daemon, you need to create the directory /ccpc/dev/root/.wd_cfs in your NFS root file system (the CCPC sees it as /.wd_cfs/). Then create the directory /ccpc/dev/wd_cfs/ and export it by adding this line to the /etc/exports file:

    /ccpc/dev/wd_cfs *(rw,sync,no_root_squash)

Finally, you need the latest disklessrc, which mounts this directory with the correct options. This is done by the new kernel RPM.

The daemon must also be run with a parameter: the time, in minutes, to wait before rebooting once it has detected that the NFS server is down (for example, 15).

This is a change from the previous version, in which the daemon was started at boot without any parameter; now the parameter (the time to wait) is required. You therefore need to modify /etc/inittab to pass the timeout to the daemon. The line should look like this:

    wd:2345:respawn:/sbin/wd_daemon 15

This is done by the new kernel RPM.
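Parsing that mandatory parameter could look like the sketch below. It is only an illustration: `parse_timeout_minutes()` is a hypothetical name, and the accepted range (1 minute to 1 day) is an assumption, not something the daemon is known to enforce.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the daemon's parameter, the number of minutes
 * to wait before rebooting once the NFS server is seen as down (e.g. "15").
 * Stores the delay in seconds and returns 0, or returns -1 on bad input.
 * The 1..1440 minute range is an illustrative assumption. */
static int parse_timeout_minutes(const char *arg, long *timeout_s)
{
    char *end;
    long minutes;

    assert(timeout_s != NULL);
    if (arg == NULL || *arg == '\0')
        return -1;                     /* the parameter is now mandatory */
    errno = 0;
    minutes = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || minutes < 1 || minutes > 24 * 60)
        return -1;                     /* not a plain, sensible number */
    *timeout_s = minutes * 60;
    return 0;
}
```

With the inittab line above, `argv[1]` is "15", which gives a delay of 900 seconds; a missing or malformed value is rejected rather than silently defaulted.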

The new daemon runs on both the 2.6.9-1 and the 2.6.9-2 kernel versions, but on 2.6.9-1 it does not check the NFS server status, because none of the tools it needs are present. You therefore need to install the latest kernel version.



      1.2 Check file system integrity

If the daemon can write to the root file system, it reboots the CCPC. The check consists of creating a new file at the root of the file system: if an error is returned, everything is fine; if the file is created, the file system has a problem, since it should be read-only, so the CCPC is rebooted.
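A minimal C sketch of this check, assuming the probe is a plain `open()` with `O_CREAT`; the helper name `fs_is_writable()` and the probe file name ".wd_rw_probe" are hypothetical, not taken from the daemon.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to create probe_path (e.g. "/.wd_rw_probe").
 * Returns 1 if the file could be created: the file system is writable
 * although it should be read-only, so the CCPC must be rebooted.
 * Returns 0 if open() fails for any reason (EROFS is the expected case). */
static int fs_is_writable(const char *probe_path)
{
    int fd;

    assert(probe_path != NULL);
    fd = open(probe_path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return 0;              /* error returned: all is well */
    close(fd);
    unlink(probe_path);        /* do not leave the probe file behind */
    return 1;
}
```

The daemon would call it as `fs_is_writable("/.wd_rw_probe")`. Treating any error as "read-only" mirrors the description above; a stricter version could reboot only on success and log errors other than `EROFS`.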



  2. Boot time


      2.1 Ping the watchdog at start-up

Downloading and uncompressing the kernel can take long enough for the watchdog to reboot the CCPC, so we have to ping the watchdog once the kernel has been downloaded. This is done in the kernel sources, in setup.S, the real-mode setup code that runs before the kernel is uncompressed. The code below initialises the watchdog, whose register is at segment 0xe000, offset 0xfcb0, and pings it once:

    /* added by jc to ping the watchdog at boot */
    pushw   %ds
    pushw   %ax
    movw    $0xe000, %ax
    movw    %ax, %ds               /* ds = watchdog register segment */

    /* initialisation, 30 seconds */
    movw    $0x3333, %ds:0xfcb0
    movw    $0xCCCC, %ds:0xfcb0
    movw    $0xC080, %ds:0xfcb0

    /* ping */
    movw    $0xAAAA, %ds:0xfcb0
    movw    $0x5555, %ds:0xfcb0

    popw    %ax
    popw    %ds
    /* end */



The modified sources are available as a kernel patch in CVS; I still need to apply it to the kernel, but have not succeeded yet.

The ping is done just before the kernel is uncompressed.
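The register protocol used by the assembly can be modelled in C to make the write order explicit. This is an illustrative sketch, not code from the patch: the write goes through a caller-supplied function so the sequence can be checked without hardware, and all names (`wd_boot_ping()`, `struct wd_log`, ...) are hypothetical. On real hardware that function would write to the 16-bit register at segment 0xe000, offset 0xfcb0 (physical address 0xefcb0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*wd_write16_fn)(void *ctx, uint16_t v);

/* Word sequences written by the setup.S patch. */
static const uint16_t wd_init_seq[] = { 0x3333, 0xCCCC, 0xC080 }; /* init, 30 s */
static const uint16_t wd_ping_seq[] = { 0xAAAA, 0x5555 };         /* one ping */

static void wd_write_seq(wd_write16_fn write16, void *ctx,
                         const uint16_t *seq, size_t n)
{
    assert(write16 != NULL && seq != NULL);
    for (size_t i = 0; i < n; i++)
        write16(ctx, seq[i]);
}

/* Boot-time sequence: initialise the watchdog, then ping it once. */
static void wd_boot_ping(wd_write16_fn write16, void *ctx)
{
    wd_write_seq(write16, ctx, wd_init_seq,
                 sizeof wd_init_seq / sizeof wd_init_seq[0]);
    wd_write_seq(write16, ctx, wd_ping_seq,
                 sizeof wd_ping_seq / sizeof wd_ping_seq[0]);
}

/* Recording writer (illustrative) used to check the order off-hardware. */
struct wd_log {
    uint16_t v[8];
    size_t n;
};

static void wd_log_write(void *ctx, uint16_t v)
{
    struct wd_log *rec = ctx;

    if (rec->n < sizeof rec->v / sizeof rec->v[0])
        rec->v[rec->n++] = v;
}
```

Calling `wd_boot_ping(wd_log_write, &rec)` records the five words in the same order as the `movw` instructions in setup.S.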

  3. New kernel version

Together, the new daemon and the boot-time changes make up kernel version 2.6.9-2. The RPM for this latest kernel release is not yet a good RPM.

It puts the new network boot image in /tftpboot and builds the directory trees needed by the new daemon.

However, that is not all that is needed: it should also update the modules to match its version, which it does not do yet (I hope this is not the final release of the RPM). For now, you need to build the modules, including the CCPC-specific ones, from the new kernel sources and put them in the correct directory, $CCPCROOT/lib/modules/2.6.9-2.EL.ccpc/.

Finally, you need to modify /etc/dhcpd.conf (or whichever file holds your DHCP server configuration) so that the CCPCs load the correct network boot image.



Ideally, the RPM would be built from the kernel sources, also generating the new network boot image and setting everything up correctly.

  4. Summary: what to do








Send any comments to: jeanchristophe dot garnier at gmail dot com