Rebooting Windows Server
I have a scheduled process running on a test server of mine. And every once in a while this scheduled process gets a little bit messed up. It's a connection issue, and the only way to fix it is to reboot the entire operating system. A restart of Domino doesn't even do it. Yes, I know, going to Linux might make it better. But it "might" make it better, not "certainly" make it better, and there are other things on the test server that are Windows-dependent. So changing operating systems is not an option at this point.

I ended up with a pretty ingenious solution. I wrote some monitoring code that identifies the situation. The original process writes a log, and a specific error is logged when the issue happens. I look through the log, and if that error occurred in the most recent run of the process, I want to reboot the server. But how to reboot the server? When I called a batch file from the Domino program, the batch file would shut down Domino but never restart the server: the thing that launched the batch file (Domino) went away, and the batch file went away with it, so it never completed its processing.
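
The log check described above might look something like the sketch below. It is only an illustration, not my actual monitoring code: the view name "LogsByDate", the field name "LogText", and the function name LastRunHadError are all made up for the example, and it assumes the view is sorted so that the newest log document comes first.

```lotusscript
' Sketch only: returns True if the newest log document contains the error text.
' ASSUMPTIONS (hypothetical names): a view "LogsByDate" sorted newest-first,
' and a multi-value text field "LogText" holding one log line per entry.
Function LastRunHadError(db As NotesDatabase, sErrorMarker As String) As Boolean
   Dim view As NotesView
   Dim doc As NotesDocument
   Dim vLines As Variant
   Dim i As Integer

   LastRunHadError = False

   Set view = db.GetView("LogsByDate")
   If view Is Nothing Then Exit Function

   ' The first document in the view is treated as the most recent run
   Set doc = view.GetFirstDocument()
   If doc Is Nothing Then Exit Function

   ' GetItemValue always returns an array, even for a single value
   vLines = doc.GetItemValue("LogText")
   For i = LBound(vLines) To UBound(vLines)
      If InStr(CStr(vLines(i)), sErrorMarker) > 0 Then
         LastRunHadError = True
         Exit Function
      End If
   Next
End Function
```

The monitoring agent would then set its flag with something like needToRestart = LastRunHadError(session.CurrentDatabase, "your error text here"), where the error text is whatever your own process logs.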

I found this tip from Search Domino that talks about using an API to write to the Windows Event log from LotusScript. I modified it slightly to make my own script library.

Option Public
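Option Declare

' Windows event type passed to ReportEvent (value from winnt.h).
' Declared here so callers can use it; skip this if you already define it elsewhere.
Const EVENTLOG_ERROR_TYPE = 1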

Declare Private Function RegisterEventSource Lib "advapi32.dll" Alias _
"RegisterEventSourceA" (ByVal lpUNCServerName As String, ByVal lpSourceName As String) As Long
Declare Private Function DeregisterEventSource Lib "advapi32.dll" (ByVal hEventLog As Long) As Long
Declare Private Function ReportEvent Lib "advapi32.dll" Alias "ReportEventA" ( _
ByVal hEventLog As Long, ByVal wType As Integer, ByVal wCategory As Integer, _
ByVal dwEventID As Long, ByVal lpUserSid As Any, ByVal wNumStrings As Integer, _
ByVal dwDataSize As Long, plpStrings As Long, lpRawData As Any) As Long
Declare Private Function GetLastError Lib "kernel32" () As Long
Declare Private Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (hpvDest As Any, _
hpvSource As Any, ByVal cbCopy As Long)
Declare Private Function GlobalAlloc Lib "kernel32" (ByVal wFlags As Long, ByVal dwBytes As Long) As Long
Declare Private Function GlobalFree Lib "kernel32" (ByVal hMem As Long) As Long


Sub LogNTEvent(sAppName As String, sString As String, iLogType As Integer, iEventID As Long)
   On Error GoTo BubbleError
   ' Create an event in the Windows Server event viewer
   Dim bRC As Variant
   Dim iNumStrings As Integer
   Dim hEventLog As Long
   Dim hMsgs As Long
   Dim cbStringSize As Long
   hEventLog = RegisterEventSource("", sAppName)
   cbStringSize = Len(sString) + 1
   hMsgs = GlobalAlloc(&H40, cbStringSize)
   CopyMemory ByVal hMsgs, ByVal sString, cbStringSize
   iNumStrings = 1
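   If ReportEvent(hEventLog, iLogType, 0, iEventID, 0&, iNumStrings, cbStringSize, hMsgs, hMsgs) = 0 Then
      ' The Error statement expects a string message, so convert the API error code
      Error 100, "ReportEvent failed, GetLastError = " & CStr(GetLastError())
   End If
   Call GlobalFree(hMsgs)
   Call DeregisterEventSource(hEventLog)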

   Exit Sub
BubbleError:
   ' Re-raise the error with the routine name and line number attached
   Error Err, Error$ & Chr$(10) & "in subroutine " & GetThreadInfo(1) & ", line " & CStr(Erl)
End Sub

Once I had the script library created, I was able to create a monitoring scheduled agent that would create an event on the Windows server.

   If needToRestart = True Then
      ' Create a Windows Event that will trigger a server reboot
      Call LogNTEvent("NotesEvent", "Look for Connection Time Outs", EVENTLOG_ERROR_TYPE, 1000)
   End If

The next step was using the Windows Task Scheduler to run a batch file that reboots the server when this event appears. The batch file is pretty straightforward: sleep for 10 seconds (so the agent can finish instead of being cut off), send a "net stop" command to stop the Domino server, run an NSD -kill command to kill off any remaining tasks, then reboot the operating system (with all the right parameters).

Inside the Windows Task Scheduler, I created a task with an event trigger. The event comes from the Application log, with a source of NotesEvent (matching the first parameter of the subroutine call) and an Event ID of 1000 (matching the last parameter of the subroutine call).

The rest of the task scheduler parameters are basically the defaults (don't reschedule the task, no delay, and so on).

Now, when my monitoring notices the server issue, it triggers the Windows Event, which triggers the reboot, which allows the next run of the agent to succeed.

What about constant rebooting? My monitoring agent runs every 3 hours, so I can't get into a tight cycle of rebooting (remember, the batch file allows the monitoring agent to complete). It would still be possible to reboot every 3 hours, though, if the original agent never completes. To help with that potential situation, my monitoring agent also sends me an email when the reboot is going to happen. That way I know the reboot happened, and if I get a second email 3 hours later, I know something else is going on. Right now the issue happens every 2 or 3 days, which is a problem by itself, but at least I have bought myself some time while I research why Windows is having the timeout issue in this specific instance.
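
For the email notification, a minimal sketch could look like the one below. The subroutine name SendRebootNotice, the recipient address, and the message wording are placeholders I made up for the example. It assumes the agent's own database can be used to create the memo and that the server can route mail to the recipient.

```lotusscript
' Sketch only: mail a short notice before the reboot event is written.
' ASSUMPTIONS: sRecipient is a routable address; the memo is created in
' the agent's own database and is never saved there.
Sub SendRebootNotice(session As NotesSession, sRecipient As String)
   Dim db As NotesDatabase
   Dim memo As NotesDocument

   Set db = session.CurrentDatabase
   Set memo = db.CreateDocument()
   Call memo.ReplaceItemValue("Form", "Memo")
   Call memo.ReplaceItemValue("Subject", "Reboot triggered on " & db.Server & " at " & CStr(Now))
   Call memo.ReplaceItemValue("Body", "The monitoring agent found the connection error in the most recent run and is requesting a reboot.")

   ' False = do not attach the form to the message
   Call memo.Send(False, sRecipient)
End Sub

' Possible use inside the monitoring agent (placeholder address):
'    If needToRestart = True Then
'       Call SendRebootNotice(session, "admin@example.com")
'       Call LogNTEvent("NotesEvent", "Look for Connection Time Outs", EVENTLOG_ERROR_TYPE, 1000)
'    End If
```

Sending the notice before writing the event gives the message the batch file's 10-second sleep to be handed off for delivery. If the router doesn't get to it before Domino stops, I would expect it to go out after the reboot.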