Wednesday, August 22, 2012

When I say Acknowledge, you ACKNOWLEDGE me, OK? Oh wait, I just rebooted?

This has come up twice in the past few weeks, so I decided to put a little something down in writing to help clarify Maintenance Policies in Cisco UCS and why they might not be working the way you think they should.

I believe the confusion starts because we've all been doing server updates the same way for a LONG time. You update the system through your method of choice (ISO, floppy, software tool, etc.), then you reboot the server and it's running the new firmware code.

I like to say that the "S" in UCS does NOT stand for "Server", it stands for "System". When you're updating Cisco UCS, there are a few more things going on under the covers. First, every compute node (server) under UCS control is under the ever-watchful eye of the Fabric Interconnects (FIs). To make these changes, the servers in UCS boot from the FIs and load what is called PNuOS. You might have seen this scroll by when you first associated a Service Profile with a server.

PNuOS, the "Processor Node Utility OS", is what controls the server during this process. It parses the XML of the Service Profile and applies the requested changes.

When you make a disruptive change to a server under UCS, for instance a firmware update, the system says, "OK, I hear what you're asking for, but I'm going to check my Maintenance Policy first and get back to you on when I'll actually perform the change."

The default maintenance policy is "immediate" ... yucky, don't ask me, I didn't make it the default. Most people change to at least a "User-Acknowledged" policy, which means: make the change only when I acknowledge it at some later time. At this point you should be saying, "So Scott, there are actually two sides to the UCS system ... one where it's under my control, and the other where it's under the FI overlord's control?" Correct. Until you actually acknowledge the changes, the system will never flip the bit in the service profile that makes the server boot into PNuOS. If you simply reboot the server, it's still just a regular server: it performs a regular reboot, never boots into PNuOS, and the pending changes are not applied.
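To make the two sides concrete, here is a toy Python model of the behavior described above. It is purely illustrative: the class and method names (`Server`, `request_change`, `reboot`, `acknowledge`) are hypothetical and are NOT the UCS Manager API or any Cisco tool. It only captures the logic: with "immediate" a change goes straight through PNuOS, while with "User-Acknowledged" it is parked as a pending activity that a plain reboot leaves untouched.

```python
from dataclasses import dataclass, field
from enum import Enum


class MaintenancePolicy(Enum):
    IMMEDIATE = "immediate"
    USER_ACK = "user-ack"


@dataclass
class Server:
    """Hypothetical toy model of a UCS server; not the UCS Manager API."""
    policy: MaintenancePolicy
    firmware: str = "old"
    pending: list = field(default_factory=list)  # pending activities awaiting ack
    pnuos_boots: int = 0    # times the server booted into PNuOS
    regular_boots: int = 0  # plain "regular server" reboots

    def request_change(self, new_firmware: str) -> None:
        if self.policy is MaintenancePolicy.IMMEDIATE:
            # Immediate: the change is applied right away via a PNuOS boot.
            self._apply_via_pnuos([new_firmware])
        else:
            # User-Ack: the change is only queued; nothing is applied yet.
            self.pending.append(new_firmware)

    def reboot(self) -> None:
        # A plain reboot is just a regular server reboot:
        # no PNuOS boot, and pending changes stay pending.
        self.regular_boots += 1

    def acknowledge(self) -> None:
        # Acknowledging in Pending Activities "flips the bit":
        # the server boots into PNuOS and the queued changes are applied.
        if self.pending:
            self._apply_via_pnuos(self.pending)
            self.pending = []

    def _apply_via_pnuos(self, changes: list) -> None:
        self.pnuos_boots += 1
        for fw in changes:
            self.firmware = fw


if __name__ == "__main__":
    blade = Server(MaintenancePolicy.USER_ACK)
    blade.request_change("new")
    blade.reboot()
    print(blade.firmware)  # old: the reboot did not apply the change
    blade.acknowledge()
    print(blade.firmware)  # new: applied only after the acknowledgement
```

The model deliberately treats `reboot()` and `acknowledge()` as unrelated operations, which is exactly the point of the post: rebooting the server yourself is not the same thing as acknowledging the pending activity.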

Now, when you make a change and have a User-Ack maintenance policy, you'll get a pop-up that looks like this:

Ack the Ack
It can be a little confusing. At this point you are not actually Acknowledging the change.  You are acknowledging that you will Acknowledge it later.

To actually apply the change, you have to Acknowledge it under Pending Activities.

Pending Activities - where you Acknowledge and flip the bit for PNuOS
Clear as mud, which may still be clearer than it was before. Or you could just create one uber service profile template with Immediate as the maintenance policy and activate all changes from there... wait... hmmm, no, that's not really a good idea, since every disruptive change would then reboot every server bound to that template at once. :-)

