Domino on Linux Tip: top, It Just May Be an Admin’s Best Friend
Bill Malchisky November 29 2010 11:00:00 PM
For those who do not know, 'top' provides a ton of system information in a compact space, plus the 'top' system hogs. It is a very flexible program that provides a lot of power and the ability to resolve problems quickly. From the man page: "The top program provides a dynamic real-time view of a running system. It can display system summary information as well as a list of tasks currently being managed by the Linux kernel. The types of system summary information shown and the types, order and size of information displayed for tasks are all user configurable and that configuration can be made persistent across restarts."
Here is a typical screen shot with default settings:
OK, so you get the point...there is a lot here. So, why blog about it? One aspect that really saves time is the ability to get a top listing for each user. I typically install Domino on Linux as a partitioned server, since you get more bang for your buck with several different servers sharing system resources efficiently, with minimal overhead or disk consumption. This setup works great on UNIX installs as well, and was quite common long before virtualization was all the rage. Heck, it's a lot cheaper too, and it still makes a lot of sense in many scenarios.
Now recall that the best practice for a partitioned installation on any OS is to run each DPAR under a unique user name. This allows one to keep track of resources by user (which maps to the server). You can organize at a more granular level on Linux by separating tasks by user ID with this command: top -U usenotes (taking a cue from the above graphic).
The point here is that you can open multiple xterm/terminal windows, each with a separate SSH session to your server and run top -U
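For scripted checks, the same per-user view can also be captured without an interactive session. The sketch below assumes the procps version of top, whose -b (batch) and -n (iteration count) options print one snapshot and exit. The function name top_snapshot is hypothetical, and usenotes is simply the example user from above.

```shell
# Print a single, non-interactive top snapshot for one DPAR's user.
# Assumes procps top: -b = batch mode, -n 1 = one iteration,
# -U = show only processes belonging to the given user.
top_snapshot() {
    if [ -z "$1" ]; then
        echo "usage: top_snapshot USERNAME" >&2
        return 2
    fi
    top -b -n 1 -U "$1"
}

# Example (usenotes is the sample user from this post):
#   top_snapshot usenotes
```

Because the output is plain text, it can be redirected to a file or piped into other tools, one call per DPAR user.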
Real-world usage: unbeknownst to the primary admin, a junior admin kicked off an additional Domino server instance in the background. That server was a devbox that had been retired and was scheduled to be purged. It then proceeded to create a 12GB log.nsf for a server that was not even supposed to be running, so I didn't know to look for it initially. In just a few minutes the data directory's file system became full, primarily from the log file. This all occurred during the maintenance window (fortunately). While I was applying a fix pack, the shutdown of the main Domino DPAR failed due to the full file system. Using top I then proceeded to see which processes needed attention:
Note: a properly tuned Linux box running Domino will almost never see 81% CPU utilization, so this is a big clue something is wrong; the event task running at 122% CPU provides the other piece to the puzzle. With the process ID on the line, it is easy to kill the errant Domino task on a server that is stuck on shutdown and cannot be reached via the Domino console.
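To pull out the offending lines without reading the screen, a batch snapshot from top can be filtered by CPU percentage. This is a minimal sketch that assumes top's default column layout (PID in column 1, %CPU in column 9, COMMAND in column 12). The function name hot_tasks and the default threshold of 80 are illustrative choices, not part of top.

```shell
# Read top's batch output on stdin and print "PID %CPU COMMAND" for every
# task whose %CPU exceeds the threshold (first argument, default 80).
# Assumes the default top field order; summary and header lines are
# skipped because their first field is not a numeric PID.
hot_tasks() {
    threshold="${1:-80}"
    awk -v t="$threshold" '$1 ~ /^[0-9]+$/ && ($9 + 0) > t { print $1, $9, $12 }'
}

# Example (usenotes is the sample user from this post):
#   top -b -n 1 -U usenotes | hot_tasks 80
```

The PID in the first column is the value you would hand to kill for a task that is stuck on shutdown. If your top field layout has been customized, the column numbers in the awk expression need to change to match.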
Bonus tip: with a default top session running, you can type "u" and then enter the username (UID value) to immediately filter the list by user; pressing "u" again and hitting the Enter key restores the full list, or allows you to change UID values; very powerful. Read the man page for additional capabilities of this powerful tool.
As I firmly believe in parallel processing, while I was shutting down the runaway processes, in another xterm, I used "gzip --best log.nsf" to compact the large file in question, to give me some additional free space so that I could work.
After clearing the errant processes for the server, the Linux box returned to normal and proceeded to clean up things -- very quickly. In less than 15 seconds, the free space returned as the OS cleared up all the temporary files and the errant task stopped filling NSF databases. Quite amazing.
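One way to watch free space return from the command line is to poll df. This sketch is an assumption about how one might check it, not necessarily the command used during this incident; the function name free_kb and the path /local/notesdata are hypothetical.

```shell
# Print the available space, in 1 KB blocks, on the file system that
# holds the given path (defaults to the current directory).
# df -P selects the POSIX one-line-per-file-system format and -k reports
# 1024-byte blocks, so the fourth field of line 2 is "Available".
free_kb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Example (hypothetical data directory path):
#   free_kb /local/notesdata
```

Running it a few times in a row gives a quick read on whether the number is climbing back.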
For the command sequence below, I typed in the initial command, depressed the Enter key, observed the value, then pressed the up arrow key. Repeated three times.
In all my years, I have never seen a Windows box clean up and recover from a full filesystem so quickly. Power of the EXT3 filesystem and a good kernel (et al).
Once calm returned, I then gracefully shut down the primary DPAR to continue the maintenance, which completed without a hitch.
In this scenario, top proved a huge time-saver, allowing for quick work of a situation that could have taken far longer without knowledge of these quality tools provided gratis with Linux and UNIX. Running one top session per terminal window is very powerful and allows for excellent monitoring across multiple DPARs. I also add xterms for console monitoring, and on one screen it is quite easy to troubleshoot upgrades, application roll-outs, and work with development to resolve complex application errors.
Two Additional Tips on Top
Toggle CPU Utilization
The default is to show the aggregate of all CPUs on one row; depressing the '1' key will show each individually; this key is a toggle.
Verifying Full Shutdown
If you want a quick way to establish that your DPAR has shut down completely and has left no trailing processes, bring up top -u
A blank process list. If you just run top normally, you will never see an empty list, and most likely any remaining Domino processes would be at the bottom of the list and thus, scrolled off the screen. This approach guarantees things are copasetic.
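The same check can be scripted. The sketch below is one possible approach, assuming pgrep from procps is installed; it uses pgrep -u, which matches on effective user ID, and treats "no processes found" as "fully shut down". The function name dpar_is_down is hypothetical.

```shell
# Return 0 if no processes remain for the DPAR's user, 1 if some are
# still running, and 2 on a usage or pgrep error.
# pgrep exit codes: 0 = at least one match, 1 = no match, >1 = error.
dpar_is_down() {
    if [ -z "$1" ]; then
        echo "usage: dpar_is_down USERNAME" >&2
        return 2
    fi
    rc=0
    pgrep -u "$1" >/dev/null || rc=$?
    case "$rc" in
        0) return 1 ;;   # processes still running for this user
        1) return 0 ;;   # nothing left: the DPAR is down
        *) return 2 ;;   # pgrep itself failed (e.g. unknown user)
    esac
}

# Example (usenotes is the sample user from this post):
#   if dpar_is_down usenotes; then echo "clean shutdown"; fi
```

Note that this also counts any login shell or other non-Domino process running under that user, so it is best run from a different account.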
Look for more Domino and Linux tips in the near future...