Well, I don't know if this is really applicable in your situation, but we use Nagios because it has plugins for everything. I think that's the key thing: not Nagios itself but the plugin concept. If you can't find a plugin to monitor whatever it is you want to monitor, you write your own. I wrote a plugin to monitor temperature, disk status, and RAID status on our Dell servers. Actually, I have Nagios set up to call my cell phone and speak a brief problem report if something really bad happens. That gives you some idea of how flexible Nagios can be.

I'm not familiar with the tools you are talking about, but it seems to me that if you can't write a plugin to make a tool do whatever you like, then it's deficient. I suppose those are Windows tools, right? Maybe that's why they don't let you script plugins. On Linux systems, you can assume the system will have bash, Perl, and C++. But if these are Windows tools, maybe the developer didn't want to assume you'd have access to a scripting language.

I'm not sure any of this is helpful. But maybe next time your department specs out a new monitoring system, they can make the ability to write your own plugins a top priority.

----- Original Message -----
From: "Darragh OHeiligh" <Darragh.OHeiligh@Oireachtas.ie>
To: "Blind sysadmins list" <blind-sysadmins@lists.hodgsonfamily.org>
Cc: "Blind-sysadmins" <blind-sysadmins-bounces@lists.hodgsonfamily.org>
Sent: Wednesday, July 11, 2012 8:43 AM
Subject: Re: [Blind-sysadmins] Information overlode.

OK, how? SCOM is a massive beast and the interface is a million miles from being accessible. It can take half an hour to create subscriptions because everything has to be done with the JAWS cursor or whatever mouse cursor is available. SCOM is great because it highlights everything, but it's not intelligent. SCOM 2012 has a few nice new views so that only systems in a red state are shown, but it's far from perfect.
It also has very poor integration with ESXi, and it provides no hardware monitoring. The debate raged here for a while as to why we had so many monitoring tools. Management wanted us to justify why we were paying so many licence fees every year. It's as simple as this, though. SCOM is great for analysing problems from a high level, and its integration with SRS is fantastic. WhatsUp Gold is great for making sure systems are up and accessible, but it doesn't do any real analysis. Unfortunately, NetBotz is required for camera monitoring and environmental testing, and the myriad other hardware testing tools for the SAN and the HP and Dell servers are absolutely vital, because without them failures wouldn't be caught in time.

Regards
Darragh Ó Héiligh Fujitsu
Offices of the Houses of the Oireachtas, Fredrick Building, South Fredrick Street, Dublin2
Telephone: +353 (1) 618 3559
Email: darragh.oheiligh@oireachtas.ie
Internet: http://www.oireachtas.ie

From: Matthew White <matt@wh1t3.net>
To: Blind sysadmins list <blind-sysadmins@lists.hodgsonfamily.org>
Date: 11/07/2012 14:36
Subject: Re: [Blind-sysadmins] Information overlode.
Sent by: "Blind-sysadmins" <blind-sysadmins-bounces@lists.hodgsonfamily.org>

Isn't that what System Center Operations Manager is supposed to do? I would use a system like that to capture all the incoming alert information and then present it in a much more useful manner.

On Jul 11, 2012, at 6:52 AM, Darragh OHeiligh <Darragh.OHeiligh@Oireachtas.ie> wrote:
Hello,
I'm managing more and more systems here, but I can't keep up with all the notifications. For example, I just cleared out over six thousand emails, all received since the 15th of May, from a folder used only for air-conditioning and environmental notifications.
That's just one system among... a lot.
I get notifications from SCOM, VMware, WhatsUp Gold, Diskeeper, ManageEngine's event log and syslog collector, NetBotz, backups, the mail gateway, the SAN, and more.
I know there are others out there with the same amount of responsibility, so my question is: how do you stay up to date with all these events? I'm tired of being on the back foot. A few years ago I could tell when disk utilization was spiking on a server. Now I'm way behind.
There are just too many alerts coming in.
It's not that the network is in bad condition. Take this morning: one of the application servers is showing high CPU and disk utilization. It could just be a user hammering away at it, but it could also be a dodgy application. Either way, I need to be aware of it.
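The Nagios reply at the top of the thread suggests writing your own plugin whenever no existing one fits. As a rough illustration of how little that involves, here is a minimal sketch of a Nagios-style check in Python for the disk-utilization case described here. It relies only on the standard plugin convention: print one status line, optionally followed by performance data after a "|", and exit with 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN. The function names, the percentage thresholds, and the choice of Python are assumptions made for illustration, not anything used by the people in this thread.

```python
"""Minimal Nagios-style check: percentage of disk space used.

Hypothetical example: names, thresholds, and output wording are illustrative.
Nagios reads the plugin's exit code and the first line it prints.
"""
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path, warn_pct, crit_pct):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    # Text after '|' is performance data: label=value;warn;crit;min;max
    perf = f"used={used_pct:.1f}%;{warn_pct:g};{crit_pct:g};0;100"
    if used_pct >= crit_pct:
        state, code = "CRITICAL", CRITICAL
    elif used_pct >= warn_pct:
        state, code = "WARNING", WARNING
    else:
        state, code = "OK", OK
    return code, f"DISK {state} - {path} {used_pct:.1f}% used | {perf}"


def main(argv):
    """Entry point: argv is [program, PATH, WARN_PCT, CRIT_PCT]."""
    if len(argv) != 4:
        print("DISK UNKNOWN - usage: check_disk_pct PATH WARN_PCT CRIT_PCT")
        return UNKNOWN
    try:
        warn_pct, crit_pct = float(argv[2]), float(argv[3])
    except ValueError:
        print("DISK UNKNOWN - thresholds must be numbers")
        return UNKNOWN
    code, line = check_disk(argv[1], warn_pct, crit_pct)
    print(line)
    return code
```

To run it as a real plugin, the script would also import sys and finish with sys.exit(main(sys.argv)), and Nagios would call it with arguments such as "/ 80 90". The standard Nagios plugins already include a check_disk, so the value of a script like this is mainly as a pattern for things that have no ready-made check, like the temperature and RAID checks mentioned in the reply above.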
You know how in Linux you can type tail -f *.log in a directory and see all the log files as they're written? I want something like that for all my systems.
Unrealistic, I know, but I'm open to ideas.
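The tail -f idea can be approximated across many files with a small polling script. This is a minimal sketch in Python, assuming the logs are plain text files reachable through one glob pattern (for example, syslog files collected onto a single server); the names poll_once and follow are hypothetical. It remembers a byte offset per file, emits only complete new lines tagged with the file they came from, picks up files that appear later, and starts a file over if it shrinks, which is a crude way of noticing truncation.

```python
"""Follow many log files at once, roughly like `tail -f *.log`.

Sketch only: it polls file sizes instead of using OS change notifications.
"""
import glob
import os
import time


def poll_once(pattern, offsets):
    """Read complete new lines from every file matching `pattern`.

    `offsets` maps path -> bytes already consumed and is updated in place.
    Files not yet in `offsets` are read from the beginning.
    Returns a list of (path, line) tuples in file-name order.
    """
    new_lines = []
    for path in sorted(glob.glob(pattern)):
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # file vanished between glob and stat
        offset = offsets.get(path, 0)
        if size < offset:
            offset = 0  # file shrank: assume truncation and start over
        if size > offset:
            try:
                with open(path, "rb") as fh:
                    fh.seek(offset)
                    data = fh.read()
            except OSError:
                continue  # file vanished or became unreadable; retry next poll
            last_newline = data.rfind(b"\n")
            if last_newline != -1:
                # Consume whole lines only; a partial last line waits for the next poll.
                for raw in data[: last_newline + 1].splitlines():
                    new_lines.append((path, raw.decode("utf-8", errors="replace")))
                offset += last_newline + 1
        offsets[path] = offset
    return new_lines


def follow(pattern, interval=1.0):
    """Print new lines from all matching files forever, prefixed by file name."""
    offsets = {}
    # Start at the current end of each existing file so only new lines print.
    for path in glob.glob(pattern):
        try:
            offsets[path] = os.path.getsize(path)
        except OSError:
            pass
    while True:
        for path, line in poll_once(pattern, offsets):
            print(f"{os.path.basename(path)}: {line}", flush=True)
        time.sleep(interval)
```

For example, follow("/var/log/remote/*.log") would keep printing lines as they arrive; that path is only an example. Polling was chosen over change notifications to keep the sketch portable. Like tail -f without its rotation options, it cannot tell a log that was replaced by an equal-size or larger file from one that simply grew.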
Everything is tied up in red tape here, but there's nothing that can't be done once a well-written change request has been submitted.
Any suggestions?
Regards
Darragh Ó Héiligh Fujitsu
_______________________________________________
Blind-sysadmins mailing list
Blind-sysadmins@lists.hodgsonfamily.org
http://lists.hodgsonfamily.org/listinfo/blind-sysadmins