Hi Darragh:
A little of both, actually. We're using a product called EG Monitoring, which I'd never heard of before I started working here. I can't configure any of my own alerts due to accessibility issues, but that's another story.

I guess I'm worried on two fronts. The first is the quality-of-life issue: does it affect you at all that a pager could go off at any time of night, do you sleep less, etc.? I know that varies a lot from person to person, but that's what's worrying me.

The false positive thing also bothers me. We have dependencies set up in EG, but they don't work. We have several sites around the world, and most of those sites have at least one ESX host. I'm responsible when an ESX host is down, but often when we get an alert for an ESX host, it's actually something else, like a router, and EG just hasn't figured out the dependency yet. Also, we don't monitor our SANs at all, which I've complained about, but I'm told there's no way to do it. Often the only way we find out that a volume is down is when I get notified that the ESX servers can't access a datastore, and then I have to go troubleshooting.

So, I'm a little concerned about a lot of false positives coming in at 2:30 in the morning, but also about the more general question: do you do anything to make sure you're actually woken up when the 2:30 alert comes in?
Thanks a lot.
Ryan
-----Original Message-----
From: Blind-sysadmins [mailto:blind-sysadmins-bounces@lists.hodgsonfamily.org] On Behalf Of Darragh OHeiligh
Sent: Friday, September 28, 2012 8:00 AM
To: Blind sysadmins list
Cc: Blind-sysadmins
Subject: Re: [Blind-sysadmins] out of hours notifications?
I use the text message alert system; however, I would love a better way.
Let me know what you come up with.
Sorry I can't be of more help.
What system are you using to send the alerts?
Are you looking for suggestions on alternative delivery methods, or on
how to tell which alerts are genuine?
For example, I have a mailbox set up just to receive all WhatsUp Gold
alerts. If anything is down even for a second I get an alert. However,
that gives a lot of false positives. Alerts are far more likely to be
genuine if the threshold is set to 5 minutes or so. WhatsUp Gold also
has dependencies, so, for example, if a switch goes down and at the same
time all the nodes associated with that switch are unresponsive, then WUG
sends the alert for the switch only.
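
To make the threshold and dependency ideas above concrete, here is a rough
Python sketch. It is not how WhatsUp Gold or EG actually implement this;
the Node class, the nodes_to_alert function, and the 5-minute constant are
all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical threshold: a node must be down this long before anyone is paged.
DOWN_THRESHOLD_SECONDS = 5 * 60


@dataclass
class Node:
    name: str
    parent: Optional[str] = None        # upstream device this node depends on
    down_since: Optional[float] = None  # time it went down (seconds), None if up


def nodes_to_alert(nodes: Dict[str, Node], now: float) -> List[str]:
    """Return the names of nodes that should trigger an alert at time `now`."""
    alerts = []
    for node in nodes.values():
        if node.down_since is None:
            continue  # node is up
        if now - node.down_since < DOWN_THRESHOLD_SECONDS:
            continue  # brief blip so far: wait before alerting
        parent = nodes.get(node.parent) if node.parent else None
        if parent is not None and parent.down_since is not None:
            continue  # upstream device is down too: let it raise the alert
        alerts.append(node.name)
    return sorted(alerts)


# Illustrative data only: one switch with two hosts behind it, plus a SAN
# that has been down for less than the threshold.
nodes = {
    "switch1": Node("switch1", down_since=0.0),
    "esx1": Node("esx1", parent="switch1", down_since=10.0),
    "esx2": Node("esx2", parent="switch1", down_since=10.0),
    "san1": Node("san1", down_since=500.0),
}
print(nodes_to_alert(nodes, now=600.0))  # prints ['switch1']
```

In this sketch a child is suppressed as soon as its parent is down at all,
even if the parent has not yet passed the threshold, so the only alert you
get is the parent's once it has been down for 5 minutes.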
Feel free to ask if you have any questions.
Regards
Darragh Ó Héiligh
Fujitsu
Offices of the Houses of the Oireachtas,
Fredrick Building,
South Fredrick Street,
Dublin2
Telephone: +353 (1) 618 3559
Email: darragh.oheiligh@oireachtas.ie
Internet: http://www.oireachtas.ie
From: Ryan Shugart