We use Nagios to monitor our systems.
The idea behind Nagios is that you write a script that takes arguments for
critical and warning levels. The critical level can be above or below a
certain number, or can even be a failure to match a string pattern. The
script then fetches a value by any method you like and does the comparison,
and Nagios sends you an alert if the critical or warning level is met.
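The plugin pattern described above can be sketched in Python. This is a minimal illustration, not one of the stock Nagios plugins: it assumes single-number thresholds rather than the fuller range syntax the Nagios plugin guidelines allow, and the toner reading is made up. What it does follow is the standard plugin contract: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```python
import re

# Exit codes Nagios interprets: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_numeric(value, warning, critical, higher_is_worse=True):
    """Compare a fetched number against warning/critical thresholds.

    higher_is_worse=False covers checks where a LOW value is the problem,
    such as printer toner. Simplified: each threshold is a single number.
    """
    if higher_is_worse:
        if value >= critical:
            return CRITICAL
        if value >= warning:
            return WARNING
    else:
        if value <= critical:
            return CRITICAL
        if value <= warning:
            return WARNING
    return OK


def check_pattern(text, pattern):
    """CRITICAL when the fetched text does not match the expected pattern."""
    return OK if re.search(pattern, text) else CRITICAL


def status_line(status, message):
    """Format the single line of output Nagios shows in the alert."""
    return f"{LABELS[status]} - {message}"


# Hypothetical reading; a real plugin would fetch it via SNMP, HTTP, etc.
toner_percent = 8
status = check_numeric(toner_percent, warning=20, critical=10, higher_is_worse=False)
print(status_line(status, f"toner at {toner_percent}%"))
```

Run as a real plugin, the script would finish with `sys.exit(status)`, since Nagios decides the state from the exit code rather than from the text.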
Of course, there are a huge number of existing scripts you can use. We use
Nagios to check everything from toner levels in our printers to disk space on
our mail server. Mostly we use SNMP to fetch the data for comparison, but
you can use anything: HTTP, SOAP, DNS. You can also configure just about
any kind of alert you like; it doesn't have to be email. I gave a
demonstration on configuring Nagios to my local Linux users group and set it
up to call my cell phone and speak an error message. During the demo I
got a call on my cell phone, and I held it up so the people in the front row
could hear it say, in the Festival voice, that my file server was down.
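As a concrete case of the disk-space checks mentioned above, here is a rough sketch of a local disk check. It uses Python's `shutil.disk_usage`, so it only sees the machine it runs on; checking a remote mail server as described above would mean fetching the same figures over SNMP instead, which is not shown. The 80/90 percent thresholds are arbitrary examples. The text after the `|` is performance data in the `label=value[UOM];warn;crit;min;max` form that Nagios plugins use.

```python
import shutil

OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def disk_status(used_percent, warning=80.0, critical=90.0):
    """Map a usage percentage onto a Nagios-style status code."""
    if used_percent >= critical:
        return CRITICAL
    if used_percent >= warning:
        return WARNING
    return OK


def check_disk(path="/", warning=80.0, critical=90.0):
    """Return (exit_code, output_line) for the filesystem holding `path`.

    Reads local usage only; a remote check would fetch these figures
    over SNMP instead (not shown here).
    """
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total
    status = disk_status(used_percent, warning, critical)
    # Text before "|" is for humans; after it is performance data for graphing.
    line = (f"DISK {LABELS[status]} - {path} {used_percent:.1f}% used"
            f" | used={used_percent:.1f}%;{warning};{critical};0;100")
    return status, line


code, line = check_disk("/")
print(line)
```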
Here is a link to the web page I used when giving my talk:
http://www.math.wisc.edu/~jheim/snmp/
----- Original Message -----
From: "Ryan Shugart"
To: "Blind sysadmins list"
Sent: Tuesday, December 20, 2011 6:58 PM
Subject: Re: [Blind-sysadmins] Keeping track of your environment.
Hi Darragh:
Can you go into more detail on what these "monitor screens" are? It sounds
like a billboard hanging on a wall displaying system status. We don't have
anything like that in my environment. We have a pager that rotates through
the department each week, and our monitoring software is set to flag
critical alerts and send them to the person on call, who is then
responsible for finding the right party and informing them. Beyond that,
everyone's kind of responsible for monitoring their own systems. I really
have to pity our backup guy; that used to be me, but it got to be too much,
and it's getting to be too much for the current backup guy too. I agree with
Andrew here: it's all about using whatever monitoring and alerting your
software has, but really narrowing down just which alerts you need. If it's
something I need to know about right away, that's one type of alert. If it's
something that can wait for the next business day, but needs to be at the top
of my plate come the next business day, then that's something else. Also, I
just skim systems from time to time to see if anything is slipping through
the cracks.
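The two kinds of alert described above can be sketched as a simple routing rule. This is illustrative only: the `urgent` flag is a hypothetical attribute you would set per check, the "page"/"queue" labels and host names are made up, and the business-day calculation ignores public holidays.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class Alert:
    host: str
    message: str
    urgent: bool  # hypothetical per-check flag: "I need to know right away"


def next_business_day(now: datetime) -> date:
    """First weekday after `now`; public holidays are not considered."""
    day = now.date() + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def route(alert: Alert, now: datetime):
    """Page urgent alerts immediately; queue the rest for the next business day."""
    if alert.urgent:
        return ("page", None)  # e.g. hand to whoever carries the pager this week
    return ("queue", next_business_day(now))


def morning_list(queued, today: date):
    """Queued (alert, due_date) pairs that are due by `today`."""
    return [alert for alert, due in queued if due <= today]


friday_evening = datetime(2011, 12, 23, 18, 0)
print(route(Alert("san01", "drive failed", urgent=True), friday_evening))
print(route(Alert("mail01", "disk 85% full", urgent=False), friday_evening))
```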
Is it perfect? No, but what is? Good monitoring is actually very difficult,
sighted or blind. We recently spent about $250,000 on a package called EG.
The vendor made our management believe EG would not just alert us to systems
being down, but would also automatically do root cause analysis and find
the real cause of problems. Surprisingly, it hasn't worked too well: I only
discovered a bad SAN drive today when I wandered into the server room and
heard a really odd, high-pitched squeal coming from the SAN. No other alerts,
period. So getting the information is tough regardless, and it just boils
down to good tweaking.
On a side note Darragh, I'd be very interested in talking to you further
privately about your setup. It sounds like we manage roughly the same types
of environments, and I'd love to hear more about what challenges you face
and the things you've done to get around them. For example, I didn't find
SCOM to be that accessible, nothing like SCCM. I'd love to set up SCOM to
monitor SCCM; right now our SCCM environment is completely unmonitored,
well, except for me browsing site status from time to time.
Ryan
-----Original Message-----
From: blind-sysadmins-bounces@lists.hodgsonfamily.org
[mailto:blind-sysadmins-bounces@lists.hodgsonfamily.org] On Behalf Of
Darragh OHeiligh
Sent: Tuesday, December 20, 2011 2:05 AM
To: Blind sysadmins list
Subject: [Blind-sysadmins] Keeping track of your environment.
Good morning,
I asked about this a long time ago, but I didn't really follow it up.
Our environment has doubled in size in terms of servers over the past
year. We now have two DR sites, I'm in the process of virtualizing our
DMZ and we're expanding our SQL cluster. The amount of work that has been
done is actually quite impressive if I do say so myself.
The problem is that there's just too much information to digest every
morning. I have syslog showing errors, WhatsUp Gold showing utilization
and availability, SCOM showing system errors, Systems Insight Manager
monitoring the SAN, Storage Essentials checking for storage bottlenecks,
OfficeScan monitoring for viruses and other infections, and Nessus running
security reports. It all adds up to a huge number of reports and
statistics to monitor every day. If I get distracted by OfficeScan, for
example, the other reports are neglected and I may miss things that
happened during the night.
Of course, the other person who works with me can look up at the
monitoring screens and see at a glance what's up and what's not. He can
see systems that are running low on disk space or using far too much
memory.
I start an hour before this person, so it looks terrible if I miss
something that can be spotted by simply looking up at the screens.
How do you monitor hundreds of servers? Are there any tips or tricks
you'd like to share? We're working with a mixed environment here, but if I
need to use two approaches for monitoring Windows and Linux, I don't
mind, as long as I can get information in a more condensed format without
being overwhelmed by things I don't need to know about.
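One way to get a condensed view like this is to pull each tool's results into a common shape and print only what isn't OK, worst first. The sketch below assumes that step has already happened: the `(source, host, status, message)` tuples and host names are hypothetical, and getting data out of SCOM, WhatsUp Gold and the rest would depend on each tool's own reporting or export features.

```python
from collections import defaultdict

# Worst first; any other non-OK status is listed after these.
SEVERITY_ORDER = ["CRITICAL", "WARNING", "UNKNOWN"]


def summarize(results):
    """Condense (source, host, status, message) tuples into a short digest.

    OK results are dropped entirely; the rest are grouped by status.
    """
    grouped = defaultdict(list)
    for source, host, status, message in results:
        if status != "OK":
            grouped[status].append(f"{host}: {message} [{source}]")

    extra = sorted(set(grouped) - set(SEVERITY_ORDER))
    lines = []
    for status in SEVERITY_ORDER + extra:
        items = grouped.get(status)
        if items:
            lines.append(f"{status} ({len(items)})")
            lines.extend(f"  {item}" for item in sorted(items))
    return "\n".join(lines) if lines else "All checks OK"


# Hypothetical sample data; real results would come from each tool's exports.
overnight = [
    ("SCOM", "sql02", "CRITICAL", "SQL Server service stopped"),
    ("WhatsUp Gold", "mail01", "WARNING", "disk 85% full"),
    ("Nessus", "web01", "OK", "no new findings"),
]
print(summarize(overnight))
```

Grouping by severity rather than by tool keeps the critical items at the top, so nothing is buried just because one tool's report happened to be read first.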
Thanks. Any suggestions will be appreciated.
Regards
Darragh Ó Héiligh
Fujitsu
Offices of the Houses of the Oireachtas,
Fredrick Building,
South Fredrick Street,
Dublin 2
Telephone: +353 (1) 618 3559
Email: darragh.oheiligh@oireachtas.ie
Internet: http://www.oireachtas.ie
_______________________________________________
Blind-sysadmins mailing list
Blind-sysadmins@lists.hodgsonfamily.org
http://lists.hodgsonfamily.org/listinfo/blind-sysadmins