Hi Darragh,

I deal with the same problem, and I haven't found a perfect solution. To be honest, most of the systems administrators I know, blind or sighted, deal with this. There's so much data coming at you, and filtering it is a problem.

My first piece of advice is to come up with an internal plan for what's critical, what's important and what's nice to know. Critical would be something like a down system: something you'd drop whatever you're doing for, even a really hot date, and go fix. For me, an ESX server being down is something I need to know about right away. Those alerts get e-mailed to me and go straight into my inbox as high priority. Important things are things I need to know about; they can probably wait until the next business day, but they would still shape what I do that day. Those get e-mailed to me and filtered into their own folder. A secondary SCCM site being down, or being low on disk space but not out, or something like that would be important. Finally, the nice-to-knows: how things are performing, etc. For me, slight network bottlenecks that aren't causing noticeable application performance issues would be on that list, and so would an ESXi server needing a patch. Many of those are things I go check when I'm feeling bored. Like that ever happens. What is critical, important and nice to know varies with your environment. If you're doing OSD with SCCM, or something else where people expect it to be up 24/7, then it might be critical for you.

Another important thing is to filter out things you just don't care about. Most monitoring packages give you too much information, and the problem then is that you're digging for the needle in a haystack. I've gotten into arguments with people about this, but my philosophy is: if an alert isn't important, turn it off. Don't bother with it. If you're getting an alert but have no hope of fixing it, turn it off.
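As a rough illustration of the critical / important / nice-to-know plan, here is a minimal Python sketch that sorts incoming alerts by matching phrases in the subject line. The keyword rules, the suppressed-alert list and the folder names are all hypothetical; in practice the same idea would usually be expressed as mail-client rules or in the monitoring tool's own notification settings.

```python
# Tier names from the critical / important / nice-to-know plan.
CRITICAL = "critical"
IMPORTANT = "important"
NICE_TO_KNOW = "nice-to-know"
SUPPRESSED = "suppressed"

# Hypothetical keyword rules: (text to look for in the alert subject, tier).
# The first match wins, so the most severe rules come first.
RULES = [
    ("esx host down", CRITICAL),
    ("server down", CRITICAL),
    ("sccm site down", IMPORTANT),
    ("low disk space", IMPORTANT),
    ("patch available", NICE_TO_KNOW),
    ("network latency", NICE_TO_KNOW),
]

# Hypothetical alerts that management has signed off on ignoring.
SUPPRESS = ["printer toner low"]

# Hypothetical destination for each tier; None means the alert is dropped.
DESTINATIONS = {
    CRITICAL: "Inbox (high priority)",
    IMPORTANT: "Important folder",
    NICE_TO_KNOW: "Nice-to-know folder",
    SUPPRESSED: None,
}


def classify(subject: str) -> str:
    """Return the tier for an alert, based only on its subject line."""
    text = subject.lower()
    if any(phrase in text for phrase in SUPPRESS):
        return SUPPRESSED
    for phrase, tier in RULES:
        if phrase in text:
            return tier
    # Anything unrecognised lands in nice-to-know so it is reviewed, not lost.
    return NICE_TO_KNOW


def route(subject: str):
    """Return the folder an alert should be filed in, or None to drop it."""
    return DESTINATIONS[classify(subject)]
```

Defaulting unknown alerts to nice-to-know is a deliberate choice: a new alert type gets looked at once before anyone decides whether it deserves a higher tier or should be switched off.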
When I say turn it off, I don't mean down systems or anything like that; I mean an alert where you go to management, tell them it's a problem, and they tell you they don't care. In that case, make it clear to them that since they don't care you're turning off that alert, document their OK and move forward. It sounds bad in a way, but sysadmins really do get too much information, and you want to be digging through the haystack for the needle as little as possible. We end up there anyway.

We don't use SCOM where I work. I wish we did; right now our SCCM environment isn't monitored at all because the software we use can't monitor it. We use a program called EG, which I'd never heard of before I started at MiTek, and I'm not too happy with it. EG monitors Citrix well, but that's the highlight of EG. It's all web-based and really inaccessible, so I don't touch it much.

It sounds like we have more people in our IT department than you do, and that helps spread the load around. For example, I don't care about storage alerts since I don't manage the SAN. I care about ESX, SCCM and some AD alerts. If you can, spread the love around; you have coworkers, and that's what they're there for. Another thing: we have a cellphone that rotates through the department. Whenever a priority 1 alert is generated (a server down, a site offline, etc.), a text message is sent to that phone, and the person who has it has to look up the person responsible for that system and alert them. Again, that just helps spread things around; it's not all on one person's shoulders, and we can try to get a good night's sleep from time to time.

Does this fix the problem? No. As I said, though, information overload is a sysadmin problem, not a blind sysadmin problem or a sighted sysadmin problem; it's something every admin faces and has to work through. Hope this helps.
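The rotating on-call phone can be modelled in a few lines too. This is a minimal sketch assuming a hypothetical three-person roster that hands the phone on every week and a hypothetical system-to-owner map; the thread doesn't say how often the phone actually changes hands or how owners are recorded.

```python
from datetime import date

# Hypothetical roster; the phone moves to the next person every week.
ROSTER = ["Alice", "Bob", "Carol"]
# Hypothetical date the rotation started (a Monday).
ROTATION_START = date(2012, 7, 2)

# Hypothetical map of which admin is responsible for which system.
OWNERS = {
    "ESX": "Alice",
    "SCCM": "Alice",
    "Citrix": "Bob",
    "SAN": "Carol",
}


def phone_holder(day: date) -> str:
    """Who has the on-call phone on a given day (weekly rotation)."""
    weeks = (day - ROTATION_START).days // 7
    return ROSTER[weeks % len(ROSTER)]


def handle_p1(system: str, day: date) -> tuple[str, str]:
    """Return (who receives the text, who they should alert).

    If no owner is recorded, the phone holder deals with it themselves.
    """
    holder = phone_holder(day)
    owner = OWNERS.get(system, holder)
    return holder, owner
```

For example, handle_p1("SAN", date(2012, 7, 11)) returns ("Bob", "Carol"): Bob has the phone that week and passes the SAN alert on to Carol.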
Ryan

-----Original Message-----
From: Blind-sysadmins [mailto:blind-sysadmins-bounces@lists.hodgsonfamily.org] On Behalf Of Darragh OHeiligh
Sent: Wednesday, July 11, 2012 7:43 AM
To: Blind sysadmins list
Cc: Blind-sysadmins
Subject: Re: [Blind-sysadmins] Information overlode.

OK, how? SCOM is a massive beast, and the interface is a million miles from being accessible. It can take half an hour to create subscriptions because everything has to be done with the JAWS cursor or whatever mouse cursor is available. SCOM is great because it highlights everything, but it's not intelligent. SCOM 2012 has a few nice new views that show only systems in a red state, but it's far from perfect. It also has very poor integration with ESXi, and it provides no hardware monitoring.

The debate raged here for a while as to why we had so many monitoring tools. Management wanted us to justify why we were paying so many licence fees every year. It's as simple as this, though: SCOM is great for analysing problems at a high level, and its integration with SRS is also fantastic. WhatsUp Gold is great for making sure systems are up and accessible, but it doesn't do any real analysis. Unfortunately, Netbots is required for camera monitoring and environmental testing, and the myriad other hardware testing tools for the SAN and the HP and Dell servers are absolutely vital, because without them failures wouldn't be caught in time.

Regards
Darragh Ó Héiligh
Fujitsu
Offices of the Houses of the Oireachtas, Fredrick Building, South Fredrick Street, Dublin 2
Telephone: +353 (1) 618 3559
Email: darragh.oheiligh@oireachtas.ie
Internet: http://www.oireachtas.ie

From: Matthew White <matt@wh1t3.net>
To: Blind sysadmins list <blind-sysadmins@lists.hodgsonfamily.org>
Date: 11/07/2012 14:36
Subject: Re: [Blind-sysadmins] Information overlode.
Sent by: "Blind-sysadmins" <blind-sysadmins-bounces@lists.hodgsonfamily.org>

Isn't that what System Center Operations Manager is supposed to do?
I would use a system like that to capture all the incoming alert information and then present it in a much more useful manner.

On Jul 11, 2012, at 6:52 AM, Darragh OHeiligh <Darragh.OHeiligh@Oireachtas.ie> wrote:
Hello,
I'm managing more and more systems here, but I can't keep up with all the notifications. For example, I just cleared out over six thousand emails from a folder used for air-conditioning and environmental notifications, all received since the 15th of May.
That's just one system among... a lot.
I've got notifications from SCOM, VMware, WhatsUp Gold, Diskeeper, ManageEngine's event and syslog monitoring, Netbots, backups, the mail gateway, the SAN and more.
I know there are others out there with the same amount of responsibility, so my question is: how do you stay up to date with the events? I am tired of being on the back foot. A few years ago I could tell when disk utilization was spiking on a server. Now, I'm way behind.
There are just too many alerts coming in.
It's not that the network is in bad condition. For example, one of the application servers is showing high CPU and disk utilization this morning. It could just be a user hammering away at it, but it could be a dodgy application as well. Either way, I need to be aware of it.
You know how in Linux you can type tail -f *.log in a directory and see all the log files as they're written? I want something like that for all my systems.
Unrealistic, I know, but I'm open to ideas.
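For the tail -f idea, a minimal sketch is possible as long as the events end up in plain-text files, for example on a central syslog server. The Python class below polls every file matching a glob pattern and returns only lines written since the previous poll, tagged with the file they came from. It starts at the current end of each file (unlike tail -f, which first prints the last ten lines), holds back a half-written last line until it is complete, and simply rereads a file from the start if it shrinks. Sources that don't write text logs, such as SCOM or a web console, would still need to forward their events to syslog or a file first.

```python
import glob
import os
import time


class MultiTail:
    """Follow every file matching a glob pattern, roughly like `tail -f *.log`."""

    def __init__(self, pattern: str):
        self.pattern = pattern
        # Byte offset already consumed in each file; existing files start at their end.
        self.offsets = {path: os.path.getsize(path) for path in glob.glob(pattern)}

    def poll(self):
        """Return (path, line) pairs for complete lines written since the last poll."""
        new_lines = []
        for path in sorted(glob.glob(self.pattern)):
            offset = self.offsets.get(path, 0)  # files that appear later are read from the start
            if os.path.getsize(path) < offset:
                offset = 0  # file shrank: assume it was truncated or rotated
            with open(path, "rb") as fh:
                fh.seek(offset)
                data = fh.read()
            end = data.rfind(b"\n")
            if end == -1:
                # No complete line yet; keep the half-written text for next time.
                self.offsets[path] = offset
                continue
            self.offsets[path] = offset + end + 1
            for raw in data[: end + 1].splitlines():
                new_lines.append((path, raw.decode("utf-8", errors="replace")))
        return new_lines


def follow(pattern: str, interval: float = 1.0):
    """Print new lines from all matching files forever, prefixed with the file name."""
    tail = MultiTail(pattern)
    while True:
        for path, line in tail.poll():
            print(f"{os.path.basename(path)}: {line}")
        time.sleep(interval)
```

For example, follow("/var/log/remote/*.log") would print new lines from every matching file about once a second; that directory is purely hypothetical and would depend on where your syslog server writes its files.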
Everything is tied up in red tape here, but there's nothing that can't be done once a well-written change request is provided.
Any suggestions?
Regards
Darragh Ó Héiligh Fujitsu
_______________________________________________
Blind-sysadmins mailing list
Blind-sysadmins@lists.hodgsonfamily.org
http://lists.hodgsonfamily.org/listinfo/blind-sysadmins