Internet Service Providers (ISPs) provide a public service that, like
electricity, is expected to work 24 hours a day!
It is extremely important for the ISP to be the *first* to know when something
is not working, so that it can be fixed as quickly as possible.
If a customer discovers a problem before the ISP does, that can prompt the
customer to look for another ISP.
For business customers, extended, unrepaired outages can lead to huge
losses.
Monitors should not only alert you to problems, but also help you figure out
what happened, and ideally fix the problem automatically.
There are many monitors on the market: some hardware, some software,
some expensive, and some free. Some ISPs use them; others use none.
Most monitors do *not* fully do the job. Here's why:
- Most monitor systems can fail to detect problems.
  - They are usually generalized monitors, not designed with the specific service (and specific version) they are monitoring in mind, so they may not know what actually constitutes a "failure". For example, "successfully connecting to a port" does not guarantee "a working service".
  - Misconfigurations in services like Web Hosting may leave the main service working while some components are "down". For example, one customer's web site on a web hosting computer may be misconfigured and "not working", but a monitor that only checks "the web server" may not catch this failure. (Our monitors do!)
  - A monitor that fails to detect problems may be worse than no monitor, because it gives a "false sense of security".
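The difference between "a port accepts connections" and "the service works" can be sketched in Python. This is a minimal sketch, assuming an HTTP web host and assuming each customer's site can be checked by fetching its URL and looking for a known piece of text on the page; the function names, URLs and marker text are hypothetical, not part of any particular monitoring product.

```python
import socket
import urllib.request


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Shallow check: True if a TCP connection can be opened. Says nothing about the service."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def site_is_working(url: str, expected_text: str, timeout: float = 5.0) -> bool:
    """Deeper check: the page must load with HTTP 200 and contain text we expect to see."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if response.status != 200:
                return False
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # urllib.error.URLError and HTTPError (4xx/5xx) are subclasses of OSError,
        # as are refused connections and timeouts.
        return False
    return expected_text in body
```

Run in a loop over every customer site hosted on a shared server, the second check can catch a single misconfigured site even while the port check, and the web server as a whole, still look healthy.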
- Most monitor systems are not "intelligent", and therefore make inaccurate reports.
  - For example, if one "agent" of a monitoring system detects that "the Internet connection is down", other monitor agents watching services like email and web hosting should *not* report "email is down", etc., because that is not the case: the email service has not failed, it is only unreachable.
  - The monitor system must be "intelligent". (Our monitors are!)
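One way to make reports "intelligent" in this sense is to give the monitor a dependency map and suppress the alert for any check whose upstream dependency has also failed, so that only the root cause is reported. A minimal Python sketch; the check names and the `DEPENDS_ON` map are hypothetical examples.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical dependency map: each check lists the checks it relies on.
DEPENDS_ON: Dict[str, List[str]] = {
    "internet_connection": [],
    "email": ["internet_connection"],
    "web_hosting": ["internet_connection"],
}


def has_failed_dependency(check: str, failed: Set[str], deps: Dict[str, List[str]]) -> bool:
    """Walk up the dependency graph; True if any upstream check has also failed."""
    stack = list(deps.get(check, []))
    seen: Set[str] = set()
    while stack:
        upstream = stack.pop()
        if upstream in seen:
            continue
        seen.add(upstream)
        if upstream in failed:
            return True
        stack.extend(deps.get(upstream, []))
    return False


def alerts_to_send(failed: Iterable[str], deps: Dict[str, List[str]] = DEPENDS_ON) -> List[str]:
    """Report only root causes: failures not explained by a failed dependency."""
    failed_set = set(failed)
    return sorted(c for c in failed_set if not has_failed_dependency(c, failed_set, deps))


if __name__ == "__main__":
    print(alerts_to_send(["internet_connection", "email", "web_hosting"]))  # ['internet_connection']
    print(alerts_to_send(["email"]))  # ['email']
```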
- Monitor systems must try to diagnose the problem and, if possible, fix it.
  - A monitor system that is not only aware of a problem, but can quickly diagnose what went wrong and sometimes even fix it automatically, is very valuable!
  - Many monitors cannot do this, for three very real reasons:
    - They are frequently not programs running on the machines they monitor, so they cannot restart any services that fail.
    - They are generalized and frequently not aware of exactly how the systems they monitor work, so they are in the dark as to what happened.
    - The monitor designers did not think of it!
  - It is not always possible for a monitor to determine which system failed, and it is even harder for a monitor to restore service, but it can definitely be done. Here is a simple example:
    - A monitor detects that a web server is not responding.
    - The monitor checks that the web server process is still running.
    - The monitor kills the web server and restarts it.
    - This may solve the problem; only if it does not is a warning issued to the administrator.
    - If the problem *is* fixed, a "problem discovered and fixed" email is sent to the administrator.
    - The activity is logged for later re-inspection.
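The restart-before-escalating sequence above can be sketched as a single function. This is a minimal Python sketch under stated assumptions: the probes and actions (`is_responding`, `is_running`, `restart`, `notify`) are hypothetical callables that a real deployment would implement with its own HTTP check, process check, service manager and email system, and the code must run on the monitored machine to be able to restart anything. Only the decision logic follows the steps listed above.

```python
import logging
from typing import Callable

log = logging.getLogger("monitor")


def check_and_repair(
    is_responding: Callable[[], bool],  # hypothetical: e.g. an HTTP request to the site
    is_running: Callable[[], bool],     # hypothetical: e.g. look for the server process
    restart: Callable[[], None],        # hypothetical: kill (if running) and start the server
    notify: Callable[[str], None],      # hypothetical: email the administrator
) -> str:
    """Return "ok", "fixed" or "escalated", logging each step for later re-inspection."""
    if is_responding():
        return "ok"

    log.warning("web server is not responding")
    if is_running():
        log.info("web server process is running but not responding; killing and restarting it")
    else:
        log.info("web server process is not running; starting it")
    restart()

    if is_responding():
        log.info("web server restored by restart")
        notify("Problem discovered and fixed: web server was not responding and was restarted")
        return "fixed"

    # Only now, after the automatic fix failed, is the administrator warned.
    log.error("restart did not restore the web server; escalating")
    notify("WARNING: web server is not responding, and an automatic restart did not fix it")
    return "escalated"
```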
- Most monitor systems frequently report "false alarms".
  - Too many false alarms can lead to real failures being ignored.
  - False alarms can be avoided with the proper "logic", e.g.:
    - Check whether a failure is detected 3 times in 3 minutes.
    - Check whether some other failure in the system (such as a full disk) may account for the failure.
    - Etc.
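Both rules can be sketched in a few lines of Python. This is a minimal sketch, assuming checks run on a schedule and report timestamps in seconds; the threshold of 3 failures in a 180-second window comes from the rule above, while the class name and the `EXPLAINED_BY` table are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set


class FalseAlarmFilter:
    """Raise an alarm only after `threshold` failures within `window_seconds`."""

    def __init__(self, threshold: int = 3, window_seconds: float = 180.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.failure_times: Deque[float] = deque()

    def record_failure(self, now: float) -> bool:
        """Record a failed check at time `now`; return True if an alarm is warranted."""
        self.failure_times.append(now)
        # Forget failures that have slid out of the time window.
        while now - self.failure_times[0] > self.window_seconds:
            self.failure_times.popleft()
        return len(self.failure_times) >= self.threshold


# Hypothetical table: known conditions and the services they can take down.
EXPLAINED_BY: Dict[str, Set[str]] = {
    "disk_full": {"email", "web_hosting"},
}


def underlying_cause(service: str, active_problems: Set[str]) -> Optional[str]:
    """Return an already-known problem that accounts for this service failing, if any."""
    for problem in sorted(active_problems):
        if service in EXPLAINED_BY.get(problem, set()):
            return problem
    return None
```

When `underlying_cause` returns a problem, the monitor can report that problem (for example, the full disk) instead of raising a separate "email is down" alarm.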
- Monitor systems may detect a problem, but fail to "report" it.
  - Monitors consist of two parts: "detecting" the problem and "reporting" the problem.
  - Some monitors can detect a problem, but fail to report it.
  - Others report it every 5 minutes until it is fixed, rather than only once.
  - Others fail to report "System is now OK".
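Reporting once per change of state, rather than on every check, addresses the last two points. A minimal Python sketch, assuming a hypothetical `send` callable that delivers a message (for example, by email); the class name and message wording are illustrative.

```python
from typing import Callable, Dict


class StateChangeReporter:
    """Report a problem once when it starts, and once more when the system is OK again."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send                     # hypothetical delivery function, e.g. email
        self.failing: Dict[str, bool] = {}   # last successfully reported state of each check

    def update(self, check: str, ok: bool) -> None:
        was_failing = self.failing.get(check, False)
        if not ok and not was_failing:
            self.send(f"{check} is DOWN")
        elif ok and was_failing:
            self.send(f"{check}: System is now OK")
        # The new state is stored only after send() returns; if delivery raised,
        # the same report is attempted again on the next check.
        self.failing[check] = not ok
```

Storing the state only after a successful `send()` means a delivery failure leads to a retry rather than a silently lost report.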