<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Heroix Blog</title>
	
	<link>http://blog.heroix.com</link>
	<description>The Heroix blog: Charting Life in the IT Environment</description>
	<pubDate>Mon, 16 Nov 2009 02:04:33 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/heroixmonitoring" /><feedburner:info uri="heroixmonitoring" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><media:thumbnail url="http://blog.heroix.com/wp-content/uploads/heroixlogo1.jpg" /><media:keywords>monitoring,performance,network,availability</media:keywords><media:category scheme="http://www.itunes.com/dtds/podcast-1.0.dtd">Technology/Tech News</media:category><itunes:explicit>no</itunes:explicit><itunes:image href="http://blog.heroix.com/wp-content/uploads/heroixlogo1.jpg" /><itunes:keywords>monitoring,performance,network,availability</itunes:keywords><itunes:subtitle>Find it. Fix it. Forget it.</itunes:subtitle><itunes:summary>Heroix delivers multiplatform, agentless application performance and network monitoring software that assures the highest level of application, system, and network availability and performance. </itunes:summary><itunes:category text="Technology"><itunes:category text="Tech News" /></itunes:category><feedburner:emailServiceId>heroixmonitoring</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Correlating Events to Recognize Problems</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/gxk7q4wDGzY/</link>
		<comments>http://blog.heroix.com/index.php/2009/11/15/correlating-events-to-recognize-problems/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 01:55:09 +0000</pubDate>
		<dc:creator>Chris Smith</dc:creator>
		
		<category><![CDATA[General Monitoring Tips]]></category>

		<category><![CDATA[Howto]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=801</guid>
		<description><![CDATA[Every engineer and manager who receives alerts from automated monitoring systems can relate to both the critical need they fill and to their often annoying shortcomings. I&#8217;m not just referring to a situation where the monitoring has recently been installed, and you haven&#8217;t tuned the default thresholds to limit notification to actionable events. The [...]]]></description>
			<content:encoded><![CDATA[<p>Every engineer and manager who receives alerts from automated monitoring systems can relate to both the critical need they fill and to their often annoying shortcomings. I&#8217;m not just referring to a situation where the monitoring has recently been installed, and you haven&#8217;t tuned the default thresholds to limit notification to actionable events. The nature of monitoring and alerting, even with the most sophisticated programs, is that problem events are based on very narrow criteria, like a server&#8217;s CPU load, a router&#8217;s bandwidth consumption, or some application&#8217;s .NET errors. It&#8217;s good to know about these specific problems. But if they are symptoms of one complex problem, that diagnosis can easily be missed when the separate events are surrounded by hundreds or thousands of other unrelated problem events. Recognizing the complex problem is further complicated because the events come from different sources: servers, network appliances, and applications. Of course, most real-world applications depend on distributed environments for reliable service delivery, and problems occur whose symptoms span multiple devices and programs. If you want to get ahead of the curve and immediately recognize and fix complex problems, then you need to start correlating multiple events so that you can send intelligent notifications that describe the conditions and fixes for complex problems.</p>
<p><strong>What&#8217;s The Problem?</strong></p>
<p>Events can be misleading. Consider an example where several servers are behind a switch. We&#8217;ll further assume that we are monitoring the availability of the switch and the servers. When the switch goes down, what happens? A flood of notifications is sent alerting everyone that all the servers are down, which is effectively true, but isn&#8217;t really the problem. Eventually the switch down alert comes in among all the server down messages. This is a simple example, where most good engineers will immediately diagnose the problem when they read the switch down alert, but a lot of messages were sent before you were notified of the true problem. I always cringe when I know my boss is getting flooded with email saying that the sky is falling. Now, what if we use some logic in our notification that only sends out server down messages when the switch is OK, and suppresses all the server down messages when the switch goes down? That would be useful. Even better, let&#8217;s configure the switch down message to include the list of servers that are unavailable because the switch is down.</p>
<p>The switch example is easy to understand. Think about how useful correlating much more complex events can be, especially when critical information is included in the notification. Fixing a problem is a lot easier if the problem email or text message includes a concise description of the multiple conditions, what the root cause is, how to fix it, and whom to contact for help. I&#8217;ve even worked with customers to include links to their own online SOP or Help Desk documentation. In my experience with problem email alerts, less is always better, as long as you still get all the notifications you need. Correlation of events is the only way to simultaneously reduce the email count and dramatically improve the quality of information in alerts.</p>
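<p>The suppression logic can be sketched in a few lines of Python. This is only an illustration: the function name and message wording are made up here, and it assumes the monitoring system can hand you the switch state and the names of the servers that failed their availability test in the same polling cycle.</p>
<pre>def alerts_to_send(switch_up, servers_down):
    """Return the notifications to send for one polling cycle."""
    servers = sorted(servers_down)
    if not switch_up:
        # One message names the root cause and lists the affected servers;
        # the individual server down messages are suppressed.
        return ["Switch down; unreachable servers: " + (", ".join(servers) or "none")]
    # The switch is OK, so each server failure is a problem in its own right.
    return ["Server down: " + name for name in servers]</pre>
<p>With the switch down and two servers unreachable, this returns a single message instead of three.</p>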
<p><strong>Logically Speaking</strong></p>
<p>A Correlated Event is going to have multiple conditions. Sometimes all conditions must be true. We also want to be able to recognize when some conditions are true, while others are definitely not true. We may even want to specify that some conditions must be true, while others may be true, and some others must not be true (a really complex event&#8230;). We&#8217;re really only using three Boolean operators, AND, OR, and NOT, and grouping the logically similar conditions under each. The logical order of listing the conditions should be:</p>
<ol>
<li>Conditions that Must Be True</li>
<li>Conditions that May Be True</li>
<li>Conditions that Must Not Be True</li>
</ol>
<p>The number of conditions can vary, but most complex problems can be recognized based on between 2 and 10 conditions. More may be needed when deducing problems along an extensive network path or across multiple application servers, for example.</p>
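<p>Here is a minimal Python sketch of the three groups. The function name is hypothetical, and it rests on one assumption about the &#8220;May Be True&#8221; group: that at least one of its conditions has to hold (the OR), with an empty group counting as satisfied.</p>
<pre>def correlated_event(must_be_true, may_be_true, must_not_be_true):
    """Evaluate three lists of condition results (True or False)."""
    may = list(may_be_true)
    required = all(must_be_true)        # AND across the first group
    optional = any(may) or not may      # OR across the second; empty group passes
    excluded = any(must_not_be_true)    # NOT applied to the third group
    return required and optional and not excluded</pre>
<p>For the switch example, a real server down event would be correlated_event([server_unreachable], [], [switch_down]), where both names stand for whatever your availability tests return.</p>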
<p>I find it useful to include some synchronization and an awareness of persistence when correlating events. First, not all conditions will be based on measurements with the same interval. For example, disk statistics are typically collected hourly, whereas availability is tested every 1 to 5 minutes. Many other conditions will have intervals between these two extremes. I may want to specify that all conditions must happen within a specified interval. It&#8217;s also really valuable to be able to specify that <em>X</em> number of events must have happened in the interval. Picture network latency that gets flaky sometimes, but only becomes a problem if it persists for <em>X</em> amount of time.</p>
<p><strong>Intelligent Notification</strong></p>
<p>Once we recognize a complex problem based on the presence or lack of specific conditions we&#8217;re in a position to provide effective notification that will maximize the probability that a problem is fixed as quickly as possible. You&#8217;re setting up yourself and those around you for success. Here&#8217;s how I recommend configuring Correlated Event notification:</p>
<ol>
<li>Describe the condition set
<ol>
<li>What it means</li>
<li>What the root cause is</li>
</ol>
</li>
<li>Describe the procedure to fix the problem
<ol>
<li>Links to documentation</li>
</ol>
</li>
<li>List the interested parties to contact for help
<ol>
<li>ISP Contacts</li>
<li>Network Admins</li>
<li>Server Admins</li>
<li>Application Support</li>
</ol>
</li>
</ol>
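<p>As a sketch of how that outline might turn into the body of an alert, here is a small Python class. The class and field names are invented for illustration and do not come from any monitoring product.</p>
<pre>from dataclasses import dataclass, field


@dataclass
class CorrelatedAlert:
    meaning: str
    root_cause: str
    fix: str
    doc_links: list = field(default_factory=list)
    contacts: list = field(default_factory=list)

    def render(self):
        """Build the plain-text body of the notification."""
        lines = [
            "What it means: " + self.meaning,
            "Root cause: " + self.root_cause,
            "How to fix it: " + self.fix,
        ]
        lines += ["Documentation: " + url for url in self.doc_links]
        lines += ["Contact: " + who for who in self.contacts]
        return "\n".join(lines)</pre>
<p>Keeping the fields separate makes it easy to require that every Correlated Event has a root cause and at least one contact before it is allowed to send anything.</p>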
<p><strong>How To Do It</strong></p>
<p>You can build a BAT file or a VB, Shell, or Perl script that applies a CASE test using the Boolean operators described above, but you&#8217;ll have to build an interface to the database of events. You can even use a well-crafted query to select for the conditions of interest. If you use Longitude, then you can just use the Correlated Event actions to define the multiple conditions, interval, and persistence, and then build your notification. Please <a href="mailto:blogger@heroix.com?subject=Correlating%20Events">email me</a> if you have questions about using Correlated Events.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/11/15/correlating-events-to-recognize-problems/feed/</wfw:commentRss>
		<feedburner:origLink>http://blog.heroix.com/index.php/2009/11/15/correlating-events-to-recognize-problems/</feedburner:origLink></item>
		<item>
		<title>The keys to Effective SLAs</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/ZbZPcOWMaos/</link>
		<comments>http://blog.heroix.com/index.php/2009/11/09/the-keys-to-effective-slas/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 05:50:22 +0000</pubDate>
		<dc:creator>Chris Smith</dc:creator>
		
		<category><![CDATA[General Monitoring Tips]]></category>

		<category><![CDATA[Howto]]></category>

		<category><![CDATA[Service Level Agreements (SLAs)]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=781</guid>
		<description><![CDATA[Service Level Agreements are usually the object of desire, fear, and uncertainty all at the same time. They can be such useful tools that it&#8217;s important to demystify them. SLAs are desirable because they provide accountability and timely feedback to managers. They are to be feared when they include factors beyond control or that are [...]]]></description>
			<content:encoded><![CDATA[<p>Service Level Agreements are usually the object of desire, fear, and uncertainty all at the same time. They can be such useful tools that it&#8217;s important to demystify them. SLAs are desirable because they provide accountability and timely feedback to managers. They are to be feared when they include factors beyond control or that are poorly aligned with reality. SLAs are commonly approached with a high degree of uncertainty about what to measure and how to report results as an effective tool for all parties. While the ingredients in SLAs are as varied as applications and service providers, all effective SLAs share a few critical characteristics.</p>
<p><strong>Good and Bad SLAs</strong></p>
<p>Let&#8217;s start by poking fun at what will be the worst example of an SLA you&#8217;ve ever heard of or that I&#8217;ve been a party to implementing. I should point out this happened long before I became part of the Heroix team. I was brought in to design and implement a monitoring and reporting regime that supported the SLA between a web hosting provider and a Wall Street firm seeking its first web presence. Considering the big-time clients and huge capital expenditure (100 servers in two data centers), I was expecting a challenging assignment with a highly sophisticated and complex set of monitoring requirements. When I received my copy of the SLA, a single paragraph appendix to a large contract, it had one condition:</p>
<ul>
<li><em>No server shall experience greater than 30% average CPU usage during any rolling hour</em></li>
</ul>
<p>You could have knocked me over with a feather. After disbelief, hilarity, and confusion, came concern and agitation. I actually suggested that we provide much more, which would have been included in any basic monitoring regime, and was rebuffed. What&#8217;s obviously wrong with this SLA is that the measure of success has no direct connection with actual service delivery to consumers. It was, however, very easy to measure. So the primary rule in creating effective SLAs is:</p>
<ol>
<li>Measure things that directly impact service delivery or user experience</li>
</ol>
<p>Our first rule provides the guiding principle in answering the questions, &#8220;What should I measure and why should I measure it?&#8221; Of course, the WHY part of the answer should always be &#8220;Because it directly impacts service.&#8221; Some good examples of WHAT to measure would be:</p>
<ul>
<li>Availability of systems and applications</li>
<li>Success of sessions and transactions
<ul>
<li>Web Pages, DB Queries, etc.</li>
</ul>
</li>
<li>Response Times where applicable</li>
<li>Loss of resources critical to service delivery
<ul>
<li>Disk or DB space, Connection or Session limits</li>
</ul>
</li>
</ul>
<p>When selecting SLA measures it&#8217;s important to choose things that you have control over and that can be measured objectively, even if the statistic is as simple as 0 for True and 1 for False, as in the case of whether a required TCP port is accepting connections. Either it is (0) or it isn&#8217;t (1). A valuable planning exercise is to picture the data or transaction path, and reserve slots in your SLA for appropriate tests of each potential break point in a service. Using a typical web application example, a consumer connects to a web server, which creates a session on a back end application server, which in turn queries a DB server, ultimately sending a response back to the consumer. In our model the break points are the web, application, and DB servers, plus the network connecting them. By constructing a map of break points to monitor, you place yourself in a position to go beyond simply reporting a service failure and to localize where the service is breaking.</p>
<p>Although I&#8217;m sure you get why it&#8217;s important to localize the point of failure for a service, it is worth examining the answer. Recall that one of our principal goals is to achieve accountability. That doesn&#8217;t just apply to apportioning blame afterwards. It means knowing who owns the component that has failed, and who should immediately be given the lead to find a solution. A process of discovery always happens as soon as a service failure is detected. In my experience, identifying who in a group of varied specialists responsible for different technologies should own a problem can be unnecessarily time consuming, <em>if you know what I mean</em>&#8230; This is especially true if an SLA is poorly designed and the data are ambiguous as to the cause of the failure. In a well designed SLA with data from each break point and each team member seeing the same picture, it&#8217;s usually immediately clear who &#8220;owns&#8221; the problem. You actually facilitate taking ownership of the problem and effecting a solution.</p>
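<p>A minimal Python sketch of the TCP port test and the break point map follows. It keeps the 0 for True, 1 for False convention used above; the function names and the shape of the path list are illustrative assumptions, not part of any product.</p>
<pre>import socket


def port_status(host, port, timeout=3.0):
    """Return 0 if the TCP port accepts a connection, 1 if it does not."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError:
        return 1


def first_break_point(path):
    """Walk the transaction path in order; return the first failing name, or None."""
    for name, host, port in path:
        if port_status(host, port) != 0:
            return name
    return None</pre>
<p>The path is a list such as [("web", web_host, 80), ("app", app_host, app_port), ("db", db_host, db_port)], where the hosts and ports are placeholders for your own servers. A port test only proves the listener is up, so you would normally pair it with session or query tests at each break point.</p>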
<p><strong>How to Report SLA Data</strong></p>
<p>Designing and implementing the best SLA will be for naught if you fail to build accessible views of the data that can easily be assimilated into a concept of operations. In other words, you have to build that intuitive picture of your transaction path that everyone&#8217;s going to share, and put it somewhere everyone can see it quickly and easily. You may have noted that we are discussing using SLA data in the present tense, as in live presentations. Don&#8217;t be confused if you expected an SLA to be some tabular historical report to be compared to contract terms and conditions. An effective SLA is all of these things. What&#8217;s the point of identifying what can impact service delivery if we don&#8217;t use it as an intensive monitor of the health of our critical application? So let&#8217;s use the same data to create live presentations of the application&#8217;s current state, while also generating historical reports of compliance with key standards.</p>
<p>There are some types of data that do not lend themselves to live reporting. For example, log data that&#8217;s collected nightly. Any type of daily or weekly aggregated data should be relegated to historical reporting. That can include both daily detail reports and long term summaries. Any data that is measured at least hourly can be represented effectively in live presentations or dashboards. Remember we want live data to be fresh (last 5-60 minutes), and we don&#8217;t want to wait long periods for one component to refresh the picture again. For historical reporting, we&#8217;re really using the same data, just querying for longer periods, like weeks, months, and years (ok, daily if you&#8217;re feeling nervous or obsessive&#8230;).</p>
<p><strong>Putting It All Together</strong></p>
<p>A well designed SLA can be a critical tool for managers and technicians. Hopefully the process of automating it in live SLA dashboards and historical reports will actually reduce the workload on administrators. It should dramatically reduce the time normally spent in discovery when service problems arise. The SLA will provide accountability, timely assistance, and a unified picture. It will enable service providers to report proactively how well they are providing service.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/11/09/the-keys-to-effective-slas/feed/</wfw:commentRss>
		<feedburner:origLink>http://blog.heroix.com/index.php/2009/11/09/the-keys-to-effective-slas/</feedburner:origLink></item>
		<item>
		<title>Monitoring Your IIS Application Pools</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/jeQ1X77OBao/</link>
		<comments>http://blog.heroix.com/index.php/2009/10/09/monitoring-your-iis-application-pools/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 17:43:11 +0000</pubDate>
		<dc:creator>Chris Smith</dc:creator>
		
		<category><![CDATA[Effective Tech]]></category>

		<category><![CDATA[General Monitoring Tips]]></category>

		<category><![CDATA[Howto]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=705</guid>
		<description><![CDATA[I had intended to begin a new theme in October, but I got an inquiry from a customer last week for which the answer may help others, and it definitely fits into the Free Stuff theme. It can be extremely difficult to understand and solve problems for mission critical IIS applications without a good view [...]]]></description>
			<content:encoded><![CDATA[<p>I had intended to begin a new theme in October, but I got an inquiry from a customer last week for which the answer may help others, and it definitely fits into the Free Stuff theme. It can be extremely difficult to understand and solve problems for mission critical IIS applications without a good view of application pool use. IIS7 makes the process of application pool monitoring and recovery very easy in the IISMgr interface. You have access to <a href="http://www.iis.net/ConfigReference/system.applicationHost/applicationPools/add/processModel" target="_blank">modify the Process Model</a> in detail. In IIS6 and below, the application pools appear as w3wp.exe processes in Task Manager. But it&#8217;s not clear which w3wp.exe process represents which application pool. Fortunately, Microsoft again provides a VBS script to figure that out. That&#8217;s just half the battle, though. Once we know the specific process that supports each application pool we want to get some detailed process statistics. We could populate several WMI queries with the process PIDs, but an easier, more efficient method is to download and install Microsoft&#8217;s Sysinternals PsList.exe utility. It&#8217;s free and provides all the process and thread information we&#8217;ll want.</p>
<p><strong>How do we identify our application pool processes?</strong></p>
<p>For IIS6 and below, begin setting up your application pool monitoring by ensuring you can run the IISAPP.VBS script on the monitored system. You either have to wrap the call to the script in a command that executes CScript, or you have to set CScript as the default script host. For example,</p>
<ul>
<li>Call CScript in front of the script name: cscript iisapp.vbs //NOLOGO</li>
<li>Set CScript as the default: CScript //H:cscript //NOLOGO //S</li>
</ul>
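<p>Now we can run the iisapp.vbs script to identify the application pool processes. Run without parameters, it lists every worker process; the optional parameters /a ApplicationPoolName and /p PID limit the list to one pool or one process. The output looks like this:</p>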
<p>W3wp.exe PID: 2232 AppPoolID: DefaultAppPool<br />
W3wp.exe PID: 1160 AppPoolID: MyAppPool</p>
<p><strong>NB:</strong> If you have more than one process for each pool, you probably have Web Garden (multiple worker processes) enabled, and should seriously <a href="http://forums.iis.net/t/1160897.aspx" target="_blank">evaluate if this provides any benefit</a>.</p>
<p>So if MyAppPool is the pool I care about, I know its PID is 1160, and we can move on to step two.</p>
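<p>If you want to automate that lookup, here is a small Python sketch that turns the script output into a pool-to-PID map. It assumes the output lines look exactly like the two shown above and that pool names contain no spaces; the function name is made up for this example.</p>
<pre>import re

# Matches lines such as "W3wp.exe PID: 1160 AppPoolID: MyAppPool".
LINE = re.compile(r"PID:\s*(\d+)\s+AppPoolID:\s*(\S+)", re.IGNORECASE)


def pool_pids(iisapp_output):
    """Map each application pool name to the list of its worker process PIDs."""
    pools = {}
    for pid, pool in LINE.findall(iisapp_output):
        pools.setdefault(pool, []).append(int(pid))
    return pools</pre>
<p>A pool that maps to more than one PID is the Web Garden case mentioned in the note above.</p>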
<p>For IIS7 on Windows Server 2008 life is much easier. You can view the application pools in the tree view of IISMgr. From the command line, issue the following command:</p>
<p>%windir%\system32\inetsrv\appcmd list apppool</p>
<p>From the IISMgr interface, you can right-click on each application pool and set up ping and recovery parameters, process model details, security, and more.</p>
<p><strong>TIP:</strong> You want to create separate application pools in IIS for each application. This prevents one application&#8217;s pool problem from taking down all applications. Plus, it makes it easier to do detailed workload characterization on each application.</p>
<p><strong>How do we get statistics for our IIS6 application pool processes?</strong></p>
<p>Here&#8217;s where we use Sysinternals&#8217; PsList.exe to get information on IIS6 application pool processes. There are many parameters, or arguments, but the important ones for us are:</p>
<ul>
<li>-x: Shows process, memory, and thread information.</li>
<li>PID: The process to report on, for example PsList.exe -x 1160. We don&#8217;t actually type &#8220;PID&#8221; in the command; the number following the switches is taken to be a PID.</li>
</ul>
<p>The results of the command above look like this:</p>
<p>C:\Windows\system32&gt;pslist -x 1160</p>
<p>pslist v1.28 - Sysinternals PsList<br />
Copyright - 2000-2004 Mark Russinovich<br />
Sysinternals</p>
<p>Process and thread information for STUDIO:</p>
<pre>Name                Pid Pri Thd  Hnd   Priv        CPU Time    Elapsed Time
w3wp               1160   8  18  276  12020     0:00:00.483     1:23:57.316
      VM      WS    Priv Priv Pk   Faults   NonP Page
  115204   20500   12020   12892     7141     41  189
  Tid Pri    Cswtch            State     User Time   Kernel Time   Elapsed Time
 1164  10        66   Wait:Executive   0:00:00.000   0:00:00.015    1:23:57.316
 1176  10        64     Wait:UserReq   0:00:00.000   0:00:00.000    1:23:57.303
 1188  10      6063     Wait:UserReq   0:00:00.000   0:00:00.046    1:23:57.293</pre>
<p>At the top of the output are the statistics for the w3wp process, followed by its memory details. Below that is a summary for each thread, including its Context Switches, Current State, User CPU Time, Kernel CPU Time, and Total Elapsed Time. This is all the information we&#8217;ll need.</p>
<p><strong>For more Information</strong></p>
<p><a href="http://blogs.msdn.com/david.wang/archive/2005/08/29/HOWTO_Understand_and_Diagnose_an_AppPool_Crash.aspx" target="_blank">Diagnosing Application Pool crashes and hangs</a></p>
<p><a href="http://technet.microsoft.com/en-us/library/cc753449%28WS.10%29.aspx" target="_blank">Managing IIS7 Application Pools</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/10/09/monitoring-your-iis-application-pools/feed/</wfw:commentRss>
		<feedburner:origLink>http://blog.heroix.com/index.php/2009/10/09/monitoring-your-iis-application-pools/</feedburner:origLink></item>
		<item>
		<title>Monitoring Your Custom .NET Web Services</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/K6gGzvCXuIA/</link>
		<comments>http://blog.heroix.com/index.php/2009/09/30/monitoring-your-custom-net-web-services/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 20:15:08 +0000</pubDate>
		<dc:creator>Chris Smith</dc:creator>
		
		<category><![CDATA[Effective Tech]]></category>

		<category><![CDATA[General Monitoring Tips]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=608</guid>
		<description><![CDATA[The .NET platform makes it fairly easy to write and deploy Web Services quickly. Being able to support both web-based (ASP.NET app) consumers and Windows application-based consumers from the same Web Service is a welcome efficiency. Finding tools to monitor your custom Web Services can be problematic and time consuming, but happily WMI provides classes [...]]]></description>
			<content:encoded><![CDATA[<p>The .NET platform makes it fairly easy to write and deploy Web Services quickly. Being able to support both web-based (ASP.NET app) consumers and Windows application-based consumers from the same Web Service is a welcome efficiency. Finding tools to monitor your custom Web Services can be problematic and time consuming, but happily WMI provides classes to monitor them. Including .NET application workload information on top of basic IIS Server monitoring is essential to achieving accurate performance and capacity planning analysis, which lie at the heart of service assurance and forecasting both requirements and capital budgets.</p>
<p><strong>What types of data will we want?</strong></p>
<p>So here is how you build a good picture of workload and performance on .NET platforms. We will want our analysis to include both trend analysis and workload characterization. Trend analysis is simply looking at how the peaks are growing and estimating when resources will be exhausted. Workload characterization means examining the contribution of component workload processes during peak periods. Peak periods are important because they clearly show the relationship between the capacity required to assure service and current resource consumption. First, define what the data collection must include to support the desired analysis. It generally includes:</p>
<ul>
<li>Overall resource consumption on the platform</li>
<li>Contributions of constituent workload processes (.NET apps) to global resource consumption</li>
</ul>
<p><strong>How much data will we need?</strong></p>
<p>Next, we need to collect the data for our analysis. The data needs to include enough samples over a long enough time to derive trends with high confidence. If you look at data over two weeks, your trends will not have enough historical basis to be valid more than a few days into the future. A trend based on data collected over 12 months provides much higher confidence that a projection is valid several months out, which is sufficient time to order new hardware. For trend analysis, the granularity of the samples matters less than making sure the data set includes all the normal variations seen in real workload processing plus the actual growth of resource consumption over time. In fact, an important best practice for maintaining the data set is to summarize short-period measurements into fewer averaged samples. For example, summarize minutes into hours and hours into days. Without getting into the statistical calculations, it is sufficient for year-long analysis to work with days, weeks, and months instead of minutes, hours, and days. We should strive to keep just enough data to meet our needs, without retaining data that is no longer statistically useful. In other words, we can get rid of a lot of fine granularity data (and save storage space) by summarizing it into larger samples. For workload and performance analysis we do need data with finer granularity, but it need not cover long time periods, so long as it includes peak periods. So a sound strategy for data maintenance is to keep fine granularity data for a relatively short period, and start to summarize after 30 to 90 days.</p>
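<p>Summarizing minutes into hours takes very little code. Here is a Python sketch, assuming the samples arrive as (timestamp, value) pairs; the function name is hypothetical, and in practice you would more likely run the equivalent aggregation inside the database.</p>
<pre>from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarize_hourly(samples):
    """Average (timestamp, value) samples into one value per clock hour."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}</pre>
<p>The same pattern rolls hours into days by also resetting the hour to zero. Keep in mind that averaging flattens peaks, which is one more reason to hold on to the fine granularity data for the recent period.</p>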
<p><strong>What data should we collect?</strong></p>
<p>Alright, now for what you want to collect for .NET platforms:</p>
<ul>
<li><em>Baseline Windows performance statistics</em>
<ul>
<li>Win32_PerfRawData_PerfOS_System (Proc &amp; Thread Count, Q-Len)</li>
<li>Win32_PerfRawData_PerfOS_Processor (% Processor Time)</li>
<li>Win32_PerfRawData_PerfOS_Memory (Free Mem &amp; Page Faults)</li>
<li>Win32_PerfRawData_PerfDisk_LogicalDisk (Space, IO Ops and Q-Len)</li>
</ul>
</li>
<li><em>IIS Performance Statistics</em> (This will vary based on the type of applications)
<ul>
<li>Win32_PerfRawData_ASP_ActiveServerPages (Sessions, Requests, and Errors)</li>
<li>Win32_PerfRawData_InetInfo_InternetInformationServicesGlobal (Cache Info)</li>
<li>Win32_PerfRawData_W3SVC_WebService (Total Connections and Throughput)</li>
<li>Win32_PerfRawData_NETFramework_NETCLRMemory (Heap and GC info)</li>
</ul>
</li>
<li><em>.NET Process Statistics</em>
<ul>
<li>Win32_PerfRawData_W3SVC_WebService (Connections, Throughput, and Requests by application)</li>
<li>Win32_PerfRawData_ASPNET_2050727_ASPNETv2050727 (Requests and Sessions)</li>
<li>Win32_PerfRawData_ASPNET_2050727_ASPNETAppsv2050727 (Requests, Cache, Errors)</li>
</ul>
</li>
</ul>
<p>There are numerous .NET WMI Classes that can be easily viewed in the WBEMTEST interface, so if you have a very specialized application, it&#8217;s likely you will find the specific class that tracks what your application is doing.</p>
<p><strong>Turning Data Into Useful Information</strong></p>
<p>While collecting the data with WMI is straightforward, the same is not always true of interpreting the data. The Win32_PerfRawData classes are just that, RAW. The statistics need to be <a href="http://msdn.microsoft.com/en-us/library/ms974615.aspx" target="_blank">&#8216;cooked&#8217; (calculated) according to their data type</a>. If you&#8217;d like to avoid this calculation, an easy way around it is to use the TypePerf command to get the data from PerfMon instead of WQL queries to WMI. WMI is a lot more efficient than launching numerous command processes running TypePerf, though. Once collected and cooked, store the data in a relational database; I use MySQL-compliant MaxDB.</p>
<p>You will want to generate two types of representations of the data, Long Term Trends and Peak Profiles. Weekly or monthly periods of peak usage are great for profiling, depending on your business cycle processing. I like to build two graphs for my reports: a trend over time and a weekly or monthly profile. Here are sample queries (MySQL) for profiling the data from the ASPNETAppsv2050727 WMI Class:</p>
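<p>To give a feel for what cooking involves, here is a minimal Python sketch of two of the documented counter type formulas: PERF_COUNTER_COUNTER (a rate per second) and PERF_100NSEC_TIMER_INV (an inverse percentage, the type used for % Processor Time). The function and argument names are illustrative. Each formula takes two successive raw samples of the same instance, where n is the raw counter value, d is the matching timestamp property, and frequency is the number of timestamp ticks per second.</p>
<pre>def cook_counter(n0, n1, d0, d1, frequency):
    """PERF_COUNTER_COUNTER: average change per second between two samples."""
    return (n1 - n0) / ((d1 - d0) / frequency)


def cook_100nsec_timer_inv(n0, n1, d0, d1):
    """PERF_100NSEC_TIMER_INV: percent of the interval the component was busy.

    The raw value counts idle time in 100 ns units, so the busy share is
    whatever is left over.
    """
    return 100.0 * (1 - (n1 - n0) / (d1 - d0))</pre>
<p>Check the counter type of each property before choosing a formula, because a different type needs a different calculation.</p>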
<p><em>Monthly Profile</em></p>
<p>select dayofmonth(aspr.sampletime) as SampleTime,<br />
aspr.instance as ApplicationName,<br />
round(avg(aspr.reqexectime),0) as AvgReqExecTime,<br />
round(avg(aspr.reqwaittime),0) as AvgReqWaitTime,<br />
round(avg(aspr.reqexecuting),0) as AvgReqExecuting,<br />
round(avg(aspr.reqbytesintotal),0) as AvgReqBytesInTotal,<br />
round(avg(aspr.reqbytesouttotal),0) as AvgReqBytesOutTotal<br />
from aspdotnetv2_appsreq aspr<br />
where ($TIME_CONDITION)<br />
group by dayofmonth(aspr.sampletime), aspr.instance</p>
<p><em>Weekly Profile</em></p>
<p>select dayname(aspr.sampletime) as SampleTime,<br />
aspr.instance as ApplicationName,<br />
round(avg(aspr.reqexectime),0) as AvgReqExecTime,<br />
round(avg(aspr.reqwaittime),0) as AvgReqWaitTime,<br />
round(avg(aspr.reqexecuting),0) as AvgReqExecuting,<br />
round(avg(aspr.reqbytesintotal),0) as AvgReqBytesInTotal,<br />
round(avg(aspr.reqbytesouttotal),0) as AvgReqBytesOutTotal<br />
from aspdotnetv2_appsreq aspr<br />
where ($TIME_CONDITION)<br />
group by dayname(aspr.sampletime), aspr.instance</p>
<p style="text-align: left;">To query usage over time for trending, remove the time functions DayOfMonth and DayName from the statement so that the plain sampletime is selected, and drop the GROUP BY clause. Also remove the aggregation of the variables so that</p>
<p style="text-align: center;">round(avg(aspr.reqexectime),0) as AvgReqExecTime</p>
<p style="text-align: left;">becomes much simpler as</p>
<p style="text-align: center;">aspr.reqexectime as ReqExecTime</p>
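<p>If the samples live somewhere other than a MySQL-compatible database, the grouping that the weekly profile performs can be approximated in a few lines of Python. This is only a sketch: the tuple layout (sample time, instance, request execution time) and the function name are assumptions for the example, and Python&#8217;s round() treats exact halves differently from MySQL&#8217;s ROUND().</p>

```python
from collections import defaultdict
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekly_profile(samples):
    """Average request execution time per (day name, application)."""
    buckets = defaultdict(list)
    for sample_time, instance, req_exec_time in samples:
        day = DAY_NAMES[sample_time.weekday()]  # weekday() returns 0 for Monday
        buckets[(day, instance)].append(req_exec_time)
    return {key: round(sum(vals) / len(vals)) for key, vals in buckets.items()}


# Hypothetical samples for one application over two weeks.
samples = [
    (datetime(2009, 9, 21, 9, 0), "app1", 120),
    (datetime(2009, 9, 28, 9, 0), "app1", 180),
    (datetime(2009, 9, 22, 9, 0), "app1", 90),
]
profile = weekly_profile(samples)
print(profile)
```

<p>Skipping the grouping and plotting the raw tuples in time order gives the trend view instead.</p>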
<p style="text-align: left;">Now we have all the pieces to build our analysis reports. What are the burning questions that managers want answered in those reports?</p>
<ul>
<li><strong>How well is the current platform supporting the load?</strong>
<ul>
<li>&#8220;<a href="http://www.heroix.com/aspscript/wp_sla_form.asp" target="_blank">Monitoring Service Levels</a> &amp; Performance Evaluation&#8221;</li>
</ul>
</li>
<li><strong>How long will the current platform support Peak Usage?</strong>
<ul>
<li>&#8220;<a href="http://www.cmg.org/forum/viewforum.php?f=2" target="_blank">Capacity Planning</a>&#8221;</li>
</ul>
</li>
<li><strong>What does each application contribute to overall load?</strong>
<ul>
<li>&#8220;<a href="http://cs.gmu.edu/~menasce/papers/IEEE-IC-WorkloadCharacterization-September2003.pdf" target="_blank">Workload Characterization</a>&#8221;</li>
</ul>
</li>
<li><strong>Should the applications be co-located?</strong>
<ul>
<li>&#8220;Workload Balancing&#8221;</li>
</ul>
</li>
</ul>
<p>With our data we can show the total load on the .NET platform. We can go further and characterize the load that each .NET application contributes to the overall activity. By examining the weekly and monthly profiles, which show the relative activity of the applications, we can characterize the workload well enough to perform workload balancing and to understand peak usage constraints better. Extend the timeline and view the trend to forecast the point at which the current platform&#8217;s resources will be saturated. Now, this can be a bit time-consuming to set up, so as usual I built a few custom .NET and Web Services solutions and reports for Longitude; if you&#8217;d like copies, just send me an <a href="mailto:blogger@heroix.com?subject=Monitoring%20Custom%20Web%20Services">email</a>.</p>
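<p>One simple way to extend the timeline is to fit a straight line to the trend data and see where it crosses a capacity limit. The Python sketch below does that with ordinary least squares; the function name, the capacity figure, and the sample values are hypothetical, and a straight line is only an assumption about how the load grows.</p>

```python
def forecast_saturation(days, values, capacity):
    """Fit a least-squares line to (day, value) pairs and return the day
    on which the fitted line reaches capacity, or None if the trend is
    flat or falling."""
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matching (day, value) pairs")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


# Hypothetical monthly averages of a utilization percentage.
days = [0, 30, 60, 90]
utilization = [40, 50, 60, 70]
saturation_day = forecast_saturation(days, utilization, capacity=100)
print(round(saturation_day))
```

<p>With these made-up numbers the line climbs ten points a month, so it reaches 100 around day 180. Real workloads are rarely this tidy, so treat the result as a planning hint rather than a date.</p>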
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/09/30/monitoring-your-custom-net-web-services/feed/</wfw:commentRss>
		<enclosure url="http://cs.gmu.edu/~menasce/papers/IEEE-IC-WorkloadCharacterization-September2003.pdf" length="343117" type="application/pdf" /><media:content url="http://cs.gmu.edu/~menasce/papers/IEEE-IC-WorkloadCharacterization-September2003.pdf" fileSize="343117" type="application/pdf" /><itunes:explicit>no</itunes:explicit><itunes:subtitle>The .NET platform makes it fairly easy to write and deploy Web Services quickly. Being able to support both web-based (ASP.NET app) consumers and Windows application-based consumers from the same Web Service is a welcome efficiency. Finding tools to monit</itunes:subtitle><itunes:summary>The .NET platform makes it fairly easy to write and deploy Web Services quickly. Being able to support both web-based (ASP.NET app) consumers and Windows application-based consumers from the same Web Service is a welcome efficiency. Finding tools to monitor your custom Web Services can be problematic and time consuming, but happily WMI provides classes [...]</itunes:summary><itunes:keywords>monitoring,performance,network,availability</itunes:keywords><feedburner:origLink>http://blog.heroix.com/index.php/2009/09/30/monitoring-your-custom-net-web-services/</feedburner:origLink></item>
		<item>
		<title>Monitoring Windows Event Logs</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/uELXecklsHQ/</link>
		<comments>http://blog.heroix.com/index.php/2009/09/21/monitoring-windows-event-logs/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 21:22:19 +0000</pubDate>
		<dc:creator>Chris Smith</dc:creator>
		
		<category><![CDATA[Event Logs]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=590</guid>
		<description><![CDATA[In a continuation of our theme this month of cost saving yet effective monitoring techniques, we&#8217;re going to look at a problem brought to me by a customer in Singapore that we solved with event log monitoring. In our example, the admin spends a lot of time on the phone with users who&#8217;ve locked themselves [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing this month&#8217;s theme of cost-saving yet effective monitoring techniques, we&#8217;re going to look at a problem brought to me by a customer in Singapore that we solved with event log monitoring. In our example, the admin spends a lot of time on the phone with users who&#8217;ve locked themselves out of their accounts. Fixing these problems quickly is a priority. In a large, distributed environment, managing domain user security issues can be a challenge. Users lock themselves out of their accounts, they log in where they shouldn&#8217;t, accounts expire and get disabled, systems shut down and start up, login services fail, and many more events are recorded in security event logs that can grow extremely large. Unfortunately, parsing huge event logs remotely with a standard interface such as WMI can be both time consuming and resource intensive. Parsing a security log with its many thousands of security audits becomes impractical when WMI queries start to take from 3 to 20 minutes to complete.</p>
<p>Once again our solution is provided by some cool free stuff: the Windows eventquery.vbs script. Microsoft provides this script on Windows Server 2003 in the system32 directory. The script lets you filter the Event Logs for specific events, which is much faster than reading through them. It takes several arguments, which you can list by running the command &#8220;eventquery /?&#8221; from a CMD prompt. Here are the important ones for our solution:</p>
<ul>
<li>S - Server to query</li>
<li>FO - Format; we use the argument CSV for comma-separated values</li>
<li>L - Log to query (Security, System, Application, etc.)</li>
<li>FI - Filter; we filter on DATETIME and ID
<ul>
<li>Other useful filters: User, Computer, Source, Type (e.g. Errors)</li>
</ul>
</li>
<li>V - Verbose</li>
</ul>
<p>In my testing, searching for particular events with the eventquery VBScript is orders of magnitude faster than using WMI, especially on the biggest security logs. Furthermore, the method doesn&#8217;t demand the large IO bandwidth of solutions that download entire logs. I built a BAT file that takes a few arguments and executes the VBScript so that it only gets events from the last 5 minutes. The time threshold is calculated by another VBScript that I borrowed from my colleague Susan.</p>
<p><strong>Here&#8217;s what&#8217;s in the BAT file:</strong></p>
<p>set Server=%1<br />
set File=%2<br />
set EvtID=%3</p>
<p>FOR /F &quot;tokens=1 delims=;&quot; %%i IN ('cscript //nologo //b E:\scripts\time_threshold.vbs') DO set TimeThreshold=%%i</p>
<p>EVENTQUERY.vbs /S %Server% /V /FO CSV /L %File% /FI &quot;DATETIME gt %TimeThreshold% AND Id eq %EvtID%&quot;</p>
<p><strong>Here&#8217;s the time_threshold VBScript:</strong></p>
<p>Dim T, MonthStr, DayStr, YearStr, HrStr, MinStr, SecStr, AmPm<br />
' Take the threshold once so every field comes from the same instant<br />
T = DateAdd(&quot;n&quot;, -5, Now)<br />
MonthStr = Right(&quot;0&quot; &amp; Month(T), 2)<br />
DayStr = Right(&quot;0&quot; &amp; Day(T), 2)<br />
YearStr = Right(Year(T), 2)<br />
' Convert to a 12-hour clock; midnight and noon both become 12<br />
If Hour(T) &gt;= 12 Then AmPm = &quot;PM&quot; Else AmPm = &quot;AM&quot;<br />
HrStr = Hour(T) Mod 12<br />
If HrStr = 0 Then HrStr = 12<br />
HrStr = Right(&quot;0&quot; &amp; HrStr, 2)<br />
MinStr = Right(&quot;0&quot; &amp; Minute(T), 2)<br />
SecStr = Right(&quot;0&quot; &amp; Second(T), 2)<br />
WScript.StdOut.Write MonthStr &amp; &quot;/&quot; &amp; DayStr &amp; &quot;/&quot; &amp; YearStr &amp; &quot;,&quot; &amp; HrStr &amp; &quot;:&quot; &amp; MinStr &amp; &quot;:&quot; &amp; SecStr &amp; AmPm</p>
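<p>For comparison, the same threshold string (mm/dd/yy,hh:mm:ss followed by AM or PM) can be sketched in Python. This is not part of the Longitude solution; the function names are made up for the example, and the event ID used below is just a sample value.</p>

```python
from datetime import datetime, timedelta


def time_threshold(now=None, minutes_back=5):
    """Return the cutoff as mm/dd/yy,hh:mm:ssAM or ...PM on a 12-hour clock."""
    now = now or datetime.now()
    cutoff = now - timedelta(minutes=minutes_back)
    hour_12 = cutoff.hour % 12 or 12  # midnight and noon both print as 12
    suffix = "AM" if cutoff.hour < 12 else "PM"
    return f"{cutoff:%m/%d/%y},{hour_12:02d}:{cutoff:%M:%S}{suffix}"


def build_filter(threshold, event_id):
    """Build the /FI filter text in the same shape the BAT file uses."""
    return f"DATETIME gt {threshold} AND Id eq {event_id}"


# Fixed timestamp so the output is repeatable; 644 is only a sample ID.
example = time_threshold(datetime(2009, 9, 21, 0, 3, 0))
print(example)
print(build_filter(example, 644))
```

<p>Building the suffix and the 12-hour value by hand keeps the output independent of the machine&#8217;s regional settings, which matters because the filter expects this exact layout.</p>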
<p>This collection method lets us pull time-critical information from huge Windows Security Event Logs with short tests that run every few minutes instead of every 15 minutes or longer. Now the admin knows before the user calls that somebody is locked out or that a server has rebooted.</p>
<p><strong>TIP</strong> - if you want to run .VBS scripts from a BAT file, set the default script host to CScript using &#8220;CSCRIPT //H:CSCRIPT //S&#8221;.</p>
<p>As usual, I created a Longitude solution that uses this method. It&#8217;s called EventLogQuery. Please send me an <a href="mailto:blogger@heroix.com?subject=Monitoring%20Windows%20Event%20Logs">email</a> if you&#8217;d like a copy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/09/21/monitoring-windows-event-logs/feed/</wfw:commentRss>
		<feedburner:origLink>http://blog.heroix.com/index.php/2009/09/21/monitoring-windows-event-logs/</feedburner:origLink></item>
		<item>
		<title>Monitoring IIS User Experience with Free Stuff</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/dD0uA_2YiAc/</link>
		<comments>http://blog.heroix.com/index.php/2009/09/17/monitoring-iis-user-experience-with-free-stuff/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 15:53:47 +0000</pubDate>
		<dc:creator>Chris Smith</dc:creator>
		
		<category><![CDATA[Effective Tech]]></category>

		<category><![CDATA[General Monitoring Tips]]></category>

		<category><![CDATA[Howto]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=581</guid>
		<description><![CDATA[In these days of shrinking budgets there doesn&#8217;t seem to be any less emphasis on providing timely user experience information, or at least monitoring web site traffic. I&#8217;m surprised to find how many users don&#8217;t avail themselves of an outstanding source of free information, monitoring IIS Extended Logs. It amazes me to see how much [...]]]></description>
			<content:encoded><![CDATA[<p>In these days of shrinking budgets there doesn&#8217;t seem to be any less emphasis on providing timely user experience information, or at least on monitoring web site traffic. I&#8217;m surprised by how many users don&#8217;t avail themselves of an outstanding source of free information: the IIS Extended Logs. It amazes me to see how much time, effort, and money are spent engineering methods to sample or capture the time it takes to get pages, which pages are requested the most, how many bytes are being transferred, and which pages referred users to the big-hitting pages. All this information is available in the IIS Extended Logs.</p>
<p>IIS log monitoring is a snap to set up. Not all of the useful statistics named above are captured by default, but selecting what you want logged is easily configured in the Extended Logging Properties. The controls are under the Web Site properties on Windows Server 2003 and under Site\Logging on Windows Server 2008. There you can select the fields that you want to keep track of, so all you have to do is check the boxes beside Time Taken and Bytes Sent to start logging that data. The best part is that you don&#8217;t even have to stop or restart IIS. IIS starts using the new logging format as soon as you save the changes.</p>
<p>Why you might care about some info:</p>
<ul>
<li>Server data - This is useful for understanding load in a clustered or load-balanced environment, e.g. which server or IP served which pages,</li>
<li>HTTP Status (sc-status) - the server-to-client status is the numeric result of the operation; for example, a GET of a page may yield a status of 200 (OK) or 404 (Not Found),</li>
<li>Bytes Sent and Received - from the server perspective; this is best converted to a larger number such as KBytes or even MBytes for IO intensive sites,</li>
<li>Time Taken - The length of time the action (e.g. a GET) took in milliseconds; I usually convert this to seconds, and set a threshold to capture slow pages,</li>
<li>Referrer - The site that the user last visited or that provided a link to the current page; this is not always populated, depending on how the user reached the page,</li>
<li>Port - only needed if you serve both HTTP and HTTPS, and want to differentiate between the two connection types.</li>
</ul>
<p>Why you might not care about other info:</p>
<ul>
<li>User Name - this is normally not captured for HTTP connections, but could be useful for HTTPS connections,</li>
<li>URI Query - only used for dynamic pages, and often not used, but is available,</li>
<li>Win32 Status - non-intuitive data that doesn&#8217;t add to the Server&#8217;s HTTP status,</li>
<li>Protocol Status - sub-status error code; I&#8217;ve never used it,</li>
<li>User Agent - this is the browser the user employed to view your site, and is only useful when debugging compatibility issues,</li>
<li>Cookie - this is a very specialized need rarely used, and people always know when they need to use it,</li>
<li>cs-host - the Host header sent by the client; it will not usually be captured, and is largely irrelevant to aggregated views of web service.</li>
</ul>
<p>You can parse the IIS logs with a FOR loop in a Windows BAT file that counts the unique page hits and accumulates the Time Taken data to calculate Max, Min, and Avg times for each page. Or, you can <a href="mailto:blogger@heroix.com?subject=Monitoring%20IIS%20Logs">send me an email</a>, and I&#8217;ll give you the Longitude solution I recently created for IIS web traffic monitoring. Longitude has a built-in File Parser that I used for the collector. Even better, I configured the collector to set a pointer so that it only reads new entries in logs that can be really huge. I also created a few reports to go with the collector: Page Hits Summary, Slow Pages, and Page Referral Analysis, which I&#8217;ll be happy to pass along&#8230;</p>
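<p>To give a feel for how little parsing is involved, here is a minimal Python sketch of the same idea: it reads the #Fields header that IIS writes at the top of a W3C extended log, then counts hits and tracks Min, Max, and Avg Time Taken per page. The function name and the sample log lines are made up for the example, and it assumes the cs-uri-stem and time-taken fields have been enabled.</p>

```python
from collections import defaultdict


def summarize_iis_log(lines):
    """Return {page: (hits, min_ms, max_ms, avg_ms)} from W3C extended log lines."""
    fields = []
    times = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the rows that follow
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip blank lines, other directives, rows before a header
        row = dict(zip(fields, line.split()))
        page = row.get("cs-uri-stem")
        taken = row.get("time-taken", "")
        if page and taken.isdigit():
            times[page].append(int(taken))
    return {
        page: (len(t), min(t), max(t), sum(t) / len(t))
        for page, t in times.items()
    }


# Hypothetical log excerpt.
sample_log = """#Software: Microsoft Internet Information Services 6.0
#Fields: date time cs-method cs-uri-stem sc-status sc-bytes time-taken
2009-09-17 15:00:01 GET /default.aspx 200 5120 140
2009-09-17 15:00:03 GET /default.aspx 200 5120 260
2009-09-17 15:00:04 GET /missing.htm 404 1024 15
""".splitlines()

summary = summarize_iis_log(sample_log)
for page, stats in summary.items():
    print(page, stats)
```

<p>Reading the column names from the #Fields line means the sketch keeps working when you add or remove logged fields, since IIS writes a fresh header whenever the format changes.</p>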
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/09/17/monitoring-iis-user-experience-with-free-stuff/feed/</wfw:commentRss>
		<feedburner:origLink>http://blog.heroix.com/index.php/2009/09/17/monitoring-iis-user-experience-with-free-stuff/</feedburner:origLink></item>
		<item>
		<title>Monitoring Tips for Smooth Vacation Re-Entry</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/SMxITKlakQo/</link>
		<comments>http://blog.heroix.com/index.php/2009/09/09/monitoring-tips-for-smooth-vacation-re-entry/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 18:41:34 +0000</pubDate>
		<dc:creator>Mary Masi-Phelps</dc:creator>
		
		<category><![CDATA[Effective Tech]]></category>

		<category><![CDATA[General Monitoring Tips]]></category>

		<category><![CDATA[Howto]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=560</guid>
		<description><![CDATA[Now that Labor Day has come and gone and kids are back in school, many of us are reorienting ourselves after summer vacation. If you were able to plan ahead for vacation time performance monitoring, perhaps you are coming back to a reasonably orderly situation. Even the best laid plans are subject to change, though, [...]]]></description>
			<content:encoded><![CDATA[<p>Now that Labor Day has come and gone and kids are back in school, many of us are reorienting ourselves after summer vacation. If you were able to <a href="http://blog.heroix.com/index.php/2009/06/24/plan-it-coverage-and-systems-monitoring-for-when-you-are-gone/">plan ahead for vacation time performance monitoring</a>, perhaps you are coming back to a reasonably orderly situation. Even the best laid plans are subject to change, though, and it&#8217;s likely that at least something about your IT infrastructure or application workload changed while you were away. With <a href="http://www.heroix.com/agentless/network_monitoring_software.htm">Longitude software for application performance monitoring</a>, if you come back to some unexpected performance or availability events, or if you&#8217;re just not sure what to expect when you return to the office, there are some easy ways to get back in the driver&#8217;s seat:</p>
<ul>
<li>Run an event report (application events as well as SLA events) to get a quick look at what happened over the past week or two. Then dig in with your favorite Event Monitor dashboard.</li>
<li>Use the Event Monitor&#8217;s pulldown filters to focus in on events based on severity, time of occurrence, or application; work on the most urgent events, and once they&#8217;re cleared up you can change the filter and move on to the less critical issues.</li>
<li>Use the Event Monitor&#8217;s drill-down features to learn more about unfamiliar events and their underlying root causes.</li>
<li>If an event is being triggered too easily, you can change performance thresholds right in the Event Monitor, and Longitude will show you average, minimum and maximum workload values to help you make the right adjustments.</li>
<li>Remember that you can enable actions right from the Event Monitor, and you can even suspend events if you don&#8217;t want to keep receiving notifications while you work on a problem.</li>
</ul>
<p>Regardless of whether you use Longitude, another <a href="http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems" target="_blank">commercial network monitoring software product</a>, or your own homegrown scripts, try to do a little monitoring tune-up like this on a regular basis - monthly is probably sufficient - to keep your application performance monitoring practices up to date. It&#8217;s a little like cleaning out your Inbox or your coat closet at home - doing it as you go can be a real time saver in the long run.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/09/09/monitoring-tips-for-smooth-vacation-re-entry/feed/</wfw:commentRss>
		<feedburner:origLink>http://blog.heroix.com/index.php/2009/09/09/monitoring-tips-for-smooth-vacation-re-entry/</feedburner:origLink></item>
		<item>
		<title>Crisis Can be Opportunity for Action</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/bW2Ka6DvKA0/</link>
		<comments>http://blog.heroix.com/index.php/2009/08/14/crisis-can-be-opportunity-for-action/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 10:09:04 +0000</pubDate>
		<dc:creator>Dave Atkins</dc:creator>
		
		<category><![CDATA[Dealing with Crisis]]></category>

		<category><![CDATA[Effective Tech]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=540</guid>
		<description><![CDATA[We&#8217;re all familiar with the risk-averse adage, &#8220;if it ain&#8217;t broke, don&#8217;t fix it,&#8221; and in engineering, it IS often wise to avoid tinkering with things when there is no good reason to risk the unexpected. Unfortunately, this is a stifling environment for creative professionals that misses out on genuine opportunities to improve service levels. [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re all familiar with the risk-averse adage, &#8220;if it ain&#8217;t broke, don&#8217;t fix it,&#8221; and in engineering, it IS often wise to avoid tinkering with things when there is no good reason to risk the unexpected. Unfortunately, this is a stifling environment for creative professionals that misses out on genuine opportunities to improve service levels. I&#8217;ve found it handy to have a few things &#8220;ready to go&#8221; in a crisis&#8230;actions that fall into two categories:</p>
<p>First, there are the &#8220;we&#8217;ll do that the next time we have to reboot the server,&#8221; tasks. It&#8217;s helpful to have a set of tasks like this around for when there is downtime&#8211;while you are waiting for the phone call from vendor support or while the entire office is offline due to something beyond your control such as a localized communications outage. &#8220;Since no one can access the internet for the next couple hours&#8230;we will be performing some systems maintenance tasks&#8230;&#8221; These kinds of tasks should be well-contained and unrelated to the crisis at hand&#8211;i.e. don&#8217;t start things you can&#8217;t finish.</p>
<p>But the big opportunity presented by a crisis is the chance to try a new business process&#8211;after the immediate crisis is covered. The &#8220;Why?&#8221; is fresh in people&#8217;s minds. There will be a receptive audience asking, &#8220;how can we prevent a problem like this from happening again?&#8221; The IT department&#8211;usually not highly visible to the rest of the business&#8211;now has the attention of everyone, for good or bad. Have something good to propose and be ready to use the crisis to support your arguments for change.</p>
<p>In one company where I worked, a series of service disruptions provided the motivation to finally achieve management support for a platform upgrade. We had been running many websites on many servers with divergent code bases and rotating pagers through three people who had become accustomed to waking every night at 3am to restart IIS or maybe even reboot a server. We tried many things to work within the confines of the old systems&#8211;and we communicated what we were doing all along so management knew we were doing everything we could. But finally, a major outage in the middle of the day&#8211;as the company was trying to find customers and investors to avoid going out of business&#8211;provided the impetus to agree on a total development freeze to allow 3 months for platform migration.</p>
<p>It is a delicate balance. Nobody wants to hear an IT department complaining that they can&#8217;t do anything and that they need a crisis to act. Or worse, you present your ideas for solving problems and get a reaction like, &#8220;Well that&#8217;s great. Why haven&#8217;t you already done this?? Oh, sorry, we still don&#8217;t have any money.&#8221; So it is important, when you play the &#8220;now is the time to act!&#8221; card, to have established a credible basis of competence&#8230;the systems ARE being held together with duct tape and aggressively monitored with <a href="http://www.heroix.com/agentless/network_monitoring_software.htm" target="_blank">the best application and performance monitoring systems you can find</a>&#8211;and you can keep doing that&#8230;but even if you optimize your response time so that you can fix things in 10 minutes every time, users are going to notice and service levels will be affected.</p>
<p>When you encounter a situation that has you thinking, &#8220;I can&#8217;t believe they do this!&#8221; get over the frustration and plan for the day when the barriers to change can be blown away by crisis. Tinker on the development server and have a solution well-baked so you can roll it out: &#8220;to address performance and reliability problems, we are launching a new system&#8230;please be patient while we work out any new problems.&#8221; Crisis is not the time for experimentation: &#8220;We&#8217;ll try upgrading the software and see if that fixes it.&#8221; But it can be a time to step up and lead an organization out of the wilderness&#8230;to recognize&#8230;it really was broke, and now we need to fix it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/08/14/crisis-can-be-opportunity-for-action/feed/</wfw:commentRss>
		<feedburner:origLink>http://blog.heroix.com/index.php/2009/08/14/crisis-can-be-opportunity-for-action/</feedburner:origLink></item>
		<item>
		<title>Never Waste a Good Crisis</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/H-SPZBctiFM/</link>
		<comments>http://blog.heroix.com/index.php/2009/08/13/never-waste-a-good-crisis/#comments</comments>
		<pubDate>Thu, 13 Aug 2009 13:02:37 +0000</pubDate>
		<dc:creator>Mary Masi-Phelps</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=544</guid>
		<description><![CDATA[White House Chief of Staff Rahm Emanuel is credited with coining the adage, &#8220;never waste a good crisis.&#8221; Aside from providing highly quotable fodder for online pundits and comedians, Emanuel makes an excellent point: often a crisis presents an opportunity to try new ideas or embark on new initiatives that we wouldn&#8217;t have time for, [...]]]></description>
			<content:encoded><![CDATA[<p>White House Chief of Staff Rahm Emanuel is credited with coining the adage, &#8220;never waste a good crisis.&#8221; Aside from providing highly quotable fodder for online pundits and comedians, Emanuel makes an excellent point: often a crisis presents an opportunity to try new ideas or embark on new initiatives that, in ordinary times, we wouldn&#8217;t have time for, believe possible, or even think of.</p>
<p>And the same applies to IT organizations and how they manage their mission critical applications. The typical IT department is always busy implementing new projects - rolling out new applications, building disaster recovery plans, and the like. Too often, regular, proactive monitoring falls by the wayside or simply falls behind the organization&#8217;s changing business practices and becomes irrelevant. Now that budget pressures have put many of these projects on hold, this is the perfect opportunity to step back and assess your company&#8217;s application performance needs.</p>
<p><a class="alignleft" title="Application Performance Monitoring Podcast - Never Waste a Good Crisis" href="http://www.heroixdownload.com/Podcasts/Never_Waste_a_Good_Crisis.MP3" target="_blank">Listen to our latest application performance monitoring podcast to learn more.</a></p>
<p>If you&#8217;re interested in learning more about application performance monitoring and how Longitude can help you improve service levels, we invite you to visit our web site at <a href="http://www.heroix.com" target="_blank">www.heroix.com</a> and <a href="http://www.heroix.com/aspscript/script_longitude_demo_form.asp" target="_blank">request a Longitude free trial</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/08/13/never-waste-a-good-crisis/feed/</wfw:commentRss>
<enclosure url="http://www.heroixdownload.com/Podcasts/Never_Waste_a_Good_Crisis.MP3" length="4678844" type="audio/mpeg" />
		<media:content url="http://www.heroixdownload.com/Podcasts/Never_Waste_a_Good_Crisis.MP3" fileSize="4678844" type="audio/mpeg" /><itunes:explicit>no</itunes:explicit><itunes:subtitle>White House Chief of Staff Rahm Emanuel is credited with coining the adage, &#8220;never waste a good crisis.&#8221; Aside from providing highly quotable fodder for online pundits and comedians, Emanuel makes an excellent point: often a crisis presents an</itunes:subtitle><itunes:summary>White House Chief of Staff Rahm Emanuel is credited with coining the adage, &#8220;never waste a good crisis.&#8221; Aside from providing highly quotable fodder for online pundits and comedians, Emanuel makes an excellent point: often a crisis presents an opportunity to try new ideas or embark on new initiatives that we wouldn&#8217;t have time for, [...]</itunes:summary><itunes:keywords>monitoring,performance,network,availability</itunes:keywords><feedburner:origLink>http://blog.heroix.com/index.php/2009/08/13/never-waste-a-good-crisis/</feedburner:origLink></item>
		<item>
		<title>Social Media Monitoring: What is your IT Plan?</title>
		<link>http://feedproxy.google.com/~r/heroixmonitoring/~3/7UOPs3svOpA/</link>
		<comments>http://blog.heroix.com/index.php/2009/08/05/social-media-monitoring-what-is-your-it-plan/#comments</comments>
		<pubDate>Wed, 05 Aug 2009 10:53:21 +0000</pubDate>
		<dc:creator>Dave Atkins</dc:creator>
		
		<category><![CDATA[Security]]></category>

		<category><![CDATA[Teamwork]]></category>

		<category><![CDATA[Tech Life]]></category>

		<category><![CDATA[TechOps]]></category>

		<guid isPermaLink="false">http://blog.heroix.com/?p=537</guid>
		<description><![CDATA[The Marine Corps is banning social media. In the corporate world, there is widespread confusion, or at least a lack of consensus, on the value/detriment of employees using social media sites like Facebook and Twitter (or even non-work-related Internet use in general). Companies should develop a policy to guide their decisions and avoid the time-wasting [...]]]></description>
			<content:encoded><![CDATA[<p>The Marine Corps is <a href="http://www.informationweek.com/blog/main/archives/2009/08/marines_ban_soc.html" target="_blank">banning social media</a>. In the corporate world, there is widespread confusion, or at least a lack of consensus, on the value/detriment of employees using social media sites like Facebook and Twitter (or even <a href="http://www.wisegeek.com/how-do-employers-monitor-internet-usage-at-work.htm" target="_blank">non-work-related Internet use</a> in general). Companies should <a href="http://searchcompliance.techtarget.com/tip/0,289483,sid195_gci1362874,00.html" target="_blank">develop a policy</a> to guide their decisions and avoid the time-wasting conversations that happen when management makes arbitrary decisions and demands that IT enforce them.</p>
<p>I&#8217;m curious what experience IT professionals have with monitoring, blocking, and otherwise regulating employee access to social media. Blocking certain websites or applications at the network level is fairly trivial, but doing so will lead employees to <a href="http://www.zdnet.com.au/insight/software/soa/Five-tips-for-stealthy-Facebooking/0,139023769,339280292,00.htm">find a way around</a>&#8230;</p>
<p>You can certainly use <a href="http://www.heroix.com/agentless/network_monitoring_software.htm">network monitoring software like Heroix Longitude</a> to evaluate whether employee usage is causing bandwidth issues. But the problems are usually not just about bandwidth. There is the &#8220;lost time&#8221; argument, which presumes that workers who deviate from their task list are &#8220;stealing time&#8221; from the company, and there is the &#8220;idiot factor&#8221;: the employee who updates his or her Facebook status with confidential or proprietary company information. These are not really IT issues&#8230;but ultimately, where culture, company norms, and common sense fail to achieve the desired results, management will come knocking on IT&#8217;s door for answers to questions and solutions to problems.</p>
<p>What&#8217;s your experience? Has it changed recently? Do you block it all, or have you been asked to? Have you performed surveillance on employees to see how much time they were spending on Facebook? How do you know whether they are using it constantly or just have it open in a window all day? Almost two years ago, I friended my CEO on Facebook, and at the time that was kind of radical and risky. Now, the issue is more that you don&#8217;t know what to do when your CEO starts following you and you don&#8217;t want it! Or maybe I just live in an early-adopter bubble. <img src='http://blog.heroix.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.heroix.com/index.php/2009/08/05/social-media-monitoring-what-is-your-it-plan/feed/</wfw:commentRss>
		<feedburner:origLink>http://blog.heroix.com/index.php/2009/08/05/social-media-monitoring-what-is-your-it-plan/</feedburner:origLink></item>
	<media:rating>nonadult</media:rating><media:description type="plain">Find it. Fix it. Forget it.</media:description></channel>
</rss>

