<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel>
        <title>Splunk &gt; Developer Blogs</title>
        <link>http://www.splunk.com/</link>
        <description>The IT Search Engine</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:ashley@splunk.com" />

       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>
<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/splunkdev" type="application/rss+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: Anomalies: How to find what you’re looking for, without looking for it</title><link>http://feedproxy.google.com/~r/splunkdev/~3/sOTVpWc15sA/</link><comments>http://blogs.splunk.com/david/2009/05/25/anomalies-how-to-find-what-youre-looking-for-without-looking-for-it/</comments><pubDate>Mon, 25 May 2009 23:14:54 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[Very often you want to find &#8220;problems&#8221; in your IT data, but you don&#8217;t know what to look for.  How can you find these problems with Splunk?
In Splunk&#8217;s new search language, there are several search operators that can help you.  I&#8217;ll describe only a subset of what is possible.

 1) You can search [...]]]></description><content:encoded><![CDATA[<p>Very often you want to find "problems" in your IT data, but you don't know what to look for.  How can you find these problems with Splunk?</p>
<p>In Splunk's new search language, there are several search operators that can help you.  I'll describe only a subset of what is possible.</p>
<ul>
<li> 1) You can search for unexpected events by looking at those that do not cluster into large groups.  For example, you can cluster the errors in the last hour and report on the events that belong to the smallest clusters (e.g., 'error | cluster showcount=true | sort cluster_count | head 5').</li>
<li> 2) You can find unexpected events by finding values that lie far from the mean, measured in standard deviations.  For example, you can search for sendmail events with anomalous 'delay' values (e.g., 'sourcetype=sendmail_syslog | anomalousvalue delay action=filter pthreshold=0.02').</li>
<li> 3) You can use machine learning to find events that have unexpected values based on the past historical context (e.g., '* | anomalies blacklist=boringevents').</li>
<li> 4) It's a little bit of a hand-wave, but you can do really cool graphical reports that often make anomalies visibly obvious. For example, you could create a timechart of average cpu_seconds by host, and visibly see problems (e.g., 'sourcetype=top | timechart avg(cpu_seconds) by host').</li>
<li> 5) Finally, Splunk is expandable: you can define your own search operators.  If you know how to find events interesting to you, you can write a simple script and trivially integrate it with the power of a search platform that deals with billions of events in seconds. Since Splunk uses a scalable map-reduce framework, your script will run in the map-reduce framework and scale automatically.</li>
</ul>
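<p>To make item 2 concrete, here is a minimal Python sketch of the general idea of flagging values that sit many standard deviations from the mean. It is not how Splunk's anomalousvalue operator is implemented; the function name, the threshold of 2.0, and the sample delay values are all assumptions made up for illustration.</p>

```python
import statistics


def far_from_mean(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        # every value is identical, so nothing stands out
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]


# hypothetical 'delay' values, in seconds, from a handful of mail events
delays = [1, 2, 1, 3, 2, 1, 2, 120]
outliers = far_from_mean(delays)
print(outliers)
```

<p>A fixed threshold like this is crude. It works best when most values are tightly grouped, and a single huge value inflates the standard deviation and can hide smaller anomalies.</p>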
<p>Once you have searches that find unexpected events, you can set alerts for them.  You can also combine events together into 'transactions', and look for anomalies in groups of events.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/sOTVpWc15sA" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2009/05/25/anomalies-how-to-find-what-youre-looking-for-without-looking-for-it/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Amrit Bath: Reloading the auth system via CLI</title><link>http://feedproxy.google.com/~r/splunkdev/~3/sEcF17_OnkE/</link><comments>http://blogs.splunk.com/amrit/2008/11/26/reloading-the-auth-system-via-cli/</comments><pubDate>Wed, 26 Nov 2008 19:26:20 +0000</pubDate><dc:creator>Amrit Bath</dc:creator><description><![CDATA[Note: Tina pointed out that this does not apply to the authorize.conf file.  This will be fixed in an upcoming version of splunk.
This comes up every once in a while on the support channel (EFnet/#splunk), so I guess that means I should do a blog post on it.
If you&#8217;re making changes to the authentication.conf [...]]]></description><content:encoded><![CDATA[<p><strong>Note:</strong> Tina pointed out that this does not apply to the authorize.conf file.  This will be fixed in an upcoming version of splunk.</p>
<p>This comes up every once in a while on the support channel (EFnet/#splunk), so I guess that means I should do a blog post on it.</p>
<p>If you're making changes to the authentication.conf file and want to reload Splunk's auth system without going through the web UI, you can use one of our internal functions to do it at the command line:</p>
<p>  $ splunk _internal rpc-auth '&lt;call name="syncAuth"&gt;&lt;params/&gt;&lt;/call&gt;'</p>
<p>This fires off the same call that the UI would use to reload the auth system, so it functions identically.  Note that this is an authenticated call, so you'll need to use one of the standard authentication methods (-auth, splunk login, or the SPLUNK_USERNAME/SPLUNK_PASSWORD env vars...).</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/sEcF17_OnkE" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/amrit/2008/11/26/reloading-the-auth-system-via-cli/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Mark Cohen: Syslog, Syslog-ng, and Splunk Forwarders</title><link>http://feedproxy.google.com/~r/splunkdev/~3/MfSrwI5Rx_k/</link><comments>http://blogs.splunk.com/mark/2008/11/13/syslog-syslog-ng-and-splunk-forwarders/</comments><pubDate>Thu, 13 Nov 2008 22:26:03 +0000</pubDate><dc:creator>Mark Cohen</dc:creator><description><![CDATA[I often get asked, which is better for Log Management; Syslog, Syslog-ng or Splunk Forwarders&#8230;
The answer is nearly always the same. &#8220;What are you currently running in your infrastructure? Do you have a log archive? What are you comfortable configuring?&#8221;
Most, if not all systems come with syslog built in. Setting Splunk up to handle syslog [...]]]></description><content:encoded><![CDATA[<p>I often get asked, which is better for Log Management; Syslog, Syslog-ng or Splunk Forwarders...</p>
<p>The answer is nearly always the same. "What are you currently running in your infrastructure? Do you have a log archive? What are you comfortable configuring?"</p>
<p>Most, if not all systems come with syslog built in. Setting Splunk up to handle syslog inputs is trivial. If you only deal with single line events then syslog is fine. You would just configure Splunk to use the Monitor input and point it to the target directory that you are storing your syslog log files in. Often this is /var/log or /var/adm depending on a Linux or Solaris installation.</p>
<p>If you have a medium scale deployment where you have lots of servers, you can configure syslog to listen to remote syslog hosts. Run Splunk on your receiver and you're done.</p>
<p>As an example, let's say we have a Linux deployment.</p>
<ul>
<li>Step one, configure syslog to "listen" for incoming messages. On most systems these days the syslog flags are configured in the /etc/sysconfig/syslog file. Append -r to SYSLOGD_OPTIONS so that it reads SYSLOGD_OPTIONS="-m 0 -r"</li>
<li>On the sender hosts, append this line to the end of the syslog configuration file (typically /etc/syslog.conf): "*.*                          @LOGHOST"</li>
<li>Add an entry to your /etc/hosts file for the IP address of "LOGHOST"</li>
</ul>
<p>Assuming your receiver has the /var/log directory set up, create an inputs.conf in your $SPLUNK_HOME/etc/system/local/ directory with the following stanza.</p>
<p><code>[monitor:///var/log]<br />
sourcetype = syslog<br />
disabled = false<br />
host = host_name<br />
</code></p>
<p>I like to recommend syslog-ng for both large scale deployments, and deployments where there is significant traffic. Syslog-ng allows you to use TCP rather than UDP to send your log messages. As we all know, UDP is lossy. If you have too many messages for the network, interface, or host you are running syslog on, you will drop data. Also, syslog-ng allows you to pre-filter messages upon their arrival into "buckets" to give you better control over your logs. Splunk can still be easily configured to monitor the target path and easily handle the naming of incoming systems, events, and dates.</p>
<p>To configure your Splunk host to properly get the hostname on a log archive with syslog-ng, you would have to make sure syslog-ng is creating the hostname in the path. For example, /var/log/archive/hosts/hostname/.../</p>
<p>The Splunk monitor stanza would look like this:<br />
<code><br />
[monitor:///var/log/archive/hosts]<br />
host_segment = 5<br />
sourcetype = syslog</code></p>
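<p>If the segment counting is not obvious, this small Python sketch shows the rule as I read it from the example above: split the path on "/" and take the Nth non-empty piece, counting from 1. It is only an illustration of the counting, not Splunk's own code, and the function name, the host name web01 and the rest of the sample path are made up.</p>

```python
def host_from_path(path, host_segment):
    """Return the Nth '/'-separated segment of path, counting from 1."""
    segments = [s for s in path.split("/") if s]
    if host_segment < 1 or host_segment > len(segments):
        return None  # the path is too short to contain that segment
    return segments[host_segment - 1]


# hypothetical file written by syslog-ng under the archive layout above
example_path = "/var/log/archive/hosts/web01/2008/11/13/messages"
host = host_from_path(example_path, 5)
print(host)
```

<p>Here var, log, archive and hosts are segments 1 through 4, so segment 5 is the directory that syslog-ng named after the sending host.</p>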
<p>Where do Splunk Forwarders come into play here? (I knew you would ask)</p>
<p>Splunk forwards multi-line log events. This makes troubleshooting java apps, php apps, practically anything that uses this format, trivial. Typically I recommend using a mixture of inputs. I like using syslog/syslog-ng for collecting the log data to a central repository. This guarantees that you will always have the original data around. I then recommend configuring a Splunk instance to monitor the target directory of the syslog messages as well as pointing Splunk at the directories that contain the multi-line events. Best of both worlds.</p>
<p>What are the drawbacks of Forwarders? Just like configuring Splunk as a syslog receiver, if your Splunk instance is down, you get no data.</p>
<p>So, often the best solution is to run Splunk Forwarders on those hosts that have multiline logs and use syslog/syslog-ng on your central server. Collect syslog with syslog-ng and collect app logs with Splunk. Best of both worlds.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/MfSrwI5Rx_k" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/mark/2008/11/13/syslog-syslog-ng-and-splunk-forwarders/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: inputcsv to restrict a search by a list of field values</title><link>http://feedproxy.google.com/~r/splunkdev/~3/IFk3FfkOB7A/</link><comments>http://blogs.splunk.com/andrea/2008/10/24/inputcsv-to-restrict-a-search-by-a-list-of-field-values/</comments><pubDate>Fri, 24 Oct 2008 16:52:27 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[A customer asked about a complicated search that could be vastly simplified by using inputcsv to input a list of values from a file, a feature added for 3.3.x. It&#8217;s documented as an internal search command here: 
http://www.splunk.com/doc/latest/user/UnsupportedCommands#inputcsv
We are talking about promoting it to public, so while it says unsupported it does work. Here&#8217;s how:
I&#8217;ve [...]]]></description><content:encoded><![CDATA[<p>A customer asked about a complicated search that could be vastly simplified by using inputcsv to input a list of values from a file, a feature added for 3.3.x. It's documented as an internal search command here: </p>
<p>http://www.splunk.com/doc/latest/user/UnsupportedCommands#inputcsv</p>
<p>We are talking about promoting it to public, so while it says unsupported it does work. Here's how:</p>
<p>I've got events from my webserver for my new domain and I want to see what real hits it's getting and not my own. They look like this: </p>
<p><code><br />
66.249.70.86 - - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"<br />
</code></p>
<p>And I've gotten some traffic already: </p>
<p><code><br />
$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log | stats count'<br />
count<br />
-----<br />
11424<br />
</code></p>
<p>It's a standard format that was automatically recognized as sourcetype access_common, so the extracted field "clientip" is already there. I create a csv file containing the values I want to exclude like this: </p>
<p><code><br />
clientip<br />
xxx.xxx.xxx.xxx<br />
yyy.yyy.yyy.yyy<br />
zzz.zzz.zzz.zzz<br />
</code></p>
<p>This file needs to exist relative to $SPLUNK_HOME/var/run/splunk, so to avoid specifying a path in my search I'll just put it there. Note that I could also have used xxx.xxx.xxx.* if I wanted to; wildcards are OK. </p>
<p>Now I can do this search: </p>
<p><code><br />
./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv]'<br />
</code></p>
<p><code><br />
$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv] | stats count'<br />
count<br />
-----<br />
121<br />
</code></p>
<p>and only get the ones that aren't from my network. This search also works from the UI as </p>
<p><code><br />
source="/var/log/apache2/mynewdomain_access_log" NOT [inputcsv mycsvfile.csv]<br />
</code></p>
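<p>For anyone curious what the NOT [inputcsv ...] subsearch amounts to, here is a rough Python sketch of the same filtering done by hand: read a list of clientip patterns from CSV text and keep only the events whose client address matches none of them. This is an analogy, not how Splunk evaluates the search. The function names and the sample addresses are invented, and the first token of each line is assumed to be the client IP, as it is in access_common.</p>

```python
import csv
import io
from fnmatch import fnmatchcase


def load_patterns(csv_text, field="clientip"):
    """Read the values of one column from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[field].strip() for row in reader if row.get(field)]


def keep_event(line, patterns):
    """True when the event's client IP (its first token) matches no pattern."""
    parts = line.split(None, 1)
    if not parts:
        return False  # skip blank lines
    clientip = parts[0]
    return not any(fnmatchcase(clientip, p) for p in patterns)


# hypothetical exclusion list, in the same layout as mycsvfile.csv
csv_text = "clientip\n10.1.2.3\n192.168.0.*\n"
events = [
    '66.249.70.86 - - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158',
    '10.1.2.3 - - [23/Oct/2008:01:43:02 -0700] "GET / HTTP/1.1" 200 512',
    '192.168.0.17 - - [23/Oct/2008:01:44:10 -0700] "GET /about/ HTTP/1.1" 200 901',
]
patterns = load_patterns(csv_text)
kept = [e for e in events if keep_event(e, patterns)]
print(len(kept))
```

<p>The wildcard entry behaves like the xxx.xxx.xxx.* case mentioned above: one line in the CSV excludes a whole block of addresses.</p>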
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/IFk3FfkOB7A" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/10/24/inputcsv-to-restrict-a-search-by-a-list-of-field-values/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: Enabling debug messages</title><link>http://feedproxy.google.com/~r/splunkdev/~3/dDQ89hWzBzk/</link><comments>http://blogs.splunk.com/andrea/2008/09/22/enabling-debug-messages/</comments><pubDate>Mon, 22 Sep 2008 23:30:18 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[Splunk spits out an astounding number of its own internal log messages, some I&#8217;ve already described. This post is how to get more of them, in case you have spare disk space lying around and need something to fill it with. Or you have some problem with Splunk and need debug logs. Sometimes Support will [...]]]></description><content:encoded><![CDATA[<p>Splunk spits out an astounding number of its own internal log messages, some I've already described. This post is how to get more of them, in case you have spare disk space lying around and need something to fill it with. Or you have some problem with Splunk and need debug logs. Sometimes Support will ask for this to diagnose an issue. </p>
<p>splunkd log messages go in the file splunkd.log. (Note that if you move the existing file out of the way, a fresh one is created on startup if you want to work with only the messages from the current run.) They are controlled by the log.cfg file located in /opt/splunk/etc, which specifies the log level of messages by category:<br />
<code><br />
rootCategory=WARN,A1<br />
category.LicenseManager=INFO<br />
category.TcpOutputProc=INFO<br />
category.TcpInputProc=INFO<br />
category.UDPInputProcessor=INFO<br />
</code></p>
<p>Messages can be set to, in order of severity: DEBUG, INFO, WARN, FATAL, CRIT. Setting a log level gets you messages at that level and higher, so default settings are typically INFO or WARN. When you change something in this file, you need to restart Splunk for it to take effect. When you restart with the --debug flag, it uses a similar file, log-debug.cfg, with a different set of settings for DEBUG messages. Not everything is set to DEBUG, because some of the categories are very chatty. </p>
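<p>Here is a small Python sketch of the "that level and higher" rule, using the severity order listed above and the log.cfg lines shown earlier. It is a toy model, not splunkd's logging code. In particular, I am assuming that a category with no entry of its own falls back to the rootCategory level, and the function names are made up.</p>

```python
# Severity order as listed in the post, lowest first.
LEVELS = ["DEBUG", "INFO", "WARN", "FATAL", "CRIT"]
PREFIX = "category."


def parse_log_cfg(text):
    """Parse 'rootCategory=LEVEL,appender' and 'category.Name=LEVEL' lines."""
    root = "WARN"
    categories = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = [part.strip() for part in line.split("=", 1)]
        level = value.split(",")[0].strip()  # drop the appender name, e.g. A1
        if key == "rootCategory":
            root = level
        elif key.startswith(PREFIX):
            name = key[len(PREFIX):]
            categories[name] = level
    return root, categories


def is_logged(category, message_level, root, categories):
    """True if a message at message_level passes the category's threshold."""
    # assumption: categories without their own entry use the root level
    configured = categories.get(category, root)
    return LEVELS.index(message_level) >= LEVELS.index(configured)


cfg = """rootCategory=WARN,A1
category.LicenseManager=INFO
category.TcpOutputProc=INFO
"""
root, categories = parse_log_cfg(cfg)
print(is_logged("LicenseManager", "INFO", root, categories))
print(is_logged("FileInputTracker", "INFO", root, categories))
```

<p>With these settings an INFO message from LicenseManager gets through, while an INFO message from a category that only inherits WARN does not.</p>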
<p>One of those is FileInputTracker, which even in log-debug.cfg is set to WARN. If you are having problems with data input from files, either indexing multiple times or not indexing at all, set this to DEBUG to get more about what is going on. </p>
<p>Now there is another way to enable and disable messages other than changing the file and restarting. If you want to permanently change settings, or you need to test a script that manages starting and stopping Splunk, you'll want to use these files. But you can also turn loglevels for categories off and on with a specially constructed search:<br />
<code><br />
| oldsearch !++cmd++::logchange !++param1++::root !++param2++::DEBUG<br />
</code></p>
<p>This is the search used for 3.3.x; for 3.2 and before, remove the "| oldsearch" part. Yes, that is really the pipe, or vertical bar, character there. (And you will get the message "Search Execute failed because Setting root priority" when the search completes.) You can change any category to any loglevel with this, using the category name for the param1 value and the loglevel for param2. "root" is a special keyword for all messages, otherwise use the correct category name like "LicenseManager". log.cfg is not changed, and on restart you will revert to the configured settings. </p>
<p>One clever thing you can do with this is set up scheduled saved searches to turn on debugging only when you want it. If you have some problem that you know happens around midnight, you can set up one search to turn it on (set it to DEBUG) and another to turn it off (return it to WARN or INFO or whatever it was). </p>
<p>splunkweb messages are controlled by a different mechanism, the SplunkWeb.tac file. If your problem is specifically with splunkweb, such as debugging LDAP settings in the UI, turn on these additional messages. You do need to restart splunkweb, but this can be done with "splunk restart splunkweb" rather than restarting splunkd along with it on a normal restart. </p>
<p>Change this line:<br />
<code><br />
# set global logging level<br />
appLoggingLevel = logging.INFO<br />
</code></p>
<p>To this:<br />
<code><br />
# set global logging level<br />
appLoggingLevel = logging.DEBUG<br />
</code></p>
<p>The additional messages are written to the $SPLUNK_HOME/var/log/splunk/web_service.log file.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/dDQ89hWzBzk" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/09/22/enabling-debug-messages/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: 3D Photosynth of New Splunk Office</title><link>http://feedproxy.google.com/~r/splunkdev/~3/JKcD6FrKS8o/</link><comments>http://blogs.splunk.com/david/2008/09/09/3d-photosynth-of-new-splunk-office/</comments><pubDate>Wed, 10 Sep 2008 05:59:24 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[I made a photosynth of the new Splunk office in SF, which automatically linked 104 photos in 3D space.  It mostly worked.  
Hit the &#8220;play&#8221; button, sit back, and have a tour of the Splunk office. Click the button with 3 dots on it to jump to the next 3D space.

]]></description><content:encoded><![CDATA[<p>I made a photosynth of the new Splunk office in SF, which automatically linked 104 photos in 3D space.  It mostly worked.  </p>
<p>Hit the "play" button, sit back, and have a tour of the Splunk office. Click the button with 3 dots on it to jump to the next 3D space.</p>
<p><iframe frameborder=0 src="http://photosynth.net/embed.aspx?cid=72853bfd-4868-45ff-b1af-3c5b5ff5452d" width="400" height="300"></iframe></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/JKcD6FrKS8o" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2008/09/09/3d-photosynth-of-new-splunk-office/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: Index ICU: Assertion `_sourceMetaData != __null’ failed, part 1</title><link>http://feedproxy.google.com/~r/splunkdev/~3/6sgk6OJX9W4/</link><comments>http://blogs.splunk.com/andrea/2008/09/03/index-icu-assertion-_sourcemetadata-__null-failed-part-1/</comments><pubDate>Wed, 03 Sep 2008 16:59:47 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[There you were, merrily going along and Boom! Somebody kicks the power switch, your filesystem goes off the deep end, something Very Bad happens. You start to understand why fsck is a four-letter word. After using some additional four-words, you get things up and running. But what&#8217;s with Splunk? It won&#8217;t start!? You only get [...]]]></description><content:encoded><![CDATA[<p>There you were, merrily going along and Boom! Somebody kicks the power switch, your filesystem goes off the deep end, something Very Bad happens. You start to understand why fsck is a four-letter word. After using some additional four-words, you get things up and running. But what's with Splunk? It won't start!? You only get some cryptic error and "Splunkd appears too be down." Welcome to the world of WordData. You had a backup, right? Yeah, thought so.</p>
<p>Buried deep in the index are a bunch of *.data files: </p>
<p><code>www.feorlen.org[feorlen]:/Applications/splunk/var/lib/splunk/defaultdb/db$ ls -lr *.data<br />
-rw-r--r--  1 root  admin  10276 Sep  3 07:41 Sources.data<br />
-rw-r--r--  1 root  admin   5085 Sep  3 07:41 SourceTypes.data<br />
-rw-r--r--  1 root  admin    252 Sep  3 07:41 Hosts.data<br />
-rw-r--r--  1 root  admin     21 Jul 26 19:19 EventTypes.data</code></p>
<p>You will find them in every bucket. They contain event counts for sources, sourcetypes, hosts and event types, along with some time range info. During indexing, these are constantly being updated. They are supposed to look something like this (note my timestamping oops there for host::grumpy):</p>
<p><code>$ more Hosts.data<br />
0               0       2147483647      0       0<br />
1       host::grumpy    11194556        900458000       1231448496      1220453014<br />
2       host::www       1953184 1194131619      1220452994      1220452994<br />
3       host::www.feorlen.org   2350    1207761050      1216665145      1216665145<br />
4       host::localhost 7482    1203904810      1217973661      1217973661   </code>   </p>
<p>Except when they look like this: </p>
<p><code>$ more Hosts.data<br />
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<br />
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<br />
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<br />
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<br />
^@^@^@^@^@^@^@^@^@^@^@<br />
Hosts.data (END) </code></p>
<p>That isn't very good. splunkd doesn't much like it when somebody messes with its *.data files. There are also supposed to be at minimum Sources.data, SourceTypes.data, and Hosts.data. (EventTypes.data may legitimately not be there in some cases.) Your crash log will likely contain something like this:</p>
<p><code> Backtrace:<br />
  [0x00002B51C8EEFB6E] abort + 270 (/lib/libc.so.6)<br />
  [0x00002B51C8EE8266] __assert_fail + 246 (/lib/libc.so.6)<br />
  [0x000000000066661D] ? (splunkd)<br />
  [0x0000000000697BA6] _ZN23DatabasePartitionPolicy20getSourceWordForCodeEmmR3Str + 182 (splunkd)</code></p>
<p>and here is the real smoking gun in splunkd_stderr.log:</p>
<p><code>splunkd: /opt/splunk/p4/splunk/branches/3.2/src/pipeline/indexer/TimeInvertedIndex.cpp:974: void TimeInvertedIndex::getSourceWordForCode(long unsigned int, Str&amp;): Assertion `_sourceMetaData != __null' failed.</code></p>
<p>Ok, so you've got a horked *.data file. Where? Well, based on frequency of writes, it's going to be in a db-hot directory because that is where active indexing is going on. And the most active indexes are usually fishbucket, _internal and defaultdb. Start by looking for *.data files that are binary. Here's one way you can find which files are binary, a big clue on where the problem is:</p>
<p><code>$ cd /opt/splunk/var/lib/splunk<br />
$ find . -name '*.data' | xargs grep "." | grep Binary<br />
Binary file ./_internaldb/db/db-hot/Hosts.data matches<br />
Binary file ./_internaldb/db/db-hot/Sources.data matches<br />
Binary file ./_internaldb/db/db-hot/SourceTypes.data matches<br />
Binary file ./fishbucket/db/db-hot/Sources.data matches</code></p>
<p>file will do it also, but beware false positives: </p>
<p><code>$ for i in `find . -name '*.data'`; do file "$i" | grep -v text ;done<br />
./_internaldb/db/db-hot/Hosts.data: data<br />
./_internaldb/db/db-hot/Sources.data: data<br />
./_internaldb/db/db-hot/SourceTypes.data: data<br />
./defaultdb/db/db_1214955936_1210836930_38/Hosts.data: Bio-Rad .PIC Image File 2352 x 12297, 14601 images in file</code></p>
<p>Another check is to see if the row numbers in the first column of each file are in ascending order. If they aren't, then something is seriously wrong: </p>
<p><code>for i in `find . -name '*.data'`; do sort -nc "$i";done</code></p>
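<p>If you would rather run both checks in one pass, here is a Python sketch that looks for NUL bytes and for a first column that is not numeric or not in ascending order. It is a convenience wrapper around the same ideas as the shell one-liners, not a Splunk tool. The function names are made up, and it assumes the whitespace-separated layout shown in the Hosts.data example above.</p>

```python
from pathlib import Path


def check_data_file(raw_bytes):
    """Return a list of problems found in the contents of one *.data file."""
    problems = []
    if b"\x00" in raw_bytes:
        problems.append("contains NUL bytes")
    previous = None
    for line in raw_bytes.decode("ascii", errors="replace").splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if not fields[0].isdigit():
            problems.append("non-numeric first column")
            break
        current = int(fields[0])
        if previous is not None and current < previous:
            problems.append("first column not ascending")
            break
        previous = current
    return problems


def scan(root):
    """Check every *.data file under root and yield (path, problems) for bad ones."""
    for path in sorted(Path(root).rglob("*.data")):
        problems = check_data_file(path.read_bytes())
        if problems:
            yield path, problems


good = b"0\t\t0\t2147483647\t0\t0\n1\thost::grumpy\t11194556\t900458000\n2\thost::www\t1953184\t1194131619\n"
print(check_data_file(good))
print(check_data_file(b"\x00" * 40))
```

<p>To use it on a real index you would loop over scan("/opt/splunk/var/lib/splunk") with Splunk stopped. It only reads files, so it is safe to run before you decide what to replace.</p>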
<p>Have a look at these files and see what's in them. If they are only partially corrupted, you may be able to edit out the garbage. If they are totally full of junk, you will need to find replacements. For _internaldb and fishbucket, you may not care if your event counts are exactly correct so you can lift some files from another bucket. If the problem were in defaultdb or another index containing your real indexed data, you'll need to pay more attention to the contents. </p>
<p>In the simple case, if the files in db-hot are trashed, see if there is a warm bucket next to it you can copy some from. Warm buckets are in the same directory as db-hot and look something like db_1218802821_1218658318_17. Copy the *.data files from there into db-hot and try to restart Splunk. If it does, then you are good to go. If not, that means there is more damage to repair. If there are other binary *.data files, make sure you deal with all of them. </p>
<p>This should handle the most common types of problems. I'll go into more detailed debugging and reconstruction in another post. </p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/6sgk6OJX9W4" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/09/03/index-icu-assertion-_sourcemetadata-__null-failed-part-1/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Karandeep Bains: first!</title><link>http://feedproxy.google.com/~r/splunkdev/~3/uHmJj6qCDPU/</link><comments>http://blogs.splunk.com/deep/?p=1</comments><pubDate>Mon, 01 Sep 2008 03:10:44 +0000</pubDate><dc:creator>Karandeep Bains</dc:creator><description><![CDATA[hello world!
]]></description><content:encoded><![CDATA[<p>hello world!</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/uHmJj6qCDPU" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/deep/?p=1</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: Write your own search language</title><link>http://feedproxy.google.com/~r/splunkdev/~3/46Foq7-XDjY/</link><comments>http://blogs.splunk.com/david/2008/08/29/write-your-own-search-language/</comments><pubDate>Fri, 29 Aug 2008 23:16:26 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[Splunk provides many power search commands &#8212; such as sort, fields, transactions &#8212; but even better, it allows you to expand things anyway you want, by writing your own search commands. 
I&#8217;ll show you how to write your own search command.

Suppose you want to make a new “shape” command in python that returns the shape [...]]]></description><content:encoded><![CDATA[<p><strong>Splunk provides many powerful search commands, such as sort, fields, and transactions, but even better, it allows you to extend it any way you want by writing your own search commands. </strong></p>
<p><strong>I'll show you how to write your own search command.</strong><br />
<span id="more-291"></span></p>
<p style="text-align: left;">Suppose you want to make a new “shape” command in Python that returns the shape of an event: tall, short, thin, wide, etc.  There are just three simple steps:</p>
<ul>
<li>Step 1) Tell Splunk about this external command in commands.conf...</li>
</ul>
<pre style="text-align: left; padding-left: 90px;">[shape]
filename = shape.py</pre>
<ul>
<li>Step 2) Authorize users to run this command in authorize.conf...</li>
</ul>
<pre style="padding-left: 90px;">[capability::run_script_shape]
[role_User]
run_script_shape = enabled</pre>
<ul>
<li>Step 3) Write the code!  Here is shape.py...</li>
</ul>
<pre style="padding-left: 60px;">import splunk.Intersplunk

def getShape(text):
    # describe an event by its line count, average line length and indentation
    description = []
    linecount = text.count("\n") + 1
    if linecount &gt; 10:
        description.append("tall")
    elif linecount &gt; 1:
        description.append("short")
    avglinelen = len(text) / linecount
    if avglinelen &gt; 500:
        description.append("very_wide")
    elif avglinelen &gt; 200:
        description.append("wide")
    elif avglinelen &lt; 80:
        description.append("thin")
    # a line that starts with a space or tab counts as indentation
    if text.find("\n ") &gt;= 0 or text.find("\n\t") &gt;= 0:
        description.append("indented")
    if len(description) == 0:
        return "normal"
    return "_".join(description)

# get the previous search results
results,unused1,unused2 = splunk.Intersplunk.getOrganizedResults()
# for each result, add a 'shape' attribute, calculated from the raw event text
for result in results:
    result["shape"] = getShape(result["_raw"])
# output results
splunk.Intersplunk.outputResults(results)</pre>
<p>It works!  Show me the top shapes among events with more than one line...</p>
<pre style="padding-left: 30px;">$ splunk search "linecount&gt;1 | shape | top shape"
shape                count  percent
-------------------  -----  ---------
tall_indented           43  43.000000
short_indented          29  29.000000
tall_thin_indented      15  15.000000
short_thin_indented     10  10.000000
short_thin               3   3.000000</pre>
<p>Just to review, here are the files we made...</p>
<ul>
<pre>apps/example/bin/shape.py
apps/example/default/authorize.conf
apps/example/default/commands.conf</pre>
</ul>
<p>Now go out there and make cool extensions to Splunk!</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/46Foq7-XDjY" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2008/08/29/write-your-own-search-language/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: More fishbucket fun</title><link>http://feedproxy.google.com/~r/splunkdev/~3/DuvhpOYZq8A/</link><comments>http://blogs.splunk.com/andrea/2008/08/27/more-fishbucket-fun/</comments><pubDate>Wed, 27 Aug 2008 21:10:57 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[For debugging files getting re-indexed, sometimes what I want to see can only be found in the fishbucket index of the affected instance. I can pick up and move an entire index (3.x+) and drop it into another instance, but when working with the fishbucket there are a couple other things to watch out for. [...]]]></description><content:encoded><![CDATA[<p>For debugging files getting re-indexed, sometimes what I want to see can only be found in the fishbucket index of the affected instance. I can pick up and move an entire index (3.x+) and drop it into another instance, but when working with the fishbucket there are a couple other things to watch out for. I don't want anything to change it once I put it in the new instance. So I set up a throwaway instance to easily make changes I wouldn't want to do to a real one. </p>
<p><em><strong>REALLY BIG WARNING</strong></em></p>
<p><em>Don't do this to any Splunk instance you like. You will be unhappy later. Throw away your dummy instance when you are done so you don't confuse anybody.</em> </p>
<p>Set up a new instance of an appropriate version (the same as or more recent than the original) and the appropriate architecture (ppc/sparc or intel). Get it all working with the correct ports so you don't conflict with anything else that may be running on the machine. Since it won't be indexing, the license doesn't matter. Start it and then stop it so the first-run setup is done.</p>
<p>Change some things so it won't touch the index:<br />
./splunk clean all -f<br />
rm /opt/splunk/bin/splunk_optimize<br />
rm /opt/splunk/etc/system/default/inputs.conf (or wherever it is in your version)<br />
edit /opt/splunk/etc/system/default/indexes.conf to comment out the line frozenTimePeriodInSecs = 2419200 in the [_thefishbucket] stanza<br />
If the index is large, you'll also want to comment out maxDataSize = 10</p>
<p>rm -rf /opt/splunk/var/lib/splunk/fishbucket/*<br />
copy the contents of the fishbucket index you have into the now empty directory (don't accidentally create an extra fishbucket/fishbucket directory!)<br />
remove any archives or other temporary files you left lying around in the index directories</p>
<p>Start this instance and now you can search for index=_thefishbucket. It helps to exclude the Splunk internal files with something like this:</p>
<p>index=_thefishbucket NOT filename::/opt/splunk/var/log/splunk/license_audit.log NOT filename::/opt/splunk/var/log/splunk/metrics.log NOT filename::/opt/splunk/var/log/splunk/searchhistory.log NOT filename::/opt/splunk/var/log/splunk/splunkd.log NOT filename::/opt/splunk/var/log/splunk/splunklogger.log NOT filename::/opt/splunk/var/log/splunk/web_access.log NOT filename::/opt/splunk/var/log/splunk/web_service.log</p>
<p>Your full path may vary. What is left is all the files being monitored by the instance. </p>
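<p>Typing that search by hand gets tedious, so here is a minimal Python sketch that builds it. The function name and the <code>log_dir</code> parameter are made up for illustration; the log file names and the default path are the ones from the search above, and your install may differ.</p>

```python
# File names taken from the search above; adjust for your version.
INTERNAL_LOGS = [
    "license_audit.log",
    "metrics.log",
    "searchhistory.log",
    "splunkd.log",
    "splunklogger.log",
    "web_access.log",
    "web_service.log",
]


def fishbucket_search(log_dir="/opt/splunk/var/log/splunk"):
    """Build a fishbucket search that skips Splunk's own log files.

    log_dir is an assumption: point it at wherever your instance logs.
    """
    base = log_dir.rstrip("/")
    terms = ["index=_thefishbucket"]
    for name in INTERNAL_LOGS:
        terms.append("NOT filename::{}/{}".format(base, name))
    return " ".join(terms)


if __name__ == "__main__":
    print(fishbucket_search())
```

<p>Paste the printed string into the search box of the throwaway instance.</p>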
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/DuvhpOYZq8A" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/08/27/more-fishbucket-fun/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Inder Sabharwal: We are hiring ActionScript/Flex engineers….</title><link>http://feedproxy.google.com/~r/splunkdev/~3/Tfd8EbFcrP4/</link><comments>http://blogs.splunk.com/inder/2008/08/21/we-are-hiring-actionscriptflex-engineers/</comments><pubDate>Thu, 21 Aug 2008 15:48:44 +0000</pubDate><dc:creator>Inder Sabharwal</dc:creator><description><![CDATA[Splunk is hiring ActionScript/Flex engineers to build new products for the Enterprise team. If you have been building Enterprise and/or Web applications using AS/Flex, we would love to talk to you.
Also, if you are a UI engineer using Java or .NET or AJAX (jQuery, ExtJS, etc..) technologies, and are motivated to move to ActionScript/Flex, we [...]]]></description><content:encoded><![CDATA[<p>Splunk is hiring ActionScript/Flex engineers to build new products for the Enterprise team. If you have been building Enterprise and/or Web applications using AS/Flex, we would love to talk to you.</p>
<p>Also, if you are a UI engineer using Java or .NET or AJAX (jQuery, ExtJS, etc.) technologies, and are motivated to move to ActionScript/Flex, we will provide you with the tools and mentoring to be successful in this position.</p>
<p>Experience in building network topology visualizations is a big plus!</p>
<p>All resumes can be emailed to <a href="mailto:inder@splunk.com">me</a> directly.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/Tfd8EbFcrP4" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/inder/2008/08/21/we-are-hiring-actionscriptflex-engineers/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: What is this fishbucket thing?</title><link>http://feedproxy.google.com/~r/splunkdev/~3/FguVzHCTnc0/</link><comments>http://blogs.splunk.com/andrea/2008/08/14/what-is-this-fishbucket-thing/</comments><pubDate>Thu, 14 Aug 2008 22:50:44 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[It&#8217;s time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk) you will find a directory called fishbucket. This index is not really intended for normal humans to investigate, more just Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs [...]]]></description><content:encoded><![CDATA[<p>It's time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk) you will find a directory called fishbucket. This index is not really intended for normal humans to investigate, more just Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell if it has read them already. To see what's there, try searching for "index=_thefishbucket". Events look something like this: </p>
<p>48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log</p>
<p>The fields are:</p>
<p>timestamp (epoch time, in hex)<br />
initcrc: CRC of the first 256 bytes of the file<br />
seekcrc: CRC of the 256 bytes where we were last reading<br />
seekptr: seek pointer for where we are in the file<br />
modtime: the time the file last changed<br />
filename: the full path to the file<br />
source: the full path to the source, which is usually the same as the file but could be the archive the file came from</p>
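<p>If you want to pull these fields apart outside of Splunk, something like the following Python sketch would do it. It is not anything Splunk ships: the function name is made up, it assumes paths contain no spaces, and it assumes seekptr is hex (the sample values contain the digits a-f, so that seems a safe guess).</p>

```python
SAMPLE = (
    "48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 "
    "seekptr::414063 modtime::1218643123 "
    "filename::/var/log/apache2/feorlen_org_access_log "
    "source::/var/log/apache2/feorlen_org_access_log"
)


def parse_event(line):
    """Split one fishbucket event into a dict of typed fields.

    Assumes whitespace-separated key::value pairs after the timestamp,
    so a path containing spaces would break this.
    """
    stamp, *pairs = line.split()
    fields = dict(pair.split("::", 1) for pair in pairs)
    return {
        "time": int(stamp, 16),                 # epoch seconds, written in hex
        "initcrc": fields["initcrc"],
        "seekcrc": fields["seekcrc"],
        "seekptr": int(fields["seekptr"], 16),  # assumed hex, see note above
        "modtime": int(fields["modtime"]),      # epoch seconds, decimal
        "filename": fields["filename"],
        "source": fields["source"],
    }


if __name__ == "__main__":
    event = parse_event(SAMPLE)
    print(event["time"], event["modtime"], event["seekptr"])
```

<p>For the sample event the hex timestamp decodes to 1218643123, the same value as modtime.</p>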
<p>When the file monitor processor looks at a file, it searches the fishbucket to see if the CRC from the beginning of the file is already there. If not, the file is indexed as new. If yes, then we check the CRC of where we were reading against the saved value in seekcrc. If it matches and the file is longer than the saved seek pointer, then there is new stuff at the end to read. If the top of the file matches but the seekcrc doesn't, or the seek pointer is beyond the current end of the file, then something in the part we have already read has changed. Since we don't know what might have changed, we just index the whole thing. (You can control this: see CHECK_METHOD in props.conf.spec.) </p>
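<p>The decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not splunkd's code: <code>zlib.crc32</code> stands in for whatever CRC Splunk really uses, the function names and return strings are invented, and it assumes seekcrc covers the 256 bytes ending at the seek pointer.</p>

```python
import zlib

BLOCK = 256  # bytes covered by each CRC, per the description above


def crc(data):
    # Stand-in checksum; the real fishbucket CRC algorithm is an unknown here.
    return zlib.crc32(data)


def make_record(content, seekptr):
    """Return (initcrc, seekcrc, seekptr) as saved after reading up to seekptr."""
    start = max(0, seekptr - BLOCK)
    return crc(content[:BLOCK]), crc(content[start:seekptr]), seekptr


def decide(content, records):
    """Decide what to do with a file; records maps initcrc to (seekcrc, seekptr)."""
    initcrc = crc(content[:BLOCK])
    if initcrc not in records:
        return "index as new"
    seekcrc, seekptr = records[initcrc]
    if seekptr > len(content):
        return "re-index everything"   # file is shorter than what we already read
    start = max(0, seekptr - BLOCK)
    if crc(content[start:seekptr]) != seekcrc:
        return "re-index everything"   # already-read part has changed
    if len(content) > seekptr:
        return "read from seekptr"     # new data appended at the end
    return "nothing new"


if __name__ == "__main__":
    log = b"a" * 300 + b"line1\n" * 50
    initcrc, seekcrc, seekptr = make_record(log, len(log))
    records = {initcrc: (seekcrc, seekptr)}
    print(decide(log + b"more\n", records))
```

<p>A normal log that only grows takes the "read from seekptr" path every time, which is why its fishbucket events share one initcrc.</p>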
<p>If you want to track what is happening with a particular file, you can search for all the events in the fishbucket associated with it by the file or source name (like source::/var/log/apache2/feorlen_org_access_log). If you check the seekptr and the modtime, they should only increase over time (note that events are returned most recent first, so this list is newest to oldest). </p>
<p>48a3084d initcrc::5f66db978a1ff3a3 seekcrc::3e746e9f66897965 seekptr::414a40 modtime::1218644042 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a307d9 initcrc::5f66db978a1ff3a3 seekcrc::77f6d8313fc689ba seekptr::41419b modtime::1218643929 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a3062e initcrc::5f66db978a1ff3a3 seekcrc::2cc30b86b37c646 seekptr::4140fc modtime::1218643502 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a300d3 initcrc::5f66db978a1ff3a3 seekcrc::8db2f52ef6f75c91 seekptr::413fa4 modtime::1218642130 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2fc7a initcrc::5f66db978a1ff3a3 seekcrc::881375418e194bd5 seekptr::413f06 modtime::1218640999 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f996 initcrc::5f66db978a1ff3a3 seekcrc::c596371ec4c573d4 seekptr::413e6c modtime::1218640260 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f80c initcrc::5f66db978a1ff3a3 seekcrc::2e686cf0dd2f62bb seekptr::413dce modtime::1218639883 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f25a initcrc::5f66db978a1ff3a3 seekcrc::b2e489862ed72c79 seekptr::413d1d modtime::1218638406 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f1d1 initcrc::5f66db978a1ff3a3 seekcrc::58af0c6446e96bf5 seekptr::413c7f modtime::1218638289 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f19d initcrc::5f66db978a1ff3a3 seekcrc::16fdb83b48965067 seekptr::413bbe modtime::1218638236 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f05b initcrc::5f66db978a1ff3a3 seekcrc::fbb8700a35cfdfcb seekptr::413b25 modtime::1218637915 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2ebc5 initcrc::5f66db978a1ff3a3 seekcrc::ddbac21aa7386a6 seekptr::413abd modtime::1218636714 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log</p>
<p>Anything other than this indicates a big problem with the file, like it is getting re-indexed when it shouldn't. (Some files you do want to re-index when they change, but not normal logfiles that roll.) </p>
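<p>Eyeballing a long list for this gets old, so here is a rough Python sketch of the same check. The regex and function names are invented for illustration, seekptr is assumed to be hex, and the three events are the newest ones from the list above with the source field trimmed off.</p>

```python
import re

# seekptr is assumed to be hex; modtime is decimal epoch seconds.
PATTERN = re.compile(r"seekptr::([0-9a-f]+) modtime::(\d+)")

EVENTS = [  # newest first, as the search returns them
    "48a3084d initcrc::5f66db978a1ff3a3 seekcrc::3e746e9f66897965 seekptr::414a40 modtime::1218644042 filename::/var/log/apache2/feorlen_org_access_log",
    "48a307d9 initcrc::5f66db978a1ff3a3 seekcrc::77f6d8313fc689ba seekptr::41419b modtime::1218643929 filename::/var/log/apache2/feorlen_org_access_log",
    "48a3062e initcrc::5f66db978a1ff3a3 seekcrc::2cc30b86b37c646 seekptr::4140fc modtime::1218643502 filename::/var/log/apache2/feorlen_org_access_log",
]


def progress(events):
    """Return (seekptr, modtime) pairs, oldest first."""
    pairs = []
    for line in reversed(events):
        match = PATTERN.search(line)
        if match:
            pairs.append((int(match.group(1), 16), int(match.group(2))))
    return pairs


def only_increasing(pairs):
    """True if neither seekptr nor modtime ever goes backwards."""
    return all(
        later[0] >= earlier[0] and later[1] >= earlier[1]
        for earlier, later in zip(pairs, pairs[1:])
    )


if __name__ == "__main__":
    print(only_increasing(progress(EVENTS)))
```

<p>A False here is the cue to go look at why the file is being re-read.</p>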
<p>So why do I care? </p>
<p>Every Splunk instance has a fishbucket index, except the lightest of hand-tuned lightweight forwarders, and if you index a lot of files it can get quite large. As with any other index, you can change the retention policy to control the size via indexes.conf. But since it tracks what files the instance has seen, you have to consider carefully before you change the retention policy. If you retire data from the fishbucket for files that still exist on the host, it will "forget" it saw them and next time around they will get re-indexed. </p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/FguVzHCTnc0" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/08/14/what-is-this-fishbucket-thing/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Search engine for virtual sprawl - vmware app for splunk</title><link>http://feedproxy.google.com/~r/splunkdev/~3/Y5QpnkQkle8/</link><comments>http://blogs.splunk.com/erik/2008/08/10/search-engine-for-virutal-sprawl-vmware-app-for-splunk/</comments><pubDate>Sun, 10 Aug 2008 22:57:24 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[****  UPDATE - 10/31/08  ****
Hey all,
I&#8217;ve updated the app to version 1.8.
The only fix in this version is a bug with multiple datacenters.
Version 1.8 should now work for an unlimited number of datacenters.
( Thanks to Stephen for finding and letting me know )
As always feel free to bug me if the app has [...]]]></description><content:encoded><![CDATA[<p><b>****  UPDATE - 10/31/08  ****</b><br />
Hey all,<br />
I've updated the app to <a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/10/vmware_18.tgz">version 1.8</a>.<br />
The only fix in this version is a bug with multiple datacenters.<br />
Version 1.8 should now work for an unlimited number of datacenters.<br />
( Thanks to Stephen for finding and letting me know )</p>
<p>As always feel free to bug me if the app has any problems.<br />
e.</p>
<p><b>****  UPDATE - 10/10/08  ****</b></p>
<p>Hey all,<br />
I updated the latest release - 1.7 - to fix a shutdown bug.<br />
Turns out that in prior releases, when Splunk was shut down the VMware app kept running.<br />
This release will now terminate the VMware app when splunkd goes away.</p>
<p>If you would like to test or run without splunk you can pass in the arg.<br />
java -jar splunk.jar  - standalone</p>
<p>** see instructions below on how to run the above command **<br />
As usual, drop me a line if you have any questions.<br />
Good luck with 1.7</p>
<p><b>****  UPDATE - 09/16/08  ****</b></p>
<p>Thanks to more testing i have found and fixed a few critical bugs.<br />
Updated APP version 1.6 <a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/09/vmware1.zip">&gt;&gt; here &lt;&lt;</a></p>
<ul>
<li>
there was a static var preventing the multiple server configs from working. Should be fixed, and multiple servers in the vmware.conf should work.
</li>
<li>
IBM JVMs should work, i.e. AIX should now work <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />
</li>
<li>
Added new saved searches and a few dashboards ( thanks to raffy <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> )
</li>
</ul>
<p>As usual, please let me know if you find any bugs.<br />
I'll type up some notes on my VMworld experience.</p>
<p>Cheers,<br />
e</p>
<p><b>****  UPDATE - 09/08/08  ****</b><br />
Thanks to lots of folks trying it out i have found a critical bug that was preventing much of the data from getting indexed. <a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/09/vmware.zip">This latest release 1.5</a> should have that fix and everyone should see all the wonderful VMWare data in the index.</p>
<p>As usual, bug me if it does not work or you have any questions.</p>
<p>If you have made changes to vmware/local/vmware.conf  and not to the file in default you can just untar this version on top of your old one. If you are making changes to the default/vmware.conf file, i'd move that to local/vmware.conf that way when i ship updates it will not blow away your conf changes. We ship only default and not local/vmware.conf.</p>
<p>Thanks again to everyone that helped find bugs!</p>
<p>e.</p>
<p><b>****  UPDATE - 08/27/08  ****</b><br />
I have <a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/vmware1.zip">updated the app</a> with a few fixes found in the field. </p>
<ul>
<li>hopefully fixed issue on AIX (IBM jvm )
</li>
<li>added output of host/vm name on update messages. It was hard to tell where the messages were coming from
</li>
<li>added more debugging info on startup to help debug connection issues.
</li>
</ul>
<p>Things that are still under investigation:</p>
<ul>
<li>Pointing at lots of ESX servers and not VC. Seems as though some data is not coming back from ESX.
</li>
<li>Making it work with older JVMs (currently it seems to require 1.5)
</li>
</ul>
<p><b>****  Original Post 08/10/08  ****</b><br />
I wanted to release this a few months ago but the project kept getting stuck on the back-burner.  Finally I've cleaned it up and had a few people try it and it seems to work well. I'm sure there are configurations and versions out there that will have issues - please write me back ( my first name at splunk.com ) if it does not work as advertised. </p>
<p>Reading the below makes it sound more difficult than it really is. Just download, un-zip, change the server url, username and password in the vmware.conf file, restart and go! This really is the first public release and I'd love to get more feedback. I'll more than gladly send you the Splunk tee shirt of your choice if you help find bugs or have useful suggestions!</p>
<p><strong>Why you want to give it a try:</strong><br />
This vmware app is a cool way to keep track of what your VC and ESX servers are up to, what instances are running where, when they are under load, when instances move, when they have errors, and much more. Since all the data is indexed in Splunk, it's easy and quick to search for problems and report on your virtual sprawl. </p>
<p><strong>How it works:</strong><br />
This app will connect a Splunk server to any number of Virtual Center and/or ESX servers and grab/index the events, logs, properties, performance data, and anything else I can get my grubby mitts on. It's easy to hook up and get going, so if you use Virtual Center or ESX then give this app a try. I'll explain how to install/setup, how to troubleshoot, and what you will see when you get it working. You will need to install Splunk or use an existing Splunk server.  See the configuration file for settings on how often to pull data. Also, near the end of this post I give example searches to explain the data.</p>
<p>After installing you get cool graphs like this one showing CPU Usage by Guest by Time:</p>
<p><img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/picture-1.png" alt="cool graph" /></p>
<p><strong>Add Inside-out monitoring</strong><br />
It's optional, but you can also put Splunk on the guest OSes as lightweight forwarders, and you will get a brilliant inside-out view where we capture not only what VC/ESX thinks but what the guests are seeing on the inside. My best practice is to put Splunk on the guests and capture basic logs as well as OS performance metrics, what apps are running, how much mem/cpu they are taking, etc. You can get the <a href="http://www.splunkbase.com/apps/All/Technologies/Systems_Management/Monitoring/app:Splunk+for+UNIX">Unix/Linux version here</a> and the <a href="http://www.splunkbase.com/apps/All/Technologies/Operating_Systems/Windows/app:Splunk+for+Windows+Management">Windows version here</a>.  Of course it's not required, and you get a ton of value out of just the basic vmware app's monitoring of VC/ESX.</p>
<p><strong>INSTALLATION:</strong></p>
<p>**Important**<br />
This app requires a JVM be installed on the same box as the Splunk server. I know this is less than optimal. Please bug your local VMware rep and tell them to make me REST APIs and not SOAP APIs. The VMware APIs are hideously overcomplicated - Please dear VMware make a simple REST interface.</p>
<p><strong>1)</strong> Make sure java is present and set the JAVAHOME environment variable. If not already set, you must set JAVAHOME to the Java install directory, the one whose bin subdirectory contains the java binary.</p>
<p><strong>2)</strong> To test the variable is set correctly, try and run the following on the command line<br />
<code>  windows> "%JAVAHOME%\bin\java"<br />
    linux/unix> $JAVAHOME/bin/java<br />
</code><br />
If it worked it should spit back a bunch of options to pass to the java command. If it's not set right you will get some kind of file not found error.</p>
<p><strong>3)</strong> Grab the vmware.zip file <strong><a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/vmware.zip">HERE</a></strong>.</p>
<p><strong>4)</strong> Unzip the file - and copy the resultant "vmware" directory to your SPLUNK_HOME/etc/apps/ directory. When done the following directory should exist: SPLUNK_HOME/etc/apps/vmware.</p>
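<p>If you would rather script the check from steps 1 and 2, a small Python sketch like this one does the same thing. The function name is made up, and it only checks that the file exists where the app expects it; it does not run java.</p>

```python
import os


def java_binary(env=None):
    """Return the path to the java binary under JAVAHOME, or raise RuntimeError.

    env defaults to the real environment; pass a dict to test other values.
    """
    env = os.environ if env is None else env
    home = env.get("JAVAHOME")
    if not home:
        raise RuntimeError("JAVAHOME is not set")
    exe = "java.exe" if os.name == "nt" else "java"
    path = os.path.join(home, "bin", exe)
    if not os.path.isfile(path):
        raise RuntimeError("no java binary at " + path)
    return path


if __name__ == "__main__":
    try:
        print("found", java_binary())
    except RuntimeError as err:
        print("problem:", err)
```

<p>Run it in the same shell you will start Splunk from, since that is the environment the app will inherit.</p>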
<p><strong>CONFIGURATION:</strong><br />
There are a few config settings to make the app work.</p>
<p><strong>5)</strong> First you need to let Splunk know where your VC or ESX servers are. Edit the <code>vmware/default/vmware.conf</code> configuration file to point to your VC or ESX servers. If using VC you need not specify all ESX servers under management; Splunk will get the list from VC. The config file contains one or more of the following stanzas (the unique_name can be anything you like so long as it's unique):<code><br />
	[vmserver:unique_name]<br />
</code><br />
For each [vmserver] stanza be sure to set:<code><br />
	url=https://your_server_IP/sdk<br />
	username=your_user<br />
	password=your_password<br />
</code></p>
<p>Note that the url should be the IP address of your server with "/sdk" at the end - for example "url=https://10.1.1.35/sdk". A good way to test that the url and username/password are correct is to use a web browser. Take the url you have entered above and replace the "sdk" with "mob". Use the web browser to navigate to that url and make sure it asks for a username and password and that the values you entered above authenticate correctly. If the "mob" url works with the username and password you entered then Splunk should have no trouble.  </p>
<p>With those three set you should be up and running after a restart.<br />
The rest of the config file should be self-explanatory and is included at the end of this post for reference, but you should not need to change anything else.</p>
<p><strong>Testing and Troubleshooting:</strong></p>
<p><strong>6)</strong> It's best to test running the vmware app outside of splunk first.<br />
You'll need to make sure that SPLUNK_HOME is set for the test.</p>
<p>**  On Windows  **:<br />
<code>   set SPLUNK_HOME=your splunk directory </code><br />
#note it does not like it when i add quotes around this path - try with no quotes.</p>
<p>Then run the app by hand<br />
<code>    > cd %SPLUNK_HOME%\etc\apps\vmware<br />
    >  java -jar lib/splunk.jar </code></p>
<p>**  On others  ** :<br />
<code>    export SPLUNK_HOME=your splunk directory </code></p>
<p>Then run the app by hand:<br />
<code>   > cd $SPLUNK_HOME/etc/apps/vmware<br />
    >  java -jar lib/splunk.jar  </code></p>
<p>It should spit out all sorts of vmware data. If it throws an error it's likely that SPLUNK_HOME or JAVAHOME are NOT set. Remember SPLUNK_HOME will be set by the server when the server runs the script. You need only set it for testing.</p>
<p>If it does not work, likely the exception will have something useful in it such as connection refused ( bad auth ) or a 404 error in which case the url is incorrect.</p>
<p>If you get any non-obvious errors email me ( my first name at splunk.com ).</p>
<p><strong>7)</strong> Try running in splunk.<br />
If the above test works then you should be able to just restart Splunk and all should be good. The way to tell if it's working is that you will get events with sourcetype vmware and vmware_api.</p>
<p><strong>8 )</strong> If you do NOT see events of type vmware_api on the dashboard then try the following searches:<br />
"index=_internal error"<br />
and<br />
"index=_internal  splunk4vmi.py"</p>
<p>You should see some kind of error or warning that is hopefully obvious. If not again email me and i'll sort you out.</p>
<p><strong>Using the App</strong></p>
<p>At this point it should be working and you should be able to search for cool stuff.<br />
Here is a quick overview of what splunk is indexing:</p>
<p>After restarting you should see a bunch of logs from vmware and at least two new sourcetypes: vmware and vmware_api.  Below is a screenshot of my dashboard after restarting - notice the vmware logs and the vmware_api event counts.</p>
<p><img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/picture-6.png" alt="sources" /></p>
<p>The vmware sourcetype is for the actual vmware logs while the vmware_api sourcetype is for the API calls. It can take a minute before they show up so if they are not there, try again after a minute. If you still do not have the logs, that likely means the logs path in the vmware.conf is incorrect and you should make sure the path is correct or contact me. </p>
<p>If you do not see the API calls then there is likely an auth or url error that should have been caught when you did the manual test above. Try retesting by hand as above - if the by-hand method works but not through Splunk then contact me.</p>
<p>I've just started to explore the logs that come back - there is a ton of information in them but my test infrastructure is not all that interesting so I'm not sure what goodness you all might find in them. Poke around the files and see what you see, and bug me if you see anything interesting so I can make it into alerts / reports.</p>
<p>The meat of the data is from the API where we pull everything we can.<br />
Most useful are: </p>
<p><strong>1) Metrics</strong><br />
Every few seconds we capture the metrics for all VMs, including:<br />
<img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/properties.png" alt="metrics" /></p>
<p><strong>2) Events</strong><br />
I'm not sure the scope of these but it looks like interesting events kicked out by ESX. Someone with a larger VMware installation might find far more interesting events than i see on our infrastructure.<br />
<img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/events.png" alt="events" /></p>
<p><strong>3) Updates:</strong><br />
It looks like when anything changes, we get an update.<br />
<img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/updates.png" alt="updates" /></p>
<p><strong>4) Inventory: </strong><br />
I periodically just capture the inventory tree. It's more for debugging than perhaps useful in a production environment but it does not cost much to get and it can be useful.<br />
<img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/inventory.png" alt="inventory" /></p>
<p>Thanks to Christina we do ship with a bunch of saved searches. After installing you should see them; they all start with 'VM:'. They are named to be somewhat obvious. Again, let me know if they don't work or you have some better ones to add to the default app. Try some of the Metrics and Status saved searches to make sure your install is working.</p>
<ul>
<li>VM: Investigation CPU load on all guests sharing ESX server</li>
<li>VM: Investigation Find ESX Host for Guest</li>
<li>VM: Investigation Find Guests sharing ESX Server - Non FQDN</li>
<li>VM: Investigation- Find other VMs sharing ESX Host</li>
<li>VM: Investigation- Processes on hosts sharing ESX Server</li>
<li>VM: Investigation- Running processes on other guests on same ESX server</li>
<li>VM: Metrics- CPU by Guest last 60 minutes</li>
<li>VM: Metrics- Host Memory Usage last 15 minutes</li>
<li>VM: Metrics- Host Memory Usage last 60 minutes</li>
<li>VM: Metrics- Memory by Guest last 60 minutes</li>
<li>VM: Status- Free Space by Datastore</li>
<li>VM: Status- Running Guests</li>
<li>VM: Status- Running VMs </li>
</ul>
<p>That's about it.<br />
Like i said, PLEASE email me if you have bugs or suggestions.<br />
I'll plan on updating the app with whatever feedback i get from folks. So please, help me out and get yourself a tee shirt.</p>
<p>Kind Regards,<br />
e.</p>
<p>P.S. - here is a sample of the config just so that you can see what's in it without downloading:<br />
 -  -  -  -  -  -  -  -  -  -  -  -  - <br />
The following are the important values in the config file:</p>
<p><code><br />
[vmserver:demo]<br />
url=https://10.2.1.151/sdk      ## This is the url to the vc or esx server<br />
username=your_username     ## user name to auth against the server. If you are not sure of its value, point a web browser at the above url and check the web auth; it will be the same.<br />
password=your_password            ## we will support non-clear text in the near future.<br />
ignorecert = t              ## for now leave as true (t), we will soon support checking of certs<br />
loggingLevel = error            ## to turn on debugging values are [error, warn, info, debug ]</p>
<p>index_events = t            ## should we index events (t)rue or (f)alse<br />
events_interval = 10            ## how often to check for events in seconds</p>
<p>index_properties = t            ## should we index properties (t)rue or (f)alse<br />
property_interval = 10      ## how often to check for properties in seconds</p>
<p>index_metrics = t           ## should we index metrics (t)rue or (f)alse<br />
metrics_interval = 10           ## how often to check for metrics in seconds</p>
<p>index_updates = t           ## should we index events (t)rue or (f)alse<br />
updates_interval = 10       ## how often to check for updates in seconds</p>
<p>index_logs = t              ## should we index logs (t)rue or (f)alse<br />
logs_interval = 300         ## how often to get log changes...<br />
logs_localpath = ../var/spool/vmware    ## the logs are copied from vc/esx to this directory where splunk will pick them up for indexing<br />
</code></p>
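<p>If you want to sanity-check a config before restarting, here is a rough Python sketch that reads the stanzas and builds the "mob" test url mentioned in the configuration step. It is not part of the app (the app itself is Java); the function names are invented, the sample text is a trimmed copy of the config above, and it assumes "##" only ever starts a comment.</p>

```python
import configparser

SAMPLE = """
[vmserver:demo]
url=https://10.2.1.151/sdk      ## url to the vc or esx server
username=your_username          ## user name to auth against the server
ignorecert = t                  ## (t)rue or (f)alse
metrics_interval = 10           ## seconds
"""


def load_vmservers(text):
    """Return {unique_name: settings} for every [vmserver:unique_name] stanza."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("##",),  # strip the trailing "## ..." notes
        interpolation=None,               # keep "%" in passwords literal
    )
    parser.read_string(text)
    servers = {}
    for section in parser.sections():
        if section.startswith("vmserver:"):
            servers[section.split(":", 1)[1]] = dict(parser[section])
    return servers


def is_true(value):
    """The config uses t/f rather than the values getboolean() understands."""
    return value.strip().lower() in ("t", "true")


def mob_url(sdk_url):
    """Swap the trailing /sdk for /mob to get the browser test url."""
    if not sdk_url.endswith("/sdk"):
        raise ValueError("url should end in /sdk: " + sdk_url)
    return sdk_url[: -len("/sdk")] + "/mob"


if __name__ == "__main__":
    for name, conf in load_vmservers(SAMPLE).items():
        print(name, mob_url(conf["url"]), is_true(conf["ignorecert"]))
```

<p>Open the printed url in a browser and log in with the same username and password to confirm they work.</p>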
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/Y5QpnkQkle8" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2008/08/10/search-engine-for-virutal-sprawl-vmware-app-for-splunk/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Inder Sabharwal: Deployed bundles not taking effect?</title><link>http://feedproxy.google.com/~r/splunkdev/~3/e7XymDiLW3g/</link><comments>http://blogs.splunk.com/inder/2008/07/28/local-and-deployed-bundles/</comments><pubDate>Mon, 28 Jul 2008 21:25:09 +0000</pubDate><dc:creator>Inder Sabharwal</dc:creator><description><![CDATA[Changes made in /etc/system/local override any configuration bundles that you may be trying to publish to your Splunk instances using a DeploymentServer. 

Serveral customers have reported that DeploymentServer configuration bundles were not taking effect, only to realize after several troubleshooting cycles that there was some configuration in /etc/system/local that was preventing that from happening. Note [...]]]></description><content:encoded><![CDATA[<p>Changes made in <code>/etc/system/local</code> override any configuration bundles that you may be trying to publish to your Splunk instances using a DeploymentServer. </p>
<p>
Several customers have reported that DeploymentServer configuration bundles were not taking effect, only to realize after several troubleshooting cycles that there was some configuration in <code>/etc/system/local</code> that was preventing that from happening. Note that any configuration in <code>/etc/system/local</code> will always take precedence over any other configuration in the system - even deployed bundles.</p>
<p>
So, if you are stuck in this position, please make sure to check your <code>/etc/system/local</code> before hitting the panic button!</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/e7XymDiLW3g" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/inder/2008/07/28/local-and-deployed-bundles/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Matt Green: Help Me Help You: Opening a good ticket with support</title><link>http://feedproxy.google.com/~r/splunkdev/~3/0NMdPIaY0b8/</link><comments>http://blogs.splunk.com/matt/2008/07/28/help-me-help-you-opening-a-good-ticket-with-support/</comments><pubDate>Mon, 28 Jul 2008 21:15:33 +0000</pubDate><dc:creator>Matt Green</dc:creator><description><![CDATA[Salutation drivers of the Information Super Highway,
I&#8217;ve got another post here in the occasional &#8220;Help Me Help You&#8221; series; this time I&#8217;m going to dig into case writing.
I was talking with the some of the engineers the other day around the bar about an issue that one of our field guys opened.  One of [...]]]></description><content:encoded><![CDATA[<p>Salutation drivers of the Information Super Highway,</p>
<p>I've got another post here in the occasional "Help Me Help You" series; this time I'm going to dig into case writing.</p>
<p>I was talking with some of the engineers the other day around the bar about an issue that one of our field guys opened.  One of the engineers mentioned a piece of information that totally changed the way the rest of us were going to handle the issue.  This got us to talking about how some people write great cases and others don't.  The ones who write good cases usually get their issues resolved first (oftentimes closing the issue with the first response from a member of my team); the ones who write "bad" cases generally have a back and forth exchange.</p>
<p>That got me thinking that maybe I should take a sec to talk about what makes a good case.   I'm going to try mapping out a basic template for submitting an issue.  This is by no means limited to Splunk and is most definitely not a de facto standard.  Rather it is a compilation of things that always make my life easier when my customers can provide them.</p>
<ul>
<li><strong>Backstory</strong>: Like I mentioned in my previous <a href="http://blogs.splunk.com/matt/2008/04/30/help-me-help-you/">post</a> I don't work in the cube next to you, I don't see the same things you see, know the same things that you know.<br />
Often times I get cases with a description like "I came into work this morning and discovered that this thingy that was working yesterday isn't working today.  What gives?"  In digging into the issue  the customer remembers that last night was the weekly maintenance window and one of the other guys was making some changes on the box and it is this change that caused things to go wonky.<br />
I guess what I am getting at here is that it helps to know what led up to the issue.  Fleshing out the supporting data points can be a big help in piecing the problem together.  Even if you think it is unrelated, include it; it can't hurt.  The worst thing that can happen is you spent a few more bits, and thankfully bits don't cost what they used to.  I've also found that when I take the time to think about _all_ of the things that led up to the event in question the light bulb over my head starts to flicker and maybe I can figure it out before enlisting someone else.</li>
<li><strong>Impact</strong>: Do you have to commit seppuku if this issue is not resolved in the next hour?  If you do, you may want to include that in the initial report; it will really help with prioritizing the issue.  If others are unable to do their jobs because of this, we want to know. If you're asking a question for your own edification, share that as well  -  it helps us to prioritize other issues and formulate the best answer for you.  Big fires often require an immediate fix and you don't really care about the inner workings of the fix, just that it works.  If you are trying to learn something you want the opposite.</li>
<li><strong>Priority</strong>: We all deal with fires (some bigger than others), so let the guy on the other end know how you need the issue treated.  Support folk inherently want to help (why else do we do this job?  It isn't for the unlimited supplies of handi-snacks) and if you say "I need this now" we will make every effort to deliver.</li>
<li><strong>Data Samples</strong>:  One of my new favorite shows is <a href="http://www.aetv.com/the_first_48/">The First 48</a>, which follows real homicide cops as they investigate murders.  Each episode starts off with the cops going to the crime scene and collecting every potential piece of evidence.  They don't know what is relevant and what is not, so they assume it all is.  The same is true when troubleshooting an issue with software.  The more data points I have to work with, the better position I am in to figure out what is going on.<br />
If splunk isn't parsing a field in a given file, include a copy of said file along with your configs.  If the UI is acting weird, take a screen shot.  If performance is an issue, include the results of the tests you ran to determine that things are slow, along with the tool(s) used to produce the results.</li>
<li><strong>Repro steps</strong>:  If you can trigger this issue on demand, please share.  Knowing the exact path traveled will often make root cause analysis that much easier.  Screen shots of each step are very helpful (a picture is worth more than 1,000 words) in describing an issue.</li>
<li><strong>Your investigation</strong>:  I find it is really helpful to know what you have done to try to figure out a problem.  It saves time because I won't ask you to perform steps that you said you've done and you won't get frustrated at me for asking you to do work again.  It also gives me insight into your investigative process  -  if you are thorough I am more inclined to trust your results at first glance.  If you are vague or unclear I have to assume that the information you are providing is incomplete.  This is not to say that what you are giving is bad/wrong/stupid, rather it is not the full story.</li>
</ul>
<p>Ok I'm sure there is more that I can say here but this post is getting kind of long, my fingers are tired of typing, and I need to answer some cases.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/0NMdPIaY0b8" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/matt/2008/07/28/help-me-help-you-opening-a-good-ticket-with-support/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: Splunk and iPhone</title><link>http://feedproxy.google.com/~r/splunkdev/~3/c_qzg1KLX88/</link><comments>http://blogs.splunk.com/andrea/2008/07/28/splunk-and-iphone/</comments><pubDate>Mon, 28 Jul 2008 18:20:43 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[I&#8217;ve been playing with a few things that will eventually turn into an iPhone application to talk to Splunk via the REST API. I don&#8217;t have a lot to say about it right now due to other issues but I do have a little something to show off: 

Splunk doesn&#8217;t support Safari officially yet and [...]]]></description><content:encoded><![CDATA[<p>I've been playing with a few things that will eventually turn into an iPhone application to talk to Splunk via the REST API. I don't have a lot to say about it right now due to <a href="http://gizmodo.com/5028374/iphone-app-devs-still-gagged-by-non+disclosure-agreement-mad-as-fn-hell-about-it">other issues</a> but I do have a little something to show off: </p>
<p><a href='http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/livetail.jpg'><img src="http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/livetail.jpg" alt="" title="livetail" width="150" height="74" class="alignnone size-thumbnail wp-image-400" /></a></p>
<p>Splunk doesn't support Safari officially yet and MobileSafari is a whole 'nother animal, but there are other things you can do. You can talk to the REST endpoints just fine. Here I have a Live Tail search running from the browser, talking to my production server.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/c_qzg1KLX88" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/07/28/splunk-and-iphone/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: Forcing dashboard refresh</title><link>http://feedproxy.google.com/~r/splunkdev/~3/9apr07YSH6w/</link><comments>http://blogs.splunk.com/andrea/2008/07/25/forcing-dashboard-refresh/</comments><pubDate>Fri, 25 Jul 2008 17:01:16 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[In 3.2.x and 3.3.x, dashboards refresh automatically on their own schedule: 10% of the time period or 1 hour, whichever is sooner. You can&#8217;t change this right now. But if you want to force a refresh, you can delete the files that contain the cached data. 
Dashboards create username_* files in $SPLUNK_HOME/var/run/splunk to persist the [...]]]></description><content:encoded><![CDATA[<p>In 3.2.x and 3.3.x, dashboards refresh automatically on their own schedule: 10% of the time period or 1 hour, whichever is sooner. You can't change this right now. But if you want to force a refresh, you can delete the files that contain the cached data. </p>
<p>Dashboards create username_* files in $SPLUNK_HOME/var/run/splunk to persist the dashboard data. There is also a directory for each username with *.csv files. Delete the username_* files (like "admin_KB indexed per hour last 24 hours") and the *.csv files and the next time you refresh the dashboard, it will reload. </p>
<p>This is not an elegant solution by any means, but it does work. While you could just delete the files for the search in question, there is no simple way to identify which csv file is associated with it. Just don't go messing with the other files in this directory, you will be Very Unhappy if you do.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/9apr07YSH6w" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/07/25/forcing-dashboard-refresh/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: My favorite “customer” and Splunk as multi-tenant platform</title><link>http://feedproxy.google.com/~r/splunkdev/~3/7_Xzh0Pt0fI/</link><comments>http://blogs.splunk.com/erik/2008/07/22/my-favorite-customer-and-splunk-as-multi-tenant-platform/</comments><pubDate>Wed, 23 Jul 2008 04:27:25 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[Everyone has their favorite customer.
I have one too and he is the CTO of a very cool IVR/VoIP platform. His name is RJ Auburn
 
Around here, RJ is synonymous with filing 34 bugs between Sunday 9PM when we push bits to the site and 9AM when we get in to the office. I don't mean the [...]]]></description><content:encoded><![CDATA[<p>Everyone has their favorite customer.<br />
I have one too and he is the CTO of a very cool IVR/VoIP platform. His name is <a href="http://www.google.com/search?q=rj+auburn+voxeo&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a">RJ Auburn</a><br />
<img class="alignleft size-medium wp-image-67" src="http://ecommmedia.com/mt-static/support/assets_c/userpics/userpic-87-100x100.png" align="left" border="16" margin="10" alt="rj"/> </p>
<p>Around here, RJ is synonymous with filing 34 bugs between Sunday 9PM, when we push bits to the site, and 9AM, when we get in to the office. I don't mean the usual UI-is-off-by-10-pixels bugs, but complex indexing or distributed search bugs. Well, sometimes it's a trivial thing we missed, but usually he is pushing splunk to its limits. It's not often that a CTO and "industry expert" is the one to personally put splunk through its paces - but RJ is like that and gets his hands dirty - and splunk is the better for it. </p>
<p>RJ and Voxeo are one of a small, but quickly growing, number of companies that are using splunk in a multi-tenant environment. This means using splunk to collect data across multiple tenants in a hosted environment and then using splunk for searching and reporting on a per-customer basis. Often the output of the searches/reports is rendered for the customer so they can see what is going on within the service. Customer dashboards and activity reports are a common use case for splunk.  Below are some of the images from the voxeo service:<br />
<br />
<img src="http://blogs.voxeo.com/voxeotalks/files/2008/06/prophecylogsearch3-1.jpg" alt="vox dash" /></p>
<p>On the <a href="http://blogs.voxeo.com/voxeotalks/2008/06/18/voxeo-announces-a-new-beta-service-prophecy-log-search-a-better-way-to-search-your-application-log-files/">Voxeo blog</a> there is a nice description and even a cool video introduction: </p>
<p>Lessons learned from these initial deployments are having a significant effect on our upcoming 4.0 release. First and foremost we will provide a much better html "module" system so that you can embed splunk modules in other webpages. Secondly, we will be making the overall splunk UI more configurable and modular so that multi-tenant customers can build even more custom UIs. </p>
<p>One other very interesting trend is using splunk for SaaS using cloud services. Often these uses have some kind of multi-tenant .... It won't be long before splunk makes deploying in the cloud even easier. More in a post to come, but do drop me a line if you want to use splunk in the cloud and I can give you some hints.</p>
<p>In the meantime, if you're looking for the best push-it-to-the-limits beta tester, contact RJ!<br />
Thanks RJ <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>e.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/7_Xzh0Pt0fI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2008/07/22/my-favorite-customer-and-splunk-as-multi-tenant-platform/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Congrats to FlowingData - strength in (subscriber) numbers!</title><link>http://feedproxy.google.com/~r/splunkdev/~3/tE96g5isxOw/</link><comments>http://blogs.splunk.com/erik/2008/07/20/congrats-to-flowingdata-strength-in-subscriber-numbers/</comments><pubDate>Sun, 20 Jul 2008 18:42:04 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[We here at splunk are into processing lots of data. Our external marketing focuses mostly on hardcore IT data but internally we play with all sorts of data sets : government stats, sports stats, even music as shown by Brian cool post. 
I just wanted to congratulate Nathan over at FlowingData for crossing the 3100 [...]]]></description><content:encoded><![CDATA[<p>We here at splunk are into processing lots of data. Our external marketing focuses mostly on hardcore IT data but internally we play with all sorts of data sets: government stats, sports stats, even music, as shown by <a href="http://blogs.splunk.com/brian/2008/07/14/splunking-pitchfork-album-reviews/">Brian's cool post</a>. </p>
<p>I just wanted to congratulate Nathan over at <a href="http://www.FlowingData.com">FlowingData</a> for crossing the <a href="http://flowingdata.com/2008/07/19/thank-you-everyone-for-reading-flowingdata/">3100 subscriber mark</a>. </p>
<div style="background-color:#FFFFFF;">
<img src="http://flowingdata.com/wp-content/themes/flowingdata-1-0/images/logo.gif" alt="flowingdata logo" />
</div>
<p>FlowingData is a fantastic example of the hidden value in the data all around us. As more and more of what we do is documented by computers the impact of statistics has become less of a hard-core math geek sport and more within the reach of anyone's curiosity. His daily posts are a constant reminder of how statistics has become a crossover genre.  </p>
<p>Thank you Nathan!<br />
e</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/tE96g5isxOw" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2008/07/20/congrats-to-flowingdata-strength-in-subscriber-numbers/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: Talk to Splunk from WordPress</title><link>http://feedproxy.google.com/~r/splunkdev/~3/RsOeNUlgKFg/</link><comments>http://blogs.splunk.com/andrea/2008/07/15/talk-to-splunk-from-wordpress/</comments><pubDate>Tue, 15 Jul 2008 21:20:32 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[I wrote a WordPress plugin (tested for 2.5.1) that displays my most recent Google search terms in my sidebar. It was an experiment with using the Splunk REST API and the PHP SDK.  
You can configure the widget from the Widgets page and it supports multiple instances with different configuration. Right now the actual [...]]]></description><content:encoded><![CDATA[<p>I wrote a WordPress plugin (tested for 2.5.1) that displays my most recent Google search terms in my sidebar. It was an experiment with using the Splunk REST API and the PHP SDK.  </p>
<p>You can configure the widget from the Widgets page and it supports multiple instances with different configuration. Right now the actual search string is hardcoded because I'm doing some extra mangling to get the search terms the way I want anyway, but I'll be adding that to the configuration options also. Eventually there will be a way to cache results so you don't do the search each time the page is loaded. </p>
<p>Since there is still work to do to make it more generic, I haven't uploaded it to the WordPress site. But here is the basic PHP code to play around with. In fine programming tradition, I learned quite a lot by picking apart existing WordPress widgets, in this case Random Image and Twitter Tools. This widget requires the Splunk PHP SDK, by default my code is expecting it to be in the same directory (which is probably going to be something like wp/wp-content/plugins/widgetname.) There are a few things it depends on, you can find the details at the <a href="http://code.google.com/p/splunk-php-sdk/">Google Code page.</a> </p>
<p>You can find the widget here:<br />
<a href='http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/splunk_statsphp1.gz'>splunk_statsphp1</a></p>
<p>Note: updated version posted 31 July 08.</p>
<p>Here's a sample of the kinds of events I'm looking at. I have some extra field extractions because it's a custom format and not exactly access_combined, but I get the referer in there. What I want to display is the actual search string, in this case "drum+carder". I have to strip out the '+' between words because otherwise it doesn't wrap nicely in my narrow sidebar. (I'm sure I could fix this in my theme somehow but Eric Meyer I'm not.)  </p>
<p>xxx.xxx.xxx.xxx [15/Jul/2008:12:08:07 -0700] "GET /tag/drum-carder/ HTTP/1.1" 200 "http://www.google.com/search?hl=en&amp;pwst=1&amp;q=drum+carder&amp;start=10&amp;sa=N"</p>
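<p>To make the goal concrete, here is a small Python sketch of pulling the search string out of a referer like the one above. It is not the widget code (that is PHP and relies on Splunk's own key/value extraction); the function name search_terms is invented for this illustration, and the example URL is shortened to just the q= parameter.</p>

```python
from urllib.parse import parse_qs, urlsplit


def search_terms(referer):
    """Return the q= search string from a referer URL, or None if there is none."""
    query = urlsplit(referer).query
    # parse_qs decodes percent-escapes and turns '+' into a space,
    # which is the "strip out the '+'" step described above.
    values = parse_qs(query).get("q")
    return values[0] if values else None


print(search_terms("http://www.google.com/search?q=drum+carder"))  # prints: drum carder
```

<p>Returning None for referers without a q= parameter keeps non-search hits out of the sidebar list.</p>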
<p>You can go look at the code if you really want to know, but here are a few comments on what it's doing: </p>
<p>I only want a couple results, so to make the search as fast as possible I'm limiting what I get back.<br />
        // how many results to get?<br />
        $dispatchProps['max_count'] = 3;</p>
<p>Also there's no need to have the default time to live, so set the timeout to something reasonable. This could be much smaller, even.<br />
        // don't leave the search hanging around<br />
	$dispatchProps['timeout'] = 300;</p>
<p>It's a pretty simple search, the auto key/value extraction already gets the q= stuff out of the referer field.<br />
        // using head to get only what I want makes the search way faster<br />
        $job_id = $searchMgr->syncSearch('search sourcetype="spinnyspinny_access_log" google search | head 3', $dispatchProps);</p>
<p>Here's what it looks like in my sidebar: </p>
<p><a href='http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/sidebar_widget.png'><img src="http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/sidebar_widget.png" alt="image of my sidebar widget installed" title="sidebar_widget" width="207" height="140" class="alignnone size-medium wp-image-397" /></a></p>
<p>If you want to see it in action, I have it installed in my personal blog at <a href="http://www.feorlen.org">http://www.feorlen.org</a>. It is pulling statistics about my other site at <a href="http://www.spinnyspinny.com">http://www.spinnyspinny.com</a>, which gets a lot of search engine hits from Google. If you want to test it, search for "spinnyspinny" and some other relevant keywords like "yarn" and you will find my site. Don't go abusing it now, because you know that Splunk will be telling me your IP! <img src='http://blogs.splunk.com/andrea/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/RsOeNUlgKFg" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/07/15/talk-to-splunk-from-wordpress/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: What is it doing?</title><link>http://feedproxy.google.com/~r/splunkdev/~3/32aH0qoTBRg/</link><comments>http://blogs.splunk.com/andrea/2008/07/15/what-is-it-doing/</comments><pubDate>Tue, 15 Jul 2008 16:31:15 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[Up here in SupportLand, I get a lot of questions about how to understand the various bits of information that Splunk itself is tracking. The past couple of versions have added several new things to make it easier to see what is going on. Here are some of the things you can look at. 
audit.log
New [...]]]></description><content:encoded><![CDATA[<p>Up here in SupportLand, I get a lot of questions about how to understand the various bits of information that Splunk itself is tracking. The past couple of versions have added several new things to make it easier to see what is going on. Here are some of the things you can look at. </p>
<p>audit.log</p>
<p>New in 3.2, the audit.log records who did what based on what capability was requested from the authorization system. It shows both user-initiated actions like login and automated actions like running saved searches. </p>
<p>Login<br />
07-14-2008 10:59:09.434 INFO  AuditLogger - Audit:[timestamp=Mon Jul 14 10:59:09 2008, user=admin, action=login attempt, info=succeeded][n/a]</p>
<p>Running a script<br />
07-14-2008 10:59:12.542 INFO  AuditLogger - Audit:[timestamp=Mon Jul 14 10:59:12 2008, user=admin, action=run_script_sendemail, info=granted ][n/a]</p>
<p>Dispatch search<br />
07-14-2008 14:43:39.619 INFO  AuditLogger - Audit:[timestamp=Mon Jul 14 14:43:39 2008, user=admin, action=search, info=granted dispatch maxtime=0 maxresults=100 [search sudo | eval sizeof=length(host)  ] | outputcsv][n/a]</p>
<p>REST request<br />
07-15-2008 08:21:33.576 INFO  AuditLogger - Audit:[timestamp=Tue Jul 15 08:21:33 2008, user=admin, action=search, info=granted REST: /search/jobs][n/a]</p>
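<p>If you want to pick these apart outside of Splunk, the sample lines above are regular enough for a quick script. This Python sketch is based only on the four examples shown here, so treat the pattern as an assumption about the format rather than a specification; parse_audit and AUDIT_RE are names made up for the example.</p>

```python
import re

# Groups: 1 = user, 2 = action, 3 = info (everything up to the closing "][").
# Pattern inferred from the sample lines only; other builds may differ.
AUDIT_RE = re.compile(
    r"Audit:\[timestamp=[^,]*, user=([^,]*), action=([^,]*), info=(.*)\]\["
)


def parse_audit(line):
    """Return user/action/info from an audit.log line, or None if it does not match."""
    m = AUDIT_RE.search(line)
    if m is None:
        return None
    user, action, info = m.groups()
    return {"user": user, "action": action, "info": info.strip()}
```

<p>The info group is greedy on purpose: a dispatched search can contain its own square brackets, so the match runs to the last "][" on the line.</p>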
<p>license_audit.log</p>
<p>These are the LicenseManager events that used to be reported in splunkd.log; now they are in their own file. The things to pay attention to are quotaExceededCount (number of license violations), peak (all-time high daily volume) and todaysBytesIndexed. rolloverCount is the number of rollovers since last clean. Usually there is one event generated a day, just after midnight, but there can be others if the instance has been restarted. </p>
<p>07-15-2008 00:01:38.456 INFO  LicenseManager-Audit - Audit:[timestamp=1216105298 quotaExceededCount=0, lastExceedDate=0, peak=14699861, rolloverCount=1, totalCumulativeBytesAtRollover=14699861, todaysBytesIndexed=14699861][Jls7bqb2G3dcwAgzAmi0P5pmJn1+IgDwMpoxmW1idMGbA1IlW2amr8tYq5ROlL3bysBxpCV46OEBCt3MJxjI73VvmGSWffU5C+1K3UXYejOLBdinoRavtk+hgLil69eF4n/vQ2mVixK179iHVkzckUcUe8X8iz8qPZT6BEvFhh0AukKlk6IFCrXWRftYysMEIR0IAmcuns7PWBzo/FmEOdm9rBKfVnNMKSvvos39QVooj4O6Km2+xsMUododll8w9IMrl9l0dDHW4AhfZfEN7Sf8krE1c/T/Q+VAxMRgzB0iqJWIddtIxgp6pmdBzD2q7dk9L2pAbkjzDlXRM5GyAg==]</p>
<p>metrics.log</p>
<p>I've talked about this one before, when trying to identify high volume data inputs. New for 3.3, in addition to the default 10 items per period you can configure how many items are reported in metrics.log by setting maxseries in limits.conf. (See limits.conf.spec for details.) Making this number larger will impact performance, but you can do it for investigating a specific issue. Or you can reduce it also. As before, it's a sample of the top n items for each group in a 30 second period. So if you have 200 sources, you won't see all your data inputs here. We are already talking about what metrics we can report, so in 4.0 expect to see new options. </p>
<p>Track blocked queues by looking for "blocked!!":<br />
06-24-2008 09:22:08.792 INFO  Metrics - group=queue, name=parsingqueue, blocked!!=true, max_size=1000, filled_count=21, empty_count=0, current_size=1000, largest_size=1000, smallest_size=908</p>
<p>See which processors are actively running:<br />
07-09-2008 14:03:43.876 INFO  Metrics - group=pipeline, name=parsing, processor=utf8, cpu_seconds=0.321082, executes=90770, cumulative_hits=218992256</p>
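<p>The same kind of check can be done on a copy of metrics.log with a few lines of Python. This is a sketch that assumes every line looks like the two samples above (a "Metrics - " marker followed by comma-separated key=value pairs); the helper names are hypothetical.</p>

```python
def parse_metrics(line):
    """Split the key=value part of a metrics.log line into a dict of strings."""
    # Everything after "Metrics - " is a comma-separated list of key=value pairs.
    _, _, payload = line.partition("Metrics - ")
    fields = {}
    for pair in payload.split(", "):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_blocked_queue(fields):
    """True for queue entries flagged with blocked!!=true."""
    return fields.get("group") == "queue" and fields.get("blocked!!") == "true"
```

<p>Values stay as strings here; convert current_size, cpu_seconds and friends to numbers only if you need to do arithmetic on them.</p>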
<p>Diagnostic searches with CLI dispatch</p>
<p>The new dispatch search allows searching across many more events than the older search command. From the CLI, you can use the dispatch command or write something that uses the REST API. Particular searches can tell you more than just returning events. Dispatch from the CLI is particularly suited to this as it's designed for reporting across huge sets of events (although not to return those hundreds of thousands of events.) It may take a while to run, but it will complete. </p>
<p>How many events?<br />
./splunk dispatch "sourcetype=access_combined | stats count"<br />
./splunk dispatch "sourcetype=access_combined starttime::04/25/08:00:00:00 | stats count"</p>
<p>How big are they?<br />
./splunk dispatch "host=foohost1  | eval sizeof=length(_raw) | stats sum(sizeof)"</p>
<p>How big are various other things?<br />
./splunk dispatch "sourcetype=syslog | eval sizeof=length(host) | stats avg(sizeof)"</p>
<p>Note that all of these use additional search commands to report on the set of events rather than the events themselves. Actual results returned from dispatch via the CLI are maintained in memory, so trying to get back thousands of events or more can cause serious problems. Don't do it. </p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/32aH0qoTBRg" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/07/15/what-is-it-doing/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Brian Murphy: Splunking pitchfork album reviews</title><link>http://feedproxy.google.com/~r/splunkdev/~3/GLbmBe0hQ08/</link><comments>http://blogs.splunk.com/brian/2008/07/14/splunking-pitchfork-album-reviews/</comments><pubDate>Mon, 14 Jul 2008 23:05:39 +0000</pubDate><dc:creator>Brian Murphy</dc:creator><description><![CDATA[One of my favorite sites is the record review and music news site pitchfork media. On the site they have a bunch of interesting statistics like top record for each decade/year but these are obviously a more subjective list than if they crunched the raw stats. For example their #1 album of the nineties is [...]]]></description><content:encoded><![CDATA[<p>One of my favorite sites is the record review and music news site <a  href="http://pitchforkmedia.com">pitchfork</a> media. On the site they have a bunch of interesting statistics like top record for each decade/year but these are obviously a more subjective list than if they crunched the raw stats. For example their #1 album of the nineties is Radiohead's "Ok Computer" (rated 10.0)  and the #15 is "The Bends" by Radiohead ( which isn't reviewed on the site at all ).  I was interested in crunching the data provided by their wealth of reviews. So I downloaded all the  record reviews using a simple python script. And parsed out the description, rating, label, reviewer, release year, title and artist using the following regex :  </p>
<p>.*?&amp;lt;h2 class="fn"&amp;gt;\s*(.*?):&amp;lt;br /&amp;gt;([^\n]*)\n.*?&amp;lt;div class="info"&amp;gt;\n\[([^&amp;lt;;]*);?\s*(\d*)\]?.*?&amp;lt;span class="rating"&amp;gt;(.*?)&amp;lt;.*?&amp;lt;div class="content description"&amp;gt;(.*?)&amp;lt;/div&amp;gt;.*?  - &amp;lt;span class="reviewer"&amp;gt;&amp;lt;span class="vcard"&amp;gt;&amp;lt;span class="fn"&amp;gt;(.*?)&amp;lt;/span&amp;gt;.*?title="\d+"&amp;gt;(.*?)&amp;lt;</p>
<p>I can now run some interesting queries :</p>
<ul>
<li>
*  | chart  avg(rating) by releaseYear <br />
<a href='http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-5.png'><img src="http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-5.png" alt="" title="picture-5" width="600" height="120" class="aligncenter size-full wp-image-31" /></a><br />
Which graphs the average rating per calendar year of the release.
</li>
<li>
*| stats count(title), avg(rating) by artist | search "count(title)">2| sort "avg(rating)" d | head 10<br />
<a href='http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-6.png'><img src="http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-6.png" alt="" title="picture-6" width="300" height="211" class="alignnone size-medium wp-image-32" /></a><br />
This shows the top rated artists that have at least 3 reviews on pitchfork
</li>
<li>
* rating<=10  rating>0  | stats  avg(rating) as avg_rating, count(title) as title_count by label | search title_count>3 | sort avg_rating | head 10<br />
<a href='http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-7.png'><img src="http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-7.png" alt="" title="Worst labels" width="328" height="237" class="aligncenter size-full wp-image-33" /></a><br />
This shows that Invisible Records are the worst reviewed label on Pitchfork.
</li>
<li>
* | stats count(title), avg(rating) by reviewer | search "count(title)">4 "avg(rating)">7.5 | sort "avg(rating)" d<br />
<a href='http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-8.png'><img src="http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-8.png" alt="" title="Easiest reviewers" width="300" height="283" class="alignnone size-medium wp-image-34" /></a><br />
This search finds all the reviewers that have at least 5 reviews and on average score higher than 7.5. So if you want a good review on pitchfork you're better off with Luke Buckman <img src='http://blogs.splunk.com/brian/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</li>
<li>
* | eventstats count(title)  as titleCount by reviewer | search eventtype=7_dirty_words titleCount>3 | stats count(title) as ct ,max(titleCount) as mf by reviewer  | eval blue_index=ct*1.0/mf | sort blue_index d<br />
<a href='http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-9.png'><img src="http://blogs.splunk.com/brian/wp-content/uploads/2008/07/picture-9.png" alt="" title="Reviewers that curse the most" width="300" height="259" class="alignnone size-medium wp-image-35" /></a><br />
This is my personal favourite: it's a list of the reviewers most likely to use one of George Carlin's <a href="http://en.wikipedia.org/wiki/Seven_dirty_words">seven dirty words (nsfw)</a>. The ct column is the count of reviews with one of the words and the mf column is the total review count for that reviewer. The blue_index is ct/mf.
</li>
</ul>
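<p>For anyone who wants to see the arithmetic behind the last search spelled out, here is a rough Python equivalent. It assumes the reviews have already been reduced to (reviewer, contains-a-dirty-word) pairs, which is a simplification made up for this example; in the search, eventstats supplies the per-reviewer total (mf) and the stats after the eventtype filter supplies the flagged count (ct).</p>

```python
from collections import Counter


def blue_index(reviews, min_reviews=4):
    """reviews: iterable of (reviewer, is_dirty) pairs, one pair per review."""
    reviews = list(reviews)
    total = Counter(reviewer for reviewer, _ in reviews)             # mf
    dirty = Counter(reviewer for reviewer, flag in reviews if flag)  # ct
    # titleCount>3 in the search means at least 4 reviews in total.
    index = {r: dirty[r] / total[r] for r in dirty if total[r] >= min_reviews}
    # Highest share of flagged reviews first, like "sort blue_index d".
    return sorted(index.items(), key=lambda item: item[1], reverse=True)
```

<p>As in the search, reviewers with no flagged reviews never show up in the output, because they are filtered out before the final count.</p>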
<p>So there you go : Splunk > it's not just for logs.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/GLbmBe0hQ08" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/brian/2008/07/14/splunking-pitchfork-album-reviews/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Andrea Longo: More frequent alerts with CLI dispatch</title><link>http://feedproxy.google.com/~r/splunkdev/~3/tAJieSN08tw/</link><comments>http://blogs.splunk.com/andrea/2008/07/14/more-frequent-alerts-with-cli-dispatch/</comments><pubDate>Mon, 14 Jul 2008 18:17:14 +0000</pubDate><dc:creator>Andrea Longo</dc:creator><description><![CDATA[The saved search scheduler that the UI uses runs into trouble when you start running a bunch of searches at the same time. It kicks off one, waits for it to return or timeout and then moves on to the next. If the searches take more than a few seconds to run or there are [...]]]></description><content:encoded><![CDATA[<p>The saved search scheduler that the UI uses runs into trouble when you start running a bunch of searches at the same time. It kicks off one, waits for it to return or timeout and then moves on to the next. If the searches take more than a few seconds to run or there are dozens of them all with high frequency, it gets overloaded. One way to address this is to take advantage of the new dispatch (asynchronous search.) Dispatch is what is behind the REST API search functions and you can also get to it from the CLI with the "dispatch" command instead of the old "search."</p>
<p>Old CLI search: </p>
<p>./splunk search "sourcetype=access_combined googlebot | stats count" -maxresults 500<br />
count<br />
 -  - <br />
213  </p>
<p>New CLI search: </p>
<p>./splunk dispatch "sourcetype=access_combined googlebot | stats count"<br />
count<br />
 -  - <br />
213  </p>
<p>While the results look the same for this simple search, there is a lot different going on behind the scenes. The search command needs to load all the events it touches into memory, so there is only so much of the index it can search at one time. The data generation part, before the pipe, will only return maxresults number of events, which may not be all of them. If you then filter with additional search commands you won't get all of what you think you should. You can increase maxresults (default for the CLI is 100) but you can only push it so much until you run into memory problems. </p>
<p>The dispatch search kicks off a job that runs until completion, no matter how long it takes. But one thing to keep in mind is that CLI dispatch is designed for reporting: the actual results are all in memory so you can't get back thousands of results from a single search. Use reporting commands like stats or narrow your searches so they won't have more than a couple hundred results. (If you need more, write something that uses the REST API where you have access to job control.)</p>
<p>So how does this apply to alerting? </p>
<p>In the UI, when a scheduled search runs, it uses a search command to actually generate the alert. There are a couple different ones, but as most people want an email I'll focus on sendemail. (Docs here: http://www.splunk.com/doc/3.3/user/UnsupportedCommands#sendemail.) </p>
<p>Any search can use the sendemail search command; it's not limited to the UI. So I can do this: </p>
<p>./splunk dispatch "error | sendemail to=sysadmins@example.com from=splunk@example.com" </p>
<p>This runs the search and then looks for a mail server (by default on the local machine) to send the message. Since it's using dispatch, you can kick off a bunch of these and they will all run independently of each other. You can look at the jobs from the REST endpoint: </p>
<p>https://localhost:8089/services/search/jobs</p>
<p>Splunk Atom Feed: jobs<br />
Updated: 2008-07-14T10:39:16-0700 Splunk build: 38343<br />
dispatch<br />
cursorTime	1969-12-31T16:00:00.000-08:00<br />
error<br />
eventCount	316<br />
isDone	1<br />
isFinalized	0<br />
isPaused	0<br />
isStreaming	0<br />
keywords	sudo<br />
resultCount	100<br />
sid	1216057125.31<br />
ttl	3570.9 seconds<br />
events - results - timeline - summary -<br />
control:</p>
<p>2008-07-14T10:38:47.000-07:00 | admin</p>
<p>Here's an example I set up on my local machine, an OS X 10.5 box which uses postfix. I've already made sure postfix is running and I can receive mail to my local account. </p>
<p>I wrote a script that runs 50 searches, each set to send an email alert. Note the -auth in the command: if you aren't already authenticated, you will need to pass -auth as part of the CLI search. In a production environment, you would want a more sophisticated means of handling login credentials than sticking plaintext into a script. (You could also use a restricted user created only for CLI searches.) </p>
<p>[root]:/opt/splunk3.3/bin$ more alert_overload.sh<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo01" -auth admin:changeme &amp;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo02" -auth admin:changeme &amp;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo03" -auth admin:changeme &amp;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo04" -auth admin:changeme &amp;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo05" -auth admin:changeme &amp;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo06" -auth admin:changeme &amp;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo07" -auth admin:changeme &amp;<br />
[...]</p>
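<p>If you would rather not type out 50 nearly identical lines, a script like the one above can be generated. This is only a sketch: the function name <code>build_dispatch_commands</code> and its parameters are made up for illustration, and the search string, addresses and credentials are simply the ones from the example above.</p>

```python
def build_dispatch_commands(count, search="sudo", to_addr="feorlen",
                            auth="admin:changeme"):
    """Return one backgrounded `splunk dispatch` command line per alert.

    Hypothetical helper: it only builds strings that mirror the
    hand-written script above; it does not run anything.
    """
    commands = []
    for n in range(1, count + 1):
        # from=foo01, foo02, ... so each resulting email can be told apart
        query = f"{search} | sendemail to={to_addr} from=foo{n:02d}"
        commands.append(f'./splunk dispatch "{query}" -auth {auth} &')
    return commands


if __name__ == "__main__":
    # Print the script body; redirect to a file to get alert_overload.sh
    print("\n".join(build_dispatch_commands(50)))
```

<p>The same caveat about plaintext credentials applies to anything generated this way.</p>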
<p>When I run this script, it starts up all these searches. (Note that each one starts up another Python process! Keep that in mind.) When they complete, they send an email alert. </p>
<p> N 16 foo13@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
 N 17 foo17@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
 N 18 foo11@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
 N 19 foo32@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
 N 20 foo09@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
? s* dispatch_test.mbox<br />
"dispatch_test.mbox" [New file]<br />
? x<br />
AndreasSplunkPowerbook-2[feorlen]:~$ grep ^From: dispatch_test.mbox | wc -l<br />
      50</p>
<p>The messages don't arrive in the same order, but they do arrive. All 50 of these test searches completed in about 20 seconds. More complicated searches will take longer. One thing to know is that if you start searches faster than they can complete (say, every minute you start a search that takes two minutes to run), they will back up and take a while to complete. There is no hard guideline, as it depends on the individual searches and the overall load on the instance. </p>
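<p>As a rough way to think about the backlog, you can estimate how many searches overlap at any moment. The sketch below assumes a fixed start interval and a fixed runtime that does not grow under load, which is optimistic; the function name is made up for illustration.</p>

```python
import math


def concurrent_searches(interval_s, runtime_s):
    """Estimate how many searches overlap in steady state.

    Assumes a search starts every `interval_s` seconds and each one
    takes `runtime_s` seconds regardless of load (an idealization).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if runtime_s < 0:
        raise ValueError("runtime_s must not be negative")
    return math.ceil(runtime_s / interval_s)


if __name__ == "__main__":
    # A two-minute search started every minute
    print(concurrent_searches(60, 120))
```

<p>In practice each extra concurrent search slows the others down, so treat the number as a lower bound.</p>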
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/tAJieSN08tw" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/andrea/2008/07/14/more-frequent-alerts-with-cli-dispatch/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: Simple Transactions</title><link>http://feedproxy.google.com/~r/splunkdev/~3/kp05PItO10M/</link><comments>http://blogs.splunk.com/david/2008/07/03/simple-transactions/</comments><pubDate>Thu, 03 Jul 2008 15:28:11 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[In this post, I&#8217;ll show you how to use Splunk&#8217;s Transaction search, with several powerful examples.

In the latest releases, we have search-time discovery of transactions, with the new transaction search command.  Transaction collapses a set of events that belong to a transaction into a single event.  You can specify the parameters as arguments [...]]]></description><content:encoded><![CDATA[<p><strong>In this post, I'll show you how to use Splunk's Transaction search, with several powerful examples.</strong></p>
<p><span id="more-194"></span></p>
<p>In the latest releases, we have search-time discovery of transactions, with the new <a href="http://www.splunk.com/doc/3.3/admin/Transam">transaction</a> search command.  Transaction collapses a set of events that belong to a transaction into a single event.  You can specify the parameters as arguments to the transaction command right in the search, or you can refer to a named-transaction definition in transactiontypes.conf.  A few simple examples will give you an idea of some things you can do.</p>
<ul>
<li>get events with 'http', and group any search results into "bursts" of events, grouping any events that occur within two seconds of each other into the same transaction event.  [Note: there is an implied "search" command at the head of all searches, so "http" is really "search http".]</li>
<pre>http | transaction maxpause=2s</pre>
<li>get events with 'http', and collapse those that share the same host and cookie value, that occur within 30 seconds:</li>
<pre>http | transaction fields=host,cookie maxspan=30s maxpause=30s</pre>
<li>get events with 'sendmail', and collapse those that have the same userid,  between a login and a logout, that occur within 10 minutes:</li>
<pre>sendmail | transaction fields=uid startswith="eventtype=login" endswith="eventtype=logout" maxspan=10m maxpause=10m</pre>
<li>get events with 'http', and then find transactions as defined by email_transaction found in transactiontypes.conf:</li>
<pre>http | transaction email_transaction</pre>
<li> Find transactions that change a password, near where there were unsuccessful root logins.  To break it down: search for unsuccessful root logins, find time ranges around those root logins, find transactions in those regions, and finally look for password changes in the transaction.
<pre>root login NOT fail*
| localize maxspan=1m maxpause=1m
| map search="search starttimeu=$starttime$ endtimeu=$endtime$
| transaction session |  search password change"</pre>
</li>
</ul>
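<p>To make the <code>maxpause</code> idea from the first example concrete, here is a small sketch of grouping events into "bursts" by the gap between neighbours. It only illustrates the concept and is not how Splunk implements the transaction command; the function name and the use of plain epoch seconds are assumptions.</p>

```python
def group_by_maxpause(timestamps, maxpause):
    """Group event times into bursts.

    Events stay in the same group while the gap to the previous event
    is at most `maxpause` seconds; a larger gap starts a new group.
    """
    groups = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= maxpause:
            groups[-1].append(ts)   # close enough: same burst
        else:
            groups.append([ts])     # gap too large: new burst
    return groups


if __name__ == "__main__":
    for burst in group_by_maxpause([0, 1, 2.5, 10, 11, 30], maxpause=2):
        print(burst)
```

<p>Adding a <code>maxspan</code> limit would mean also closing a group once its last event is too far from its first.</p>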
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/kp05PItO10M" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2008/07/03/simple-transactions/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Mark Cohen: Using splunk in Fedora9 x86_64</title><link>http://feedproxy.google.com/~r/splunkdev/~3/nPZ9ucVfCns/</link><comments>http://blogs.splunk.com/mark/2008/05/20/using-splunk-in-fedora9-x86_64/</comments><pubDate>Wed, 21 May 2008 00:52:37 +0000</pubDate><dc:creator>Mark Cohen</dc:creator><description><![CDATA[For those who use Linux as their primary desktop, using splunk can be a chore. Splunk dashboards are built on Flash9. So, you will likely need the following commands (as root, or sudo) to get Flash working.

rpm -ivh http://fpdownload.macromedia.com/get/flashplayer/current/flash-plugin-9.0.124.0-release.i386.rpm
yum install nspluginwrapper.{i386,x86_64} pulseaudio-lib.i386
yum install flash-plugin
yum erase rhythmbox.*
mozilla-plugin-config -i -g -v
mozilla-plugin-config nspluginwrapper -i /usr/lib/mozilla/plugins/libflashplayer.so

(Optionally, if you haven&#8217;t [...]]]></description><content:encoded><![CDATA[<p>For those who use Linux as their primary desktop, using splunk can be a chore. Splunk dashboards are built on Flash9. So, you will likely need the following commands (as root, or sudo) to get Flash working.</p>
<ul>
<li>rpm -ivh http://fpdownload.macromedia.com/get/flashplayer/current/flash-plugin-9.0.124.0-release.i386.rpm</li>
<li>yum install nspluginwrapper.{i386,x86_64} pulseaudio-lib.i386</li>
<li>yum install flash-plugin</li>
<li>yum erase rhythmbox.*</li>
<li>mozilla-plugin-config -i -g -v</li>
<li>mozilla-plugin-config nspluginwrapper -i /usr/lib/mozilla/plugins/libflashplayer.so</li>
</ul>
<p>(Optionally, if you haven't imported the Adobe GPG key, you will have to run the following command)</p>
<ul>
<li>rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-adobe-linux</li>
</ul>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/nPZ9ucVfCns" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/mark/2008/05/20/using-splunk-in-fedora9-x86_64/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Inder Sabharwal: Aggregating Metrics from all your Splunks…</title><link>http://feedproxy.google.com/~r/splunkdev/~3/NCfB2xRuNUM/</link><comments>http://blogs.splunk.com/inder/2008/05/15/aggregating-metrics/</comments><pubDate>Fri, 16 May 2008 00:05:31 +0000</pubDate><dc:creator>Inder Sabharwal</dc:creator><description><![CDATA[If you found that the new metrics being generated by Splunk on the input (indexing in many cases) and forwarding side to be useful, I am sure you would want to aggregate them all in a central location. Well, you can do that by using Splunk&#8217;s forwarding mechanism itself! Although, it does not matter where [...]]]></description><content:encoded><![CDATA[<p>If you found that the <a href="http://blogs.splunk.com/inder/2008/05/15/forwarder-and-indexer-metrics/">new metrics being generated</a> by Splunk on the input (indexing in many cases) and forwarding side to be useful, I am sure you would want to aggregate them all in a central location. Well, you can do that by using Splunk's <a href="http://www.splunk.com/doc/3.2.3/admin/ForwardingandReceiving" target="_blank">forwarding</a> mechanism itself! Although, it does not matter where you aggregate these metrics, I believe the <a title="How the deployment server works" href="http://www.splunk.com/doc/3.2.3/admin/HowDeploymentServerWorks" target="_blank">Deployment Server</a> instance could be a good location, if you have one setup for your installation.</p>
<h3>Forwarding metrics.log</h3>
<p>Forwarding <em>metrics.log</em> will require that you make the following changes to the configuration on each Splunk instance that you would like to collect the metrics from:</p>
<li>Edit or create <code>inputs.conf</code> in <code>$SPLUNK_HOME/etc/system/local</code> folder<br />
<blockquote><p>[monitor://$SPLUNK_HOME/var/log/splunk/metrics.log]</p>
<p>_TCP_ROUTING = RouteMetricsToDeploymentServer</p></blockquote>
</li>
<li>Similarly for <code>outputs.conf</code><br />
<blockquote><p>[tcpout]<br />
disabled=false<br />
[tcpout:RouteMetricsToDeploymentServer]<br />
server=&lt;deployment_server_ip&gt;:&lt;deployment_server_port&gt;</p></blockquote>
</li>
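<p>If you are stamping these two files out for several hosts, a few lines of script can render them. This is a sketch under assumptions: the function name <code>metrics_forwarding_conf</code> is invented, the stanzas are copied from the snippets above, and the address and port in the usage line are placeholders.</p>

```python
def metrics_forwarding_conf(server_ip, server_port,
                            group="RouteMetricsToDeploymentServer"):
    """Render the inputs.conf and outputs.conf text shown above.

    Returns a dict mapping file name to file content; writing the
    files under $SPLUNK_HOME/etc/system/local is left to the caller.
    """
    inputs_conf = "\n".join([
        "[monitor://$SPLUNK_HOME/var/log/splunk/metrics.log]",
        f"_TCP_ROUTING = {group}",
    ]) + "\n"
    outputs_conf = "\n".join([
        "[tcpout]",
        "disabled=false",
        f"[tcpout:{group}]",
        f"server={server_ip}:{server_port}",
    ]) + "\n"
    return {"inputs.conf": inputs_conf, "outputs.conf": outputs_conf}


if __name__ == "__main__":
    # Placeholder address and port, not values from the post
    for name, text in metrics_forwarding_conf("10.0.0.5", 9997).items():
        print(f"# {name}\n{text}")
```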
<p>If you have many Splunks in your environment, then making these changes on each one of them manually is certainly not an option you would cherish. This is where Deployment Server can help you centralize all your configurations in one place and distribute them to all or selected instances.</p>
<p>Here's something I like to do</p>
<h3>1. Have all Splunks point to a common Deployment Server</h3>
<p>This can be achieved very easily by creating/editing <code>deployment.conf</code> in <code>$SPLUNK_HOME/etc/system/local</code> on each Splunk instance.</p>
<blockquote><p>[deployment-client]<br />
deploymentServerUri=&lt;your_deployment_server_uri&gt;:&lt;mgmt_port&gt;</p></blockquote>
<p>For some of my distributed testing on EC2, I have images that include this configuration in the default image (AMI). Using this approach guarantees that configurations never ever have to be changed by hand!</p>
<h3>2. Create a bundle</h3>
<p>Create a bundle by any name (I called it <em>deployable</em>) and make sure it is available in your Deployment Server's <code><a href="http://www.splunk.com/doc/3.2.3/admin/ConfigDeploymentServer">serverClassPath</a></code>. This bundle should have two files - inputs.conf and outputs.conf - as described above - <a href="http://blogs.splunk.com/devuploads/2008/05/deployabletar.gz">here's a sample bundle</a> you could re-use.</p>
<h3>3. Make the bundle available to all Splunks</h3>
<p>Make all deployment clients that connect to the deployment server part of the <em>deployable</em> server class. This is achieved by editing deployment.conf on the Deployment Server as follows:</p>
<blockquote><p>[distributedDeployment-classMaps]<br />
*=deployable</p></blockquote>
<h3>4. Refresh Deployment Server Configuration</h3>
<p>This CLI command on your Deployment Server instance will make it aware of the new configuration without a restart:</p>
<blockquote><p>splunk reload deploy-server -auth admin:changeme</p></blockquote>
<p>You are now all set and all Splunks in your environment will automagically download and apply the bundles within a minute! And in another 30 seconds, your Deployment Server will start aggregating metrics information about your <strong>entire data-center</strong>!</p>
<p>We want to hear about your experiences in managing Splunk - use the Comments below or send me an email directly at <a href="mailto:inder@splunk.com">inder@splunk.com</a>.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/NCfB2xRuNUM" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/inder/2008/05/15/aggregating-metrics/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Inder Sabharwal: Forwarder and Indexer Metrics</title><link>http://feedproxy.google.com/~r/splunkdev/~3/ftyQoOh5tO0/</link><comments>http://blogs.splunk.com/inder/2008/05/15/forwarder-and-indexer-metrics/</comments><pubDate>Thu, 15 May 2008 16:36:09 +0000</pubDate><dc:creator>Inder Sabharwal</dc:creator><description><![CDATA[If you were always wondering how much data was being transferred between your forwarders and indexers, we may have some help for you. Splunk now publishes these metrics to metrics.log, which are by default tailed and indexed in  &#8220;_internal&#8221;.
Forwarding-side
Splunk uses a component called TcpOutputProcessor, which is configured using outputs.conf, to forward data to another [...]]]></description><content:encoded><![CDATA[<p>If you were always wondering how much data was being transferred between your forwarders and indexers, we may have some help for you. Splunk now publishes these metrics to metrics.log, which are by default tailed and indexed in  "_internal".</p>
<h3>Forwarding-side</h3>
<p>Splunk uses a component called <span style="font-style: italic" class="Apple-style-span">TcpOutputProcessor</span>, which is configured using outputs.conf, to forward data to another Splunk or non-Splunk entity. This is something that a lot of people also refer to as a <span style="font-style: italic" class="Apple-style-span">forwarder</span>. Each TcpOutputProcessor instance publishes metrics events every 30 seconds. All the fields of these events are described below:</p>
<ul>
<li> <span style="font-weight: bold" class="Apple-style-span">group=tcpout_connections</span> - this field discriminates this event as being a TcpOutput metric.</li>
<li><span style="font-weight: bold" class="Apple-style-span">tcpout_group_name:destIp:destPort</span> - the load-balanced group that this metric belongs to. If you have multiple groups defined, a separate event is published for each of those groups.</li>
<li><span style="font-weight: bold" class="Apple-style-span">host metadata</span> - is always available in an event, and refers to the host on which the forwarder is running.</li>
<li><span style="font-weight: bold" class="Apple-style-span">sourcePort</span> - the local port that is used to connect to the remote entity.</li>
<li><span style="font-weight: bold" class="Apple-style-span">destIp</span> - the ip address of the remote server to which events are being forwarded.</li>
<li><span style="font-weight: bold" class="Apple-style-span">destPort</span> - the destination port on which events are being forwarded.</li>
<li><span style="font-weight: bold" class="Apple-style-span">tcp_bps</span> - bytes per second averaged over last 30 seconds.</li>
<li><span style="font-weight: bold" class="Apple-style-span">tcp_kbprocessed</span> - total KBytes processed since this connection went live.</li>
<li><span style="font-weight: bold" class="Apple-style-span">tcp_eps</span> - events per second averaged over last 30 seconds.</li>
<li><span style="font-weight: bold" class="Apple-style-span">tcp_dropped_events</span> - number of events dropped on this connection.</li>
</ul>
<h3>Indexing side</h3>
<p>Similarly on the indexing side, if you have configured inputs.conf to receive data from one or more forwarders, a metrics event is published every 30 seconds for <span class="Apple-style-span" style="font-style: italic">each</span> connection into your indexer. All the fields of a metrics event on the input side are described below:</p>
<ul>
<li> <span class="Apple-style-span" style="font-weight: bold">group=tcpin_connections</span> - this field discriminates this event as being an input metric.</li>
<li><span class="Apple-style-span" style="font-weight: bold">sourceHost</span> - The hostname of the entity that is forwarding data to this indexer. If the hostname is not available, then its IP address is used.</li>
<li><span class="Apple-style-span" style="font-weight: bold">sourcePort</span> - The remote port of the forwarding entity.</li>
<li><span class="Apple-style-span" style="font-weight: bold">destPort</span> - The local port on the input side for which this metric is being collected. Typically this port is defined in inputs.conf.</li>
<li><span class="Apple-style-span" style="font-weight: bold">tcp_bps</span> - bytes per second averaged over last 30 seconds.</li>
<li><span class="Apple-style-span" style="font-weight: bold">tcp_kprocessed</span> - KBytes processed since the connection was established.</li>
<li><span class="Apple-style-span" style="font-weight: bold">tcp_eps</span> - Events per second averaged over 30 seconds.</li>
</ul>
<p>These metrics give you detailed insight into the operation of your forwarders and indexers. Here's a sample query that you can run on each indexer instance to get a report on throughput by each forwarding entity:</p>
<blockquote><p><code>index=_internal metrics "group=tcpin_connections" | timechart span=30s avg(tcp_bps) by sourceHost</code></p></blockquote>
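<p>To show what that query computes, here is a sketch that does a similar aggregation by hand over raw key=value text. It makes several assumptions: the sample lines in the test are invented rather than copied from a real metrics.log, the function name is made up, and it averages over all the lines it is given instead of bucketing into 30-second spans the way timechart does.</p>

```python
import re
from collections import defaultdict

# key=value pairs, with values running up to the next comma or space
PAIR = re.compile(r"(\w+)=([^,\s]+)")


def avg_bps_by_source(lines):
    """Average tcp_bps per sourceHost over tcpin_connections events."""
    totals = defaultdict(lambda: [0.0, 0])  # host -> [sum, count]
    for line in lines:
        fields = dict(PAIR.findall(line))
        if fields.get("group") != "tcpin_connections":
            continue  # not an input-side metric
        host = fields.get("sourceHost")
        bps = fields.get("tcp_bps")
        if host is None or bps is None:
            continue  # skip events missing the fields we need
        entry = totals[host]
        entry[0] += float(bps)
        entry[1] += 1
    return {host: total / n for host, (total, n) in totals.items()}
```

<p>Splunk does this work for you at search time; the sketch is only meant to make the grouping and averaging explicit.</p>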
<p>Also, I created a <a href="http://www.splunk.com/doc/3.2.3/user/Alerting">saved search</a>, and used Splunk's <a href="http://www.splunk.com/doc/3.2.3/user/ReportGallery">reporting features</a> to always show me the current status on a dashboard.</p>
<p style="margin: 0px; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; font-size: 12px; line-height: normal; font-size-adjust: none; font-stretch: normal"> <img src="http://blogs.splunk.com/devuploads/2008/05/indexer_thruput_by_forwarder.jpg" alt="Index Thruput by Forwarder" height="367" width="919" /></p>
<p>Now that you have all of this nice data, I am sure you would like it all <a href="http://dev.splunk.com/2008/05/15/aggregating-metrics/">aggregated in one location</a>.</p>
<p>Good luck playing with these metrics, and if you have any suggestions on what more you would like to see, drop me a line at <a href="mailto:inder@splunk.com" title="email inder" target="_blank">inder@splunk.com</a>.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/ftyQoOh5tO0" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/inder/2008/05/15/forwarder-and-indexer-metrics/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Matt Green: Did you know that your Active Directory is just a glorified LDAP?</title><link>http://feedproxy.google.com/~r/splunkdev/~3/-LL-BCsf6BI/</link><comments>http://blogs.splunk.com/matt/2008/05/12/did-you-know-that-your-acitve-directory-is-just-a-glorified-ldap/</comments><pubDate>Tue, 13 May 2008 01:19:35 +0000</pubDate><dc:creator>Matt Green</dc:creator><description><![CDATA[Microsoft Tube Surfers,
Wanted to take a minute to talk about authenticating Splunk against Active Directory.  In case you didn&#8217;t know Active Directory is running on top of LDAP.  While the guys up in Redmond do their best to make sure tha you have no need to know LDAP they give you the ability [...]]]></description><content:encoded><![CDATA[<p>Microsoft <a href="http://www.youtube.com/watch?v=9cdbas62oLQ">Tube Surfers</a>,</p>
<p>Wanted to take a minute to talk about authenticating Splunk against Active Directory.  In case you didn't know, Active Directory is running on top of LDAP.  While the guys up in Redmond do their best to make sure that you have no need to know LDAP, they give you the ability to interface with it over LDAP if you know what you're doing.  Let's take this time to let you know what you need to do.</p>
<p>If you are comfortable with the command line you can run the command <a href="http://support.microsoft.com/kb/237677" target="blank">ldifde</a>.  The ldifde command is the Windows equivalent of ldapsearch and should allow you to get an ldif entry for yourself and a group.  With those two entries we should be able to come up with an authentication.conf that will allow Splunk to authenticate users.</p>
<p>For those of you who are more comfortable with a GUI, the Sysinternals team offers a nice utility called <a href="http://technet.microsoft.com/en-us/sysinternals/bb963907.aspx" target="blank">Active Directory Explorer</a>.   This gives you a tree view of your Active Directory/LDAP structure.</p>
<p>The information provided from these utilities is pretty much everything you need to know in order to follow along with the <a href="http://www.splunk.com/doc/latest/admin/AuthLDAP">documentation</a>.  If you are still struggling to get it working, send an email to support@splunk.com with the output from the ldifde command and your authentication.conf and someone from the team will help square you away.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/-LL-BCsf6BI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/matt/2008/05/12/did-you-know-that-your-acitve-directory-is-just-a-glorified-ldap/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Matt Green: Help Me Help You</title><link>http://feedproxy.google.com/~r/splunkdev/~3/L8shsQoiTas/</link><comments>http://blogs.splunk.com/matt/2008/04/30/help-me-help-you/</comments><pubDate>Wed, 30 Apr 2008 23:08:25 +0000</pubDate><dc:creator>Matt Green</dc:creator><description><![CDATA[Peoples of the Interweb,
As one of the Splunk Support Monkeys I am going to try to start a semi-regular series of posts on a topic that is near and dear to me &#8212; getting the Splunk community to be able to troubleshoot their issues without the need to reach out to the Support Team.
The most [...]]]></description><content:encoded><![CDATA[<p>Peoples of the Interweb,</p>
<p>As one of the Splunk Support Monkeys I am going to try to start a semi-regular series of posts on a topic that is near and dear to me: getting the Splunk community to be able to troubleshoot their issues without the need to reach out to the Support Team.</p>
<p>The most important piece of any troubleshooting exercise is getting a solid understanding of the problem.  The common statement "Shit is broke", while 'summarizing' the problem, doesn't do much in the way of isolating the specific problem.  Taking a minute or two to think about the problem and document the sequence of events leading up to it goes a long way toward getting outsiders up to speed on the issue.<br />
Here are a few things to keep in mind when working with support:</p>
<p><span style="font-weight: bold">I don't work in the next cube over.</span></p>
<p>This means I don't have insight into all of the other moving parts of your network.  Try to avoid acronyms that are specific to your organization.  I don't know the naming convention that you use for machine names, so if one box is in LA and the other is in New York, tell me; don't expect me to know that foo.company.com is sitting in the LA data center.</p>
<p><span style="font-weight: bold">Less is not more. </span></p>
<p>You can never give a support engineer too much data. Oftentimes folks think that they have identified the offending error message in the logs and provide that one line in their support ticket.  The problem with this is that the support engineer does not get the benefit of context.  Most errors are the result of a series of events leading up to the final failure.  Being able to see what was going on leading up to the problem is oftentimes what allows us to identify the cause.  The basic rule of thumb is: if you think it would be at all useful, share it.  If I can channel <a href="http://www.youtube.com/watch?v=_RpSv3HjpEw" target="_blank">Don Rumsfeld</a> for a moment: It is easy to know what you know, it is hard to know what you don't know.</p>
<p><span style="font-weight: bold">Reduce the problem to the fewest number of variables possible.</span></p>
<p>Remember your 7th grade Algebra class and those complex equations that Mr Buckner had you solve?  You started off solving for x and then you went back using your knowledge of x to determine the value of y.  The same is true when troubleshooting software.  When you try to solve 4 problems at once you end up polluting your results; you can't tell if the change you made for x resulted in y blowing up.  By breaking the problem into smaller chunks you are operating in a more scientific manner and the results have more credibility.</p>
<p><span style="font-weight: bold">Log like there is no tomorrow. </span></p>
<p><a href="http://www.splunk.com/doc/latest/admin/ContactSupport#Loglevelsandstartingindebugmode" target="_blank">Debug logs</a> are your friend.  In normal operations the logs don't need to be verbose, but when you are trying to figure something out, why not give yourself the benefit of the secret messages that the developer put in the code for precisely this reason?  It is also helpful to push the existing log file out of the way when starting in debug mode.  While I said earlier that you can never give a support engineer too much information, the majority of the stuff in your logs (especially if you've been running for a while) is going to be white noise.  Starting in debug mode with a fresh log means that the problem and only the problem is going to be in the log.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/L8shsQoiTas" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/matt/2008/04/30/help-me-help-you/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Igor Stojanovski: WMI comes to Splunk</title><link>http://feedproxy.google.com/~r/splunkdev/~3/GX46LKFxTrQ/</link><comments>http://blogs.splunk.com/igor/2008/04/29/wmi-comes-to-splunk/</comments><pubDate>Tue, 29 Apr 2008 18:33:42 +0000</pubDate><dc:creator>Igor Stojanovski</dc:creator><description><![CDATA[The Windows release of Splunk Preview debuts with WMI.  So, what is WMI for all you splunkheads out there?  It&#8217;s an OS interface which allows &#8220;instrumented components to provide information and notification&#8221;.  WMI gives you the ability to query system instrumentation data such as system performance, event logs, end countless other events [...]]]></description><content:encoded><![CDATA[<p>The Windows release of <a href="http://www.splunk.com/download">Splunk Preview</a> debuts with WMI.  So, what is WMI for all you splunkheads out there?  It's an OS interface which allows "instrumented components to provide information and notification".  WMI gives you the ability to query system instrumentation data such as system performance, event logs, end countless other events that occur on the system.  It also has the capability of doing this agent-less from remote machines.  The most exciting feature is the ability to do collection of Windows event logs from other machines on your network simultaneously.  A Splunk install is not required on every single node that generates this data, and you don't need to do anything special to facilitate this.  Assuming you've set up proper authentication between the machines, of course.  Setting up proper WMI security is a hot topic on its own.</p>
<p>From the standpoint of configuration and what WMI is capable of doing, in the context of Splunk, WMI can be used in two ways: to pull event logs and to query instrumentation data.  Assuming that you have sufficient credentials to poll event logs agentlessly, you can simply specify the host name and the log file you are interested in.  This is an example of retrieving "Application" event logs from a remote machine named "remotehost":</p>
<pre>[WMI:RemoteApplication]
namespace = \\remotehost\root\cimv2
interval = 10
event_log_file = Application
disabled = 0</pre>
<p>The other aspect of WMI warrants more explanation.  To get data from WMI providers, you query them using WQL (WMI query language), which is a subset of SQL.  Simply specify a query, and all fields returned by the provider will be automatically collated as an event.  (Some queries return multiple results, and hence generate multiple events.)  An example query is <em>select FreeMegabytes from Win32_PerfFormattedData_PerfDisk_LogicalDisk</em>, which polls free disk space from all logical disk partitions on the system.</p>
<p>This is an example config setup that gets runtime information for all running processes on a local machine every 30 seconds:</p>
<pre>[WMI:LocalAllProcesses]
namespace = \\.\root\cimv2
interval = 30
wql = select * from Win32_PerfFormattedData_PerfProc_Process
disabled = 0</pre>
<p>With this you can easily chart memory usage by process.</p>
<p><a href='http://dev.splunk.com/2008/04/29/wmi-comes-to-splunk/wmi-memory-usage-by-process-name/' rel='attachment wp-att-398' title='WMI Memory Usage by Process Name'><img size="50%" src='http://blogs.splunk.com/devuploads/2008/04/wmi_mem_usage.png' alt='WMI Memory Usage by Process Name' /></a></p>
<p>The default install of the preview includes several preset performance queries.  If you look at <em>%SPLUNK_HOME%\etc\system\default\wmi.conf</em>, you will find three default config stanzas.  To see a list of everything that is available for querying, google for "WMI classes" and browse the MSDN documentation.  There is tons of stuff that you can splunk, including detailed memory usage, network utilization, disk usage, and detailed process runtime information.  Also, take a look at the <a href="http://www.splunk.com/doc/preview/PreviewWMI">WMI documentation</a>.</p>
<p>Happy Splunking with WMI.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/GX46LKFxTrQ" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/igor/2008/04/29/wmi-comes-to-splunk/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Ledio Ago: Splunk Windows Registry Monitor</title><link>http://feedproxy.google.com/~r/splunkdev/~3/onfl9omlBJ8/</link><comments>http://blogs.splunk.com/ledio/2008/04/28/splunk-windows-registry-monitor/</comments><pubDate>Mon, 28 Apr 2008 22:53:48 +0000</pubDate><dc:creator>Ledio Ago</dc:creator><description><![CDATA[Hey everyone, just wanted to let you know that a preview release of Splunk just left the docks.
http://www.splunk.com/index.php/preview
I want to introduce to you one of the latest features for Windows Splunk - the monitoring of the Windows registry in real time for activity/events, and the indexing and searching of these events with Splunk.
While working on this we had [...]]]></description><content:encoded><![CDATA[<p>Hey everyone, just wanted to let you know that a preview release of Splunk just left the docks.</p>
<p><a href="http://www.splunk.com/index.php/preview">http://www.splunk.com/index.php/preview</a></p>
<p>I want to introduce to you one of the latest features for Windows Splunk - the monitoring of the Windows registry in real time for activity/events, and the indexing and searching of these events with Splunk.</p>
<p>While working on this we had a few challenges:</p>
<p>First, there aren't any published win32 APIs that do this in user mode.  The best that you can do with the win32 API is to poll the registry for certain registry keys/hives, and you'll be notified when a key or subkey of the hive has been changed.  Even when you get a notification for a change, you will not be told exactly which key has changed; you'll have to figure that out yourself.</p>
<p>Second, scalability.  You can't possibly poll all of the registry in user mode for changes.  There are simply too many keys to query.</p>
<p>The solution is to write a device driver that hooks into the kernel and intercepts all registry events.  The driver bubbles the events up to user mode for filtering and tagging, and finally pipes them to Splunk for indexing.  Obviously, this driver needs to be very stable and reliable, and it needs to scale to the point where, if you want to monitor all of the events in the registry, it can handle the load.</p>
<p>With this preview release we launched the first version of the splunk-regmon tool.  The tool writes events to standard output, and Splunk's ExecProcessor (popen) picks up these events and sends them through the indexing pipeline.  Basic filtering is in place, hard coded for now to monitor only registry events related to changes - i.e. Create, Delete, Set, etc.  Create type events are represented by the "CreateKey" reg_event field, Delete by "DeleteKey", and all of the Set events, e.g. SetValueKey, are represented by the "SetKey" reg_event field.  In our next release this filtering will be configurable.</p>
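<p>To make the event-type mapping concrete, here is a small Python 3 sketch of that kind of prefix-based filter.  The real filter lives inside splunk-regmon and is not shown in this post; the function names and every raw operation name other than SetValueKey are hypothetical, chosen only to exercise the mapping.</p>

```python
def classify(operation):
    """Map a raw registry operation name to a reg_event value, or None to drop it."""
    if operation.startswith("Create"):
        return "CreateKey"
    if operation.startswith("Delete"):
        return "DeleteKey"
    if operation.startswith("Set"):
        return "SetKey"
    # Anything else (reads, opens, closes) is not a change and is filtered out.
    return None


def filter_events(operations):
    """Keep only change events, each tagged with its reg_event field."""
    tagged = []
    for operation in operations:
        reg_event = classify(operation)
        if reg_event is not None:
            tagged.append({"operation": operation, "reg_event": reg_event})
    return tagged


if __name__ == "__main__":
    # Hypothetical stream of raw operations as a driver might report them.
    raw = ["SetValueKey", "QueryValueKey", "CreateKey", "DeleteValueKey"]
    for event in filter_events(raw):
        print(event["operation"], "reg_event=" + event["reg_event"])
```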
<p>Here is what a windows registry event looks like with Splunk:</p>
<p><a href='http://blogs.splunk.com/devuploads/2008/04/registry_event1.jpg' title='Registry Event'><img src='http://blogs.splunk.com/devuploads/2008/04/registry_event1.jpg' alt='Registry Event' /></a></p>
<p>Drop us a note and let us know what you think of this new feature and any concerns you may have, or ideas of how we can make it better.<br />
How would you use it, and how would it be useful to you?</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/onfl9omlBJ8" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/ledio/2008/04/28/splunk-windows-registry-monitor/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Matt Green: On the off chance you need help with Windows</title><link>http://feedproxy.google.com/~r/splunkdev/~3/zV6tikmz3n0/</link><comments>http://blogs.splunk.com/matt/2008/04/24/on-the-off-chance-you-need-help-with-windows/</comments><pubDate>Thu, 24 Apr 2008 20:47:38 +0000</pubDate><dc:creator>Matt Green</dc:creator><description><![CDATA[Hello Internets,
As one of the splunkers responsible for answering the phone I&#8217;m going to use this space to talk about something near and dear to my heart &#8212; empowering my customers so they are able to figure out their own problems thereby allowing me to read FARK all day long.
Since we recently released our Windows version [...]]]></description><content:encoded><![CDATA[<p>Hello Internets,</p>
<p>As one of the splunkers responsible for answering the phone I'm going to use this space to talk about something near and dear to my heart  -  empowering my customers so they are able to figure out their own problems thereby allowing me to read FARK all day long.</p>
<p>Since we recently released our Windows version a bunch of the folks in the office have been trying to figure out how to do the things they do in a UNIX environment (like wget a file) in Windows.  I've been sharing some of my favorite Windows resources here at the office and figured the rest of you would probably like to know about them as well.</p>
<p><strong><a href="http://www.google.com/microsoft" target="blank">Google</a></strong><br />
Everyone seems to start here when they are looking for something.  Most however don't know that http://www.google.com/microsoft will restrict your search to Windows sites.  They also have these search sites for linux, bsd, and the mac.</p>
<p><strong><a href="http://technet.microsoft.com/en-us/sysinternals/default.aspx" target="blank">SysInternals</a></strong><br />
Mark and Bryce have created the ultimate collection of free Windows utilities.  Simple executables that let you get at so many of the diagnostic/monitoring things that a UNIX admin takes for granted.  Some of my favorites (and especially useful in working with Splunk) in no particular order:</p>
<ul>
<li><a href="http://technet.microsoft.com/en-us/sysinternals/bb897332.aspx" target="blank">AccessEnum</a><br />
Lets you see who has access to what.  This is really helpful when trying to figure out why Splunk isn't indexing one of your files.</li>
<li><a href="http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx" target="blank">Process Monitor</a><br />
Watch the registry, running process/thread/DLL, and file system usage in real-time</li>
<li><a href="http://technet.microsoft.com/en-us/sysinternals/bb896649.aspx" target="blank">PS Tools</a><br />
A bunch of command-line utilities for listing the processes running, working with the event log, rebooting the machine, etc.</li>
<li><a href="http://technet.microsoft.com/en-us/sysinternals/bb963907.aspx" target="blank">Active Directory Explorer</a><br />
Advanced viewer/editor for Active Directory.  This will be a godsend if you are trying to configure Splunk to authenticate against your domain controller</li>
<li><a href="http://technet.microsoft.com/en-us/sysinternals/bb897435.aspx" target="blank">WhoIS</a><br />
Doesn't do much in the way of troubleshooting Splunk, but who doesn't want to be able to see if ultramegaextrmeme.com is available and if not who the lucky owner is?  BTW it is available.</li>
<li><a href="http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx" target="blank">TCPView for Windows</a><br />
Lets you see all the TCP and UDP endpoints on your system, including the local and remote addresses and state of TCP connections.</li>
</ul>
<p>Hope that helps you guys out.  All of you experienced Windows folks, if you've got others out there, post them in the comments.  If my jaw hits the desk when I click the link I will send you a Splunk <a href="http://www.flickr.com/photos/64249409@N00/2248376055">koozie</a>.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/zV6tikmz3n0" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/matt/2008/04/24/on-the-off-chance-you-need-help-with-windows/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Splunk for Virtualization</title><link>http://feedproxy.google.com/~r/splunkdev/~3/PvPVR36Kdnc/</link><comments>http://blogs.splunk.com/erik/2008/03/27/splunk-for-virtualization/</comments><pubDate>Thu, 27 Mar 2008 21:14:54 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[I&#8217;m looking for some help.
I&#8217;ve built a VMWare app for splunk and in the process of doing the same for Xen. These Apps use the VMWare and Xensource API&#8217;s to index everything about the VM environment. When combined with splunk instances running within the guest OS you get a very comprehensive historical picture. I&#8217;m curious [...]]]></description><content:encoded><![CDATA[<p>I'm looking for some help.<br />
I've built a VMWare app for splunk and am in the process of doing the same for Xen. These apps use the VMWare and Xensource APIs to index everything about the VM environment. When combined with splunk instances running within the guest OS you get a very comprehensive historical picture. I'm curious: are there any splunk customers out there using VMWare or Xen? I'm looking for use cases so that I better understand how to configure the apps. I'd be curious to know what types of information would be useful to capture and what types of searches one would want to perform. Both Xen and VMWare have so much data available that configuration could be complicated. I'm trying to narrow it down to several useful out-of-the-box configurations. If you have any thoughts, comment here or email me at erik at splunk dot com.</p>
<p>Thanks<br />
e.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/PvPVR36Kdnc" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2008/03/27/splunk-for-virtualization/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: The Splunk Python client library (part 1)</title><link>http://feedproxy.google.com/~r/splunkdev/~3/imPll30uyro/</link><comments>http://blogs.splunk.com/johnvey/2008/03/26/the-splunk-python-client-library-part-1/</comments><pubDate>Wed, 26 Mar 2008 22:40:23 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[
Splunk 3.2 introduces a publicly available Python client library that allows external developers to programmatically interact with Splunk by importing a few key modules.


The easiest way to get started with the client library is to get into Splunk&#8217;s Python environment.  Locate your Splunk install directory (/opt/splunk by default), and start the python interactive shell [...]]]></description><content:encoded><![CDATA[<p>
Splunk 3.2 introduces a publicly available Python client library that allows external developers to programmatically interact with Splunk by importing a few key modules.
</p>
<p>
The easiest way to get started with the client library is to get into Splunk's Python environment.  Locate your Splunk install directory (<code>/opt/splunk</code> by default), and start the python interactive shell that comes with Splunk:
</p>
<p><code># bin/splunk cmd python<br />
</code></p>
<p>
This will launch the interactive Python prompt, which starts off looking like this:
</p>
<p><code>Python 2.5.1 (r251:54863, Nov 18 2007, 16:13:41)<br />
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin<br />
Type "help", "copyright", "credits" or "license" for more information.<br />
&gt;&gt;&gt;<br />
</code></p>
<h2>Starting a search</h2>
<p>Import the Splunk modules:
</p>
<p><code>import splunk.auth<br />
import splunk.search as se<br />
</code></p>
<p>
If you have installed Splunk with the default settings, then your hostpath is <code>https://localhost:8089</code>.  The client library knows this default, so you can authenticate directly by providing a username and password:
</p>
<p><code>key = splunk.auth.getSessionKey('admin','changeme')<br />
</code></p>
<p>
The getSessionKey method automatically caches the session key in the current interactive session, so you don't have to pass it along to subsequent methods.  In a production implementation, or if you are connecting to multiple servers, you'll need to keep track of separate session keys.
</p>
<p>
If your server is on a different hostname or port, then you need to first update the session defaults:
</p>
<p><code>splunk.mergeHostPath('splunk_hostname:12000', True)<br />
key = splunk.auth.getSessionKey('admin','changeme')<br />
</code></p>
<p>
The <code>mergeHostPath</code> method takes host information in many different forms:
</p>
<ul>
<li>hostname</li>
<li>hostname:port</li>
<li>https://hostname</li>
<li>http://hostname:port</li>
</ul>
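<p>To show what accepting all of these forms involves, here is a minimal Python 3 sketch that normalizes each one to a full <code>scheme://host:port</code> string.  It is not the library's implementation of <code>mergeHostPath</code>; the <code>normalize_host_path</code> name is hypothetical, and falling back to <code>https</code> and port 8089 is an assumption based on the default hostpath mentioned above.</p>

```python
from urllib.parse import urlsplit

DEFAULT_SCHEME = "https"  # assumed default, from https://localhost:8089
DEFAULT_PORT = 8089       # assumed default management port


def normalize_host_path(value):
    """Expand a partial host description into scheme://host:port."""
    # urlsplit only recognizes the host when a scheme is present,
    # so prepend the default scheme to bare "host" or "host:port" input.
    if "://" not in value:
        value = DEFAULT_SCHEME + "://" + value
    parts = urlsplit(value)
    port = parts.port or DEFAULT_PORT
    return "%s://%s:%d" % (parts.scheme, parts.hostname, port)


if __name__ == "__main__":
    for form in ("hostname", "hostname:12000", "https://hostname", "http://hostname:12000"):
        print(form, "becomes", normalize_host_path(form))
```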
<p>
Next, start a search:
</p>
<p><code>job = se.dispatch('search error')<br />
</code></p>
<p>
This creates a search job handle object <code>job</code> and starts a running search on the server for events that contain the term "error". If you are connecting to multiple servers, then you'll also need to provide the <code>hostPath</code> and <code>sessionKey</code> parameters. This handle is keyed off of the search job ID that is generated by the server, and is available via:
</p>
<p><code>job.id<br />
</code></p>
<p>
With this ID, you can always use your web browser to check on the status of a particular job by opening up:
</p>
<p><code>https://localhost:8089/services/search/jobs/12345</code></p>
<p>where 12345 is the ID that you just generated.</p>
<p>
There are a few properties on the SearchJob object that will be of immediate use:
</p>
<ul>
<li><code>job.isDone</code> - a boolean value that indicates if the search has completed</li>
<li><code>job.count</code> - the number of events that have been matched against the search</li>
<li><code>job.cursorTime</code> - the current position of the search cursor; when dispatching a search, the cursor moves in a reverse chronological order</li>
</ul>
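<p>These properties are enough to wait for a search to finish.  The Python 3 sketch below polls <code>isDone</code> with a timeout and then returns <code>count</code>.  The <code>wait_for_job</code> helper and the <code>FakeJob</code> stand-in are hypothetical and exist only so the example runs without a Splunk server; with the real library you would pass in the object returned by <code>se.dispatch</code>.</p>

```python
import time


def wait_for_job(job, poll_seconds=0.5, timeout=60.0):
    """Block until job.isDone is true, then return job.count."""
    deadline = time.monotonic() + timeout
    while not job.isDone:
        if time.monotonic() > deadline:
            raise TimeoutError("search job did not finish in time")
        time.sleep(poll_seconds)
    return job.count


class FakeJob:
    """Stand-in for a search job handle; reports done after a few polls."""

    def __init__(self, polls_until_done, count):
        self._remaining = polls_until_done
        self.count = count

    @property
    def isDone(self):
        # Each read of isDone counts as one poll of the server.
        self._remaining = max(0, self._remaining - 1)
        return self._remaining == 0


if __name__ == "__main__":
    job = FakeJob(polls_until_done=3, count=42)
    print("matched events:", wait_for_job(job, poll_seconds=0.1))
```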
<h2>Working with search results</h2>
<p>The raw events are the original event data that were indexed by Splunk, according to the data input rules.  They are available as an iterable container object:
</p>
<p><code>job.events<br />
</code></p>
<p>This object works just like a list, and you can iterate and slice it to obtain events.  The events are stored in reverse chronological order.
</p>
<p><code>for x in job.events:<br />
    print x<br />
</code></p>
<p>
This code will iterate over every event returned in the search and print out the raw text, which could be every event in your index if you so choose.  The iterator will begin returning data as soon as it receives the first event, and will continue until the <code>isDone</code> property is <code>True</code>.
</p>
<p>You can also retrieve specific rows of data using the standard python slice operator:
</p>
<p><code>job.events[2] # returns the 3rd event in the search results<br />
job.events[2:10]	# returns events 3 through 10 as a list<br />
job.events[-1]		# returns the last event in the results<br />
</code></p>
<p>
The items returned by iterating or slicing are actually <code>Result</code> objects that have additional properties:
</p>
<ul>
<li><code>job.events[0].raw</code> - the raw event text (the same value as <code>print job.events[0]</code>)</li>
<li><code>job.events[0].time</code> - the event timestamp, as a <a href="http://docs.python.org/lib/datetime-datetime.html">datetime.datetime</a> object</li>
<li><code>job.events[0].fields</code> - a dict of all the fields associated with the event</li>
</ul>
<p>
For example if you wanted to see the <code>host</code> field for an event:
</p>
<p><code>job.events[0].fields['host']<br />
</code></p>
<p>Or if you wanted to see all of the host entries for each event:</p>
<p><code>for x in job.events:<br />
    print x.fields['host']<br />
</code></p>
<p>Or alternatively, in shorthand:</p>
<p><code>for x in job.events:<br />
    print x['host']<br />
</code></p>
<p>If you want to print out a human-readable timestamp for events that came from the 'firewall' sourcetype:</p>
<p><code>for x in job.events:<br />
    if x['sourcetype'] == 'firewall':<br />
         print x.time.ctime()<br />
</code></p>
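<p>The same access pattern extends naturally to simple aggregation.  Here is a Python 3 sketch that tallies events per host.  The <code>Result</code> class below is only a stand-in that mimics the <code>raw</code>, <code>time</code>, and <code>fields</code> properties described above so the example runs on its own, and the sample events are made up.</p>

```python
from collections import Counter
from datetime import datetime


class Result:
    """Stand-in for the library's Result object (raw, time, fields)."""

    def __init__(self, raw, time, fields):
        self.raw = raw
        self.time = time
        self.fields = fields

    def __getitem__(self, name):
        # Shorthand field access, mirroring x['host'] in the loops above.
        return self.fields[name]

    def __str__(self):
        return self.raw


def count_by_field(events, field):
    """Tally how many events carry each value of the given field."""
    return Counter(event[field] for event in events)


# Made-up events standing in for job.events.
events = [
    Result("error: disk full", datetime(2008, 3, 26, 12, 0, 0),
           {"host": "web01", "sourcetype": "syslog"}),
    Result("error: denied tcp", datetime(2008, 3, 26, 12, 0, 5),
           {"host": "fw01", "sourcetype": "firewall"}),
    Result("error: timeout", datetime(2008, 3, 26, 12, 0, 9),
           {"host": "web01", "sourcetype": "syslog"}),
]

if __name__ == "__main__":
    for host, total in count_by_field(events, "host").most_common():
        print(host, total)
```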
<p>
When you are finished with the search job, remove it from the server by calling:
</p>
<p><code>job.cancel()<br />
</code></p>
<p>Otherwise, the job will persist on disk until the specified timeout (TTL), which is 24 hours by default.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/imPll30uyro" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2008/03/26/the-splunk-python-client-library-part-1/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: Using the Atom Feed Format in Enterprise Software</title><link>http://feedproxy.google.com/~r/splunkdev/~3/3BKSmWpro4M/</link><comments>http://blogs.splunk.com/johnvey/2008/03/06/using-the-atom-feed-format-in-enterprise-software/</comments><pubDate>Thu, 06 Mar 2008 23:32:45 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[
XML is a great format for exchanging information because it balances readability, extensibility, and compatibility across heterogeneous environments.  However, its flexibility is also a disadvantage because it is far too easy to create a proprietary XML schema,  resulting in lots of custom code to interface with various systems.  Lots of custom code [...]]]></description><content:encoded><![CDATA[<p>
XML is a great format for exchanging information because it balances readability, extensibility, and compatibility across heterogeneous environments.  However, its flexibility is also a disadvantage because it is far too easy to create a proprietary XML schema,  resulting in lots of custom code to interface with various systems.  Lots of custom code leads to brittleness, and brittleness leads to frustration.  The key to salvation lies in standardization.
</p>
<p>
Enter the <a href="http://atomenabled.org/">Atom standard</a>: a standards-track schema that defines a generic collection/item container format in XML.  Most people equate Atom to an RSS competitor, which is true, but that only covers half of what it does.  The <a href="http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-09.html">Atom Publishing Protocol</a> is a well-defined protocol for performing CRUD (Create, Read, Update, Delete) operations on items over HTTP.  The <a href="http://atompub.org/2005/01/10/draft-ietf-atompub-format-04.html">Atom Syndication Format</a>, which is the most commonly used portion, defines the XML schema used to deliver data during a Read operation.  Atom was spearheaded by Sam Ruby, and is now backed by people like Brad Fitzpatrick, Tim Bray, Jeremy Zawodny, Mark Pilgrim, and is heavily implemented by Google.
</p>
<p>
Like most software systems, the majority of Splunk's internal entities can be loosely viewed as a collection of similar items.  The requested searches, configuration information, saved searches, users, roles  -  all just collections.  So instead of creating a separate XML schema for each of these five collections that perfectly describes its contents, I chose Atom to serve as a single generic container to describe all of the entities.  This kind of reuse is echoed by <a href="http://blogs.msdn.com/pathelland/">Pat Helland</a> of Amazon, who gives a great talk on relating the rise of the industrial age to standardization, and Tim Bray (Mr. XML himself), who <a href="http://www.tbray.org/ongoing/When/200x/2006/01/09/On-XML-Language-Design">advocates against creating your own XML</a> unless absolutely necessary.
</p>
<p>
The benefit of sticking to a standard is that there is a much greater chance that external developers already know exactly how to consume your data with very little work.  Not only are language-level <a href="http://atomenabled.org/everyone/atomenabled/index.php?c=7">Atom parsers available everywhere</a>, but entire applications have been specifically built to consume Atom.  For instance, here's a screenshot of the <a href="http://www.newsfirerss.com/">NewsFire</a> feed reader displaying all of the searches that exist on my local Splunk server:
</p>
<p><img src="http://farm3.static.flickr.com/2392/2315314656_4c381a489c_o.png" alt="search jobs in a feed reader" /></p>
<p>
All I had to do was to supply a URI and login to NewsFire, and then it took care of the rest.  No XSLT, XPath, or custom DOM iteration necessary; it just works.  As far as I know, Splunk is one of a handful of enterprise companies that have integrated Atom at such a core level.  Hopefully, for you it means that there is one less bucket of tag soup you have to deal with, and one better product that you enjoy using.</p>
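<p>For developers who want the same data in code rather than in a feed reader, the generic container pays off just as well.  Below is a minimal Python 3 sketch that lists the entries of any Atom feed using only the standard library.  The sample feed is hand-written for illustration (it is not captured Splunk output), and <code>list_entries</code> is a hypothetical helper name.</p>

```python
import xml.etree.ElementTree as ET

# The Atom 1.0 namespace; every element in a feed lives under it.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample feed, shaped like a list of search jobs.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>jobs</title>
  <entry>
    <title>search error</title>
    <id>https://localhost:8089/services/search/jobs/1234</id>
  </entry>
  <entry>
    <title>search warning</title>
    <id>https://localhost:8089/services/search/jobs/1235</id>
  </entry>
</feed>
""".strip()


def list_entries(feed_xml):
    """Return (title, id) for every entry in an Atom feed document."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        entries.append((title, entry_id))
    return entries


if __name__ == "__main__":
    for title, entry_id in list_entries(SAMPLE_FEED):
        print(title, entry_id)
```

<p>Nothing in <code>list_entries</code> is specific to search jobs, so the same function would work for any other collection exposed as Atom.</p>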
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/3BKSmWpro4M" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2008/03/06/using-the-atom-feed-format-in-enterprise-software/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Jason Gatt: Splunk Replay: Search results in motion</title><link>http://feedproxy.google.com/~r/splunkdev/~3/FFGJ1T1fzP4/</link><comments>http://blogs.splunk.com/jgatt/2008/03/06/splunk-replay-search-results-in-motion/</comments><pubDate>Thu, 06 Mar 2008 20:48:12 +0000</pubDate><dc:creator>Jason Gatt</dc:creator><description><![CDATA[Inspired by <a href="http://fudgie.org">glTail.rb</a> and <a href="http://labs.digg.com/stack">Digg Lab's Stack</a>, <a href="http://www.splunklabs.com/replay/replay.html">Splunk Replay</a> is an animated data visualization that "replays" search results as a simulated event stream.  The simulation displays events at a rate proportional to the times at which the events originally occurred.

Each event is represented by a single square particle that flows from its place in a legend of values to its corresponding position in a stacked column chart.  Upon landing in the column chart, one of the event's fields is output in a readable format below the chart.  Both the legend of values and the stacked column chart retain the order of their values according to a configurable comparator and truncate older values to make space for new ones.  Rolling your mouse over any column displays the field values for that column.

<a href='http://dev.splunk.com/wp-content/uploads/2008/03/replay.mov' title='replay1.mov'><img style="color: #666" width="410" src="http://dev.splunk.com/wp-content/uploads/2008/03/replay.jpg" height="310" /></a>]]></description><content:encoded><![CDATA[<p>Inspired by <a href="http://fudgie.org">glTail.rb</a> and <a href="http://labs.digg.com/stack">Digg Lab's Stack</a>, <a href="http://www.splunklabs.com/replay/replay.html">Splunk Replay</a> is an animated data visualization that "replays" search results as a simulated event stream.  The application displays events at a rate proportional to the times at which the events originally occurred.</p>
<p>Each event is represented by a single square particle that flows from its place in a legend of values to its corresponding position in a stacked column chart.  Upon landing in the column chart, one of the event's fields is output in a readable format below the chart.  Both the legend of values and the stacked column chart retain the order of their values according to a configurable comparator and truncate older values to make space for new ones.  Rolling your mouse over any column displays the field values for that column.</p>
<p><a href='http://blogs.splunk.com/devuploads/2008/03/replay.mov' title='replay1.mov'><img style="color: #666" width="410" src="http://blogs.splunk.com/devuploads/2008/03/replay.jpg" height="310" /></a></p>
<p>Replay currently consumes csv files and is configurable through an xml file.  The <a href="http://www.splunklabs.com/replay/replay.html">current demo</a> charts twikipage edits split by twikiuser (both sorted alphabetically) and outputs truncated raw events below the chart.  The simulated event stream is running at a rate 2000 times real time.</p>
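<p>The timing idea is simple enough to sketch: the gap between two replayed events is the original gap divided by the speed-up factor.  The Python 3 snippet below is an illustration only, not Replay's actual code; the <code>replay_delays</code> name is made up, and it assumes timestamps are given in seconds.</p>

```python
def replay_delays(timestamps, speedup=2000.0):
    """Return the wait in seconds before each event when replaying at speedup x real time."""
    delays = []
    previous = None
    for ts in sorted(timestamps):
        # The first event is shown immediately; later ones keep their relative spacing.
        gap = 0.0 if previous is None else ts - previous
        delays.append(gap / speedup)
        previous = ts
    return delays


if __name__ == "__main__":
    # Three events spread over 100 minutes of real time.
    print(replay_delays([0, 2000, 6000]))
```

<p>At a factor of 2000, an original gap of 2000 seconds (a little over half an hour) becomes a one-second pause in the animation.</p>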
<p>I'm currently working on getting Replay hooked directly to Splunk's API and building out interface elements so that it can be configured visually. </p>
<p>You can check out the wiki page on Replay over at Splunk's <a href="http://code.google.com/p/splunk-flash">developers wiki</a>.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/FFGJ1T1fzP4" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/jgatt/2008/03/06/splunk-replay-search-results-in-motion/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: Exploring Splunk’s REST API</title><link>http://feedproxy.google.com/~r/splunkdev/~3/5SgADQOwGAU/</link><comments>http://blogs.splunk.com/johnvey/2008/03/03/exploring-splunks-rest-api/</comments><pubDate>Mon, 03 Mar 2008 20:15:45 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[
Splunk 3.2 is available for download!  This release is one of our biggest so far, representing a tremendous amount of effort by our engineering team, and is a product that I&#8217;m proud to stand behind.  As I mentioned in my last post about our push for the Splunk Platform, a central tenet is [...]]]></description><content:encoded><![CDATA[<p>
Splunk 3.2 is available for download!  This release is one of our biggest so far, representing a tremendous amount of effort by our engineering team, and is a product that I'm proud to stand behind.  As I mentioned in my last post about our push for the <a href="http://dev.splunk.com/2008/01/31/standing-on-our-own-platform/">Splunk Platform</a>, a central tenet is to make a compelling product that developers will not only understand, but also enjoy using.  While Dr. LogLogic rambles on about how <a href="http://chuvakin.blogspot.com/2008/02/welcome-to-platform-club.html">catering to developers sucks</a>, we know that developers are a huge part of our user base (drop by the #splunk channel on EFNet sometime) and we will continue to make Splunk as flexible and extensible as possible.
</p>
<p>
With 3.2, we have begun moving some of Splunk's core services over to a proper REST API.  Now, for those of you who have already been using the REST API in 3.1, the new API in 3.2 and beyond is distinctly different, and is intended to replace any older versions.  Therefore, the REST API of version 3.1 and before will now be referred to as the UI API, and the term "REST API" will refer to the new API that I'm covering in this post.
</p>
<p>
Before I dive into the details though, I'd like to clarify the usage of "REST" and what I mean when I speak of it.  First of all, REST is <strong>not</strong> a protocol or standard.  There is no RFC or ISO specification on what constitutes REST; it is a philosophy about the relationship between entities in a software system and the interface to interact with those entities.  Roy Fielding's <a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm">original thesis</a> named it <em>Representational State Transfer</em>, which when put into practice means that URIs should convey meaning in a durable manner.  In essence, REST emphasizes the "what" of a system rather than the "how".  In comparison, SOAP interfaces are based on codified standards that dictate the communication protocol.  Lots more information on REST can be <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">found on Wikipedia</a> or in book form as <a href="http://www.amazon.com/gp/product/0596529260?ie=UTF8&amp;tag=bookson-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0596529260">RESTful Web Services</a> by Leonard Richardson.
</p>
<h3>The New Search API</h3>
<p>
Splunk's new search interface allows for multiple searches to be scheduled concurrently, and for the results to be retrieved asynchronously.  Assuming that you've installed Splunk using the default settings, you can see all of your search jobs by pointing your browser to:
</p>
<p><code>https://localhost:8089/services/search/jobs</code></p>
<p>This returns an <a href="http://atomenabled.org/">Atom feed</a> of all the search jobs present in the server.  Each job has an ID, and so the URI for the Atom entry of a search job of ID=1234 can be found at:
</p>
<p><code>https://localhost:8089/services/search/jobs/1234</code></p>
<p>
Following that RESTian schema, each facet of a search job can be found as a sub-endpoint as well.  For instance, the events, results, timeline, and summary data for each search can be found at:
</p>
<p><code>https://localhost:8089/services/search/jobs/1234/events<br />
https://localhost:8089/services/search/jobs/1234/results<br />
https://localhost:8089/services/search/jobs/1234/timeline<br />
https://localhost:8089/services/search/jobs/1234/summary<br />
</code></p>
<p>
Each of those endpoints returns data in XML format by default, but can be switched over to JSON or raw text format.
</p>
<p>
The key to implementing a successful REST API lies in using the HTTP protocol to its fullest potential.  Instead of adding a new search via something like <code>/search/add_search</code>, we simply POST to the parent <code>/services/search/jobs</code> endpoint.  Instead of adding an extra <code>/search/delete_search</code> endpoint to delete a job, you issue an HTTP DELETE command directly on the <code>/services/search/jobs/1234</code> endpoint.  By treating each endpoint as a direct entity mapping, we simplify comprehension and dramatically reduce the total number of discrete endpoints.
</p>
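<p>In client code, "one endpoint, several verbs" looks roughly like the Python 3 sketch below, which builds (but does not send) the two requests using the standard library.  Authentication is left out, and the <code>search</code> form field name in the POST body is an assumption for illustration; check the API documentation for the exact parameters.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

# Default endpoint from this post; adjust host and port for your install.
JOBS_ENDPOINT = "https://localhost:8089/services/search/jobs"


def create_job_request(search):
    """Build a POST to the parent collection to start a new search job."""
    # Assumed body format: a form-encoded field named "search".
    body = urlencode({"search": search}).encode("utf-8")
    return Request(JOBS_ENDPOINT, data=body, method="POST")


def delete_job_request(job_id):
    """Build a DELETE against the job's own URI to remove it."""
    return Request("%s/%s" % (JOBS_ENDPOINT, job_id), method="DELETE")


if __name__ == "__main__":
    for request in (create_job_request("search error"), delete_job_request(1234)):
        print(request.get_method(), request.full_url)
```

<p>Passing either object to <code>urllib.request.urlopen</code> would actually send it; that step is omitted here because it needs a running server and a session key.</p>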
<h3>The configuration API</h3>
<p>The same model applies to our configuration system as well.  Splunk stores its configuration in <em>conf</em>-style text files, using traditional stanza-separated key/value pairs.  For example, the <code>server.conf</code> looks like:
</p>
<p><code>[httpServer]<br />
atomFeedStylesheet = /static/atom.xsl<br />
max-age = 3600<br />
follow-symlinks = false<br />
</code></p>
<p>
To access this file from the API, you would first browse to:
</p>
<p><code>https://localhost:8089/services/properties/<strong>server</strong></code></p>
<p>
This endpoint returns an Atom feed of all of the stanzas contained in the file.  To view all of the key/value pairs in the <code>[httpServer]</code> stanza, browse to:
</p>
<p><code>https://localhost:8089/services/properties/server/<strong>httpServer</strong></code></p>
<p>
To read a single key value like <em>max-age</em>, browse to:
</p>
<p><code>https://localhost:8089/services/properties/server/httpServer/<strong>max-age</strong></code></p>
<p>
To change that value, issue an HTTP PUT to the same endpoint.  To add a new key, issue a POST to the stanza-level endpoint, or issue a PUT directly onto the new key name.
</p>
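<p>The mapping from file, stanza, and key to a URI and verb is mechanical, so it is easy to capture in a helper.  The Python 3 sketch below does just that; <code>properties_url</code> and <code>plan</code> are hypothetical names, and the sketch only computes the method and URI without sending anything or guessing at the request body format.</p>

```python
from urllib.parse import quote

# Default endpoint from this post; adjust host and port for your install.
PROPERTIES_BASE = "https://localhost:8089/services/properties"


def properties_url(conf_file, stanza=None, key=None):
    """Build the URI for a conf file, one of its stanzas, or a single key."""
    parts = [conf_file]
    if stanza is not None:
        parts.append(stanza)
        if key is not None:
            parts.append(key)
    # Percent-encode each segment so unusual names cannot break the path.
    return PROPERTIES_BASE + "/" + "/".join(quote(part, safe="") for part in parts)


def plan(action, conf_file, stanza, key):
    """Return (HTTP method, URI) for reading, changing, or adding a key."""
    if action == "read":
        return ("GET", properties_url(conf_file, stanza, key))
    if action == "change":
        return ("PUT", properties_url(conf_file, stanza, key))
    if action == "add":
        # New keys are POSTed to the stanza-level endpoint.
        return ("POST", properties_url(conf_file, stanza))
    raise ValueError("unknown action: %r" % (action,))


if __name__ == "__main__":
    for action in ("read", "change", "add"):
        print(action, plan(action, "server", "httpServer", "max-age"))
```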
<p>
The advantages of exposing everything via HTTP are obvious when it comes to integration and remote management.  Every modern programming environment speaks HTTP, which means you can programmatically interact with Splunk from wherever you want.  Everyone also uses a web browser, which means that probing the API is as easy as browsing the web.
</p>
<p>
Even with a simple API, there's no reason for developers to recreate a language-specific library to access Splunk so we're working on releasing a few downloadable libraries for use in Python, .NET, Java, and Perl.  Check the <a href="http://code.google.com/p/splunk-labs/">Splunk Labs</a> page for more information about those.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/5SgADQOwGAU" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2008/03/03/exploring-splunks-rest-api/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Mark Cohen: Splunk2LCD : Display your Alerts on an LCD</title><link>http://feedproxy.google.com/~r/splunkdev/~3/FkV5L6tc0Xg/</link><comments>http://blogs.splunk.com/mark/2008/02/22/splunk2lcd-display-your-alerts-on-an-lcd/</comments><pubDate>Sat, 23 Feb 2008 01:08:18 +0000</pubDate><dc:creator>Mark Cohen</dc:creator><description><![CDATA[This morning I got a nice little LCD from <a href="http://www.crystalfontz.com" target="_blank">Crystalfontz</a> that allows me to connect to it via the open source project <a href="http://www.lcdproc.org" target="_blank">lcdproc</a>.  After a bit of compiling and installing, LCDproc (which runs natively on linux, darwin (osx) and most other unix distros) connects to any serial, parallel or USB LCD device. In this case, the Crystalfontz LCD is  4 line by 20 character display.

<a href="http://dev.splunk.com/wp-content/uploads/2008/02/p1000621.JPG" title="Splunk2LCD"><img src="http://dev.splunk.com/wp-content/uploads/2008/02/p1000621.JPG" alt="Splunk2LCD" /></a>

Once configured and connected, you start the server and accept connections.

I then grabbed the IO-LCDproc Perl module and modified it to display to the LCDproc server. You can get IO-LCDproc through CPAN.
]]></description><content:encoded><![CDATA[<p>This morning I got a nice little LCD from <a href="http://www.crystalfontz.com" target="_blank">Crystalfontz</a> that allows me to connect to it via the open source project <a href="http://www.lcdproc.org" target="_blank">lcdproc</a>.  After a bit of compiling and installing, LCDproc (which runs natively on linux, darwin (osx) and most other unix distros) connects to any serial, parallel or USB LCD device. In this case, the Crystalfontz LCD is a 4-line by 20-character display.</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/02/p1000621.JPG" title="Splunk2LCD"><img src="http://blogs.splunk.com/devuploads/2008/02/p1000621.JPG" alt="Splunk2LCD" /></a></p>
<p>Once configured and connected, you start the server and accept connections.</p>
<p>I then grabbed the IO-LCDproc Perl module and modified it to display to the LCDproc server. You can get IO-LCDproc through CPAN.</p>
<p>Here is the code that would go in your $SPLUNK_HOME/bin/scripts directory:</p>
<p>[source:perl]<br />
#!/usr/bin/perl -w<br />
use IO::LCDproc;<br />
use IO::Socket;<br />
use strict;</p>
<p># Show the usage text and exit when called without arguments<br />
&amp;usage if (! $ARGV[0]);</p>
<p># Client connection to the LCDproc server<br />
my $client = IO::LCDproc::Client-&gt;new(host =&gt; "localhost", name =&gt; "MYNAME", port =&gt; "13666");</p>
<p>my $screen = IO::LCDproc::Screen-&gt;new(name =&gt; "screen");</p>
<p># One title widget plus one string widget for each remaining LCD line<br />
my $title = IO::LCDproc::Widget-&gt;new( name =&gt; "date", type =&gt; "title");</p>
<p>my $first = IO::LCDproc::Widget-&gt;new(<br />
name =&gt; "first", align =&gt; "center", type =&gt; "string", xPos =&gt; 1, yPos =&gt; 2,<br />
data =&gt; "test");<br />
my $second = IO::LCDproc::Widget-&gt;new(<br />
name =&gt; "second", align =&gt; "center", type =&gt; "string", xPos =&gt; 1, yPos =&gt; 3<br />
);<br />
my $third = IO::LCDproc::Widget-&gt;new(<br />
name =&gt; "third", align =&gt; "center", type =&gt; "string", xPos =&gt; 1, yPos =&gt; 4<br />
);</p>
<p>$client-&gt;add ( $screen );<br />
$screen-&gt;add ( $title, $first, $second, $third );<br />
$client-&gt;connect() or die "Cannot Connect: $!";<br />
$client-&gt;initialize();</p>
<p># Fill the widgets from the arguments passed to the alert script<br />
$title-&gt;set( data =&gt; "Splunk2LCD" );<br />
$first-&gt;set( data =&gt; "$ARGV[1]" );<br />
$second-&gt;set( data =&gt; "$ARGV[4]" );<br />
$third-&gt;set( data =&gt; "$ARGV[5]" );</p>
<p># Keep the text on the display for a few seconds, then exit cleanly<br />
sleep 5;<br />
exit 0;</p>
<p>sub usage {<br />
  print &lt;&lt;USAGE;<br />
  LCDproc Client for Splunk<br />
  Mark Cohen<br />
  Usage: ./splunk2lcd2.pl ARGV0 ARGV1 ARGV2<br />
USAGE<br />
  exit 1;<br />
}<br />
[/source]</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/FkV5L6tc0Xg" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/mark/2008/02/22/splunk2lcd-display-your-alerts-on-an-lcd/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kord Campbell: You want a platform?  We got your platform right here, buddy.</title><link>http://feedproxy.google.com/~r/splunkdev/~3/SNrJICDm-1Y/</link><comments>http://blogs.splunk.com/kordless/2008/02/22/you-want-a-platform-we-got-your-platform-right-here-buddy/</comments><pubDate>Fri, 22 Feb 2008 23:35:35 +0000</pubDate><dc:creator>Kord Campbell</dc:creator><description><![CDATA[There has been a lot of talk about the Splunk Platform of late, but what exactly does it mean when we say we have a platform?  I figured this would be an interesting question to spring upon unsuspecting members of the development team, and here's what they (and I) had for our answers:

[qt:/wp-content/uploads/2008/02/splunk_as_platform_large.mov /wp-content/uploads/2008/02/what_is_poster.mov 625 368]

Browsing over on Wikipedia, <a href="http://en.wikipedia.org/wiki/Platform_%28computing%29">one excerpt</a> states that <em>"a platform describes some sort of hardware architecture or software framework"</em>, and the description for a <a href="http://en.wikipedia.org/wiki/Software_framework">software framework</a>, says it <em>"may include support programs, code libraries, a scripting language, or other software to help develop and glue together the different components of a software project"</em>.]]></description><content:encoded><![CDATA[<p>There has been a lot of talk about the Splunk Platform of late, but what exactly does it mean when we say we have a platform?  I figured this would be an interesting question to spring upon unsuspecting members of the development team, and here's what they (and I) had for our answers:</p>
<div id="vvq4a57e09859eba" class="vvqbox vvqquicktime" style="width:400px;height:300px;"><a href="http://blogs.splunk.com/devuploads/2008/02/splunk_as_platform_large.mov">http://blogs.splunk.com/devuploads/2008/02/splunk_as_platform_large.mov</a></div>
<p>Browsing over on Wikipedia, <a href="http://en.wikipedia.org/wiki/Platform_%28computing%29">one excerpt</a> states that <em>"a platform describes some sort of hardware architecture or software framework"</em>, and the description for a <a href="http://en.wikipedia.org/wiki/Software_framework">software framework</a>, says it <em>"may include support programs, code libraries, a scripting language, or other software to help develop and glue together the different components of a software project"</em>.</p>
<p>A platform can be considered as a type of framework - one which helps developers write software faster by a) giving them the tools to develop against it, and b) transparently dealing with the under-the-hood, nitty-gritty work necessary when dealing with difficult problems.  Difficult problems like indexing and searching gigabytes upon gigabytes of event data, for example.</p>
<p>Well, that's exactly what the Splunk Platform does for developers.  It provides resources, examples, and SDKs for developing a variety of applications around the robust Splunk engine, and it provides a launching point for domain specific development, from availability and security, to business intelligence and compliance.</p>
<p>BTW, this isn't something we're just waving our hands around about and saying "look at this white paper, isn't this a nice idea?".  Nope.  Platform is here, and it's here today, with links to real code, real content, and real resources for the developers looking to write the next great idea.</p>
<p>Here's how to get started writing your first application against the Splunk Platform:</p>
<ol>
<li>1. <a href="http://www.splunk.com/index.php/preview">Download a Preview</a> of Splunk that has a brand new shiny REST-based API built into it.</li>
<li>2. Head on over to the <a href="http://code.google.com/p/splunk-labs/">developer's wiki</a> and start digging around in the API howtos.</li>
<li>3. Download the new <a href="http://code.google.com/p/splunk-net-sdk/">.NET SDK</a> from its Google Code project page.</li>
<li>4. Join any of the projects and start contributing code/content.</li>
<li>5. Join the new <a href="http://groups.google.com/group/splunk-labs">Splunk Labs</a> list and start interacting (asynchronously) with our developers.</li>
<li>6. Hop on #splunk on IRC and chat with us in real time.</li>
</ol>
<p>We'll be continuing to add content and resources to the Platform effort, and we encourage your participation in the development community as it forms.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/SNrJICDm-1Y" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kordless/2008/02/22/you-want-a-platform-we-got-your-platform-right-here-buddy/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Ledion Bitincka: Delimiter base KV extraction - advanced</title><link>http://feedproxy.google.com/~r/splunkdev/~3/tdq09BTk4K8/</link><comments>http://blogs.splunk.com/lbitincka/2008/02/22/delimiter-base-kv-extraction-advanced/</comments><pubDate>Fri, 22 Feb 2008 23:30:06 +0000</pubDate><dc:creator>Ledion Bitincka</dc:creator><description><![CDATA[If you&#8217;ve read my previous post on delimiter based KV extraction, you might be wondering whether you could do more with it (Anonymous Coward did). Well, yes you can, and I am going to cover the &#8220;advanced&#8221; cases here. Before covering the capabilities, as in other posts, I will first go over some observations and examples.
Observations
1. [...]]]></description><content:encoded><![CDATA[<p>If you've read my previous post on <a href="http://dev.splunk.com/2008/02/12/delimiter-based-key-value-pair-extraction/">delimiter based KV extraction</a>, you might be wondering whether you could do more with it (Anonymous Coward did). Well, yes you can, and I am going to cover the "advanced" cases here. Before covering the capabilities, as in other posts, I will first go over some observations and examples.</p>
<p><u>Observations</u><br />
1. Header-body. Some applications, for various reasons, format their log files using a header section and a body section. The header usually describes how the fields are organized in each logged event, while the body consists of the logged events, usually one per line, with field values delimited as described in the header. W3C and CSV formats come to mind (see the data examples).<br />
2. Single-delimiter. Other applications use one and the same delimiter both to separate keys from values and to separate one key-value pair from the next. This is not very common, but it has been observed in the field.</p>
<p><u>Data Examples</u><br />
The following header-body sample, as you can probably guess, is from an Exchange server. The header section contains, among other things, the list of field names. These names are separated by the same delimiter that separates the values in the body section, in this case a tab character (even though our blogging platform chooses to mangle tabs to spaces - gotta love it !!!).<br />
<code><br />
# Message Tracking Log File<br />
# Exchange System Attendant Version 6.5.7638.1<br />
# Fields: time	client-ip	cs-method	sc-status<br />
14:13:11	10.1.1.9	HELO	250<br />
14:13:13	10.1.1.9	MAIL	250<br />
14:13:19	10.1.1.9	RCPT	250<br />
14:13:29	10.1.1.9	DATA	250<br />
14:13:31	10.1.1.9	QUIT	240<br />
</code></p>
<p>The following example shows how a single delimiter can be used to list fields. It is pretty easy for us, as humans, to recognize the key-value pairs:<br />
<code><br />
"url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"<br />
</code></p>
<p><u>Enabling header-body kv/extract </u><br />
The delimiter based KV extraction solves the header-body problem by adding the capability to assign field names to extracted values by doing single-level tokenization/splitting (ie single delimiter) instead of the normal two-layered one described <a href="http://dev.splunk.com/2008/02/12/delimiter-based-key-value-pair-extraction/">earlier</a>. Unfortunately, however, this is <strong>only</strong> available through transforms.conf* and it requires manual specification of the field names (no automatic field name detection). To this end, we introduce another transforms.conf configuration variable, defined as follows:<br />
<code><br />
FIELDS = &lt;quoted string comma/space separated list&gt;<br />
- List of names to associate with each extracted field value. The first entry is associated with the first<br />
field value, the second with the second value and so on...<br />
 <br />
Example from above data:<br />
FIELDS= "time", "client-ip", "cs-method", "sc-status"<br />
</code></p>
<p>Thus, to enable header-body KV extraction one needs to specify <u>one delimiter and a list of fields to attach to each extracted value</u>. Let's walk through the MS Exchange sample data: (1) we know the field delimiter is the tab character, and (2) the field list, in the correct order, is in the header of the file, so all we have to do is quote the field names. The configuration stanza in transforms.conf should thus look like this:<br />
<code><br />
....transforms.conf....<br />
[exchange]<br />
DELIMS = "\t"<br />
FIELDS  = "time", "client-ip", "cs-method", "sc-status"<br />
</code></p>
<p>To apply this transformation you can then run <em>".... | extract exchange reload=t auto=f| ...."</em>. There's no need to restart the server after editing transforms.conf as long as "reload=t" is specified in extract (btw, auto=f turns off automatic KV extraction).</p>
<p>The results of this transformation, on one of the events, would then be:<br />
<code><br />
"14:13:11	10.1.1.9	HELO	250"<br />
 <br />
time=14:13:11<br />
client_ip=10.1.1.9<br />
cs_method=HELO<br />
sc_status=250<br />
</code></p>
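<p>To make the mechanics concrete, here is a minimal Python sketch of what header-body extraction boils down to: one split on the delimiter, then pairing the values with the configured names in order. This is an illustration only, not Splunk's implementation. The function name is made up, and the rule that turns "client-ip" into "client_ip" is an assumption based on the output shown above.</p>

```python
import re

def extract_header_body(event, delim, fields):
    """Split an event on a single delimiter and name the values in order."""
    values = event.split(delim)
    # Assumed cleanup rule: anything other than letters, digits, or "_"
    # in a field name becomes "_" (so "client-ip" turns into "client_ip").
    names = [re.sub(r"[^A-Za-z0-9_]", "_", name) for name in fields]
    # zip() stops at the shorter list, so surplus values or names are ignored.
    return dict(zip(names, values))

FIELDS = ["time", "client-ip", "cs-method", "sc-status"]
event = "14:13:11\t10.1.1.9\tHELO\t250"
print(extract_header_body(event, "\t", FIELDS))
```

<p>The first name is attached to the first value, the second to the second, and so on, which is why the order of the FIELDS list matters.</p>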
<p>Easy huh!? Try it in your data, we'd love to hear back ......</p>
<p>*The reason this is only available through the configuration file is the amount of configuration information needed.</p>
<p><u>Enabling single-delimiter kv/extract </u><br />
There's yet another trick in the delimiter KV extraction - the single-delimiter extraction. Single-delimiter extraction pairs extracted field values into key=value as follows: value1=value2, value3=value4 and so on... To enable this extraction via the command line, set <em>kvdelim</em> and <em>pairdelim</em> to the same value. For the above example data, the <em>extract</em> command should look as follows:<br />
<code><br />
.... | extract kvdelim=" " pairdelim=" " auto=f | ....<br />
</code></p>
<p>To enable single-delimiter extraction via transforms.conf you can specify either one delimiter or two identical delimiters in the <em>DELIMS</em> config variable. The following two transforms.conf stanzas are therefore equivalent to each other and to the above command:<br />
<code><br />
....transforms.conf....<br />
[single-delim-1]<br />
DELIMS = " "<br />
 <br />
[single-delim-2]<br />
DELIMS = " ", " "<br />
</code></p>
<p>The results of these extractions for our sample data would be:<br />
<code><br />
"url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"<br />
 <br />
url=http://splunk.com<br />
referer=http://dev.splunk.com<br />
ip=10.10.10.10<br />
</code></p>
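<p>The pairing rule (first token is a key, second its value, third the next key, and so on) can be sketched in a few lines of Python. Again, this is only an illustration of the idea under simple assumptions (no quoting, function name made up), not Splunk's implementation.</p>

```python
def extract_single_delim(event, delim=" "):
    """Pair up tokens: token 1 is a key, token 2 its value, and so on."""
    # Drop empty tokens produced by repeated delimiters.
    tokens = [t for t in event.split(delim) if t]
    # Walk the tokens two at a time; a trailing unpaired token is dropped.
    return {tokens[i]: tokens[i + 1] for i in range(0, len(tokens) - 1, 2)}

event = "url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"
print(extract_single_delim(event))
```

<p>Note that a single missing value shifts every later pair, which is one reason this format is fragile in practice.</p>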
<p>NOTE: do not specify a FIELDS variable for the single-delimiter extraction because that will enable header-body extraction.</p>
<p>Thoughts, questions, ideas, and comments are always welcome....</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/tdq09BTk4K8" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/lbitincka/2008/02/22/delimiter-base-kv-extraction-advanced/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Ledion Bitincka: Delimiter based key-value pair extraction</title><link>http://feedproxy.google.com/~r/splunkdev/~3/d2creUmD-6o/</link><comments>http://blogs.splunk.com/lbitincka/2008/02/12/delimiter-based-key-value-pair-extraction/</comments><pubDate>Tue, 12 Feb 2008 20:26:28 +0000</pubDate><dc:creator>Ledion Bitincka</dc:creator><description><![CDATA[As described in my previous post, key-value pair extraction (or more generally structure extraction) is a crucial first step to further data analysis. While automatic extraction is highly desirable, we believe empowering our users with tools to apply their domain knowledge is equally important. To this end, this post introduces one of the simplest forms [...]]]></description><content:encoded><![CDATA[<p>As described in my previous <a href="http://dev.splunk.com/2008/01/18/key-value-pair-extraction-definition-examples-and-solutions/">post</a>, key-value pair extraction (or more generally structure extraction) is a crucial first step to further data analysis. While automatic extraction is highly desirable, we believe empowering our users with tools to apply their domain knowledge is equally important. To this end, this post introduces one of the simplest forms of key-value pair extractions (KV-extraction) - delimiter based extraction. </p>
<p><u>Observation</u></p>
<p>Most logged events contain a list of key-value pairs (e.g. attribute lists, method call values etc) in a context-dependent, well-defined format. An example of a well-defined format: "key-value pairs are separated from each other using ';' while the key is separated from the value using '='". More generally, well-defined attribute listing formats are not confined to logging; they show up in any event-driven application where the attribute order is flexible, e.g. URL GET parameter lists, HTTP request/response headers, email headers etc... In most applications the delimiters are single characters that are unlikely to be part of a key or value. Whenever a key or value does contain one of the delimiters, it is normally enclosed in literal-defining characters, usually double quotes (").</p>
<p><u>Definition: delimiter based KV extraction</u><br />
Let's first define three character classes:<br />
1. [pairdelim] - non-empty list of characters used to separate <strong>key value pairs from each other</strong>. (chars after value, before next key)<br />
2. [kvdelim]   - non-empty list of characters used to separate <strong>the key from the value</strong>. (chars after key, before next value)<br />
3. [quoter]     - list of characters used to enclose a literal - currently *only* quotes are supported and this variable is not configurable</p>
<p>Thus we can formally define a key-value pair list as follows:<br />
<code><br />
kvlist  = &lt;key&gt;[kvdelim]&lt;value&gt;([pairdelim]&lt;key&gt;[kvdelim]&lt;value&gt;)*<br />
key     = &lt;string&gt;|&lt;quoter&gt;&lt;string&gt;&lt;quoter&gt;<br />
value  = &lt;string&gt;|&lt;quoter&gt;&lt;string&gt;&lt;quoter&gt;<br />
quoter = "<br />
</code></p>
<p>Thus, delimiter KV-extraction can be achieved by a two layer tokenization/splitting process:<br />
1. Split on the pair delimiter to extract candidate KV pairs<br />
2. Split on key-value delimiter to separate key from value</p>
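<p>The two-step split above can be sketched in Python roughly as follows. This only illustrates the idea and is not Splunk's implementation: the function name <code>extract_kv</code> and the sample string are made up, and quoted keys or values are not handled.</p>

```python
import re

def extract_kv(data, pairdelim, kvdelim):
    """Two-layer split: first into candidate pairs, then into key and value."""
    # Each delimiter argument is a list of characters, so build character classes.
    pair_splitter = re.compile("[" + re.escape(pairdelim) + "]+")
    kv_splitter = re.compile("[" + re.escape(kvdelim) + "]")
    fields = {}
    # Layer 1: split on the pair delimiters to get candidate pairs.
    for candidate in pair_splitter.split(data):
        # Layer 2: split each candidate once on a key-value delimiter.
        parts = kv_splitter.split(candidate, maxsplit=1)
        # Keep only candidates that really contain a key and a delimiter.
        if len(parts) == 2 and parts[0]:
            fields[parts[0]] = parts[1]
    return fields

print(extract_kv("user=alice;action=login;status=ok", ";", "="))
```

<p>Candidates without a key-value delimiter are simply skipped, so leading text that is not part of the pair list does no harm.</p>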
<p><u>Examples:</u><br />
1. URL - the following example shows how to find the delimiters for the parameters listed in the query part of a URL. Note that since the query starts after the '?' character, the '?' qualifies as a key-value pair delimiter because it comes right before the first key.<br />
<code><br />
data = "http://usasearch.gov/search?input-form=firstgov&amp;v%3Aproject=firstgov&amp;query=splunk+it&amp;affiliate=uspto&amp;x=0&amp;y=0"<br />
-------------<br />
pairdelim is "?&amp;"  - # parameters in the query are separated by '&amp;', the query starts after '?'<br />
kvdelim is "="   - #  variable names are separated from their values using '='<br />
</code></p>
<p>2. HTTP request headers:<br />
<code><br />
data = "GET / HTTP/1.1<br />
Host: dev.splunk.com<br />
Connection: close<br />
User-Agent: Web-sniffer/1.0.25 (+http://web-sniffer.net/)<br />
Accept-Encoding: gzip<br />
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5<br />
Accept-Language: en-us,en;q=0.5<br />
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7<br />
Referer: http://web-sniffer.net/"<br />
-------------<br />
pairdelim is "\r\n"<br />
kvdelim is ":"<br />
</code></p>
<p>That was easy. Why don't you try it on the following data?<br />
Note: data_hard is all one line; however, our blogging software sucks at displaying long lines.<br />
<code><br />
data_easy = "May 4 14:47:28 gwrk1 sshd(pam_unix)[4572]: 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=test.abc.net"<br />
data_hard = "loc=544078|action=encrypt|i/f_dir=inbound|i/f_name=eth4c0|__policy_id_tag=product=VPN-1 &amp; FireWall-1[db_tag={xxxx-6123-40CA-XXXX-9620355xxxxx};date=1190818929;policy_name=NYC-BellSouth NY]|src=10.100.0.50|dst=10.104.0.21|proto=icmp|rule=9|scheme:=IKE"<br />
</code></p>
<p><u>Delimiter based KV extraction as part of kv/extract command</u><br />
OK, great! Now that you know what delimiter based KV extraction is and how to find the list of characters that are used as pair delimiters (pairdelim) and key-value delimiters (kvdelim), let's look at how to instruct Splunk to perform this type of KV extraction. All you need to do is add the delimiters as arguments to kv/extract, as follows:<br />
<code><br />
..... | kv pairdelim="?&amp;" kvdelim="=" | .....<br />
or<br />
..... | extract pairdelim="?&amp;" kvdelim="=" | .....<br />
</code></p>
<p><u>Configuration for automated delimiter based KV-extraction</u><br />
transforms.conf is the key-value extraction configuration file. Delimiter-based KV extraction adds another configuration variable to the transforms.conf vocabulary called DELIMS - yes you guessed right this is where we'll specify the pairdelim and the kvdelim. The format of DELIMS is as follows:</p>
<p><code><br />
DELIMS = &lt;pairdelim&gt;, &lt;kvdelim&gt;<br />
</code></p>
<p>Example:<br />
<code><br />
in ...bundles/local/transforms.conf<br />
.....<br />
# this is equivalent to ..|kv pairdelim="?&amp;" kvdelim="=" |...<br />
[my_extraction]<br />
DELIMS = "?&amp;", "="<br />
.....<br />
</code></p>
<p>You can then use the newly created transform just like any other transform. To remind the forgetful, you can do:<br />
1. ..... | kv my_extraction |....<br />
2. Automatically run "my_extraction" based on source/sourcetype/host with a REPORT-* config variable in props.conf</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/d2creUmD-6o" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/lbitincka/2008/02/12/delimiter-based-key-value-pair-extraction/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Rob Das: The SSL Performance Odyssey</title><link>http://feedproxy.google.com/~r/splunkdev/~3/50oSK3KPQEI/</link><comments>http://blogs.splunk.com/rob/2008/02/04/the-ssl-performance-odyssey/</comments><pubDate>Mon, 04 Feb 2008 20:14:37 +0000</pubDate><dc:creator>Rob Das</dc:creator><description><![CDATA[When you come to dev.splunk.com, you see pictures of beer pong, full bars, stuffed ponies with fart machines taped to their ass, etc - basically engineers gone wild.  Somewhere between all of this insaneness, we actually find the time to write code and solve problems like this one. This post is all about a crazy-weird [...]]]></description><content:encoded><![CDATA[<h3>When you come to dev.splunk.com, you see pictures of beer pong, full bars, stuffed ponies with fart machines taped to their ass, etc - basically engineers gone wild.  Somewhere between all of this insaneness, we actually find the time to write code and solve problems like this one. This post is all about a crazy-weird performance issue that we were experiencing, how it manifested itself and ultimately how it was fixed.</h3>
<p>I suspect others may be having this problem, as the problem lives in some <em><strong>very</strong></em> popular open source code as far as I can tell.   With that, I'll begin telling you about my journey into hell.</p>
<p>Splunk has a homegrown embedded HTTP(S) server that serves up all external interfaces to the 'splunkd' daemon.   We use it as the core engine for our REST and XML/RPC-like APIs.  The GUI and the CLI both end up talking to the daemon via this server.</p>
<p>When I wrote the core of it a few months ago, I ran some rudimentary performance tests on several platforms and it seemed decent enough for our use, but a week ago, the manager of the Search and Indexing team (Stephen) said that he was seeing <em>abysmal </em>performance using SSL.  He said that the GUI performance was being impacted.  I didn't believe him and insisted that it was something else and that he was high.</p>
<p>So to prove to him that it wasn't <em>my </em>server, or <em>my </em>problem, like all engineers do, I gave him a small Python script that hits the server in a tight loop and we checked the performance.  It sucked.  Continuing with the theme of "this isn't my problem" - I told him it was probably the handler of the request that was doing something that made the server seem slow.  This is when he laughed at me and said "watch this":  He proceeds to turn off SSL, re-run the same test and the performance of the server goes up by approximately 50X.  <strong>50 times faster!</strong>    I know that SSL is slower than non-encrypted streams, but there was no way this was the problem.  Whoa!  We can't ship this way.  This needs to be fixed!</p>
<p>In fact, a very small HTTP request (approx. 80 bytes)  with a small reply (approx. 300 bytes) was operating at only 23 requests/sec!  When he turned off SSL, he was getting over 1000 req/sec!  What???</p>
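<p>For reference, a tight-loop measurement of this kind can be as small as the Python sketch below. It is not the script used for these numbers; the function names are made up, and <code>fake_request</code> is a placeholder where a real test would perform one HTTPS round trip to the server.</p>

```python
import time

def requests_per_second(send_request, count=200):
    """Call send_request() count times back to back and return the rate."""
    start = time.perf_counter()
    for _ in range(count):
        send_request()
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

def fake_request():
    # Placeholder for one request/reply round trip to the server under test.
    time.sleep(0.001)

print("%.0f req/sec" % requests_per_second(fake_request, count=50))
```

<p>Running the same loop with SSL on and off, and with the client and server on different machines, is what produces the kind of comparison described in this post.</p>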
<p>So, of course I tried the same test on my OSX laptop and I got 130+ req/sec - within the realm of reasonable and certainly better than 24.  I then tried running the server on my laptop and the client on my Linux Fedora machine resulting in basically the same performance.   Why does this work on my hardware and not his?</p>
<p>Finally, I switched the server and client by putting the server on my Linux box and the client on my Mac.  I re-ran the test and damned if the performance didn't completely suck!  I was getting 20 or so request-replies per second over SSL.</p>
<p><strong>  But, why does the OS matter?  I didn't get it.</strong></p>
<h3>My SSL Performance Bug Diary</h3>
<ul>
<li>Broke out ssldump.  Here is a snippet from an OSX client and a Linux server.  Note the third C-&gt;S line of .0398 seconds.  This is the cause of the slowdown, but why?</li>
</ul>
<p><a href="http://blogs.splunk.com/devuploads/2008/02/ssldumpold.jpg" title="SSL Dump Slow"><img src="http://blogs.splunk.com/devuploads/2008/02/ssldumpold.jpg" alt="SSL Dump Slow" /></a></p>
<ul>
<li>Spent 2 hours looking over every possible OpenSSL build option and tried turning various ones on and off.  No difference.  (score: Bug 1, Rob 0)</li>
<li>Spent many hours trying different crypto combinations.  Little difference beyond the obvious and documented performance differences.  (score: Bug 2, Rob 0)</li>
<li>Perhaps I need to throw in server-side SSL caching.  I throw it in, with the assumption that the python client implements client-side SSL caching.  No performance change.  (score:  Bug 3, Rob 0)</li>
<li>Thinking it might be the Nagle algorithm, I modify my test to send larger requests and guess what?  The performance is normal again!   I try to find out exactly when it turns from slow to fast (as far as the request size) by trying request sizes of 1, 2, 4, 8, 16, 32........16K bytes.  Wow, just around 1300-1400 bytes is where the performance goes from sucks to fast.  Look at the graph below.  See the spike?  Hmmm..... (score: Bug 3, Rob 1)</li>
</ul>
<p><a href="http://blogs.splunk.com/devuploads/2008/02/mtuspike1.jpg" title="mtuspike1.jpg"><img src="http://blogs.splunk.com/devuploads/2008/02/mtuspike1.jpg" alt="mtuspike1.jpg" /></a></p>
<ul>
<li>I change the MTU on the server from the default of 1500 bytes to 1000 bytes. The performance cliff now is lowered to somewhere in the 800-900 byte range. The MTU is the key! (score: Bug 3, Rob 2)</li>
<li>It's got to be the Nagle algorithm.  I try turning off the Nagle algorithm on the server.  No performance change.  (score: Bug 4, Rob 2)</li>
<li>I give the problem to our performance engineer.  He can reproduce it.  I suck.</li>
<li>Decide to try ssldump again and this time try a different test - curl sending the same size request as in the python test.  I want to compare timings.  BINGO.  It's not the server, it's a combination of the server running on Linux and <em><strong>Python.</strong></em> (score: Bug 5, Rob 3).  Notice in the following curl ssldump image the single C-&gt;S line and the fast .0007 second timing.  Contrast this with the previous ssldump image and herein lies the problem:</li>
</ul>
<p><a href="http://blogs.splunk.com/devuploads/2008/02/ssldumpcurl.jpg" title="ssldump curl"><img src="http://blogs.splunk.com/devuploads/2008/02/ssldumpcurl.jpg" alt="ssldump curl" /></a></p>
<ul>
<li>Now to fix it.  It really really seems like Python is the problem.  I try it with urllib2.  Same thing.</li>
<li>I try it with httplib2.  Same thing.</li>
<li>I look at the code for urllib2 and httplib2 and guess what?  They both use httplib.  The problem must be in httplib.  I dig into the code and start commenting shit out and looking at the resulting ssldump output to figure out *exactly* which write is causing the damage.  I find the bug. (score:  Rob wins)</li>
</ul>
<h3>The Problem and the Fix</h3>
<p>I forgot to tell you that we are using Python 2.5.  It turns out that <em>httplib.py</em> sends requests over the wire in 2 chunks.  The first chunk is comprised of the HTTP headers.  The second chunk is the body.  The fix I made appends the body to the headers and sends the request in 1 chunk only.  This is what curl does and this fixes the performance problems.</p>
<p>Here is the fix for download:</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/02/httplib.py" title="httplib.py">httplib.py</a></p>
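<p>The shape of the change can be sketched without touching httplib at all. The Python below is not the patched <em>httplib.py</em>; it uses a made-up stand-in socket that just records writes, a hypothetical request path, and helper names invented for this example. It only shows the difference between sending headers and body as two writes and sending them as one.</p>

```python
class RecordingSocket:
    """Stand-in for a connected socket that records each write."""
    def __init__(self):
        self.writes = []

    def sendall(self, data):
        self.writes.append(data)

def build_headers(method, path, host, body):
    """Assemble a minimal HTTP/1.1 header block ending in a blank line."""
    lines = [
        "%s %s HTTP/1.1" % (method, path),
        "Host: %s" % host,
        "Content-Length: %d" % len(body),
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def send_two_chunks(sock, headers, body):
    # Behaviour described above: headers and body go out as separate writes.
    sock.sendall(headers)
    sock.sendall(body)

def send_one_chunk(sock, headers, body):
    # The fix: append the body to the headers and write once.
    sock.sendall(headers + body)

body = b"q=example"
headers = build_headers("POST", "/example/path", "localhost:8089", body)
slow, fast = RecordingSocket(), RecordingSocket()
send_two_chunks(slow, headers, body)
send_one_chunk(fast, headers, body)
print(len(slow.writes), len(fast.writes))  # prints "2 1"
```

<p>The bytes on the wire are identical in both cases; only the number of writes handed to the SSL layer differs.</p>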
<p>Here is my final data:</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/02/fullgraph.jpg" title="fullgraph.jpg"><img src="http://blogs.splunk.com/devuploads/2008/02/fullgraph.jpg" alt="fullgraph.jpg" /></a></p>
<h3>Things I Still don't Understand</h3>
<p>Because it seems to work and this took so damn long, I am not going to do any further investigations, but there are still many unsolved mysteries. Perhaps one of you can figure them out.</p>
<ul>
<li>Why the extreme falloff on Linux where both the client and server are on the same machine at 16K request/reply size?</li>
<li>Why is OSX so much slower than Linux?</li>
<li>Why does the new code speed up Linux only?</li>
<li>Why does only the OSX box get the speedup at the MTU, while the Linux box stays slow regardless of the MTU?</li>
</ul>
<h3>Windows to Linux Performance Numbers (added 2/5/08)</h3>
<p>So I added a Windows to Linux graph based on the first comment I received below.  Yes, we do test with Windows, and yes, it is not out yet (but will be soon).  The problem manifests itself exactly like it does on other platforms.  Notice the difference:</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/02/windows-linux.jpg" title="windows-linux.jpg"><img src="http://blogs.splunk.com/devuploads/2008/02/windows-linux.jpg" alt="windows-linux.jpg" /></a></p>
<h3>Specs on the Test Hardware</h3>
<ul>
<li>Windows
<ul>
<li>Dual Core, very fast, lots of RAM (will provide detailed specs in a bit)</li>
</ul>
</li>
<li>Linux:
<ul>
<li>2.6.11-1.1369_FC4smp</li>
<li>3.4GHz P4, Hyperthreaded, 2G RAM</li>
</ul>
</li>
<li>OSX
<ul>
<li>Mac Pro Laptop</li>
<li>1.8Ghz Pentium Core II duo (2 cores), 3G Ram</li>
</ul>
</li>
</ul>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/50oSK3KPQEI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/rob/2008/02/04/the-ssl-performance-odyssey/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: Standing on Our Own Platform</title><link>http://feedproxy.google.com/~r/splunkdev/~3/lkpLH-uop9c/</link><comments>http://blogs.splunk.com/johnvey/2008/01/31/standing-on-our-own-platform/</comments><pubDate>Fri, 01 Feb 2008 01:02:34 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[Splunk is on track to become a billion-dollar company and you, the intrepid sysadmin/developer, are going to help us get there.  Now, this is not a statement that I&#8217;m making as an analyst who &#8220;covers&#8221; the enterprise software market, and compiles a list of &#8220;top software companies to watch&#8221;.  I&#8217;m writing this as [...]]]></description><content:encoded><![CDATA[<p>Splunk is on track to become a billion-dollar company and you, the intrepid sysadmin/developer, are going to help us get there.  Now, this is not a statement that I'm making as an analyst who "covers" the enterprise software market, and compiles a list of "top software companies to watch".  I'm writing this as Splunk's Platform Architect, a techie whose goals are to ensure that what comes out of our development group is compelling and exciting to those that are actually working with the product.</p>
<p>It is this developer-centric ethos that sets us apart from so many of the other enterprise software firms and has already paid dividends on community goodwill.  Instead of making prospective buyers jump through registration hoops just to view a guided webcast tour, Splunk provides fully functional software downloads to try out on your own data, inside your own network, free from webinar smoke and mirrors.</p>
<p>We don't just want you to try out the software, we want you to try doing things that aren't covered in our brochureware, things that sound ludicrous at first but are doable.  In fact, in a perverse way, we hope that you do break our product because it reveals new limitations for us to solve, ultimately leading to a product that lets you do your job the way you want, yet easier and faster.</p>
<p>This is where the Splunk Platform comes into play.  We want to increase the ubiquity of Splunk by 1) exposing major components of Splunk as individual services, and 2) allowing external developers to build on top of Splunk and leverage our award-winning IT search infrastructure.  Starting with version 3.2 (you can download the preview version today), there is a new REST API that provides unprecedented access and consistency to every aspect of the Splunk Server.  We are leveraging open standards like the Atom Protocol and OpenID to let enterprise developers create mashups with the same ease as those in the "web2.0" world.  For programmers who want to integrate Splunk functionality into existing applications, you can look forward to Python and .NET SDKs in the near future, with Java and Perl not too far behind.</p>
<p>Amazon's Web Services, Facebook's F8, and Twitter's API have all proven that standardized platforms breed diverse applications, on scales that are much bigger than a single company can produce.  That's the kind of ecosystem we want to cultivate.  My next posts will begin exploring the new REST endpoints that have been added to Splunk, and provide tutorials on how to use those endpoints to interact with Splunk programmatically.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/lkpLH-uop9c" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2008/01/31/standing-on-our-own-platform/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Eric Woo: Your most important IT data: funny quotes</title><link>http://feedproxy.google.com/~r/splunkdev/~3/YyhZNrBLjJI/</link><comments>http://blogs.splunk.com/ewoo/2008/01/30/your-most-important-it-data-funny-quotes/</comments><pubDate>Wed, 30 Jan 2008 22:14:19 +0000</pubDate><dc:creator>Eric Woo</dc:creator><description><![CDATA[bash.org is a natural dataset for splunking. It&#8217;s a huge blob of loosely structured text data, and it&#8217;s made of win.
To play with a live instance, go to bash.splunklabs.com, login: guest, password: guest.
Of course, Splunk duplicates the functionality of the site itself. We can find, for example, the top 100 IRC quotes:

Splunk lets us do [...]]]></description><content:encoded><![CDATA[<p>bash.org is a natural dataset for splunking. It's a huge blob of loosely structured text data, and it's made of win.</p>
<p>To play with a live instance, go to <a href="http://bash.splunklabs.com">bash.splunklabs.com</a>, login: guest, password: guest.</p>
<p>Of course, Splunk duplicates the functionality of the site itself. We can find, for example, the top 100 IRC quotes:</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/01/top_irc.png"><img src="http://blogs.splunk.com/devuploads/2008/01/top_ircpng.jpg" /></a></p>
<p>Splunk lets us do considerably more, though. What are the top one-liners?</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/01/top_one_liners.png"><img src="http://blogs.splunk.com/devuploads/2008/01/top_one_linerspng.jpg" /></a></p>
<p>How many more quotes mention "girlfriend" than "boyfriend", i.e. exactly how bad is this sausage party?</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/01/gf_vs_bf.png"><img src="http://blogs.splunk.com/devuploads/2008/01/gf_vs_bfpng.jpg" /></a></p>
<p>Are there any commonly quoted individuals?</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/01/nicks.png"><img src="http://blogs.splunk.com/devuploads/2008/01/nickspng.jpg" /></a></p>
<p>Are there any interesting trends in quote scores over time? Take a look at high quote scores vs. quote ID:</p>
<p><a href="http://blogs.splunk.com/devuploads/2008/01/max_score_vs_id.png"><img src="http://blogs.splunk.com/devuploads/2008/01/max_score_vs_idpng.jpg" /></a></p>
<p>It seems likely that older quotes, especially good ones, benefit from a disproportionately greater number of views (the rich getting richer, so to speak); this might explain why the peaks in the low-quote-ID ranges are higher than the peaks for more recent quotes. Or maybe the internet just doesn't produce the same quality of LOLs that it once did.</p>
<p>To try this yourself, add the following to props.conf:</p>
<p><code>[sourcetype::bash]<br />
BREAK_ONLY_BEFORE = (#[0-9]* \+)|([0-9]+-[0-9]+-[0-9]+-[0-9]+-[0-9]+-[0-9]+)<br />
REPORT-bash = bash</code></p>
<p>and the following to transforms.conf:</p>
<p><code>[bash]<br />
REGEX = #([0-9]+) \+\((-?[0-9]+)\)- \[X\]<br />
FORMAT = $0 bash_quote_id::$1 bash_quote_score::$2</code></p>
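<p>To see what this extraction pulls out before touching Splunk, here is a minimal Python sketch that applies the same REGEX to one event. The sample header line and the <code>extract_bash_fields</code> helper are assumptions made for illustration; this only mimics the pattern match, not Splunk's own transform processing.</p>

```python
import re

# Same pattern as the REGEX line in transforms.conf above.
QUOTE_HEADER = re.compile(r"#([0-9]+) \+\((-?[0-9]+)\)- \[X\]")


def extract_bash_fields(event):
    """Return the two fields named in FORMAT, or an empty dict if no header is found."""
    match = QUOTE_HEADER.search(event)
    if match is None:
        return {}
    return {
        "bash_quote_id": match.group(1),
        "bash_quote_score": match.group(2),
    }


if __name__ == "__main__":
    # Hypothetical event text, shaped like a quote header in the lynx dump.
    sample = "#1234 +(567)- [X]\nsome quote text"
    print(extract_bash_fields(sample))
```

<p>The optional minus sign in the second group is what lets negatively scored quotes through, so a header like <code>#99 +(-12)- [X]</code> should yield a score of -12.</p>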
<p>Then, get a static copy of bash.org. You can grab the one I've created <a href="http://blogs.splunk.com/devuploads/2008/01/bashtxt.zip">here</a>, or you can generate it yourself:</p>
<p><code>$ curl -o '#1.html' 'http://bash.org/?browse&amp;p=[001-409]'<br />
$ for cur in *.html ; do lynx -dump -nonumbers ./"$cur" >> /tmp/bash.txt ; done</code></p>
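<p>If the curl range syntax is unfamiliar: as I understand curl's URL globbing, the <code>[001-409]</code> range expands to zero-padded page numbers, and <code>#1</code> in the output name is replaced by the current value. A rough Python sketch of the resulting list follows; the <code>browse_pages</code> helper is made up for illustration and does not fetch anything.</p>

```python
def browse_pages(first=1, last=409):
    """List (url, output_file) pairs for the zero-padded range [001-409]."""
    pages = []
    for number in range(first, last + 1):
        padded = "%03d" % number  # 1 becomes "001", matching the three-digit range
        url = "http://bash.org/?browse&p=" + padded
        pages.append((url, padded + ".html"))
    return pages


if __name__ == "__main__":
    pages = browse_pages()
    print(len(pages), pages[0], pages[-1])
```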
<p>Finally, push the data into Splunk:</p>
<p><code>$ splunk add tail -source /tmp/bash.txt -sourcetype bash</code></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/YyhZNrBLjJI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/ewoo/2008/01/30/your-most-important-it-data-funny-quotes/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Performance impact of fast drives (via sorkin)</title><link>http://feedproxy.google.com/~r/splunkdev/~3/wOLuG93bGXE/</link><comments>http://blogs.splunk.com/erik/2008/01/29/performance-impact-of-fast-drives-via-sorkin/</comments><pubDate>Wed, 30 Jan 2008 04:29:54 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[The following is copped from a support email by Stephen Sorkin who is the man behind the splunk server curtain &#8230; thought it should go broader.

I&#8217;m the manager of the search and indexing team at Splunk. We&#8217;re still in the process of writing up our findings from storage benchmarks but here are the general details.
High [...]]]></description><content:encoded><![CDATA[<p>The following is copped from a support email by Stephen Sorkin who is the man behind the splunk server curtain ... thought it should go broader.</p>
<blockquote><p>
I'm the manager of the search and indexing team at Splunk. We're still in the process of writing up our findings from storage benchmarks but here are the general details.</p>
<p>High IO/s typically means both faster indexing in general and faster searching of rare, temporally incoherent events. On average, we've seen indexing speeds increase by about 66% going from a 7200 RPM SATA RAID to a 15K RPM SCSI RAID. We've seen comparable performance from SCSI and SAS RAIDs, provided they're 15K RPM.</p>
<p>The best benchmarking tool we've found for measuring how Splunk will behave on your disk hardware is bonnie++. If your disk subsystem can sustain 800 IO/s, you're in good shape.</p>
<p>As far as searching goes, IO/s is the dominant factor for non-coherent, infrequently accessed search results. This means, if you're just searching for the newest data, or even have to reach back through 1MM events to return 10k, the disk is NOT the bottleneck, since each individual read() will pull many events off disk. However, if you're searching for a rare term, like a name, that occurs once an hour or once a day, each read() is going to require the drive arm to move. If you're using a 7200 RPM SATA drive, that's about 100 IO/s and hence on the order of 100 retrieved events per second. If you have a decent RAID, that could be 800 retrieved events per second.
</p></blockquote>
<p>-s</p>
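<p>To put those numbers in perspective, here is a back-of-the-envelope sketch in Python. It assumes, as the email does, that every rare event costs one disk seek; the 10,000-event search is a made-up figure, and the function name is only for illustration.</p>

```python
def seconds_to_retrieve(rare_events, io_per_second):
    """Estimate retrieval time when every matching event costs one disk seek."""
    return rare_events / float(io_per_second)


if __name__ == "__main__":
    # 100 IO/s and 800 IO/s are the figures quoted in the email above.
    for label, iops in (("7200 RPM SATA drive", 100), ("decent RAID", 800)):
        print(label, seconds_to_retrieve(10000, iops), "seconds")
```

<p>Under those assumptions, 10,000 scattered events take about 100 seconds on the single SATA drive and about 12.5 seconds on the RAID.</p>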
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/wOLuG93bGXE" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2008/01/29/performance-impact-of-fast-drives-via-sorkin/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Its about time - Preview #3</title><link>http://feedproxy.google.com/~r/splunkdev/~3/eWXKyfAxxAQ/</link><comments>http://blogs.splunk.com/erik/2008/01/29/its-about-time-preview-3/</comments><pubDate>Wed, 30 Jan 2008 02:39:05 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[
Hey all,
It&#8217;s taken longer than we would have liked but our 3rd preview build has been posted.
Get&#8217;um here
A bunch of work has gone into windows stability, tons of bugs were fixed, and a bunch of customer requests have been implemented ( we will let you know out of band ). We expect that this release [...]]]></description><content:encoded><![CDATA[<p><a href='http://blogs.splunk.com/devuploads/2008/01/picture-12.png' title='hex'><img src='http://blogs.splunk.com/devuploads/2008/01/picture-12.png' alt='hex' / align="right" border="0" width="200" height="200"></a><br />
Hey all,</p>
<p>It's taken longer than we would have liked but our 3rd preview build has been posted.<br />
<a href="http://www.splunk.com/index.php/preview/20080129">Get'um here</a></p>
<p>A bunch of work has gone into windows stability, tons of bugs were fixed, and a bunch of customer requests have been implemented ( we will let you know out of band ). We expect that this release should be more stable, slightly faster, and less buggy.</p>
<p>Left to do, we still have a bunch of IE work, performance improvements, and cleaning up of some features like interactive field extraction and event type discovery.</p>
<p>It's still not production ready so don't even think of trying it out for real - and there is no guarantee that migration will work from a preview to GA (we will migrate from 3.1.x to GA but not from a preview).  Also, don't run splunk as root - it's just not a good idea until we run through all our testing.</p>
<p>As always, please send us feedback at splunkpreview@splunk.com or hit us up on IRC (irc.efnet.org #splunk).<br />
The last round of info from Preview #2 was awesome, so please keep it up!</p>
<p>e.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/eWXKyfAxxAQ" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2008/01/29/its-about-time-preview-3/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kord Campbell: Gem Noticed by Enterprise Networking Planet</title><link>http://feedproxy.google.com/~r/splunkdev/~3/B0fmw_AJCMQ/</link><comments>http://blogs.splunk.com/kordless/2008/01/24/noticed-by-enterprise-networking-planet/</comments><pubDate>Fri, 25 Jan 2008 01:48:53 +0000</pubDate><dc:creator>Kord Campbell</dc:creator><description><![CDATA[I have a Google alert set up to email me news of the extraordinary concerning Splunk.  Most of them are press releases by either us or our agency, which are all well and fine (this is how most companies seed stories anyway), but one caught my eye this morning by Charlie Schluting over on [...]]]></description><content:encoded><![CDATA[<p>I have a Google alert set up to email me news of the extraordinary concerning Splunk.  Most of them are press releases by either us or our agency, which are all well and fine (this is how most companies seed stories anyway), but <a href="http://www.enterprisenetworkingplanet.com/netos/article.php/3723406">one caught my eye</a> this morning by Charlie Schluting over on <a href="http://enterprisenetworkingplanet.com">Enterprise Networking Planet</a>.</p>
<p>Two things struck me as interesting about Charlie's post.</p>
<p>First, he noticed the changes in the UI we've been slowly making over the last few releases.  If you've ever done UI design, you know how much sweat goes into every little detail, and how much momentum a design carries over time.  That someone noticed the new changes *and* liked them is a HUGE win for the UI team.  It's even better how fast someone noticed!</p>
<p>Second, he actually spends quite a bit of time explaining the security workaround in the free product - <a href="http://dev.splunk.com/2008/01/14/splunk-hack-4-aliasing-splunk-with-a-subdomain/">one that I covered</a> earlier, coincidentally enough.  I figure if someone goes to the time and trouble to figure out how they can keep using the product in a secure, legitimate way, then they must really, really like it.  You simply can't argue with an evangelist like this.</p>
<p>If anyone here is a gem, it's Charlie.</p>
<p><img src='http://blogs.splunk.com/devuploads/2008/01/insight_jun04_mailbox_kohinoor_large.jpg' alt='insight_jun04_mailbox_kohinoor_large.jpg' /></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/B0fmw_AJCMQ" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kordless/2008/01/24/noticed-by-enterprise-networking-planet/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: O’Rly?</title><link>http://feedproxy.google.com/~r/splunkdev/~3/xYJ8yGgc9w4/</link><comments>http://blogs.splunk.com/david/2008/01/21/orly/</comments><pubDate>Tue, 22 Jan 2008 03:00:27 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[Below are a few easter egg features found inside Splunk.

From the commandline: &#8220;splunk ftw&#8221; produces an ascii-art &#8220;O&#8217;Rly?&#8220;.
From the commandline: the &#8220;outputrawr&#8221; produces ascii-art fireworks.
From the searchbox, piping results to the &#8220;marklar&#8221; processor (e.g. &#8220;*&#124;marklar&#8221;), converts all search results into the Marklarian language.
From the searchbox, piping result to the &#8220;loglady&#8221; processor (e.g., &#8220;*&#124;loglady&#8221;), converts all [...]]]></description><content:encoded><![CDATA[<p>Below are a few <a href="http://en.wikipedia.org/wiki/Easter_egg_%28media%29#Software-based" target="_blank">easter egg</a> features found inside Splunk.</p>
<ul>
<li>From the commandline: "splunk ftw" produces an ascii-art "<a href="http://en.wikipedia.org/wiki/O_RLY%3F" target="_blank">O'Rly?</a>".</li>
<li>From the commandline: the "outputrawr" produces ascii-art fireworks.</li>
<li>From the searchbox, piping results to the "marklar" processor (e.g. "*|marklar"), converts all search results into the <a href="http://en.wikipedia.org/wiki/Marklar#Marklar" target="_blank">Marklarian</a> language.</li>
<li>From the searchbox, piping results to the "loglady" processor (e.g., "*|loglady"), converts all the search results into quotes from Twin Peaks's <a href="http://en.wikipedia.org/wiki/Margaret_Lanterman" target="_blank">LogLady</a>.</li>
</ul>
<p>Enjoy them while they last, before they are removed by the Silliness Police, who<code>%$($%%$</code><br />
<code>^H^H^H^<blink>NO CARRIER</blink></code></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/xYJ8yGgc9w4" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2008/01/21/orly/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Ledion Bitincka: Key-value pair extraction definition, examples and solutions….</title><link>http://feedproxy.google.com/~r/splunkdev/~3/sx8zfd4LUx4/</link><comments>http://blogs.splunk.com/lbitincka/2008/01/18/key-value-pair-extraction-definition-examples-and-solutions/</comments><pubDate>Fri, 18 Jan 2008 17:07:48 +0000</pubDate><dc:creator>Ledion Bitincka</dc:creator><description><![CDATA[Most of the time logs contain data which, by humans, can be easily recognized as either completely or semi-structured information. Being able to extract structure in log data is a necessary first step to further, more interesting, analysis. While it would be great to be able to automatically extract the structure from all log data, [...]]]></description><content:encoded><![CDATA[<p>Most of the time logs contain data which, by humans, can be easily recognized as either completely or semi-structured information. Being able to extract structure in log data is a necessary first step to further, more interesting, analysis. While it would be great to be able to <strong>automatically</strong> extract the structure from <strong>all</strong> log data, splunk cannot rival the brain's performance at this time, however it is able to tap into <strong>your</strong> brain for help <img src='http://blogs.splunk.com/lbitincka/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> Read on ......</p>
<p><u><b>Problem definition:</b></u><br />
Extract structured information (in key/field=value form) from un/semi-structured log data.<br />
<em>Note: for the purpose of this post, key and field are used interchangeably to denote a variable name.</em></p>
<p><u><b>Problem examples:</b></u><br />
Splunk debug message (humans: easy, machine: easy)<br />
<code><br />
12-03-2007 13:51:55.114 DEBUG SearchPipelinePerformance - processor=save queryid=_1196718714_619358 executetime=0.014secs<br />
ideal structured information to extract:<br />
processor=save<br />
queryid=_1196718714_619358<br />
executetime=0.014secs<br />
</code><br />
Splunk tries to make it easy for itself to parse its own log files (in most cases)</p>
<p>Output of the ping command (humans: easy, machine: medium)<br />
<code><br />
64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=2.522 ms<br />
ideal structured information to extract:<br />
bytes=64<br />
from=192.168.1.1<br />
icmp_seq=0<br />
ttl=64<br />
time=2.522 ms<br />
</code></p>
<p>An interesting pattern to note here is that there is no consistent field-value delimiter, nor field-value order. In the "from" field the authors have chosen to use a space as a delimiter, while for "icmp_seq", "ttl" and "time" they've chosen the equal sign. For the "bytes" field they've chosen to place it after the value (yes, they might have also intended for it to mean bytes - the data unit) while for the rest they've chosen field-name followed by field-value. Admittedly, some might think the current format is prettier than the following <strong>consistent</strong> log line which could easily be parsed by machines. (Who thought log files were optimized for prettiness !?)<br />
<code><br />
bytes=64, from=192.168.1.1, icmp_seq=0, ttl=64, time=2.522 ms<br />
</code> </p>
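<p>For a line that consistent, extraction really is a couple of string splits. Here is a minimal Python sketch, assuming ", " separates pairs and "=" separates a field from its value; the <code>extract_pairs</code> helper is made up for illustration and is not how Splunk does it internally.</p>

```python
def extract_pairs(line, pair_delim=", ", kv_delim="="):
    """Split a consistently delimited log line into a field-to-value dict."""
    fields = {}
    for chunk in line.split(pair_delim):
        key, sep, value = chunk.partition(kv_delim)
        if sep:  # skip chunks that contain no key/value delimiter
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    line = "bytes=64, from=192.168.1.1, icmp_seq=0, ttl=64, time=2.522 ms"
    print(extract_pairs(line))
```

<p>Note that "time" keeps its full value "2.522 ms" because the pair delimiter is a comma plus a space, not a bare space.</p>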
<p>NetScreen log (humans: medium, machine: hard)<br />
<code><br />
%MD%  %DD% 13:41:25 45.2.0.1 NOC-FWa: NetScreen device_id=NOC-FWa  [Root]system-notification-00257(traffic): start_time="2006-05-11 13:40:23" duration=62 policy_id=41 service=Network Time proto=17 src zone=noc-mgt dst zone=noc-svcs ......<br />
ideal structured information to extract:<br />
device_id=NOC-FWa<br />
start_time=2006-05-11 13:40:23<br />
duration=62<br />
policy_id=41<br />
service=Network Time<br />
proto=17<br />
src zone=noc-mgt<br />
dst zone=noc-svcs<br />
</code></p>
<p>This part of the NetScreen log line <b>...service=Network Time proto=17 src zone=noc-mgt dst zone=noc...</b> is a salient example of the ambiguity that sometimes exists in log data. What is the correct value of service? "Network" or "Network Time"? What about the name of the next field? Is it "Time proto" or just "proto"? Well, we can come up with an easy rule for this case, let's call it Rule-1: "Field names should NOT contain spaces". Fair/good enough!<br />
Let's move on to the next field: what is its correct name? "src zone" or just "zone"? A human can recognize that "src zone" is the correct field name, so we have just violated our Rule-1. We can continue this cycle of adding, violating, modifying, and removing rules in our rule set, only to recognize that the cycle never ends. That simply translates into "there is no one solution/rule-set that is able to extract structure from ALL unstructured data": there will always be a degenerate case that violates the rules.</p>
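<p>Here is a small Python sketch of Rule-1 applied mechanically to that fragment. The regular expression is my own reading of the rule (a field name is a run of word characters right before "=", and a value runs until the next such name), so treat it as an illustration of the failure mode rather than as Splunk's extractor.</p>

```python
import re

# Rule-1: a field name is a run of word characters (no spaces) right before "=".
# A value runs until the next " name=" or the end of the text.
RULE_1 = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def extract_with_rule_1(text):
    """Return (field, value) pairs in order; a list keeps repeated field names."""
    return RULE_1.findall(text)


if __name__ == "__main__":
    fragment = "service=Network Time proto=17 src zone=noc-mgt dst zone=noc-svcs"
    for field, value in extract_with_rule_1(fragment):
        print(field, "=", value)
```

<p>The rule gets "service" right ("Network Time"), but it also reports proto as "17 src" and produces two fields both named "zone", which is exactly the degenerate case described above.</p>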
<p><u><b>More degenerate log lines:</b></u><br />
Stay tuned! Links in this section are coming soon....</p>
<p><u><b>Solutions:</b></u><br />
- <a href="http://dev.splunk.com/2008/02/12/delimiter-based-key-value-pair-extraction/">Delimiter based key-value pair extraction</a><br />
- <a href="http://dev.splunk.com/2008/02/22/delimiter-base-kv-extraction-advanced/">Delimiter base KV extraction - advanced</a><br />
Stay tuned! More links coming soon....</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/sx8zfd4LUx4" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/lbitincka/2008/01/18/key-value-pair-extraction-definition-examples-and-solutions/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Carl Yestrau: JavaScript Error Reporting with Splunk</title><link>http://feedproxy.google.com/~r/splunkdev/~3/2mEtI7lCDco/</link><comments>http://blogs.splunk.com/carl/2008/01/16/javascript-error-logging-with-splunk/</comments><pubDate>Thu, 17 Jan 2008 00:18:46 +0000</pubDate><dc:creator>Carl Yestrau</dc:creator><description><![CDATA[Keeping track of new browser releases these days can be really challenging. It is less than ideal if your payment processor is throwing a JavaScript onsubmit exception effectively canceling all transactions.
Here is a little technique for indexing JavaScript exceptions in your production and development environments using Splunk.
In JavaScript create an onerror event handler that makes [...]]]></description><content:encoded><![CDATA[<p>Keeping track of new browser releases these days can be really challenging. It is less than ideal if your payment processor is throwing a JavaScript onsubmit exception effectively canceling all transactions.</p>
<p>Here is a little technique for indexing JavaScript exceptions in your production and development environments using Splunk.</p>
<p>In JavaScript create an onerror event handler that makes an HTTP request to a server that has access logs indexed by Splunk. </p>
<pre>
<code>
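    // Reports uncaught JavaScript errors by requesting a beacon URL,
    // so each error shows up in the web server's access log.
    function JSErrorLogger(httpBeacon){
        var self = this;
        // window.onerror passes the error message, script URL and line number.
        self.handler = function(msg, url, line){
            var log = {
                "date":new Date(),
                "type":"jserror",
                "line":line,
                "msg":msg,
                "url":url
            };
            var logStr = "";
            for(var i in log){
                if(log.hasOwnProperty(i)){
                    logStr += i + ":" + log[i] + " ";
                }
            }
            // Encode the payload so spaces and reserved characters survive the request.
            var imgObj = new Image();
            imgObj.src = httpBeacon + "?" + encodeURIComponent(logStr);
        };
        // Install the handler as soon as the logger is constructed.
        window.onerror = self.handler;
    }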
</code>
</pre>
<p>Make sure that this JavaScript is the very first item executed by the interpreter, ensuring all exceptions are caught by the event handler. </p>
<p>Instantiate the class with a URI that points to a beacon on a machine that has Splunk indexing the access log. You may want to set some environment variables in JavaScript that turn logging on for only testing and production machines.</p>
<pre>
<code>
   //if environment test or production
   var splunkJSErrorIndexer = new JSErrorLogger("http://somedomain.com/beacon.gif");
</code>
</pre>
<p>That's it. Now you can empirically understand the JavaScript exceptions being raised, set BlackBerry alerts, and correlate UI stability issues to deploys :)</p>
<p>Happy JavaScript Monitoring!</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/2mEtI7lCDco" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/carl/2008/01/16/javascript-error-logging-with-splunk/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kord Campbell: Splunk Hack #4 - Aliasing Splunk with a Subdomain</title><link>http://feedproxy.google.com/~r/splunkdev/~3/lyhzDMjnSB4/</link><comments>http://blogs.splunk.com/kordless/2008/01/14/splunk-hack-4-aliasing-splunk-with-a-subdomain/</comments><pubDate>Mon, 14 Jan 2008 18:14:02 +0000</pubDate><dc:creator>Kord Campbell</dc:creator><description><![CDATA[With the new release of <a href="http://www.splunk.com/index.php/preview/20071229">Splunk Preview</a> out, I've run into a problem keeping the different versions straight on my laptop.  I have the free version, the Preview, the official release, <b>and</b> a version of current running - often times simultaneously.  It's getting messy.

What you really want to do is refer to them with different subdomain names, where something like <font color="#A8C479"><i>http://splunkpreview.mydomain.com/</i></font> would bring up Splunk without having to remember the port number.

If you are running Apache, (like I am on Leopard) you get a reverse proxy server for free.  With just a few lines of configuration, you can alias subdomains (or domains for that matter) to your heart's content.

You also get the ability of putting content behind some basic authentication provided via Apache's HTTP auth methods.  This comes in handy if you'd like to link to your Splunk install from a publicly facing page, but don't want people to know what type of content is behind the authentication.  It also works for limiting access to a particular IP address group or domain.

I've put together a screencast covering how to do this from OS X's version of Apache.  Click on the thumbnail below to play the screencast.

<a href='http://dev.splunk.com/wp-content/uploads/2008/01/kord_proxy_large_out.mov' title='Alilasing Splunk'><img width=528px height=297px  src="http://dev.splunk.com/wp-content/uploads/2008/01/kord_proxy_large_out1.jpg"></a>
]]></description><content:encoded><![CDATA[<p>With the new release of <a href="http://www.splunk.com/index.php/preview/20071229">Splunk Preview</a> out, I've run into a problem keeping the different versions straight on my laptop.  I have the free version, the Preview, the official release, <b>and</b> a version of current running - often times simultaneously.  It's getting messy.</p>
<p>What you really want to do is refer to them with different subdomain names, where something like <font color="#A8C479"><i>http://splunkpreview.mydomain.com/</i></font> would bring up Splunk without having to remember the port number.</p>
<p>If you are running Apache, (like I am on Leopard) you get a reverse proxy server for free.  With just a few lines of configuration, you can alias subdomains (or domains for that matter) to your heart's content.</p>
<p>You also get the ability to put content behind some basic authentication provided via Apache's HTTP auth methods.  This comes in handy if you'd like to link to your Splunk install from a publicly facing page, but don't want people to know what type of content is behind the authentication.  It also works for limiting access to a particular IP address group or domain.</p>
<p>I've put together a screencast covering how to do this from OS X's version of Apache.  Click on the thumbnail below to play the screencast.</p>
<p><a href='http://blogs.splunk.com/devuploads/2008/01/kord_proxy_large_out.mov' title='Aliasing Splunk'><img width=528px height=297px  src="http://blogs.splunk.com/devuploads/2008/01/kord_proxy_large_out1.jpg"></a></p>
<p><b>Note</b>:  Firewalling the actual port Splunk runs on is left as an exercise for the viewer, as is limiting access to a group of IP addresses.  More information about configuring Apache's <a href="http://httpd.apache.org/docs/2.0/mod/mod_proxy.html">mod_proxy module</a> can be found on <a href="http://httpd.apache.org/">Apache's website</a>.</p>
<p>Here's the configuration code from the screencast:</p>
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1"><span class="sy0">&lt;</span>VirtualHost <span class="sy0">*:</span><span class="nu0">80</span><span class="sy0">&gt;</span> </div>
</li>
<li class="li1">
<div class="de1">    ServerName preview<span class="sy0">.</span>geekceo<span class="sy0">.</span>com </div>
</li>
<li class="li1">
<div class="de1">    <span class="sy0">&lt;</span>Location <span class="sy0">/&gt;</span> </div>
</li>
<li class="li1">
<div class="de1">        ProxyPass http<span class="sy0">:</span><span class="co1">//localhost:8000/ </span></div>
</li>
<li class="li1">
<div class="de1">        ProxyPassReverse http<span class="sy0">:</span><span class="co1">//localhost:8000/ </span></div>
</li>
<li class="li1">
<div class="de1">    <span class="sy0">&lt;/</span>Location<span class="sy0">&gt;</span> </div>
</li>
<li class="li1">
<div class="de1"><span class="sy0">&lt;/</span>VirtualHost<span class="sy0">&gt;</span> </div>
</li>
<li class="li1">
<div class="de1"> </div>
</li>
<li class="li1">
<div class="de1"><span class="sy0">&lt;</span>VirtualHost <span class="sy0">*:</span><span class="nu0">80</span><span class="sy0">&gt;</span> </div>
</li>
<li class="li1">
<div class="de1">    ServerName free<span class="sy0">.</span>geekceo<span class="sy0">.</span>com </div>
</li>
<li class="li1">
<div class="de1">    <span class="sy0">&lt;</span>Location <span class="sy0">/&gt;</span> </div>
</li>
<li class="li1">
<div class="de1">        ProxyPass http<span class="sy0">:</span><span class="co1">//localhost:8001/ </span></div>
</li>
<li class="li1">
<div class="de1">        ProxyPassReverse http<span class="sy0">:</span><span class="co1">//localhost:8001/ </span></div>
</li>
<li class="li1">
<div class="de1">        AuthType Basic </div>
</li>
<li class="li1">
<div class="de1">        AuthName <span class="st0">&quot;Password Required&quot;</span> </div>
</li>
<li class="li1">
<div class="de1">        AuthUserFile <span class="sy0">/</span>etc<span class="sy0">/.</span>htpasswd </div>
</li>
<li class="li1">
<div class="de1">        <span class="kw1">require</span> valid<span class="sy0">-</span>user </div>
</li>
<li class="li1">
<div class="de1">    <span class="sy0">&lt;/</span>Location<span class="sy0">&gt;</span> </div>
</li>
<li class="li1">
<div class="de1"><span class="sy0">&lt;/</span>VirtualHost<span class="sy0">&gt;</span></div>
</li>
</ol>
</div>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/lyhzDMjnSB4" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kordless/2008/01/14/splunk-hack-4-aliasing-splunk-with-a-subdomain/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: Bomberman</title><link>http://feedproxy.google.com/~r/splunkdev/~3/sIJOPvI_Was/</link><comments>http://blogs.splunk.com/david/2008/01/10/bomberman/</comments><pubDate>Fri, 11 Jan 2008 02:19:57 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[The world&#8217;s most fun video game, keeping us sane &#8212; 1993&#8217;s Bomberman for NES, played on the Wii.
&#8220;Look out, rotsky, you&#8217;ve got fast aids!&#8221;


]]></description><content:encoded><![CDATA[<p>The world's most fun video game, keeping us sane  -  1993's Bomberman for NES, played on the Wii.<br />
<em>"Look out, rotsky, you've got fast aids!"</em></p>
<p><object height="355" width="425"><param name="movie" value="http://www.youtube.com/v/jUSpRlth8s4&amp;rel=1"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/jUSpRlth8s4&amp;rel=1" type="application/x-shockwave-flash" wmode="transparent" height="355" width="425"></embed></object></p>
<p><br/></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/sIJOPvI_Was" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2008/01/10/bomberman/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Just in time for new year - its Preview #2</title><link>http://feedproxy.google.com/~r/splunkdev/~3/0CBeDLJQ2EM/</link><comments>http://blogs.splunk.com/erik/2007/12/29/just-in-time-for-new-year-its-preview-2/</comments><pubDate>Sun, 30 Dec 2007 06:38:39 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[Happy new year (bit early) all dev.splunk.com readers&#8230;.
We have just posted our second 3.2 preview release. (build number 30455)
Its packed with holiday goodness, albeit very raw.
First you will notice we have posted a windows build. Its been in the cooker since last Feb and thanks to Mitch, Ledio, Igor and a bit of Amrit we [...]]]></description><content:encoded><![CDATA[<p>Happy new year (bit early) all dev.splunk.com readers....<br />
We have just posted our second <a href="http://www.splunk.com/index.php/preview/20071229">3.2 preview release</a>. (build number 30455)</p>
<p>It's packed with holiday goodness, albeit very raw.</p>
<p>First you will notice we have posted a windows build. It's been in the cooker since last Feb and thanks to Mitch, Ledio, Igor and a bit of Amrit we now have a single code base that rocks on linux, mac, solaris, freebsd, aix, AND windows.  This was not an easy feat, as evidenced by our gift of a <a href="http://valleywag.com/tech/silicon-valley-users-guide/understanding-geeks-++-the-100+word-version-331539.php">pony (soft and electronic)</a> to Mitch for his effort. It's still very raw (the build, not the pony), and has a tendency to crash because of memory fragmentation and limited vm space, which will be fixed by GA... MarkB. will post more on the build so stay tuned for details. It's a big deal for us, so be patient; we sure could use feedback on how to make it the best it can be.</p>
<p>Also in this release you will see the UI starts to get some of the async search results. Over the next few releases we will be moving to fully async search in the UI. It will take a few turns but this preview has some of the first cut.</p>
<p>There are a bunch of other improvements; scheduled searches got a bit of a cleanup in the UI and the backend has been improved as well. Performance, bugs, and other tweaks are also spread throughout. I'll get others to post specifics.</p>
<p>In the meantime, as always, it's a huge help to us in dev if you can kick the tires before we freeze for GA. Please send feedback to splunkpreview@splunk.com, post comments to this blog, or drop by and tell us in person.</p>
<p>Again, thanks for the help and happy new year from all of us in dev@splunk!</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/0CBeDLJQ2EM" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2007/12/29/just-in-time-for-new-year-its-preview-2/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Ben Strawbridge: Configuring roles in Splunk 3.2 preview</title><link>http://feedproxy.google.com/~r/splunkdev/~3/uiYGqSvZkCU/</link><comments>http://blogs.splunk.com/ben/2007/12/27/configuring-roles-in-splunk-32-preview/</comments><pubDate>Thu, 27 Dec 2007 18:35:00 +0000</pubDate><dc:creator>Ben Strawbridge</dc:creator><description><![CDATA[Last week I made a video about how to setup new roles in Splunk 3.2 preview release.  The  video will demonstrate creating a new type of power user, with the same capability of a standard power user, and the addition of the ability to manage and create new users.  You will also [...]]]></description><content:encoded><![CDATA[<p>Last week I made a video about how to setup new roles in Splunk 3.2 preview release.  The  video will demonstrate creating a new type of power user, with the same capability of a standard power user, and the addition of the ability to manage and create new users.  You will also see how to create new roles by configuring authorize.conf.</p>
<p>(Update): While watching the video again I realized I sent a mixed message about where to edit configuration in splunk.  I made it clear that you want to edit in the local bundle directory, and if you look at the terminal that is where I was editing my configuration. However, I later said "default over-rides local, so always edit in default". This is WRONG. <strong>Always make your personalized configuration changes in the local directory. If the configuration file doesn't exist there, create one or copy it from default and edit that one.</strong></p>
<p>Take a look at the video and let me know if you have any questions about this stuff.</p>
<p><a href="http://blogs.splunk.com/devuploads/2007/12/ben_large.mov">Quicktime Video (625&#215;352)</a></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/uiYGqSvZkCU" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/ben/2007/12/27/configuring-roles-in-splunk-32-preview/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Carl Yestrau: Hey Browser, You’ve Got Tail!</title><link>http://feedproxy.google.com/~r/splunkdev/~3/_JUP5FFmKkQ/</link><comments>http://blogs.splunk.com/carl/2007/12/05/hey-ui-youve-got-tail/</comments><pubDate>Thu, 06 Dec 2007 00:24:41 +0000</pubDate><dc:creator>Carl Yestrau</dc:creator><description><![CDATA[For those interested in monitoring real-time data being consumed by Splunk we&#8217;ve introduced a new feature called Live Tail to the latest preview release. Additionally, we&#8217;ve added a nifty new REST endpoint /v3/splunk/tail for your custom application needs.

More information can be found in these videos:

A quick walkthrough of the new preview release feature Live Tail, [...]]]></description><content:encoded><![CDATA[<p>For those interested in monitoring real-time data being consumed by Splunk we've introduced a new feature called Live Tail to the latest <a href="http://www.splunk.com/index.php/preview">preview release</a>. Additionally, we've added a nifty new REST endpoint /v3/splunk/tail for your custom application needs.</p>
<p><a href='http://blogs.splunk.com/devuploads/2007/12/live-tail1.png' title='Live Tail'><img src='http://blogs.splunk.com/devuploads/2007/12/live-tail1.png' alt='Live Tail'  width="600px" /></a></p>
<p>More information can be found in these videos:</p>
<ul>
<li>A quick walkthrough of the new preview release feature Live Tail, its UI, and some sample code - <a href="http://blogs.splunk.com/devuploads/2007/11/carl_livetail.mov">See Video</a></li>
<li>An overview of the architecture used to integrate real-time data from Splunk Live Tail in a web browser. Challenges and workarounds when using JavaScript/Flash hybrids - <a href="http://www.johnleestma.com/video/Carl_Handling_.mov">See Video</a></li>
</ul>
<p>Happy <a href="http://dev.splunk.com/2007/11/16/flashas3-urlstream-memory-leak/">Streams</a>!</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/_JUP5FFmKkQ" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/carl/2007/12/05/hey-ui-youve-got-tail/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Rory Greene: flexibles roles and chamber of secrets</title><link>http://feedproxy.google.com/~r/splunkdev/~3/MZreWV-hw3k/</link><comments>http://blogs.splunk.com/rory/2007/12/05/flexibles-roles-and-chamber-of-secrets/</comments><pubDate>Wed, 05 Dec 2007 23:16:44 +0000</pubDate><dc:creator>Rory Greene</dc:creator><description><![CDATA[Hi Kids, 
So we have added in flexible roles into the preview release. Well, what does that mean.
We will now allow folks to create their own roles. The previous ones of Admin, Power
and User will be included as defaults.
There is currently no GUI available for editing roles but you can directly edit the
config file $SPLUNK_HOME/etc/bundles/default/authorize.conf.
To [...]]]></description><content:encoded><![CDATA[<p>Hi Kids, </p>
<p>So we have added flexible roles to the preview release. What does that mean?<br />
We will now allow folks to create their own roles. The previous ones of Admin, Power<br />
and User will be included as defaults.</p>
<p>There is currently no GUI available for editing roles but you can directly edit the<br />
config file $SPLUNK_HOME/etc/bundles/default/authorize.conf.</p>
<p>To add these roles we did an audit of our system and broke down various actions<br />
into capabilities.  These capabilities can be grouped together to create any role.<br />
Please bear with us here, this is just a first cut and we may not have chopped up<br />
things in a way that makes sense to you. This is the beauty of preview: if you have a suggestion<br />
about capabilities you'd like to see added or removed, then comment or mail us.<br />
The more feedback we get at this stage the faster this feature will improve.</p>
<p>A role in the splunk system contains the following things:<br />
1. A list of capabilities that role can perform.<br />
2. A list of roles that are contained within this role (their capabilities will be imported into our role).<br />
3. A list of search filters that should be applied when searching as this role.</p>
<p>The example below defines a role called kwyjibo that can edit user information and<br />
make changes to the authentication system. It imports the capabilities of the roles User and Power.</p>
<p>[role_kwyjibo]<br />
edit_user                          = enabled<br />
change_authentication   = enabled<br />
bounce_authentication   = enabled<br />
importRoles                      = Power;User<br />
srchFilter                           =</p>
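<p>To make the import idea concrete, here is a minimal Python sketch of how a role's effective capabilities might be worked out. It assumes imports are followed recursively and capabilities are simply unioned. The <code>ROLES</code> table, the <code>resolve_capabilities</code> function, and the capability names given to User and Power are all made up for illustration; only the kwyjibo entries come from the stanza above, and this is not how splunkd itself does it.</p>

```python
# Hypothetical in-memory view of authorize.conf stanzas. Only the kwyjibo
# entries come from the post; the User and Power capabilities are invented.
ROLES = {
    "User": {"capabilities": {"search"}, "imports": []},
    "Power": {"capabilities": {"save_search"}, "imports": ["User"]},
    "kwyjibo": {
        "capabilities": {
            "edit_user",
            "change_authentication",
            "bounce_authentication",
        },
        "imports": ["Power", "User"],
    },
}


def resolve_capabilities(role, roles=ROLES, seen=None):
    """Return the role's own capabilities plus those of every imported role."""
    seen = set() if seen is None else seen
    if role in seen:  # already visited: also guards against circular imports
        return set()
    seen.add(role)
    caps = set(roles[role]["capabilities"])
    for imported in roles[role]["imports"]:
        caps |= resolve_capabilities(imported, roles, seen)
    return caps


if __name__ == "__main__":
    print(sorted(resolve_capabilities("kwyjibo")))
```

<p>The <code>seen</code> set is there so that two roles importing each other cannot loop forever; whether the real config loader allows such cycles is not something this post covers.</p>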
<p>If you have any questions or comments, please let me know.</p>
<p>Rory</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/MZreWV-hw3k" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/rory/2007/12/05/flexibles-roles-and-chamber-of-secrets/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Preivew #1 is up</title><link>http://feedproxy.google.com/~r/splunkdev/~3/4xYPKAWZOLE/</link><comments>http://blogs.splunk.com/erik/2007/12/05/preivew-1-is-up/</comments><pubDate>Wed, 05 Dec 2007 20:39:56 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[Splunk fans.
We have posted the our first of many preview releases. You can find them here:
Our hope is that every week or two as new features or API&#8217;s become usable that we post builds soliciting feedback.
This first post has a bunch of backend and UI performance improvements as well as some new but hidden features:

live [...]]]></description><content:encoded><![CDATA[<p>Splunk fans.</p>
<p>We have posted the first of many preview releases. You can find it <a href="http://www.splunk.com/index.php/preview/20071204">here</a>.<br />
Our hope is that every week or two, as new features or APIs become usable, we will post builds soliciting feedback.<br />
This first post has a bunch of backend and UI performance improvements as well as some new but hidden features:</p>
<ul>
<li>live searching of data</li>
<li>flexible roles</li>
<li>scripted authentication</li>
<li>event decoration ( for the xmas season )</li>
<li>auditing of splunk server actions</li>
<li>file system change detection</li>
<li>improved (proper) sub second support</li>
<li>transaction search</li>
<li>new experimental simple search interface</li>
<li>"where" support in search clause ( you don't need to use the "| where" anymore and can just search for foo=10 )</li>
</ul>
<p>I'm not going to explain here what these things mean or how to find them or use them <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /><br />
Instead the product managers and developers will post here with ideas on what to try and what feedback we are looking for.</p>
<p>I'd like to thank in advance those brave few of you that have the few minutes to install these builds and give us your feedback.</p>
<p>e.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/4xYPKAWZOLE" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2007/12/05/preivew-1-is-up/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Splunk 3.2 Preview #1 is coming</title><link>http://feedproxy.google.com/~r/splunkdev/~3/QFoWFELS4EM/</link><comments>http://blogs.splunk.com/erik/2007/11/29/splunk-32-preview-1-is-coming/</comments><pubDate>Fri, 30 Nov 2007 00:41:31 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[Hi all,
Just a heads up that we are moving to a model where we post previews of upcoming releases.
Starting now, we are going into a mode where long before a GA release we will be posting development builds. At first, they may be a few weeks apart but over time our goal is to post [...]]]></description><content:encoded><![CDATA[<p>Hi all,</p>
<p>Just a heads up that we are moving to a model where we post previews of upcoming releases.</p>
<p>Starting now, we are going into a mode where long before a GA release we will be posting development builds. At first, they may be a few weeks apart but over time our goal is to post builds as soon as new functionality or APIs are ready for comment.</p>
<p>This first Preview #1 will have backend performance and scale improvements as well as some cool new features. The developers and PMs will be posting to this blog the specifics of what is new, how to try it, and where we are going.</p>
<p>Our hope is that we get early feedback on new features and APIs before we actually ship.</p>
<p>Thanks in advance for helping try out our early wares.</p>
<p>Kind regards,</p>
<p>e.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/QFoWFELS4EM" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2007/11/29/splunk-32-preview-1-is-coming/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Erik Swan: Making reports faster by caching scheduled searches</title><link>http://feedproxy.google.com/~r/splunkdev/~3/valKkWDrt8A/</link><comments>http://blogs.splunk.com/erik/2007/11/18/making-reports-faster-through-saved-searches/</comments><pubDate>Mon, 19 Nov 2007 01:34:21 +0000</pubDate><dc:creator>Erik Swan</dc:creator><description><![CDATA[I find this hard to explain even though its an extremely simple concept. It would be nice to get some feedback since I think we want to productize the idea but we are not clear on what makes sense.
If I have a search/report that I want to run faster, I will save that search and [...]]]></description><content:encoded><![CDATA[<p>I find this hard to explain even though its an extremely simple concept. It would be nice to get some feedback since I think we want to productize the idea but we are not clear on what makes sense.</p>
<p>If I have a search/report that I want to run faster, I will save that search and have splunk run it over a small timeframe (5, 15, 30, or 60 min), taking the results of that search/report and feeding them back into an index I create to hold cached results.</p>
<p>For example, suppose I like to run nightly reports where I show "top users by bandwidth". It's easy enough to run the report every night, but suppose there are times during the day when I want incrementals, or I want to look at last week, or perhaps get dailies over a month. Every time I run the search/report I need to search and recalculate "top users by bandwidth", which, over billions of events, can take time <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>Instead, I'll just save the search/report and have Splunk run it every 15 minutes with the results being sent to a "cache" index. This way if I ever want to do an adhoc search on "top users" or if I want to do "weekly reports by day" all the data is precalculated.  </p>
<p>Think of this as creating "logs" that are the output of a search/report and then having Splunk index those "logs".  To get fast results you can then search/report on the summarized cached data.</p>
<p>If it's not obvious why this is faster, suppose you are indexing 500M events a day and 100M of those have bandwidth data. To report on "top bandwidth by users" I need to run a search to get the 100M events, then run the report across all 100M.<br />
If instead I ran that same search/report in the background over each hour interval and saved the data back into splunk, I would reduce the data I'm operating on from 100M down to 12,000 (24 * 500), assuming that I'm getting the top 500 each hour. Searches/reports on the latter dataset are sub-second, versus the few minutes it would take to run across the 100M. </p>
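<p>The arithmetic is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: <code>summary_rows</code> is a made-up helper, and the numbers are the ones from the example above (one run per hour, top 500 rows kept per run, 100M matching raw events per day).</p>

```python
def summary_rows(runs_per_day, rows_per_run, days=1):
    """Rows the cache index would hold for the period, one batch per run."""
    return runs_per_day * rows_per_run * days


raw_events = 100000000           # events per day that carry bandwidth data
cached = summary_rows(24, 500)   # hourly runs, top 500 kept per run

print(cached)                    # 12000
print(raw_events // cached)      # 8333, i.e. roughly 8,000x fewer rows to scan
```

<p>Running every 15 minutes instead of hourly would mean 96 runs a day and four times as many cached rows, which is still tiny next to the raw data.</p>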
<p>Make sense? It's really simple but odd to explain.</p>
<p>PART ONE - Setup:</p>
<ul>
<li>1. Grab the reportcache search script from <a href='http://blogs.splunk.com/devuploads/2007/11/reportcache.py' title='reportcache.py'>** here **</a> and put it in your <code>SPLUNK_HOME/etc/searchscripts</code> directory. There is no need to restart; you can now cache any search/report data.</li>
<li>2. Add a cache index: either add the following to your <code>etc/bundles/local/indexes.conf</code>, or create a new bundle and add it to that bundle's <code>indexes.conf</code>. You will need to restart splunk after adding the index.<br />
<code><br />
[cache]<br />
homePath   = $SPLUNK_DB/cache/db<br />
coldPath   = $SPLUNK_DB/cache/colddb<br />
thawedPath = $SPLUNK_DB/cache/thaweddb<br />
</code></li>
</ul>
<p>PART TWO - Testing by writing to a file:</p>
<p>I recommend that you first test reportcache by having it output to a file that you scan to make sure things look right. </p>
<ul>
<li>1. Find a search you want to cache. Simple candidate is something like the following report against the internal index that shows queue sizes by queue name.<br />
<code>index=_internal metrics "group=queue" | timechart avg(current_size) by name</code></li>
<li>2. Once you have a search you want to cache, add the <code>reportcache index=cache path=/tmp file=testcache.log notimestamp</code> command to the end. The following assumes you have made an index named "cache". The <code>index</code> attribute is required and you should not use your default index unless you know what you're doing. We are also going to output the file to <code>/tmp/testcache.log</code> using the <code>file</code> and <code>path</code> attributes. The <code>notimestamp</code> option simply suppresses adding a timestamp to the filename.<br />
<code>index=_internal metrics "group=queue" | timechart avg(current_size) by name | reportcache index=cache path=/tmp file=testcache.log notimestamp</code></li>
<li>3. Run the search and you should get back the normal search results and not see an error on the screen. If you do see an error it should be self-explanatory.</li>
<li>4. Open the file /tmp/testcache.log and make sure the results look ok. They should look like a bunch of lines of key=value, key=value pairs.</li>
</ul>
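<p>If you are wondering what "a bunch of lines key=value, key=value" might look like, here is a small Python sketch that turns result rows into that shape. It is a guess at the format for illustration only, not the code from reportcache.py: <code>to_kv_line</code>, the quoting rule, and the sample field names and values are all assumptions.</p>

```python
def to_kv_line(row):
    """Render one result row as 'key=value, key=value'.

    Values containing a space or comma are wrapped in double quotes so the
    line can still be split back into fields (an assumed convention).
    """
    parts = []
    for key, value in row.items():
        text = str(value)
        if " " in text or "," in text:
            text = '"%s"' % text.replace('"', "'")
        parts.append("%s=%s" % (key, text))
    return ", ".join(parts)


# Made-up timechart rows: one row per time bucket, one column per queue name.
rows = [
    {"_time": 1195436061, "indexqueue": 12.5, "parsingqueue": 3},
    {"_time": 1195436961, "indexqueue": 0, "parsingqueue": 41.25},
]

for row in rows:
    print(to_kv_line(row))
```

<p>Keeping <code>_time</code> on every line is the important bit, since the cached events need a timestamp when they are indexed.</p>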
<p>PART THREE - Writing to an index:</p>
<ul>
<li>1.  We are now going to have the command put the results into the index. Simply remove the file, path and notimestamp attributes:<br />
<code>index=_internal metrics "group=queue" | timechart avg(current_size) by name | reportcache index=cache</code></li>
<li>2. Run the command - you should again see normal results and no error.</li>
<li>3. Wait 30 seconds or so...</li>
<li>4. Run the following search to make sure results made it into the cache index - you should see your cache data after this search<br />
<code>index=cache</code></li>
<li>5. Now click on the report link and see if you can get your report back <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> This is the somewhat odd part. All the fields should be as they were in the original search but many reports create keys with odd names. The best thing to do is to click around and see what reports you can make. You should be able to get back to the original search/report prior to the caching.</li>
</ul>
<p>PART FOUR - Enabling automatic caching:</p>
<p>After you have found and tested a search/report you want to cache moving forward:</p>
<ul>
<li>1. Save the search along with the reportcache command</li>
<li>2. Schedule the saved search on a short interval (every 5, 15, 30, etc. minutes)</li>
<li>3. Test by waiting a few hours and looking at the results in the cache index.</li>
</ul>
<p>There is a good chance that either the above description was vague or that there is a bug / edge-case that I did not consider.<br />
One frequent problem I have seen is trying to cache data that has no timestamp. For example,<br />
<code>somesearch | top users</code><br />
will produce results without timestamps. This makes a mess of the cached data. If you have this problem, try rewriting your search to something like:<br />
<code>somesearch | stats count first(_time) by users | where users != "" | sort -count </code><br />
The above will produce data that has both the top counts and timestamps.</p>
<p>A few other things that are common requests:<br />
Often folks want to go back in time and create cached results for prior data. I have a script that can do that and will post it after more testing.<br />
Another common topic of conversation surrounds the over-creation of summary data. In many cases it can be beneficial to cache more stuff than you initially need in case you want to run reports later. I'm trying to think of good ways to automatically do this for you.</p>
<p>** IMPORTANT ** - drop me a line and let me know how something like this *should* work. I suspect that we will add a "checkbox" to saved searches that will automatically do the right thing. </p>
<p>I'll leave this post with the usage info from the top of the search script.</p>
<p># usage: &lt;some report search&gt; | reportcache &lt;reportcache options&gt;<br />
#   &lt;reportcache options&gt;<br />
#       file=[filename] - default is current time<br />
#       path=[path] - default is $SPLUNK_HOME/var/spool/splunk<br />
#       index=[indexname] - which index to target for results. If blank will use whatever is bundled<br />
#       marker=[string] - this is just a token or k=v used to mark the results for version or other delineation or to defeat crc caching<br />
#       format=["csv"|"splunk"] - use the output format "splunk" for feeding back into splunk or csv if you want to save for another tool<br />
#       appendtime - if true this will append current time. It's useful when you are doing something that you want with a timestamp of now<br />
#       notimestamp - if this arg is supplied it will suppress the timestamp in the filename<br />
#       debug - if debug then will just output args to screen</p>
<p># following example will put in var/spool/splunk a file named foo without timestamp, marked with "erik=nextrun", and targeted to index cache<br />
# index::_internal "group=pipeline" | timechart avg(executes) | reportcache file="foo" notimestamp marker="erik=nextrun" index="cache"</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/valKkWDrt8A" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/erik/2007/11/18/making-reports-faster-through-saved-searches/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Rory Greene: Scripted auth in preview</title><link>http://feedproxy.google.com/~r/splunkdev/~3/VcCAkrJbxtI/</link><comments>http://blogs.splunk.com/rory/2007/11/16/scripted-auth-in-preview/</comments><pubDate>Sat, 17 Nov 2007 00:37:03 +0000</pubDate><dc:creator>Rory Greene</dc:creator><description><![CDATA[Hey Kids, 
How are things? so I&#8217;ve made some progress in my attempt to code myself out of a job. Just checked the scripted auth into the preview branch which should be released in a few days. It&#8217;s very basic right now with more improvements to come. At the moment userLogin,  getUserType and getUserInfo [...]]]></description><content:encoded><![CDATA[<p>Hey Kids, </p>
<p>How are things? so I've made some progress in my attempt to code myself out of a job. Just checked the scripted auth into the preview branch which should be released in a few days. It's very basic right now with more improvements to come. At the moment userLogin,  getUserType and getUserInfo are the only methods you need to fill in. </p>
<p>I've written up a sample that interfaces with PAM on Linux, using /etc/passwd to get user lists. Mac users can skip the pamauth.c compile: you don't need this app, and pam don't like macs (can't say I blame pam on that score).</p>
<p>First off, a pamauth.c program to compile that will talk to pam for ya.  Donated by Phillppe Troin, thank you fif. Feel free to take and edit for your own purposes, but  you must send fif a chocolate chip cookie if you found it useful.</p>
<p>File pamauth.c is attached due to severe lameness on the part of wordpress, which insists on screwing with the #include's.</p>
<p><a href="/devuploads/2007/11/pamauth.c">pamauth.c</a></p>
<p>Compile that puppy like so<br />
gcc -Wall -Wextra -o pamauth pamauth.c -lpam</p>
<p>You may need to create an entry for pam<br />
edit /etc/pam.d/pamauth and put this line in<br />
auth        sufficient    pam_unix.so</p>
<p>Talking to pam usually requires root access, so we will just make the pamauth program setuid root instead of running splunk as root (which would be deeply stupid BTW).</p>
<p>as root:<br />
chown root:root pamauth; chmod u+s pamauth</p>
<p>You can test it by doing echo  PASSWORD | ./pamauth username<br />
returns 0 for auth passed<br />
returns 1 on fail.</p>
<p>K now that you have your nifty pam app running you need to add your python script that will interface<br />
with splunk. As they say on cooking shows, here's one we made earlier.</p>
<p>[source:py]<br />
# Required functions;<br />
# 1. userLogin    : login with username password pair<br />
# 2. getUserInfo  : get user information, passed back in the form userId;username;password;realname;userType<br />
# 3. getUserType  : the splunk role to attach that user to.<br />
# optional functions<br />
# 1. getUsers     : Enumerate all users in the system, these will then be displayed on the user page in splunk.<br />
# Later release<br />
# 1. checkSession : The current version just auths and then splunk manages the session; this will allow<br />
#                   session management to be handled here. Careful though, splunkd and the frontend<br />
#                   are quite chatty so this will be called a lot. If it's slow it will degrade performance.</p>
<p>import sys<br />
import subprocess</p>
<p>SUCCESS = "success"<br />
FAILED  = "fail"</p>
<p>PAM_EXE = ""</p>
<p>def writeToStdout( listIn ):<br />
   result = ""<br />
   for fu in listIn:<br />
      result = result + "[" + fu + "]"</p>
<p>   sys.stdout.write( result )</p>
<p>def readFromStdin( ):<br />
   input = sys.stdin</p>
<p>   inStr = ""<br />
   for line in input:<br />
      inStr = inStr + line</p>
<p>   inStr = inStr.replace( "[", "" )<br />
   return inStr.split( ']' )</p>
<p>def userLogin( infoIn ):<br />
   listFu = []<br />
   username = infoIn[0]<br />
   password = infoIn[1]</p>
<p>   # our check with pam is done with a setuid program called pamauth.<br />
   # The arguments are passed as a list (no shell) so that an odd username<br />
   # cannot be interpreted as a shell command.<br />
   proc = subprocess.Popen( [ PAM_EXE, username ],<br />
                            stdin=subprocess.PIPE,<br />
                            )<br />
   proc.communicate( password )<br />
   retCode = proc.wait()</p>
<p>   if retCode == 0:<br />
      listFu.append( SUCCESS )<br />
   else:<br />
      listFu.append( FAILED )</p>
<p>   return listFu</p>
<p>def getUsers( infoIn ):<br />
   listFu = []<br />
   listFu.append( SUCCESS )<br />
   # just going to use /etc/passwd here but you may use any method you wish.<br />
   FILE = open("/etc/passwd" ,"r")<br />
   fileLines = FILE.readlines()</p>
<p>   for line in fileLines:<br />
      userBits = line.split( ":" )<br />
      if userBits[6].find( '/bin/bash' ) != -1:<br />
         realname = userBits[4]<br />
         if realname == "" :<br />
            realname = userBits[0]<br />
         #              userId       username       password          realName       userType/splunk role<br />
         listFu.append( userBits[2] + ";" +userBits[0] + ";***********;" + realname + ";Admin" )</p>
<p>   FILE.close()</p>
<p>   return listFu</p>
<p># IN UserId<br />
# OUT [RESULT(SUCCESS|FAILED)][userType]<br />
def getUserType( infoIn ):<br />
   # Here you are given a userId<br />
   # you must return the user type (splunk role)<br />
   # I'm just going to make everyone an admin.<br />
   listFu = []<br />
   listFu.append( SUCCESS )<br />
   listFu.append( "Admin" )<br />
   return listFu</p>
<p>def getUserInfo( infoIn ):<br />
   listFu = []<br />
   listFu.append( SUCCESS )<br />
   #userId;<br />
   listFu.append( infoIn[0] + ";" + infoIn[0] + ";***********;" + infoIn[0] + ";Admin" )<br />
   return listFu</p>
<p>if __name__ == "__main__":<br />
   callName = sys.argv[1]<br />
   listIn = []<br />
   listIn = readFromStdin(  )</p>
<p>   returnList = []<br />
   if callName == "userLogin":<br />
      returnList = userLogin( listIn )<br />
   # checkSession is planned for a later release and is not defined in this script,<br />
   # so that call name falls through to the error branch below.<br />
   elif callName == "getUsers":<br />
      returnList = getUsers( listIn )<br />
   elif callName == "getUserType":<br />
      returnList = getUserType( listIn )<br />
   elif callName == "getUserInfo":<br />
      returnList = getUserInfo( listIn )<br />
   else:<br />
      returnList.append("ERROR call name not known" )<br />
      returnList.append( callName )</p>
<p>   writeToStdout( returnList )<br />
[/source]</p>
<p>Change the PAM_EXE variable in the script to point to the app that will check the password. On Linux: the pamauth program you just compiled.  On Mac (the piano-accordion of computers): use the chkpasswd program shipped with the mac.</p>
<p>Now that you have a script auth plugin ready to go all you need to do now is tell splunk about it.</p>
<p>Example of the authentication.conf bundle.</p>
<p>[source]<br />
[auth]<br />
authSettings = fubar<br />
authType = Scripted</p>
<p>[fubar]<br />
programPath = /opt/splunk/bin/python<br />
# my python auth script.<br />
scriptPath = /home/boo/splunk/scriptedAuth/flubber.py<br />
[/source]</p>
<p>Now pay attention here: you do need to edit programPath and scriptPath to point at paths on your system.</p>
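<p>To show how the two stanzas hang together, here is a rough Python sketch that reads the config and builds the command line for one call. It assumes <code>authSettings</code> names the stanza holding <code>programPath</code> and <code>scriptPath</code>, and that the call name is passed as the first argument (the sample script reads it from <code>sys.argv[1]</code>). The <code>scripted_auth_command</code> function is made up for illustration and is not how splunkd actually loads the bundle.</p>

```python
import configparser

# The stanzas from the example above, without any trailing comments.
CONF_TEXT = """
[auth]
authSettings = fubar
authType = Scripted

[fubar]
programPath = /opt/splunk/bin/python
scriptPath = /home/boo/splunk/scriptedAuth/flubber.py
"""


def scripted_auth_command(conf_text, call_name):
    """Return the argv list that would run the auth script for call_name."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case (authType, scriptPath) as written
    parser.read_string(conf_text)
    if parser["auth"]["authType"] != "Scripted":
        raise ValueError("authType is not Scripted")
    stanza = parser[parser["auth"]["authSettings"]]
    return [stanza["programPath"], stanza["scriptPath"], call_name]


if __name__ == "__main__":
    print(scripted_auth_command(CONF_TEXT, "userLogin"))
```

<p>If either path is wrong on your box, this is the command that fails, so it is worth running the equivalent by hand once before pointing splunk at it.</p>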
<p>Things left to do:<br />
1. Allow users to pass back search filters on userLogin and getUserType.<br />
2. Allow session management to be handled by scripted input (right now, once auth is confirmed as correct, splunk takes over session management).</p>
<p>Also this script will not return user lists on the mac (no big deal, you just can't see all users in the admin/users tab).  Erik Swan has volunteered to fix this because he loves macs, a little too much really, it's kinda unhealthy.</p>
<p>Download this and play with it, let me know of any problems.</p>
<p>I will publish more details on the communication between splunkd and the script, but for the moment you folks can reverse engineer this. It's pretty simple, a lame wildebeest could figure it out.</p>
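<p>Until then, here is a small Python sketch of the framing as it can be read off the sample script: every field is wrapped in square brackets, both on stdin and on stdout. <code>encode_fields</code> and <code>decode_fields</code> are names invented for this sketch. Unlike <code>readFromStdin</code> in the script, the decoder drops the empty string left over after the last bracket, and neither function copes with field values that themselves contain brackets.</p>

```python
def encode_fields(fields):
    """Frame each field as [field], the way writeToStdout does in the script."""
    return "".join("[" + field + "]" for field in fields)


def decode_fields(text):
    """Split '[a][b]' back into ['a', 'b'].

    Mirrors readFromStdin (strip '[' and split on ']') but skips the empty
    or whitespace-only tail that follows the final ']'.
    """
    return [chunk.replace("[", "") for chunk in text.split("]") if chunk.strip()]


if __name__ == "__main__":
    request = encode_fields(["rory", "s3cret"])  # what userLogin would receive
    print(request)
    print(decode_fields(request))
    print(encode_fields(["success", "Admin"]))   # what getUserType sends back
```

<p>A handy way to poke at your own script is to pipe an encoded request into it with the call name as the first argument and check that the reply decodes to what you expect.</p>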
<p>More later, for now it's time for beer pong, played for cold hard cash and ugly women.</p>
<p>Ciao,<br />
Rory </p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/VcCAkrJbxtI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/rory/2007/11/16/scripted-auth-in-preview/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Carl Yestrau: Flash/AS3 URLStream Memory Leak</title><link>http://feedproxy.google.com/~r/splunkdev/~3/ckvLZJ8yUXE/</link><comments>http://blogs.splunk.com/carl/2007/11/16/flashas3-urlstream-memory-leak/</comments><pubDate>Fri, 16 Nov 2007 21:06:15 +0000</pubDate><dc:creator>Carl Yestrau</dc:creator><description><![CDATA[Lately we have been doing some work with persistent connections. If you are familiar with Comet the Flash/AS3 URLStream class provides an interesting alternative. The URLStream class exposes raw binary data as it is downloaded. 
Unfortunately, this week we ran into a rather tricky memory leak when using this nifty class. An event listener was [...]]]></description><content:encoded><![CDATA[<p>Lately we have been doing some work with persistent connections. If you are familiar with <a href="http://alex.dojotoolkit.org/?p=545">Comet</a> the Flash/AS3 <a href="http://livedocs.adobe.com/flex/2/langref/flash/net/URLStream.html">URLStream</a> class provides an interesting alternative. The URLStream class exposes raw binary data as it is downloaded. </p>
<p>Unfortunately, this week we ran into a rather tricky memory leak when using this nifty class. An event listener was subscribed to the <a href="http://livedocs.adobe.com/flex/2/langref/flash/net/URLStream.html#event:progress">progress</a> event and over time memory usage steadily increased to a point of making the browser inoperable.  </p>
<p>After a little digging we narrowed the problem down to URLStream's use of <a href="http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/flash/utils/ByteArray.html">ByteArray</a>. It seems that URLStream was reallocating a buffer for the array, and the short turnaround time between reads was not giving the garbage collector enough time to throw out the old allocation. </p>
<p>We corrected the leak by releasing the ByteArray (setting the reference to null) after each read, which lets the garbage collector reclaim the read buffer.</p>
<p>Here is the workaround:</p>
<p><code><br />
	var bytes:ByteArray = new ByteArray();<br />
	this.readBytes(bytes, 0, this.bytesAvailable);<br />
	bytes = null;<br />
</code></p>
<p>Happy Streams!</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/ckvLZJ8yUXE" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/carl/2007/11/16/flashas3-urlstream-memory-leak/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Nick Mealy: reallyDescriptiveNames</title><link>http://feedproxy.google.com/~r/splunkdev/~3/y0HwPqqaRQs/</link><comments>http://blogs.splunk.com/nick/2007/11/07/reallydescriptivenames/</comments><pubDate>Wed, 07 Nov 2007 23:20:10 +0000</pubDate><dc:creator>Nick Mealy</dc:creator><description><![CDATA[I have a funny habit with our code in the front end, where if something&#8217;s just too complicated, but i cant see the better solution yet, I&#8217;ll give its pieces long descriptive names.  It&#8217;s basically so they&#8217;ll stick out later, we&#8217;ll think &#8216;why is this thing so ugly and complicated&#8217;,  and it&#8217;ll help [...]]]></description><content:encoded><![CDATA[<p>I have a funny habit with our code in the front end, where if something's just too complicated, but i cant see the better solution yet, I'll give its pieces long descriptive names.  It's basically so they'll stick out later, we'll think 'why is this thing so ugly and complicated',  and it'll help us remember to revisit it.   (btw, I'm not claiming that this is good development practice, it's just a trick i use, faintly reminiscent of the <a href="http://catb.org/jargon/html/B/blue-wire.html">blue-wire red-wire</a> stuff in the Mythical Man Month).</p>
<p>So anyway, I bring it up because Johnvey saw one of its cousins out in the wild, taking the whole concept to an extreme. <a href="http://worsethanfailure.com/Articles/Really-Descriptive-Names.aspx">Check it out</a>.   </p>
<p>Arguably though, this is so extreme that it's not reallyDescriptiveNames at all, but closer kin to a sort of passiveAggressiveWorkplaceSabotageAdapter.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/y0HwPqqaRQs" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/nick/2007/11/07/reallydescriptivenames/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Amrit Bath: Saving the environment, one beer pong game at a time.</title><link>http://feedproxy.google.com/~r/splunkdev/~3/R2NtQl9NNkw/</link><comments>http://blogs.splunk.com/amrit/2007/11/05/saving-the-environment-one-beer-pong-game-at-a-time/</comments><pubDate>Mon, 05 Nov 2007 21:42:07 +0000</pubDate><dc:creator>Amrit Bath</dc:creator><description><![CDATA[Recycling is universally considered to be a good thing, right?
Good.  Then that means that we at Splunk are obligated to play beer pong every Friday!  I figure that with all the bottles and cans that subsequently go into the recycling bin, we&#8217;re probably  offsetting a small percentage of the many computers [...]]]></description><content:encoded><![CDATA[<p>Recycling is universally considered to be a good thing, right?</p>
<p>Good.  Then that means that we at Splunk are <em>obligated</em> to play beer pong every Friday!  I figure that with all the bottles and cans that subsequently go into the recycling bin, we're probably  offsetting a small percentage of the many computers we use here... amirite?</p>
<p><img src="http://blogs.splunk.com/devuploads/2007/11/beerPongAtSplunk.jpg" alt="Al Gore would be proud" /></p>
<p>If you disagree, you can voice your opinions in person.  See you here Friday at 5PM.  <img src='http://blogs.splunk.com/amrit/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/R2NtQl9NNkw" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/amrit/2007/11/05/saving-the-environment-one-beer-pong-game-at-a-time/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kord Campbell: Splunk Hack #3 - Splunk on Rails</title><link>http://feedproxy.google.com/~r/splunkdev/~3/Bs0X4VidiQw/</link><comments>http://blogs.splunk.com/kordless/2007/11/02/splunk-hack-3-splunk-on-rails/</comments><pubDate>Sat, 03 Nov 2007 01:09:08 +0000</pubDate><dc:creator>Kord Campbell</dc:creator><description><![CDATA[Ruby on Rails is a popular programming framework for quickly creating web applications.  It provides its own web server for development testing, and ships with OSX, which means the tools are now widely available to a broad group of programmers/coders/hackers.  Coupled with the fact that most Rails developers use either Linux or OSX, [...]]]></description><content:encoded><![CDATA[<p><a href="http://rubyonrails.com/">Ruby on Rails</a> is a popular programming framework for quickly creating web applications.  It provides its own web server for development testing, and ships with OSX, which means the tools are now widely available to a broad group of programmers/coders/hackers.  Coupled with the fact that most Rails developers use either Linux or OSX, and Splunk runs great on both of those platforms, it seemed obvious that we should come up with some sort of solution for mashing the two together.</p>
<p>I mentioned this in passing to one <a href="http://seandick.com/">Sean Dick</a>, who is a developer friend of mine in Oklahoma City.  What follows is a nearly identical post to the one he made over at his <a href="http://seanmdick.blogspot.com">self-named blog</a> on Blogger on how to get Rails to integrate with Splunk.  "There's plenty left to do," he said, but I'm convinced it's worth mentioning here.  Thanks for hammering this out, Sean!</p>
<p><b>Serious Material from Sean Begins Here</b></p>
<p>As per the norm, this post assumes you've <a href="http://www.splunk.com/download/?ac=kc3">downloaded Splunk</a> for your particular platform.  It also requires a newer install of <a href="http://www.rubyonrails.com/">Ruby on Rails</a>.  Come back when you've completed both these tasks.</p>
<p>Get Splunk started now:<br />
<code><br />
> export SPLUNK_HOME=/opt/splunk<br />
> sudo $SPLUNK_HOME/bin/splunk start<br />
</code></p>
<p>You need to drop the <a href="http://blogs.splunk.com/devuploads/2007/11/splunkonrails.rb.zip">Splunk on Rails plugin</a> into your Rails app's lib folder and then load it with <strong>require 'splunkbase'</strong>.  Note: If you're running Splunk remotely or on a non-standard port, don't forget to change the SERVER variable in the plugin file!</p>
<p>Now let's say you want to make sure your rails application is running bug-free, and when one does pop up, you need to know it pronto.  You'll create a new controller into which we're going to put some splunk goodies. I named mine SplunkController, but you can be more creative.</p>
<p><code><br />
class SplunkController &lt; ApplicationController<br />
  require 'splunkbase'<br />
  @@foo = SplunkBase.new<br />
  def index<br />
  end<br />
  def reports<br />
    @document = @@foo.splunkSearch('q' => params[:query])<br />
  end<br />
end<br />
</code></p>
<p>This is really nothing more than making available the response from splunk to your view in the form of a variable. Defining a page for it to be displayed in is no more difficult than:<br />
<code><br />
&lt;pre&gt;&lt;%= @document %&gt;&lt;/pre&gt;<br />
</code></p>
<p>Now we build the index page we defined so we can pass it the query:<br />
<code><br />
&lt;html><br />
   &lt;head><br />
        &lt;%= javascript_include_tag "prototype" %><br />
   &lt;/head><br />
   &lt;body ><br />
        &lt;%= form_remote_tag(:update => "graphDiv",<br />
                    :url => {:action => :reports }) %><br />
             &lt;%= text_field_tag :query, nil, {:size => "100"} %><br />
             &lt;%= submit_tag "Get a report on your query" %><br />
        &lt;%= end_form_tag %><br />
        &lt;div id="graphDiv"><br />
        &lt;/div><br />
   &lt;/body><br />
&lt;/html><br />
</code></p>
<p>And believe it or not we're ready to start asking Splunk some questions. Try giving it something like:<br />
<code><br />
[search sourcetype::what_you_named_your_source error starthoursago=24] | outputxml<br />
</code></p>
<p>As you've probably gathered, that'll give you a formatted list of all of the errors that have occurred in the last day.</p>
<p>Now let's say you've got a Rails application running internally and you either don't have the option of outsourcing its analysis to something like Google Analytics or don't feel comfortable doing so.  Back to our cute little controller: we add a new definition for the graphing page.<br />
<code><br />
class SplunkController &lt; ApplicationController<br />
 require 'splunkbase' #those two magical words<br />
 @@foo = SplunkBase.new<br />
 def index<br />
 end<br />
 def reports<br />
   @document = @@foo.splunkSearch('q' => params[:query])<br />
 end<br />
 def graph #for graphing, this fixes things up so we can display the data<br />
   @datahash = {}<br />
   @queryDoc = @@foo.splunkSearch('q' => params[:query]) #here is the meat<br />
   @queryDoc.each_element("//r") do |ele| #here we're sorting out what is useful<br />
     @datahash[ele.elements["m[@col='1']"].text] = ele.elements["m[@col='2']"].text.to_i<br />
   end<br />
   @sorted = @datahash.sort_by { |name, hits| -hits } #[name, hits] pairs, biggest first, which is what the view iterates over<br />
   @chartheight = @datahash.values.max + 50 #to make it look pretty and consistent<br />
 end<br />
end<br />
</code></p>
<p>This example is fairly simple and assumes you're just looking for basic metrics on your site's usage. You could build it larger to accept whatever you want splunk to throw back at you. This one expects to see something like <code>"Content Name" => "value"</code>.</p>
<p>Now let's take a stab at setting up the graph:</p>
<p><code><br />
&lt;div><br />
  &lt;%# raw return of the data we pulled from Splunk %><br />
  &lt;samp>&lt;%= @queryDoc.to_s %>&lt;/samp><br />
&lt;/div><br />
&lt;%# iterate through each of the data pairs and grab the height %><br />
&lt;% @sorted.each do |name, height| %><br />
  &lt;div class="columnSpacer"><br />
      &lt;div style="margin-top: &lt;%= 100 - ((height * 100)/@chartheight.to_f) %>%" class="graphTitle"><br />
          &lt;%= name %><br />
          &lt;br><br />
          &lt;%= height %> hits<br />
      &lt;/div><br />
      &lt;div class="graphColumn" style="height: &lt;%= (height * 100)/@chartheight.to_f %>%"><br />
      &lt;/div><br />
  &lt;/div><br />
&lt;% end %><br />
</code></p>
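<p>The bar arithmetic in the view above can also be tried on its own, outside Rails. What follows is only a minimal sketch: <code>bar_layout</code> and the sample counts are hypothetical names and data made up for illustration, not part of the plugin, and it assumes the data has already been reduced to a hash of name => hit count.</p>

```ruby
# Hypothetical helper (not part of the Splunk on Rails plugin): reproduces the
# view's percentage math for a hash of name => hit count.
def bar_layout(counts, padding = 50)
  # Same headroom the controller adds when it computes @chartheight.
  chart_height = counts.values.max + padding
  # Largest bar first, keeping each name paired with its count.
  counts.sort_by { |_name, hits| -hits }.map do |name, hits|
    height_pct = (hits * 100) / chart_height.to_f
    { name: name, hits: hits, height_pct: height_pct, margin_top_pct: 100 - height_pct }
  end
end

# Made-up sample data, purely for illustration.
bar_layout('posts' => 150, 'users' => 50).each do |bar|
  puts format('%-6s height %5.1f%%  margin-top %5.1f%%', bar[:name], bar[:height_pct], bar[:margin_top_pct])
end
```

<p>Doing this in a helper or the controller would keep the view down to plain output, at the cost of one more method to maintain. Note that the sketch assumes at least one data point; an empty hash would need a guard.</p>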
<p>In the interest of keeping things from getting too esoteric I've committed a no-no and left some programming in the view. All in all, it's pretty light math to get things displaying properly. As you should be able to glean from the code presented we're just iterating through each of the name/value pairs we extracted from the XML Splunk returned and turning them into pretty little bars on a chart. Now all we need to do is put together the index page for accessing the graph function.</p>
<p><code><br />
&lt;html><br />
   &lt;head><br />
        &lt;%= javascript_include_tag "prototype" %><br />
   &lt;/head><br />
   &lt;body ><br />
        &lt;%= form_remote_tag(:update => "graphDiv",<br />
                    :url => {:action => :graph }) %><br />
             &lt;%= text_field_tag :query, nil, {:size => "100"} %><br />
             &lt;%= submit_tag "Get a report on your query" %><br />
        &lt;%= end_form_tag %><br />
        &lt;div id="graphDiv"><br />
        &lt;/div><br />
   &lt;/body><br />
&lt;/html><br />
</code></p>
<p>There's pretty minimal monkey business here, so let's go on to the fun part:</p>
<p>Let's take a look at what controllers are getting the most face-time for our users and what content sections are being perceived as being the most useful. This example comes from a site I did recently for a client and happens to be the most handy rails logfile I have within reach.</p>
<p>Pop in the query:<br />
<code><br />
[search sourcetype="the_name_you_gave_your_source" | top 5 controller ] | outputxml<br />
</code></p>
<p>And you get something like this:</p>
<p><img src="http://blogs.splunk.com/devuploads/2007/11/controllerchart.png"/></p>
<p>There are myriad options available to you through Splunk's search interface, and learning to romance the queries into giving you what you want would be a section all its own. This one, however, consists of limiting the scope of the search ( sourcetype= ) and giving it a context ( top 5 controller )  -  in this case, the top five controllers.</p>
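<p>If you end up building queries of this shape in code rather than typing them into the form, a small helper can assemble the two parts (scope and context). This is only a sketch: <code>top_query</code> is a hypothetical name, not something the plugin provides, and it does no escaping of the values it is given.</p>

```ruby
# Hypothetical helper (not part of the plugin): builds a "top N" query in the
# same shape as the one typed into the form above.
def top_query(sourcetype, field, limit = 5)
  # Scope the search to one sourcetype, then ask for the top values of a field.
  %([search sourcetype="#{sourcetype}" | top #{limit} #{field} ] | outputxml)
end

puts top_query('the_name_you_gave_your_source', 'controller')
```

<p>The resulting string is what you would hand to the form field, or to <code>splunkSearch</code> as the 'q' parameter in the controller.</p>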
<p>Next post I will cover the possibilities afforded with the use of bundles in Splunk in conjunction with your Rails application. In the meantime I highly suggest you peruse the REST API documentation supplied in your Splunk install and the admin/developer documentation on Splunk.com to get a more in-depth understanding of what you can do.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/Bs0X4VidiQw" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kordless/2007/11/02/splunk-hack-3-splunk-on-rails/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: Tutorial: Event Types in 3.2</title><link>http://feedproxy.google.com/~r/splunkdev/~3/U7qMUNeQBXg/</link><comments>http://blogs.splunk.com/david/2007/10/27/tutorial-event-types-in-32/</comments><pubDate>Sun, 28 Oct 2007 03:30:49 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[Hi, I&#8217;m David Carasso, perhaps you&#8217;ve seen my famous File Classifier Video. It&#8217;s the number one video at CurrentTV.
Below is a second screen capture video that I just made to describe Splunk&#8217;s new Event Typer. The Event Typer dynamically tags system events in custom, yet, universal ways. For example, I can say that for any [...]]]></description><content:encoded><![CDATA[<p>Hi, I'm David Carasso, perhaps you've seen my famous File Classifier Video. It's the number one video at CurrentTV.</p>
<p>Below is a second screen capture video that I just made to describe Splunk's new Event Typer. The Event Typer dynamically tags system events in custom, yet universal, ways. For example, I can say that any event that happens on Sunday, has 'status=Fatal', and has "sourcetype=weblogic" should be dynamically tagged as a "weekend_fatal_weblogic" event. Topics covered include: what is an event type; how to search, view, and count event types; creating an event type; creating an event-type template; and discovering event-types.</p>
<p>Yes, production value is what you've come to expect from a Carasso Production. That's right 15 minutes of unscripted nerd talk. Now with a bonus 45 seconds of video as I type in an off-camera window. But I promise you'll learn a few useful things you didn't know.<br />
<a href="http://blogs.splunk.com/devuploads/2007/10/eventtype1.mov">EventTyperVideo (15 minutes of emacs magic)<br />
</a></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/U7qMUNeQBXg" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2007/10/27/tutorial-event-types-in-32/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kim Wallace: Stupid Perforce Trick #1</title><link>http://feedproxy.google.com/~r/splunkdev/~3/YFxfc5mGadA/</link><comments>http://blogs.splunk.com/kim/2007/10/26/stupid-perforce-trick-1/</comments><pubDate>Fri, 26 Oct 2007 22:13:28 +0000</pubDate><dc:creator>Kim Wallace</dc:creator><description><![CDATA[We use Perforce at Splunk, and it&#8217;s worked out pretty well for us. I&#8217;m a CVS admin at heart, and I know there&#8217;s some SVN sentiment, but p4 gives us a nice mix of atomic commits, attractive GUI and command-line tools, and someone to call for help if it ever completely eats itself.
Over time I&#8217;ve [...]]]></description><content:encoded><![CDATA[<p>We use Perforce at Splunk, and it's worked out pretty well for us. I'm a CVS admin at heart, and I know there's some SVN sentiment, but p4 gives us a nice mix of atomic commits, attractive GUI and command-line tools, and someone to call for help if it ever completely eats itself.</p>
<p>Over time I've compiled a small library of scripts for various p4 functions that have been written time and again at different sites...<a href="http://blogs.splunk.com/devuploads/2007/10/mergetool.tar">mergetool</a> is one of them. This little tool accepts a merge target ("yours" in p4-speak) and projectile ("theirs" in p4), labels both, performs an integrate, and performs a "safe" resolve -as. It logs any failures for you to resolve by hand, or submits the change set if the resolve completes successfully. It does this with a bunch of logging in a well-organized, date-stamped directory suitable for archiving (or splunking).</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/YFxfc5mGadA" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kim/2007/10/26/stupid-perforce-trick-1/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: Tutorial: File Classifier</title><link>http://feedproxy.google.com/~r/splunkdev/~3/A7pqKqnmZ9k/</link><comments>http://blogs.splunk.com/david/2007/10/26/tutorial-file-classifier/</comments><pubDate>Fri, 26 Oct 2007 19:50:51 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[Hi, I&#8217;m David Carasso and below is a screen capture video I just made to describe Splunk&#8217;s File Classifier.  The File Classifier takes a file and tells you what type it is. From that sourcetype we determine what to do with the file and how to process it.  It&#8217;s pretty critical for properly [...]]]></description><content:encoded><![CDATA[<p>Hi, I'm David Carasso and below is a screen capture video I just made to describe Splunk's File Classifier.  The File Classifier takes a file and tells you what type it is. From that sourcetype we determine what to do with the file and how to process it.  It's pretty critical for properly handling a file, including time-stamping events and aggregating multiple lines into single events.  There are several methods that the File Classifier uses to classify a file, and we'll cover each one with real-world examples.</p>
<p>Yes, production value is at a new low here as I cover 18 minutes unscripted, but I promise you'll learn a few useful things you didn't know. There's a free Splunk t-shirt for the commenter who guesses the actual number of times I say "uhhhhh".</p>
<p><a href="http://blogs.splunk.com/devuploads/2007/10/D_Carasso_File_Classifier-NativeStreaming.mov">File ClassifierVideo (18 minutes of action packed emacs video)</a></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/A7pqKqnmZ9k" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2007/10/26/tutorial-file-classifier/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Carl Yestrau: JavaScript Hybrids (Extending the browser) - Part 1</title><link>http://feedproxy.google.com/~r/splunkdev/~3/BXKrOgPo3y0/</link><comments>http://blogs.splunk.com/carl/2007/10/15/javascript-hybrids-extending-the-browser-part-1/</comments><pubDate>Mon, 15 Oct 2007 20:33:38 +0000</pubDate><dc:creator>Carl Yestrau</dc:creator><description><![CDATA[I deeply enjoy browser programming, however sometimes I wish it could do more. Things like sockets, streams, audio and improved file system handling would be a real treat. Man would it be fresh if I had access to this functionality in JavaScript.
Now this is going to sound pretty circa 98, but several main stream browser [...]]]></description><content:encoded><![CDATA[<p>I deeply enjoy browser programming, however sometimes I wish it could do more. Things like sockets, streams, audio and improved file system handling would be a real treat. Man would it be fresh if I had access to this functionality in JavaScript.</p>
<p>Now this is going to sound pretty circa '98, but several mainstream browser plugins support a JavaScript communication layer. According to the <a href="http://www.adobe.com/products/player_census/flashplayer/">Millward Brown survey</a>, installations of the Flash (99%) and Java (85%) plugins are pretty ubiquitous. </p>
<p><b>Flash/JavaScript Communication</b><br />
The Flash <a href="http://livedocs.adobe.com/flex/201/langref/flash/external/ExternalInterface.html">ExternalInterface</a> class enables communication between JavaScript and the Flash Player.  ExternalInterface was introduced with Flash Player 8, so that is the minimum plugin version required.<br />
From JavaScript</p>
<ul>
<li>Call an ActionScript function</li>
<li>Pass arguments</li>
<li>Return a value to the JavaScript caller</li>
</ul>
<p>From ActionScript</p>
<ul>
<li>Call a JavaScript function</li>
<li>Pass arguments</li>
<li>Pass various data types (Boolean, Number, String, etc...)</li>
</ul>
<p><b>Java Applet/JavaScript Communication</b><br />
The scarcely documented <a href="http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:LiveConnect">LiveConnect</a> API provides JavaScript with the ability to call methods of Java classes and vice-versa. Using LiveConnect in applets requires the <a href="http://java.sun.com/javase/6/docs/technotes/guides/plugin/developer_guide/java_js.html#enablingjsobjectsupport">mayscript</a> attribute and the plugin.jar package for newer versions of Java (Howto for <a href="http://developer.apple.com/qa/qa2004/qa1364.html">Mac OS X</a> users). Communication from Java to JavaScript is mediated through the <a href="http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:LiveConnect:JSObject">netscape.javascript.JSObject</a> class. JavaScript exceptions in Java can be handled using the <a href="http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:LiveConnect:JSException">netscape.javascript.JSException</a> class. Public methods in an applet can be called using the applet container object followed by the method name and arguments (e.g., document.getElementById("myapplet").publicAppletMethod(arg1, argN);).</p>
<p>From JavaScript</p>
<ul>
<li>Call a Java method</li>
<li>Pass arguments</li>
<li>Return a value to the JavaScript caller</li>
</ul>
<p>From Java</p>
<ul>
<li>Call a JavaScript function (Note: does not seem to support calls on nested objects, e.g. obj.foo(arg))</li>
<li>Pass arguments</li>
<li>Pass various data types (Boolean, Number, String, etc...)</li>
</ul>
<p>It looks like LiveConnect is <a href="http://boomswaggerboom.wordpress.com/2007/04/16/javaplugin-cleanup-for-mozilla-20/">due for an overhaul</a> in the near future, so you may want to keep your eyes out for changes on Mozilla developer <a href="http://boomswaggerboom.wordpress.com/">Josh Aas's blog</a>.   </p>
<p><b>What's Next</b><br />
With the power of Java and Flash this opens up the arena for creating visually hidden gateways (i.e., width:0px; height:0px; applets or swf movies) that extend the browser. Stay tuned for the next part in this series where we make a sample application. Feel the power!</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/BXKrOgPo3y0" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/carl/2007/10/15/javascript-hybrids-extending-the-browser-part-1/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kim Wallace: Being the girl in dev at Splunk</title><link>http://feedproxy.google.com/~r/splunkdev/~3/2jCtKmre3lA/</link><comments>http://blogs.splunk.com/kim/2007/10/12/being-the-girl-in-dev-at-splunk/</comments><pubDate>Sat, 13 Oct 2007 04:59:59 +0000</pubDate><dc:creator>Kim Wallace</dc:creator><description><![CDATA[Like a lot of tech companies, Splunk&#8217;s development organization isn&#8217;t a model of perfect gender balance. For a year and a half now, I&#8217;ve been the only woman in the dev organization. 
Surprisingly, this is not an uncomfortable place to be. In 11 years in industry I&#8217;ve worked in a variety of organizations: the now-bankrupt [...]]]></description><content:encoded><![CDATA[<p>Like a lot of tech companies, Splunk's development organization isn't a model of perfect gender balance. For a year and a half now, I've been the only woman in the dev organization. </p>
<p>Surprisingly, this is not an uncomfortable place to be. In 11 years in industry I've worked in a variety of organizations: the now-bankrupt dot-com best known for putting an ad with a naked guy up during the Super Bowl, 2 major marquee names with vastly differing corporate cultures, a security start-up stocked with emancipated-minor hackers. Aside from that doomed dot-com  -  which had a surprisingly strong gender balance throughout technical roles and a culture blessedly free of gender-based intimidation at all levels  -  Splunk may be the most comfortable place I've ever worked. There's no creepy tokenism (unlike stories I've heard about certain other bay area employers), That Guy Who's Never Seen A Girl Before doesn't work here...and as far as I can tell, no one really gets harassed except Amrit. </p>
<p>Perhaps a better testament for the dev culture than my opinion  -  because, frankly, I'm pretty weird to start with  -  is that other women in the company seem to be pretty comfortable visiting the dev area, either on work errands or just to take a break from the sales-focused environment upstairs. Frankly I can't imagine that happens too often in the bay area...and more's the pity.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/2jCtKmre3lA" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kim/2007/10/12/being-the-girl-in-dev-at-splunk/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: Semi-Automatic Discovery of Extraction Patterns for Log Analysis</title><link>http://feedproxy.google.com/~r/splunkdev/~3/-jTFwm-53ag/</link><comments>http://blogs.splunk.com/david/2007/10/12/semi-automatic-discovery-of-extraction-patterns-for-log-analysis/</comments><pubDate>Fri, 12 Oct 2007 17:15:46 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[Here&#8217;s a paper I recently wrote on some of the automatic field extraction we&#8217;re doing with Splunk.
Abstract
This paper presents an interactive bootstrapping process used in Splunk that automatically learns to extract fields from log events. End users simply select one or more example values of a field and a learning process discovers additional instances, along [...]]]></description><content:encoded><![CDATA[<p>Here's a paper I recently wrote on some of the automatic field extraction we're doing with Splunk.</p>
<blockquote><p><strong>Abstract</strong><br />
This paper presents an interactive bootstrapping process used in Splunk that automatically learns to extract fields from log events. End users simply select one or more example values of a field and a learning process discovers additional instances, along with the patterns to extract them. The user is able to correct the instances and save the extraction patterns. Immediately afterward, while searching log events, the newly-taught fields will be extracted from the event's raw text.</p></blockquote>
<p><a href="http://blogs.splunk.com/devuploads/2007/10/autoextract.pdf">Click here to read full paper<br />
<img src="http://blogs.splunk.com/devuploads/2007/10/thumbnail.thumbnail.jpg" /></a></p>
<p>Feedback appreciated.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/-jTFwm-53ag" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2007/10/12/semi-automatic-discovery-of-extraction-patterns-for-log-analysis/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: Trekking in the Galapagos</title><link>http://feedproxy.google.com/~r/splunkdev/~3/b6Ww4S_YRhk/</link><comments>http://blogs.splunk.com/johnvey/2007/10/11/trekking-in-the-galapagos/</comments><pubDate>Fri, 12 Oct 2007 03:34:58 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[The Splunk cozy has been to a few countries around the world.  This month, I took it to the Galapagos, and decided to leave it there at Post Office Bay amongst all the other plaques and memorabilia.  I think it&#8217;ll be very comfortable for a while.  See the rest of my Galapagos [...]]]></description><content:encoded><![CDATA[<p>The Splunk cozy has been to a few countries around the world.  This month, I took it to the Galapagos, and decided to leave it there at <a href="http://www.v-liz.com/galapagos/floreana.htm">Post Office Bay</a> amongst all the other plaques and memorabilia.  I think it'll be very comfortable for a while.  See the rest of my <a href="http://www.johnvey.com/photos/galapagos/">Galapagos photo gallery</a>.<br />
<br />
<img src="http://farm3.static.flickr.com/2324/1548528228_58ef308d17.jpg" alt="The Galapagos" /><br />
<br />
<img src="http://farm3.static.flickr.com/2215/1548535056_df690052cd.jpg" alt="The Galapagos" /></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/b6Ww4S_YRhk" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2007/10/11/trekking-in-the-galapagos/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Rob Das: Diagraming Splunk’s data-flow (part 2 - performance overlays)</title><link>http://feedproxy.google.com/~r/splunkdev/~3/j1hviVsVKV4/</link><comments>http://blogs.splunk.com/rob/2007/10/11/diagraming-splunk%e2%80%99s-data-flow-part-2-performance-overlays/</comments><pubDate>Fri, 12 Oct 2007 00:49:01 +0000</pubDate><dc:creator>Rob Das</dc:creator><description><![CDATA[In my previous post &#8220;Diagraming Splunk&#8217;s data-flow&#8221; I wrote a small python script that parsed Splunk&#8217;s runtime environment ($SPLUNK_HOME/var/run/splunk/composite.xml) and generated a file which when input into graphviz would generate a nice architectural diagram of how pipelines and processors are wired together.
In this installment, I took it to the next level by using Splunk&#8217;s search [...]]]></description><content:encoded><![CDATA[<p>In my previous post "Diagraming Splunk's data-flow" I wrote a small python script that parsed Splunk's runtime environment ($SPLUNK_HOME/var/run/splunk/composite.xml) and generated a file which when input into <a href="http://www.graphviz.org/">graphviz</a> would generate a nice architectural diagram of how pipelines and processors are wired together.</p>
<p>In this installment, I took it to the next level by using Splunk's search capability to overlay performance metrics on the diagram.  The combination of Splunk logging metrics information for each processor within each pipeline (thanks Brad) and the ability to have Splunk execute a <em>search processor </em>written in Python made this possible.  Here is how you use it:</p>
<p>First download <a href="http://www.graphviz.org/">graphviz</a>.  I particularly like the OSX application that they've written because you can see the graph on the screen and as the file changes, those changes are reflected in the graph you are viewing.  If you don't have a Mac, use the command line version to generate different types of output file formats like .jpeg, etc.</p>
<p>Go to <a href="http://www.splunkbase.com/addons/Search_Commands/Splunk/Performance_tuning/addon:Perfgraph">SplunkBase</a> to download my Python script.  Copy the .py file into $SPLUNK_HOME/etc/searchscripts.</p>
<p>Start Splunk.</p>
<p>Type the following into the search box:<img alt="index___internal metrics pipeline processor NOT get - over all time - localhost - Splunk 3.2-UNSTABLE-4.jpg" src="http://blogs.splunk.com/devuploads/2007/10/index___internal%20metrics%20pipeline%20processor%20NOT%20get%20-%20over%20all%20time%20-%20localhost%20-%20Splunk%203.2-UNSTABLE-4.jpg" /><br />
This will search for the appropriate metrics information and pipe the results through the script.</p>
<p>There are 2 options to perfgraph:</p>
<p><em>perfgraph [output filename] [cpu, execs, cumhits]</em></p>
<p>Unfortunately (because I'm lazy) you can't specify cpu, execs or cumhits without also specifying an output file.  The first parameter is the full path and file name of the 'dot' file you wish to create.  It defaults to /tmp/out.dot.</p>
<p>The second parameter, if specified, tells the script to highlight in red the slowest processor (cpu), the processor with the most hits (execs) or the processor with the most cumulative hits (cumhits).  This parameter defaults to 'none', or no highlighting.</p>
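<p>The two parameters and their defaults can be sketched roughly as follows.  This is not the actual perfgraph source; the function names and the shape of the <em>stats</em> dictionary are assumptions made for illustration only.</p>

```python
# Hypothetical sketch of perfgraph's argument handling (not the real script).
DEFAULT_DOT_PATH = "/tmp/out.dot"
VALID_HIGHLIGHTS = ("none", "cpu", "execs", "cumhits")


def parse_perfgraph_args(args):
    """Return (output_path, highlight) from the positional arguments."""
    output_path = args[0] if len(args) >= 1 else DEFAULT_DOT_PATH
    highlight = args[1] if len(args) >= 2 else "none"
    if highlight not in VALID_HIGHLIGHTS:
        raise ValueError("highlight must be one of %s" % (VALID_HIGHLIGHTS,))
    return output_path, highlight


def pick_highlighted(stats, highlight):
    """Return the processor name with the largest value for the chosen metric.

    stats is an assumed structure: {processor_name: {"cpu": ..., "execs": ..., "cumhits": ...}}.
    Returns None when no highlighting is requested.
    """
    if highlight == "none" or not stats:
        return None
    return max(stats, key=lambda name: stats[name][highlight])
```

<p>Because the arguments are positional, the highlight mode can only be given after an output file, which matches the limitation described above.</p>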
<p>The above search string results in the following graph (portion).  Notice the performance information overlaid onto the processors:<br />
<img alt="out.dot-1.jpg" src="http://blogs.splunk.com/devuploads/2007/10/out.dot-1.jpg" /></p>
<p>If you specify the output file and 'cpu', the processor with the most cpu time will be highlighted.  Here is the search:</p>
<p><img alt="index___internal metrics pipeline processor NOT get | perfgraph _tmp_out.dot cpu - over all time - localhost - Splunk 3.2-UNSTABLE.jpg" src="http://blogs.splunk.com/devuploads/2007/10/index___internal%20metrics%20pipeline%20processor%20NOT%20get%20%7C%20perfgraph%20_tmp_out.dot%20cpu%20-%20over%20all%20time%20-%20localhost%20-%20Splunk%203.2-UNSTABLE.jpg" /></p>
<p>It results in the following graph (portion).  Notice the red processor:</p>
<p><img alt="out.dot-2.jpg" src="http://blogs.splunk.com/devuploads/2007/10/out.dot-2.jpg" /></p>
<p>Next steps:</p>
<ul>
<li>Overlay queue metrics into the queue nodes</li>
<li>Overlay indexer throughputs into the indexer nodes</li>
</ul>
<p>You see.  Splunk provides endless fun.  Insane!  Enjoy.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/j1hviVsVKV4" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/rob/2007/10/11/diagraming-splunk%e2%80%99s-data-flow-part-2-performance-overlays/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kord Campbell: Splunk Hack #2 - Logging Safari Requests on the iPhone</title><link>http://feedproxy.google.com/~r/splunkdev/~3/UKYgbg2j5Gs/</link><comments>http://blogs.splunk.com/kordless/2007/10/10/splunk-hack-2-logging-safari-requests-on-the-iphone/</comments><pubDate>Thu, 11 Oct 2007 05:50:56 +0000</pubDate><dc:creator>Kord Campbell</dc:creator><description><![CDATA[Mark Cohen posted a while back about enabling syslog on the iPhone for the sole purpose of logging to a Splunk instance on your laptop.  This hack is a follow up to that post, and extends it slightly to include logging of the pages browsed by Safari on the phone.  WARNING: If you [...]]]></description><content:encoded><![CDATA[<p><a href="http://dev.splunk.com/author/mark/">Mark Cohen</a> posted a while back about <a href="http://dev.splunk.com/2007/08/26/splunking-your-iphone/">enabling syslog on the iPhone</a> for the sole purpose of logging to a Splunk instance on your laptop.  This hack is a follow up to that post, and extends it slightly to include logging of the pages browsed by Safari on the phone.  WARNING: If you brick your phone, you can still use it as an ergonomic pot-scraper.  Splunk won't be responsible for you going off and getting your <del>$600</del> $400 piece of joy stuffed, but we'll be happy to log the event.</p>
<p>Let's get dirty.  Go into settings..general..auto-lock and set locking to 'never'.  This will keep the phone on while you hack around on it.   Keeping the phone on and connected to the network will drain your battery like nobody's business, so make sure you plug in the charging cable.</p>
<p>Now install <a href="http://iphone.nullriver.com/beta/">AppTap</a>.  Follow the instructions, and come back here when you are all done.</p>
<p>Using the AppTap installer on the phone, install the Community Sources, BSD Subsystem, Term-vt100, OpenSSH, Tinyproxy, and UIctl apps, in that order.  UIctl will let you stop and start sshd on the phone.  Launch it now to see if sshd is running.  Click on the 'load' button if it's not.</p>
<p>Ping your phone from your computer with its IP address.  You can use the terminal on the phone to grab the IP address:</p>
<p><code><br />
# ifconfig en0<br />
en0: flags=8863&lt;UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST&gt; mtu 1500<br />
        inet 10.0.1.194 netmask 0xffffff00 broadcast 10.0.1.255<br />
        ether 00:1c:b3:f0:0b:a6<br />
#<br />
</code></p>
<p>Ssh to the phone from your terminal.  The default root password is 'dottie'.</p>
<p><code><br />
foobar:~ kord$ ssh root@10.0.1.194<br />
root@10.0.1.194's password:<br />
Last login: Wed Oct 10 13:45:22 2007 from 10.0.1.191<br />
# hostname<br />
Kord's iPhone<br />
#<br />
</code></p>
<p>Now add a syslog.conf file to /etc/: </p>
<p><code><br />
bash-3.2# echo "*.* @10.0.1.191" > /etc/syslog.conf<br />
bash-3.2# cat /etc/syslog.conf<br />
*.* @10.0.1.191<br />
</code></p>
<p>Obviously, you'll want to use the IP address of the machine on which you are going to install Splunk.  Speaking of Splunk, at this point you should already have it installed.  If you don't, <a href="http://www.splunk.com/index.php/predownload?d=progeneric?ac=kc1">download it here</a>, and install it now.  You can <a href="http://dev.splunk.com/2007/10/07/charting-your-osx-battery-usage-with-splunk/">reference my first hack</a> for instructions on getting Splunk up and running quickly on your system.    Smile.   Splunk goooood.</p>
<p>Back in your ssh session to the iPhone, you'll need to move the syslogd executable to an alternate location, kill the old instance, and start the new one with a few parameters.</p>
<p><code><br />
# cd /usr/sbin/<br />
# mv syslogd syslogd.mine<br />
# launchctl stop com.apple.syslogd<br />
....wait for about 5 seconds....<br />
# /usr/sbin/syslogd.mine -bsd_out 1 &amp;<br />
</code></p>
<p>Syslogd should now use the new /etc/syslog.conf file that you just created when it starts up.  You can check if it's running properly:</p>
<p><code><br />
# ps -ax |grep syslog<br />
  110  p0  S      0:02.91 /usr/sbin/syslogd.mine -bsd_out 1<br />
#<br />
</code></p>
<p>Now fire up Splunk, and hit your instance of it in a browser:  http://localhost:8000.  Click on the 'admin' link in the top right, click on the 'data inputs' tab at the top, 'network ports' just below that, and then click on the 'add input' button to the right.</p>
<p>Click on the UDP radio button under 'source'.  The port listed should change to 514.  Click on the 'add' button at the bottom.  You should now be getting data coming into Splunk on UDP port 514.  Grab some coffee whilst Splunk eats ALL the logfiles coming in from the iPhone.</p>
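<p>If nothing shows up, it can help to send a test message to the UDP input by hand.  The sketch below builds a BSD-style syslog line and sends it over UDP; the function names, the tag and the message text are made up for illustration, and the address in the usage note is simply the example laptop address used above.</p>

```python
import socket


def build_syslog_line(facility, severity, tag, message):
    """Build a minimal BSD-style syslog line: <priority>tag: message."""
    priority = facility * 8 + severity  # e.g. user (1) and info (6) give 14
    return "<%d>%s: %s" % (priority, tag, message)


def send_syslog_udp(host, line, port=514):
    """Send one syslog line to host:port over UDP (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("utf-8"), (host, port))
    finally:
        sock.close()
```

<p>For example, <em>send_syslog_udp("10.0.1.191", build_syslog_line(1, 6, "iphone-test", "hello splunk"))</em> should produce one event you can search for in Splunk.</p>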
<p>Now let's get Tinyproxy serving requests for Safari on the phone and logging through syslogd.  Check that Tinyproxy is running on the iPhone first:</p>
<p><code><br />
# ps -ax |grep tiny<br />
  354  ??  S      0:00.10 /usr/bin/tinyproxy<br />
  355  ??  S      0:00.00 /usr/bin/tinyproxy<br />
 1428  p1  S+     0:00.01 grep tiny<br />
</code></p>
<p>Edit tiny's configuration file to set his logs to go to syslogd.  Keep in mind there is more to the config file than the few lines that I'm showing.</p>
<p><code><br />
# vi /usr/local/etc/tinyproxy/tinyproxy.conf<br />
~<br />
# log only errors<br />
#Logfile "/var/log/tinyproxy.log"<br />
#LogLevel Info<br />
Syslog On<br />
</code></p>
<p>Now on the iPhone, go to settings..wifi networks..&lt;your network ssid&gt;..http proxy.  Enter the host as 127.0.0.1 and the port as 8080, just as you see in the screenshot below:</p>
<p><img src="http://blogs.splunk.com/devuploads/2007/10/foo_0.png"/></p>
<p>Lastly, kill Tinyproxy so he'll start logging correctly.  He restarts automagically, so all you need to do is kill the process ids:</p>
<p><code><br />
# ps -ax |grep tiny<br />
  354  ??  S      0:00.11 /usr/bin/tinyproxy<br />
  355  ??  S      0:00.05 /usr/bin/tinyproxy<br />
 1651  p1  S+     0:00.01 grep tiny<br />
# kill -9 354 355<br />
# ps -ax |grep tiny<br />
 1654  ??  S      0:00.01 /usr/bin/tinyproxy<br />
 1655  ??  S      0:00.00 /usr/bin/tinyproxy<br />
 1657  p1  S+     0:00.02 grep tiny<br />
#<br />
</code></p>
<p>That should be about it.  Splunk should now be filling up with logs that contain the web requests made by the Safari browser on your iPhone.  Don't forget to restore the syslog plist file, reboot, and set auto-lock back to a timeout of a few minutes.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/UKYgbg2j5Gs" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kordless/2007/10/10/splunk-hack-2-logging-safari-requests-on-the-iphone/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Rob Das: Diagraming Splunk’s data-flow</title><link>http://feedproxy.google.com/~r/splunkdev/~3/fZsuCOTb30o/</link><comments>http://blogs.splunk.com/rob/2007/10/10/diagraming-splunks-data-flow/</comments><pubDate>Wed, 10 Oct 2007 16:57:59 +0000</pubDate><dc:creator>Rob Das</dc:creator><description><![CDATA[This blog entry is not about how the framework works.  It is about a semi-cool visualization that I created using python and graphviz. If you watched the video where I presented Splunks framework architecture from a high level you know what pipelines and processors are.  If you haven&#8217;t here is a very quick [...]]]></description><content:encoded><![CDATA[<p>This blog entry is not about how the framework works.  It is about a semi-cool visualization that I created using python and <a href="http://www.graphviz.org/">graphviz</a>. If you watched the video where I presented Splunks framework architecture from a high level you know what pipelines and processors are.  If you haven't here is a very quick overview.</p>
<ul>
<li>A <strong><em>pipeline</em></strong> is a thread of execution that lives within the splunkd process.  Each pipeline executes a series of <strong><em>processors</em></strong>, each of which operates on data.  The data is created when the first processor on the pipeline reads it from some input (like tailing a file, or receiving it on a network port).  Each processor then does something to the data.  Eventually, the data gets indexed and execution returns to the first processor to get more data.</li>
</ul>
<ul>
<li>Pipelines are connected via <strong><em>queues</em></strong>. A queue output processor (the last processor in a pipeline) puts data on to a queue and blocks if the queue is full.  A queue input processor (the first processor at the top of a pipeline) gets the data item from the bottom of the queue and sends it on down the pipeline. If there is no data, it blocks waiting for some to be put on the queue.</li>
</ul>
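<p>The pipeline and queue behavior described above can be sketched in a few lines of Python.  This is only a toy model, not Splunk's implementation: the two "pipelines", their processors (strip and upper-case) and the sentinel used to stop the second thread are all assumptions made for illustration.</p>

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-data marker, not something splunkd uses


def input_pipeline(lines, out_queue):
    """First pipeline: read data, run one processor, hand off to the queue."""
    for line in lines:
        out_queue.put(line.strip())  # blocks while the queue is full
    out_queue.put(SENTINEL)


def indexing_pipeline(in_queue, index):
    """Second pipeline: take items off the queue and 'index' them."""
    while True:
        item = in_queue.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        index.append(item.upper())


def run_pipelines(lines, queue_size=2):
    """Wire the two pipelines together with a bounded queue and run them."""
    q = queue.Queue(maxsize=queue_size)
    index = []
    producer = threading.Thread(target=input_pipeline, args=(lines, q))
    consumer = threading.Thread(target=indexing_pipeline, args=(q, index))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return index
```

<p>The bounded queue is what gives you back-pressure: a slow downstream pipeline eventually makes the upstream one block instead of piling up data in memory.</p>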
<p>Enough already.  Go watch the video.  So, I decided that I'm tired of drawing these diagrams and wrote some code to produce them for me.</p>
<p>I implemented some Python code that took the <em>composite.xml </em>file, parsed it and produced a <em>.dot </em>file.  Composite.xml, for those of you who don't know, is an amalgamation of all pipelines and processors in the system.  It represents the current (or last) runtime environment for Splunk.  It lives in $SPLUNK_HOME/var/run/splunk.</p>
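<p>For a rough idea of what such a transformation looks like, here is a minimal sketch.  It is not the code in viz.tar, and the XML layout it expects (<em>pipeline</em> elements containing <em>processor</em> elements, each with a <em>name</em> attribute) is an assumption for illustration; the real composite.xml may be structured differently.</p>

```python
import xml.etree.ElementTree as ET


def composite_to_dot(xml_text):
    """Turn an assumed pipeline/processor XML layout into graphviz 'dot' text."""
    root = ET.fromstring(xml_text)
    lines = ["digraph splunk {"]
    for pipeline in root.iter("pipeline"):
        pipeline_name = pipeline.get("name")
        names = [proc.get("name") for proc in pipeline.iter("processor")]
        # One cluster per pipeline, so graphviz draws a box around its processors.
        lines.append('  subgraph "cluster_%s" {' % pipeline_name)
        lines.append('    label="%s";' % pipeline_name)
        for name in names:
            lines.append('    "%s";' % name)
        lines.append("  }")
        # Connect the processors in the order they appear in the pipeline.
        for src, dst in zip(names, names[1:]):
            lines.append('  "%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines)
```

<p>Write the returned text to a file and feed it to graphviz, for example with <em>dot -Tjpg out.dot -o out.jpg</em>.</p>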
<p>I then took the resultant .dot file and ran it through  <em><a href="http://www.graphviz.org/">graphviz</a>.</em> After lots of tweaking, here is what I came up with.  Click on the image to see a larger version which is actually readable.</p>
<p><strong>Results </strong>(click to enlarge)<br />
<a href="http://blogs.splunk.com/devuploads/2007/10/test.jpg"><img width="253" height="177" alt="Auto-generated pipeline graph" src="http://blogs.splunk.com/devuploads/2007/10/test.jpg" /></a></p>
<p><strong>Python Transformation Code</strong></p>
<p>Untar this.  It's only a single python file, but this blogging software wouldn't let me upload a .py file.</p>
<p><a href="http://blogs.splunk.com/devuploads/2007/10/viz.tar">viz.tar</a></p>
<p><strong>Future Work</strong></p>
<ul>
<li>Annotate the graph with run time statistics like average per-processor timing, average queue size, max queue size, etc.  This would require looking at the logs.</li>
<li>Launching this from Splunk, firing off the python along with the metrics data pre-sifted ala Splunk.</li>
</ul>
<p>Got more ideas?  Please post them here.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/fZsuCOTb30o" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/rob/2007/10/10/diagraming-splunks-data-flow/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Amrit Bath: Things you don’t want to hear at work</title><link>http://feedproxy.google.com/~r/splunkdev/~3/c8U3qafV4cM/</link><comments>http://blogs.splunk.com/amrit/2007/10/09/things-you-dont-want-to-hear-at-work/</comments><pubDate>Wed, 10 Oct 2007 01:21:08 +0000</pubDate><dc:creator>Amrit Bath</dc:creator><description><![CDATA[Lots of things are said here that are&#8230; hmm, what&#8217;s the word&#8230; inappropriate?  disgusting?  TMI?  omgwtfbbq?
My boss just told me, &#8220;Amrit, I have a camera on my computer.  And when I&#8217;m at home, anytime you want, I can turn on the camera and you can watch.&#8221;
There was more, but I think [...]]]></description><content:encoded><![CDATA[<p>Lots of things are said here that are... hmm, what's the word... inappropriate?  disgusting?  TMI?  omgwtfbbq?</p>
<p>My boss just told me, "Amrit, I have a camera on my computer.  And when I'm at home, anytime you want, I can turn on the camera and you can watch."</p>
<p>There was more, but I think my ears reflexively closed in on themselves.</p>
<p><img src="http://blogs.splunk.com/devuploads/2007/10/Gorilla_donotwant.jpg" alt="do not want" /></p>
<p>:/</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/c8U3qafV4cM" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/amrit/2007/10/09/things-you-dont-want-to-hear-at-work/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Rob Das: The framework team is hiring</title><link>http://feedproxy.google.com/~r/splunkdev/~3/zBc0sO37PmQ/</link><comments>http://blogs.splunk.com/rob/2007/10/09/the-framework-team-is-hiring/</comments><pubDate>Tue, 09 Oct 2007 18:52:30 +0000</pubDate><dc:creator>Rob Das</dc:creator><description><![CDATA[Splunk&#8217;s framework team is involved in many diverse projects. The &#8220;framework&#8221; itself is really a set of generic code that makes up the runtime environment of Splunk.  In addition, we also handle bringing data into the system, distributing this data across enterprise topologies, authentication, access controls, configuration management, distributed deployment, high availability, real-time streaming, [...]]]></description><content:encoded><![CDATA[<p>Splunk's framework team is involved in many diverse projects. The "framework" itself is really a set of generic code that makes up the runtime environment of Splunk.  In addition, we also handle bringing data into the system, distributing this data across enterprise topologies, authentication, access controls, configuration management, distributed deployment, high availability, real-time streaming, encryption and much much more.</p>
<p>Splunk is extending its reach into extremely large deployments involving thousands of machines and devices across multiple data centers.  The framework team is responsible for making Splunk excel in these challenging environments.  If this sounds interesting and you want to work with some extremely talented people, please drop me some email.</p>
<p><strong>Framework Architect / Senior Engineer</strong></p>
<p>We are looking for a highly motivated engineer who will be responsible for driving the design and implementation of Splunk's network management, scalability, and distributed deployment technology.  The right candidate is fluent in C++, high performance networking and  concurrent / multi-threaded design.</p>
<p><strong>Qualifications</strong></p>
<ul>
<li>Minimum 5 years of relevant industry experience</li>
<li>Expert C++ knowledge, deep understanding of design patterns and experience building clean external APIs.</li>
<li>Significant experience with multi-threaded design and implementation</li>
<li>Has designed &amp; implemented high throughput server systems</li>
<li>Practical experience with network protocols and complex topologies</li>
<li>BS/MS Computer Science / Engineering</li>
<li>Excellent verbal and written communication skills</li>
</ul>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/zBc0sO37PmQ" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/rob/2007/10/09/the-framework-team-is-hiring/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Rory Greene: I’m cold and there are wolves after me</title><link>http://feedproxy.google.com/~r/splunkdev/~3/jjNW6Bgh8cM/</link><comments>http://blogs.splunk.com/rory/2007/10/08/im-cold-and-there-are-wolves-after-me/</comments><pubDate>Tue, 09 Oct 2007 00:53:43 +0000</pubDate><dc:creator>Rory Greene</dc:creator><description><![CDATA[Just fresh from the splunk poker game. Good fun, made a whopping $10. Jef looked like
he was on the verge of paying for his kids' education. Maverick even threatened to sing,
good times.
So Erik did a pretty good job of describing the environment here at splunk.
The people here are great and lots of fun, there are [...]]]></description><content:encoded><![CDATA[<p>Just fresh from the splunk poker game. Good fun, made a whopping $10. Jef looked like<br />
he was on the verge of paying for his kids' education. Maverick even threatened to sing,<br />
good times.</p>
<p>So Erik did a pretty good job of describing the environment here at splunk.<br />
The people here are great and lots of fun, there are some great problems<br />
just begging to be solved, we need more monkeys on them typewriters </p>
<p>Poker games, golf, visits to the jackson arms, beer pong, foosball<br />
(Raffy really needs a challenge )</p>
<p>Don't worry about that collage bit  http://en.wikipedia.org/wiki/Collage</p>
<p>Erik insists everyone draw a picture of themselves in crayon, but really<br />
who doesn't ask for that in a serious interview these days.</p>
<p>In the coming weeks I'm going to be working on a way to allow people to<br />
plug in their own auth systems. We've had requests running the gamut from<br />
the normal stuff like PAM, RADIUS etc. to carrier pigeon and bob's trusty<br />
auth system. The common thread among all of these is that they are all scriptable.<br />
You folks know your own auth systems. We'll throw this in the unstable<br />
release/dev branch that we'll be launching and hopefully get some feedback<br />
from you folks to fine-tune it before we put it into stable.</p>
<p>Now that I've said that in public I'm well and truly screwed and will have to do it.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/jjNW6Bgh8cM" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/rory/2007/10/08/im-cold-and-there-are-wolves-after-me/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kord Campbell: Splunk Hack #1 - Charting Your OSX Battery Usage with Splunk</title><link>http://feedproxy.google.com/~r/splunkdev/~3/Ohf0pn42qPI/</link><comments>http://blogs.splunk.com/kordless/2007/10/07/charting-your-osx-battery-usage-with-splunk/</comments><pubDate>Mon, 08 Oct 2007 04:08:45 +0000</pubDate><dc:creator>Kord Campbell</dc:creator><description><![CDATA[This is an easy-to-follow tutorial for charting battery usage on your Mac laptop with a small shell script and Splunk.  Watching your battery charge is as exciting as watching paint dry, but analyzing it over time is pretty interesting.  You may discover a few things about the software you run - like it [...]]]></description><content:encoded><![CDATA[<p>This is an easy-to-follow tutorial for charting battery usage on your Mac laptop with a small shell script and Splunk.  Watching your battery charge is as exciting as watching paint dry, but analyzing it over time is pretty interesting.  You may discover a few things about the software you run - like it eats your battery's amps for desert.</p>
<p>A friend of mine, Sean Dick, showed me a version of this idea using Splunk on Linux and a program called 'acpi'.   As I'm a Mac fanboy of sorts, I dug up a shell script for the Mac that will print out a single logfile-like line containing laptop battery information, including amp draw, amp-hours left, and more.  It's aptly named 'battery', and you can download it <a href="http://www.mitt-eget.com/software/macosx/#battery">here</a>.</p>
<p>I suggest you put battery in a directory under your home directory, say something called 'scripts'.  Head into 'terminal' to start the dirty work.</p>
<p>Here's an example output line from 'battery short':</p>
<p><code>G4:~ kord$ ./scripts/battery short<br />
2007-10-07 18:34:27 1 _________i__ 11.232V -1.454A 2.788Ah of 4.720Ah (59.1%) of 4.400Ah (107.3%) 13 cycles</code></p>
<p>The line of underscores with an 'i' in it shows which battery flags are set.  'i' means my battery is installed.  Duh.  Other flags include whether the lid is closed, the battery is on fire, or it's just on the charger.  See the battery.rtf file for more information on the flags.  I have a G4 laptop, but just got my battery replaced for free!  Only 13 cycles on it so far!</p>
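<p>To get a feel for what can be pulled out of that line, here is a small Python sketch that splits it into named fields.  It is not what the Splunk addon does internally; the regular expression is my own guess based on the sample line above, and mapping the first percentage to battery_percent is an assumption.</p>

```python
import re

# Pattern inferred from the single sample line above; other 'battery' output
# modes or versions may not match it.
BATTERY_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ "
    r"(?P<battery_flags>\S+) "
    r"(?P<battery_volts>-?[\d.]+)V "
    r"(?P<battery_draw>-?[\d.]+)A "
    r"(?P<battery_ah_remaining>[\d.]+)Ah of [\d.]+Ah "
    r"\((?P<battery_percent>[\d.]+)%\)"
)

NUMERIC_FIELDS = ("battery_volts", "battery_draw",
                  "battery_ah_remaining", "battery_percent")


def parse_battery_line(line):
    """Return a dict of fields from one 'battery short' line, or None."""
    match = BATTERY_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC_FIELDS:
        fields[key] = float(fields[key])
    return fields
```

<p>A negative battery_draw, as in the sample, means the laptop is running off the battery rather than charging it.</p>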
<p>Splunk eats logfiles, so you'll need to get a logfile rolling on your battery output.  I'm going to assume you know how to use vi (text editor) to do the rest of this work.</p>
<p>You'll need to set up a cronjob to create the logfile and continue logging to it every so often.    Switch to root and create a logfile for battery in /var/log: </p>
<p><code><br />
G4:~ kord$ su<br />
Password:<br />
G4:/Users/kord root# cd /var/log<br />
G4:/var/log root# touch battery.log<br />
G4:/var/log root# chown kord battery.log<br />
G4:/var/log root# ls -la battery.log<br />
-rw-r--r--   1 kord  wheel  0 Oct  7 18:45 battery.log<br />
G4:/var/log root# exit<br />
G4:~ kord$<br />
</code></p>
<p>Now use 'crontab -e' and put in a line that looks something like the second line of this:</p>
<p><code><br />
G4:~ kord$ crontab -l<br />
* * * * *       /Users/kord/scripts/battery short >> /var/log/battery.log<br />
</code></p>
<p>That will cause the battery script to run once a minute and append its output to the battery.log file in the log directory.  After a few minutes, tail the logfile with 'tail /var/log/battery.log' and make sure you've got data in there.  Also, I've edited my own crontab, but you could elect to do it as root (thus skipping the chown step above).</p>
<p>Obviously you will need Splunk installed to chart the battery usage out of the logfiles.  If you haven't installed it already, there's a free version up on the website you can <a href="http://www.splunk.com/index.php/predownload?d=kordless">download</a>.  Follow the instructions for installing it on OSX.</p>
<p>Assuming that you installed Splunk in '/Applications/splunk/' you can do the following to start it:</p>
<p><code><br />
G4:~ root# cd /Applications/splunk<br />
G4:/Applications/splunk root# export SPLUNK_HOME='/Applications/splunk/'<br />
G4:/Applications/splunk root# ./bin/splunk start<br />
</code></p>
<p>Now you'll need to <a href="http://www.splunkbase.com/addons/Inputs/Systems_Management/Monitoring/addon:OSX_Battery_Monitor">download</a> my addon for Splunk, which is basically a bundle of configuration files.  For reference, I also put the battery script in the tar file, along with an example crontab file.  To get the bundle in the right place, start by untarring it:</p>
<p><code><br />
G4:~ kord$ tar xvfz battery.tar.gz<br />
battery/<br />
battery/addon.conf<br />
battery/bin/<br />
battery/bin/battery<br />
battery/bin/battery.rtf<br />
battery/bin/crontab.example<br />
battery/props.conf<br />
battery/screenshot.jpg<br />
battery/transforms.conf<br />
</code></p>
<p>Now move it to the correct location in Splunk's directory:</p>
<p><code><br />
G4:~ kord$ su<br />
Password:<br />
G4:/Users/kord root# mv battery /Applications/splunk/etc/bundles/<br />
</code></p>
<p>And restart Splunk now:</p>
<p><code><br />
G4:/Users/kord root# /Applications/splunk/bin/splunk restart<br />
</code></p>
<p>We'll spend the rest of our time in a browser, using Splunk's kick-ass web interface.</p>
<p>If you left the default port alone, you should be able to fire up Firefox and hit http://localhost:8000 and see the initial login screen (or not if you are using the free version).  I'll leave the particulars of getting to the initial search interface on Splunk to you.</p>
<p>Add the battery.log file to the list of files Splunk monitors.  Click on 'admin', then click on the 'data inputs' tab.  Click on the 'Add input' link to the right of 'Files &amp; Directories' at the bottom.  Leave the data access set to 'tail' and give the full path to the logfile -  '/var/log/battery.log' in my example above.  Host can be constant, DNS name doesn't matter, and set the source type pulldown to '_battery'.  Remember, this sourcetype won't be in the list until you install the battery bundle.</p>
<p>Click on 'add' to add the source type.  Go get a cup of coffee while Splunk eats this and other files on your computer and builds the index.</p>
<p>Back from the caffeine, you should now click on the 'splunk>' logo at the top left.  Type in the following in the search bar, sans the quotes: 'source::/var/log/battery.log'.  Click on the 'fields' pulldown on the left and check a few extracted fields, such as battery_ah_remaining, battery_draw, battery_percent, and battery_volts.  Click on 'fields' again to close and reload with the extracted fields showing.</p>
<p>You should get something that looks like this:</p>
<p><img src="http://blogs.splunk.com/devuploads/2007/10/splunk_fields.jpg" alt="" /></p>
<p>If you have an hour or so of data logged, try entering 'source::/var/log/battery.log | timechart avg(battery_draw)' in the search box at the top to generate a report for the last 60 minutes.</p>
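<p>If you're wondering what a time-bucketed average like that boils down to, here is a rough Python sketch of the idea.  It is not how Splunk implements timechart; the function name and the bucket format are assumptions for illustration.</p>

```python
from collections import defaultdict
from datetime import datetime


def timechart_avg(events, bucket_format="%Y-%m-%d %H:%M"):
    """Average values per time bucket.

    events is an iterable of (timestamp, value) pairs, with timestamps written
    as 'YYYY-MM-DD HH:MM:SS'. bucket_format decides the bucket size: the
    default groups by minute, '%Y-%m-%d %H:00' groups by hour.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        buckets[parsed.strftime(bucket_format)].append(value)
    return {bucket: sum(values) / len(values)
            for bucket, values in sorted(buckets.items())}
```

<p>With one battery reading per minute, per-minute buckets just echo the raw data, so wider buckets are what smooth the chart out.</p>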
<p>Here's what my amp draw looks like for the last 3 hours:</p>
<p><img src="http://blogs.splunk.com/devuploads/2007/10/splunk_graph.jpg" alt="" /></p>
<p>The move 'up' in the graph halfway through is actually a drop in amps drawn on the battery when I restarted Firefox.  The cause?  Firefox had a Flash game running in another tab, and it had eventually heated up the processor enough to kick on the fans!</p>
<p>Here's another one, showing the evidence of me having a newer battery installed - almost five hours of continuous usage after 4PM, with only a few screen sleeps:</p>
<p><img src="http://blogs.splunk.com/devuploads/2007/10/splunk_graph_2.jpg" alt="" /></p>
<p>It's interesting how the laptop charges at a rate almost the same as it discharges.  It preserves battery life doing it that way, especially with the new lithium-polymer batteries.</p>
<p>See what else you can dig up about your battery.  Try charting with some of the flags that are set - like how often the charger is on the laptop, or what the draw rate is if you have the screen clamshell closed.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/Ohf0pn42qPI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kordless/2007/10/07/charting-your-osx-battery-usage-with-splunk/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Stephen Sorkin: the search and indexing team is hiring</title><link>http://feedproxy.google.com/~r/splunkdev/~3/rZdTcdKh8Gk/</link><comments>http://blogs.splunk.com/ssorkin/2007/10/06/the-search-and-indexing-team-is-hiring/</comments><pubDate>Sat, 06 Oct 2007 17:03:21 +0000</pubDate><dc:creator>Stephen Sorkin</dc:creator><description><![CDATA[hello world. i&#8217;m the manager of the search and indexing team at splunk. our team is responsible for the amorphous category of &#8220;data processing and storage.&#8221; this includes such tasks as character set normalization, grouping multiple consecutive lines into logical events, timestamp extraction, metadata extraction, indexing and storage on the input side. on the retrieval [...]]]></description><content:encoded><![CDATA[<p>hello world. i'm the manager of the search and indexing team at splunk. our team is responsible for the amorphous category of "data processing and storage." this includes such tasks as character set normalization, grouping multiple consecutive lines into logical events, timestamp extraction, metadata extraction, indexing and storage on the input side. on the retrieval side, we maintain the APIs to access the index and all the various transformations that fall under the label "reporting," like automatic key/value extraction methods that take raw text and produce semi-structured data. we have more problems to solve than people to solve them, so we're looking to grow our team in the near future. below are some of the positions that we're hiring for, but if you're smart, clever and creative, we probably have a spot for you.</p>
<p><strong>Indexing Architect</strong><br />
We're looking for an exceptionally talented engineer to drive the design, implementation and maintenance of our core indexing and search technology. The right candidate will have significant experience writing high performance C/C++ code that interacts with the file system at low levels.</p>
<p><b>Qualifications</b></p>
<ul>
<li>Minimum 5 years of relevant industry experience.</li>
<li>Expert level knowledge of C/C++ programming and a deep understanding of multi-process, highly concurrent software design.</li>
<li>BS/MS Computer Science/Engineering. PhD is welcome.</li>
<li>Experience in algorithm and data-structure design.</li>
<li>Excellent verbal and written communication skills.</li>
</ul>
<p><strong>Search and Indexing Engineer (all levels)</strong><br />
We're looking for exceptionally talented engineers, from recent college grads to seasoned software veterans, to contribute technically to Splunk's Search and Indexing team. The right candidate will care deeply about writing high-quality, efficient and maintainable software.</p>
<p><b>Qualifications</b></p>
<ul>
<li>Strong background in C/C++ programming.</li>
<li>BS/MS Computer Science/Engineering or related field. PhD is welcome.</li>
<li>Understanding of algorithmic complexity and data-structure tradeoffs.</li>
<li>Excellent verbal and written communication skills.</li>
</ul>
<p><strong>Data Mining Architect</strong><br />
Do you love data? Terabytes of semi-structured, inconsistent, machine-generated data? If you're creative and have inspired ideas of how to summarize, group and link this data, we're looking for you. </p>
<p><b>Qualifications</b></p>
<ul>
<li>PhD in Statistics, IEOR, Computer Science or related field.</li>
<li>Strong modern AI and statistics background.</li>
<li>Solid command of one or more scripting languages.</li>
<li>Understanding of algorithmic development and data-structure tradeoffs in C/C++.</li>
<li>Strong publication history.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/rZdTcdKh8Gk" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/ssorkin/2007/10/06/the-search-and-indexing-team-is-hiring/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kim Wallace: Packaging Splunk</title><link>http://feedproxy.google.com/~r/splunkdev/~3/0E5_GlvlXXA/</link><comments>http://blogs.splunk.com/kim/2007/10/05/packaging-splunk/</comments><pubDate>Sat, 06 Oct 2007 00:30:06 +0000</pubDate><dc:creator>Kim Wallace</dc:creator><description><![CDATA[Splunk runs on a lot of platforms for a relatively young product and that number is always increasing. The day I started, there were packages for Intel and PowerPC Macintoshes, i686 Linux, Solaris 8 on Sparc, and FreeBSD on x86, all created with BitRock InstallBuilder, run from a simple shell script, usually by Erik. There [...]]]></description><content:encoded><![CDATA[<p>Splunk runs on a lot of platforms for a relatively young product and that number is always increasing. The day I started, there were packages for Intel and PowerPC Macintoshes, i686 Linux, Solaris 8 on Sparc, and FreeBSD on x86, all created with BitRock InstallBuilder, run from a simple shell script, usually by Erik. There really wasn't much control over what went into the installer  -  if a file was in the installer prep directory and the shell script didn't know to delete it, out it went.</p>
<p>By the time 2.1 was on its way, we'd decided to switch to native packages, and our list of platforms had expanded to include Solaris on Intel, with several more on the horizon. We also wanted to provide the "rail tarball" distribution we continue to support, in part so that QA could get started before the packaging automation was complete. </p>
<p>What is that packaging automation, you might ask? Obviously writing custom code to package each platform (not to mention spec or pkgmap files in each platform's native format) was not a very maintainable solution. Instead we use a locally modified version of Easy Software's EPM package manager. After a little work, EPM lets us use a common set of list files to create relocatable packages using common pre- and post-install scripts across all of the 9 platforms we now build on. We're able to control every file and permission that goes into the packages, and in most cases we can add packaging for a new OS platform with a minimum of work (for something very different we haven't previously had in house, like AIX, more time might need to be spent cleaning up EPM's support for the platform). We've piggy-backed creation of the "rail tarball" distributions on the EPM list file structure, so those packages too are completely defined. EPM itself is built during the Splunk build process like any other 3rd party dependency, so any new patch to the tool is available to the build systems almost as soon as it's checked in. </p>
<p>The downside to this is a little loss of flexibility for some platforms; trivial changes to the RPM spec file or FreeBSD ports originpath have usually required modifying EPM's source. It's not a bad trade-off, though.</p>
<p>EPM has recently been made open source; check it out at http://www.epm-home.org . If you're interested in my patches, feel free to drop me a line, but in some cases superior patches have been contributed at epm-home.org.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/0E5_GlvlXXA" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kim/2007/10/05/packaging-splunk/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Rob Das: Software configuration - why does this wheel need re-invention?</title><link>http://feedproxy.google.com/~r/splunkdev/~3/NphJN4gMlJg/</link><comments>http://blogs.splunk.com/rob/2007/10/02/software-configuration-why-does-this-wheel-need-re-invention/</comments><pubDate>Wed, 03 Oct 2007 02:01:09 +0000</pubDate><dc:creator>Rob Das</dc:creator><description><![CDATA[I have worked on so many software projects that I can&#8217;t possibly enumerate them.  Most of my contribution to these projects has been on the server side of things.  Every one of these projects needed to be configured in some way, shape or form and I just realized that every one of them [...]]]></description><content:encoded><![CDATA[<p>I have worked on so many software projects that I can't possibly enumerate them.  Most of my contribution to these projects has been on the server side of things.  Every one of these projects needed to be configured in some way, shape or form and I just realized that every one of them had its own configuration subsystem that was implemented from scratch.  Many of these configurations could be managed via GUIs and/or CLIs, and others simply were "managed" via vi, or emacs.  They all share one thing in common however - they all suck in one way or another.  Why?  Because configuration subsystems are incredibly difficult to get right.</p>
<p>Building a configuration system on the surface seems boring.  If I went and showed the sales guys how cool my configuration system was they would roll their eyes back into their heads.   Put some rotating, flashing thing on the GUI and they think you're the coolest, most creative developer around.  The fact is that a good configuration system makes a <em>huge </em>difference to a product.  In fact, it can make or break it in some cases.</p>
<p>Indulge me in allowing me to share a typical "configuration system lifecycle".  Please tell me if this seems familiar to you.  I have personally gone through this many times.</p>
<ul>
<li>Version 1.0 - simple configuration language, usually XML.  Why?  Because you need to get something up and running quickly.  XML has tons of parsers, validators, etc.  Users of this early release need to edit the configuration files using a text editor.  They need to restart the system every time a change is made. The developer states that this is fine - the product is "not intended for use by people that can't use an editor". Fuck em'.</li>
<li>Version 1.5 - The next release has some really complex configuration.  However, it's still only modifiable via a text editor.  Maybe flow control is introduced.  Changing a configuration in the wrong way causes very bad and very weird things to happen.  Customer Support gets lots of calls.  There is no way to tell what a customer changed and what the default configuration was supposed to be without comparing the two configuration files side by side.</li>
<li>Version 2.0 - We need an administration GUI so people can configure this without having to call support every single time!  So a GUI is added.  Every administered item is coded into the server and into the GUI because every configuration has different validation, different things to check, etc.  The customers are much happier.  Until graybeard decides he hates the GUI and insists on using emacs.  The GUI and emacs don't get along very well.  Things break again.</li>
<li>Version 2.5 - The executives decide that we need a way for "the community" to build widgets that other people can use.  They need to package these widgets up in some way that they can be downloaded and added to the system without disturbing local and default configurations.  The engineers decide to use layering to separate these 3 things out.  But layering in XML is nasty and people will get confused.  So out with the XML to something "simpler".  Boy did this open a can of worms.  All the different parts of the system need to be modified to handle the new configuration syntax.  We are just about ready to ship.  Boy is this code base different - "Oh SHIT! We forgot we need migration scripts!".  So they are frantically built and hastily tested.  The product ships.  Customers complain.  Not only do the migration scripts hork periodically, but the configuration language is new to them.</li>
<li>Version 3.0 - The server engineers are adding lots of new features to support customer requirements.  Unfortunately, every new feature needs custom GUI and CLI work to handle the administration of that feature.  This is simply not sustainable, so it's been decided to data drive the GUI and CLI from a specification file that describes the syntax, the interdependencies, etc. for each configuration item/file.  Furthermore, the community is going gangbusters, but downloading new widgets requires a restart of the server.  So do all configuration changes.  Once again every part of the system is changed to handle this dynamic configuration.  Man is this hard - "What do I do with the data that is already in the queues when the queue is supposed to be shrunk in this re-configuration?" asks one of the brightest engineers.  Hmm.</li>
</ul>
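<p>To make the Version 3.0 idea a little more concrete, here is a minimal sketch of what "data driving" validation from a specification might look like. It is only an illustration: the <code>SPEC</code> layout, the setting names and the <code>validate</code> function are all made up for this example and are not taken from any real product.</p>

```python
# Hypothetical spec file contents, shown inline as a dict. In a real system this
# would be loaded from disk; every name here is made up for illustration.
SPEC = {
    "max_queue_size": {"type": int, "default": 1000, "min": 1, "max": 100000},
    "log_level": {"type": str, "default": "info",
                  "choices": ["debug", "info", "warn", "error"]},
    "enable_gui": {"type": bool, "default": True},
}


def validate(settings, spec=SPEC):
    """Check settings against the spec.

    Returns (clean, errors): clean holds every setting in the spec, using the
    default wherever the caller gave nothing or gave something invalid.
    """
    clean = {name: rule["default"] for name, rule in spec.items()}
    errors = []
    for name, value in settings.items():
        rule = spec.get(name)
        if rule is None:
            errors.append(f"unknown setting: {name}")
        elif type(value) is not rule["type"]:
            errors.append(f"{name}: expected {rule['type'].__name__}")
        elif "choices" in rule and value not in rule["choices"]:
            errors.append(f"{name}: must be one of {rule['choices']}")
        elif "min" in rule and value not in range(rule["min"], rule["max"] + 1):
            errors.append(f"{name}: must be between {rule['min']} and {rule['max']}")
        else:
            clean[name] = value
    return clean, errors
```

<p>The point of the sketch is that a GUI, a CLI and a text-editor workflow could all call the same <code>validate</code>, so a new setting would mean a new spec entry rather than new code in three places.</p>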
<p>You get the idea.<br />
So here in a nutshell is a list of reasons why configuration systems are so difficult.  I'm sure you can add more:</p>
<ul>
<li>They are actually small languages.  I have seen XML, simple linear lists of attribute/value pairs, scripting languages with flow control, strange and weird languages like in sendmail, etc.</li>
<li>They need validation so they don't break the system</li>
<li>If there is GUI or CLI access, they need to be dynamically updated</li>
<li>Consistency between updates is critical so that someone editing a config file using vi doesn't collide with someone using the GUI.</li>
<li>They need to be migrated from version to version or need some kind of backward compatibility</li>
<li>They can be layered so local changes override system defaults</li>
<li>They need to be extensible so ultimately 3rd parties can develop configurations that are add-ons</li>
<li>They need solid documentation - ultimately self generating.</li>
<li>They should be data driven such that every time someone invents something that needs new configuration, the GUI and/or CLI doesn't need new code.</li>
<li>They need to support dynamic loading with no system restarting</li>
<li>They may need to support versioning in systems that are composed of modules, each of which may be independently revved.</li>
</ul>
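<p>The layering item is probably the easiest one to show in code. What follows is a rough sketch, assuming configuration is held as sections of key/value pairs; the layer names and the helper functions <code>merge_layers</code> and <code>origin_of</code> are hypothetical and exist only for this example.</p>

```python
def merge_layers(*layers):
    """Merge config layers key by key; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for section, values in layer.items():
            merged.setdefault(section, {}).update(values)
    return merged


def origin_of(section, key, named_layers):
    """Return the name of the layer that supplies the effective value."""
    winner = None
    for name, layer in named_layers:
        if key in layer.get(section, {}):
            winner = name  # keep going: a later layer may override this one
    return winner


# Three illustrative layers: shipped defaults, a downloaded add-on, local edits.
defaults = {"server": {"port": 8000, "workers": 4}}
addon = {"widgets": {"enabled": True}}
local = {"server": {"port": 9000}}

effective = merge_layers(defaults, addon, local)
```

<p>Because the layers are never written into each other, support can ask <code>origin_of</code> where a value came from instead of comparing two files side by side, and an upgrade can replace the defaults without touching local changes.</p>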
<p><strong>Conclusion</strong></p>
<p>Configuration systems are often overlooked, but can be the core of an entire system.  There is <em>no </em>substitute for a really good one.  It's almost impossible to get it right the first time, but you must really think long and hard about where you want it to go and what you want it to become.</p>
<p>Yes.  I copped out.  I didn't tell you how to do these things.  I didn't tell you where you can look on SourceForge to find the ultimate configuration system so you don't need to re-invent the wheel yet again.  That is because there is none - at least not that I know of.  I have some ideas on how to build a generic configuration system that if open-sourced could save engineers months of time, but that is the topic of a different post.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/NphJN4gMlJg" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/rob/2007/10/02/software-configuration-why-does-this-wheel-need-re-invention/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kord Campbell: Tech Talk #1 - Pipelines and Processors</title><link>http://feedproxy.google.com/~r/splunkdev/~3/5AjsCls3Utk/</link><comments>http://blogs.splunk.com/kordless/2007/10/01/tech-talk-1-pipelines-and-processors/</comments><pubDate>Mon, 01 Oct 2007 16:53:15 +0000</pubDate><dc:creator>Kord Campbell</dc:creator><description><![CDATA[Rob Das gives us the skinny on Splunkd&#8217;s use of various pipelines and processors.  This is the first pass at Splunk&#8217;s tech talks, so the screen caps of the terminal are a little blurry on the smaller versions.  We&#8217;ll be re-filming this particular piece again this week, except this time the beer guy [...]]]></description><content:encoded><![CDATA[<p>Rob Das gives us the skinny on Splunkd's use of various pipelines and processors.  This is the first pass at Splunk's tech talks, so the screen caps of the terminal are a little blurry on the smaller versions.  We'll be re-filming this particular piece again this week, except this time the beer guy is going to do it.</p>
<p><a href="http://video.google.com/videoplay?docid=5537946582161381694&amp;hl=en"><img class="pic" width="120" height="90" border="0" src="http://blogs.splunk.com/devuploads/2007/09/tech_talk_1_screenshot.jpg"/></a></p>
<p>More video formats are available from Splunk's <a href="http://dev.splunk.com/videos-from-splunk/">Tech Talks</a> section.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/5AjsCls3Utk" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kordless/2007/10/01/tech-talk-1-pipelines-and-processors/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Ben Strawbridge: Scrum caps for scrums</title><link>http://feedproxy.google.com/~r/splunkdev/~3/geE_oW2oQSI/</link><comments>http://blogs.splunk.com/ben/2007/09/21/scrum-caps-for-scrums/</comments><pubDate>Fri, 21 Sep 2007 20:32:02 +0000</pubDate><dc:creator>Ben Strawbridge</dc:creator><description><![CDATA[We have been using agile development processes splunk for the past few months, including sprints, daily standing meetings and functional scrum groups.  Our fearless chief mind, David suggested that we should have a team leader hat, like they wear for a rugby scrum, to protect them from thrown objects.

I thought it was a great [...]]]></description><content:encoded><![CDATA[<p>We have been using agile development processes at Splunk for the past few months, including sprints, daily standing meetings and functional scrum groups.  Our fearless chief mind, David, suggested that we should have a team leader hat, like they wear for a rugby scrum, to protect them from thrown objects.</p>
<p><a title="Our scrum leader" href="http://blogs.splunk.com/devuploads/2007/09/david-scrum-leader.jpg"><img alt="Our scrum leader" src="http://blogs.splunk.com/devuploads/2007/09/david-scrum-leader.jpg" /></a></p>
<p>I thought it was a great idea too. What do you think?</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/geE_oW2oQSI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/ben/2007/09/21/scrum-caps-for-scrums/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Nick Mealy: Intangibles</title><link>http://feedproxy.google.com/~r/splunkdev/~3/Pm7hbIdvuLk/</link><comments>http://blogs.splunk.com/nick/2007/09/19/intangibles/</comments><pubDate>Wed, 19 Sep 2007 23:38:02 +0000</pubDate><dc:creator>Nick Mealy</dc:creator><description><![CDATA[There&#8217;s lots of subtle things that are required for good user experience. Simplicity, speed, comprehensibility, consistency.   These are the core value of any software, and there&#8217;s a spectrum on which they&#8217;re at the other end from &#8216;Features&#8217;. 
Features are cool. They make you sound smart. Whether you&#8217;re a customer talking to a sales [...]]]></description><content:encoded><![CDATA[<p>There are lots of subtle things required for a good user experience: simplicity, speed, comprehensibility, consistency.   These are the core values of any software, and they sit at the opposite end of a spectrum from 'Features'. </p>
<p>Features are cool. They make you sound smart. Whether you're a customer talking to a sales guy, or an engineer fleshing out an idea you had.  New stuff tends to show up in sentences as the word 'feature'.  It's exciting. Sure it has a certain cost in speed or something.  It tends to not color entirely within the lines.   But that's OK.  It's new, therefore it's cool. </p>
<p>Jump forward many years, though. Everything was new at some point, then it gets old, and those costs start to suck.   After a while these intangibles have pretty much been sacrificed away and you're in large-company hell, sitting in endless meetings trying to figure out how and when everything got all bloated and slow. </p>
<p>So, we're trying to swim against this current as we scale (no kidding).  We're trying to prioritize speed and simplicity. We're trying to keep talking to users in the trenches. We're fighting off checkbox-itis, we're trying to have new corner-case features built in offshoot, quasi-standalone manners,  we're trying to use the extensible architectures we have, and create more of them when needed.    </p>
<p>In short, we're trying to keep that thing about Splunk that is cool.   That you can get going quickly, you can set it up quickly,  you can change directions quickly.  It's yours to drive.  When you want to do something with Splunk for the first time, it generally makes sense and doesn't take very long. </p>
<p>So enter YOU!  We sometimes suck at all this, and could use your help to suck less.  And if you the user want to talk to us, about how you use Splunk, what searches you run with it, what stuff you've found easy, what you've found hard, we want to know.   It's not terribly hard to figure out my email address since you know my name.  So email me.   or email ui.  Really.   =)</p>
<p>We have some nifty but lightweight web-ex type stuff that just requires a browser, and we can do a 10min conference call and watch you drive your splunk instance around from the safety of our desks.   Email me for an invite. </p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/Pm7hbIdvuLk" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/nick/2007/09/19/intangibles/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Nick Mealy: wayback machine</title><link>http://feedproxy.google.com/~r/splunkdev/~3/xAz8Hs1a8eI/</link><comments>http://blogs.splunk.com/nick/2007/09/19/wayback-machine/</comments><pubDate>Wed, 19 Sep 2007 19:01:54 +0000</pubDate><dc:creator>Nick Mealy</dc:creator><description><![CDATA[Im a pretty nostalgic guy, so hanging out with me there&#8217;s a lot of &#8216;back in the day&#8217;,  &#8216;onion on my belt&#8217;  kind of stuff.  You have been warned. 
So my history at splunk &#8212; I started here in March &#8216;05.  First UI Developer, inheriting the front end built by our [...]]]></description><content:encoded><![CDATA[<p>I'm a pretty nostalgic guy, so hanging out with me there's a lot of 'back in the day',  'onion on my belt'  kind of stuff.  You have been warned. </p>
<p>So my history at splunk  -  I started here in March '05.  First UI Developer, inheriting the front end built by our notorious founder Erik Swan.  They brought me in as a dHTML guru and gave me free rein (crossed fingers notwithstanding).   But for better or for worse Splunk has always been pretty different on the client-side.  Even the alphas and private betas were all client-side XSLT and had that holy crap moment where you wonder why the hell  everything is clickable and lighting up on mouseover.</p>
<p>Then during the sprint to 3.0 we ran off in even crazier directions, and did all the things we'd talked about doing, but held back from  (eg endless pager, free form charting in Flash, rethinking the timeline interactions, replacing the tabs with more compact layers). </p>
<p>From this point forward though, there will be more building out and less building up, if that makes sense.  I.e. no more monolithic, single, all-powerful UI, but rather links between quasi-standalone bits.   And on the monolith, instead of bolting on new features, we'll be solidifying things, cleaning, improving, fixing. </p>
<p>That said, there will still be a lot of unusual and useful interactivity.  Actually probably more so overall if the monolith-maintenance burden falls as expected.</p>
<p>So interesting times here at Splunk. More news as it comes.  </p>
<p>And oh, we are hiring.  We are extremely hiring.  If you are interested, or you know someone who's interested, or a friend of yours is really smart and you think he needs a new job... Send them our way.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/xAz8Hs1a8eI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/nick/2007/09/19/wayback-machine/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: Driving Miss Erik</title><link>http://feedproxy.google.com/~r/splunkdev/~3/jjRINlKqjzs/</link><comments>http://blogs.splunk.com/johnvey/2007/09/18/driving-miss-erik/</comments><pubDate>Tue, 18 Sep 2007 23:33:56 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[Adventures on a mini-bike amongst the boxes in engineering:

External view:
<embed src='http://www.brightcove.tv/playerswf' bgcolor='#FFFFFF' flashVars='allowFullScreen=true&#038;initVideoId=1184398456&#038;servicesURL=http://www.brightcove.tv&#038;viewerSecureGatewayURL=https://www.brightcove.tv&#038;cdnURL=http://admin.brightcove.com&#038;autoStart=false' base='http://admin.brightcove.com' name='bcPlayer' width='486' height='412' allowFullScreen='true' allowScriptAccess='always' seamlesstabbing='false' type='application/x-shockwave-flash' swLiveConnect='true' pluginspage='http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash'></embed>

Internal view:
<embed src='http://www.brightcove.tv/playerswf' bgcolor='#FFFFFF' flashVars='allowFullScreen=true&#038;initVideoId=1184397465&#038;servicesURL=http://www.brightcove.tv&#038;viewerSecureGatewayURL=https://www.brightcove.tv&#038;cdnURL=http://admin.brightcove.com&#038;autoStart=false' base='http://admin.brightcove.com' name='bcPlayer' width='486' height='412' allowFullScreen='true' allowScriptAccess='always' seamlesstabbing='false' type='application/x-shockwave-flash' swLiveConnect='true' pluginspage='http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash'></embed>
]]></description><content:encoded><![CDATA[<p>Adventures on a mini-bike amongst the boxes in engineering:</p>
<p>External view:<br />
<embed src='http://www.brightcove.tv/playerswf' bgcolor='#FFFFFF' flashVars='allowFullScreen=true&amp;initVideoId=1184398456&amp;servicesURL=http://www.brightcove.tv&amp;viewerSecureGatewayURL=https://www.brightcove.tv&amp;cdnURL=http://admin.brightcove.com&amp;autoStart=false' base='http://admin.brightcove.com' name='bcPlayer' width='486' height='412' allowFullScreen='true' allowScriptAccess='always' seamlesstabbing='false' type='application/x-shockwave-flash' swLiveConnect='true' pluginspage='http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash'></embed></p>
<p>Internal view:<br />
<embed src='http://www.brightcove.tv/playerswf' bgcolor='#FFFFFF' flashVars='allowFullScreen=true&amp;initVideoId=1184397465&amp;servicesURL=http://www.brightcove.tv&amp;viewerSecureGatewayURL=https://www.brightcove.tv&amp;cdnURL=http://admin.brightcove.com&amp;autoStart=false' base='http://admin.brightcove.com' name='bcPlayer' width='486' height='412' allowFullScreen='true' allowScriptAccess='always' seamlesstabbing='false' type='application/x-shockwave-flash' swLiveConnect='true' pluginspage='http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash'></embed></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/jjRINlKqjzs" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2007/09/18/driving-miss-erik/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: Dev vs. Support Boat Race</title><link>http://feedproxy.google.com/~r/splunkdev/~3/WW_bxH8sy3Y/</link><comments>http://blogs.splunk.com/johnvey/2007/09/18/dev-vs-support-boat-race/</comments><pubDate>Tue, 18 Sep 2007 23:11:08 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[Dev destroys support in a 4 on 4 boat race.
<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/cUJwu0wMGi0"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/cUJwu0wMGi0" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object>]]></description><content:encoded><![CDATA[<p>Dev destroys support in a 4 on 4 boat race.<br />
<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/cUJwu0wMGi0"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/cUJwu0wMGi0" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/WW_bxH8sy3Y" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2007/09/18/dev-vs-support-boat-race/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: AjaxWorld 2007</title><link>http://feedproxy.google.com/~r/splunkdev/~3/SmYHuAj2g2M/</link><comments>http://blogs.splunk.com/johnvey/2007/09/17/ajaxworld-2007/</comments><pubDate>Tue, 18 Sep 2007 01:45:34 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[For all you hardcore Web 2.0 fanboys, I&#8217;m giving a talk at AjaxWorld on &#8220;High-Performance AJAX Application Design&#8221; down in Santa Clara at the end of September.  The official blurb is:
Designing an AJAX application that meets enterprise scalability and performance requirements presents technical challenges that aren’t addressed by traditional AJAX frameworks. This session will [...]]]></description><content:encoded><![CDATA[<p>For all you hardcore Web 2.0 fanboys, I'm giving a talk at <a href="http://ajaxworld.com/">AjaxWorld</a> on "High-Performance AJAX Application Design" down in Santa Clara at the end of September.  The official blurb is:</p>
<blockquote><p>Designing an AJAX application that meets enterprise scalability and performance requirements presents technical challenges that aren’t addressed by traditional AJAX frameworks. This session will highlight the techniques used in Splunk to address handling large amounts of data in the browser, persistent multi-panel state management, interface customization and localization, and interactive DOM-accessible graphics support. By leveraging existing, though less common, techniques such as iframe-style AJAX, in-browser XSLT, and contextual CSS, modern browsers can provide a compelling interface without the need for a thick-client installation.</p></blockquote>
<p>Come by and say hi.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/SmYHuAj2g2M" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2007/09/17/ajaxworld-2007/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: Drugging employees for fun and profit</title><link>http://feedproxy.google.com/~r/splunkdev/~3/vQbe3fVxf4U/</link><comments>http://blogs.splunk.com/johnvey/2007/09/05/drugging-employees-for-fun-and-profit/</comments><pubDate>Thu, 06 Sep 2007 02:09:56 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[
On a daily basis, I pay homage to the wonder that is Blue Bottle Coffee espresso, which flows freely &#8212; some would say excessively &#8212; from our kitchen.  The benefits to productivity that this fine coffee bestows upon the dev team is enormous, easily eclipsing other contenders such as video games or foosball.  [...]]]></description><content:encoded><![CDATA[<p><img alt="Blue Bottle Coffee" title="Blue Bottle Coffee" src="http://farm2.static.flickr.com/1084/1333494074_d4f5539fdd.jpg" /></p>
<p>On a daily basis, I pay homage to the wonder that is <a href="http://www.bluebottlecoffee.net/">Blue Bottle Coffee</a> espresso, which flows freely  -  some would say excessively  -  from our kitchen.  The benefits to productivity that this <a href="http://www.yelp.com/biz/EFefgQdk_19WxQXvVtwEog">fine coffee</a> bestows upon the dev team are enormous, easily eclipsing other contenders such as video games or foosball.  Of course, there were some hurdles to get to this point, namely somebody pouring M&amp;Ms into the bean grinder of the <a href="http://www.capresso.com/prod_super_avantgarde.html">super-automatic</a> that was previously in service.  The result was a pitiful molten mess of chocolate, beans, plastic, and gears.  And, of course, the perpetrator was never discovered.  So the only recourse was to beef up the machinery and move to a true commercial setup: a <a href="http://www.laspaziale.com/english/frame_s1_en.html">La Spaziale</a>, <a href="http://www.coffeegeek.com/proreviews/detailed/mazzermini">Mazzer Mini</a>, and freshly delivered Blue Bottle.  BB even asked us what hardware we were running, and sent us the most compatible beans.  Brilliant.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/vQbe3fVxf4U" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2007/09/05/drugging-employees-for-fun-and-profit/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Mark Cohen: Splunking your iPhone</title><link>http://feedproxy.google.com/~r/splunkdev/~3/pj21KSvXEfI/</link><comments>http://blogs.splunk.com/mark/2007/08/26/splunking-your-iphone/</comments><pubDate>Sun, 26 Aug 2007 08:08:34 +0000</pubDate><dc:creator>Mark Cohen</dc:creator><description><![CDATA[Had a little fun last night. Enabled syslogd on the iPhone and sent the logs to a splunk instance via UDP/514
Process is hacking your iPhone and install ssh. Enable syslogd by the following method. (Thanks to core on #iphone)
syslog
20:00  so to get syslog running you need /etc/syslogd.conf from your mac
20:01  then break the [...]]]></description><content:encoded><![CDATA[<p>Had a little fun last night. Enabled syslogd on the iPhone and sent the logs to a splunk instance via UDP/514</p>
<p>The process: hack your iPhone and install ssh, then enable syslogd by the following method. (Thanks to core on #iphone)</p>
<div align="left"><strong>syslog</strong></div>
<div align="left">20:00  so to get syslog running you need /etc/syslogd.conf from your mac</div>
<div align="left">20:01  then break the syslog in /System/Library/LaunchDaemons/apple.com.syslogd by putting in bad values</div>
<div align="left">20:01  then restart the phone and run</div>
<div align="left">20:01  /usr/sbin/syslogd -bsd_out 1 &amp;</div>
<p>Then edit /etc/syslog.conf and append *.*            @loghost (where loghost is the name or address of the machine running Splunk).</p>
<p>Restart syslogd and you're set.</p>
<p>Then just set splunk up to listen on 514/UDP and you have iPhone logs.</p>
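<p>If you want to check that the phone is actually sending before pointing it at Splunk, a throwaway UDP listener is enough. This is a minimal sketch in Python 3, not part of Splunk; the function names are made up for illustration, and binding to port 514 usually needs root, so pick a higher port while testing.</p>

```python
import socket


def open_syslog_socket(host="0.0.0.0", port=514):
    """Bind a UDP socket on the given address (514 is the syslog default)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_messages(sock, max_messages=None):
    """Yield (sender_ip, text) for each datagram; stop after max_messages if set."""
    count = 0
    while max_messages is None or count != max_messages:
        data, (sender_ip, _sender_port) = sock.recvfrom(8192)
        yield sender_ip, data.decode("utf-8", errors="replace").rstrip()
        count += 1
```

<p>Looping over read_messages(open_syslog_socket()) and printing each pair should show the iPhone's log lines as they arrive.</p>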
<p>One interesting bit: launchd, the service that starts up the daemons on the iPhone, just keeps respawning services. The iPhone lacks a standard service control mechanism such as the sysv-compatible init process.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/pj21KSvXEfI" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/mark/2007/08/26/splunking-your-iphone/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Johnvey Hwang: Download Splunk 3.0 Today!</title><link>http://feedproxy.google.com/~r/splunkdev/~3/qruK3j2R6xo/</link><comments>http://blogs.splunk.com/johnvey/2007/08/03/download-splunk-30-today/</comments><pubDate>Sat, 04 Aug 2007 06:10:54 +0000</pubDate><dc:creator>Johnvey Hwang</dc:creator><description><![CDATA[I&#8217;m pleased to announce that Splunk 3.0 has been released, and is available for download immediately!  It&#8217;s been a very long road to GA, but I think it is worth the wait.  With 3.0, exploring your unstructured data has never been easier, thanks to the new reporting interface.  As always, we love [...]]]></description><content:encoded><![CDATA[<p>I'm pleased to announce that Splunk 3.0 has been released, and is <a title="download Splunk" href="http://www.splunk.com/index.php/predownload?d=progeneric">available for download</a> immediately!  It's been a very long road to GA, but I think it is worth the wait.  With 3.0, exploring your unstructured data has never been easier, thanks to the new reporting interface.  As always, we love user feedback so try it out and let us know what you like and what you don't  -  either <a href="mailto:johnvey@splunk.com">to me</a>, or to <a href="mailto:support@splunk.com">support@splunk.com</a>.  Stop guessing about what's going on in your datacenter and start getting answers with <a href="http://www.splunk.com">Splunk</a>.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/qruK3j2R6xo" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/johnvey/2007/08/03/download-splunk-30-today/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Amrit Bath: Administering remote Splunk servers via the CLI</title><link>http://feedproxy.google.com/~r/splunkdev/~3/m4x9LLfBx_E/</link><comments>http://blogs.splunk.com/amrit/2007/07/03/administering-remote-splunk-servers-via-the-cli/</comments><pubDate>Tue, 03 Jul 2007 15:02:46 +0000</pubDate><dc:creator>Amrit Bath</dc:creator><description><![CDATA[It&#8217;s a little known (mainly because it&#8217;s undocumented) fact that it is possible to use the Splunk CLI to manage remote Splunk servers.  This capability has been built into the product since version 2.1, and allows one to do things such as remotely manage data inputs, run searches, manage users, etc.  For fairly [...]]]></description><content:encoded><![CDATA[<p>It's a little known (mainly because it's undocumented) fact that it is possible to use the Splunk CLI to manage remote Splunk servers.  This capability has been built into the product since version 2.1, and allows one to do things such as remotely manage data inputs, run searches, manage users, etc.  For fairly obvious reasons, this cannot be done with commands that require Splunkd to be stopped.</p>
<p>The syntax is simple:</p>
<p><strong>/opt/splunk/bin/splunk &lt;command&gt;  [&lt;subcommand&gt;] &lt;params&gt; -uri https://my2ndSplunkBox:8089</strong></p>
<p>The key here is the <strong>-uri</strong> parameter, which instructs the PCL (Python Control Layer) to send all SOAP requests to the specified server.  There are three parts to the parameter: protocol, host, and port.</p>
<p>The protocol must be one of <strong>http</strong> or <strong>https</strong>, depending on whether or not SSL is enabled on the Splunkd port.  Most users will want the latter, as recent versions of Splunk enable SSL on this port by default.</p>
<p>The second part is the hostname or IP address of the host that the remote Splunk server is running on.  This should need no real explanation - in this case, the remote server has the hostname <strong>my2ndSplunkBox</strong>.</p>
<p>The last part of the argument is the Splunkd port (aka the management port).  Note that this is <em>not</em> the port that's used to reach the web interface, but the port that Splunkd listens on for incoming SOAP requests.  If you're unsure of what this port is, try the default, which is <strong>8089</strong>.  Alternatively, <strong>splunk show splunkd-port</strong> will display the Splunkd port that the current server is listening on.</p>
<p>As a practical example, one can add a tailed data input on the <strong>/var/log</strong> directory of host <strong>my2ndSplunkBox</strong> with the following command:</p>
<p><strong>splunk add tail /var/log -uri https://my2ndSplunkBox:8089</strong></p>
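<p>If you drive the CLI from a script, the three parts of <strong>-uri</strong> can be assembled in one place. The following is only a sketch: the helper name and its defaults are made up for illustration, and it assumes the /opt/splunk install path used in the syntax line above.</p>

```python
def remote_splunk_command(args, host, port=8089, use_ssl=True,
                          splunk_bin="/opt/splunk/bin/splunk"):
    """Return the argument list for running a Splunk CLI command remotely.

    args is the normal command, e.g. ["add", "tail", "/var/log"].
    """
    # Protocol: https when SSL is enabled on the Splunkd port, else http.
    scheme = "https" if use_ssl else "http"
    # Host and management port (not the web interface port).
    uri = "%s://%s:%d" % (scheme, host, port)
    return [splunk_bin] + list(args) + ["-uri", uri]
```

<p>The returned list can be handed straight to subprocess.call(), which avoids shell quoting problems with odd paths or search strings.</p>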
<p>The only caveat to this feature is that if you're logged into your Splunk server via <strong>splunk login</strong>, you will have to re-authenticate when sending commands to the remote server (and once again when you resume targeting your local server by leaving off <strong>-uri</strong>). Workarounds include using the <strong>-auth</strong> parameter or the <strong>SPLUNK_USERNAME</strong> and <strong>SPLUNK_PASSWORD</strong> environment variables, but these are better left to a later post.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/m4x9LLfBx_E" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/amrit/2007/07/03/administering-remote-splunk-servers-via-the-cli/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Amrit Bath: HI@WEB2.0</title><link>http://feedproxy.google.com/~r/splunkdev/~3/NOsHHIdotlY/</link><comments>http://blogs.splunk.com/amrit/2007/07/03/hiweb20/</comments><pubDate>Tue, 03 Jul 2007 14:24:36 +0000</pubDate><dc:creator>Amrit Bath</dc:creator><description><![CDATA[Well, I guess I had to start &#8220;blogging&#8221; eventually&#8230;
Hi, I&#8217;m Amrit, the main CLI (Command Line Interface) and PCL (Python Control Layer) guy here at Splunk.  This means that I maintain our more common bash scripts (bin/splunk &#38; friends), and our Python support scripts (site-packages/splunk/clilib/), which do the heavy lifting for a number of [...]]]></description><content:encoded><![CDATA[<p>Well, I guess I had to start "blogging" eventually...</p>
<p>Hi, I'm Amrit, the main CLI (Command Line Interface) and PCL (Python Control Layer) guy here at Splunk.  This means that I maintain our more common bash scripts (bin/splunk &amp; friends), and our Python support scripts (site-packages/splunk/clilib/), which do the heavy lifting for a number of CLI &amp; Web UI features.</p>
<p>These aren't the only things I work on, but they are the parts of the Splunk codebase that have consumed most of my time since starting here in December 2005.  I should also mention that Ivan Tam (no blog.. yet..?), who now works on the SplunkWeb UI, helped write the first implementation of the PCL during mid-2006.</p>
<p>Every now and then I'll post some tips &amp; tricks related to the things I'm working on, which you'll hopefully find useful.</p>
<p>KTHXBAI</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/NOsHHIdotlY" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/amrit/2007/07/03/hiweb20/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Mark Cohen: Rant on Syslogd</title><link>http://feedproxy.google.com/~r/splunkdev/~3/qwD84sxyaww/</link><comments>http://blogs.splunk.com/mark/2007/04/25/rant-on-syslogd/</comments><pubDate>Thu, 26 Apr 2007 00:46:44 +0000</pubDate><dc:creator>Mark Cohen</dc:creator><description><![CDATA[Syslogd really should either be modified or ditched for syslog-ng. As anyone who looks at logs knows, its crucial to have full, standard time stamps. This should include, HH:MM:SS:MS YYYY-MM-DD.Rfc3164 states :


5.1 Dates and Time

It has been found that some network administrators like to archive
their syslog messages over long periods of time.  It has [...]]]></description><content:encoded><![CDATA[<p>Syslogd really should either be modified or ditched for syslog-ng. As anyone who looks at logs knows, it's crucial to have full, standard time stamps. These should include HH:MM:SS:MS YYYY-MM-DD. RFC 3164 states:</p>
<pre>
<pre>
<h3><a name="section-5.1"></a>5.1 Dates and Time</h3>

It has been found that some network administrators like to archive
their syslog messages over long periods of time.  It has been seen
that some original syslog messages contain a more explicit time stamp
in which a 2 character or 4 character year field immediately follows
the space terminating the TIMESTAMP.  This is not consistent with the
original intent of the order and format of the fields.  If
implementers wish to contain a more specific date and time stamp
within the transmitted message, it should be within the CONTENT
field.  Implementers may wish to utilize the ISO 8601 [<a href="http://tools.ietf.org/html/rfc3164#ref-7">7</a>] date and
time formats if they want to include more explicit date and time
information.

Additional methods to address this desire for long-term archiving
have been proposed and some have been successfully implemented.  One
such method is that the network administrators may choose to modify
the messages stored on their collectors.  They may run a simple
script to add the year, and any other information, to each stored
record.  Alternatively, the script may replace the stored time with a
format more appropriate for the needs of the network administrators.
Another alternative has been to insert a record into the file that
contains the current year.  By association then, all other records
near that informative record should have been received in that same
year.  Neither of these however, addresses the issue of associating a
correct timezone with each record.</pre>
</pre>
<p>IMHO, this is backwards. We shouldn't require developers to put the year in the content field, or have people post-process logs to include the year. Syslog should properly write out the year.</p>
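<p>For what it's worth, the "simple script to add the year" that the RFC mentions looks roughly like this. It is only a sketch in Python 3 with a made-up function name: it assumes every stored line starts with the 15-character RFC 3164 timestamp, that you know approximately when the line was received, and that month names are in English. It does not solve the timezone problem the RFC admits to.</p>

```python
from datetime import datetime, timedelta


def add_year(line, received):
    """Rewrite the leading 'Mmm dd hh:mm:ss' stamp as an ISO 8601 date and time.

    received is a datetime close to when the collector got the line;
    its year is the only source of the missing year.
    """
    stamp, rest = line[:15], line[15:]
    parsed = datetime.strptime("%d %s" % (received.year, stamp),
                               "%Y %b %d %H:%M:%S")
    # A stamp well ahead of the receive time belongs to the previous year,
    # e.g. a "Dec 31" line that was collected on Jan 1.
    if parsed - received > timedelta(days=1):
        parsed = parsed.replace(year=parsed.year - 1)
    return parsed.strftime("%Y-%m-%dT%H:%M:%S") + rest
```

<p>Note how much guessing is involved: the year comes from the collector's clock, not from the sender, which is exactly why the sender should have written it in the first place.</p>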
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/qwD84sxyaww" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/mark/2007/04/25/rant-on-syslogd/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Nick Mealy: How to modify the 2.1 UI’s default behaviour to only search recent events</title><link>http://feedproxy.google.com/~r/splunkdev/~3/IXVpme3b0eU/</link><comments>http://blogs.splunk.com/nick/2007/02/12/how-to-modify-the-21-uis-default-behaviour-to-only-search-recent-events/</comments><pubDate>Tue, 13 Feb 2007 05:14:47 +0000</pubDate><dc:creator>Nick Mealy</dc:creator><description><![CDATA[iIf you only ever care about the last few hours or the last day of your data, this simple change will speed up your search results tremendously.  Until our next big release which will basically be this way by default,  here&#8217;s how you can do this in 2.1 code.
This is a change in [...]]]></description><content:encoded><![CDATA[<p>iIf you only ever care about the last few hours or the last day of your data, this simple change will speed up your search results tremendously.  Until our next big release which will basically be this way by default,  here's how you can do this in 2.1 code.</p>
<p>This is a change in three places, but fortunately very fast to make, and all in the same file.<br />
$SPLUNK_HOME/share/splunk/search/dynamic/main_ui.html</p>
<p>Note: The example here will set your UI to search only the past 6 hours by default.  After doing this it should be easy to see how to change it to search 1 day, or 45 minutes etc...</p>
<p>Note: You don't need to restart the front end to see these changes, but you DO have to refresh your browser by clicking the refresh button up top.</p>
<p>Step 1) Around line 70, change<br />
&lt;div class="#productVersion#Version landingPageState #userType#User noTimeFields eventsTab relativeTimeMode #dynamicallySetStates#" id="outerWrapper" /&gt;</p>
<p>to<br />
&lt;div class="#productVersion#Version landingPageState #userType#User eventsTab relativeTimeMode #dynamicallySetStates#" id="outerWrapper" /&gt;<br />
(Basically, this removes the 'noTimeFields' state so the time controls are now open by default.)</p>
<p>Step 2) Around line 122 of the same file, change<br />
&lt;input type="text" id="relStartTime" /&gt;</p>
<p>to<br />
&lt;input type="text" value="6" id="relStartTime" /&gt;</p>
<p>(Now the UI will load with "6" already entered into the relative start field.)</p>
<p>Step 3) Around line 125, still in the same file, change<br />
&lt;option value="hours"&gt;Hours ago&lt;/option&gt;</p>
<p>to<br />
&lt;option value="hours" selected="selected"&gt;Hours ago&lt;/option&gt;</p>
<p>(This means that hours will be selected by default, instead of minutes.)</p>
<p>That's it. You're done. Refresh your browser and the UI will now restrict its searches to the most recent 6 hours by default.  If you really only ever care about the last 2 hours, switching it to 2 hours may speed you up even more.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/IXVpme3b0eU" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/nick/2007/02/12/how-to-modify-the-21-uis-default-behaviour-to-only-search-recent-events/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Nick Mealy: quick way to allow you to autologin and run a search from a single link</title><link>http://feedproxy.google.com/~r/splunkdev/~3/TxT2uhiUMrw/</link><comments>http://blogs.splunk.com/nick/2007/02/09/quick-way-to-allow-you-to-autologin-and-run-a-search-from-a-single-link/</comments><pubDate>Sat, 10 Feb 2007 02:58:03 +0000</pubDate><dc:creator>Nick Mealy</dc:creator><description><![CDATA[This is a quick update to Mark&#8217;s post from 10/9/2006
Again, to reiterate Mark&#8217;s qualifier - this is all assuming you understand that by doing this, you send users and passwords in clear text and the risks involved.
So, uncommenting the 2 lines as described in Mark&#8217;s post will only get you the first part, ie the [...]]]></description><content:encoded><![CDATA[<p>This is a quick update to <a href="http://blogs.splunk.com/mark/2006/10/09/allowing-users-to-log-in-with-http-get-in-21x/">Mark's post from 10/9/2006</a></p>
<p>Again, to reiterate Mark's qualifier: this all assumes you understand that by doing this you send usernames and passwords in clear text, and that you accept the risks involved.</p>
<p>So, uncommenting the 2 lines as described in Mark's post will only get you the first part, i.e. the ability to send a GET request that logs you in. We've had people ask if that request can go further and also return results right away for a particular search they pass in.  It's an obvious request, but somehow we didn't anticipate it.</p>
<p>So until we wrap this feature up in a bow in a release, once again this involves editing python by hand.  And this time it's more than just uncommenting two lines.  It's cut and paste, and if you know python you know that tab-indentation is meaningful, and this seemingly simple action can be deadly.  You have been warned.  Back up the file and proceed carefully.</p>
<p>Alrighty, still with us?   =)  Find the 2 lines that Mark blogs about uncommenting.   (this will be XMLResource.py, line 395 - 400 ish depending on which 2.1 release this is)</p>
<p>Now replace those two lines with these lines below. NOTE: REPLACE HYPHENS WITH SPACES. wordpress seems to insist on removing leading spaces.</p>
<pre>--------if ("usr" in request.args) and ("pwd" in request.args) :
------------logger.info("user is attempting login on GET")
------------if ("q" in request.args) :
----------------logger.info("user attempting login on GET is requesting redirection to a permalink")
----------------sessNS = request.getSession().sessionNamespaces
----------------sessNS["postLoginRedirect"] = "/?q=" + request.args["q"][0]
------------return self.render_POST(request)</pre>
<p>Now restart the Python front end using splunk restartss  (a full splunk restart is not necessary).<br />
You'll then have the ability to embed URLs like this in the webapp of your choice:</p>
<p>http://your.host/login?usr=username&amp;pwd=password&amp;q=interestingTerm1%20interestingTerm2</p>
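<p>If another application generates these links, let a library do the URL escaping rather than hand-writing the %20s. A minimal sketch in Python 3 follows; the function name is made up for illustration, and the same warning applies: the username and password travel in clear text in the URL.</p>

```python
from urllib.parse import quote, urlencode


def build_login_url(host, username, password, query=None):
    """Build the auto-login URL, optionally with a search to run after login."""
    params = [("usr", username), ("pwd", password)]
    if query is not None:
        params.append(("q", query))
    # quote_via=quote encodes spaces as %20 instead of the default "+".
    return "http://%s/login?%s" % (host, urlencode(params, quote_via=quote))
```

<p>Calling build_login_url("your.host", "username", "password", "interestingTerm1 interestingTerm2") should give back the example URL above.</p>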
<p>UPDATE</p>
<p>As pointed out in the first comment (thanks!!), the above snippet will happily fall into a recursive loop if the auth information it's given is incorrect.   New improved version below: (AGAIN, REPLACE LEADING HYPHENS WITH SPACES)</p>
<pre>--------if ("usr" in request.args) and ("pwd" in request.args) :
------------logger.info("user is attempting login on GET")
------------sessNS = request.getSession().sessionNamespaces
------------if ("cannotConnectToSplunkd" not in sessNS and "error" not in sessNS) :
----------------if ("q" in request.args) :
--------------------logger.info("user attempting login on GET is requesting redirection to a permalink")
--------------------sessNS["postLoginRedirect"] = "/?q=" + request.args["q"][0]
----------------return self.render_POST(request)</pre>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/TxT2uhiUMrw" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/nick/2007/02/09/quick-way-to-allow-you-to-autologin-and-run-a-search-from-a-single-link/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Nick Mealy: one minute guide to making search results autorefresh</title><link>http://feedproxy.google.com/~r/splunkdev/~3/JrM_2vte96I/</link><comments>http://blogs.splunk.com/nick/2007/01/10/one-minute-guide-to-making-search-results-autorefresh/</comments><pubDate>Wed, 10 Jan 2007 08:37:10 +0000</pubDate><dc:creator>Nick Mealy</dc:creator><description><![CDATA[Everybody wants this, and until the day when it&#8217;s built into the UI somewhere, you can use this little bookmarklet to do it in about a minute.
So. the link below is your friend.  (If you&#8217;ve used bookmarklets before you know what to do.  Otherwise, read on. )
Instead of clicking this link though, right-click [...]]]></description><content:encoded><![CDATA[<p>Everybody wants this, and until the day when it's built into the UI somewhere, you can use this little bookmarklet to do it in about a minute.</p>
<p>So, the link below is your friend.  (If you've used bookmarklets before, you know what to do.  Otherwise, read on.)</p>
<p>Instead of clicking this link, though, right-click it or option-click it, and choose 'bookmark this link'.<br />
<a href="javascript:void(stateManager.refreshCurrentView());void(window.myInterval=setInterval('stateManager.refreshCurrentView()',30000));">splunk 30 second refresh</a></p>
<p>Once you've done that, whenever you have Splunk loaded, clicking that bookmark will run the tiny little script. The upshot is that the UI will refresh right away and then every 30 seconds thereafter.</p>
<p>And if you want to change the 30 seconds, edit the bookmark, find the 30000, and change it to the interval you want, in milliseconds.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/JrM_2vte96I" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/nick/2007/01/10/one-minute-guide-to-making-search-results-autorefresh/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Kim Wallace: Meet the plumber</title><link>http://feedproxy.google.com/~r/splunkdev/~3/V2t7GUPF6VU/</link><comments>http://blogs.splunk.com/kim/2006/12/14/meet-the-plumber/</comments><pubDate>Fri, 15 Dec 2006 02:37:43 +0000</pubDate><dc:creator>Kim Wallace</dc:creator><description><![CDATA[Hi! My name is Kim, and I&#8217;m the release engineer here at Splunk.
Thanks to my acquisition-happy former employer, Symantec, I&#8217;ve seen a variety of startup approaches to release engineering. Most frequently it seems some senior developer has a bug up you-know-where about how the build system should work, and some poor junior developer or sysadmin [...]]]></description><content:encoded><![CDATA[<p>Hi! My name is Kim, and I'm the release engineer here at Splunk.</p>
<p>Thanks to my acquisition-happy former employer, Symantec, I've seen a variety of startup approaches to release engineering. Most frequently it seems some senior developer has a bug up you-know-where about how the build system should work, and some poor junior developer or sysadmin type person dutifully does the drudge work (usually by hand). At other sites, some very diligent and detail-oriented person creates and executes a process with a great deal of record-keeping and attention to detail but often not a lot of automation. Consistency across different build platforms usually isn't a strong point.</p>
<p>Here at Splunk, things are a bit different. I called myself the plumber in the title of this post because that's how I see my job: I create and maintain the plumbing that produces consistent, reproducible Splunk builds across all of our platforms, with as much visibility as I can muster. I see my contribution more as enforcing process through tools  -  ideally, tools that enable process in a way that is more convenient for everyone than "doing it wrong"  -  rather than personally pushing all the buttons and scribbling in all the logbooks. And I've had the good fortune to come into a culture that encourages this approach.</p>
<p>Whew. That's a mouthful for an introduction. In the near future I hope to write a bit more about how the plumbing works, and some neat tools I've found along the way. I'm sure y'all will be waiting with bated breath. <img src='http://blogs.splunk.com/kim/wp-includes/images/smilies/icon_wink.gif' alt=';-)' /></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/V2t7GUPF6VU" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/kim/2006/12/14/meet-the-plumber/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Mark Cohen: selinux and splunk</title><link>http://feedproxy.google.com/~r/splunkdev/~3/OETymwPI8iQ/</link><comments>http://blogs.splunk.com/mark/2006/11/21/selinux-and-splunk/</comments><pubDate>Wed, 22 Nov 2006 03:40:55 +0000</pubDate><dc:creator>Mark Cohen</dc:creator><description><![CDATA[If you&#8217;ve enabled selinux for whatever reason, you need to either disable it or configure it to allow splunk to run.
To configure selinux to allow splunk to run, you need to run the chcon command on the splunk lib directory. Here is what you type :
chcon -c -v -R -u system_u -r object_r -t lib_t [...]]]></description><content:encoded><![CDATA[<p>If you've enabled selinux for whatever reason, you need to either disable it or configure it to allow splunk to run.</p>
<p>To configure selinux to allow splunk to run, you need to run the chcon command on the splunk lib directory. Here is what you type :</p>
<p>chcon -c -v -R -u system_u -r object_r -t lib_t $SPLUNK_HOME/lib 2&gt;&amp;1 &gt; /dev/null<br />
You can also disable the check when splunk starts by adding this line to the $SPLUNK_HOME/bin/setSplunkEnv script:</p>
<p>export SPLUNK_IGNORE_SELINUX=1</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/OETymwPI8iQ" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/mark/2006/11/21/selinux-and-splunk/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Mark Cohen: Telling Splunk to not phone home for update info.</title><link>http://feedproxy.google.com/~r/splunkdev/~3/9JvwC_1J4-M/</link><comments>http://blogs.splunk.com/mark/2006/11/21/telling-splunk-to-not-phone-home-for-update-info/</comments><pubDate>Wed, 22 Nov 2006 03:22:21 +0000</pubDate><dc:creator>Mark Cohen</dc:creator><description><![CDATA[(2.1.1 only)
We&#8217;ve had a few people ask for this. Its going to be in the documentation eventually, but until then here is how you do it.
Edit $SPLUNK_HOME/etc/myinstall/search.xml
Change :
&#60;updateCheckerBaseURL&#62;http://quickdraw.splunk.com/js/&#60;/updateCheckerBaseURL&#62; &#60;updateCheckerBaseURL&#62;0&#60;/updateCheckerBaseURL&#62;
(2.1)
$SPLUNK_HOME/share/splunk/search/static/js/update_checker_pro.js.
At the top of the file, and within that same setup function, comment out these two lines:
createUpdateCheckerScriptlet();
setTimeout(&#8217;possiblyFallBackToCannotConnectMessage()&#8217;, 5000);
]]></description><content:encoded><![CDATA[<p>(2.1.1 only)</p>
<p>We've had a few people ask for this. It's going to be in the documentation eventually, but until then here is how you do it.</p>
<p>Edit $SPLUNK_HOME/etc/myinstall/search.xml</p>
<p>Change:</p>
<p>&lt;updateCheckerBaseURL&gt;http://quickdraw.splunk.com/js/&lt;/updateCheckerBaseURL&gt;</p>
<p>to:</p>
<p>&lt;updateCheckerBaseURL&gt;0&lt;/updateCheckerBaseURL&gt;</p>
<p>(2.1)</p>
<p>$SPLUNK_HOME/share/splunk/search/static/js/update_checker_pro.js.</p>
<p>At the top of the file, and within that same setup function, comment out these two lines:<br />
createUpdateCheckerScriptlet();<br />
setTimeout('possiblyFallBackToCannotConnectMessage()', 5000);</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/9JvwC_1J4-M" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/mark/2006/11/21/telling-splunk-to-not-phone-home-for-update-info/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Mark Cohen: Allowing users to log in with HTTP GET in 2.1x</title><link>http://feedproxy.google.com/~r/splunkdev/~3/Ci93p4sXVyM/</link><comments>http://blogs.splunk.com/mark/2006/10/09/allowing-users-to-log-in-with-http-get-in-21x/</comments><pubDate>Tue, 10 Oct 2006 04:41:31 +0000</pubDate><dc:creator>Mark Cohen</dc:creator><description><![CDATA[I&#8217;ve had to field a few of these requests so here goes.
Assuming you understand that by doing this, you send users and passwords in clear text and the risks involved.
There is a way to do this through http GET, but it requires modifying a bit of python.
Edit line 395 of XMLResourse.py located in $SPLUNK_HOME/lib/python2.4/site-packages/splunk/search/XMLResource.py
def render_GET(self, [...]]]></description><content:encoded><![CDATA[<p>I've had to field a few of these requests so here goes.</p>
<p>This assumes you understand that by doing this you send usernames and passwords in clear text, and that you accept the risks involved.</p>
<p>There is a way to do this through http GET, but it requires modifying a bit of python.</p>
<p>Edit line 395 of XMLResource.py, located at $SPLUNK_HOME/lib/python2.4/site-packages/splunk/search/XMLResource.py</p>
<pre>    def render_GET(self, request) :
        # backdoor so scripts can auto-login just with a GET request instead of having to craft a proper HTTP POST.  Doesnt help said script keep track of the cookie, which is the hard part.
        #if ("usr" in request.args) and ("pwd" in request.args) :
        #    return self.render_POST(request)
        logger.debug("LoginResource.render_GET")
        sessNS = request.getSession().sessionNamespaces</pre>
<p>Uncomment the if and return lines and restart splunk.</p>
<p>To log in, you would enter this URL</p>
<p>http://your.host/login?usr=username&amp;pwd=password</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/Ci93p4sXVyM" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/mark/2006/10/09/allowing-users-to-log-in-with-http-get-in-21x/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Brian Murphy: Auto host resolving in splunk using python</title><link>http://feedproxy.google.com/~r/splunkdev/~3/gY5xRB-jX88/</link><comments>http://blogs.splunk.com/brian/2006/07/05/auto-host-resolving-in-splunk-using-python/</comments><pubDate>Thu, 06 Jul 2006 06:55:46 +0000</pubDate><dc:creator>Brian Murphy</dc:creator><description><![CDATA[This only works in 2.0.x
Ok so I&#8217;ve had a couple of people ask me how to resovle the ip addresses in their syslog files to their hostnames in splunk.
There&#8217;s no way to do this just by tweaking a config variable .. we need to dig a little deeper under the surface. It&#8217;s actually pretty easy [...]]]></description><content:encoded><![CDATA[<p>This only works in 2.0.x<br />
OK, so I've had a couple of people ask me how to resolve the IP addresses in their syslog files to hostnames in splunk.<br />
There's no way to do this just by tweaking a config variable; we need to dig a little deeper under the surface. It's actually pretty easy to get splunk to call out to Python during event processing, so I've used that functionality to solve this problem.</p>
<p><strong>Note that this will negatively impact indexing performance but it should work until we get this behavior baked into splunk. </strong></p>
<p>First up, I've created a Python script that calls socket.gethostbyaddr to resolve the hosts. It also caches the results, failed lookups included, so that the performance hit for DNS misses is reduced.<br />
So copy and paste the following into your favorite editor and save it to &lt;SPLUNK_HOME&gt;/lib/python2.4/site-packages/splunk/pyHostNameResolve.py . This directory is where the dynamically loaded Python will look for scripts; the filename will be referenced later in a config change.</p>
<pre>
<code>
#Copyright (C) 2006 Splunk Inc. All Rights Reserved. This work contains trade
#secrets and confidential material of Splunk Inc., and its use or disclosure in
#whole or in part without the express written permission of Splunk Inc. is prohibited.

from pipeline_data import PipelineDataWrapper #This is a virtual module/class that gets inserted into the python namespace at runtime by splunk
import traceback
import socket

#Set global variables
HOST_KEY = "MetaData:Host"

HOST_RESOLVE_MAP = {} #cache so we don't have to call gethostbyaddr ( expensive ) every event

def resolveHost( pdata, confDictString ):
    global HOST_RESOLVE_MAP
    try:

        host = pdata.get(HOST_KEY)

        resolvedHostName = None

        if host.startswith("host::") :
            host = host[6:]

        if host in HOST_RESOLVE_MAP:
            resolvedHostName = HOST_RESOLVE_MAP[ host ]

        if not resolvedHostName:
            try:
                resolved = socket.gethostbyaddr(host)
                resolvedHostName = resolved[0]
                HOST_RESOLVE_MAP[ host ] = resolvedHostName
            except:
                HOST_RESOLVE_MAP[ host ] = host
                print "Could not resolve " + host
                return 1

        if resolvedHostName :
            pdata.put( HOST_KEY, "host::"+resolvedHostName )

        return 1    

    except:
        print "EXCEPTION !!"
        traceback.print_exc()
        return -1
</code>
</pre>
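<p>To try the caching idea outside of splunk, here is a stripped-down sketch of the same lookup in Python 3. It is not the processor script above: there is no pipeline data involved, and the lookup and cache parameters are additions made up for illustration so the function can be exercised without real DNS.</p>

```python
import socket

_DEFAULT_CACHE = {}


def resolve_host(address, lookup=socket.gethostbyaddr, cache=_DEFAULT_CACHE):
    """Return the hostname for address, or address itself if it won't resolve.

    Both successes and failures are cached, so each address costs at most
    one reverse lookup.
    """
    if address in cache:
        return cache[address]
    try:
        name = lookup(address)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        name = address
    cache[address] = name
    return name
```

<p>The trade-off is the same as in the script above: a cached answer never expires, so a host that changes its name (or starts resolving later) keeps its old value until the process restarts.</p>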
<p>OK, now open your &lt;SPLUNK_HOME&gt;/etc/myinstall/splunkd.xml and insert the following chunk of XML between the diskusageprocessor and the bytequotaprocessor in the indexerpipe pipeline:</p>
<pre>
               &amp;lt;processor name="hostnameresolver" plugin="pythonprocessor"&amp;gt;
                                 &amp;lt;config&amp;gt;
                                         &amp;lt;scriptFilename&amp;gt;splunk.pyHostNameResolve&amp;lt;/scriptFilename&amp;gt;
                                         &amp;lt;command&amp;gt;resolveHost&amp;lt;/command&amp;gt;
                                         &amp;lt;pyContext&amp;gt;resolveContext&amp;lt;/pyContext&amp;gt;
                                         &amp;lt;pyConfig&amp;gt;&amp;lt;![CDATA[]]&amp;gt;&amp;lt;/pyConfig&amp;gt;
                                 &amp;lt;/config&amp;gt;
                         &amp;lt;/processor&amp;gt;
</pre>
<p>OK, now fire up splunk and you should start seeing your hosts getting resolved. <strong>Note that this will negatively impact performance, since every host that is not yet in the cache costs a DNS lookup, but it should work until we get this behavior baked into splunk.</strong><br />
Cheers,<br />
Brian</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/gY5xRB-jX88" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/brian/2006/07/05/auto-host-resolving-in-splunk-using-python/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Brian Murphy: Splunk Cheat Sheet !</title><link>http://feedproxy.google.com/~r/splunkdev/~3/hK-gSo3AXWU/</link><comments>http://blogs.splunk.com/brian/2006/04/27/splunk-cheat-sheet/</comments><pubDate>Fri, 28 Apr 2006 05:26:29 +0000</pubDate><dc:creator>Brian Murphy</dc:creator><description><![CDATA[I&#8217;ve been pretty busy so I haven&#8217;t updated for a while but I thought I should share this :
Corey Shields has made a great splunk cheat sheet ! It&#8217;s available at :  http://staff.osuosl.org/~cshields/?p=140 
It&#8217;s pretty awesome, and I&#8217;m recommending that everyone I know that uses splunk downloads it.
Until next time,
Brian
]]></description><content:encoded><![CDATA[<p>I've been pretty busy so I haven't updated for a while but I thought I should share this :<br />
Corey Shields has made a great splunk cheat sheet ! It's available at : <a href="http://staff.osuosl.org/~cshields/?p=140"> http://staff.osuosl.org/~cshields/?p=140 </a><br />
It's pretty awesome, and I'm recommending that everyone I know that uses splunk downloads it.<br />
Until next time,<br />
Brian</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/hK-gSo3AXWU" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/brian/2006/04/27/splunk-cheat-sheet/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Nick Mealy: hip deep in fastmovingness</title><link>http://feedproxy.google.com/~r/splunkdev/~3/qvN0kcXVFto/</link><comments>http://blogs.splunk.com/nick/2006/03/30/hip-deep-in-fastmovingness/</comments><pubDate>Thu, 30 Mar 2006 15:33:19 +0000</pubDate><dc:creator>Nick Mealy</dc:creator><description><![CDATA[Full speed ahead for the next big round of improvements and fixes and we&#8217;re all going cheerfully bonkers.  I was especially cheerful/bonkers today because I spent the morning prototyping some SVG stuff.  In particular, since the splunk ui runs almost entirely on xml and client-side xslt, I was looking into how feasible/fast/stable it [...]]]></description><content:encoded><![CDATA[<p>Full speed ahead for the next big round of improvements and fixes and we're all going cheerfully bonkers.  I was especially cheerful/bonkers today because I spent the morning prototyping some SVG stuff.  In particular, since the splunk ui runs almost entirely on xml and client-side xslt, I was looking into how feasible/fast/stable it would be for our client-side XSL to just generate SVG directly, and for javascript to clone those svg nodes into a big complex DOM.</p>
<p>The answer is - omg it works well. Fast, seemingly stable, it can be pushed. Even in a big javascript front end like ours, the event handlers on svg elements pass right up into our existing framework.  Some small tweaks had to be made to accommodate it, but no showstoppers.  And it is rare for such a complicated thing to present so few obstacles in practice.<br />
So thanks to Mozilla for being generally awesome, and particularly for turning on SVG in their release builds. Of course I have absolutely no idea if any svg will ever appear in the product ...  We do after all have a great deal of other more mundane improvements in the works.  =)</p>
<p>Also, my apologies for not talking about skins. I had wanted to post a big treatise on skins, but since I'm rewriting and reworking all the css right now, it would be way too cruel; any skins you made would die with the upgrade to 1.3. So I'm saving the post for another day.</p>
<p>Until then, for the indomitably curious, suffice it to say that the 'invert' link hidden in the footer actually cycles through the skin list, and the fact that there are currently only two skins shipping in the product does not prevent you the user from hacking the front end and making a third, fourth skin, etc...</p>
<p>If you're feeling adventurous, put another skin file in [opt/splunk]/share/splunk/search/static/css/skins/, crack open photoshop and use all the existing skin graphics as a base to make your new skin from.<br />
As for how to hook up this new skin: for later 1.2 dot releases, look for the skinFileList in [opt/splunk]/lib/python2.4/site-packages/splunk/search/SearchService.py.</p>
<p>(For, I think, 1.2 and older builds, the list of skins was just the link tags in share/splunk/search/dynamic/main_ui.html.)</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/qvN0kcXVFto" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/nick/2006/03/30/hip-deep-in-fastmovingness/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Nick Mealy: UI tinkerings</title><link>http://feedproxy.google.com/~r/splunkdev/~3/d2p-TQpJFUM/</link><comments>http://blogs.splunk.com/nick/2006/03/15/ui-tinkerings/</comments><pubDate>Wed, 15 Mar 2006 15:35:36 +0000</pubDate><dc:creator>Nick Mealy</dc:creator><description><![CDATA[First post, so I&#8217;ll begin at the beginning.
I&#8217;m the front-end guy.  From the xsl,js,html and css on the client side, up to the python in splunkSearch, I am responsible (read: to blame) for the current implementation, and also for much of the interaction design.   I&#8217;ve worked here just over a year now, [...]]]></description><content:encoded><![CDATA[<p>First post, so I'll begin at the beginning.</p>
<p>I'm the front-end guy.  From the xsl,js,html and css on the client side, up to the python in splunkSearch, I am responsible (read: to blame) for the current implementation, and also for much of the interaction design.   I've worked here just over a year now, and I have no noticeable scars or weird tics to show for it, so I guess I've got that going for me.</p>
<p>What possessed me to come work here:  Essentially all of my experience before splunk was at services companies, and for the prior 3 or 4 years specifically I had become this sort of high-throughput template builder and dhtml-specialist. Boring stuff I know, but I mention it because I came to splunk partly to get away from this.  You can build lots of really complex front ends while at services companies. You can build for flexibility and simplicity all you want, but when the project is over you never really see the code again (or worse the code never gets updated or changed by anyone and so it never evolves at all).  So outside of maybe some escalated issues, you never really know what were the good parts of the implementation and what were the bad.  And the development pain induced by changing requirements, the codebase evolution and accidental devolution, the day to day suffering really, you get spared from all that and that sucks.</p>
<p>So here at splunk I get all that too. I'm not just some head-in-the-clouds master-template builder, I'm actually the poor slob responsible for maintaining it too.  =)</p>
<p>One nice silver lining is that sometimes I'll get rewarded with these little gems of code that have been going on a journey towards the nonsensical. Where say, at one point it did something simple and made sense, then something complicated was factored in alongside, then at 2AM before such and such release that changed, then later there was a quick tweak to address someone's feedback, then another change etc... and then suddenly you look at it one day and it's just a stunningly silly little thing and you throw it away and rewrite it in 5 minutes.</p>
<p>I'll try and post again in a couple days.  There's a lot of dry topics that I would love to ramble on about, but this post is pretty parched actually, now that I read it.</p>
<p>So maybe instead I'll post on how to create your own skin or something.</p>
<p>or even better, I'll update you on recent injuries imparted by our skateboard-trapeze-of-death... (so far Rory and Marc have had the only mishaps, Rory had nothing broken, and Marc escaped with no injuries at all)</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/d2p-TQpJFUM" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/nick/2006/03/15/ui-tinkerings/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Brian Murphy: Splunking from Python Part I</title><link>http://feedproxy.google.com/~r/splunkdev/~3/G-cQG4vpjvs/</link><comments>http://blogs.splunk.com/brian/2006/03/14/splunking-from-python-part-i/</comments><pubDate>Wed, 15 Mar 2006 02:10:22 +0000</pubDate><dc:creator>Brian Murphy</dc:creator><description><![CDATA[One of the neat things about splunk is that its search interface is a SOAP call. In this post I&#8217;m going to talk about using the python modules that ship with splunk to talk to splunk over this SOAP interface.
First off you will need to set some environment variables so that you are running the [...]]]></description><content:encoded><![CDATA[<p>One of the neat things about splunk is that its search interface is a SOAP call. In this post I'm going to talk about using the python modules that ship with splunk to talk to splunk over this SOAP interface.<br />
First off you will need to set some environment variables so that you are running the version of python that ships with splunk :</p>
<p><code><br />
export SPLUNK_HOME=&amp;lt;WHERE_YOU_INSTALLED_SPLUNK&amp;gt; <br />
export PATH=$SPLUNK_HOME/bin:$PATH<br />
export LD_LIBRARY_PATH=$SPLUNK_HOME/lib:$LD_LIBRARY_PATH<br />
</code></p>
<p>OK, so now you should be good to go, so fire up python. Your python version should be 2.4.2. If it's not, run "which python" from the command prompt to make sure you are using the python that shipped with splunk.<br />
We need to do some setup before any searches can be run :</p>
<p><code><br />
Python 2.4.2 (#1, Mar  11 2009, 21:45:07)<br />
[GCC 4.0.2] on linux2<br />
Type "help", "copyright", "credits" or "license" for more information.<br />
</code><br />
<code><br />
&amp;gt;&amp;gt;&amp;gt; import splunk.search.splunkTest #initialize the python internals without using twistd<br />
&amp;gt;&amp;gt;&amp;gt; import splunk.search.SearchCore as SearchCore #This is the module we are going to use to issue searches<br />
</code><br />
If you want to run against a remote splunk server or on different ports you can run the following :</p>
<p><code><br />
&amp;gt;&amp;gt;&amp;gt; SearchCore.SearchService.gSearchService._searchEngineURL = "http://&amp;lt;remote_host&amp;gt;:&amp;lt;searchengine_port&amp;gt;"<br />
</code></p>
<p>The method on the SearchCore module that executes the queries is called <i>runQuery</i> and it takes two arguments.</p>
<p><code><br />
def runQuery(queryString, userStr )<br />
</code></p>
<p>The userStr can be any string for now; in future releases it will probably be an auth token. It is the user that your searches will appear under in the searchhistory domain. <br />
The queryString is where the magic happens :). <br />
Basically a query string contains three major elements. </p>
<p>QUERY : Terms following this are as you would see in the splunk web ui search box. This pulls the resulting ids into an id space internally in the query. <br />
GET : Terms following this instruct splunk on what to extract from the ids in the id space into results in the result space.<br />
OUTPUT : How to format the results from the result space to output.</p>
<p>For a more detailed reference on the query syntax check out : http://www.splunk.com/index.php/docs?doc=developer.html&amp;#38;vers=#58 </p>
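<p>The three elements above can be glued together with a small helper. This is only a sketch: <code>build_query</code> is a hypothetical convenience function, not something that ships in the splunk python modules, and it assumes the elements always appear in the order QUERY, GET, OUTPUT, as they do in the examples in this post. It is written for a current Python 3.</p>

```python
def build_query(query, get=None, output=None):
    """Assemble a query string from its QUERY, GET and OUTPUT elements.

    Hypothetical helper for illustration; GET and OUTPUT are optional.
    """
    parts = ["QUERY " + query]
    if get:
        parts.append("GET " + get)
    if output:
        parts.append("OUTPUT " + output)
    return " ".join(parts)

print(build_query("meta::all"))
# QUERY meta::all
print(build_query("meta::all", get="events::0-2", output="splunkui::1.0 format::raw"))
# QUERY meta::all GET events::0-2 OUTPUT splunkui::1.0 format::raw
```

<p>The resulting string is what you would hand to runQuery as the queryString argument.</p>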
<p>Now for our first search :</p>
<p>The meta::all key is a splunk key that every event in the system will have. </p>
<p><code><br />
&amp;gt;&amp;gt;&amp;gt; SearchCore.runQuery("QUERY meta::all","brian")<br />
</code></p>
<p>You will get the result "&amp;lt;queryResult&amp;gt;&amp;lt;/queryResult&amp;gt;" from this as we have not specified an OUTPUT element. Note that unless you specify a domain to run these queries in they will run in the default index ( main ).</p>
<p>Run :<br />
<code><br />
&amp;gt;&amp;gt;&amp;gt; SearchCore.runQuery("QUERY meta::all OUTPUT splunkui::1.0","brian") # We use the splunkui output here because we want to do things that the ui does like get events ...<br />
</code></p>
<p>Now the result is :<br />
<code><br />
&amp;lt;queryResult&amp;gt;&amp;lt;eventIndexedCount&amp;gt;58728&amp;lt;/eventIndexedCount&amp;gt;<br />
&amp;lt;ids&amp;gt;<br />
&amp;lt;/ids&amp;gt;<br />
&amp;lt;projectedResultCount&amp;gt;1001&amp;lt;/projectedResultCount&amp;gt;<br />
&amp;lt;clampedStartTime&amp;gt;1049204073&amp;lt;/clampedStartTime&amp;gt;<br />
&amp;lt;clampedEndTime&amp;gt;1142300808&amp;lt;/clampedEndTime&amp;gt;<br />
&amp;lt;/queryResult&amp;gt;<br />
</code></p>
<p>Of course your numbers will be different. <br />
The projected result count element is legacy and can be safely ignored. <br />
The eventIndexedCount is the total number of events in this domain.<br />
The clampedStartTime/clampedEndTime constrain the timerange in which results for this query may appear.</p>
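<p>If you would rather work with these values than read the XML by eye, a few lines of parsing will do. The sketch below makes some assumptions: it uses the standard library ElementTree on a current Python 3 (not the python that ships with splunk), <code>summarize</code> is a made-up name, and it treats the clamped times as Unix epoch seconds, which the values suggest but which is not stated anywhere above.</p>

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The result shown above, copied in as a string.
SAMPLE = (
    "<queryResult><eventIndexedCount>58728</eventIndexedCount>"
    "<ids></ids>"
    "<projectedResultCount>1001</projectedResultCount>"
    "<clampedStartTime>1049204073</clampedStartTime>"
    "<clampedEndTime>1142300808</clampedEndTime>"
    "</queryResult>"
)

def summarize(result_xml):
    """Pull the event count and the clamped time range out of a queryResult."""
    root = ET.fromstring(result_xml)

    def as_utc(tag):
        # Assumption: the value is seconds since the Unix epoch.
        return datetime.fromtimestamp(int(root.findtext(tag)), tz=timezone.utc)

    return {
        "events_indexed": int(root.findtext("eventIndexedCount")),
        "start": as_utc("clampedStartTime"),
        "end": as_utc("clampedEndTime"),
    }

summary = summarize(SAMPLE)
print(summary["events_indexed"])   # 58728
print(summary["end"].isoformat())  # 2006-03-14T01:46:48+00:00
```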
<p>Note there is still no event output ... let's fix that :</p>
<p><code><br />
&amp;gt;&amp;gt;&amp;gt; SearchCore.runQuery("QUERY meta::all GET events::0-2 OUTPUT splunkui::1.0 format::raw", "brian")  #The format::raw tells the outputter to ignore all segment information<br />
</code></p>
<p>Results :</p>
<p><code><br />
&amp;lt;queryResult&amp;gt;&amp;lt;eventIndexedCount&amp;gt;19704&amp;lt;/eventIndexedCount&amp;gt;<br />
&amp;lt;ids&amp;gt;<br />
&amp;lt;/ids&amp;gt;<br />
&amp;lt;projectedResultCount&amp;gt;1001&amp;lt;/projectedResultCount&amp;gt;<br />
&amp;lt;clampedStartTime&amp;gt;1041618608&amp;lt;/clampedStartTime&amp;gt;<br />
&amp;lt;clampedEndTime&amp;gt;1142302043&amp;lt;/clampedEndTime&amp;gt;<br />
&amp;lt;results type="events"&amp;gt;      <br />
          &amp;lt;result cd="0:1532081"&amp;gt;<br />
                &amp;lt;segtext xml:space="preserve"&amp;gt;Oct 14 16:29:38 liftoff sendmail[20336]: i9ENTcHf020336: from=&amp;lt;erik@transaction-engines.com&amp;gt;, size=667, class=0, nrcpts=1, msgid=&amp;lt;416F0BE2.3060306@transaction-engines.com&amp;gt;, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]&amp;lt;/segtext&amp;gt;           <br />
                &amp;lt;timestamp&amp;gt;1141691378&amp;lt;/timestamp&amp;gt;               <br />
                &amp;lt;source cd="1" string="/opt/splunk/var/spool/splunk/maillog"&amp;gt;<br />
                        &amp;lt;dir&amp;gt;/opt/splunk/var/spool/splunk/&amp;lt;/dir&amp;gt;<br />
                        &amp;lt;file&amp;gt;maillog&amp;lt;/file&amp;gt;<br />
                &amp;lt;/source&amp;gt;<br />
                &amp;lt;host cd="1"&amp;gt;localhost&amp;lt;/host&amp;gt;<br />
                &amp;lt;sourcetype cd="1" base="sendmail_syslog"&amp;gt;sendmail_syslog&amp;lt;/sourcetype&amp;gt;<br />
                &amp;lt;type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0  "&amp;gt;<br />
                        &amp;lt;tags&amp;gt;&amp;lt;tag&amp;gt;transaction&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;class&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;sendmail&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;com&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;size&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;net&amp;lt;/tag&amp;gt;&amp;lt;/tags&amp;gt;<br />
                &amp;lt;/type&amp;gt;<br />
          &amp;lt;/result&amp;gt;       <br />
          &amp;lt;result cd="0:2223455"&amp;gt;<br />
                &amp;lt;segtext xml:space="preserve"&amp;gt;Oct 18 15:14:27 liftoff sendmail[2527]: i9IMERup002527: from=&amp;lt;erik@transaction-engines.com&amp;gt;, size=3690, class=0, nrcpts=1, msgid=&amp;lt;41744043.3060306@transaction-engines.com&amp;gt;, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]&amp;lt;/segtext&amp;gt;                <br />
                &amp;lt;timestamp&amp;gt;1141686867&amp;lt;/timestamp&amp;gt;          <br />
                &amp;lt;source cd="1" string="/opt/splunk/var/spool/splunk/maillog"&amp;gt;<br />
                        &amp;lt;dir&amp;gt;/opt/splunk/var/spool/splunk/&amp;lt;/dir&amp;gt;<br />
                        &amp;lt;file&amp;gt;maillog&amp;lt;/file&amp;gt;<br />
                &amp;lt;/source&amp;gt;<br />
                &amp;lt;host cd="1"&amp;gt;localhost&amp;lt;/host&amp;gt;<br />
                &amp;lt;sourcetype cd="1" base="sendmail_syslog"&amp;gt;sendmail_syslog&amp;lt;/sourcetype&amp;gt;<br />
                &amp;lt;type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0  "&amp;gt;<br />
                        &amp;lt;tags&amp;gt;&amp;lt;tag&amp;gt;transaction&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;class&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;sendmail&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;com&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;size&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;net&amp;lt;/tag&amp;gt;&amp;lt;/tags&amp;gt;<br />
                &amp;lt;/type&amp;gt;<br />
          &amp;lt;/result&amp;gt;       <br />
          &amp;lt;result cd="0:3155870"&amp;gt;<br />
                &amp;lt;segtext xml:space="preserve"&amp;gt;Oct 21 14:03:53 liftoff sendmail[11725]: i9LL3quJ011725: from=&amp;lt;erik@transaction-engines.com&amp;gt;, size=2663, class=0, nrcpts=1, msgid=&amp;lt;41782438.7060303@transaction-engines.com&amp;gt;, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]&amp;lt;/segtext&amp;gt;<br />
                &amp;lt;timestamp&amp;gt;1141423433&amp;lt;/timestamp&amp;gt;          <br />
                &amp;lt;source cd="1" string="/opt/splunk/var/spool/splunk/maillog"&amp;gt;<br />
                        &amp;lt;dir&amp;gt;/opt/splunk/var/spool/splunk/&amp;lt;/dir&amp;gt;<br />
                        &amp;lt;file&amp;gt;maillog&amp;lt;/file&amp;gt;<br />
                &amp;lt;/source&amp;gt;<br />
                &amp;lt;host cd="1"&amp;gt;localhost&amp;lt;/host&amp;gt;<br />
                &amp;lt;sourcetype cd="1" base="sendmail_syslog"&amp;gt;sendmail_syslog&amp;lt;/sourcetype&amp;gt;<br />
                &amp;lt;type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0  "&amp;gt;<br />
                        &amp;lt;tags&amp;gt;&amp;lt;tag&amp;gt;transaction&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;class&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;sendmail&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;com&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;size&amp;lt;/tag&amp;gt;&amp;lt;tag&amp;gt;net&amp;lt;/tag&amp;gt;&amp;lt;/tags&amp;gt;<br />
                &amp;lt;/type&amp;gt;<br />
          &amp;lt;/result&amp;gt;<br />
&amp;lt;/results&amp;gt;<br />
&amp;lt;/queryResult&amp;gt;<br />
</code></p>
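<p>The events in a result like the one above can be walked with the standard library XML parser. This is a sketch under assumptions: it runs on a current Python 3, <code>iter_events</code> is a made-up name, and the sample is a trimmed copy of the first result (the event text is shortened and some elements are left out).</p>

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first result above; not the full output.
SAMPLE = """<queryResult>
<results type="events">
  <result cd="0:1532081">
    <segtext xml:space="preserve">Oct 14 16:29:38 liftoff sendmail[20336]: i9ENTcHf020336: size=667, class=0, nrcpts=1</segtext>
    <timestamp>1141691378</timestamp>
    <host cd="1">localhost</host>
    <sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
  </result>
</results>
</queryResult>"""

def iter_events(result_xml):
    """Yield one dict per result element found under results."""
    root = ET.fromstring(result_xml)
    for result in root.iterfind("results/result"):
        yield {
            "id": result.get("cd"),
            "text": result.findtext("segtext"),
            "timestamp": int(result.findtext("timestamp")),
            "host": result.findtext("host"),
            "sourcetype": result.findtext("sourcetype"),
        }

for event in iter_events(SAMPLE):
    print(event["host"], event["sourcetype"], event["text"])
```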
<p>Now you can see the actual event text in the segtext element in the results.<br />
If you want to get counts like you see in the tab headings in the splunkui, you can use the OUTPUT term <code>scheduler::1.0</code>.<br />
This will give you the following output : </p>
<p><code><br />
&amp;lt;queryResult&amp;gt;<br />
&amp;lt;schedResults&amp;gt;<br />
    &amp;lt;eventCount&amp;gt;10000+&amp;lt;/eventCount&amp;gt;<br />
    &amp;lt;hostCount&amp;gt;1+&amp;lt;/hostCount&amp;gt;<br />
    &amp;lt;sourceCount&amp;gt;1+&amp;lt;/sourceCount&amp;gt;<br />
    &amp;lt;typeCount&amp;gt;239+&amp;lt;/typeCount&amp;gt;<br />
    &amp;lt;sourceTypeCount&amp;gt;1+&amp;lt;/sourceTypeCount&amp;gt;<br />
    &amp;lt;eventtagCount&amp;gt;62+&amp;lt;/eventtagCount&amp;gt;<br />
    &amp;lt;starttime&amp;gt;12/31/1969:16:00:00&amp;lt;/starttime&amp;gt;<br />
    &amp;lt;endtime&amp;gt;03/13/2006:18:50:48&amp;lt;/endtime&amp;gt;<br />
&amp;lt;/schedResults&amp;gt;<br />
&amp;lt;/queryResult&amp;gt;<br />
</code></p>
<p>Note the + marks: they are the equivalent of the &amp;gt; signs in the ui, and tell you that there may be more than what is displayed.<br />
You may mix the splunkui and scheduler outputs in a single querystring. </p>
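<p>Because of the + marks, the counts are not plain integers, so a script has to split the number from the marker. A minimal sketch, assuming a current Python 3 and the scheduler output shape shown above; <code>parse_count</code> and <code>read_counts</code> are made-up names and the sample is shortened to three counts.</p>

```python
import xml.etree.ElementTree as ET

# Shortened copy of the scheduler output above.
SAMPLE = (
    "<queryResult><schedResults>"
    "<eventCount>10000+</eventCount>"
    "<hostCount>1+</hostCount>"
    "<typeCount>239+</typeCount>"
    "</schedResults></queryResult>"
)

def parse_count(text):
    """Split a count such as '10000+' into (10000, True); True means 'or more'."""
    text = text.strip()
    return int(text.rstrip("+")), text.endswith("+")

def read_counts(result_xml):
    """Return {tag: (count, or_more)} for every element whose tag ends in Count."""
    sched = ET.fromstring(result_xml).find("schedResults")
    return {el.tag: parse_count(el.text) for el in sched if el.tag.endswith("Count")}

print(read_counts(SAMPLE)["eventCount"])  # (10000, True)
```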
<p>Tune in next time where I'll explain some of the more advanced elements of the search language.</p>
<p>Brian</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/G-cQG4vpjvs" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/brian/2006/03/14/splunking-from-python-part-i/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Brian Murphy: Slow queries and solutions.</title><link>http://feedproxy.google.com/~r/splunkdev/~3/vpOVB-cpD74/</link><comments>http://blogs.splunk.com/brian/2006/03/10/slow-queries-and-solutions/</comments><pubDate>Sat, 11 Mar 2006 05:07:25 +0000</pubDate><dc:creator>Brian Murphy</dc:creator><description><![CDATA[Since the launch of the 1.2 product some people are experiencing really slow query times. This is especially noticeable when you are running a live splunk pretty often, as this tends to fragment the database quite a bit.
Fear not as there is a hidden undocumented call that you can make ! If you run the [...]]]></description><content:encoded><![CDATA[<p>Since the launch of the 1.2 product some people are experiencing really slow query times. This is especially noticeable when you are running a live splunk pretty often, as this tends to fragment the database quite a bit.</p>
<p>Fear not as there is a hidden undocumented call that you can make ! If you run the query "++cmd++::optimize" you will cause a database optimization. This call may take a while to return so use with care. Soon we will have a release with an auto-optimizer but if it's hampering your splunking right now you can create a live splunk to run every 10-30 mins that runs "++cmd++::optimize".</p>
<p>Laters,</p>
<p>Brian</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/vpOVB-cpD74" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/brian/2006/03/10/slow-queries-and-solutions/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>Brian Murphy: First Post</title><link>http://feedproxy.google.com/~r/splunkdev/~3/OYteMl1kbOk/</link><comments>http://blogs.splunk.com/brian/2006/03/07/28/</comments><pubDate>Tue, 07 Mar 2006 07:20:34 +0000</pubDate><dc:creator>Brian Murphy</dc:creator><description><![CDATA[First Post !
So this is the start of my splunk blog.
First up I&#8217;m splunk employee #1. Way back in Sept. 2004 I joined Erik, Rob and Michael when they were still based down in the VC offices in Palo Alto. I&#8217;m responsible for searches and indexing so if you have splunks that are taking WAAAY [...]]]></description><content:encoded><![CDATA[<p>First Post !</p>
<p>So this is the start of my splunk blog.</p>
<p>First up I'm splunk employee #1. Way back in Sept. 2004 I joined Erik, Rob and Michael when they were still based down in the VC offices in Palo Alto. I'm responsible for searches and indexing so if you have splunks that are taking WAAAY too long to complete I'm the person that's probably responsible.</p>
<p>I'll post more later on what I'm coding, struggling against or just hacking on.</p>
<p>Brian out.</p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/OYteMl1kbOk" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/brian/2006/03/07/28/</feedburner:origLink></item><item><content:format rdf:resource="http://www.w3.org/1999/xhtml" /><title>David Carasso: One Geeks Reasons for Splunk</title><link>http://feedproxy.google.com/~r/splunkdev/~3/vJXrGOyVVyU/</link><comments>http://blogs.splunk.com/david/2005/09/30/one-geeks-reasons-for-splunk/</comments><pubDate>Fri, 30 Sep 2005 15:04:16 +0000</pubDate><dc:creator>David Carasso</dc:creator><description><![CDATA[ I don&#8217;t think our website makes it painfully clear why you&#8217;d want Splunk.  Here is my view why you will want Splunk. 

What is Splunk?

Splunk is a search server that indexes all your log files. 
If you need to search and troubleshoot log files, you need Splunk.  It
handles any log format, including [...]]]></description><content:encoded><![CDATA[<p> I don't think our website makes it painfully clear why you'd want Splunk.<br />  <b>Here is my view why you will want Splunk. </b></p>
<hr />
<p><b>What is Splunk?</b></p>
<ul>
<p><font>Splunk is a search server that indexes all your log files.</font> </p>
<p>If you need to search and troubleshoot log files, you need Splunk.  It<br />
handles any log format, including syslog, Apache, Jboss, mysql,<br />
oracle, router data, etc.  It parses and indexes in real time.</p>
</ul>
<p><b>Grep works fine. Why do I need Splunk?</b></p>
<p><ul>
<code><b>grep</b></code> is totally fine for small, simple, local files, but <font><code><b>grep</b></code> doesn't<br />
work on 20GB of log files, across a dozen servers</font>; doesn't group<br />
multiline log messages together; doesn't unify timestamps across<br />
files; doesn't automatically find related log events; doesn't show<br />
histograms of log events; doesn't search gigabytes in seconds; doesn't<br />
have a cool ajax web interface similar to google.
</ul>
</p>
<p><b>What are multiline log messages?</b></p>
<p><ul>
As an example, java exceptions look like this:</p>
<ul>
[source:java]java.lang.reflect.UndeclaredThrowableException<br />
	at $Proxy231.getAllAttributes(Unknown Source)<br />
	at com.collation.proxy.clientproxy.common.Module.getModelObject(Module.java:326)<br />
	at com.collation.proxy.clientproxy.server.action.ChangeHistoryModule.getDependencies(ChangeHistoryModule.java:402)<br />
	at com.collation.proxy.clientproxy.server.action.ChangeHistoryModule.getIdsWithDependencies(ChangeHistoryModule.java:386)<br />
	...<br />
[/source]
</ul>
<p><font>You can't use<br />
<code><b>grep</b></code> to search for java proxy exceptions because<br />
"Exception" and "proxy" don't occur on the same line!</font> The same<br />
would apply to sql, router data, email, or any other multiline event.<br />
<font>Splunk automatically groups<br />
multiline events into single events</font>, so the above exception<br />
would become one event.  Splunk does this with advanced heuristics and<br />
machine learning algorithms, as well as customizable grouping rules.
</p>
</ul>
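<p>To make the point concrete, here is a toy sketch in Python 3. The grouping rule (an indented line belongs to the previous event) is a deliberately naive assumption for illustration; it is not Splunk's actual heuristics, and <code>group_events</code> and <code>search</code> are made-up names.</p>

```python
import re

# Assumed rule: a line that starts with whitespace continues the previous event.
CONTINUATION = re.compile(r"^\s+\S")

def group_events(lines):
    """Merge continuation lines into the event that precedes them."""
    events = []
    for line in lines:
        if events and CONTINUATION.match(line):
            events[-1].append(line)
        else:
            events.append([line])
    return ["\n".join(event) for event in events]

def search(events, *terms):
    """Return the events that contain every term, ignoring case."""
    return [e for e in events if all(t.lower() in e.lower() for t in terms)]

LOG = [
    "java.lang.reflect.UndeclaredThrowableException",
    "\tat $Proxy231.getAllAttributes(Unknown Source)",
    "\tat com.collation.proxy.clientproxy.common.Module.getModelObject(Module.java:326)",
    "INFO server started",
]

print(len(search(LOG, "exception", "proxy")))                # 0, like a line-by-line grep
print(len(search(group_events(LOG), "exception", "proxy")))  # 1, the whole stack trace
```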
<p><b>What about unifying timestamps?</b></p>
<ul>
<p>Most log files have timestamps embedded in them.  Splunk understands<br />
dozens and dozens of timestamp formats, unifying them across<br />
timezones.  Some log files write events out as GMT (Greenwich Mean<br />
Time) some as local time such as PST (Pacific Standard Time).  Some<br />
logs can come from servers on the east coast, some from the west<br />
coast, or beyond.  <font>By<br />
normalizing all these timezones in dozens of timestamp formats,<br />
Splunk allows you to say "What happened at 11:57pm", world-wide,<br />
across all my log files, across all my servers.</font> "I got an error<br />
at 1:15am yesterday.  Show me the log events from all my logs just<br />
before 1:15".
</ul>
</p>
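<p>As a rough illustration of what normalizing means here, the Python 3 sketch below converts two differently formatted local timestamps to UTC and shows that they are the same instant. Everything in it is an assumption made for the example: the two formats, the sample timestamps, the fixed PST and EST offsets (no daylight saving), and the helper name <code>to_utc</code>. It says nothing about how Splunk itself does this.</p>

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the demo simple; real logs would also need daylight saving rules.
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")

def to_utc(text, fmt, tz):
    """Parse a local timestamp written in format fmt and convert it to UTC."""
    local = datetime.strptime(text, fmt).replace(tzinfo=tz)
    return local.astimezone(timezone.utc)

# The same instant, logged on the west coast and on the east coast.
west = to_utc("2006-01-14 22:15:00", "%Y-%m-%d %H:%M:%S", PST)
east = to_utc("15/Jan/2006:01:15:00", "%d/%b/%Y:%H:%M:%S", EST)

print(west.isoformat())  # 2006-01-15T06:15:00+00:00
print(west == east)      # True
```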
<p><b>OK, one more.  What are related log events?</b></p>
<ul>
<p>Suppose you see suspicious activity or an error.  Just ask Splunk to<br />
find logs related to that activity.  It'll find logs that have the<br />
same IP, UserID, URL, codes, etc.  If there was a problem with an IP,<br />
Splunk will show you all the related events for that IP; same for<br />
UserID, URL, or any other code.  You can even <font>ask Splunk to show you events sorted<br />
by how unexpected they are!</font></p>
</ul>
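<p>The simplest version of "related" is "mentions the same IP". The Python 3 sketch below shows just that idea on invented log lines; the pattern, the sample events and the name <code>index_by_ip</code> are assumptions for illustration, and Splunk's own notion of related events is certainly richer than this.</p>

```python
import re
from collections import defaultdict

# Naive IPv4 pattern, good enough for a demo; it does not validate octet ranges.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def index_by_ip(events):
    """Map each IP address to the list of events that mention it."""
    index = defaultdict(list)
    for event in events:
        for ip in sorted(set(IP_PATTERN.findall(event))):
            index[ip].append(event)
    return index

# Made-up log lines from three different sources.
EVENTS = [
    "sshd[411]: Failed password for root from 192.0.2.7 port 4022",
    "GET /admin HTTP/1.1 403 client 192.0.2.7",
    "sendmail[20336]: relay=mail.example.com [198.51.100.9]",
]

related = index_by_ip(EVENTS)["192.0.2.7"]
print(len(related))  # 2
```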
<p><b>How much does Splunk cost?</b></p>
<p><ul>
The Splunk Personal Server is <font>Free</font>.  Give it a try.</p>
</ul>
<p><b>How can I get Splunk?</b></p>
<p><ul>
Go to: <a href="http://www.splunk.com?ac=kilroy">www.splunk.com</a>
</ul></p>
<img src="http://feeds.feedburner.com/~r/splunkdev/~4/vJXrGOyVVyU" height="1" width="1"/>]]></content:encoded><feedburner:origLink>http://blogs.splunk.com/david/2005/09/30/one-geeks-reasons-for-splunk/</feedburner:origLink></item></channel>
</rss>
