<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4670756400590062347</id><updated>2024-09-10T14:19:46.391+00:00</updated><category term="Manchester"/><category term="NorthGrid"/><category term="Lancaster"/><category term="Sheffield"/><category term="Liverpool"/><category term="upgrade"/><category term="SL4"/><category term="atlas"/><category term="manchester bdii"/><category term="64bit"/><category term="cvmfs"/><category term="dpm"/><category term="installation"/><category term="manchester dpm optimization"/><category term="network"/><category term="system"/><category term="1.7.4"/><category term="1.8.2"/><category term="5.5"/><category term="APEL"/><category term="Availability"/><category term="EMI-3"/><category term="Lancaster SL4"/><category term="Manchester BDII CE"/><category term="Manchester HammerCloud results"/><category term="Security"/><category term="VOMS"/><category term="apel nagios alert remove"/><category term="atlas athena"/><category term="atlas file system tweak xfs sl5 jobs efficiency manchester"/><category term="batch"/><category term="database"/><category term="decomissioning"/><category term="dell"/><category term="fabric"/><category term="file"/><category term="file system xfs sl5 read ahead tweak kernel liverpool"/><category term="files"/><category term="firewall"/><category term="infrastructure"/><category term="jobmanager"/><category term="leaks"/><category term="log"/><category term="machine room"/><category term="machines"/><category term="manchester apel sl5 glite installation"/><category term="manchester file systems worker nodes 
evaluation"/><category term="manchester network upgrade improved rates perfsonar debugging"/><category term="manchester new hardware computing nodes storage"/><category term="manchester scripts system administration monitoring sharing"/><category term="manchester squid mrtg snmp atlas monitoring"/><category term="memory"/><category term="mysql"/><category term="optimization"/><category term="parsing"/><category term="poweredge"/><category term="settings"/><category term="syncronisation"/><category term="torque"/><title type='text'>Northgrid-tech</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default?start-index=26&amp;max-results=25'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>150</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-7714531575447506110</id><published>2015-02-17T11:31:00.000+00:00</published><updated>2015-02-24T12:09:15.557+00:00</updated><title type='text'>Replacing the Condor Defrag Daemon</title><content type='html'>I&#39;ve replaced the standard DEFRAG daemon released with Condor with a simpler version that contains a proportional integral (PI) controller. 

I hoped this would give us better control over multicore slots. Preliminary results with the proportional part of the controller alone show that it fails to keep accurate control over the provision of slots. It is subject to hunting because of the long time lag between the onset of draining and the eventual change in the controlled variable (which is &#39;running mcore jobs&#39;). The rate of provision was unexpectedly stable at first, considering the simplicity of the algorithm employed, but degraded over time as the controlled variable became more random.&lt;br /&gt;
&lt;br /&gt;
The graph below shows a very preliminary picture, with a temporary period of stable control visible in the green line on the right of the plot. The setpoint is 250 running multicore jobs.
&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDrUGuTV-hUVY9xZFh7ei1LcuQThZil3kPeNDDlS6p4m1TXLVcaBcLGKD4GKTH8VvtylP4IGQR5PkqBoYXzuE-_espftgNz8h9-m_uVDGBQ2YNOYamhzQGL5o3di5r6jZOXO4fmI-wlwo/s1600/plot.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDrUGuTV-hUVY9xZFh7ei1LcuQThZil3kPeNDDlS6p4m1TXLVcaBcLGKD4GKTH8VvtylP4IGQR5PkqBoYXzuE-_espftgNz8h9-m_uVDGBQ2YNOYamhzQGL5o3di5r6jZOXO4fmI-wlwo/s1600/plot.png&quot; height=&quot;160&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I have now also added an integral component to the controller, and I&#39;m in the process of tuning its reset rate. I hope to show the results of this test soon.&lt;br /&gt;
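For readers who haven't met one, here is a minimal Python 3 sketch of a PI controller of the kind described above. It is not the code of the replacement daemon. The class name and the gains are assumptions for illustration, and so is the idea that the output is the number of nodes to put into draining. The reset rate is the integral action expressed in repeats per unit time (1/Ti), so the output is Kp times the current error plus Kp times the reset rate times the accumulated error.

```python
class PIController:
    """Minimal proportional-integral controller (hypothetical sketch)."""

    def __init__(self, setpoint, kp, reset_rate, dt):
        self.setpoint = setpoint      # target value, e.g. 250 running multicore jobs
        self.kp = kp                  # proportional gain
        self.reset_rate = reset_rate  # integral action in repeats per time unit (1/Ti)
        self.dt = dt                  # time between controller updates
        self.integral = 0.0           # accumulated error over time

    def update(self, measured):
        """Return the controller output for one sample, clipped at zero."""
        error = self.setpoint - measured
        self.integral += error * self.dt
        output = self.kp * (error + self.reset_rate * self.integral)
        # Assumption: the output is a number of nodes to drain, so never negative
        return max(0, int(round(output)))


if __name__ == "__main__":
    pi = PIController(setpoint=250, kp=0.1, reset_rate=0.2, dt=1.0)
    print(pi.update(200))   # shortfall of 50 jobs: 6
    print(pi.update(200))   # same shortfall, larger integral: 7
```

The sketch has no anti-windup. With lags as long as those described above, the integral term keeps growing while draining takes effect, so a real controller would probably need to clamp it.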
</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/7714531575447506110/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/7714531575447506110' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/7714531575447506110'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/7714531575447506110'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2015/02/replacing-condor-defrag-daemon.html' title='Replacing the Condor Defrag Daemon'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/01633352566579646751</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDrUGuTV-hUVY9xZFh7ei1LcuQThZil3kPeNDDlS6p4m1TXLVcaBcLGKD4GKTH8VvtylP4IGQR5PkqBoYXzuE-_espftgNz8h9-m_uVDGBQ2YNOYamhzQGL5o3di5r6jZOXO4fmI-wlwo/s72-c/plot.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-3234621758948980729</id><published>2014-11-17T16:52:00.000+00:00</published><updated>2014-11-17T16:52:26.909+00:00</updated><title type='text'>Condor Workernode Heath Script</title><content type='html'>This is a script that runs some checks on the worker node and &quot;turns it off&quot; if any of them fail. To implement this, I made use of a Condor feature: startd_cron jobs. I put the following in my /etc/condor_config.local file on my worker nodes.

&lt;pre&gt;
ENABLE_PERSISTENT_CONFIG = TRUE
PERSISTENT_CONFIG_DIR = /etc/condor/ral
STARTD_ATTRS = $(STARTD_ATTRS) StartJobs, RalNodeOnline
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs
StartJobs = False
RalNodeOnline = False
START = (StartJobs =?= True) &amp;&amp; (RalNodeOnline =?= True)

&lt;/pre&gt;

I use the prefix &quot;Ral&quot; here because I inherited some of this material from Andrew Lahiffe at RAL! Basically, it&#39;s just to de-conflict names. I should have used &quot;Liv&quot; right from the start, but I&#39;m not changing it now.

Anyway, the first section does the following. It says to keep a persistent record of configuration settings. It adds new configuration settings called &quot;StartJobs&quot; and &quot;RalNodeOnline&quot; and lets an administrator change StartJobs remotely. It sets both settings initially to False. Finally, it makes the START configuration setting depend on both of them being True. Note: the START setting is very important, because the node won&#39;t start jobs unless it is True.

I also need the following. It tells the startd to run a cron script every five minutes (300 seconds).

&lt;pre&gt;
STARTD_CRON_JOBLIST=TESTNODE
STARTD_CRON_TESTNODE_EXECUTABLE=/usr/libexec/condor/scripts/testnodeWrapper.sh
STARTD_CRON_TESTNODE_PERIOD=300s

# Publish the script&#39;s attributes as soon as they change
STARTD_CRON_AUTOPUBLISH = If_Changed
&lt;/pre&gt;

The testnodeWrapper.sh script looks like this:

&lt;pre&gt;#!/bin/bash

MESSAGE=OK

/usr/libexec/condor/scripts/testnode.sh &gt; /dev/null 2&gt;&amp;1
STATUS=$?

if [ $STATUS -ne 0 ]; then
  # Look up the symbolic name of the exit code (a NAME=code line in testnode.sh)
  MESSAGE=$(grep &quot;^[A-Z0-9_][A-Z0-9_]*=${STATUS}\$&quot; /usr/libexec/condor/scripts/testnode.sh | head -n 1 | sed -e &quot;s/=.*//&quot;)
  if [[ -z &quot;$MESSAGE&quot; ]]; then
    MESSAGE=ERROR
  fi
fi

if [[ $MESSAGE =~ ^OK$ ]] ; then
  echo &quot;RalNodeOnline = True&quot;
else
  echo &quot;RalNodeOnline = False&quot;
fi
echo &quot;RalNodeOnlineMessage = $MESSAGE&quot;

echo `date`, message $MESSAGE &gt;&gt; /tmp/testnode.status
exit 0

&lt;/pre&gt;
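The exit-code lookup and the output of the wrapper can be sketched in Python 3, which may make the grep/sed pipeline easier to follow. Like the wrapper, it assumes that testnode.sh defines its exit codes as lines of the form NAME=code. The function names and the example codes are made up.

```python
import re


def lookup_message(status, script_text):
    """Map an exit status to its symbolic name, as the grep/sed pipeline does."""
    if status == 0:
        return "OK"
    # First line of the form NAME=status, where NAME is upper case, digits or _
    pattern = re.compile(r"^([A-Z0-9_]+)=%d$" % status, re.MULTILINE)
    match = pattern.search(script_text)
    return match.group(1) if match else "ERROR"


def cron_output(message):
    """Lines the startd cron job prints; the startd turns them into attributes."""
    online = "True" if message == "OK" else "False"
    return ["RalNodeOnline = " + online, "RalNodeOnlineMessage = " + message]


if __name__ == "__main__":
    # Hypothetical excerpt of testnode.sh defining its exit codes
    script = "OK=0\nDISK_FULL=3\nNO_CVMFS=4\n"
    for line in cron_output(lookup_message(3, script)):
        print(line)
```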

This just wraps an existing script that I reuse from our TORQUE/MAUI cluster. The existing script returns a non-zero code if any error happens. To add a bit of extra info, I also look up the name of the code. The important thing to notice is that the wrapper echoes a line that sets RalNodeOnline to True or False, and that value is then used in the setting of START. Note: on TORQUE/MAUI, the script ran as root; here it runs as condor. I had to use sudo in some sections (e.g. those that check the disks), because the condor user could not get the smartctl information.

Right, so I think that&#39;s it. When a node fails the test, START goes to False and the node won&#39;t run more jobs.

Oh, there&#39;s another thing to say. I use two settings to control START. As well as RalNodeOnline, I have the StartJobs setting. I can control this independently, so I can turn a node offline whether or not it has an error. This is useful for stopping the node to (say) rebuild it. It&#39;s done on the server, like this.

&lt;pre&gt;
condor_config_val -verbose -name r21-n01 -startd -set &quot;StartJobs = false&quot;
condor_reconfig r21-n01
condor_reconfig -daemon startd r21-n01
&lt;/pre&gt;
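When several nodes need to be taken offline at once (say, a whole rack before a rebuild), the commands above can be generated in a loop. Below is a minimal Python 3 sketch; the function names are hypothetical. By default it only prints the commands, so nothing changes until dry_run is set to False.

```python
import shlex
import subprocess


def start_jobs_commands(node, enabled):
    """Build the three commands shown above for one node, as argument lists."""
    value = "true" if enabled else "false"
    return [
        ["condor_config_val", "-verbose", "-name", node, "-startd",
         "-set", "StartJobs = " + value],
        ["condor_reconfig", node],
        ["condor_reconfig", "-daemon", "startd", node],
    ]


def set_start_jobs(nodes, enabled, dry_run=True):
    """Print (dry run) or run the commands for every node in turn."""
    for node in nodes:
        for cmd in start_jobs_commands(node, enabled):
            if dry_run:
                print(" ".join(shlex.quote(arg) for arg in cmd))
            else:
                subprocess.check_call(cmd)   # stop at the first failure


if __name__ == "__main__":
    set_start_jobs(["r21-n01", "r21-n02"], enabled=False)
```

Passing argument lists rather than a single shell string avoids having to quote &quot;StartJobs = false&quot; by hand.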

</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/3234621758948980729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/3234621758948980729' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/3234621758948980729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/3234621758948980729'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2014/11/condor-workernode-heath-script.html' title='Condor Workernode Heath Script'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/01633352566579646751</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-238939633774915760</id><published>2014-10-14T15:23:00.000+00:00</published><updated>2014-10-14T15:23:31.285+00:00</updated><title type='text'>Tired of full /var ?</title><content type='html'>This is how I prevent /var from getting full on any of our servers. I wrote these two scripts, spacemonc.py and spacemond.py.

spacemonc.py is a client, and it is installed on each grid system and worker node as a cronjob: 
&lt;br /&gt;
&lt;pre&gt;# crontab -l | grep spacemonc.py
50 18 * * * /root/bin/spacemonc.py
&lt;/pre&gt;
Because the server is (almost) single threaded, I use puppet to make the client run at a random time on each system. I say &quot;almost&quot; because, although SimpleXMLRPCServer handles one request at a time, simultaneous incoming connections wait in the socket&#39;s listen queue rather than being dropped, so it&#39;s really a queueing server. Still, that queue is short, so it&#39;s unwise to let too many connections arrive at once.
&lt;br /&gt;
&lt;pre&gt;        cron { &quot;spacemonc&quot;:
          #ensure =&amp;gt; absent,
          command =&amp;gt; &quot;/root/bin/spacemonc.py&quot;,
          user    =&amp;gt; root,
          hour    =&amp;gt; fqdn_rand(24),
          minute  =&amp;gt; fqdn_rand(60),
        }
&lt;/pre&gt;
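fqdn_rand gives each host a pseudo-random number seeded from its fully qualified domain name. The value is stable for that host, so the client runs are spread over the day without moving from one puppet run to the next. Here is the same idea in Python 3, as a sketch only. It is not Puppet&#39;s algorithm, so it won&#39;t reproduce the times Puppet picks, and the host names are made up.

```python
import hashlib


def spread_schedule(fqdn):
    """Return a stable (hour, minute) pair for a host's daily cron job."""
    digest = hashlib.sha256(fqdn.encode("utf-8")).hexdigest()
    value = int(digest, 16)
    # Use different parts of the number for the minute and the hour
    return (value // 60) % 24, value % 60


if __name__ == "__main__":
    for host in ["r21-n01.ph.liv.ac.uk", "r21-n02.ph.liv.ac.uk"]:
        hour, minute = spread_schedule(host)
        print("%02d:%02d %s" % (hour, minute, host))
```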
And it&#39;s pretty small:

&lt;br /&gt;
&lt;pre&gt;#!/usr/bin/python

import xmlrpclib
import os
import subprocess
from socket import gethostname

proc = subprocess.Popen([&quot;df | perl -p00e &#39;s/\n\s//g&#39; | grep -v ^cvmfs  | grep -v hepraid[0-9][0-9]*_[0-9]&quot;], stdout=subprocess.PIPE, shell=True)
(dfReport, err) = proc.communicate()

s = xmlrpclib.ServerProxy(&#39;http://SOMESERVEROROTHER.COM.ph.liv.ac.uk:8000&#39;)

status = s.post_report(gethostname(),dfReport)
if (status != 1):
  print(&quot;Client failed&quot;)
&lt;/pre&gt;

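The strange piece of perl in the middle undoes a bad habit of df: breaking lines that have long fields. I hate that; ldapsearch and qstat do it too. Running df -P, whose POSIX output format puts each file system on one line, should have the same effect. The two grep -v filters drop the cvmfs partitions and the RAID storage mounts, which I don&#39;t want to know about.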
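Here is roughly what the perl and grep stages of that pipeline do, written in Python 3 for comparison. It is a sketch rather than a drop-in replacement. The function name is made up, and the filters are the ones from the grep -v commands above.

```python
import re

# The same mounts the grep -v filters remove: cvmfs, and RAID storage mounts
SKIP = re.compile(r"^cvmfs|hepraid[0-9]+_[0-9]")


def clean_df(text):
    """Rejoin df lines that were wrapped, then drop the unwanted mounts."""
    lines = []
    for line in text.splitlines():
        if line[:1].isspace() and lines:
            # An indented line continues the previous one (long device name)
            lines[-1] = lines[-1] + " " + line.strip()
        else:
            lines.append(line)
    return [line for line in lines if not SKIP.search(line)]


if __name__ == "__main__":
    sample = ("Filesystem 1K-blocks Used Available Use% Mounted on\n"
              "/dev/mapper/vg_root-lv_var\n"
              "    1000 950 50 95% /var\n"
              "cvmfs2 100 1 99 1% /cvmfs/atlas.cern.ch\n")
    for line in clean_df(sample):
        print(line)
```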

&lt;br /&gt;
&lt;br /&gt;

spacemond.py is installed as a service; you&#39;ll have to pinch a /etc/init.d script to start and stop it properly (or do it from the command line to start with.) And the code for spacemond.py is pretty small, too:

&lt;br /&gt;
&lt;pre&gt;#!/usr/local/bin/python2.4

import sys
from SimpleXMLRPCServer import SimpleXMLRPCServer
from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler
import time
import smtplib
import logging

if (len(sys.argv) == 2):
  limit = int(sys.argv[1])
else:
  limit = 90

# Log to a file; the directory /var/log/spacemon must already exist
logging.basicConfig(level=logging.DEBUG,
  format=&#39;%(asctime)s %(levelname)s %(message)s&#39;,
  filename=&quot;/var/log/spacemon/log&quot;,
  filemode=&#39;a&#39;)

# Email details
smtpserver = &#39;hep.ph.liv.ac.uk&#39;
recipients = [&#39;sjones@hep.ph.liv.ac.uk&#39;,&#39;sjones@hep.ph.liv.ac.uk&#39;]
sender = &#39;root@SOMESERVEROROTHER.COM.ph.liv.ac.uk&#39;
msgheader = &quot;From: root@SOMESERVEROROTHER.COM.ph.liv.ac.uk\r\nTo: YOURNAME@hep.ph.liv.ac.uk\r\nSubject: spacemon report\r\n\r\n&quot;

# Test the server started
session = smtplib.SMTP(smtpserver)
smtpresult = session.sendmail(sender, recipients, msgheader + &quot;spacemond server started\n&quot;)
session.quit()

# Restrict to a particular path.
class RequestHandler(SimpleXMLRPCRequestHandler):
  rpc_paths = (&#39;/RPC2&#39;,)

# Create server
server = SimpleXMLRPCServer((&quot;SOMESERVEROROTHER.COM&quot;, 8000), requestHandler=RequestHandler)
server.logRequests = 0
server.register_introspection_functions()

# Class with a method to process incoming reports
class SpaceMon:
  # Called remotely as s.post_report(hostname, report); self is the SpaceMon instance
  def post_report(self,hostname,report):
    full_messages = []

    # Skip the df header line, then check the Use% column of each file system
    lines = report.split(&#39;\n&#39;)
    for l in lines[1:]:
      fields = l.split()
      if (len(fields) &amp;gt;= 5):
        fs = fields[0]
        pc = fields[4][:-1]            # strip the trailing &#39;%&#39;
        if not pc.isdigit():           # df prints &#39;-&#39; for some file systems
          continue
        ipc = int(pc)
        if (ipc &amp;gt;= limit):
          full_messages.append(&quot;File system &quot; + fs + &quot; on &quot; + hostname + &quot; is getting full at &quot; + pc + &quot; percent.\n&quot;)
    if (len(full_messages) &amp;gt; 0):
      session = smtplib.SMTP(smtpserver)
      smtpresult = session.sendmail(sender, recipients, msgheader + (&quot;&quot;).join(full_messages))
      session.quit()
      logging.info((&quot;&quot;).join(full_messages))
    else:
      logging.info(&quot;Happy state for &quot; + hostname)
    return 1

# Register and serve
server.register_instance(SpaceMon())
server.serve_forever()
&lt;/pre&gt;
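The check inside post_report can also be written as a small pure function, which makes it easy to test without a mail server or a running XML-RPC service. Below is a Python 3 sketch with a made-up function name. It skips the df header line and any line whose Use% is not a number; df prints - for some pseudo file systems.

```python
def full_filesystems(report, limit=90):
    """Return (file system, percent used) for each df line at or above limit."""
    full = []
    for line in report.split("\n")[1:]:   # the first line is the df header
        fields = line.split()
        if len(fields) >= 5:
            pc = fields[4].rstrip("%")
            if pc.isdigit() and int(pc) >= limit:
                full.append((fields[0], int(pc)))
    return full


if __name__ == "__main__":
    sample = ("Filesystem 1K-blocks Used Available Use% Mounted on\n"
              "/dev/sda3 1000 950 50 95% /var\n"
              "/dev/sda1 100 10 90 10% /boot\n")
    print(full_filesystems(sample))   # [('/dev/sda3', 95)]
```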
And now I get an email if any of my OS partitions is getting too full.

It&#39;s surprising how small server software can be when you use a framework like XMLRPC. In the old days, I would have needed 200 lines of parsing code and case statements. Goodbye to all that.</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/238939633774915760/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/238939633774915760' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/238939633774915760'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/238939633774915760'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2014/10/tired-of-full-var.html' title='Tired of full /var ?'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/01633352566579646751</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-4931708846395936542</id><published>2014-07-03T17:32:00.002+00:00</published><updated>2014-07-03T17:37:59.361+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="APEL"/><category scheme="http://www.blogger.com/atom/ns#" term="EMI-3"/><category scheme="http://www.blogger.com/atom/ns#" term="installation"/><category scheme="http://www.blogger.com/atom/ns#" term="Manchester"/><title type='text'>APEL EMI-3 upgrade</title><content type='html'>Here are some notes from Manchester&#39;s upgrade to EMI-3 APEL. The new APEL is much simpler: it is a set of Python scripts with a couple of key=value configuration files, rather than Java programs with XML files.
It isn&#39;t configured with YAIM, but since it is much easier to install and configure, that no longer matters. As an added bonus, I found that it publishes much faster and doesn&#39;t require any tedious tuning of how many records to publish at a time.&lt;br /&gt;
&lt;br /&gt;
So Manchester&#39;s starting point for the upgrade was:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;EMI-2 APEL node&lt;/li&gt;
&lt;li&gt;EMI-2 APEL parsers on EMI-3 cream CEs&lt;/li&gt;
&lt;ul&gt;
&lt;li&gt;We have 1 batch system per CE so I haven&#39;t tried a configuration in which there is only 1 batch system and multiple CEs &lt;/li&gt;
&lt;/ul&gt;
&lt;li&gt;In a few months we may move to ARC-CE, so the configuration was done mostly manually&lt;/li&gt;
&lt;/ul&gt;
I didn&#39;t preserve the old local APEL database, since all the records are in the central APEL one anyway. So the steps to carry out were the following:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Install a new EMI-3 APEL node&lt;/li&gt;
&lt;li&gt;Configure it&lt;/li&gt;
&lt;li&gt;Upgrade the CEs&#39; parsers to EMI-3 and point them at the new node&lt;/li&gt;
&lt;li&gt;Disable the old EMI-2 APEL node and backup its DB&lt;/li&gt;
&lt;li&gt;Run the parsers and fill the new APEL node DB&lt;/li&gt;
&lt;li&gt;Publish all records for the previous month from the new APEL machine&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;b&gt;Install a new EMI-3 APEL node&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Installed a vanilla VM with&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;EMI-3 repositories&lt;/li&gt;
&lt;li&gt;MySQL DB&lt;/li&gt;
&lt;li&gt;Host certificates&lt;/li&gt;
&lt;li&gt;ca-policy-egi-core &lt;/li&gt;
&lt;/ul&gt;
I did this with puppet. All the bits and pieces were already there for other types of service, so I just put together the profile for this machine. Then I installed the APEL rpms manually:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;yum install --nogpg emi-release&lt;/li&gt;
&lt;li&gt;yum install apel-ssm apel-client apel-lib&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;b&gt;Configure EMI-3 APEL node&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I followed the instructions on the &lt;a href=&quot;https://twiki.cern.ch/twiki/pub/EMI/EMI3APELClient/APEL_Publisher_System_Administrator_Guide.pdf&quot;&gt;official EMI-3 APEL server guide&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
There are no special tips here. I only changed the obvious fields, like site_name and password, plus a few others. These included the top BDII, because we have a local one, and the location of the host certificate, because ours has a different name.&lt;br /&gt;
&lt;br /&gt;
I didn&#39;t install the publisher cron job at this stage, because the machine was not yet ready to publish.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Upgrade the CEs&#39; parsers to EMI-3 and point them at the new node&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
As I said, the CEs are already on EMI-3. Only the APEL parsers were still EMI-2, so I disabled the EMI-2 cron job:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;rm /etc/cron.d/glite-apel-pbs-parser&lt;/li&gt;
&lt;/ul&gt;
I installed the EMI-3 APEL parser rpm:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;yum install apel-parser&lt;/li&gt;
&lt;/ul&gt;
I configured the parsers following the instructions in the &lt;a href=&quot;https://twiki.cern.ch/twiki/pub/EMI/EMI3APELClient/APEL_Parsers_System_Administrator_Guide.pdf&quot;&gt;official EMI-3 APEL parser guide&lt;/a&gt;, setting the obvious parameters. After a trial parsing run I also installed the cron job.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;NOTE:&lt;/b&gt; I find the parser configuration file a bit confusing regarding the batch system name. It states:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: x-small;&quot;&gt;&lt;i&gt;# Batch system hostname.&amp;nbsp; This does not need to be a definitive hostname,&lt;br /&gt;# but it should uniquely identify the batch system.&lt;br /&gt;# Example: pbs.gridpp.rl.ac.uk&lt;br /&gt;lrms_server = &lt;/i&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
It seems you can use any name, although you are of course better off using your batch system server name. We have one batch system per CE, so the configuration file on each CE contains the name of its own batch server. In the database this identifies the records from each CE. I&#39;m not sure what happens with one batch system and several CEs. Taken literally, the comment says to put only the batch system name, but then there is no distinction between the CEs.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Disable the old EMI-2 APEL node and backup its DB&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I just removed the old cron job. The machine is still running, but it isn&#39;t doing anything while it waits to be decommissioned.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Run the parsers and fill the new APEL node DB&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
You will need to publish the entire month before the one in which you install. For us that meant publishing all the June records. I didn&#39;t want to republish everything we had in the log files, so I moved the batch system and blah log files from before mid-May to a backup subdirectory and parsed only the log files for the end of May and June. The May days were needed because some jobs that finished in the first days of June had started in May, and one wants the complete record. In Manchester, the first jobs to finish in June started on the 25th of May, so you may want to go back a bit further with the parsing.&lt;br /&gt;
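How far back to parse can be worked out from the job start and end times: it is the earliest start among the jobs that finished in the month being published. Below is a small Python 3 sketch of that rule. It assumes the (start, end) dates have already been extracted from the batch system logs; the function name and the sample dates are made up.&lt;br /&gt;

```python
from datetime import date


def parse_back_to(jobs, month_start, next_month_start):
    """Earliest start date of the jobs that ended within the given month.

    jobs is an iterable of (start_date, end_date) pairs. Logs older than the
    returned date are not needed to build complete records for that month.
    """
    starts = [start for start, end in jobs
              if end >= month_start and next_month_start > end]
    return min(starts) if starts else month_start


if __name__ == "__main__":
    jobs = [(date(2014, 5, 25), date(2014, 6, 1)),
            (date(2014, 6, 3), date(2014, 6, 4)),
            (date(2014, 5, 1), date(2014, 5, 2))]   # ended in May: ignored
    print(parse_back_to(jobs, date(2014, 6, 1), date(2014, 7, 1)))   # 2014-05-25
```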
&lt;br /&gt;
&lt;b&gt;Publish all records for the previous month from the new APEL machine&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Finally, on the new machine, now filled with the June records plus some from May, I did a bit of DB clean-up as suggested by the APEL team. If you don&#39;t do this step, the APEL team will do it centrally before stitching together the old EMI-2 records and the new ones:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Delete from JobRecords where EndTime&amp;lt;&quot;2014-06-01&quot;;&lt;/li&gt;
&lt;li&gt;Delete from SuperSummaries where Month=&quot;5&quot;; &lt;/li&gt;
&lt;/ul&gt;
After all this, I modified the configuration file (/etc/apel/client.cfg) to publish a gap from the 25th of May until the day before I published, i.e. the 1st of July. I then changed it back to &quot;latest&quot;. Finally, I installed the cron job on the new APEL node too, so that it publishes every day.</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/4931708846395936542/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/4931708846395936542' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/4931708846395936542'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/4931708846395936542'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2014/07/apel-emi-3-upgrade.html' title='APEL EMI-3 upgrade'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-6781479077224692006</id><published>2014-05-07T15:03:00.001+00:00</published><updated>2014-05-08T09:29:02.710+00:00</updated><title type='text'>Planning for SHA-2</title><content type='html'>
&lt;h2&gt;
Timeline &lt;/h2&gt;
&lt;br /&gt;
The VOMS servers at CERN will be moved to new hosts that use the newer SHA-2 certificate standard. The changes are described in this post:&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;https://operations-portal.egi.eu/broadcast/archive/id/1102&quot;&gt;CERN VOMS service will move to new hosts&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
The picture below lays out the timeline for the change.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlShZAIw3_9J1Vgp_5ILTS0UxW4NLNCQHW9A8J-sfGf7kH9ZGTPLM5Dm6f_MeoW79pyqFUUTI4vx9WTZz5DS4q5QnUM3vrwJhsBE1Naju1f_Ryi-NuKXwzw5aHtebvsof2GgLE4iHEsFM/s1600/Timeline2b.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlShZAIw3_9J1Vgp_5ILTS0UxW4NLNCQHW9A8J-sfGf7kH9ZGTPLM5Dm6f_MeoW79pyqFUUTI4vx9WTZz5DS4q5QnUM3vrwJhsBE1Naju1f_Ryi-NuKXwzw5aHtebvsof2GgLE4iHEsFM/s1600/Timeline2b.png&quot; height=&quot;171&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Timeline for CERN VOMS Server Changes&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
The picture shows no change to the BNL server, vo.racf.bnl.gov, as none has been announced AFAIK. The changes will be to those servers with the cern.ch domain name.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;
New VOMS Server Hosts&lt;/h2&gt;
&lt;br /&gt;
The VOs associated with these changes are alice, atlas, cms, lhcb and ops. Sites supporting any of those will have to make a plan to update. &lt;br /&gt;
&lt;br /&gt;
The new hosts have already been set up and entered against the related VOs in the operations portal. The table below summarises the current set-up (ignoring vo.racf.bnl.gov) as advertised in the operations portal (as of 7th May 2014).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table border=&quot;1&quot; style=&quot;width: 300px;&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
  &lt;th&gt;VO&lt;/th&gt;
  &lt;th&gt;Vomses Port&lt;/th&gt;
  &lt;th&gt;Old Server&lt;/th&gt;
  &lt;th&gt;Old IsAdmin?&lt;/th&gt;
  &lt;th&gt;New Server&lt;/th&gt;
  &lt;th&gt;New IsAdmin?&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;atlas&lt;/td&gt;&lt;td&gt;15001&lt;/td&gt;&lt;td&gt;lcg-voms.cern.ch&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;lcg-voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;atlas&lt;/td&gt;&lt;td&gt;15001&lt;/td&gt;&lt;td&gt;voms.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;alice&lt;/td&gt;&lt;td&gt;15000&lt;/td&gt;&lt;td&gt;lcg-voms.cern.ch&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;lcg-voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;alice&lt;/td&gt;&lt;td&gt;15000&lt;/td&gt;&lt;td&gt;voms.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cms&lt;/td&gt;&lt;td&gt;15002&lt;/td&gt;&lt;td&gt;lcg-voms.cern.ch&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;lcg-voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cms&lt;/td&gt;&lt;td&gt;15002&lt;/td&gt;&lt;td&gt;voms.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lhcb&lt;/td&gt;&lt;td&gt;15003&lt;/td&gt;&lt;td&gt;lcg-voms.cern.ch&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;lcg-voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lhcb&lt;/td&gt;&lt;td&gt;15003&lt;/td&gt;&lt;td&gt;voms.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ops&lt;/td&gt;&lt;td&gt;15009&lt;/td&gt;&lt;td&gt;lcg-voms.cern.ch&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;lcg-voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ops&lt;/td&gt;&lt;td&gt;15009&lt;/td&gt;&lt;td&gt;voms.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;voms2.cern.ch&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;br /&gt;
&lt;i&gt;Notes: The IsAdmin flag says whether the server can be used to download the list of DNs used to create the grid-map file. The port numbers are unaffected by the change.&lt;/i&gt;&lt;br /&gt;
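The new-server half of the table can also be turned into vomses entries, for example to check a hand-maintained configuration. Below is a Python 3 sketch. Each entry holds the alias, host, port, host DN and VO name, each in double quotes. The host DN pattern /DC=ch/DC=cern/OU=computers/CN=host is an assumption here and should be checked against the operations portal before use.&lt;br /&gt;

```python
# New servers and vomses ports, taken from the table above
PORTS = {"alice": 15000, "atlas": 15001, "cms": 15002, "lhcb": 15003, "ops": 15009}
NEW_HOSTS = ["lcg-voms2.cern.ch", "voms2.cern.ch"]


def host_dn(host):
    # Assumption: CERN host certificate DN pattern; verify before use
    return "/DC=ch/DC=cern/OU=computers/CN=" + host


def vomses_line(vo, host, port, dn):
    """Format one vomses entry: alias, host, port, host DN and VO name."""
    return " ".join('"%s"' % field for field in [vo, host, str(port), dn, vo])


def all_vomses():
    """One entry per VO and new host, in a stable order."""
    return [vomses_line(vo, host, port, host_dn(host))
            for vo, port in sorted(PORTS.items()) for host in NEW_HOSTS]


if __name__ == "__main__":
    print("\n".join(all_vomses()))
```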
&lt;br /&gt;
&lt;h2&gt;
VOMS Server RPMS&lt;/h2&gt;
As described in the announcement (see the link at the top), a set of rpms has been created, one per WLCG-related VO:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;wlcg-voms-alice&lt;/li&gt;
&lt;li&gt;wlcg-voms-atlas&lt;/li&gt;
&lt;li&gt;
    wlcg-voms-cms&lt;/li&gt;
&lt;li&gt;
    wlcg-voms-lhcb&lt;/li&gt;
&lt;li&gt;
    wlcg-voms-ops&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
The rpms are hosted in the &lt;a href=&quot;http://linuxsoft.cern.ch/wlcg/&quot;&gt;WLCG yum repository&lt;/a&gt;. To install one, e.g. the ATLAS package on an SL6 node:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;$ cd /etc/yum.repos.d/&lt;br /&gt;$ wget http://linuxsoft.cern.ch/wlcg/wlcg-sl6.repo&lt;br /&gt;$ yum install wlcg-voms-atlas&lt;/span&gt;&lt;br /&gt;
&lt;h2&gt;
Local Measures at Liverpool&lt;/h2&gt;
At Liverpool, the configuration of the following servers will need to be changed:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Argus&lt;/li&gt;
&lt;li&gt;CREAM CE&lt;/li&gt;
&lt;li&gt;DPM SE&lt;/li&gt;
&lt;li&gt;WN and ...&lt;/li&gt;
&lt;li&gt;UI (eventually)&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
There will be a gap of some weeks (see the picture) between the deadline for sites to update their services that consume certificates (e.g. Argus, CREAM CE, DPM SE and WN) and the deadline for sites to update their UIs. This prevents new-style certificates from being used before the services that consume them can interpret them.&lt;br /&gt;
&lt;br /&gt;
So, to effect this change, Liverpool will install the rpms on its consuming service nodes in early May. As soon as the all-sites deadline (2nd June) has passed, Liverpool will update its UIs in the same way.&lt;br /&gt;
&lt;br /&gt;
If all goes well, Liverpool will remove references to the old servers after the final deadline, 1st July. In this case, the plan is to make the change using the traditional yaim/site-info.def/vo.d method, as these settings will need to be maintained permanently.&lt;br /&gt;
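&lt;br /&gt;
As an illustration of the vo.d method, a vo.d file for the ops VO that lists only the new servers might look roughly like this (plain shell variable assignments, which yaim sources). It is a sketch rather than Liverpool&#39;s actual file: the VOMS-Admin port (8443), the host DNs and the CA DN are assumptions to verify against the GridPP Approved VOs page before use.&lt;br /&gt;
&lt;pre&gt;
```shell
# vo.d/ops: hypothetical yaim vo.d entry listing only the new servers.
# The 8443 admin port, host DNs and CA DN are assumptions; verify them first.
VOMS_SERVERS="'vomss://voms2.cern.ch:8443/voms/ops?/ops' 'vomss://lcg-voms2.cern.ch:8443/voms/ops?/ops'"
VOMSES="'ops lcg-voms2.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch ops' 'ops voms2.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch ops'"
VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Grid Certification Authority' '/DC=ch/DC=cern/CN=CERN Grid Certification Authority'"
```
&lt;/pre&gt;
Both servers appear in VOMS_SERVERS because the table marks each of them as IsAdmin, and VOMS_CA_DN has one entry per VOMSES entry, in the same order.&lt;br /&gt;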
&lt;br /&gt;
&lt;h2&gt;
Effects on Approved VOs, VomsSnooper etc.&lt;/h2&gt;
For tracking purposes, the &lt;a href=&quot;https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs&quot;&gt;GridPP Approved VOs&lt;/a&gt; document will attempt to remain synchronised with the operations portal, but the VomsSnooper process is asynchronous, so there may be discrepancies around the deadlines. Sites are advised to watch out for these race conditions.&lt;br /&gt;
&lt;br /&gt;
Note: while the servers are being changed (i.e. from now until 2nd June for certificate-consuming services, and from 2nd June to 1st July for certificate-producing services, e.g. UIs), there can be no canonical form of the VOMS records, because different sites have their own implementation schedules and may temporarily use different settings, as described above.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/6781479077224692006/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/6781479077224692006' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/6781479077224692006'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/6781479077224692006'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2014/05/planning-for-sha-2-voms-servers-at-cern.html' title='Planning for SHA-2'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/01633352566579646751</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlShZAIw3_9J1Vgp_5ILTS0UxW4NLNCQHW9A8J-sfGf7kH9ZGTPLM5Dm6f_MeoW79pyqFUUTI4vx9WTZz5DS4q5QnUM3vrwJhsBE1Naju1f_Ryi-NuKXwzw5aHtebvsof2GgLE4iHEsFM/s72-c/Timeline2b.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-1197723090009274948</id><published>2014-04-28T11:11:00.002+00:00</published><updated>2014-05-08T10:33:01.019+00:00</updated><title type='text'>Snakey - a mindless way to reboot the cluster</title><content type='html'>&lt;style type=&quot;text/css&quot;&gt;P { margin-bottom: 0.08in; }&lt;/style&gt;


&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I&#39;m fed up with all the book-keeping when I need to reboot or
rebuild our cluster. 
&lt;br /&gt;
&lt;br /&gt;
First I need to set a subset of nodes offline. Then I have to
monitor them until some are drained. As soon as any is drained,
I have to reboot it by hand, wait for it to build, test it
and finally put it back online. Then I choose another set (maybe a
rack) and go through the same thing over and over until the cluster
is done.&lt;br /&gt;
&lt;br /&gt;
So, to cut all that out, I&#39;ve written a pair of perl scripts, called
snakey.pl and post_snakey.pl. I run both at the same time, each in its own terminal, and they do
all that work for me, so I can do other things, like Blog Posts. Start snakey.pl first.&lt;br /&gt;
&lt;br /&gt;
Note: all this assumes the use of the test nodes suite written by
Rob Fay, at Liverpool.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Part 1 – Snakey&lt;/b&gt;
&lt;br /&gt;
&lt;br /&gt;
This perl script, called snakey.pl,
reads a list of nodes and puts a selection (a &quot;slice&quot;, five nodes by default) offline with testnodes. It
waits for them to drain, checking every ten minutes, and reboots each one once it has drained. For each one that gets
rebooted, another from the list is offlined. In this way, it &quot;snakes&quot;
through the selected part of the cluster. Our standard
buildtools+puppet+yaim system takes care of the provisioning. 
&lt;br /&gt;
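The sliding window that snakey.pl maintains can be illustrated with a dry-run sketch in shell. It touches nothing (no testnodes, pbsnodes or ssh) and assumes, unrealistically, that nodes finish draining in the order they were offlined; it only shows that at most one slice of nodes is out of service at any time.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/bash
# Dry-run illustration of snakey.pl's "snake": offline a slice, then for
# every node rebooted, offline the next one from the list.
# ASSUMPTION: nodes drain in the order they were offlined (first in, first out).
snake_dry_run() {
  local slice=$1
  shift
  local todo=("$@") window=() node

  # Offline the first slice
  while [ "${#todo[@]}" -gt 0 ]; do
    if [ "${#window[@]}" -ge "$slice" ]; then
      break
    fi
    echo "offline ${todo[0]}"
    window+=("${todo[0]}")
    todo=("${todo[@]:1}")
  done

  # Reboot drained nodes one at a time, topping the window back up each time
  while [ "${#window[@]}" -gt 0 ]; do
    node=${window[0]}
    window=("${window[@]:1}")
    echo "reboot $node"
    if [ "${#todo[@]}" -gt 0 ]; then
      echo "offline ${todo[0]}"
      window+=("${todo[0]}")
      todo=("${todo[@]:1}")
    fi
  done
}
```
&lt;/pre&gt;
With a slice of two and three nodes, snake_dry_run 2 n1 n2 n3 prints offline n1, offline n2, reboot n1, offline n3, reboot n2, reboot n3: never more than two nodes are out of service at once.&lt;br /&gt;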
&lt;div style=&quot;margin-bottom: 0in;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;b&gt;Part 2 – Post Snakey&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Another script, post_snakey.pl, checks
whether the nodes offlined by snakey have been rebooted since, and whether they pass the
testnodes test script (testnode.sh). Any that do are removed from the testnodes exemptions list, so they come back
online. The scripts have some safety locks to stop havoc breaking
out; they usually just stop if anything weird is seen. &lt;br /&gt;
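The core test in post_snakey.pl is simple arithmetic: a node counts as rebooted only if its boot time (now minus the uptime in /proc/uptime) is later than the time snakey.pl wrote into the exemptions file. A shell sketch of that logic follows; the function names are hypothetical, and it assumes the comment format that snakey.pl writes (&quot;NODE # snakey.pl put this offline EPOCH&quot;).&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/bash
# Sketch of post_snakey.pl's reboot check (function names are hypothetical).

# Epoch at which snakey.pl offlined node $1, read from exemptions file $2
offline_time() {
  awk -v n="$1" '$1 == n { if ($0 ~ /# snakey\.pl put this offline/) print $NF }' "$2"
}

# Succeeds if the node booted after it was offlined.
# $1 = offline epoch, $2 = uptime in seconds (first field of /proc/uptime),
# $3 = current epoch (optional, defaults to now)
rebooted_since() {
  local offlined=$1
  local uptime=${2%.*}              # drop the fraction: 1234.56 becomes 1234
  local now=${3:-$(date +%s)}
  [ $((now - uptime)) -gt "$offlined" ]   # boot time later than offline time
}
```
&lt;/pre&gt;
In a real loop the uptime would come from the node itself, e.g. ssh -o ConnectTimeout=2 -o BatchMode=yes NODE cut -d&#39; &#39; -f1 /proc/uptime. Matching the whole first field means that node1 is never confused with node10.&lt;br /&gt;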
&lt;div style=&quot;margin-bottom: 0in;&quot;&gt;
&lt;br /&gt;
&lt;b&gt;Part
3 – Source Code&lt;/b&gt;&lt;/div&gt;
&lt;div style=&quot;margin-bottom: 0in;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;margin-bottom: 0in;&quot;&gt;
You&#39;ve seen all the nice blurb, so
here&#39;s the source code. I&#39;ve had to fix it up because HTML knackers the &quot;&amp;lt;&quot;, &quot;&amp;gt;&quot; and &quot;&amp;amp;&quot; chars - I hope I haven&#39;t broken it.&lt;br /&gt;
&lt;br /&gt;
Note: not the cleanest code I&#39;ve ever written, but it gets the job done.&lt;br /&gt;
&lt;br /&gt;
Good luck!&lt;/div&gt;
&lt;div style=&quot;margin-bottom: 0in;&quot;&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;margin-bottom: 0in;&quot;&gt;
----- snakey.pl ----------------------&lt;/div&gt;
&lt;div style=&quot;margin-bottom: 0in;&quot;&gt;
&lt;pre style=&quot;font-size: x-small;&quot;&gt;#!/usr/bin/perl

use strict;
use warnings;
use Fcntl ':flock';
use Getopt::Long;

sub initParams();

my %parameter;
my $exemptions = '/root/scripts/testnodes-exemptions.txt';

initParams();

# Read the list of nodes to reboot, one per line
my @nodesToDo;
open(NODES, '&amp;lt;', $parameter{'NODES'}) or die("Cannot open file of nodes to reboot, $!\n");
while (&amp;lt;NODES&amp;gt;) {
  chomp($_);
  next if (/^\s*$/);    # skip blank lines
  push(@nodesToDo, $_);
}
close(NODES);

# Stop at once if any node is unknown to pbsnodes or already offline
checkOk(@nodesToDo);

# Offline the first slice so that those nodes start to drain
my @selection = selectSome($parameter{'SLICE'});
foreach my $n (@selection) {
  print "Putting $n offline\n";
  putOffline($n);
}

while ($#selection &amp;gt; -1) {

  # Poll every 10 minutes until one of the offlined nodes has no jobs left
  my $drainedNode = '';
  while ($drainedNode eq '') {
    sleep(600);
    $drainedNode = checkIfOneHasDrained(@selection);
  }

  @selection = remove($drainedNode, @selection);

  print("Rebooting $drainedNode\n");
  my $status = rebootNode($drainedNode);
  print("status -- $status\n");

  # Keep the snake moving: offline the next node on the list, if any
  my @nextOne = selectSome(1);
  if ($#nextOne == 0) {
    my $nextOne = $nextOne[0];
    print "Putting $nextOne offline\n";
    putOffline($nextOne);
    push(@selection, $nextOne);
  }
}

#-----------------------------------------
# Add a node to the testnodes exemptions file, tagged with the current time.
# post_snakey.pl relies on the exact wording of this comment.
sub putOffline {
  my $node = shift();
  open(TN, '&amp;lt;', $exemptions) or die("Could not open $exemptions, $!\n");
  while (&amp;lt;TN&amp;gt;) {
    my $l = $_;
    chomp($l);
    $l =~ s/#.*//;
    $l =~ s/\s*//g;
    if ($node =~ /^$l$/) {
      print("Node $node is already in $exemptions\n");
      close(TN);
      return;
    }
  }
  close(TN);
  open(TN, '&amp;gt;&amp;gt;', $exemptions) or die("Could not open $exemptions, $!\n");
  flock(TN, LOCK_EX) or die("Could not lock $exemptions, $!\n");
  print(TN "$node # snakey.pl put this offline " . time() . "\n");
  close(TN) or die("Could not write $exemptions, $!\n");
}

#-----------------------------------------
# Return the pool without the drained node. An exact comparison is used so
# that removing node1 cannot also remove node10.
sub remove {
  my $drained = shift();
  my @poolOfNodes = @_;

  my @newSelection = grep { $_ ne $drained } @poolOfNodes;
  die("None removed\n") unless ($#newSelection == ($#poolOfNodes - 1));
  return @newSelection;
}

#-----------------------------------------
# Return the first node that pbsnodes reports as offline with no jobs, else ""
sub checkIfOneHasDrained {
  my @nodesToCheck = @_;
  foreach my $n (@nodesToCheck) {
    my $hadReport = 0;
    my $state     = "";
    my $jobCount  = 0;

    open(PBSNODES, "pbsnodes $n|") or die("Could not run pbsnodes, $!\n");
    while (&amp;lt;PBSNODES&amp;gt;) {
      if (/state = (.*)/) {
        $state = $1;
        $hadReport = 1;
      }
      if (/jobs = (.*)/) {
        my @jobs = split(/,/, $1);
        $jobCount = $#jobs + 1;
      }
    }
    close(PBSNODES);

    print("Result of check on $n: hadReport - $hadReport, state - $state, jobCount - $jobCount\n");
    if ($hadReport &amp;amp;&amp;amp; $state eq 'offline' &amp;amp;&amp;amp; $jobCount == 0) {
      return $n;
    }
  }
  return "";
}

#-----------------------------------------
# Take up to $max nodes off the front of the to-do list
sub selectSome {
  my $max = shift;
  my @some = ();
  for (my $ii = 0; $ii &amp;lt; $max; $ii++) {
    if (defined($nodesToDo[0])) {
      push(@some, shift(@nodesToDo));
    }
  }
  return @some;
}

#-----------------------------------------
# Die unless every node is known to pbsnodes and not already offline
sub checkOk {
  my @nodes = @_;

  foreach my $n (@nodes) {
    my $actualNode = 0;
    my $state      = "";
    open(PBSNODES, "pbsnodes $n|") or die("Could not run pbsnodes, $!\n");
    while (&amp;lt;PBSNODES&amp;gt;) {
      if (/state = (.*)/) {
        $state = $1;
        $actualNode = 1;
      }
    }
    close(PBSNODES);
    if (! $actualNode) {
      die("Node $n was not an actual one!\n");
    }
    if ($state =~ /offline/) {
      die("Node $n was already offline!\n");
    }
  }
  return @nodes;
}

#-----------------------------------------
sub initParams() {

  GetOptions('h|help' =&amp;gt; \$parameter{'HELP'},
             'n:s'    =&amp;gt; \$parameter{'NODES'},
             's:i'    =&amp;gt; \$parameter{'SLICE'},
            );

  if (defined($parameter{'HELP'})) {
    print &amp;lt;&amp;lt;TEXT;

Abstract: A tool to drain and boot a bunch of nodes

  -h  --help          Prints this help page
  -n  nodes           File of nodes to boot
  -s  slice           Size of slice to offline at once

TEXT
    exit(0);
  }

  if (!defined($parameter{'SLICE'})) {
    $parameter{'SLICE'} = 5;
  }

  if (!defined($parameter{'NODES'})) {
    die("Please give a file of nodes to reboot\n");
  }

  if (! -s $parameter{'NODES'}) {
    die("Please give a real file of nodes to reboot\n");
  }
}

#-----------------------------------------
# Reboot a node over ssh, but only if pbsnodes answers and lists no jobs.
# Returns 1 if the reboot command was sent, 0 otherwise.
sub rebootNode {
  my $nodeToBoot     = shift();
  my $pbsnodesWorked = 0;
  my $hasJobs        = 0;
  open(PBSNODES, "pbsnodes $nodeToBoot|") or die("Could not run pbsnodes, $!\n");
  while (&amp;lt;PBSNODES&amp;gt;) {
    if (/state =/) {
      $pbsnodesWorked = 1;
    }
    if (/^\s*jobs = /) {
      $hasJobs = 1;
    }
  }
  close(PBSNODES);
  if (! $pbsnodesWorked) { return 0; }
  if ($hasJobs)          { return 0; }

  open(REBOOT, "ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=10 $nodeToBoot reboot|");
  while (&amp;lt;REBOOT&amp;gt;) {
    print;
  }
  close(REBOOT);
  return 1;
}&lt;/pre&gt;
----- post_snakey.pl ----------------------&lt;/div&gt;
&lt;div style=&quot;margin-bottom: 0in;&quot;&gt;
&lt;br /&gt;
&lt;pre style=&quot;font-size: x-small;&quot;&gt;#!/usr/bin/perl

use strict;
use warnings;
use Fcntl ':flock';

my $exemptions = '/root/scripts/testnodes-exemptions.txt';
my %offlineTimes;

while (1) {
  %offlineTimes = getOfflineTimes();

  if (! %offlineTimes) {
    print("No work to do\n");
    exit(0);
  }

  foreach my $n (keys(%offlineTimes)) {

    # Read the node's uptime in seconds; -1 means it could not be reached
    my $uptime = -1;
    open(B, "ssh -o ConnectTimeout=2 -o BatchMode=yes $n cat /proc/uptime 2&amp;gt;&amp;amp;1|");
    while (&amp;lt;B&amp;gt;) {
      if (/([0-9\.]+)\s+[0-9\.]+/) {
        $uptime = $1;
      }
    }
    close(B);
    if ($uptime == -1) {
      print("Refusing to remove $n because it may not have been rebooted\n");
    }
    else {
      my $offlineTime = $offlineTimes{$n};
      my $timeNow = time();
      # The boot time (now - uptime) must be later than the time it was offlined
      if ($timeNow - $uptime &amp;lt;= $offlineTime) {
        print("Refusing to remove $n. ");
        printf("Last reboot - %6.3f  days ago. ", $uptime / 24 / 60 / 60);
        printf("Offlined    - %6.3f  days ago.\n", ($timeNow - $offlineTime) / 24 / 60 / 60);
      }
      else {
        print("$n has been rebooted\n");
        # Run the testnodes health check on the node; exit status 0 means a pass
        open(B, "ssh -o ConnectTimeout=2 -o BatchMode=yes $n ./testnode.sh|");
        while (&amp;lt;B&amp;gt;) { }
        close(B);
        my $status = $? &amp;gt;&amp;gt; 8;
        if ($status == 0) {
          print("$n passes testnode.sh; will remove from exemptions\n");
          removeFromExemptions($n);
        }
        else {
          print("$n is not passing testnode.sh - $status\n");
        }
      }
    }
  }
  sleep 567;
}

#-----------------------------------------
# Map each node that snakey.pl offlined to the time it did so
sub getOfflineTimes {
  my %offlineTimes = ();
  open(TN, '&amp;lt;', $exemptions) or die("Could not open $exemptions, $!\n");
  while (&amp;lt;TN&amp;gt;) {
    if (/(\S+)\s+\# snakey.pl put this offline (\d+)/) {
      $offlineTimes{$1} = $2;
    }
  }
  close(TN);
  return %offlineTimes;
}

#-----------------------------------------
# Rewrite the exemptions file without the line snakey.pl added for $node
sub removeFromExemptions {

  my $node = shift();

  open(TN, '&amp;lt;', $exemptions) or die("Could not open $exemptions, $!\n");
  my @lines = &amp;lt;TN&amp;gt;;
  close(TN);
  open(TN, '&amp;gt;', $exemptions) or die("Could not open $exemptions, $!\n");
  flock(TN, LOCK_EX) or die("Could not lock $exemptions, $!\n");
  foreach my $line (@lines) {
    # Match the whole node name, so that node1 does not also remove node10
    print TN $line unless ($line =~ m/^\s*\Q$node\E\s+# snakey\.pl put this offline/);
  }
  close(TN) or die("Could not write $exemptions, $!\n");
}&lt;/pre&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/1197723090009274948/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/1197723090009274948' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/1197723090009274948'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4670756400590062347/posts/default/1197723090009274948'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2014/04/snakey-mindless-way-to-reboot-cluster.html' title='Snakey - a mindless way to reboot the cluster'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/01633352566579646751</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-5589472405016627508</id><published>2014-04-15T09:40:00.001+00:00</published><updated>2014-04-15T09:40:29.285+00:00</updated><title type='text'>Kernel Problems at Liverpool</title><content type='html'>&lt;h3&gt;
&lt;b&gt;Introduction&lt;/b&gt;&lt;/h3&gt;
Liverpool recently updated its cluster to SL6. In doing so, a problem occurred whereby the kernel would lock up during normal operations. The signs are 
unresponsiveness, drop-outs in Ganglia and (later) many &quot;task...blocked for 120 seconds&quot; messages in 
/var/log/m.. and dmesg. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
&lt;b&gt;Description&lt;/b&gt;&lt;/h3&gt;
Kernels in the range 2.6.32-431* exhibited a type of deadlock when run on certain hardware with BIOS dated after 8th March 2010.&lt;br /&gt;
&lt;br /&gt;
This problem occurred on Supermicro hardware, with these main boards:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;X8DTT-H &lt;/li&gt;
&lt;li&gt;X9DRT&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Notes: &lt;br /&gt;
&lt;br /&gt;
1) No hardware with BIOS dated 8th March 2010 or before showed this defect, even on the same board type.&lt;br /&gt;
&lt;br /&gt;
2) The oldest kernel of the 2.6.32-358 range is solid. This is corroborated by operational experience with the 358 range.&lt;br /&gt;
&lt;br /&gt;
3) All current kernels in the 2.6.32-431 range exhibited the problem on our newest hardware, and a few nodes of the older hardware that had had unusual BIOS updates.&lt;br /&gt;
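&lt;br /&gt;
The pattern in these notes can be turned into a rough screening check. The sketch below flags a node whose running kernel is in the 2.6.32-431 range and whose BIOS is dated after 8th March 2010. It only illustrates Liverpool&#39;s observations and is not a diagnostic; the MM/DD/YYYY date format it expects is an assumption, based on what dmidecode -s bios-release-date normally prints.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/bash
# Rough screen for the lock-up pattern seen at Liverpool (illustrative only).

# Succeeds if a MM/DD/YYYY date is later than 8th March 2010
bios_after_cutoff() {
  local d=$1
  local ymd="${d:6:4}${d:0:2}${d:3:2}"   # reorder to YYYYMMDD for a numeric compare
  [ "$ymd" -gt 20100308 ]
}

# Succeeds if kernel $1 is in the 2.6.32-431 range and BIOS date $2 is after the cutoff
at_risk() {
  case "$1" in
    2.6.32-431*) bios_after_cutoff "$2" ;;
    *) return 1 ;;
  esac
}
```
&lt;/pre&gt;
On a node, as root: at_risk &quot;$(uname -r)&quot; &quot;$(dmidecode -s bios-release-date)&quot;; echo $? (an exit status of 0 means the node matches the pattern).&lt;br /&gt;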
&lt;br /&gt;
&lt;h3&gt;
&lt;b&gt;Testing&lt;/b&gt;&lt;/h3&gt;
The lock-ups 
are hard to reproduce, but after a great deal of trial and error, a predictor that is about 90% effective was found.
&lt;br /&gt;
&lt;br /&gt;
The procedure is to:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Build the system completely new in the usual way.&lt;/li&gt;
&lt;li&gt;When yaim gets to &quot;config_users&quot;, use a script (stress.sh) to run 36 threads 
of gzip and one of iozone.&lt;/li&gt;
&lt;/ul&gt;
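&lt;br /&gt;
The stress.sh script itself is not reproduced in this post. The following is a hypothetical reconstruction of the same kind of load, 36 gzip workers plus one iozone run started together; the data size per worker, the iozone options and the scratch file location are all assumptions.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/bash
# Hypothetical stand-in for stress.sh: THREADS gzip workers plus one iozone
# run, all started at once with xargs -P. Sizes and paths are assumptions.
THREADS=${THREADS:-36}                        # 36 gzip threads, as in the post
SIZE_MB=${SIZE_MB:-1024}                      # data per worker (assumption)
IOZONE_FILE=${IOZONE_FILE:-/tmp/iozone.tmp}   # iozone scratch file (assumption)
export SIZE_MB IOZONE_FILE

# One unit of work: $1 is "gzip" or "iozone"
stress_task() {
  case "$1" in
    gzip)
      # CPU and page-cache load: compress random data, report the output size
      head -c "${SIZE_MB}M" /dev/urandom | gzip -1 -c | wc -c
      ;;
    iozone)
      if command -v iozone >/dev/null; then
        # -i 0 = write/rewrite, -i 1 = read/reread, on one SIZE_MB file
        iozone -s "${SIZE_MB}m" -r 64k -i 0 -i 1 -f "$IOZONE_FILE" >/dev/null
        echo "iozone done"
      else
        echo "iozone not installed, skipped"
      fi
      ;;
  esac
}
export -f stress_task

# Queue THREADS gzip tasks and one iozone task, and run them all in parallel
stress_all() {
  { seq "$THREADS" | sed 's/.*/gzip/'; echo iozone; } |
    xargs -P "$((THREADS + 1))" -n 1 bash -c 'stress_task "$0"'
}
```
&lt;/pre&gt;
To use it, source the file and call stress_all at the moment yaim reaches config_users on a freshly built node.&lt;br /&gt;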
&lt;br /&gt;
On a susceptible node, this is reasonably 
certain to make it lock up after a minute. The signs are 
unresponsiveness and (later) &quot;task...blocked for 120 seconds&quot; messages in 
/var/log/m.. and dmesg.&lt;br /&gt;
&lt;br /&gt;
I observed that if the procedure is not followed &quot;exactly&quot;, it is 
unreliable as a predictor. In particular, if you stop Yaim and try 
again, the predictor is useless.&lt;br /&gt;
&lt;br /&gt;
To test that, I isolated the 
config_users script from Yaim, and ran it separately along with the 
stress.sh script. Result: useless - no lock-ups were seen.&lt;br /&gt;
&lt;br /&gt;
Note: This result was rather unexpected because the isolated 
config_users.sh script works in the same way as the original.
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
&lt;b&gt;Unsuccessful Theories&lt;/b&gt;&lt;/h3&gt;
A great many theories were tested and rejected or not pursued further (APIC problems, disk problems, BIOS differences, various kernels, examination of kernel logs, much googling, etc.). Eventually, we stumbled upon a theory that appears to be correct, which I describe below.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
&lt;b&gt;The Successful Theory&lt;/b&gt;&lt;/h3&gt;
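All our nodes had unusual vm settings:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;# grep dirty /etc/sysctl.conf&lt;br /&gt;vm.dirty_background_ratio = 100&lt;br /&gt;vm.dirty_expire_centisecs = 1800000&lt;br /&gt;vm.dirty_ratio = 100&lt;/span&gt;&lt;br /&gt;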
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
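These custom settings facilitate the storage of ATLAS &quot;short files&quot; in 
RAM. With both ratios at 100%, the amount of dirty data never triggers write-back until it fills all the available memory, and with dirty_expire_centisecs at 1800000 (18000 seconds, i.e. 5 hours), data is not written back because of its age for up to 5 hours. In effect, files stay off disk for a long time, allowing very fast access.&lt;br /&gt;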
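&lt;br /&gt;
A small shell sketch for checking which writeback settings a node is actually running with: it reads them from /proc/sys/vm and converts the expiry age into hours. The directory is a parameter only so that the sketch can be tried against test files.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/bash
# Sketch: report the vm.dirty* writeback settings, with the expiry age in hours.

# Convert centiseconds to hours, e.g. 1800000 gives 5.00
centisecs_to_hours() {
  awk -v c="$1" 'BEGIN { printf "%.2f\n", c / 100 / 3600 }'
}

# $1 = directory holding the sysctl files (defaults to the live /proc/sys/vm)
show_dirty_settings() {
  local dir=${1:-/proc/sys/vm} cs
  echo "dirty_background_ratio = $(cat "$dir/dirty_background_ratio") %"
  echo "dirty_ratio = $(cat "$dir/dirty_ratio") %"
  cs=$(cat "$dir/dirty_expire_centisecs")
  echo "dirty_expire_centisecs = $cs ($(centisecs_to_hours "$cs") h)"
}
```
&lt;/pre&gt;
On a node carrying the settings above, the last line of the output would read dirty_expire_centisecs = 1800000 (5.00 h).&lt;br /&gt;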
&lt;br /&gt;
The modification had been tested thoroughly for several years on earlier
 kernels, but perhaps some change (or latent bug?) in the newer kernel had 
somehow invalidated it.&lt;br /&gt;
&lt;br /&gt;
We came up with the idea that the issue 
originates in the memory operations that occur before Yaim reaches 
config_users. This would explain why any activity other than that produced by the exact 
procedure might well not trigger the defect. We thought this could tally with the ATLAS &quot;short 
file&quot; modifications in sysctl.conf. The theory is that these modifications 
set up the problem during the memory read/write operations (i.e. the 
asynchronous loading and flushing of the page cache by the OS).
&lt;br /&gt;
&lt;br /&gt;
To test this, I used the predictor on susceptible nodes, but without applying the ATLAS &quot;short file&quot; patch. Default vm settings were adopted instead.&lt;br /&gt;
&lt;h3&gt;
&lt;b&gt;Result&lt;/b&gt;&lt;/h3&gt;
Very satisfying at last: absolutely no sign of the defect. As the ATLAS &quot;short file&quot; patch is not very beneficial given the current data traffic, we have decided to go back to the default &quot;vm.dirty&quot; settings and monitor the situation carefully.&lt;br /&gt;
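&lt;br /&gt;
Going back to default settings means both removing the custom lines from /etc/sysctl.conf and resetting the values in the running kernel. A hedged sketch of both steps follows. The default values it uses (10, 20 and 3000) are the usual upstream kernel defaults and are an assumption here, so compare them with a freshly installed node before applying them.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/bash
# Sketch: revert the ATLAS "short file" vm.dirty* tuning (illustrative).

# Assumed defaults; compare with a freshly installed node first
DEFAULTS="vm.dirty_background_ratio=10 vm.dirty_ratio=20 vm.dirty_expire_centisecs=3000"

# Remove vm.dirty* lines from a sysctl.conf-style file, keeping a .bak copy
strip_dirty_settings() {
  local conf=$1
  cp "$conf" "$conf.bak"
  grep -v '^[[:space:]]*vm\.dirty' "$conf.bak" > "$conf"
  return 0   # grep exits 1 if every line was removed; that is not an error here
}

# Print (rather than run) the commands that reset the live kernel values
revert_commands() {
  local kv
  for kv in $DEFAULTS; do
    echo "sysctl -w $kv"
  done
}
```
&lt;/pre&gt;
After reviewing the printed commands, piping the output of revert_commands to sh as root would apply them.&lt;br /&gt;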
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/5589472405016627508/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/5589472405016627508' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/5589472405016627508'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/5589472405016627508'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2014/04/kernel-problems-at-liverpool.html' title='Kernel Problems at Liverpool'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/01633352566579646751</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-1170179546760843263</id><published>2014-02-26T15:44:00.001+00:00</published><updated>2014-02-26T15:44:08.805+00:00</updated><title type='text'>Central Argus Banning at Liverpool</title><content type='html'>&lt;h3&gt;
&lt;b&gt;Introduction&lt;/b&gt;&lt;/h3&gt;
Liverpool uses an ARGUS server, hepgrid9.ph.liv.ac.uk, for user authorization from the CEs and WNs. A requirement came down from above to implement central banning, and this is how we went about it. Most of this came from Ewan&#39;s TB_SUPPORT email (title: NGI Argus requests for NGI_UK) and from this description:&lt;br /&gt;
&lt;pre wrap=&quot;&quot;&gt;&lt;a class=&quot;moz-txt-link-freetext&quot; href=&quot;http://wiki.nikhef.nl/grid/Argus_Global_Banning_Setup_Overview&quot;&gt;http://wiki.nikhef.nl/grid/Argus_Global_Banning_Setup_Overview&lt;/a&gt; &lt;/pre&gt;
&lt;br /&gt;
&lt;h3&gt;
&lt;b&gt;Central Banning Architecture&lt;/b&gt;&lt;/h3&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZlkIercHspRUQ1az279Zv4YWkPLziW4J37uLx_L3_GCyTwPX0eACZQQ2mdXFL57aAunVF3loaadAVrOZ0F_vKRHMkZ27vA1kNewY8x-ar-jNu4riwlLgfQgdnrJNK2h-WsPcxmKYXMRA/s1600/ARGUS.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZlkIercHspRUQ1az279Zv4YWkPLziW4J37uLx_L3_GCyTwPX0eACZQQ2mdXFL57aAunVF3loaadAVrOZ0F_vKRHMkZ27vA1kNewY8x-ar-jNu4riwlLgfQgdnrJNK2h-WsPcxmKYXMRA/s1600/ARGUS.png&quot; height=&quot;320&quot; width=&quot;164&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Ban policies flow from the central WLCG ARGUS server, through the NGI one, down to the site server. This hierarchical distribution is a built-in ARGUS feature.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
&lt;b&gt;Setup at Liverpool&lt;/b&gt;&lt;/h3&gt;
&lt;br /&gt;
Whenever we build (or change) our ARGUS server, we run a script (argus.pol.sh) that loads our Argus policies from a file (argus.pol). With central banning added, the script now looks like this:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: x-small;&quot;&gt;&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;#!/bin/bash&lt;br /&gt;# Load the standard site policies: remove all (rap), then add them from file (apf)&lt;br /&gt;/usr/bin/pap-admin rap&lt;br /&gt;/usr/bin/pap-admin apf /root/scripts/argus.pol&lt;br /&gt;&lt;br /&gt;# Central banning: add and enable the NGI PAP, list it before the local (default) one, poll hourly&lt;br /&gt;pap-admin add-pap ngi argusngi.gridpp.rl.ac.uk &quot;/C=UK/O=eScience/OU=CLRC/L=RAL/CN=argusngi.gridpp.rl.ac.uk&quot;&lt;br /&gt;pap-admin enable-pap ngi&lt;br /&gt;pap-admin set-paps-order ngi default&lt;br /&gt;pap-admin set-polling-interval 3600&lt;br /&gt;&lt;br /&gt;# Make the PDP reload its policies and clear the PEP daemon cache&lt;br /&gt;/etc/init.d/argus-pdp reloadpolicy&lt;br /&gt;/etc/init.d/argus-pepd clearcache&lt;br /&gt;touch /root/scripts/done_argus.pol.sh&lt;br /&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
The first few lines load our standard site policies. The last few make the PDP reload its policies and clear the PEP cache. The middle section is the part you need.&lt;br /&gt;
&lt;br /&gt;
Basically, it adds the NGI ARGUS server as a remote PAP, enables it and orders it ahead of our local (default) policies. We&#39;ve also reduced the polling interval to 3600 seconds (one hour). When you run the script, the local ARGUS server connects to the NGI one and periodically downloads the remote (central) banning policies.&lt;br /&gt;
&lt;br /&gt;
Note: Ewan thinks the caching delay, which was 4 hours, is too long. So we changed /etc/argus/pdp/pdp.ini, setting &quot;retentionInterval = 21&quot; (i.e. 21 minutes).&lt;br /&gt;
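Below is a minimal sketch for making the same pdp.ini change from a build script. The helper name set_retention_interval is hypothetical. It assumes the file already contains a &quot;retentionInterval = ...&quot; line and refuses to guess where to add the key if that line is missing. The value is in minutes, as above.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: set retentionInterval (minutes) in an Argus PDP ini file.
# Only rewrites an existing "retentionInterval = N" line; if the key is absent
# it reports this and returns 1 rather than guessing which section it goes in.
set_retention_interval() {
    local ini="$1" minutes="$2"
    case "$minutes" in
        ''|*[!0-9]*) echo "minutes must be a whole number" >/dev/stderr; return 2 ;;
    esac
    if ! grep -Eq '^[[:space:]]*retentionInterval[[:space:]]*=' "$ini"; then
        echo "retentionInterval not found in $ini; add it by hand" >/dev/stderr
        return 1
    fi
    sed -i -E "s/^[[:space:]]*retentionInterval[[:space:]]*=.*/retentionInterval = ${minutes}/" "$ini"
}
```

For example, run &lt;b&gt;set_retention_interval /etc/argus/pdp/pdp.ini 21&lt;/b&gt; and then restart the Argus daemons.&lt;br /&gt;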
&lt;br /&gt;
After running the script, it&#39;s best to restart the Argus Java daemons.&lt;br /&gt;
&lt;h3&gt;
Testing&lt;/h3&gt;
It&#39;s best to tell Ewan and Orlin once this is in place, as they can run tests against your site. To check whether your site at least &quot;looks&quot; OK, try this:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;pap-admin lp --all&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
You should see the &quot;remote&quot; policies, e.g.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;ngi (argusngi.gridpp.rl.ac.uk:8150):&lt;br /&gt;&lt;br /&gt;resource &quot;.*&quot; BLAH BLAH BLAH&lt;/span&gt;&lt;br /&gt;
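To turn this check into something a script or probe can use, here is a hypothetical helper, has_remote_pap. It reads the &lt;b&gt;pap-admin lp --all&lt;/b&gt; output on stdin. It succeeds only if that output contains a section header for the given PAP alias, in the &quot;ngi (host:port):&quot; form shown above. It only confirms that the remote PAP is listed, not which ban policies it carries.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical check: succeed if `pap-admin lp --all` output (read on stdin)
# contains a section header for the given remote PAP alias, e.g.
#   ngi (argusngi.gridpp.rl.ac.uk:8150):
has_remote_pap() {
    local alias="$1"
    grep -Eq "^${alias} \([^)]+\):"
}
```

For example: &lt;b&gt;pap-admin lp --all | has_remote_pap ngi || echo &quot;no central banning policies&quot;&lt;/b&gt;&lt;br /&gt;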
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/1170179546760843263/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/1170179546760843263' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/1170179546760843263'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/1170179546760843263'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2014/02/central-argus-banning-at-liverpool.html' title='Central Argus Banning at Liverpool'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/01633352566579646751</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZlkIercHspRUQ1az279Zv4YWkPLziW4J37uLx_L3_GCyTwPX0eACZQQ2mdXFL57aAunVF3loaadAVrOZ0F_vKRHMkZ27vA1kNewY8x-ar-jNu4riwlLgfQgdnrJNK2h-WsPcxmKYXMRA/s72-c/ARGUS.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-6700788331250996218</id><published>2013-05-03T14:38:00.003+00:00</published><updated>2013-05-03T14:38:28.091+00:00</updated><title type='text'>A thing of beauty Digital R81</title><content type='html'>We needed to make space for a couple of new racks and decided to get rid of our &#39;workbench&#39;. The workbench consisted of an old Digital VAX system which had been mostly stripped of its innards, leaving a few sturdy steel frames. Here are some nostalgic pics of the last remaining unit being gutted. 
Note the gorgeous circuit boards, so easily accessible. And yes, that last one is a 500MB hard drive (so I&#39;m told); its motor seems better suited to a washing machine. They don&#39;t make &#39;em like they used to.&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;

&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiENWJ08E_j4q3H3vsZscapIalHZnm3WRPCFaV5pON45EFMXLcyCJiaSbAIhRkhfj_lBposFPQLu5fETu0Co8-cJWXLPwLdds9OW5LJQJ-n4T44-OmtGgDaUcUlBEYfMtWuVWCxO7fjRJc/s1600/IMG_3614.jpg&quot; imageanchor=&quot;1&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiENWJ08E_j4q3H3vsZscapIalHZnm3WRPCFaV5pON45EFMXLcyCJiaSbAIhRkhfj_lBposFPQLu5fETu0Co8-cJWXLPwLdds9OW5LJQJ-n4T44-OmtGgDaUcUlBEYfMtWuVWCxO7fjRJc/s320/IMG_3614.jpg&quot; /&gt;&lt;/a&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii6FPXIBowdYoD04Dw-9i9s_WBQwmSrknXFB8qPEQGh43yAmKVZS8oG5UwZ3QJ1UvuKW0P_alGli_GSZ0Lg7zK0qBJKdapP8WYu0s3P3vGxUZ_lrn5wzs7WnbqQuulGHTMBmx0q4FgKa8/s1600/IMG_3615.jpg&quot; imageanchor=&quot;1&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii6FPXIBowdYoD04Dw-9i9s_WBQwmSrknXFB8qPEQGh43yAmKVZS8oG5UwZ3QJ1UvuKW0P_alGli_GSZ0Lg7zK0qBJKdapP8WYu0s3P3vGxUZ_lrn5wzs7WnbqQuulGHTMBmx0q4FgKa8/s320/IMG_3615.jpg&quot; /&gt;&lt;/a&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-COBGAHcqohyIctMzt9TwoPO7r370RpNdAivKKIKgbq7Qy1XgcqhDtfKY1ZENV2wNvJcxbCi7uLTcFEgC2Fj9uw3e2FXtTQXxJ1l2eX4qpbboOS9SOuVMr-4Q7mP0x_-jhPcvcZfWAh4/s1600/IMG_3616.jpg&quot; imageanchor=&quot;1&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-COBGAHcqohyIctMzt9TwoPO7r370RpNdAivKKIKgbq7Qy1XgcqhDtfKY1ZENV2wNvJcxbCi7uLTcFEgC2Fj9uw3e2FXtTQXxJ1l2eX4qpbboOS9SOuVMr-4Q7mP0x_-jhPcvcZfWAh4/s320/IMG_3616.jpg&quot; /&gt;&lt;/a&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqwkWN4oPsYMVKxgEGFTgl_f-Yi65zq8N2lWZah0WMQfuyAaSCKjqDFLRZ-JMwrzfDtOyB444LZxOfOOSUrxeLxi2vIr58qAq0O68-oqqQDKtBSPwRrhI_F9HoDkoTmxyDrSyCgX6zwY0/s1600/IMG_3617.jpg&quot; imageanchor=&quot;1&quot;&gt;&lt;img border=&quot;0&quot; 
src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqwkWN4oPsYMVKxgEGFTgl_f-Yi65zq8N2lWZah0WMQfuyAaSCKjqDFLRZ-JMwrzfDtOyB444LZxOfOOSUrxeLxi2vIr58qAq0O68-oqqQDKtBSPwRrhI_F9HoDkoTmxyDrSyCgX6zwY0/s320/IMG_3617.jpg&quot; /&gt;&lt;/a&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-yPUDWJkJhG_wEVez3xd0mMVKzqo27zbtp4-sJmKrI42ZOZ8Rv5Afej-AysmAtK6Kcx7nhzG2LvbO9h_AnioVaOSPlv7ub-A8P7FN6l9ynchotHjlpUakALyqPXDz0dbmmlgPfYr0BZQ/s1600/IMG_3621.jpg&quot; imageanchor=&quot;1&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-yPUDWJkJhG_wEVez3xd0mMVKzqo27zbtp4-sJmKrI42ZOZ8Rv5Afej-AysmAtK6Kcx7nhzG2LvbO9h_AnioVaOSPlv7ub-A8P7FN6l9ynchotHjlpUakALyqPXDz0dbmmlgPfYr0BZQ/s320/IMG_3621.jpg&quot; /&gt;&lt;/a&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOQv2JARmzg1mr9Xhn5VuPE1tXzbPW283OhS1HUQPTuG9FEqSxgLQZTPn6Na1ylT6dpw_vLQOCxuSeO75XkiT8xCIDvtv-C3SNGwHyUqeWx4F5pJP5BCljyseeyzlGIIpY8mjqLVw_aV0/s1600/IMG_3624.jpg&quot; imageanchor=&quot;1&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOQv2JARmzg1mr9Xhn5VuPE1tXzbPW283OhS1HUQPTuG9FEqSxgLQZTPn6Na1ylT6dpw_vLQOCxuSeO75XkiT8xCIDvtv-C3SNGwHyUqeWx4F5pJP5BCljyseeyzlGIIpY8mjqLVw_aV0/s320/IMG_3624.jpg&quot; /&gt;&lt;/a&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/6700788331250996218/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/6700788331250996218' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/6700788331250996218'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/6700788331250996218'/><link rel='alternate' type='text/html' 
href='http://northgrid-tech.blogspot.com/2013/05/a-thing-of-beauty-digital-r81.html' title='A thing of beauty Digital R81'/><author><name>Peter</name><uri>http://www.blogger.com/profile/05855046025692405834</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiENWJ08E_j4q3H3vsZscapIalHZnm3WRPCFaV5pON45EFMXLcyCJiaSbAIhRkhfj_lBposFPQLu5fETu0Co8-cJWXLPwLdds9OW5LJQJ-n4T44-OmtGgDaUcUlBEYfMtWuVWCxO7fjRJc/s72-c/IMG_3614.jpg" height="72" width="72"/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-8135080596394812320</id><published>2013-02-27T12:05:00.002+00:00</published><updated>2013-03-03T14:28:11.976+00:00</updated><title type='text'>ss tool to debug sockets</title><content type='html'>See: &lt;a href=&quot;http://gridpp-storage.blogspot.co.uk/2013/02/ss-tool-to-debug-sockets.html&quot;&gt;http://gridpp-storage.blogspot.co.uk/2013/02/ss-tool-to-debug-sockets.html&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/8135080596394812320/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/8135080596394812320' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/8135080596394812320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/8135080596394812320'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2013/02/ss-tool-to-debug-sockets.html' title='ss tool to debug sockets'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-680070176319651234</id><published>2013-02-20T14:37:00.000+00:00</published><updated>2013-02-20T14:38:53.229+00:00</updated><title type='text'>Sonar tests to BNL</title><content type='html'>Follow the link &lt;a href=&quot;http://gridpp-storage.blogspot.co.uk/2013/02/sonar-test-to-bnl.html&quot;&gt;http://gridpp-storage.blogspot.co.uk/2013/02/sonar-test-to-bnl.html&lt;/a&gt;&lt;br /&gt;
</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/680070176319651234/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/680070176319651234' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/680070176319651234'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/680070176319651234'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2013/02/sonar-tests-to-bnl_20.html' title='Sonar tests to BNL'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-98278585678146285</id><published>2012-09-24T12:55:00.003+00:00</published><updated>2012-09-24T13:03:20.418+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="manchester network upgrade improved rates perfsonar debugging"/><title type='text'>Manchester network improvements in Graphs</title><content type='html'>As I posted &lt;a href=&quot;http://northgrid-tech.blogspot.co.uk/2012/04/big-upgrade-in-pictures.html&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://northgrid-tech.blogspot.co.uk/2012/08/10gbe-network-cards-installation-in.html&quot;&gt;here&lt;/a&gt; we have upgraded the network infrastructure within the Manchester Tier2. 
Below are some of the measured benefits of this upgrade so far.&lt;br /&gt;
&lt;br /&gt;
Here is the improvement in outgoing traffic in the ATLAS sonar
      tests between Manchester and BNL after we upgraded the Cisco
      blades and replaced the rack switches with 10G ones:&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLm-BUyVdQAtcPJKRix4D3h_xaVKJawAdQTR_H2Hl2oHxv1UagMyBST14T08ZPLmjTgguPzu-YHKhbmxDUc8-32acilegvzFfFViEPURnCT7Td4661ApYACgiIlQa_SJCEsFU54EYWjKZd/s1600/manchester-jan-apr-2012.png&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;200&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLm-BUyVdQAtcPJKRix4D3h_xaVKJawAdQTR_H2Hl2oHxv1UagMyBST14T08ZPLmjTgguPzu-YHKhbmxDUc8-32acilegvzFfFViEPURnCT7Td4661ApYACgiIlQa_SJCEsFU54EYWjKZd/s200/manchester-jan-apr-2012.png&quot; width=&quot;150&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Here instead is the throughput improvement after I enabled
      the 10Gbps interface on the perfsonar machine. The test case is
      Oxford, which also has its 10Gbps interfaces enabled.&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSVBQFZwmVIjEmxLdK7BRE4I_n5e_qn-Y702ED591jA1NfWCyUWh-lPRLpiQiAWHJDC2jiWgqvBR8ZJKyfAH4htUiGo2jEeZb2TEwvXndi-s6JlRY4qBwM0cpFSqkhi6tTmqFHJhikSZ-L/s1600/man-ox-perfsonar.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;145&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSVBQFZwmVIjEmxLdK7BRE4I_n5e_qn-Y702ED591jA1NfWCyUWh-lPRLpiQiAWHJDC2jiWgqvBR8ZJKyfAH4htUiGo2jEeZb2TEwvXndi-s6JlRY4qBwM0cpFSqkhi6tTmqFHJhikSZ-L/s400/man-ox-perfsonar.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
And here are more general rates with different sites. The 10Gbps
      improvement is evident with the sites that have enabled it.&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioOeBjTcVAGAIYBCXRKJh2DAbIU7OecqZlbP6dnb7OOA36VxPL05qeRm1D1KWFM1oksU9Qs2-fGbTXFfamc1YrZVv_Mw2NXsFmnpziWN4cpY__Wc2WFSYLRDF-QRWMNd0-6f5g3rPgF2JX/s1600/Screen+shot+2012-09-23+at+19.21.07.png&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;180&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioOeBjTcVAGAIYBCXRKJh2DAbIU7OecqZlbP6dnb7OOA36VxPL05qeRm1D1KWFM1oksU9Qs2-fGbTXFfamc1YrZVv_Mw2NXsFmnpziWN4cpY__Wc2WFSYLRDF-QRWMNd0-6f5g3rPgF2JX/s400/Screen+shot+2012-09-23+at+19.21.07.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
The perfsonar tests have also helped to debug the poor ATLAS FTS
      rates the UK had with FZK
      (&lt;a class=&quot;moz-txt-link-freetext&quot; href=&quot;https://ggus.eu/ws/ticket_info.php?ticket=84008&quot;&gt;https://ggus.eu/ws/ticket_info.php?ticket=84008&lt;/a&gt;).
      Manchester (and Glasgow) had already tried to investigate the problem
      last year with iperf within ATLAS, without much success because the
      measurements were not taken systematically. This year the problem was
      finally pinned on the FZK firewall, and the improvement from bypassing
      it is shown below.&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJNUulIMWFuaYsVLapFJWmWgGJy87FS5lskFggkTl_s2k8fdS_n8kG4iIQXy-PvBjJb_aWK-k9Q29T_gaz1lnE4e8rL89tLLscsUw4kKT-bn2Y1YzWpjcMymkhgqM-rB73rerKMIugIKSs/s1600/Screen+shot+2012-09-23+at+19.16.04.png&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;142&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJNUulIMWFuaYsVLapFJWmWgGJy87FS5lskFggkTl_s2k8fdS_n8kG4iIQXy-PvBjJb_aWK-k9Q29T_gaz1lnE4e8rL89tLLscsUw4kKT-bn2Y1YzWpjcMymkhgqM-rB73rerKMIugIKSs/s400/Screen+shot+2012-09-23+at+19.16.04.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
This is also reflected in the improved rates in the ATLAS sonar tests
      between Manchester and FZK, since the data server subnets now
      bypass the firewall as well.&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaqZaBdQUlAxjIidKGSrzUzv7ZjrZnbGlt0Wa4K7bILaiaEXcpxhyphenhyphendeoQC5hVMX50d4kyh2Cs08JJNvKmWostT0t1_o5Rk_tcCjiW2zRzd8ID7REOpc7AubgXnJDgHeiDBmXFSO_eXbNo0/s1600/MAN-FZK-SONAR.png&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;200&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaqZaBdQUlAxjIidKGSrzUzv7ZjrZnbGlt0Wa4K7bILaiaEXcpxhyphenhyphendeoQC5hVMX50d4kyh2Cs08JJNvKmWostT0t1_o5Rk_tcCjiW2zRzd8ID7REOpc7AubgXnJDgHeiDBmXFSO_eXbNo0/s200/MAN-FZK-SONAR.png&quot; width=&quot;150&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Finally, here is the increased throughput in data distribution to and from other sites as an ATLAS T2D. August rates were down due to a combination of storage problems, but there has been a growing trend since the rack switches and the Cisco blades were upgraded.&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4xPumt8WOHuMpA_69zYEwKUlp6xpzHeg4pUyfGt7YZggTdshXWFvWpg-gclJNSEL0jKnM1T3B-qSZtNKUHvEkmwVnLS4NJxSL_97wQwjB_7Am1QKpf18TkSPx58D2GVU6wOkrZ-6fzw-y/s1600/Screen+shot+2012-09-23+at+19.40.26.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;203&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4xPumt8WOHuMpA_69zYEwKUlp6xpzHeg4pUyfGt7YZggTdshXWFvWpg-gclJNSEL0jKnM1T3B-qSZtNKUHvEkmwVnLS4NJxSL_97wQwjB_7Am1QKpf18TkSPx58D2GVU6wOkrZ-6fzw-y/s400/Screen+shot+2012-09-23+at+19.40.26.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/98278585678146285/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/98278585678146285' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/98278585678146285'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/98278585678146285'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2012/09/manchester-network-improvements-in.html' title='Manchester network improvements in Graphs'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLm-BUyVdQAtcPJKRix4D3h_xaVKJawAdQTR_H2Hl2oHxv1UagMyBST14T08ZPLmjTgguPzu-YHKhbmxDUc8-32acilegvzFfFViEPURnCT7Td4661ApYACgiIlQa_SJCEsFU54EYWjKZd/s72-c/manchester-jan-apr-2012.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-1322692210267914931</id><published>2012-08-09T09:28:00.000+00:00</published><updated>2012-09-24T12:50:12.698+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Manchester"/><category scheme="http://www.blogger.com/atom/ns#" term="network"/><title type='text'>10GBE Network cards installation in Manchester</title><content type='html'>This is 
a collection of recipes I used to install the 10GBE cards. As I said in a previous post, we chose to go with 10GBASE-T, so we bought X520-T2 cards. They use the same &lt;a href=&quot;http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html&quot;&gt;chipset&lt;/a&gt; as the X520-DA2, so they have a lot in common.&lt;br /&gt;
&lt;br /&gt;
The new Dell R610 and C6100 machines were delivered with the cards already installed. However, because the DA2 and T2 share the same chipset, the C6100s were delivered with the wrong connectors, and we are now waiting for replacements. For the old Viglen WNs and storage we bought additional cards that have to be inserted one by one.&lt;br /&gt;
&lt;br /&gt;
I started the installation with the R610s because a) they already had the cards and b) the perfsonar machines are R610s. The aim is to use these cards as the primary interfaces and kickstart from them. PXE booting is not enabled by default, so you have to get &lt;a href=&quot;http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=19186&quot;&gt;bootutil from the Intel site&lt;/a&gt;. For some reason the download is a Windows executable, but once unpacked it also contains directories for other operating systems. The easiest approach is what Andrew did: zip the unpacked directory and run bootutil from the machine itself, without fussing around with USB sticks or boot disks. That said, the install step needs the kernel source to compile, so make sure the kernel-devel version you install matches the running kernel.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;yum install kernel-devel-$(uname -r)&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;unzip APPS.zip&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;cd APPS/BootUtils/Linux_x86/&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;chmod 755 ./install&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;./install&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;./bootutil64e -BOOTENABLE=pxe -ALL&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;./bootutil64e -UP=Combo -FILE=../BootIMG.FLB -ALL&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The first bootutil command enables PXE; the second updates the firmware.&lt;br /&gt;
After this you can reboot and enter the BIOS to rearrange the boot order of the network devices. Once this is done, you can put the MAC address of the 10GBE interface in the DHCP configuration and reinstall from there.
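&lt;br /&gt;If the DHCP server is ISC dhcpd, the host entry for the 10GBE interface might look like the output of the hypothetical helper below. That is an assumption; the post does not say which DHCP server is used. The MAC of an interface that is already up can be read from /sys/class/net/IFACE/address. The host name, MAC and IP in the example are made up.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: print an ISC dhcpd host entry for a node's 10GBE
# interface, normalising the MAC address to lower case.
dhcp_host_entry() {
    local host="$1" mac="$2" ip="$3"
    # Accept only colon-separated MACs such as 00:1b:21:aa:bb:cc
    if ! echo "$mac" | grep -Eqi '^([0-9a-f]{2}:){5}[0-9a-f]{2}$'; then
        echo "bad MAC address: $mac" >/dev/stderr
        return 1
    fi
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' \
        "$host" "$(echo "$mac" | tr 'A-F' 'a-f')" "$ip"
}
```

For example: &lt;b&gt;dhcp_host_entry wn001 00:1b:21:aa:bb:cc 10.0.0.1&lt;/b&gt;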
&lt;br /&gt;
At kickstart time the machine may change the order of the cards. You can solve this by using ipappend 2 and ksdevice=bootif in the pxelinux.cfg files, as suggested in the &lt;a href=&quot;http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/sn-booting-from-pxe-x86.html&quot;&gt;RH docs&lt;/a&gt;. Thanks to Ewan for pointing that out.&lt;br /&gt;
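For reference, a pxelinux.cfg entry using both options could be generated as below. The helper name, label and paths are hypothetical. IPAPPEND 2 makes pxelinux add a BOOTIF= option carrying the MAC address of the NIC it booted from. ksdevice=bootif tells anaconda to install over that same interface.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: print a pxelinux boot entry that passes the booting
# NIC to anaconda (IPAPPEND 2 adds BOOTIF=..., ksdevice=bootif consumes it).
pxe_entry() {
    local label="$1" kernel="$2" initrd="$3" ks="$4"
    printf 'LABEL %s\n  KERNEL %s\n  APPEND initrd=%s ks=%s ksdevice=bootif\n  IPAPPEND 2\n' \
        "$label" "$kernel" "$initrd" "$ks"
}
```

For example: &lt;b&gt;pxe_entry sl6-ks sl6/vmlinuz sl6/initrd.img http://ks.example.org/ks.cfg&lt;/b&gt;&lt;br /&gt;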
&lt;br /&gt;
Even so, the machine might not come back up with the interface working. There are two possible problems here:&lt;br /&gt;
&lt;br /&gt;
1) X520-T2 interfaces take longer to wake up than their little 1GBE sisters, so a delay is needed after the /sbin/ip command in the network scripts. I didn&#39;t have to hack anything to do this; I could just set&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;LINKDELAY=10&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
in the ifcfg-eth* configuration files and it worked.&lt;br /&gt;
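Rather than editing each file by hand, you can apply the setting to all the ifcfg-eth* files at once, e.g. from the kickstart. Here is a minimal sketch. The name set_linkdelay is hypothetical. It defaults to 10 seconds and to the usual /etc/sysconfig/network-scripts directory.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: make every ifcfg-eth* file in a directory set
# LINKDELAY (extra seconds the network scripts wait for the link to come up).
set_linkdelay() {
    local delay="${1:-10}" dir="${2:-/etc/sysconfig/network-scripts}" f
    for f in "$dir"/ifcfg-eth*; do
        [ -f "$f" ] || continue
        if grep -q '^LINKDELAY=' "$f"; then
            sed -i "s/^LINKDELAY=.*/LINKDELAY=${delay}/" "$f"   # update existing
        else
            echo "LINKDELAY=${delay}" >> "$f"                    # append new
        fi
    done
}
```
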
&lt;br /&gt;
2) The 10GBE interface is not guaranteed to be eth0. There are several ways to enforce this.&lt;br /&gt;
&lt;br /&gt;
One is to make sure HWADDR in ifcfg-eth0 is set to the MAC address of the card the administrator wants, rather than whatever the system decides. This can be done at kickstart time, but it might mean having a kickstart file for each machine, which is what we are trying to get away from.&lt;br /&gt;
&lt;br /&gt;
Dan and Chris suggested this could be fixed with &lt;a href=&quot;http://www.linuxfromscratch.org/blfs/view/development/chapter07/network.html&quot;&gt;udev&lt;/a&gt;. The recipe they gave me was this:&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;cat /etc/udev/rules.d/70-persistent-net.rules &lt;br /&gt;
KERNEL==&quot;eth*&quot;,  ID==&quot;0000:01:00.0&quot;, NAME=&quot;eth0&quot; &lt;br /&gt;
KERNEL==&quot;eth*&quot;,  ID==&quot;0000:01:00.1&quot;, NAME=&quot;eth1&quot; &lt;br /&gt;
KERNEL==&quot;eth*&quot;,  ID==&quot;0000:04:00.0&quot;, NAME=&quot;eth2&quot; &lt;br /&gt;
KERNEL==&quot;eth*&quot;,  ID==&quot;0000:04:00.1&quot;, NAME=&quot;eth3&quot; &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
This matches on the PCI address of each port, which is the same on all machines of the same type (R610, C6100...). You can get the addresses using &lt;b&gt;lspci -D | grep Eth&lt;/b&gt;; the -D option adds the 0000: domain prefix used in the rules. It is not essential, but if &lt;b&gt;lspci&lt;/b&gt; returns something like &lt;b&gt;Unknown device 151c (rev01)&lt;/b&gt; in the description, it just means the PCI ID database is out of date; use &lt;b&gt;update-pciids&lt;/b&gt; to update it. There are other recipes around if you don&#39;t like this one, but this one greatly simplifies maintaining the interface naming scheme.&lt;br /&gt;
&lt;br /&gt;
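Writing these rules by hand for each machine type is easy to get wrong. Here is a hypothetical helper, make_net_rules, that builds them from &lt;b&gt;lspci -D&lt;/b&gt; output in the same format as the recipe above. Controllers whose lspci description matches an optional pattern are numbered first, so they become eth0, eth1 and so on. The rest follow in PCI address order. The pattern is an assumption: the X520 should show up as an 82599 device once the PCI ID database is up to date, but check the actual lspci output first.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: turn `lspci -D` output (on stdin) into udev naming rules
# like the recipe above. Ethernet controllers matching the optional pattern
# are numbered first; the others follow in PCI address order.
make_net_rules() {
    local primary="${1:-}"
    grep 'Ethernet controller' | sort | awk -v pat="$primary" '
        function emit(a) {
            printf "KERNEL==\"eth*\", ID==\"%s\", NAME=\"eth%d\"\n", a, n++
        }
        { addrs[NR] = $1; desc[NR] = $0 }
        END {
            # Pass 1: controllers whose description matches the pattern
            if (pat != "") {
                for (i = 1; NR >= i; i++) {
                    if (desc[i] ~ pat) { emit(addrs[i]); used[i] = 1 }
                }
            }
            # Pass 2: everything else, in PCI address order
            for (i = 1; NR >= i; i++) {
                if (!(i in used)) emit(addrs[i])
            }
        }'
}
```

For example, run &lt;b&gt;lspci -D | make_net_rules 82599&lt;/b&gt;. Review the output before installing it as /etc/udev/rules.d/70-persistent-net.rules.&lt;br /&gt;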
The udev recipe doesn&#39;t work if HWADDR is set in the ifcfg-eth* files; if it is, you need to remove it. A quick way to do this in every file is&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;sed -i &#39;/^HWADDR/d&#39; /etc/sysconfig/network-scripts/ifcfg-eth*&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
in the kickstart %post section, and then install the udev file.&lt;br /&gt;
&lt;br /&gt;
10GBE cards might need different TCP tuning in /etc/sysctl.conf. For now I have taken the settings from the perfsonar machine, which are similar to something discussed a long time ago.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;net.core.rmem_max = 33554432&lt;br /&gt;net.core.wmem_max = 33554432&lt;br /&gt;net.ipv4.tcp_rmem = 4096 87380 16777216&lt;br /&gt;net.ipv4.tcp_wmem = 4096 87380 16777216&lt;br /&gt;net.core.netdev_max_backlog = 30000&lt;br /&gt;net.ipv4.tcp_no_metrics_save = 1&lt;br /&gt;net.ipv4.tcp_congestion_control = htcp&lt;/b&gt;&lt;br /&gt;
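After loading the settings with &lt;b&gt;sysctl -p&lt;/b&gt;, it is worth checking that they actually took effect. For instance, htcp is only accepted if the kernel has that congestion control algorithm available. Below is a minimal sketch of such a check. The name check_sysctl is hypothetical. The second argument only exists so it can be tested against a fake /proc/sys tree.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical check: compare each "key = value" line of a sysctl.conf-style
# file with the live value under /proc/sys (or another root, for testing).
# Prints one line per mismatch and returns 1 if there were any.

# Collapse runs of spaces/tabs to single spaces and trim the ends.
_norm_ws() { echo "$*" | awk '{ $1 = $1; print }'; }

check_sysctl() {
    local conf="$1" root="${2:-/proc/sys}" report
    report=$(grep -E '^[[:space:]]*[a-z]' "$conf" | while IFS='=' read -r key want; do
        key=$(_norm_ws "$key")
        want=$(_norm_ws "$want")
        # net.core.rmem_max -> ROOT/net/core/rmem_max
        have=$(_norm_ws "$(cat "$root/${key//.//}" 2>/dev/null)")
        [ "$want" = "$have" ] || echo "MISMATCH $key: want '$want', have '${have:-missing}'"
    done)
    if [ -n "$report" ]; then
        echo "$report"
        return 1
    fi
}
```

For example, running &lt;b&gt;check_sysctl /etc/sysctl.conf&lt;/b&gt; on a freshly installed node prints nothing if all the values are live.&lt;br /&gt;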
&lt;br /&gt;
The effects of moving to 10GBE can be seen very well in the perfsonar tests.&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJLOOAXfFdN4_aTPLBXgTtIccTzZr8JAaQ_J-agkhcIbtbsBITkujFHwS33DkuV8epC5t4mhyphenhyphencqQf47jAqOqZRLJQnfn-YdekHcBpUp_Aifo1zeJJXcBfn7tNjFvOnfILRK3Q1vEY71aTR/s1600/Screen+shot+2012-08-08+at+22.35.03.png&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; float: left; margin-bottom: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;234&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJLOOAXfFdN4_aTPLBXgTtIccTzZr8JAaQ_J-agkhcIbtbsBITkujFHwS33DkuV8epC5t4mhyphenhyphencqQf47jAqOqZRLJQnfn-YdekHcBpUp_Aifo1zeJJXcBfn7tNjFvOnfILRK3Q1vEY71aTR/s640/Screen+shot+2012-08-08+at+22.35.03.png&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/1322692210267914931/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/1322692210267914931' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/1322692210267914931'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/1322692210267914931'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2012/08/10gbe-network-cards-installation-in.html' title='10GBE Network cards installation in Manchester'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJLOOAXfFdN4_aTPLBXgTtIccTzZr8JAaQ_J-agkhcIbtbsBITkujFHwS33DkuV8epC5t4mhyphenhyphencqQf47jAqOqZRLJQnfn-YdekHcBpUp_Aifo1zeJJXcBfn7tNjFvOnfILRK3Q1vEY71aTR/s72-c/Screen+shot+2012-08-08+at+22.35.03.png" height="72" width="72"/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-2982182923876960804</id><published>2012-07-20T11:37:00.000+00:00</published><updated>2012-12-10T08:59:31.982+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="atlas"/><category scheme="http://www.blogger.com/atom/ns#" term="batch"/><category scheme="http://www.blogger.com/atom/ns#" term="files"/><category 
scheme="http://www.blogger.com/atom/ns#" term="leaks"/><category scheme="http://www.blogger.com/atom/ns#" term="log"/><category scheme="http://www.blogger.com/atom/ns#" term="Manchester"/><category scheme="http://www.blogger.com/atom/ns#" term="memory"/><category scheme="http://www.blogger.com/atom/ns#" term="parsing"/><category scheme="http://www.blogger.com/atom/ns#" term="settings"/><category scheme="http://www.blogger.com/atom/ns#" term="system"/><category scheme="http://www.blogger.com/atom/ns#" term="torque"/><title type='text'>Jobs with memory leaks containment</title><content type='html'>This week some sites suffered from extremely memory-hungry jobs that used
    up to 16GB of memory and killed the nodes, most likely because of
    memory leaks. The user cancelled all of them before he was even
    contacted, but not before they had caused some annoyance. &lt;br /&gt;
&lt;br /&gt;
We have had some discussion about how to fix this, and so far ATLAS
    has asked sites not to limit memory, because their jobs briefly use
    more than they officially request. This is true: in fact most of their
    jobs do it. According to the logs, production jobs use up to ~3.5GB mem
    and slightly less than 5GB vmem. See the plot below for one random day
    (other days are similar).&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIK9nnVe_Ep-BbupnzvOSje3qJHL8kBnaaUQhGjER-ZLhleorwvsrF0W7Q0bG5gF6OuBgVGwVv9OwnuHMBLv47N1eyf_9rJXrs8sBbUOYVQ8K6xJ_E3Mp6uf3MJpyKkBdy0H-XsJsf-Jat/s1600/Screen+shot+2012-07-16+at+19.39.47.png&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; float: left; margin-bottom: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;191&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIK9nnVe_Ep-BbupnzvOSje3qJHL8kBnaaUQhGjER-ZLhleorwvsrF0W7Q0bG5gF6OuBgVGwVv9OwnuHMBLv47N1eyf_9rJXrs8sBbUOYVQ8K6xJ_E3Mp6uf3MJpyKkBdy0H-XsJsf-Jat/s400/Screen+shot+2012-07-16+at+19.39.47.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;To avoid killing everything but still putting a barrier against the
    memory leaks, what I&#39;m going to do in Manchester is to set a limit of 4GB on mem and 5GB on vmem. &lt;br /&gt;
&lt;br /&gt;
If you are worried about memory leaks, you might want to go through a
    similar check. If you are not monitoring memory consumption on
    a per-job basis, you can parse your accounting logs. For PBS I used this command
    to produce the plot above:&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;grep atlprd /var/spool/pbs/server_priv/accounting/20120716 | awk &#39;{ print $17, $19, $20 }&#39; | grep status=0 | cut -f3,4 -d&#39;=&#39; | sed &#39;s/resources_used.vmem=//&#39; | sort -n | sed &#39;s/kb//g&#39;&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The output is sorted numerically by mem (values in kb), so the last line
    is the job with the highest mem that day; its vmem is not necessarily
    the day&#39;s maximum. &lt;b&gt;atlprd&lt;/b&gt; is the ATLAS production group,
    which you can replace with other groups.&amp;nbsp; ATLAS user jobs have
    similar usage up to a point, and then every day you might find a
    handful of crazy numbers like 85GB vmem and 40GB mem. These are the
    jobs we aim to kill.&lt;br /&gt;
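&lt;br /&gt;
The command above relies on fixed awk field positions ($17, $19, $20), which can shift when jobs request different resources, since each request adds a Resource_List entry to the record. Below is a minimal sketch that matches the attributes by name instead, assuming Torque 2.x style E (job end) records in which Exit_status, resources_used.mem and resources_used.vmem appear in that order with values in kb. The functions &lt;b&gt;job_mem_usage&lt;/b&gt; and &lt;b&gt;count_over_limits&lt;/b&gt; are hypothetical helpers; the second one counts how many of the day&#39;s jobs would have hit a given mem or vmem limit, which is a quick check to run before setting one.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Sketch: print "mem vmem" (kb) for successfully finished jobs of one group,
# read from a Torque/PBS accounting file, sorted numerically by mem.
# Attributes are matched by name, not by awk field position.
# Assumption: E records list Exit_status, resources_used.mem and
# resources_used.vmem in that order, with values in kb (Torque 2.x style).
job_mem_usage() {
    group=${1:-atlprd}
    log=${2:-/var/spool/pbs/server_priv/accounting/20120716}
    grep ';E;' "$log" |
        grep "group=$group " |
        sed -n 's/.* Exit_status=0 .*resources_used\.mem=\([0-9]*\)kb .*resources_used\.vmem=\([0-9]*\)kb.*/\1 \2/p' |
        sort -n
}

# Read "mem vmem" pairs (kb) on stdin and print how many jobs exceed
# the mem limit ($1, kb) or the vmem limit ($2, kb).
count_over_limits() {
    count=0
    while read -r mem vmem; do
        if [ "$mem" -gt "$1" ] || [ "$vmem" -gt "$2" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```
&lt;/pre&gt;
For example, &lt;b&gt;job_mem_usage atlprd /var/spool/pbs/server_priv/accounting/20120716 | count_over_limits 4194304 5242880&lt;/b&gt; counts the jobs that used more than 4GB mem or 5GB vmem (4194304kb and 5242880kb).&lt;br /&gt;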
&lt;br /&gt;
I thought the batch system was the simplest way, because it takes only two commands in PBS, but after a lot of reading and a week of testing it turns out that it is not possible to over-allocate memory without affecting the scheduling and ending up with fewer jobs on the nodes. This is what I found out:&lt;br /&gt;
&lt;br /&gt;
There are various memory parameters that can be set in PBS:&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;(p)vmem: &lt;/b&gt;virtual memory. PBS doesn&#39;t interpret vmem as the almost unlimited address space: if you set this value, for scheduling purposes it is interpreted as the available memory+swap. This might be different in later versions, but that&#39;s what happens in Torque 2.3.6.&lt;br /&gt;
&lt;b&gt;(p)mem: &lt;/b&gt;physical memory, i.e. your RAM.&lt;br /&gt;
&lt;br /&gt;
A p in front means per process rather than per job.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt;If you set them, this is what happens:&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt;&lt;b&gt;ALL:&lt;/b&gt; if a job arrives without memory settings, the batch system assigns these limits to the job as its allocated memory, not only as limits the job must not exceed.&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt;&lt;b&gt;ALL:&lt;/b&gt; if a job arrives with memory requests that exceed the limits, it will be rejected.&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt;&lt;b&gt;(p)vmem, pmem:&lt;/b&gt; if a job exceeds these settings at run time it will be killed, because these parameters set limits at OS level.&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt;&lt;b&gt;mem:&lt;/b&gt; if a job exceeds this limit at run time it will not get killed. Apparently this is due to a change in the libraries.&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;
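To see what a per-process OS-level limit looks like from inside a job without going through the batch system, you can set one yourself with the shell&#39;s &lt;b&gt;ulimit -v&lt;/b&gt; (virtual memory, in kB). This is a minimal sketch, assuming a shell whose ulimit builtin supports -v (bash and dash on Linux do); &lt;b&gt;run_with_vmem_limit&lt;/b&gt; is a hypothetical helper. Such a limit applies to each process separately, so it does not cap the total memory of a job that forks several processes.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Sketch: run a command under a per-process virtual memory limit (kB).
# The limit is set in a subshell, so the calling shell is unaffected.
# The command and any children inherit it, but each process is limited
# on its own: it is not a total for the whole job.
# Assumption: the ulimit builtin supports -v (bash and dash on Linux do).
run_with_vmem_limit() {
    limit_kb=$1
    shift
    (
        ulimit -v "$limit_kb" || exit 1
        exec "$@"
    )
}
```
&lt;/pre&gt;
For example, &lt;b&gt;run_with_vmem_limit 4194304 sh -c &#39;ulimit -v&#39;&lt;/b&gt; prints 4194304, the limit seen by the child process.&lt;br /&gt;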
To check how the different parameters affect the jobs, you can submit this csh command directly to PBS and play with the parameters:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: small;&quot;&gt;&lt;b&gt;echo &#39;csh -c limit&#39; | qsub -l vmem=5000000kb,pmem=1GB,mem=2GB,nodes=1:ppn=2 &lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
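&lt;br /&gt;
Requests like the one above mix units (5000000kb, 1GB, 2GB), and a job is rejected when a request exceeds the queue maximum. As an illustration only, here is a sketch of that comparison with hypothetical helpers &lt;b&gt;to_kb&lt;/b&gt; and &lt;b&gt;fits_queue_max&lt;/b&gt;, assuming case-insensitive b/kb/mb/gb/tb suffixes with 1024-based multipliers. Torque performs this check itself; plain numbers and word-based units are not handled here.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Sketch: convert PBS-style size strings to kb and compare a request with a
# queue maximum. Assumptions: suffixes b, kb, mb, gb, tb in any case, with
# 1024-based multipliers; plain numbers and word units (w, kw, ...) are not
# handled and make to_kb fail.
to_kb() {
    v=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    case $v in
        *tb) echo $(( ${v%tb} * 1024 * 1024 * 1024 )) ;;
        *gb) echo $(( ${v%gb} * 1024 * 1024 )) ;;
        *mb) echo $(( ${v%mb} * 1024 )) ;;
        *kb) echo $(( ${v%kb} )) ;;
        *b)  echo $(( ${v%b} / 1024 )) ;;   # bytes, rounded down to kb
        *)   return 1 ;;
    esac
}

# Succeeds (exit status 0) if the requested size fits within the maximum.
fits_queue_max() {
    req_kb=$(to_kb "$1") || return 1
    max_kb=$(to_kb "$2") || return 1
    [ "$req_kb" -le "$max_kb" ]
}
```
&lt;/pre&gt;
With the values used in this post, 5000000kb is below 5gb (5242880kb), so &lt;b&gt;fits_queue_max 5000000kb 5gb&lt;/b&gt; succeeds, while &lt;b&gt;fits_queue_max 6gb 5gb&lt;/b&gt; fails: such a job would be rejected by a queue with resources_max.vmem = 5gb.&lt;br /&gt;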
&lt;br /&gt;
If you want to set these parameters on a queue (called &lt;b&gt;long&lt;/b&gt; here), do the following in qmgr:&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;qmgr&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;qmgr: set queue long resources_max.vmem = 5gb&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;qmgr: set queue long resources_max.mem = 4gb&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;qmgr: set queue long resources_max.pmem = 4gb&lt;/b&gt;&lt;br /&gt;
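&lt;br /&gt;
qmgr also accepts a single command on the command line with -c, which is handy for scripting the same settings. This is a minimal sketch, assuming the queue is called long as above; &lt;b&gt;apply_queue_mem_limits&lt;/b&gt; is a hypothetical wrapper, and setting QMGR=echo turns it into a dry run that only prints the qmgr arguments.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Sketch: apply the memory limits used above to a queue with non-interactive
# qmgr calls, then list the queue so the new values can be checked.
# Set QMGR=echo for a dry run that only prints the qmgr arguments.
apply_queue_mem_limits() {
    qmgr_cmd=${QMGR:-qmgr}
    queue=${1:-long}
    for setting in "resources_max.vmem = 5gb" \
                   "resources_max.mem = 4gb" \
                   "resources_max.pmem = 4gb"; do
        "$qmgr_cmd" -c "set queue $queue $setting" || return 1
    done
    "$qmgr_cmd" -c "list queue $queue"
}
```
&lt;/pre&gt;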
&lt;br /&gt;
These settings affect the whole queue, so if you are worried
    about other VOs you might want to check what sort of memory usage
    they have, although I think only CMS might have similar usage; I know for sure that LHCb uses less. And, as said above, this will affect the scheduling.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Update 02/08/2012 &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
RAL and Nikhef use a Maui parameter to correct the over-allocation problem:&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;NODEMEMOVERCOMMITFACTOR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1.5
&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
This causes Maui to allocate up to 1.5 times the memory that is actually on the nodes: if a machine has 2GB of memory, a factor of 1.5 allows 3GB to be allocated. The same applies to the other memory parameters described above. The factor can of course be tailored to your site.&lt;br /&gt;
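&lt;br /&gt;
To see why over-allocation matters for the number of jobs per node, here is the arithmetic as a sketch. The node size (8 cores, 16GB) and the 4GB request are hypothetical, the factor is applied as a plain multiplication of the node memory (a simplification of what Maui actually does), and &lt;b&gt;jobs_per_node&lt;/b&gt; is a hypothetical helper.&lt;br /&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Sketch of the scheduling arithmetic: how many jobs with a given memory
# request fit on a node, with an optional memory overcommit factor.
# Sizes are in MB. Simplification: the factor just multiplies node memory.
jobs_per_node() {
    node_mem_mb=$1
    cores=$2
    job_mem_mb=$3
    factor=${4:-1.0}
    # awk handles the fractional factor; int() rounds down
    by_mem=$(awk -v m="$node_mem_mb" -v f="$factor" -v j="$job_mem_mb" \
        'BEGIN { print int(m * f / j) }')
    # The node can never run more jobs than it has cores
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}
```
&lt;/pre&gt;
With these numbers, &lt;b&gt;jobs_per_node 16384 8 4096&lt;/b&gt; gives 4 jobs, leaving half the cores idle, while &lt;b&gt;jobs_per_node 16384 8 4096 1.5&lt;/b&gt; gives 6.&lt;br /&gt;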
&lt;br /&gt;
On the ATLAS side there is a memory parameter that can be set in PanDA. It sets
 a per-process ulimit on vmem in the PanDA wrapper. It didn&#39;t
 seem to have an effect on the memory seen by the batch system, but that 
might be because forked processes are double counted by PBS which opens a whole different can of worms.</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/2982182923876960804/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/2982182923876960804' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/2982182923876960804'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/2982182923876960804'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2012/07/atlas-jobs-with-memory-leaks-containment.html' title='Jobs with memory leaks containment'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIK9nnVe_Ep-BbupnzvOSje3qJHL8kBnaaUQhGjER-ZLhleorwvsrF0W7Q0bG5gF6OuBgVGwVv9OwnuHMBLv47N1eyf_9rJXrs8sBbUOYVQ8K6xJ_E3Mp6uf3MJpyKkBdy0H-XsJsf-Jat/s72-c/Screen+shot+2012-07-16+at+19.39.47.png" height="72" width="72"/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-3719002593096779845</id><published>2012-04-05T17:14:00.000+00:00</published><updated>2012-09-24T12:59:52.912+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="infrastructure"/><category scheme="http://www.blogger.com/atom/ns#" 
term="Manchester"/><category scheme="http://www.blogger.com/atom/ns#" term="network"/><category scheme="http://www.blogger.com/atom/ns#" term="upgrade"/><title type='text'>The Big Upgrade in pictures</title><content type='html'>&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;float: left; margin-right: 1em; text-align: left;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYEr_0qeUq8aBHc1L4jI2ZeMHSI22ajncPrYzbTR-xGwwEa4GsNGtxX2n9qp5cBCA9HUQfJ50wtOpsUxa3mD_DiK4wk2EIdM2VIZ8vtHmgzUyo1ODXNPeip-uZ8QEAknGWQrw-YboGCTKU/s1600/120405-1.JPG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;400&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYEr_0qeUq8aBHc1L4jI2ZeMHSI22ajncPrYzbTR-xGwwEa4GsNGtxX2n9qp5cBCA9HUQfJ50wtOpsUxa3mD_DiK4wk2EIdM2VIZ8vtHmgzUyo1ODXNPeip-uZ8QEAknGWQrw-YboGCTKU/s400/120405-1.JPG&quot; width=&quot;266&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;New Cisco blades, engines and power supplies&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;br /&gt;
&lt;img border=&quot;0&quot; height=&quot;400&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSPGsQQFBbUjv8Ub38_vhPbnZSwzz0ou1gKtFfcrIiTH9SKsXBlzf6VdVQu844-V2tE1P4LVZ573xHIhKuq5XOaPJvgbb2JOkim3WifLJcDG6ZjnZa06JLOO7Waut9UvkeYHyf-6Tj06Sn/s400/120405-2.JPG&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; width=&quot;265&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Dell boxes, including the new switches&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2vFSP-xMwFa_49imGJ_QZP2zcIiBqTWRzJ5jDJAx5tLnCn1okdKEiKokE0L7aIAN2CfPDnfQ3MAdUMM9ArOOTaKjETg8bnKPFqQ2SFioSTi197YVj33t0ekfe1o6TZs0yh3Q3vkO8btJH/s1600/120405-4.JPG&quot; imageanchor=&quot;1&quot; style=&quot;clear: right; float: right; margin-bottom: 1em; margin-left: 1em;&quot;&gt;&lt;/a&gt;&lt;/div&gt;
&lt;table cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2jfMOPbgigEzlc3qwfLuGgtJvj0pUn7pkjrzKCFZ_KDkAcTQC4RCA1a2T59iqYD2uzSiOz4ffJwSmQuV3AzEn9leGTGkZd4LJ0-KNQ3Zu2d5Hh9pFs8f7cJnGkhbgrEemRBc81_30JEcq/s1600/120405-5.JPG&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;266&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2jfMOPbgigEzlc3qwfLuGgtJvj0pUn7pkjrzKCFZ_KDkAcTQC4RCA1a2T59iqYD2uzSiOz4ffJwSmQuV3AzEn9leGTGkZd4LJ0-KNQ3Zu2d5Hh9pFs8f7cJnGkhbgrEemRBc81_30JEcq/s400/120405-5.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Aerial view of the old cabling&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2vFSP-xMwFa_49imGJ_QZP2zcIiBqTWRzJ5jDJAx5tLnCn1okdKEiKokE0L7aIAN2CfPDnfQ3MAdUMM9ArOOTaKjETg8bnKPFqQ2SFioSTi197YVj33t0ekfe1o6TZs0yh3Q3vkO8btJH/s1600/120405-4.JPG&quot; imageanchor=&quot;1&quot; style=&quot;clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;400&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2vFSP-xMwFa_49imGJ_QZP2zcIiBqTWRzJ5jDJAx5tLnCn1okdKEiKokE0L7aIAN2CfPDnfQ3MAdUMM9ArOOTaKjETg8bnKPFqQ2SFioSTi197YVj33t0ekfe1o6TZs0yh3Q3vkO8btJH/s400/120405-4.JPG&quot; width=&quot;266&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Frontal view of the mess&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;table cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiyXzj5YDEyxcIgBvicxHVs_e-4DgfCIAg8wh9iqnFMv3v0j1V4L_q7Zr6s-2aCWBjljA3X6EaadrRnzHFtbcOL8YwDSLQUnicX9yqVkNSRN0Noo7rBo-UoV-Gh0JATn-LrISbmNhVl4jD/s1600/120405-6.JPG&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;266&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiyXzj5YDEyxcIgBvicxHVs_e-4DgfCIAg8wh9iqnFMv3v0j1V4L_q7Zr6s-2aCWBjljA3X6EaadrRnzHFtbcOL8YwDSLQUnicX9yqVkNSRN0Noo7rBo-UoV-Gh0JATn-LrISbmNhVl4jD/s400/120405-6.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Cables unplugged from the Cisco&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjP-jaYuuni_ZMlOF08mnTEyV3x5yUlpVNJ4qrjZcseFC8BI2loEygptVXo33dPrKQ24h-O_xGZySL-556BQUbv6PhmQwCzofQu5bTy6izcuCqv5OuGZJOqaY_HNwNCJaP4pDTrqQ_8Ap3a/s1600/120405-7.JPG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;265&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjP-jaYuuni_ZMlOF08mnTEyV3x5yUlpVNJ4qrjZcseFC8BI2loEygptVXo33dPrKQ24h-O_xGZySL-556BQUbv6PhmQwCzofQu5bTy6izcuCqv5OuGZJOqaY_HNwNCJaP4pDTrqQ_8Ap3a/s400/120405-7.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Old Cisco blades with the service racks still connected&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8vSiWZWG1b0zmRG7ELPhLMVqP-uExggD75BQmHoNKe7dbf16QdcikHa3Hz1zMrn32XqD6snko9vbxCgmx5S-Q4Fy9ie4dE-yPwcHg6fbP2XtJv_zgfP6ZMkpkSUyFfgepSm3UhUFDx0ue/s1600/120405-13.JPG&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;238&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8vSiWZWG1b0zmRG7ELPhLMVqP-uExggD75BQmHoNKe7dbf16QdcikHa3Hz1zMrn32XqD6snko9vbxCgmx5S-Q4Fy9ie4dE-yPwcHg6fbP2XtJv_zgfP6ZMkpkSUyFfgepSm3UhUFDx0ue/s400/120405-13.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Aerial view of the new Cat6a Cisco cabling, nice and tidy&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4o7ceTQLAIl_WtmwBILyVcA0gEPf1VcTM26iVXYhjbTc4IULjMYG8D5u0Mo-vVxT3XsbiVUhA9ZEPltSTUfv6XJ6_UW6yHvBBvJIkP-lijOayRZtmuO_cOrT5-1qleYWuWPofHI_f8ueX/s1600/120405-14.JPG&quot; imageanchor=&quot;1&quot; style=&quot;clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;400&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4o7ceTQLAIl_WtmwBILyVcA0gEPf1VcTM26iVXYhjbTc4IULjMYG8D5u0Mo-vVxT3XsbiVUhA9ZEPltSTUfv6XJ6_UW6yHvBBvJIkP-lijOayRZtmuO_cOrT5-1qleYWuWPofHI_f8ueX/s400/120405-14.JPG&quot; width=&quot;266&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Front view of the new Cisco blades and cabling, nice and tidy&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTXffohRnQHddvsg4x31zCDpqK2l4ukLJUV42lZQ2yFOvwpdzOfkJoIwNtM9lDQ7EnyLszB1hNXAbKIES-vT5fIKiu4MdKoGszMBbh8KhtH8qd4G2jRw2mWKwjETROsvKGFUMH8pvNVXG0/s1600/120405-10.JPG&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;266&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTXffohRnQHddvsg4x31zCDpqK2l4ukLJUV42lZQ2yFOvwpdzOfkJoIwNtM9lDQ7EnyLszB1hNXAbKIES-vT5fIKiu4MdKoGszMBbh8KhtH8qd4G2jRw2mWKwjETROsvKGFUMH8pvNVXG0/s400/120405-10.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Old and new rack switches front view&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibZLpHuqAmBhqVOk_OSC9wWrlkxJfVt-1Q4h9qpyrZxSHcet88CGTNn8euDQbLNIGJTd32xtsPfz6Or8XzN_VdBBFhUL6JNMlBmSu0FjoUt39j8Sgl5MHFEDsrNB074JukoHiQfqtDVIk9/s1600/120405-11.JPG&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;265&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibZLpHuqAmBhqVOk_OSC9wWrlkxJfVt-1Q4h9qpyrZxSHcet88CGTNn8euDQbLNIGJTd32xtsPfz6Or8XzN_VdBBFhUL6JNMlBmSu0FjoUt39j8Sgl5MHFEDsrNB074JukoHiQfqtDVIk9/s400/120405-11.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Old and new rack switches rear view&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDI6bXi_N8bjbXdYoDY7kwcxxjtNp2CZHEI4X1WTZ0taUhSnkymHkQ-erJvDJjWJRZLof1jIP8S077xZUbferSRmPYR6ybcLO0_CPs0UXvngRwEfSPbfTXQBpItzx9cGBLJveZ7KYe61TV/s1600/120405-8.JPG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;400&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDI6bXi_N8bjbXdYoDY7kwcxxjtNp2CZHEI4X1WTZ0taUhSnkymHkQ-erJvDJjWJRZLof1jIP8S077xZUbferSRmPYR6ybcLO0_CPs0UXvngRwEfSPbfTXQBpItzx9cGBLJveZ7KYe61TV/s400/120405-8.JPG&quot; width=&quot;266&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Emptying and reorganising the racks&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx-5g7rQ_dKWhcPzqt2OjpBrzDu-cL_v5tUBvHPYO9-ConYMYO9YXC3MLdPfwnw65KD3anzAEOzfE8rOiT6-vYwkk1Zp2wI-eMkbJbhFu1Yo9t-KSMRScZZl_yiXfLcmZVnxDgw6Xh49Gj/s1600/120405-9.JPG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;266&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx-5g7rQ_dKWhcPzqt2OjpBrzDu-cL_v5tUBvHPYO9-ConYMYO9YXC3MLdPfwnw65KD3anzAEOzfE8rOiT6-vYwkk1Zp2wI-eMkbJbhFu1Yo9t-KSMRScZZl_yiXfLcmZVnxDgw6Xh49Gj/s400/120405-9.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Empty racks ready to be filled with new machines&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgloVTRS3C9kjc4GiR6azbilJ2_stTYzehN_c7-dw0ojC6xaq4ILcuI51cwZFRd6IWfgdPaL-PoaKf-lq5J5JIqL1VZE7HdDL2q-KcY41o052yyoopr2Ooq_EMtxcfmVkwFOtq66DB6sqyn/s1600/120405-9a.JPG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;266&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgloVTRS3C9kjc4GiR6azbilJ2_stTYzehN_c7-dw0ojC6xaq4ILcuI51cwZFRd6IWfgdPaL-PoaKf-lq5J5JIqL1VZE7HdDL2q-KcY41o052yyoopr2Ooq_EMtxcfmVkwFOtq66DB6sqyn/s400/120405-9a.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Old DELLs cemetery&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;float: right; margin-left: 1em; text-align: right;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3DLFx_Fun6DBcoL4sMo1Ysx60WYU8_DIbVwko98fHpUhbKAwG39rdTd8RmLy00qZgFDGvcAjSQdyxuysClgu09Nkadpt30lhebP89wUsjcV18mu71hIit-8No98hQRP1Ny7aNK0dD3nA5/s1600/120405-9b.JPG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;266&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3DLFx_Fun6DBcoL4sMo1Ysx60WYU8_DIbVwko98fHpUhbKAwG39rdTd8RmLy00qZgFDGvcAjSQdyxuysClgu09Nkadpt30lhebP89wUsjcV18mu71hIit-8No98hQRP1Ny7aNK0dD3nA5/s400/120405-9b.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Old cables cemetery. All the cat5e cables running under the floor from the racks to the Cisco, half of the cables from the rack switches to the machines, and all the patch cables in front of the Cisco shown above are gone.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;float: left; margin-right: 1em; text-align: left;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvYdF6mzSo2LvVAhowSfOTTNE1_8jmM6NNk4PsGo79hvOLgTXjYdJkrQZCrVeZcXbPwhl_AWjeyh4nhgwK0mrowVQKwKp3TeCKiZmt7JB7i-G-wVo5nJfRNKn7gGQ7YE82edImzX9XFiS-/s1600/120405-15.JPG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;265&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvYdF6mzSo2LvVAhowSfOTTNE1_8jmM6NNk4PsGo79hvOLgTXjYdJkrQZCrVeZcXbPwhl_AWjeyh4nhgwK0mrowVQKwKp3TeCKiZmt7JB7i-G-wVo5nJfRNKn7gGQ7YE82edImzX9XFiS-/s400/120405-15.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;All the racks but two now have the new switches, but the machines are still connected with cat5e cables. The network cards will be upgraded in phase two, one rack at a time, to minimize service disruption.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;br /&gt;
&lt;br /&gt;
The downtime lasted 6 days. Everybody involved did a great job, and the choice of 10GBASE-T was a good one: port auto-negotiation allows us to run at 3 different speeds on the same switches, with the PDUs at 100Mbps, the old WNs and storage at 1Gbps, and the connection to the Cisco at 10Gbps. We also kept one of the old Cisco blades for connections that don&#39;t require 10Gbps, such as the out-of-band management cables; in addition, two racks of servers that will be upgraded at a later stage are still connected to the Cisco at 1Gbps. And we finished perfectly in time for the start of data taking (and Easter). :)</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/3719002593096779845/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/3719002593096779845' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/3719002593096779845'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/3719002593096779845'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2012/04/big-upgrade-in-pictures.html' title='The Big Upgrade in pictures'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" 
url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYEr_0qeUq8aBHc1L4jI2ZeMHSI22ajncPrYzbTR-xGwwEa4GsNGtxX2n9qp5cBCA9HUQfJ50wtOpsUxa3mD_DiK4wk2EIdM2VIZ8vtHmgzUyo1ODXNPeip-uZ8QEAknGWQrw-YboGCTKU/s72-c/120405-1.JPG" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-520103267050604934</id><published>2012-03-31T13:49:00.000+00:00</published><updated>2012-09-24T12:49:26.187+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="decomissioning"/><category scheme="http://www.blogger.com/atom/ns#" term="dell"/><category scheme="http://www.blogger.com/atom/ns#" term="machines"/><category scheme="http://www.blogger.com/atom/ns#" term="Manchester"/><category scheme="http://www.blogger.com/atom/ns#" term="poweredge"/><category scheme="http://www.blogger.com/atom/ns#" term="upgrade"/><title type='text'>So long and thanks for all the fish</title><content type='html'>&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;http://www.hep.manchester.ac.uk/computing/tier2/images/cluster.jpg&quot; imageanchor=&quot;1&quot; style=&quot;clear: left; float: left; margin-bottom: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;212&quot; src=&quot;http://www.hep.manchester.ac.uk/computing/tier2/images/cluster.jpg&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
In 2010 we had already decommissioned half of the original, almost mythical, Dell cluster of 2000 EM64T CPUs (1800 for us), which in 2007 allowed us to be 4th among the top 10 countries in EGEE. &lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.hep.manchester.ac.uk/computing/tier2/images/Tier2-2007.png&quot; imageanchor=&quot;1&quot; style=&quot;clear: right; float: right; margin-bottom: 1em; margin-left: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;240&quot; src=&quot;http://www.hep.manchester.ac.uk/computing/tier2/images/Tier2-2007.png&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;br style=&quot;clear: both;&quot; /&gt;
This year we are decommissioning the last 430 machines, which have served us so well for 6 years and 2 months. So... so long and thanks for all the fish.</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/520103267050604934/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/520103267050604934' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/520103267050604934'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/520103267050604934'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2012/03/so.html' title='So long and thanks for all the fish'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-5499049933033260333</id><published>2012-01-14T12:09:00.004+00:00</published><updated>2012-07-20T11:42:26.689+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="database"/><category scheme="http://www.blogger.com/atom/ns#" term="dpm"/><category scheme="http://www.blogger.com/atom/ns#" term="file"/><category scheme="http://www.blogger.com/atom/ns#" term="Manchester"/><category scheme="http://www.blogger.com/atom/ns#" term="optimization"/><category scheme="http://www.blogger.com/atom/ns#" term="syncronisation"/><category scheme="http://www.blogger.com/atom/ns#"
term="system"/><title type='text'>DPM database file systems synchronization</title><content type='html'>The synchronisation of the DPM database with the data servers&#39; file systems has been a long-standing issue. Last week we had a crash that made it more imperative to check all the files, so I eventually wrote a bash script that makes use of the &lt;a href=&quot;https://www.gridpp.ac.uk/wiki/DPM-admin-tools#GridPP_DPM_administration_toolkit&quot;&gt;GridPP DPM admin tools&lt;/a&gt;. I don&#39;t think this should be the final version, but I&#39;m quicker with bash than with Python, so I started with that. Hopefully later in the year I&#39;ll have more time to write, based on this one, a cleaner Python version that can be included in the admin tools. It does the following:&lt;br /&gt;
&lt;br /&gt;
1) Create a list of files that are in the DB but not on disk&lt;br /&gt;
2) Create a list of files that are on disk but not in the DB&lt;br /&gt;
3) Create a list of SURLs from the list of files in the DB but not on disk to declare lost (this is mostly for atlas but could be used by LFC administrators for other VOs)&lt;br /&gt;
4) If not in dry-run mode, delete the orphan files and the orphan entries in the DB.&lt;br /&gt;
5) Print stats on how many files ended up in each list.&lt;br /&gt;
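A minimal sketch of the comparison behind steps 1, 2, 4 and 5 follows. It assumes the two lists are plain text files with one physical file name per line. The function names compare_lists and remove_orphans are hypothetical, and this is not the code of the real script, which builds its lists with the GridPP admin tools. comm needs both inputs sorted with the same collation, hence LC_ALL=C:&lt;br /&gt;&lt;pre&gt;
```bash
#!/bin/bash
# Hypothetical sketch, not the logic of dpm-synchronise-disk-db.sh.
# Inputs: text files with one physical file name per line, one dumped
# from the DPM DB and one built with find on the data server.

# compare_lists DB_LIST DISK_LIST OUTDIR
compare_lists() {
    local db_list=$1 disk_list=$2 outdir=$3
    mkdir -p "$outdir"
    # comm requires both inputs sorted with the same collation
    LC_ALL=C sort -u "$db_list" > "$outdir/db.sorted"
    LC_ALL=C sort -u "$disk_list" > "$outdir/disk.sorted"
    # Step 1: in the DB but not on disk (lost files)
    LC_ALL=C comm -23 "$outdir/db.sorted" "$outdir/disk.sorted" > "$outdir/lost.txt"
    # Step 2: on disk but not in the DB (orphan files)
    LC_ALL=C comm -13 "$outdir/db.sorted" "$outdir/disk.sorted" > "$outdir/orphans.txt"
    # Step 5: stats (grep -c '' counts lines)
    echo "lost: $(grep -c '' "$outdir/lost.txt") orphans: $(grep -c '' "$outdir/orphans.txt")"
}

# remove_orphans ORPHAN_LIST
# Step 4, disk side only: just prints what it would do unless DRY_RUN=0.
remove_orphans() {
    local f
    cat "$1" | while IFS= read -r f; do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            rm -f -- "$f"
        else
            echo "would remove $f"
        fi
    done
}
```
&lt;/pre&gt;DRY_RUN defaults to 1 here, in the same spirit as the dry_run variable at the top of the real script: nothing is deleted unless you explicitly ask for it.&lt;br /&gt;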
&lt;br /&gt;
Although I put in a few protections, this script should be run with care and, &lt;b&gt;unless in dry-run mode&lt;/b&gt;, shouldn&#39;t be run automatically &lt;b&gt;AT ALL&lt;/b&gt;. In dry-run mode, however, it tells you how many files are lost, which is a good metric to monitor regularly as well as after a big crash.&lt;br /&gt;
&lt;br /&gt;
If you want to run it, it has to run on the data servers, where it can access the file systems. As it stands, it requires a modified version of /opt/lcg/etc/DPMINFO that points to the head node rather than localhost, because one of the admin tools it uses does a direct MySQL query. For the same reason it also requires the &lt;b&gt;dpminfo user&lt;/b&gt; to have MySQL SELECT privileges from the data servers. This is the part that would really benefit from a rewrite in Python, perhaps with proper API use as the other tool does. I also had to heavily parse the output of the tools, which weren&#39;t created exactly for this purpose, and this could also be avoided in a Python script. There are no command-line options, but all the variables you may want to customise for your local settings (head node, file system mount point, dry_run) are easy to find at the top of the script.&lt;br /&gt;
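For the MySQL privileges, here is a sketch that only prints the GRANT statements for a list of data servers. That way they can be reviewed before being piped into mysql on the head node. It assumes the admin tool reads both cns_db and dpm_db. The function name, host names and password are placeholders, so check which database the tool actually queries and grant no more than that:&lt;br /&gt;&lt;pre&gt;
```bash
#!/bin/bash
# Hypothetical helper: print GRANT statements giving the dpminfo user
# read-only access from each data server (review before applying).
# Assumption: the admin tool reads cns_db and dpm_db; adjust if not.

# print_grants PASSWORD HOST...
print_grants() {
    local pass=$1 host db
    shift
    for host in "$@"; do
        for db in cns_db dpm_db; do
            printf "GRANT SELECT ON %s.* TO 'dpminfo'@'%s' IDENTIFIED BY '%s';\n" \
                "$db" "$host" "$pass"
        done
    done
}
```
&lt;/pre&gt;For example, run print_grants &#39;secret&#39; se01.example.org se02.example.org | mysql -u root -p on the head node. The GRANT ... IDENTIFIED BY form works on the MySQL 5.x versions discussed here.&lt;br /&gt;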
&lt;br /&gt;
Creating the lists takes very little time, no more than 3 minutes on my system, although it depends mostly on how busy your head node is.&lt;br /&gt;
&lt;br /&gt;
A cleanup, on the other hand, takes time proportional to the number of lost files and can run for several hours, since it does one DB operation per file. The time to delete the orphan files also depends on how many there are and how big they are, but it should be shorter than the DB cleanup.&lt;br /&gt;
&lt;br /&gt;
The script is here: &lt;a href=&quot;http://www.sysadmin.hep.ac.uk/svn/fabric-management/dpm/dpm-synchronise-disk-db.sh&quot;&gt;http://www.sysadmin.hep.ac.uk/svn/fabric-management/dpm/dpm-synchronise-disk-db.sh&lt;/a&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/5499049933033260333/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/5499049933033260333' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/5499049933033260333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/5499049933033260333'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2012/01/dpm-database-file-systems.html' title='DPM database file systems synchronization'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-755506275777766572</id><published>2011-11-30T11:05:00.032+00:00</published><updated>2012-07-20T11:43:15.146+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="1.7.4"/><category scheme="http://www.blogger.com/atom/ns#" term="1.8.2"/><category scheme="http://www.blogger.com/atom/ns#" term="5.5"/><category scheme="http://www.blogger.com/atom/ns#" term="dpm"/><category scheme="http://www.blogger.com/atom/ns#" term="Manchester"/><category 
scheme="http://www.blogger.com/atom/ns#" term="mysql"/><category scheme="http://www.blogger.com/atom/ns#" term="upgrade"/><title type='text'>DPM upgrade 1.7.4 -&gt; 1.8.2 (glite 3.2)</title><content type='html'>Last week I upgraded our DPM installation. It was a major change because I upgraded not only the DPM version but also the hardware and the backend mysql version.&lt;br /&gt;
&lt;br /&gt;
This time I didn&#39;t take any measurements before and after. I knew that becoming an alpha site in ATLAS was taking its toll on the old hardware, and many of the timeouts came from gridftp, but the MySQL timeouts I talked about in &lt;a href=&quot;http://northgrid-tech.blogspot.com/2011/06/dpm-optimization-next-round.html&quot;&gt;previous posts&lt;/a&gt; had reappeared, to the point that even restarting the service was hard.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 100%; font-weight: bold;&quot;&gt;[ ~]# service mysqld restart &lt;/span&gt; &lt;span style=&quot;font-size: 100%; font-weight: bold;&quot;&gt;&lt;br /&gt;Timeout error occurred trying to stop MySQL Daemon. &lt;/span&gt; &lt;span style=&quot;font-size: 100%; font-weight: bold;&quot;&gt;&lt;br /&gt;Stopping MySQL:                                            [FAILED] &lt;/span&gt; &lt;span style=&quot;font-size: 100%; font-weight: bold;&quot;&gt;&lt;br /&gt;Timeout error occurred trying to start MySQL Daemon.  &lt;/span&gt;&lt;span style=&quot;font-size: 100%;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
So I decided that the situation had become unsustainable and it was time to move to better hardware and software versions.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 100%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;* Hardware:&lt;/span&gt; 2 cpu, 4GB mem, 2x250 GB raid1 -&amp;gt; 4 cores (HT on = 8 job slots), 24GB mem, 2x2TB raid1&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
There is no real why here: the old machine was fine when access was limited, but the recent load was really too much for it, even with all the tuning. Bad blocks on the disks are possible, but the machine reported no red LEDs or hardware errors.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 100%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;* Mysql: &lt;/span&gt;5.0.77 -&amp;gt; 5.5.10 &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Why MySQL 5.5? Because InnoDB is the default engine, and performance and instrumentation have improved, on top of other features we might actually start to use. A good blog article on the reasons to move is this one: &lt;a href=&quot;http://ronaldbradford.com/blog/five-reasons-to-upgrade-to-mysql-5-5-2010-12-15/&quot;&gt;5 good reasons to upgrade to mysql 5.5&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
MySQL 5.5 is not in EPEL yet, but I found this CentOS community site that has the &lt;a href=&quot;http://www.webtatic.com/packages/mysql55/&quot;&gt;rpms and the instructions to install them&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
After the installation I also optimized the database, partly with what I had already &lt;a href=&quot;http://northgrid-tech.blogspot.com/2011/06/dpm-optimization-next-round.html&quot;&gt;done in July&lt;/a&gt; and partly by running a handy script, &lt;a href=&quot;http://www.techerator.com/2011/08/optimize-your-mysql-server-with-the-mysql-tuner-script/&quot;&gt;mysqltuner.pl&lt;/a&gt;. The latter helps with variables you might not even know about and, even if you know them, tells you whether they are too small. You need to be patient and let a few hours pass before running it again.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 100%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;* DPM:&lt;/span&gt; 1.7.4 -&amp;gt; 1.8.2&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Why DPM 1.8.2 from glite 3.2? I would have gone for the UMD or even the EMI release, but glite 3.2 was moved to production earlier than those. I had been waiting for this release since at least April, so I didn&#39;t think twice when I saw the escape route. The timing was good too, as it came just when I couldn&#39;t postpone an upgrade any longer. You can find more info in the &lt;a href=&quot;http://glite.cern.ch/R3.2/sl5_x86_64/glite-SE_dpm_mysql/1.8.2-3/&quot;&gt;release notes&lt;/a&gt;. Among other reasons to upgrade: &lt;a href=&quot;https://savannah.cern.ch/bugs/?71041&quot;&gt;srmv2.2 in 1.7.4 has a memory leak&lt;/a&gt;. It wasn&#39;t noticeable while the load was contained, but for us it exploded in October, and it is the reason I had to restart srmv2.2 every two days over the past few weeks.&lt;br /&gt;
&lt;br /&gt;
Below are the steps I took to reinstall the head node.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 130%; font-weight: bold;&quot;&gt;On the old head node&lt;/span&gt;&lt;span style=&quot;font-size: 130%;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
* Set the site in downtime, drain the queues and kill all the remaining jobs.&lt;br /&gt;
&lt;br /&gt;
* Turn off all the dpm and bdii services on the old head node&lt;br /&gt;
&lt;br /&gt;
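* Make a dump of the current database for backup. Note that -C only compresses the traffic between the mysql client and server, so despite the .gz suffix the output is plain SQL.&lt;br /&gt;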
&lt;br /&gt;
&lt;span style=&quot;font-size: 100%; font-weight: bold;&quot;&gt;mysqldump -C -Q -u root -p -B dpm_db cns_db &amp;gt; dpm.sql-20111125.gz&lt;/span&gt;&lt;span style=&quot;font-size: 85%;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
* Download dpm-drop-requests-tables.sql supplied by Jean Philippe last July&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 100%; font-weight: bold;&quot;&gt;wget http://www.sysadmin.hep.ac.uk/svn/fabric-management/dpm/dpm-drop-requests-tables.sql&lt;/span&gt;&lt;span style=&quot;font-size: 100%;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
* Drop the requests tables. As I said in &lt;a href=&quot;http://northgrid-tech.blogspot.com/2011/06/dpm-optimization.html&quot;&gt;this other post about DPM optimization&lt;/a&gt;, this step is really useful to avoid painful reload times. It also drastically reduces the size of ibdata1 when you reload, which has other benefits too (my ibdata1 went from 26GB to 1.7GB). Still, you need to plan for it because it might take a few hours depending on the system: on my old hardware it took around 7 hours.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 100%; font-weight: bold;&quot;&gt;mysql -p &amp;lt; dpm-drop-requests-tables.sql  &lt;/span&gt;&lt;span style=&quot;font-size: 100%;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
* Dump the reduced version of the database &lt;span style=&quot;font-weight: bold;&quot;&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-size: 100%;&quot;&gt;mysqldump -C -Q -u root -p -B dpm_db cns_db &amp;gt; dpm.sql-20111125-v2.gz&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
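Because -C does not gzip anything, these .gz dumps are really plain SQL, which is why they can later be fed straight to mysql. If you ever pipe a dump through gzip instead, a loader that checks the gzip magic bytes (1f 8b) copes with both cases. A minimal sketch follows; is_gzip and load_dump are hypothetical names, and any extra arguments are passed to mysql unchanged:&lt;br /&gt;&lt;pre&gt;
```bash
#!/bin/bash
# Hypothetical helper: load a dump into mysql whether or not it is gzipped.

# is_gzip FILE: succeeds if FILE starts with the gzip magic bytes 1f 8b
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# load_dump FILE [MYSQL_ARGS...]: extra arguments go to mysql unchanged
load_dump() {
    local file=$1
    shift
    if is_gzip "$file"; then
        gunzip -c "$file" | mysql "$@"
    else
        cat "$file" | mysql "$@"
    fi
}
```
&lt;/pre&gt;For example: load_dump /root/dpm.sql-20111125-v2.gz -u root -p&lt;br /&gt;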
&lt;br /&gt;
* Copy both dumps to a web server from which they can be downloaded at a later stage.&lt;br /&gt;
&lt;br /&gt;
* Update the local repository for the DPM head node and disk servers. Since it is still glite, I just had to rsync the latest mirror to the static area.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 130%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;On the new head node&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
* Install the new machine with a DPM head node profile. This was easy again: since it is still glite, no changes were required in cfengine.&lt;br /&gt;
&lt;br /&gt;
* Most of the following is not standard, so I put it in a script. If you have problems with the user IDs created by the &lt;span style=&quot;font-weight: bold;&quot;&gt;avahi&lt;/span&gt; packages, you can uninstall them with yum, removing all the dependencies, and let them be reinstalled by the bdii dependency chain. Uninstalling them with &lt;span style=&quot;font-weight: bold;&quot;&gt;rpm -e --nodeps&lt;/span&gt; should also work, and it leaves &lt;span style=&quot;font-weight: bold;&quot;&gt;redhat-lsb&lt;/span&gt; (which is what the bdii depends on) untouched, but I haven&#39;t tried this last method. Here are the commands I executed:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-size: 100%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;# Get the dpm DB file&lt;br /&gt;rm -rf dpm.sql-20111125-v2.gz*&lt;br /&gt;wget http://ks.tier2.hep.manchester.ac.uk/T2/tmp/dpm.sql-20111125-v2.gz&lt;br /&gt;&lt;br /&gt;# Install mysql5.5&lt;br /&gt;rpm -Uvh http://repo.webtatic.com/yum/centos/5/latest.rpm&lt;br /&gt;yum -y remove libmysqlclient5 mysql mysql-*&lt;br /&gt;yum -y clean all&lt;br /&gt;yum -y install mysql55 mysql55-server libmysqlclient5 --enablerepo=webtatic&lt;br /&gt;service mysqld stop&lt;br /&gt;rm -rf /var/lib/mysql/*&lt;br /&gt;# Get the local my.cnf&lt;br /&gt;cfagent -vq&lt;br /&gt;service mysqld start&lt;br /&gt;&lt;br /&gt;# Install the DPM rpms&lt;br /&gt;yum -y remove cups avahi avahi-compat-libdns_sd avahi-glib&lt;br /&gt;yum -y install glite-SE_dpm_mysql lcg-CA&lt;br /&gt;&lt;br /&gt;# Modify sql scripts for mysql5.5&lt;br /&gt;cd /opt/lcg/share/DPM/&lt;br /&gt;for a in create_dp*.sql; do sed -i.old &#39;s/TYPE/ENGINE/g&#39; $a;done&lt;br /&gt;grep ENGINE *&lt;br /&gt;&lt;br /&gt;# Run YAIM and upload old DB&lt;br /&gt;cd&lt;br /&gt;/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n glite-SE_dpm_mysql&lt;br /&gt;&lt;br /&gt;mysql -u root -p -C &amp;lt; /root/dpm.sql-20111125-v2.gz&lt;br /&gt;&lt;br /&gt;# NECESSARY FOR THE FINAL UPDATES&lt;br /&gt;/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n glite-SE_dpm_mysql&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
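MySQL 5.5 no longer accepts the old TYPE= table option, hence the sed on the create scripts. A blanket s/TYPE/ENGINE/g would also rewrite any other upper-case TYPE in those files. Below is a slightly more conservative sketch that only touches TYPE as a whole word followed by =. It relies on GNU sed&#39;s \b word boundary. The function name is hypothetical and I haven&#39;t run it on the actual DPM scripts, so check the result with grep afterwards:&lt;br /&gt;&lt;pre&gt;
```bash
#!/bin/bash
# Hypothetical helper: turn the TYPE= table option (removed in MySQL 5.5)
# into ENGINE=, keeping a .old backup of each file like the original sed.
# Uses GNU sed's \b word boundary so names such as STYPE are left alone.

# fix_engine FILE...
fix_engine() {
    local f
    for f in "$@"; do
        sed -i.old 's/\bTYPE\([[:space:]]*=\)/ENGINE\1/g' "$f"
    done
}
```
&lt;/pre&gt;For example: cd /opt/lcg/share/DPM/; fix_engine create_dp*.sql&lt;br /&gt;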
&lt;br /&gt;
* You will need to install the dpm-contrib-admintool rpm separately because it is not in the glite repository; it might be in the EMI one. Last time I heard, it had made it into ETICS. If you can&#39;t find it, there&#39;s still the &lt;a href=&quot;http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/&quot;&gt;sysadmin repo version&lt;/a&gt; and related notes on the &lt;a href=&quot;https://www.gridpp.ac.uk/wiki/DPM-admin-tools#GridPP_DPM_administration_toolkit&quot;&gt;GridPP wiki&lt;/a&gt; (Sam or Wahid, you are welcome to leave an update on this one).&lt;br /&gt;
&lt;br /&gt;
* To upgrade the disk servers I just updated the repository, upgraded the rpms and reran yaim.</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/755506275777766572/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/755506275777766572' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/755506275777766572'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/755506275777766572'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2011/11/dpm-upgrade-174-182-glite-32.html' title='DPM upgrade 1.7.4 -&gt; 1.8.2 (glite 3.2)'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-779340858148694419</id><published>2011-09-28T14:40:00.006+00:00</published><updated>2011-09-28T14:50:46.547+00:00</updated><title type='text'>10 Years Of GridPP: I was there. 
And you?</title><content type='html'>&lt;a href=&quot;http://www.gridpp.ac.uk/gridpp1/&quot; title=&quot;GridPP 1&quot;&gt;&lt;img src=&quot;http://www.gridpp.ac.uk/pics/gridpp-group.jpg&quot; width=&quot;500&quot; height=&quot;375&quot; alt=&quot;Door To Madness&quot;&gt;&lt;/a&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/779340858148694419/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/779340858148694419' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/779340858148694419'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/779340858148694419'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2011/09/10-years-of-gridpp-i-was-there-and-you.html' title='10 Years Of GridPP: I was there. 
And you?'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-2305746187189553152</id><published>2011-09-09T09:21:00.014+00:00</published><updated>2012-07-20T11:43:34.263+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cvmfs"/><category scheme="http://www.blogger.com/atom/ns#" term="Manchester"/><category scheme="http://www.blogger.com/atom/ns#" term="upgrade"/><title type='text'>cvmfs upgrade to 2.0.3</title><content type='html'>Last week I upgraded cvmfs on all the WNs to cvmfs-2.0.3. For us the upgrade required two steps.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;1) change of repository:&lt;/span&gt; since Manchester was the first site to use the new ATLAS setup, we were pointing to the CERN repository. The new setup has now become standard, so I just had to remove the CVMFS_SERVER_URL override from atlas.cern.ch.local. The file is distributed by cfengine, so I just changed it in CVS.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;2) rpms upgrade:&lt;/span&gt; I had some initial difficulties because I was following the instructions for ATLAS T3s (which normally work for T2s too). They suggested installing the &lt;span style=&quot;font-weight: bold;&quot;&gt;cvmfs-auto-setup&lt;/span&gt; rpm. This rpm runs &lt;span style=&quot;font-weight: bold;&quot;&gt;service cvmfs restartautofs&lt;/span&gt;, and the instructions also suggested rerunning it manually. On busy machines this causes the repositories to disappear and requires a &lt;span style=&quot;font-weight: bold;&quot;&gt;service cvmfs restartclean&lt;/span&gt;, which wipes the cache and is not really recommended in production. In reality none of this is necessary, and a simple&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;yum -y update cvmfs cvmfs-init-scripts&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
is sufficient. Adding the rpm versions in cfengine was enough. The switch from one version to the other happens at the first unmount; forcing it with a restartautofs is counterproductive (thanks to Ian for pointing this out).&lt;br /&gt;
&lt;br /&gt;
Next week there should be a bug-fix release that takes care of slow mounts and of some slow client tool routines on busy machines.&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://savannah.cern.ch/bugs/?86349&quot;&gt;http://savannah.cern.ch/bugs/?86349&lt;/a&gt;&lt;br /&gt;
But since the upgrade procedure is so easy and the corrupted files problem &lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://savannah.cern.ch/support/?122564&quot;&gt;http://savannah.cern.ch/support/?122564&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
is fixed in cvmfs &amp;gt;2.0.2, I decided to upgrade anyway on Wednesday to avoid further errors in ATLAS and possibly LHCb.&lt;br /&gt;
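Before and after the rollout, it is handy to check that every node really runs a version newer than 2.0.2. A minimal sketch follows that compares dotted numeric versions with plain shell string operations. It avoids sort -V, which I believe the coreutils shipped with SL5 lacks. version_gt is a hypothetical helper and assumes purely numeric components:&lt;br /&gt;&lt;pre&gt;
```bash
#!/bin/bash
# Hypothetical helper: succeed if dotted version $1 is strictly newer than $2.
# Assumes purely numeric components (2.0.3, not 2.0.3rc1).
version_gt() {
    local a=$1 b=$2 x y
    while [ -n "$a" ] || [ -n "$b" ]; do
        # compare the leading components; a missing one counts as 0
        x=${a%%.*}
        y=${b%%.*}
        x=${x:-0}
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
        # drop the component just compared
        case "$a" in *.*) a=${a#*.} ;; *) a= ;; esac
        case "$b" in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 1   # equal versions are not newer
}
```
&lt;/pre&gt;For example: version_gt &quot;$(rpm -q --qf &#39;%{VERSION}&#39; cvmfs)&quot; 2.0.2 || echo &quot;old cvmfs on $(hostname)&quot;&lt;br /&gt;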
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;NOTE:&lt;/span&gt; Of course I tested each step on a few nodes to check that everything worked before rolling it out with cfengine on all nodes. It is always good practice not to follow recipes blindly!</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/2305746187189553152/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/2305746187189553152' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/2305746187189553152'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/2305746187189553152'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2011/09/cvmfs-upgrade-to-203.html' title='cvmfs upgrade to 2.0.3'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-2331111094368022698</id><published>2011-07-06T15:52:00.045+00:00</published><updated>2012-07-20T11:44:02.272+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cvmfs"/><category scheme="http://www.blogger.com/atom/ns#" term="installation"/><category scheme="http://www.blogger.com/atom/ns#" term="Manchester"/><title type='text'>cvmfs installation</title><content type='html'>Last week, after a few months&#39; delay, I finally installed cvmfs.
I have been advocating the &lt;a href=&quot;http://www.slac.stanford.edu/econf/C0303241/proc/papers/MOAT011.PDF&quot;&gt;use of a shared file system&lt;/a&gt; for the input sandbox with locally cached data since 2002-2003. AFS was successfully used in grid and non-grid environments by BaBar users and is still used by local non-LHC users in Manchester for small tasks. So I&#39;m pretty happy that a lightweight caching file system is now available for heavier traffic. This is a really good moment to install cvmfs, for two reasons:&lt;br /&gt;
&lt;br /&gt;
1) Lhcb asked for it too.&lt;br /&gt;
2) Atlas has moved its condb files from the HOTDISK space token to cvmfs. &lt;br /&gt;
&lt;br /&gt;
It should also drastically reduce errors related to both NFS and SE load.&lt;br /&gt;
&lt;br /&gt;
These are my installation notes:&lt;br /&gt;
&lt;br /&gt;
* Install cernvm.repo: you can find it &lt;a href=&quot;http://cvmrepo.web.cern.ch/cvmrepo/yum/cernvm.repo&quot;&gt;here&lt;/a&gt;, or you can copy the rpms into your local repository and install from there. I distribute the file with cfengine; otherwise:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;cd /etc/yum.repos.d/&lt;br /&gt;wget http://cvmrepo.web.cern.ch/cvmrepo/yum/cernvm.repo&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
* Install the GPG key: yum didn&#39;t like the key and was giving errors. I don&#39;t know whether the problem is only mine (possible). Anyway, I told the developers, and in the meantime I had to remove the key check from the repo file and trust the rpms. If you want to try it, it might work for you:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;cd  /etc/pki/rpm-gpg/&lt;br /&gt;wget http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
* Install the rpms. The documentation lists an additional rpm, cvmfs-auto-setup, which is not really necessary and was also causing problems due to some migration lines devised for upgrades. Other than that, it only runs a setup and a restart command, which your configuration tool of choice can run instead. S. Traylen also suggested installing SL_no_colorls to prevent ls /cvmfs from mounting all the file systems, which is why it&#39;s in the list.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;yum install -y fuse cvmfs-keys cvmfs cvmfs-init-scripts SL_no_colorls&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
* Install the configuration files. Below is what I added. For ATLAS the docs also mention a nightlies repository, but that&#39;s not ready yet and won&#39;t work. The default CVMFS_QUOTA_LIMIT set in default.local can be overridden in each experiment&#39;s configuration. For each of these files there is a &lt;span style=&quot;font-weight: bold;&quot;&gt;.conf&lt;/span&gt; file and a &lt;span style=&quot;font-weight: bold;&quot;&gt;.local&lt;/span&gt; file: you should edit only the &lt;span style=&quot;font-weight: bold;&quot;&gt;.local&lt;/span&gt; one, and simply create it if it is not there.&lt;br /&gt;
You need to override CVMFS_SERVER_URL for ATLAS, otherwise you don&#39;t get the new setup. In cern.ch.local, instead, I simply changed the order of the servers to get RAL first and the other two only if RAL fails. I also removed CERNVM_SERVER_URL, which appears in cern.ch.conf. Otherwise cvmfs goes to CERN first, even though that variable apparently isn&#39;t defined anywhere.
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;/etc/cvmfs/default.local &lt;br /&gt;CVMFS_REPOSITORIES=atlas,atlas-condb,lhcb&lt;br /&gt;CVMFS_CACHE_BASE=/scratch/var/cache/cvmfs2&lt;br /&gt;CVMFS_QUOTA_LIMIT=2000&lt;br /&gt;CVMFS_HTTP_PROXY=&quot;http://[YOUR-SQUID-CACHE]:3128&quot;&lt;br /&gt;&lt;br /&gt;/etc/cvmfs/config.d/atlas.cern.ch.local &lt;br /&gt;CVMFS_QUOTA_LIMIT=10000&lt;br /&gt;CVMFS_SERVER_URL=http://cvmfs-stratum-one.cern.ch/opt/atlas-newns&lt;br /&gt;&lt;br /&gt;/etc/cvmfs/config.d/lhcb.cern.ch.local &lt;br /&gt;CVMFS_QUOTA_LIMIT=5000&lt;br /&gt;&lt;br /&gt;/etc/cvmfs/domain.d/cern.ch.local&lt;br /&gt;CVMFS_SERVER_URL=&quot;http://cernvmfs.gridpp.rl.ac.uk/opt/@org@;http://cvmfs-stratum-one.cern.ch/opt/@org@;http://cvmfs.racf.bnl.gov/opt/@org@&quot;&lt;br /&gt;CVMFS_PUBLIC_KEY=/etc/cvmfs/keys/cern.ch.pub&lt;/span&gt;&lt;br /&gt;
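To double-check the failover order a repository will get, here is a small sketch that expands the server list the way I understand cvmfs does it. Entries are separated by semicolons, and @org@ is replaced by the first component of the repository name (lhcb for lhcb.cern.ch). The function name is hypothetical; it only prints the URLs in the order they would be tried:&lt;br /&gt;&lt;pre&gt;
```bash
#!/bin/bash
# Hypothetical helper: print the entries of a CVMFS_SERVER_URL list in
# order, one per line, with @org@ replaced by the first dot-separated
# component of the repository name (assumed cvmfs behaviour).

# expand_server_urls REPOSITORY URL_LIST
expand_server_urls() {
    local org=${1%%.*} url
    local IFS=';'
    for url in $2; do
        echo "$url" | sed "s/@org@/$org/g"
    done
}
```
&lt;/pre&gt;With the cern.ch.local list above, the RAL server comes first for lhcb.cern.ch; atlas uses its own override from atlas.cern.ch.local instead.&lt;br /&gt;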
&lt;br /&gt;
* Create the cache space. By default it is in /var/cache, but I moved it to the /scratch partition, which is bigger.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;mkdir -p /scratch/var/cache/cvmfs2&lt;br /&gt;chown cvmfs:cvmfs /scratch/var/cache/cvmfs2&lt;br /&gt;chmod 2755 /scratch/var/cache/cvmfs2 &lt;/span&gt;&lt;br /&gt;
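&lt;br /&gt;
A slightly more defensive version of these three commands is sketched below. It also checks free space, assuming (as the per-repository QUOTA_LIMIT settings above suggest) that every repository can fill its own quota, so the partition needs room for 2000 (atlas-condb) + 10000 (atlas) + 5000 (lhcb) = 17000 MB. The function name prepare_cvmfs_cache and that sizing rule are assumptions, not part of the cvmfs tools.&lt;br /&gt;
```shell
# Hypothetical helper: create the cvmfs cache directory and check free space.
# $1: cache directory, $2: owner (default cvmfs:cvmfs),
# $3: space needed in MB (default: sum of the quotas configured above)
prepare_cvmfs_cache() {
    dir="$1"
    owner="${2:-cvmfs:cvmfs}"
    needed_mb="${3:-17000}"

    mkdir -p "$dir" || return 1
    chown "$owner" "$dir" || return 1
    chmod 2755 "$dir" || return 1

    # POSIX df output: second line, fourth column is the available space in KB
    avail_mb=$(df -P -k "$dir" | awk 'NR == 2 { print int($4 / 1024) }')
    if [ "$avail_mb" -ge "$needed_mb" ]; then
        return 0
    fi
    echo "only $avail_mb MB free for $dir, the quotas need $needed_mb MB" >/dev/stderr
    return 1
}
```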
&lt;br /&gt;
* Run the setup. These are the commands that cvmfs-auto-setup would run at installation time. They also configure fuse, although that amounts to only one line added to fuse.conf.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;/usr/bin/cvmfs_config setup&lt;br /&gt;service cvmfs restartautofs&lt;br /&gt;&lt;br /&gt;chkconfig cvmfs on&lt;br /&gt;service cvmfs restart&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
* Some squid parameters need to be changed. Below is what the documentation suggests; I tuned the values to the size of my machine. For example, maximum_object_size and cache_mem were too big, and I checked which other parameters were already set to decide whether they needed changing.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;collapsed_forwarding on&lt;br /&gt;max_filedesc 8192&lt;br /&gt;maximum_object_size 4096 MB&lt;br /&gt;cache_mem 4096 MB&lt;br /&gt;maximum_object_size_in_memory 32 KB&lt;br /&gt;cache_dir ufs /var/spool/squid 50000 16 256&lt;/span&gt;&lt;br /&gt;
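&lt;br /&gt;
Before changing anything it helps to see which of these directives are already set. A minimal sketch: the function name is hypothetical, /etc/squid/squid.conf is assumed as the default location (pass another path otherwise), and only uncommented lines that start with the directive name are reported.&lt;br /&gt;
```shell
# Hypothetical helper: show which of the squid directives suggested above are
# already set in squid.conf, so current values can be compared before editing.
# $1: squid configuration file (default /etc/squid/squid.conf, an assumption)
show_existing_squid_settings() {
    conf="${1:-/etc/squid/squid.conf}"
    for key in collapsed_forwarding max_filedesc maximum_object_size \
               cache_mem maximum_object_size_in_memory cache_dir; do
        # Directive at the start of a line followed by whitespace: this skips
        # commented-out defaults and does not confuse maximum_object_size
        # with maximum_object_size_in_memory
        grep -E "^${key}[[:space:]]" "$conf" || echo "$key: not set"
    done
}
```
Anything printed with a value is already configured and is worth comparing with the suggestion above before editing.&lt;br /&gt;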
&lt;br /&gt;
* Apply the changes for &lt;span style=&quot;font-weight: bold;&quot;&gt;Lhcb&lt;/span&gt;: VO_LHCB_SW_DIR needs to point to cvmfs. You can change it in YAIM and rerun YAIM, or you can do what I did below (while still making sure to change YAIM, so that freshly installed nodes don&#39;t need this hack). With this change Lhcb is good to go.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;sed -i.sed.bak &#39;s%/nfs/lhcb%/cvmfs/lhcb.cern.ch%&#39; /etc/profile.d/grid-env.sh&lt;br /&gt;mv /etc/profile.d/grid-env.sh.sed.bak /root&lt;/span&gt;&lt;br /&gt;
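&lt;br /&gt;
If this edit is scripted, a guard is worth adding: a second run would let sed overwrite the backup with an already converted copy, and mv would then replace the original backup in /root. A sketch around the same sed expression, where the function name and its two optional arguments (the environment file and the backup directory) are hypothetical:&lt;br /&gt;
```shell
# Hypothetical helper: point VO_LHCB_SW_DIR at cvmfs, keeping one backup.
# $1: environment file (default /etc/profile.d/grid-env.sh)
# $2: backup directory (default /root)
switch_lhcb_to_cvmfs() {
    envfile="${1:-/etc/profile.d/grid-env.sh}"
    backupdir="${2:-/root}"

    # Already converted: stop here so the original backup is not overwritten
    if ! grep -q '/nfs/lhcb' "$envfile"; then
        return 0
    fi

    sed -i.sed.bak 's%/nfs/lhcb%/cvmfs/lhcb.cern.ch%' "$envfile" || return 1
    mv "$envfile.sed.bak" "$backupdir/"
}
```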
&lt;br /&gt;
* Apply the changes for &lt;span style=&quot;font-weight: bold;&quot;&gt;Atlas&lt;/span&gt;. A similar change to VO_ATLAS_SW_DIR is required, and you need to set an additional variable that is not handled by YAIM. For now I added it to grid-env.sh, but it would be better placed in another file not touched by YAIM, or a snippet should be added to YAIM to handle the variable. This is enough for the jobs to start using the software area. However, you still have to contact the atlas sw team so that they can run their validation tests and enable the use of condb. They&#39;ll propose a long way and a short way; I took the short one because I didn&#39;t want to go into downtime and jobs were already running with the new setup.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;sed -i.sed.2 &#39;s%&quot;/nfs/atlas&quot;%&quot;/cvmfs/atlas.cern.ch/repo/sw&quot;\ngridenv_set         &quot;ATLAS_LOCAL_AREA&quot; &quot;/nfs/atlas/local&quot;%&#39; /etc/profile.d/grid-env.sh&lt;br /&gt;mv /etc/profile.d/grid-env.sh.sed.2 /root&lt;/span&gt;&lt;br /&gt;
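&lt;br /&gt;
The same guard makes sense for the Atlas edit. Rerunning the sed would not add a second ATLAS_LOCAL_AREA line (the quoted &quot;/nfs/atlas&quot; no longer matches), but it would again replace the backup in /root with an already converted copy. A sketch reusing the sed expression above (sed -i and \n in the replacement are GNU sed features); the function name and its optional arguments are hypothetical:&lt;br /&gt;
```shell
# Hypothetical helper: switch VO_ATLAS_SW_DIR to cvmfs and add ATLAS_LOCAL_AREA.
# $1: environment file (default /etc/profile.d/grid-env.sh)
# $2: backup directory (default /root)
# $3: ATLAS_LOCAL_AREA value (default /nfs/atlas/local)
switch_atlas_to_cvmfs() {
    envfile="${1:-/etc/profile.d/grid-env.sh}"
    backupdir="${2:-/root}"
    local_area="${3:-/nfs/atlas/local}"

    # Already converted: stop here so the original backup is not overwritten
    if grep -q 'ATLAS_LOCAL_AREA' "$envfile"; then
        return 0
    fi

    # Same expression as above; inside double quotes \" is a literal quote and
    # \n reaches sed unchanged, where GNU sed turns it into a newline
    sed -i.sed.2 "s%\"/nfs/atlas\"%\"/cvmfs/atlas.cern.ch/repo/sw\"\ngridenv_set         \"ATLAS_LOCAL_AREA\" \"$local_area\"%" "$envfile" || return 1
    mv "$envfile.sed.2" "$backupdir/"
}
```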
&lt;br /&gt;
* Still for &lt;span style=&quot;font-weight: bold;&quot;&gt;Atlas&lt;/span&gt;, remove some installed &lt;span style=&quot;font-weight: bold;&quot;&gt;.conf&lt;/span&gt; files that create a link in /opt which is no longer necessary. The second file might not exist, but there is an atlas-nightly.cern.ch.conf. This will surely change in future cvmfs releases.&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;service cvmfs stop&lt;br /&gt;rm /etc/cvmfs/config.d/atlas.cern.ch.conf&lt;br /&gt;rm -f /etc/cvmfs/config.d/atlas-condb.cern.ch.conf&lt;br /&gt;service cvmfs start&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;Update 12/7/2011: Using YAIM&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
cfengine only installs the rpms and the configuration files (*.local). Everything else is now carried out by a YAIM function I created (config_cvmfs). I put a tar file &lt;a href=&quot;http://www.sysadmin.hep.ac.uk/svn/fabric-management/cvmfs/cvmfs-yaim.tar&quot;&gt;here&lt;/a&gt;. To make it work I also added a node description in node-info.d/cvmfs (also in the tar file) that references the function. This way I don&#39;t have to touch any existing YAIM files, and I can just add -n CVMFS to the YAIM command line we use to configure the WNs. The function requires the ATLAS_LOCAL_AREA and CVMFS_CACHE_DIR variables to be set in your site-info.def.&lt;br /&gt;
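&lt;br /&gt;
A quick pre-flight check of those two variables can save a failed YAIM run. The helper below is hypothetical and not part of the tarball: it sources site-info.def (a plain shell fragment) in a subshell and reports whichever of ATLAS_LOCAL_AREA and CVMFS_CACHE_DIR is missing or empty.&lt;br /&gt;
```shell
# Hypothetical helper: check that site-info.def defines the variables
# config_cvmfs needs. $1: path to site-info.def (include a slash, e.g.
# ./site-info.def, because "." searches PATH for bare file names)
check_cvmfs_site_info() {
    site_info="$1"
    missing=""
    for var in ATLAS_LOCAL_AREA CVMFS_CACHE_DIR; do
        # Source the file in a subshell so nothing leaks into this shell;
        # unset first so a value inherited from the environment does not count
        value=$(unset "$var"; . "$site_info"; eval "printf '%s' \"\${$var}\"")
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing from $site_info:$missing"
        return 1
    fi
    return 0
}
```
Pass the path of your own site-info.def before adding -n CVMFS to the YAIM command line.&lt;br /&gt;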
&lt;br /&gt;
&lt;span style=&quot;font-weight: bold;&quot;&gt;CVMFS docs are here&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://cernvm.cern.ch/portal/node/126&quot;&gt;Release Notes&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;http://cernvm.cern.ch/portal/node/127&quot;&gt;Init Scripts Overview&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;http://cernvm.cern.ch/portal/node/123&quot;&gt;Examples&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;https://cernvm.cern.ch/project/trac/downloads/cernvm/cvmfstech-0.2.70-1.pdf&quot;&gt;Technical Report&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;http://www.gridpp.ac.uk/wiki/RAL_Tier1_CVMFS&quot;&gt;RAL T1&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;https://twiki.cern.ch/twiki/bin/view/Atlas/Tier3CVMFS2SLC5&quot;&gt;Atlas T2/T3 setup&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;https://twiki.cern.ch/twiki/bin/view/Atlas/CernVMFS#Changes_to_CVMFS_Client_Setup_an&quot;&gt;Atlas latest changes&lt;/a&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/2331111094368022698/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/2331111094368022698' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/2331111094368022698'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/2331111094368022698'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2011/07/cvmfs-installation.html' title='cvmfs installation'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-7477550209853345563</id><published>2011-06-22T13:01:00.006+00:00</published><updated>2011-07-11T23:46:46.908+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="apel nagios alert remove"/><title type='text'>How to remove apel warnings and avoid nagios alerts</title><content type='html'>Quite a few sites have a few entries in APEL that don&#39;t quite match. 
They can appear with two messages:&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;OK [ Minor discrepancy in event numbers ]&lt;br /&gt;WARN [ Missing data detected ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;They don&#39;t look good on the Sync page, and nagios also sends alerts for this problem, which is even more annoying.&lt;br /&gt;&lt;br /&gt;The problem is caused by a few records with the wrong timestamp (StartTime=01-01-1970). These records need to be deleted from the local database, and the periods where they appear republished with the gap publisher. To delete the records, connect to your local APEL mysql database and run:&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;mysql&gt; delete from LcgRecords where StartTimeEpoch = 0;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Then, for each month where the entries appear, rerun the gap publisher. Finally, rerun the publisher in missing records mode to update the SYNC page, or, if you are not impatient, wait for the next regular run.&lt;br /&gt;&lt;br /&gt;Thanks to Cristina for this useful tip, which she gave me in &lt;a href=&quot;https://ggus.eu/ws/ticket_info.php?ticket=70801&quot;&gt;this ticket&lt;/a&gt;.</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/7477550209853345563/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/7477550209853345563' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/7477550209853345563'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/7477550209853345563'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2011/06/how-to-remove-apel-warnings-and-avoid.html' title='How to remove apel warnings and avoid nagios alerts'/><author><name>Alessandra 
Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-2793338616336205351</id><published>2011-06-14T15:46:00.024+00:00</published><updated>2011-06-23T10:22:05.104+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="manchester dpm optimization"/><title type='text'>DPM optimization next round</title><content type='html'>After I applied 3 of the mysql parameter changes I talk about in &lt;a href=&quot;http://northgrid-tech.blogspot.com/2011/06/dpm-optimization.html&quot;&gt;this post&lt;/a&gt;, I didn&#39;t see the improvement I was hoping for in the atlas job time outs.&lt;br /&gt;&lt;br /&gt;This is another set of optimizations I put together after further searching.&lt;br /&gt;&lt;br /&gt;First of all, I started to systematically count the TIME_WAIT connections every five minutes. I also correlated them, in the same log file, with the number of concurrent threads the server keeps, mostly in sleep mode. You can get the latter by running &lt;span style=&quot;font-weight:bold;&quot;&gt;mysqladmin -p proc stat&lt;/span&gt; or from within a mysql command line. 
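&lt;br /&gt;&lt;br /&gt;
A minimal sketch of such a five-minute log, assuming it runs from cron on the head node, that netstat is installed, and that mysqladmin can authenticate non-interactively (for example through a ~/.my.cnf); the function names and the log path are hypothetical. The threads figure is the Threads: field of mysqladmin status, the same number the stat part of the command above prints.&lt;br /&gt;
```shell
# Hypothetical helpers for a five-minute connection log (e.g. run from cron).

# Read "netstat -tan" output on stdin and print how many sockets are in
# TIME_WAIT; the TCP state is the 6th column.
count_time_wait() {
    awk '$6 == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Read "mysqladmin status" output on stdin and print the Threads value.
mysql_threads() {
    sed -n 's/.*Threads: *\([0-9][0-9]*\).*/\1/p'
}

# Append one timestamped line with both numbers to the log file.
# Assumes mysqladmin can authenticate without a prompt (e.g. via ~/.my.cnf).
log_connections() {
    logfile="${1:-/var/log/mysql-timewait.log}"   # hypothetical path
    tw=$(netstat -tan | count_time_wait)
    th=$(mysqladmin status | mysql_threads)
    echo "$(date '+%F %T') time_wait=$tw threads=$th" >> "$logfile"
}
```
Keeping both numbers on the same timestamped line makes the correlation described here a matter of reading one file.&lt;br /&gt;&lt;br /&gt;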
The number of threads was close to the default maximum allowed by mysql, so I doubled it in my.cnf&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;max_connections=200&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;then I halved the kernel timeout for closing connections (strictly speaking, tcp_fin_timeout sets how long orphaned connections stay in FIN_WAIT_2; on Linux the TIME_WAIT interval itself is hard-coded to 60 seconds)&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;sysctl -w net.ipv4.tcp_fin_timeout=30&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;the default value is 60 seconds. If you add it to /etc/sysctl.conf it becomes permanent.&lt;br /&gt;&lt;br /&gt;Finally, I found this article, which explicitly talks about mysql tunings to reduce connection timeouts: &lt;a href=&quot;http://www.mysqlperformanceblog.com/2011/04/19/mysql-connection-timeouts/&quot;&gt;Mysql Connection Timeouts&lt;/a&gt;, and I set the following&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;sysctl -w net.ipv4.tcp_max_syn_backlog=8192&lt;br /&gt;sysctl -w net.core.somaxconn=512&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;(again, add them to /etc/sysctl.conf to make them permanent) and added to my.cnf&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;back_log=500&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I based my numbers on 500 connections/s because that is what I observed while doing all this (I observed even larger numbers). Admittedly they are now stable at 330 connections per second, but we haven&#39;t had any heavy ramp-up since Saturday, only a mild one, which didn&#39;t cause any time outs. I&#39;m waiting for a serious ramp-up as the definitive test. That said, since Saturday we haven&#39;t seen any timeout errors, not even the low background that was always present. 
So there is already an improvement.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Update 16/06/2011&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Today there was an atlas &lt;a href=&quot;http://ks.tier2.hep.manchester.ac.uk/T2/atlas/20110616-atlas-jobs.png&quot;&gt;ramp from almost 0 to &gt;1400 jobs&lt;/a&gt; and no time outs so far.&lt;br /&gt;&lt;br /&gt;A few timeouts were seen yesterday, but they were due to authentication problems between the head node and a couple of data servers, which I will have to investigate. They were only a handful, nowhere near the scale observed before, and not due to mysql. I will still keep things under observation for a while longer, just in case.</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/2793338616336205351/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/2793338616336205351' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/2793338616336205351'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/2793338616336205351'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2011/06/dpm-optimization-next-round.html' title='DPM optimization next round'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-9170509917256537238</id><published>2011-06-10T07:32:00.032+00:00</published><updated>2011-06-11T16:51:57.645+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="manchester dpm optimization"/><title type='text'>DPM Optimization</title><content type='html'>My quest to optimize DPM continues. Bottlenecks are like Russian dolls and hide behind each other. After optimizing the data servers by increasing the &lt;a href=&quot;http://northgrid-tech.blogspot.com/2010/08/tuning-areca-raid-controllers-for-xfs.html&quot;&gt;block device read ahead&lt;/a&gt;, &lt;a href=&quot;http://www.gridpp.ac.uk/gridpp24/ChannelBonding.pdf&quot;&gt;enabling lacp&lt;/a&gt; on the network channel bonding and multiplying the atlas hotdisk files, there is still a problem with mysql on the head node which causes time outs.&lt;br /&gt;&lt;br /&gt;When atlas ramps up there is often an increase in connections in TIME_WAIT; I observed &gt;2600 at times. The mysql database becomes completely unresponsive and causes the time outs. Restarting the database causes the connections to finally close and the database to resume normal activity. Although a restart might alleviate the problem, as usual it&#39;s not a cure, so I went on a quest. What follows might not alleviate my specific problem (I haven&#39;t tested it in production yet), but it certainly helps with another one: reloading the DB. &lt;br /&gt;&lt;br /&gt;Sam already wrote some performance tuning tips here: &lt;a href=&quot;http://www.gridpp.ac.uk/wiki/Performance_and_Tuning&quot;&gt;Performance and Tuning&lt;/a&gt;, most notably the setting of &lt;span style=&quot;font-style:italic;&quot;&gt;innodb_buffer_pool_size&lt;/span&gt;. 
After a discussion on the DPM user forum and some testing, this is what I&#39;d add:&lt;br /&gt;&lt;br /&gt;I set &quot;DPM     REQCLEAN        3m&quot; when I upgraded to DPM 1.7.4 and this, after a reload, has reduced the Manchester DB file size from 17GB to 7.6GB. Dumping the db took 7m34s. I then reloaded it with different combinations of the suggested my.cnf &lt;a href=&quot;http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html&quot;&gt;innodb parameters&lt;/a&gt;, and the effects of some of them are dramatic.&lt;br /&gt;&lt;br /&gt;The default parameters should definitely be avoided: reloading a database with them takes several hours. Last time it took 17-18 hours; this time I interrupted it after 4.&lt;br /&gt;&lt;br /&gt;With a combination of the parameters suggested by Maarten the time is drastically reduced. In particular, the most effective settings have been &lt;span style=&quot;font-style:italic;&quot;&gt;innodb_buffer_pool_size&lt;/span&gt; and &lt;span style=&quot;font-style:italic;&quot;&gt;innodb_log_file_size&lt;/span&gt;. Below are the results of the reload tests I made, in decreasing order of time. I then followed Jean-Philippe&#39;s suggestion to drop the requests tables. Dropping the tables took several minutes, and it was slightly faster with a single db file. After I dropped the tables and the indexes, the ibdata1 size dropped to 1.2GB, and using combination 4 below it took &lt;span style=&quot;font-weight:bold;&quot;&gt;1m7s to dump and 5m7s to reload&lt;/span&gt;. 
With the one-file-per-table configuration reloading was slightly faster, but after I dropped the requests tables there was no difference. This is also offset by the fact that deletion seems slower, and the effects are probably more visible when the database is bigger, so for now these small tests give no compelling reason either in favour or against.&lt;br /&gt;&lt;br /&gt;These are the steps that help reduce the time it takes to reload the database:&lt;br /&gt;&lt;br /&gt;1) Enable &lt;span style=&quot;font-style:italic;&quot;&gt;REQCLEAN&lt;/span&gt; in shift.conf (I set it to 3 months to comply with security requirements).&lt;br /&gt;2) Set &lt;span style=&quot;font-style:italic;&quot;&gt;innodb_buffer_pool_size&lt;/span&gt; in my.cnf (I set it to 10% of the machine memory and couldn&#39;t see much difference when I later set it to 22.5%, but in production it might be another story, with repeated queries for the same input files).&lt;br /&gt;3) Set &lt;span style=&quot;font-style:italic;&quot;&gt;innodb_log_file_size&lt;/span&gt; in my.cnf (I didn&#39;t give much thought to this; Maarten&#39;s value of 50MB seemed good enough. 
&lt;a href=&quot;http://dev.mysql.com/doc/refman/5.0/en/binary-log.html&quot;&gt;Binary log files&lt;/a&gt; need to be removed and the database restarted to enable this, but check the docs: this might not be a valid strategy if you make heavier use of the binary logs. Note also that MySQL 5.0 only accepts a new innodb_log_file_size after a clean shutdown with the old InnoDB log files (ib_logfile*) moved out of the way.)&lt;br /&gt;4) Set &lt;span style=&quot;font-style:italic;&quot;&gt;innodb_flush_log_at_trx_commit = 2&lt;/span&gt; in my.cnf (although this parameter seems less effective during the reload, it might be useful in production; 2 is slightly safer than 0).&lt;br /&gt;5) Use the &lt;a href=&quot;http://www.sysadmin.hep.ac.uk/svn/fabric-management/dpm/dpm-drop-requests-tables.sql&quot;&gt;script&lt;/a&gt; Jean-Philippe gave me to drop the requests tables before an upgrade.&lt;br /&gt;&lt;br /&gt;Hopefully these steps will also help stop the time outs.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Tests:&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-style:italic;&quot;&gt;&lt;br /&gt;COMBINATION 1&lt;br /&gt;&lt;br /&gt;innodb_buffer_pool_size = 400MB&lt;br /&gt;# innodb_log_file_size = 50MB&lt;br /&gt;innodb_flush_log_at_trx_commit = 2&lt;br /&gt;# innodb_file_per_table&lt;br /&gt;&lt;br /&gt;real    167m30.226s&lt;br /&gt;user    1m41.860s&lt;br /&gt;sys    0m9.987s&lt;br /&gt;&lt;br /&gt;============================&lt;br /&gt;COMBINATION 2&lt;br /&gt;innodb_buffer_pool_size = 900MB&lt;br /&gt;# innodb_log_file_size = 50MB&lt;br /&gt;# innodb_flush_log_at_trx_commit = 2&lt;br /&gt;# innodb_file_per_table&lt;br /&gt;   &lt;br /&gt;real    155m2.996s&lt;br /&gt;user    1m40.843s&lt;br /&gt;sys    0m9.935s&lt;br /&gt;&lt;br /&gt;===========================&lt;br /&gt;COMBINATION 3&lt;br /&gt;innodb_buffer_pool_size = 900MB&lt;br /&gt;innodb_log_file_size = 50MB&lt;br /&gt;# innodb_flush_log_at_trx_commit = 2&lt;br /&gt;# innodb_file_per_table&lt;br /&gt;&lt;br /&gt;real    49m2.683s&lt;br /&gt;user    1m39.137s&lt;br /&gt;sys    0m9.902s&lt;br /&gt;===========================&lt;br
/&gt;COMBINATION 4&lt;br /&gt;innodb_buffer_pool_size = 400MB&lt;br /&gt;innodb_log_file_size = 50MB&lt;br /&gt;innodb_flush_log_at_trx_commit = 2 &lt;-- also tested with 0 instead of 2, but it didn&#39;t change the time it took, and 2 is slightly safer&lt;br /&gt;# innodb_file_per_table&lt;br /&gt;&lt;br /&gt;real    48m32.398s&lt;br /&gt;user    1m40.638s&lt;br /&gt;sys    0m9.733s&lt;br /&gt;===========================&lt;br /&gt;COMBINATION 5&lt;br /&gt;innodb_buffer_pool_size = 900MB&lt;br /&gt;innodb_log_file_size = 50MB&lt;br /&gt;innodb_flush_log_at_trx_commit = 2&lt;br /&gt;innodb_file_per_table&lt;br /&gt;&lt;br /&gt;real    47m25.109s&lt;br /&gt;user    1m39.230s&lt;br /&gt;sys    0m9.985s&lt;br /&gt;===========================&lt;br /&gt;COMBINATION 6&lt;br /&gt;innodb_buffer_pool_size = 400MB&lt;br /&gt;innodb_log_file_size = 50MB&lt;br /&gt;innodb_flush_log_at_trx_commit = 2&lt;br /&gt;innodb_file_per_table&lt;br /&gt;&lt;br /&gt;real    46m46.850s&lt;br /&gt;user    1m40.378s&lt;br /&gt;sys    0m9.950s&lt;br /&gt;===========================&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/9170509917256537238/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/9170509917256537238' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/9170509917256537238'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/9170509917256537238'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2011/06/dpm-optimization.html' title='DPM Optimization'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4670756400590062347.post-3355555367352141561</id><published>2011-05-20T07:02:00.014+00:00</published><updated>2011-05-20T08:05:20.684+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="manchester bdii"/><title type='text'>BDII again</title><content type='html'>A couple of weeks ago I upgraded the site BDII and top BDII from a very old version without reinstalling, as described &lt;a href=&quot;http://northgrid-tech.blogspot.com/2011/05/bdii-follow-up.html&quot;&gt;in this post&lt;/a&gt;. A few days ago I noticed that not everything was working as well as I thought: the BDII was reporting stale numbers in the dynamic attributes, causing a few problems, among which biomed submitting an unhealthy 12k jobs. &lt;br /&gt;&lt;br /&gt;There were two reasons for this:&lt;br /&gt;&lt;br /&gt;1) the unprivileged user that runs the BDII is no longer edguser but ldap. Consequently there were some ownership issues in /opt/glite/var subdirectories and files. This was highlighted in &lt;span style=&quot;font-weight:bold;&quot;&gt;/var/log/bdii/bdii-update.log&lt;/span&gt; by permission denied errors, which I overlooked for a bit too long. Ownership should be as follows: &lt;span style=&quot;font-weight:bold;&quot;&gt;/opt/glite/var, /opt/glite/var/lock, /opt/glite/var/tmp and /opt/glite/var/cache&lt;/span&gt; should belong to root, and anything below them should belong to ldap. You can check whether anything doesn&#39;t belong to ldap by running &lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;find /opt/glite/var/ ! 
-user ldap -ls&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This will also list the top directories above, which you can ignore.&lt;br /&gt;&lt;br /&gt;2) bdii-update no longer uses glite-info-wrapper and glite-info-generic, which used to write the .ldif files in the same directory tree. It now writes what it needs into the /var/run/bdii databases and a single new.ldif file, calling the scripts in &lt;span style=&quot;font-weight:bold;&quot;&gt;/opt/glite/etc/gip/provider&lt;/span&gt; and &lt;span style=&quot;font-weight:bold;&quot;&gt;/opt/glite/etc/gip/plugin&lt;/span&gt; directly. I upgraded from an older version, and the old providers weren&#39;t deleted but continued to be executed by bdii-update. Some of them still read what are now obsolete .ldif.&amp;lt;chksum&amp;gt; files under the &lt;span style=&quot;font-weight:bold;&quot;&gt;/opt/glite/var/cache&lt;/span&gt; tree. I deleted all the .ldif files with an additional numeric extension under /opt/glite/var.&lt;br /&gt;&lt;br /&gt;With these two changes, i.e. fixing the ownership of the directories and deleting the obsolete .ldif files (or the old providers, if one is sure which ones they are), the site BDII started updating the dynamic attributes correctly again.&lt;br /&gt;&lt;br /&gt;Finally, a note on making reinstallation easier: in the previous post I suggested manually adding &lt;span style=&quot;font-weight:bold;&quot;&gt;SLAPD=/usr/sbin/slapd2.4&lt;/span&gt; to the newly installed &lt;span style=&quot;font-weight:bold;&quot;&gt;/opt/bdii/etc/bdii.conf&lt;/span&gt; to change the slapd version. 
However, an easier way to maintain the service, in case it needs reinstalling, is to add &lt;span style=&quot;font-weight:bold;&quot;&gt;SLAPD=/usr/sbin/slapd2.4&lt;/span&gt; to site-info.def: when YAIM runs it gets added to /etc/sysconfig/bdii, so no manual step is needed if the machine is reinstalled.</content><link rel='replies' type='application/atom+xml' href='http://northgrid-tech.blogspot.com/feeds/3355555367352141561/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4670756400590062347/3355555367352141561' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/3355555367352141561'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4670756400590062347/posts/default/3355555367352141561'/><link rel='alternate' type='text/html' href='http://northgrid-tech.blogspot.com/2011/05/bdii-again.html' title='BDII again'/><author><name>Alessandra Forti</name><uri>http://www.blogger.com/profile/11973932320387024088</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixa3uri_WwBKCE9VA3Jkk5eYnU8Q0qRt1GZUDYb_II2qIinPuYneDd0KIYVZsFdVtGh_oetnM7FDJL3ZVasCAvFNwRgPc5PG9mvAtddwwHBGC5YcjN_IaGDn_g3IURFg/s220/patyten_seaOttersSwim.jpg'/></author><thr:total>0</thr:total></entry></feed>