<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4517963319103426980</id><updated>2025-08-08T20:09:42.826+01:00</updated><category term="RHUL"/><category term="cloud"/><category term="cluster"/><category term="newton"/><category term="Alces"/><category term="Benchmark"/><category term="C6100"/><category term="CENTO6"/><category term="CloudStack"/><category term="Clustervision"/><category term="Dell"/><category term="GridPP"/><category term="IC"/><category term="KVM"/><category term="OpenNebula"/><category term="OpenStack"/><category term="R510"/><category term="R710"/><category term="RHEL6"/><category term="SL6"/><category term="Serial Consoles"/><category term="atlas"/><category term="cabling"/><category term="cern"/><category term="hpc"/><category term="htc"/><category term="ipmi"/><category term="lustre"/><category term="move"/><category term="opennms"/><category term="puppet"/><category term="qmul"/><category term="racks"/><category term="razor"/><category term="virtualization"/><title type='text'>LondonGrid</title><subtitle type='html'>LondonGrid is a regional Tier 2 of GridPP, distributed between the Universities of Queen Mary, Imperial College, Royal Holloway, Brunel and UCL.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/'/><link rel='hub' 
href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default?start-index=26&amp;max-results=25'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/06192917778257317141</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>38</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-8086588585167758452</id><published>2014-10-08T10:20:00.000+01:00</published><updated>2014-10-08T10:20:56.997+01:00</updated><title type='text'>XrootD and ARGUS authentication</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
A couple of months ago, I set up a test machine running &lt;a href=&quot;http://xrootd.org/&quot;&gt;XrootD&lt;/a&gt; version 4 at QMUL. This was to test three things:&lt;br /&gt;
&lt;ol style=&quot;text-align: left;&quot;&gt;
&lt;li&gt;IPv6 (see &lt;a href=&quot;http://gridpp-storage.blogspot.co.uk/2014/07/ipv6-and-xrootd-4.html&quot;&gt;blog post&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Central authorisation via ARGUS (the subject of this blog post).&lt;/li&gt;
&lt;li&gt;XrootD 4.&lt;/li&gt;
&lt;/ol&gt;
We run StoRM/Lustre on our grid storage, and have run an XrootD server for some time as part of the ATLAS federated storage system, FAX. This allows local (and non-local) ATLAS users interactive access, via the xrootd protocol, to files on our grid storage.&lt;br /&gt;
&lt;br /&gt;
For the new machine, I started by following ATLAS&#39;s &lt;a href=&quot;https://twiki.cern.ch/twiki/bin/view/AtlasComputing/FAXposixStorageNew&quot;&gt;Fax for Posix storage sites&lt;/a&gt; instructions. These instructions document how to use VOMS authentication, but not central banning via ARGUS. CMS do however have some &lt;a href=&quot;https://twiki.cern.ch/twiki/bin/view/Main/PosixXrootd&quot;&gt;instructions&lt;/a&gt; on using xrootd-lcmaps to do the authorisation - though with RPMs from different (and therefore potentially incompatible) repositories. It is, however, possible to get them to work. &lt;br /&gt;
&lt;br /&gt;
The following packages are needed (or at least what I have installed): &lt;br /&gt;
&lt;br /&gt;
&amp;nbsp; yum install xrootd4-server-atlas-n2n-plugin&lt;br /&gt;
&amp;nbsp; yum install argus-pep-api-c&lt;br /&gt;
&amp;nbsp; yum install lcmaps-plugins-c-pep&lt;br /&gt;
&amp;nbsp; yum install lcmaps-plugins-verify-proxy&lt;br /&gt;
&amp;nbsp; yum install lcmaps-plugins-tracking-groupid&lt;br /&gt;
&amp;nbsp; yum install xerces-c&lt;br /&gt;
&amp;nbsp; yum install lcmaps-plugins-basic&lt;br /&gt;
&lt;br /&gt;
Now the packages are installed, xrootd needs to be configured to use them - the appropriate lines in /etc/xrootd/xrootd-clustered.cfg are: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;xrootd.seclib /usr/lib64/libXrdSec.so&lt;br /&gt;
&amp;nbsp;xrootd.fslib /usr/lib64/libXrdOfs.so&lt;br /&gt;
&amp;nbsp;sec.protocol /usr/lib64 gsi -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/xrd/xrdcert.pem -key:/etc/grid-security/xrd/xrdkey.pem -crl:3 -authzfun:libXrdLcmaps.so -authzfunparms:--osg,--lcmapscfg,/etc/xrootd/lcmaps.cfg,--loglevel,5|useglobals -gmapopt:10 -gmapto:0&lt;br /&gt;
&amp;nbsp;acc.authdb /etc/xrootd/auth_file&lt;br /&gt;
&amp;nbsp;acc.authrefresh 60&lt;br /&gt;
&amp;nbsp;ofs.authorize 1&lt;br /&gt;
&lt;br /&gt;
And in /etc/xrootd/lcmaps.cfg it is necessary to change the module path and the ARGUS server (my ARGUS server is obscured in the example below). My config file looks like:&lt;br /&gt;
&lt;br /&gt;
################################&lt;br /&gt;
&lt;br /&gt;
# where to look for modules&lt;br /&gt;
#path = /usr/lib64/modules&lt;br /&gt;
path = /usr/lib64/lcmaps&lt;br /&gt;
&lt;br /&gt;
good = &quot;lcmaps_dummy_good.mod&quot;&lt;br /&gt;
bad&amp;nbsp; = &quot;lcmaps_dummy_bad.mod&quot;&lt;br /&gt;
# Note: put your own ARGUS host in place of argushost.mydomain&lt;br /&gt;
pepc&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = &quot;lcmaps_c_pep.mod&quot;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &quot;--pep-daemon-endpoint-url https://argushost.mydomain:8154/authz&quot;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &quot; --resourceid http://esc.qmul.ac.uk/xrootd&quot;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &quot; --actionid http://glite.org/xacml/action/execute&quot;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &quot; --capath /etc/grid-security/certificates/&quot;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &quot; --no-check-certificates&quot;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &quot; --certificate /etc/grid-security/xrd/xrdcert.pem&quot;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &quot; --key /etc/grid-security/xrd/xrdkey.pem&quot;&lt;br /&gt;
&lt;br /&gt;
xrootd_policy:&lt;br /&gt;
pepc -&amp;gt; good | bad&lt;br /&gt;
################################################&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then after restarting xrootd, you just need to test that it works.&lt;br /&gt;
&lt;br /&gt;
It seems to work: I was able to ban myself successfully. Unbanning didn&#39;t take effect instantly, and I resorted to restarting xrootd, though perhaps if I&#39;d had patience it would have worked eventually.&lt;br /&gt;
&lt;br /&gt;
Overall, whilst it wasn&#39;t trivial to do, it&#39;s not actually that hard, and is one more step along the road to having central banning working on all our grid services. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/8086588585167758452/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/8086588585167758452' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/8086588585167758452'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/8086588585167758452'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2014/10/xrootd-and-argus-authentication.html' title='XrootD and ARGUS authentication'/><author><name>Christopher J. Walker</name><uri>http://www.blogger.com/profile/04786714703492357617</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-3857394689671524534</id><published>2013-06-04T15:47:00.002+01:00</published><updated>2013-06-04T15:49:40.328+01:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="C6100"/><category scheme="http://www.blogger.com/atom/ns#" term="CENTO6"/><category scheme="http://www.blogger.com/atom/ns#" term="ipmi"/><category scheme="http://www.blogger.com/atom/ns#" term="R510"/><category scheme="http://www.blogger.com/atom/ns#" term="R710"/><category scheme="http://www.blogger.com/atom/ns#" term="RHEL6"/><category scheme="http://www.blogger.com/atom/ns#" term="Serial Consoles"/><category scheme="http://www.blogger.com/atom/ns#" term="SL6"/><title type='text'>Serial Consoles over ipmi</title><content type='html'>To get Serial Consoles over ipmi working properly with Scientific Linux 6.4 (aka RHEL 6.4 / centos 6.4) I had to modify several setting both in the BIOS 
and in the OS.&lt;br /&gt;
&lt;h3&gt;
Hardware Configuration&lt;/h3&gt;
For the Dell C6100 I set these settings in the BIOS:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;Remote Access = Enabled&lt;br /&gt;Serial Port Number = COM2&lt;br /&gt;Serial Port Mode = 115200 8,n,1&lt;br /&gt;Flow Control = None&lt;br /&gt;Redirection After BIOS POST = Always&lt;br /&gt;Terminal Type = VT100&lt;br /&gt;VT-UTF8 Combo Key Support = Enabled&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Note: &quot;&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;Redirection After Boot = Disabled&lt;/span&gt;&quot; is required; otherwise I get a 5-minute timeout before the kernel boots. Unfortunately, with this setup you get a gap in output while the server attempts to PXE boot. However, you can interact with the BIOS, and once grub starts you can see and interact with the grub and Linux boot processes.&lt;br /&gt;
&lt;br /&gt;
For the Dell R510/R710 I set these settings in the BIOS:&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;Serial Communication = On with Console Redirection via COM2&lt;br /&gt;Serial Port Address = Serial Device1=COM1,Serial Device2=COM2&lt;br /&gt;External Serial Connector = Serial Device1&lt;br /&gt;Failsafe Baud Rate = 115200&lt;br /&gt;Remote Terminal Type = VT100/VT220&lt;br /&gt;Redirection After Boot = Disabled&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Note: With these settings you will be unable to see the progress of the kickstart install on the non-default console.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
&lt;b&gt;&lt;span style=&quot;font-weight: normal;&quot;&gt;Grub configuration&lt;/span&gt;&lt;/b&gt;&lt;/h3&gt;
In grub.conf you should have these two lines (they were there by default in my installs).&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;serial --unit=1 --speed=115200&lt;br /&gt;terminal --timeout=5 serial console&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
This allows you to access grub via the consoles. The &quot;serial&quot; (ipmi) terminal will be the default unless you press a key when asked during the boot process. This applies only to grub and not to the rest of the Linux boot process.&lt;br /&gt;
&lt;h3&gt;
SL6 Configuration&lt;/h3&gt;
The last console specified in the linux kernel boot options is taken to be the default console. However, if the same console is specified twice this can cause issues (e.g. when entering a password the characters are shown on the screen!)&lt;br /&gt;
&lt;br /&gt;
For the initial kickstart pxe boot I append &quot;&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;console=tty1 console=ttyS1,115200&lt;/span&gt;&quot; to the linux kernel arguments. Here the serial console over ipmi will be the default during the install process, while the other console should echo the output of the ipmi console.&lt;br /&gt;
&lt;br /&gt;
After install, the kernel argument &quot;&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;console=ttyS1,115200&lt;/span&gt;&quot; had already been added to the kernel boot arguments. I have additionally added &quot;&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;console=tty1&lt;/span&gt;&quot; before it; this may be required to enable interaction with the server via a directly connected terminal if needed.&lt;br /&gt;
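&lt;br /&gt;
The &quot;last console wins&quot; rule and the duplicate-console pitfall described above can be checked with a small script. This is only a sketch: the helper names and the example command line (including root=/dev/sda1) are made up for illustration, and on a running machine the real string would come from /proc/cmdline.&lt;br /&gt;

```python
from collections import Counter


def console_devices(cmdline):
    """Return the device names of all console= arguments, in order."""
    devices = []
    for arg in cmdline.split():
        if arg.startswith("console="):
            value = arg.split("=", 1)[1]
            # Drop options such as ",115200" to leave only the device name.
            devices.append(value.split(",", 1)[0])
    return devices


def default_console(cmdline):
    """Return the last console listed (the default), or None if none is set."""
    devices = console_devices(cmdline)
    return devices[-1] if devices else None


def duplicated_consoles(cmdline):
    """Return the devices that are listed more than once."""
    counts = Counter(console_devices(cmdline))
    return sorted(dev for dev, n in counts.items() if n != 1)


if __name__ == "__main__":
    # Hypothetical command line; on a live system read /proc/cmdline instead.
    example = "ro root=/dev/sda1 console=tty1 console=ttyS1,115200"
    print("default console:", default_console(example))
    print("duplicated consoles:", duplicated_consoles(example))
```

If the second function returns anything, the same console has been specified twice, which is the case that caused the password echo problem mentioned earlier.&lt;br /&gt;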
&lt;br /&gt;
With the ipmi port set as default (last console specified in the kernel arguments), SL6 will automatically start a getty for ttyS1. If it were not the default console, we would have to add an upstart config file in /etc/init/. Note that SL6 uses upstart; previous SL5 console configurations in /etc/inittab are ignored!&lt;br /&gt;
&lt;br /&gt;
e.g. &lt;span style=&quot;font-family: inherit;&quot;&gt;ttyS1.conf&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;&quot;&gt;start on stopping rc runlevel [345]&lt;br /&gt;stop on starting runlevel [S016]&lt;br /&gt;&lt;br /&gt;respawn&lt;br /&gt;exec /sbin/agetty /dev/ttyS1 115200 vt100&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/3857394689671524534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/3857394689671524534' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/3857394689671524534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/3857394689671524534'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2013/06/serial-consoles-over-ipmi.html' title='Serial Consoles over ipmi'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/06832242228255061276</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-377205002143228247</id><published>2013-04-21T22:43:00.000+01:00</published><updated>2013-04-21T22:43:13.388+01:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cabling"/><category scheme="http://www.blogger.com/atom/ns#" term="racks"/><title type='text'>The art of cabling </title><content type='html'>&lt;br /&gt;
The challenge of organising your cables behind your TV&amp;nbsp;is nothing compared to&amp;nbsp;that of a large computing cluster.&lt;br /&gt;
&lt;br /&gt;
One of our standard racks contains 12 Dell R510 servers (for storage) and 6 Dell C6100 chassis (providing 24 compute nodes). All 36 nodes are connected with a 10 Gb (SFP+), a 1 Gb (backup) and a 100 Mb (for IPMI) network cable, connecting to 3 different network switches at the top of the rack. In addition, the 18 &quot;boxes&quot; need a total of 36 power connections. That is a total of 144 cables per rack!&lt;br /&gt;
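&lt;br /&gt;
The cable count can be reproduced with a few lines of arithmetic. This is just a sketch of the sum above; the figures of four nodes per C6100 chassis and two power connections per box are inferred from the totals quoted (24 nodes from 6 chassis, 36 power connections for 18 boxes).&lt;br /&gt;

```python
R510_SERVERS = 12
C6100_CHASSIS = 6
NODES_PER_C6100 = 4          # inferred: 6 chassis provide 24 compute nodes
NETWORK_CABLES_PER_NODE = 3  # 10 Gb, 1 Gb backup and 100 Mb IPMI
POWER_PER_BOX = 2            # inferred: 18 boxes need 36 power connections

nodes = R510_SERVERS + C6100_CHASSIS * NODES_PER_C6100
boxes = R510_SERVERS + C6100_CHASSIS
network_cables = nodes * NETWORK_CABLES_PER_NODE
power_cables = boxes * POWER_PER_BOX
total_cables = network_cables + power_cables

print(f"{nodes} nodes in {boxes} boxes")
print(f"{network_cables} network + {power_cables} power = {total_cables} cables")
```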
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiP3nQmR75gm-awyG5aUmPiea3AIdGuUvLjl00UsTRpVI9gdirc4E8yfyA06LBwJI1wQg_IGhcZOno7mkhc4RsxUKxz7loO-cSTSdPUpT7nkser4EEsHhLbuTRTOTmRyMzR5URreG8g9os/s1600/photo.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;320&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiP3nQmR75gm-awyG5aUmPiea3AIdGuUvLjl00UsTRpVI9gdirc4E8yfyA06LBwJI1wQg_IGhcZOno7mkhc4RsxUKxz7loO-cSTSdPUpT7nkser4EEsHhLbuTRTOTmRyMzR5URreG8g9os/s320/photo.jpg&quot; width=&quot;218&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: left;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both;&quot;&gt;
How to cope? Separate the network cables from the power cables, which are a possible source of noise. Use different colour cables for the different types of traffic and add a unique ID number to each cable. Use loose, removable cable ties. When a cable breaks, don&#39;t remove it; just add a new cable.&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both;&quot;&gt;
The 10 Gb switches, in our case Dell S4810s, connect using four 40 Gb QSFP+ cables to two Dell Z9000 core switches. Having two core switches allows us to take one unit out of service without downtime (we use the VLT protocol and it works!). However, this does add cable complexity. The backup 1 Gb switches connect to each other in a daisy chain using 10 Gb CX4 cables, left over from before our 10 Gb upgrade. Finally, the ipmi switches connect to a front-end switch using 1000BASE-T (1 Gb) cables.&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGzbtfwgdGxkh-UH-AQyLMQ0qn4Ih5w5rOdI3SIXsZfLfWMNnYG55uiRGxezF_3VB-dy2SwALd3znpSpYGkRc9IiXbQd8DX4Fwnga8ApTo8RZuLNBZjDytlf_Q1eaMNtqXwgN8MaH5eGI/s1600/photo.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;276&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGzbtfwgdGxkh-UH-AQyLMQ0qn4Ih5w5rOdI3SIXsZfLfWMNnYG55uiRGxezF_3VB-dy2SwALd3znpSpYGkRc9IiXbQd8DX4Fwnga8ApTo8RZuLNBZjDytlf_Q1eaMNtqXwgN8MaH5eGI/s320/photo.jpg&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both;&quot;&gt;
The picture shows the inter-switch links. Visible are the orange 40 Gb connections and blue 10 Gb CX4 cables. In addition, each 40 Gb cable has an ID indicating which rack it came from and which core switch it is going to.&lt;/div&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: left;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJ30oy8VHJgR2jsxgLax4QOYyiaXDPWcsx0Z3JtKvNe5Q7iNfJehdONK5SzIXbrhPE-vQ7BAro_2yXpo9Y1Fuf_WPflbKEpD0ZoRu3mo0z6yOyGgs6YmkKtjLjjT5sD9or4VGArc36ilg/s1600/IMAG0112.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;320&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJ30oy8VHJgR2jsxgLax4QOYyiaXDPWcsx0Z3JtKvNe5Q7iNfJehdONK5SzIXbrhPE-vQ7BAro_2yXpo9Y1Fuf_WPflbKEpD0ZoRu3mo0z6yOyGgs6YmkKtjLjjT5sD9or4VGArc36ilg/s320/IMAG0112.jpg&quot; width=&quot;180&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: left;&quot;&gt;
We have one rack full of critical, world-facing servers. These servers need to be available at all times, making it very difficult to reorganise the cabling. As a result, over time, as we add and remove servers, the cabling becomes a mess. This is starting to become a risk! We are just going to have to accept some downtime to sort it out in the near future.&lt;/div&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: left;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/377205002143228247/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/377205002143228247' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/377205002143228247'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/377205002143228247'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2013/04/the-art-of-cabling.html' title='The art of cabling '/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/06832242228255061276</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiP3nQmR75gm-awyG5aUmPiea3AIdGuUvLjl00UsTRpVI9gdirc4E8yfyA06LBwJI1wQg_IGhcZOno7mkhc4RsxUKxz7loO-cSTSdPUpT7nkser4EEsHhLbuTRTOTmRyMzR5URreG8g9os/s72-c/photo.jpg" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-5568386501482389905</id><published>2013-04-15T10:30:00.000+01:00</published><updated>2013-04-15T10:32:02.878+01:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Benchmark"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="hpc"/><category scheme="http://www.blogger.com/atom/ns#" term="htc"/><category scheme="http://www.blogger.com/atom/ns#" term="KVM"/><category scheme="http://www.blogger.com/atom/ns#" term="virtualization"/><title type='text'>virtualization performance hit</title><content type='html'>&lt;br /&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
As in the rest of the world, there is a lot of discussion in GridPP about the use of clouds and virtualization.&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
http://www.gridpp.ac.uk/gridpp30/mcnab-lhcb-vmclouds-march-2013.pdf&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
http://www.admin-magazine.com/HPC/articles/the_cloud_s_role_in_hpc&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
Using virtualization will have a performance impact, so using it for our type of computing (HPC/HTC) may not be the best solution. However, just what impact does it have? A quick search of the web suggests anywhere between 3% and 30%. Most of the overhead appears to be in the kernel and in I/O.&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
http://serverfault.com/questions/261974/how-much-overhead-does-x86-x64-virtualization-have&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
http://www.anandtech.com/show/3827/virtualization-ask-the-experts-1&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
I decided that I wanted to do some of my own tests with the focus on the type of work we do in gridpp.&amp;nbsp;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
Testbed: a 24-thread Westmere processor running at 2.66 GHz plus 48 GB of memory, using Scientific Linux 6.3 (basically RHEL6). I&#39;m using the default install of KVM, with the virtual image as a local file and the guest set up to use all 24 threads.&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
Benchmarks: 1) I unpack and make the ROOT analysis package using 24 threads; 2) as 1 but using only one thread; 3) I generate 500,000 Monte Carlo events using the HERWIG++ generator; 4) as 3 but I also include the time taken to unpack and install HERWIG++; 5) I run the HEP-SPEC06 benchmark. For tests 1 to 4 I use the time command to obtain the real time taken (smaller is better); for 5 I report the HEP-SPEC score (larger is better). I run the benchmarks on the bare metal install and on the VM on the same hardware and compare the results.&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
Results:&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6xb2sH8WKvYGs899Mf-7dvCENrYzmNgQDtahyL1db3QOnzZgg47MuIQHjAk9Mg3-UnNERHMHb-ouVFmVRlqpxternEvm5JgMpwYY8gd04JA0iENotw6s-oli-GuI-A8gCT5tZX6PrQLs/s1600/benchmark1.001.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;180&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6xb2sH8WKvYGs899Mf-7dvCENrYzmNgQDtahyL1db3QOnzZgg47MuIQHjAk9Mg3-UnNERHMHb-ouVFmVRlqpxternEvm5JgMpwYY8gd04JA0iENotw6s-oli-GuI-A8gCT5tZX6PrQLs/s320/benchmark1.001.jpg&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhT8j2AdRRiUI2De6Aw9KH1QMmJ1PDswewizaFP8-xFhViTtrpxWFYNGHaSNm2-JYY8Wq_2FQ_G9If0Fa79noAD3AKAocGZPapiYOOcoBI6TzzeK2Fg1AkU7VmlD_YjP61cTXTxQJYRLFg/s1600/benchmark2.001.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;180&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhT8j2AdRRiUI2De6Aw9KH1QMmJ1PDswewizaFP8-xFhViTtrpxWFYNGHaSNm2-JYY8Wq_2FQ_G9If0Fa79noAD3AKAocGZPapiYOOcoBI6TzzeK2Fg1AkU7VmlD_YjP61cTXTxQJYRLFg/s320/benchmark2.001.jpg&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
Out of the box, KVM gives a ~3% (CPU intensive) to 20% (system call intensive) reduction in performance. There is some indication of a correlation with the ratio of sys time to user time (a particular effect with make/tar/gzip?). This is not seen in the HEP-SPEC result. Sys time is the CPU time spent within the kernel, and from previous studies we expect this to incur a high performance hit under virtualization.&lt;/div&gt;
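&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
For anyone repeating the comparison, the percentages above can be derived from the raw measurements roughly as follows. This is a minimal sketch: the function names are my own, and the numbers in the example are placeholders chosen only to show the arithmetic, not measured results.&lt;/div&gt;

```python
def slowdown_percent(bare_metal_seconds, vm_seconds):
    """Extra wall-clock time on the VM, as a percentage of the bare metal time."""
    return 100.0 * (vm_seconds - bare_metal_seconds) / bare_metal_seconds


def score_loss_percent(bare_metal_score, vm_score):
    """Drop in a larger-is-better score (e.g. HEP-SPEC06) on the VM, in percent."""
    return 100.0 * (bare_metal_score - vm_score) / bare_metal_score


def sys_user_ratio(sys_seconds, user_seconds):
    """Ratio of kernel (sys) CPU time to user CPU time, as reported by time."""
    return sys_seconds / user_seconds


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not measurements.
    print(slowdown_percent(100.0, 120.0))
    print(score_loss_percent(200.0, 194.0))
    print(sys_user_ratio(10.0, 40.0))
```

&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
Plotting the slowdown against the sys/user ratio for each benchmark is one way to look for the correlation suggested above.&lt;/div&gt;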
&lt;div style=&quot;font-family: Helvetica; font-size: 12px; min-height: 14px;&quot;&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style=&quot;font-family: Helvetica; font-size: 12px;&quot;&gt;
If I get the time, I intend to repeat the analysis using optimisations (e.g. guest image on LVM), repeat it using Fedora 18 (~RHEL 7) and on a Sandy Bridge CPU, and look at network performance (e.g. iozone with Lustre).&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/5568386501482389905/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/5568386501482389905' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5568386501482389905'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5568386501482389905'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2013/04/like-rest-of-world-is-lot-of-discussion.html' title='virtualization performance hit'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/06832242228255061276</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6xb2sH8WKvYGs899Mf-7dvCENrYzmNgQDtahyL1db3QOnzZgg47MuIQHjAk9Mg3-UnNERHMHb-ouVFmVRlqpxternEvm5JgMpwYY8gd04JA0iENotw6s-oli-GuI-A8gCT5tZX6PrQLs/s72-c/benchmark1.001.jpg" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-7734464812929614970</id><published>2013-04-10T14:24:00.000+01:00</published><updated>2013-04-10T14:24:22.632+01:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="atlas"/><category scheme="http://www.blogger.com/atom/ns#" term="cern"/><category scheme="http://www.blogger.com/atom/ns#" term="cloud"/><category scheme="http://www.blogger.com/atom/ns#" term="CloudStack"/><category scheme="http://www.blogger.com/atom/ns#" term="lustre"/><category scheme="http://www.blogger.com/atom/ns#" term="OpenNebula"/><category 
scheme="http://www.blogger.com/atom/ns#" term="opennms"/><category scheme="http://www.blogger.com/atom/ns#" term="OpenStack"/><category scheme="http://www.blogger.com/atom/ns#" term="puppet"/><category scheme="http://www.blogger.com/atom/ns#" term="qmul"/><category scheme="http://www.blogger.com/atom/ns#" term="razor"/><title type='text'>The Queen Mary Grid Cluster</title><content type='html'>&lt;br /&gt;
The QMUL grid/HTC cluster is a high-throughput computing (HTC) research cluster based at Queen Mary, University of London. We primarily serve the scientific grid community and are funded by the GridPP collaboration (i.e. the UK&#39;s STFC research council). By high throughput we mean the ability to run large numbers of separate, independent jobs. Our main workload is data analysis for the ATLAS experiment at CERN. We are the top site in the UK for this type of work, and one of the leading sites for the ATLAS LHC experiment in the world. We are part of the LondonGrid (hence the post to this blog!).&lt;br /&gt;
&lt;br /&gt;
Our cluster comprises:&lt;br /&gt;
&lt;br /&gt;
For running the actual jobs:&lt;br /&gt;
30 Dell C6100s using X5650 processors, contributing a total of 2880 job slots, and&lt;br /&gt;
60 older Streamline nodes using E5420 processors, contributing a total of 480 job slots.&lt;br /&gt;
&lt;br /&gt;
For storage we run the Lustre parallel file system using:&lt;br /&gt;
72 Dell R510s with 1800 TB of disk, and&lt;br /&gt;
12 older Dell 1950s with MD1000 disk arrays, with 360 TB of disk.&lt;br /&gt;
Our usable capacity is about 1600 TB once RAID 6 parity overhead and &quot;real&quot; disk sizes are taken into account.&lt;br /&gt;
&lt;br /&gt;
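As a rough cross-check of the figures above, the sketch below adds up the job slots and estimates the usable storage. The 12-disk RAID 6 set size is an assumption made for illustration (it is not stated in this post), as is the reading of &quot;real&quot; disk sizes as the decimal TB versus binary TiB difference; the function names are made up for the example.&lt;br /&gt;

```python
# Back-of-envelope check of the figures quoted in this post.
# ASSUMPTION (not from the post): each RAID 6 set has 12 disks, 2 of them parity.
DISKS_PER_RAID_SET = 12
PARITY_DISKS = 2


def job_slots():
    # 2880 slots from the C6100s plus 480 from the older nodes
    return 2880 + 480


def raw_tb():
    # 1800 TB on the R510s plus 360 TB on the older disk arrays
    return 1800 + 360


def usable_tib(raw):
    # RAID 6 keeps (n - 2) / n of each set for data
    after_raid = raw * (DISKS_PER_RAID_SET - PARITY_DISKS) / DISKS_PER_RAID_SET
    # ASSUMPTION: disks are sized in decimal TB (10**12 bytes),
    # while the file system reports binary TiB (2**40 bytes)
    return after_raid * 10**12 / 2**40


if __name__ == "__main__":
    print("job slots:", job_slots())
    print("raw TB:", raw_tb())
    print("usable TiB (estimate):", round(usable_tib(raw_tb())))
```

With those assumptions the estimate lands a little above 1600, in line with the figure quoted above; a different RAID set size would shift it.&lt;br /&gt;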
We have a lot of development work to do over the next year, which I hope to describe in this blog over the coming month, including:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;A new monitoring system, probably based on OpenNMS.&lt;/li&gt;
&lt;li&gt;A new deployment system to replace our hand-made Perl/Mason/Kickstart system, probably using Razor and Puppet.&lt;/li&gt;
&lt;li&gt;A cloud stack. We&#39;ve been doing scientific computing using grid software, but this model of computing is likely to be replaced by a cloud-type model, so we will need to look at the various options (OpenStack, CloudStack or OpenNebula).&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhO_i8o821OLRldu2FdK_aeWeuV4ghVW7lASHW3S16df-LwK9CExj0R5hx-g9bCLNIHWBMQWcNfpYjtxwZd8cJVKkMzKlb5yykMvNqFRGG-lNVvEYci-rV8pjz2PB94tyQoYsVCrsltPMY/s1600/IMAG0106.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhO_i8o821OLRldu2FdK_aeWeuV4ghVW7lASHW3S16df-LwK9CExj0R5hx-g9bCLNIHWBMQWcNfpYjtxwZd8cJVKkMzKlb5yykMvNqFRGG-lNVvEYci-rV8pjz2PB94tyQoYsVCrsltPMY/s1600/IMAG0106.jpg&quot; height=&quot;320&quot; width=&quot;180&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div style=&quot;text-align: center;&quot;&gt;
The 11 racks of the QMUL cluster&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/7734464812929614970/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/7734464812929614970' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/7734464812929614970'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/7734464812929614970'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2013/04/the-queen-mary-grid-cluster.html' title='The Queen Mary Grid Cluster'/><author><name>Anonymous</name><uri>http://www.blogger.com/profile/06832242228255061276</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhO_i8o821OLRldu2FdK_aeWeuV4ghVW7lASHW3S16df-LwK9CExj0R5hx-g9bCLNIHWBMQWcNfpYjtxwZd8cJVKkMzKlb5yykMvNqFRGG-lNVvEYci-rV8pjz2PB94tyQoYsVCrsltPMY/s72-c/IMAG0106.jpg" height="72" width="72"/><thr:total>0</thr:total><georss:featurename>London Borough of Tower Hamlets, London E1 4NS, UK</georss:featurename><georss:point>51.523410299999988 -0.0405322999999953</georss:point><georss:box>51.520940299999985 -0.0455747999999953 51.52588029999999 -0.035489799999995304</georss:box></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-2468187611342108877</id><published>2011-03-11T12:41:00.003+00:00</published><updated>2011-03-11T12:44:07.181+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Alces"/><category scheme="http://www.blogger.com/atom/ns#" term="cluster"/><category scheme="http://www.blogger.com/atom/ns#" 
term="Dell"/><category scheme="http://www.blogger.com/atom/ns#" term="GridPP"/><category scheme="http://www.blogger.com/atom/ns#" term="newton"/><category scheme="http://www.blogger.com/atom/ns#" term="RHUL"/><title type='text'>RHUL cluster expands</title><content type='html'>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;http://farm6.static.flickr.com/5051/5516797353_5400b5b13c_m.jpg&quot;&gt;&lt;img style=&quot;float: right; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 180px; height: 240px;&quot; src=&quot;http://farm6.static.flickr.com/5051/5516797353_5400b5b13c_m.jpg&quot; alt=&quot;&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;Yesterday, RHUL took delivery of new storage and compute nodes to beef up its Tier2 cluster.&lt;br /&gt;The GridPP and CIF funded kit was supplied by Dell and is being installed and configured by Alces.&lt;br /&gt;The extra 6.3 kHS06 and 420 TB will more than double the capacity of cluster.&lt;br /&gt;Once  the installation is complete and accepted, work to integrate it with  the existing cluster and bring up the gLite services will begin.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/2468187611342108877/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/2468187611342108877' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2468187611342108877'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2468187611342108877'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2011/03/rhul-cluster-expands.html' title='RHUL cluster expands'/><author><name>Simon 
George</name><uri>http://www.blogger.com/profile/10363160113556218890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgh_bogjU-Lc1R2vUKgJsehblTLNHVFSGYEXAaMIhdQQ6Y85_hky_8VMSdSn0AvJ3JpIGyKbl50Ef4OdCCS52uaZ8i6BiVoGmTfKpSpFd7E_ArWH8_pf7ZkRTlcZvdCuw/s220/Simon2_small.jpg'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://farm6.static.flickr.com/5051/5516797353_5400b5b13c_t.jpg" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-1936649762288487674</id><published>2010-02-19T17:29:00.002+00:00</published><updated>2010-02-19T17:40:34.034+00:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cluster"/><category scheme="http://www.blogger.com/atom/ns#" term="Clustervision"/><category scheme="http://www.blogger.com/atom/ns#" term="IC"/><category scheme="http://www.blogger.com/atom/ns#" term="move"/><category scheme="http://www.blogger.com/atom/ns#" term="newton"/><category scheme="http://www.blogger.com/atom/ns#" term="RHUL"/><title type='text'>RHUL &#39;Newton&#39; cluster comes home</title><content type='html'>After two years hosted by Imperial College, our &#39;Newton&#39; Grid computing cluster has finally been relocated to Royal Holloway&#39;s new state-of-the-art computer centre. The move was carried out by &lt;a href=&quot;http://www.clustervision.com/&quot; target=&quot;_top&quot;&gt;Clustervision&lt;/a&gt; and everything went smoothly.  
Before the cluster goes back into production, analysing LHC data, a software upgrade to SL5 is planned.&lt;br /&gt;&lt;br /&gt;A small part of Newton remains at IC: the racks were donated to become part of the particle physics cluster.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/1936649762288487674/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/1936649762288487674' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/1936649762288487674'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/1936649762288487674'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2010/02/rhul-newton-cluster-comes-home.html' title='RHUL &#39;Newton&#39; cluster comes home'/><author><name>Simon George</name><uri>http://www.blogger.com/profile/10363160113556218890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgh_bogjU-Lc1R2vUKgJsehblTLNHVFSGYEXAaMIhdQQ6Y85_hky_8VMSdSn0AvJ3JpIGyKbl50Ef4OdCCS52uaZ8i6BiVoGmTfKpSpFd7E_ArWH8_pf7ZkRTlcZvdCuw/s220/Simon2_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-2138030839740954842</id><published>2009-07-31T12:14:00.028+01:00</published><updated>2009-08-03T16:08:54.719+01:00</updated><title type='text'>Comparing ATLAS analysis at RHUL using the file-staging and RFIO approaches</title><content type='html'>I have been looking at the performance of the Royal Holloway cluster during Hammercloud tests in which data was accessed directly from the DPM pool nodes using the RFIO protocol and comparing it to the recent UK-wide file-staging test 
(&lt;a href=&quot;http://gangarobot.cern.ch/hc/540/test/&quot;&gt;540&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;For the RFIO  approach  two identical tests (&lt;a href=&quot;http://gangarobot.cern.ch/hc/537/test/&quot;&gt;537&lt;/a&gt; and &lt;a href=&quot;http://gangarobot.cern.ch/hc/538/test/&quot;&gt;538&lt;/a&gt;) were requested in order to ensure enough jobs arrived on site. The RFIO IOBUFSIZE was set to 4KB.  Job CPU efficiencies and cluster throughput (the product of number of running jobs and average job efficiency) were extracted using  Sam and Dug&#39;s script. The job throughput climbed steadily up to a peak at around 320 running jobs. At this point the throughput started to decline probably compounded by the fact that one of the disk servers lost a disk and became over-loaded.&lt;br /&gt;&lt;br /&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj36BgQJedWeHp8PIZc1CLQSD8guZOke3JnWP9EofknESgo60hkatxgOe64nsH4C0CJHPryLUpjhY-UlwlwiDCzTidklol-nO6ji2S4tReQ2UiWiB3DO0ehYKQN4zqDRqRvrgt6wUnc1r0/s1600-h/test537-538-thrpt-2.png&quot;&gt;&lt;img style=&quot;margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 278px;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj36BgQJedWeHp8PIZc1CLQSD8guZOke3JnWP9EofknESgo60hkatxgOe64nsH4C0CJHPryLUpjhY-UlwlwiDCzTidklol-nO6ji2S4tReQ2UiWiB3DO0ehYKQN4zqDRqRvrgt6wUnc1r0/s320/test537-538-thrpt-2.png&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5365698111053006898&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;The CPU efficiency  declined relatively consistently as the number of running jobs increased:&lt;br /&gt;&lt;br /&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; 
href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2h5sBLvVRBZdO1pCKjzO74tVuVEP6d2oMMncHa6-yyWxwBZduRihwD4dSqOA8O5G_U5dq2CmhSSBeJX7OcRqm3q6Kx5f1iArz-JP3EhGrdeXYkHhJ-xAAgmrx6n4He4tbMvkGjLddfUg/s1600-h/test-537-538-eff-1.png&quot;&gt;&lt;img style=&quot;margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 312px;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2h5sBLvVRBZdO1pCKjzO74tVuVEP6d2oMMncHa6-yyWxwBZduRihwD4dSqOA8O5G_U5dq2CmhSSBeJX7OcRqm3q6Kx5f1iArz-JP3EhGrdeXYkHhJ-xAAgmrx6n4He4tbMvkGjLddfUg/s320/test-537-538-eff-1.png&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5365694489141732722&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;Each job was reading data at about 1 MB/s so that at the peak the total bandwidth was around 350 MB/s - roughly 30 MB/s per disk server. The disk servers were working hard, however, the iostat %util values were around 100% with high cpu iowait values.&lt;br /&gt;&lt;br /&gt;So how do these results compare to those obtained when staging files to the worker node prior to analysis? 
This graph shows the same RFIO throughput  data together with results from the recently run file-staging test:&lt;br /&gt;&lt;br /&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5d6L-tXQVj01FPp3ACZYZ2Ezz7N2Gb-kccsqdcBSgTpQnR5_64b24XLasgfdoTn3oPH4sZTX57W6f4RM2PCWgqxIOUVBacO2bWTOosfvHkyTQwZNhhm7DJay3idxVuAYSHk8FDXXFgKE/s1600-h/rfio-filestage-thrpt-1.png&quot;&gt;&lt;img style=&quot;margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 223px;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5d6L-tXQVj01FPp3ACZYZ2Ezz7N2Gb-kccsqdcBSgTpQnR5_64b24XLasgfdoTn3oPH4sZTX57W6f4RM2PCWgqxIOUVBacO2bWTOosfvHkyTQwZNhhm7DJay3idxVuAYSHk8FDXXFgKE/s320/rfio-filestage-thrpt-1.png&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5365693213878740930&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The throughput during file-staging leveled off  earlier - at around 175 running jobs. 
Similarly the average job efficiency drops more steeply:&lt;br /&gt;&lt;br /&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3Yl95J9E8cOaqKLWg_yaL4-yTXThiHp-NrfpgCebIvCvhswPOvjCQWAddkB273aoazyHV6R0hVu86ovDUMxKx7eFe-HxrSfCGseFe_BrU-8KFkXWNvvsDYqWAUCBR63VsGMQtKorbmqc/s1600-h/rfio-filestage-eff-1.png&quot;&gt;&lt;img style=&quot;margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 207px;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3Yl95J9E8cOaqKLWg_yaL4-yTXThiHp-NrfpgCebIvCvhswPOvjCQWAddkB273aoazyHV6R0hVu86ovDUMxKx7eFe-HxrSfCGseFe_BrU-8KFkXWNvvsDYqWAUCBR63VsGMQtKorbmqc/s320/rfio-filestage-eff-1.png&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5365693353147830434&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;The job failure rate for the RFIO tests was  4% compared to 17% for the file-staging test.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/2138030839740954842/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/2138030839740954842' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2138030839740954842'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2138030839740954842'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2009/07/comparing-atlas-analysis-at-rhul-using.html' title='Comparing ATLAS analysis at RHUL using the file-staging and RFIO approaches'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj36BgQJedWeHp8PIZc1CLQSD8guZOke3JnWP9EofknESgo60hkatxgOe64nsH4C0CJHPryLUpjhY-UlwlwiDCzTidklol-nO6ji2S4tReQ2UiWiB3DO0ehYKQN4zqDRqRvrgt6wUnc1r0/s72-c/test537-538-thrpt-2.png" height="72" width="72"/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-5787450502301880601</id><published>2009-05-13T19:55:00.004+01:00</published><updated>2009-05-13T20:06:24.212+01:00</updated><title type='text'>RHUL getting good rates into MCDISK from RAL</title><content type='html'>&lt;div style=&quot;text-align: justify;&quot;&gt;RHUL has regularly got good rates and by that I mean 80-100 MB/s from Fermilab when downloading CMS data. It nice now to see similarly high rates downloading ATLAS data into the MCDISK space token from RAL.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIgIPmK3KhIARpeT_BS677wAmlFNmATcLX_2UlzpRkUTBs8g_hu6By_Ws2LqWEuwc4q19rWN4xNEkUW4ghJjO03cj3EByDHT8qJfD23UiHOgMsZzLEhRsbX5lzzD67Ivce-YAkh6YjaII/s1600-h/ral-rhul.png&quot;&gt;&lt;img style=&quot;margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 124px;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIgIPmK3KhIARpeT_BS677wAmlFNmATcLX_2UlzpRkUTBs8g_hu6By_Ws2LqWEuwc4q19rWN4xNEkUW4ghJjO03cj3EByDHT8qJfD23UiHOgMsZzLEhRsbX5lzzD67Ivce-YAkh6YjaII/s320/ral-rhul.png&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5335384959008422946&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/5787450502301880601/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/5787450502301880601' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5787450502301880601'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5787450502301880601'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2009/05/rhul-getting-good-rates-into-mcdisk.html' title='RHUL getting good rates into MCDISK from RAL'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIgIPmK3KhIARpeT_BS677wAmlFNmATcLX_2UlzpRkUTBs8g_hu6By_Ws2LqWEuwc4q19rWN4xNEkUW4ghJjO03cj3EByDHT8qJfD23UiHOgMsZzLEhRsbX5lzzD67Ivce-YAkh6YjaII/s72-c/ral-rhul.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-1925143640107016877</id><published>2008-04-16T13:44:00.003+01:00</published><updated>2008-04-16T13:57:31.799+01:00</updated><title type='text'>Exercised space token creation at UCL-HEP</title><content type='html'>Thought it was neat to give it a try and created as a test a small reservation for dteam, following the instructions on the LCG Twiki. All went well and all the tweaks for SL3 / gLite 3.0 worked well. Only oddity was that: &lt;pre&gt;[root@pc55 root]# dpm-reservespace --gspace 10M --lifetime Inf --group lcgdteam --token_desc dteam_10M&lt;br /&gt;send2nsd: NS009 - fatal configuration error: Host unknown: UNUSED&lt;br /&gt;invalid group: lcgdteam&lt;/pre&gt;but: &lt;pre&gt;[root@pc55 root]# dpm-reservespace --gspace 10M --lifetime Inf --gid 2688 --token_desc dteam_10M&lt;/pre&gt;worked well. 
Perhaps this is because the group id is not the same as the VO name? (I also tried &#39;dteam&#39; in place of &#39;lcgdteam&#39;, but got the same error.)</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/1925143640107016877/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/1925143640107016877' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/1925143640107016877'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/1925143640107016877'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2008/04/exercised-space-token-creation-at-ucl.html' title='Exercised space token creation at UCL-HEP'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-5814181703450295735</id><published>2008-03-26T14:41:00.003+00:00</published><updated>2008-03-26T15:19:05.357+00:00</updated><title type='text'>RHUL aircon problems</title><content type='html'>Our machine room aircon system broke down last week and the temperatures have been all over the place.&lt;br /&gt;&lt;br /&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8OaBmJZiRmD3gSTbv9uO2bK1ZQQkgEPgR-yxgE852KKR90VmjfcnKQ0L3XxHGeRyo7yZNou22ZU3DLFynDyQSozBtZbRISxeF1exckCpEtr06-69NTlT8-VBmx8yI_c1T4zCotL4vFTwh/s1600-h/last_month.png&quot;&gt;&lt;img style=&quot;margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;&quot; 
src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8OaBmJZiRmD3gSTbv9uO2bK1ZQQkgEPgR-yxgE852KKR90VmjfcnKQ0L3XxHGeRyo7yZNou22ZU3DLFynDyQSozBtZbRISxeF1exckCpEtr06-69NTlT8-VBmx8yI_c1T4zCotL4vFTwh/s320/last_month.png&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5182069525644667330&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;After a few days of summer clothing and a few nights of temperature alarms, it was diagnosed to be a refrigerant gas leak  from  the chiller on the roof. The bad news is that this takes 2 weeks to fix. Luckily the estates engineer was very efficient and organised the delivery and connection of a backup chiller on the last day of  term, then personally looked  in over Easter to keep an eye on it.&lt;br /&gt;&lt;br /&gt;It has been stable the last few days so I&#39;ve just brought the cluster back up. The site will come out of downtime this evening.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/5814181703450295735/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/5814181703450295735' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5814181703450295735'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5814181703450295735'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2008/03/rhul-aircon-problems.html' title='RHUL aircon problems'/><author><name>Simon George</name><uri>http://www.blogger.com/profile/10363160113556218890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='//blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgh_bogjU-Lc1R2vUKgJsehblTLNHVFSGYEXAaMIhdQQ6Y85_hky_8VMSdSn0AvJ3JpIGyKbl50Ef4OdCCS52uaZ8i6BiVoGmTfKpSpFd7E_ArWH8_pf7ZkRTlcZvdCuw/s220/Simon2_small.jpg'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8OaBmJZiRmD3gSTbv9uO2bK1ZQQkgEPgR-yxgE852KKR90VmjfcnKQ0L3XxHGeRyo7yZNou22ZU3DLFynDyQSozBtZbRISxeF1exckCpEtr06-69NTlT8-VBmx8yI_c1T4zCotL4vFTwh/s72-c/last_month.png" height="72" width="72"/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-5539324308011814857</id><published>2007-08-07T16:15:00.000+01:00</published><updated>2007-08-07T16:27:55.330+01:00</updated><title type='text'>UCL-HEP APEL accounting fixed</title><content type='html'>After upgrading to gLite r27 on the 4th of July, APEL stopped publishing to the central RGMA registry. The apel-publisher failed with a not handled&lt;br /&gt;&lt;pre&gt;RGMABufferFullException&lt;/pre&gt;To fix this, we had to update to the latest version of the APEL rpm&#39;s (2.0.5-1) on the MON and CE and re-run YAIM on both</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/5539324308011814857/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/5539324308011814857' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5539324308011814857'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5539324308011814857'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/08/ucl-hep-apel-accounting-fixed.html' title='UCL-HEP APEL accounting fixed'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-3726395956612776373</id><published>2007-07-20T14:00:00.000+01:00</published><updated>2007-07-20T14:01:31.224+01:00</updated><title type='text'>Imperial SE - dCache removed ~30TB of CMS data</title><content type='html'>As requested by CMS users, this week we have cleaned up around ~30TB (orphaned files) of CMS data from IC dCache. We need to understand why so many orphaned files are generated in dCache.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/3726395956612776373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/3726395956612776373' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/3726395956612776373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/3726395956612776373'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/07/imperial-se-dcache-removed-30tb-of-cms_20.html' title='Imperial SE - dCache removed ~30TB of CMS data'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-387086785548007204</id><published>2007-07-20T13:19:00.001+01:00</published><updated>2007-07-20T13:24:09.492+01:00</updated><title type='text'>Brunel SE running DPM 1.6.5</title><content type='html'>We were having problems with the storage element at Brunel so I upgraded it to DPM version 1.6.5 
(via 1.6.3) this week. The upgrade didn&#39;t go totally smoothly but now things seem a lot better. Thanks to Greig for his usual excellent support.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/387086785548007204/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/387086785548007204' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/387086785548007204'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/387086785548007204'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/07/brunel-se-running-dpm-165.html' title='Brunel SE running DPM 1.6.5'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-8020664768387399936</id><published>2007-07-20T12:46:00.000+01:00</published><updated>2007-07-20T13:18:22.413+01:00</updated><title type='text'>Brunel running SL4 cluster</title><content type='html'>The worker nodes of dgc-grid-40 are now running the glite worker node release on SL4. It is passing the ops SAM tests and the VO tests that have run recently. There was a problem with LHCb production jobs trying to use edg-brokerinfo rather than glite-brokerinfo which I reported and they have now fixed. CMS production jobs have also completed successfully. Steve Lloyd&#39;s ATLAS tests pass apart from the &#39;New Package&#39; part. 
Steve&#39;s comment was &quot;My tests are still running release 12.0.6 for which the requirement is SL3 so they shouldn&#39;t really go into SL4 machines...this problem will go away when I switch to release 13.0.X as that&#39;s supposed to work on SL4&quot;. ATLAS production jobs seem to run OK but there seems to be a problem copying the output files back.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/8020664768387399936/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/8020664768387399936' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/8020664768387399936'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/8020664768387399936'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/07/brunel-running-sl4-cluster.html' title='Brunel running SL4 cluster'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-1899162859683472354</id><published>2007-07-20T12:32:00.000+01:00</published><updated>2007-07-20T12:45:45.295+01:00</updated><title type='text'>RHUL accounting problem</title><content type='html'>There was a problem with the apel accounting at RHUL this week:&lt;br /&gt;&lt;br /&gt;ZoneInfo: /usr/java/j2sdk1.4.2_12/jre/lib/zi/ZoneInfoMappings (Too&lt;br /&gt;many open files)&lt;br /&gt;Thu Jul 19 00:35:06 GMT 2007: apel-pbs-log-parser - WARNING -&lt;br /&gt;Exception opening file /var/spool/PBS/server_priv/accounting/20070713&lt;br /&gt;java.io.FileNotFoundException:&lt;br 
/&gt;/var/spool/PBS/server_priv/accounting/20070713 (Too many open files)&lt;br /&gt;&lt;br /&gt;We solved it by moving some of the files out of /var/spool/PBS/server_priv/accounting.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/1899162859683472354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/1899162859683472354' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/1899162859683472354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/1899162859683472354'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/07/rhul-accounting-problem.html' title='RHUL accounting problem'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-5647096105150851679</id><published>2007-06-26T05:53:00.000+01:00</published><updated>2007-06-26T05:56:23.216+01:00</updated><title type='text'>bdii counts</title><content type='html'>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRjAQ6ayjedEEgIg0FAWUHoZZ6IgcSx2X4Ay81lNw-QZBZCfCQLcrrXHCkcFv4sABikdFF8YggMhcP4lzBIiv84cIoP-UCe7674XKVm9ZvFAaXxgaIFh52pXZA4zl-CJ0CZuiGkxOzP_uT/s1600-h/bdii-counts.png&quot;&gt;&lt;img style=&quot;display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;&quot; 
src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRjAQ6ayjedEEgIg0FAWUHoZZ6IgcSx2X4Ay81lNw-QZBZCfCQLcrrXHCkcFv4sABikdFF8YggMhcP4lzBIiv84cIoP-UCe7674XKVm9ZvFAaXxgaIFh52pXZA4zl-CJ0CZuiGkxOzP_uT/s400/bdii-counts.png&quot; border=&quot;0&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5080232340810957954&quot; /&gt;&lt;/a&gt;&lt;br /&gt;I promised to monitor the bdii. This is a plot of the bdii count from a while ago. I&#39;ll have to redo it for a longer period. It seems clear that it is not the entire site bdii that disappears but only individual entries, which is very probably correlated with load. We have seen the same thing with the ce mds.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/5647096105150851679/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/5647096105150851679' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5647096105150851679'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5647096105150851679'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/06/bdii-counts.html' title='bdii counts'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRjAQ6ayjedEEgIg0FAWUHoZZ6IgcSx2X4Ay81lNw-QZBZCfCQLcrrXHCkcFv4sABikdFF8YggMhcP4lzBIiv84cIoP-UCe7674XKVm9ZvFAaXxgaIFh52pXZA4zl-CJ0CZuiGkxOzP_uT/s72-c/bdii-counts.png" height="72" 
width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-3786284285836533011</id><published>2007-06-19T10:28:00.000+01:00</published><updated>2007-06-19T10:35:31.812+01:00</updated><title type='text'>RB very slow</title><content type='html'>Yesterday I was wrestling with our RB. It was taking several hours for a job to go from waiting to scheduled, which means that the matchmaking process was overloaded. I think the reason was that the database was very big (4GB), exactly 2^32 bytes. As suggested &lt;a href=&quot;http://www.gridpp.ac.uk/wiki/IC-HEP#RB_problems&quot;&gt;here&lt;/a&gt;, I cleaned the database and it seems better now. The problem is that I never got to the root of what was going wrong...</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/3786284285836533011/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/3786284285836533011' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/3786284285836533011'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/3786284285836533011'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/06/rb-very-slow.html' title='RB very slow'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-2280893804332421492</id><published>2007-06-19T10:22:00.000+01:00</published><updated>2007-06-19T10:27:56.316+01:00</updated><title type='text'>dCache failures (dcache-server-1.7.0-36)</title><content type='html'>Again this morning we have 
pools going down with a memory allocation problem:&lt;br /&gt;--&lt;br /&gt;06/19 00:45:58 Cell(sedsk01_5@sedsk01Domain) : Thread : ping got : java.lang.OutOfMemoryError: Java heap space&lt;br /&gt;--&lt;br /&gt;I think the only way we will solve this will be to get hold of a dCache developer who can have a look. We clearly did not have this problem when we were running the previous version (release 35).</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/2280893804332421492/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/2280893804332421492' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2280893804332421492'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2280893804332421492'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/06/dcache-failures-dcache-server-170-36.html' title='dCache failures (dcache-server-1.7.0-36)'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-4983196248231363165</id><published>2007-06-18T13:25:00.000+01:00</published><updated>2007-06-18T13:28:01.631+01:00</updated><title type='text'>dCache pools went down</title><content type='html'>From Friday afternoon several dCache pools went down. 
It ran out of memory, and here is the content of the sedsk01Domain.log file.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-size:78%;&quot;&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) :  at java.lang.Thread.run(Thread.java:595)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) : Storing incomplete file : 0003000000000000006E0B80 with 2756018417&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) : Stacked Exception (Original) for : 0003000000000000006E0B80 &lt;-P---------(0)[0]&gt; 2756018417 si={cms:cms} : CacheException(rc=10006;msg=Pnfs request timed out)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) : Stacked Throwable (Resulting) for : 0003000000000000006E0B80 &lt;-P---------(0)[0]&gt; 2756018417 si={cms:cms} : CacheException(rc=33;msg=Illegal State Transition -P-------- -&gt; -P--------)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) : CacheException(rc=33;msg=Illegal State Transition -P-------- -&gt; -P--------)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) :  at diskCacheV111.repository.CacheRepository2$CacheEntry.setPrimaryState(CacheRepository2.java:107)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) :  at diskCacheV111.repository.CacheRepository2$CacheEntry.setPrecious(CacheRepository2.java:219)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) :  at diskCacheV111.repository.CacheRepository2$CacheEntry.setPrecious(CacheRepository2.java:215)&lt;/span&gt;&lt;br 
/&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) :  at diskCacheV111.pools.MultiProtocolPool2$RepositoryIoHandler.run(MultiProtocolPool2.java:1538)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) :  at diskCacheV111.util.SimpleJobScheduler$SJob.run(SimpleJobScheduler.java:64)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:32:13 Cell(sedsk01_1@sedsk01Domain) :  at java.lang.Thread.run(Thread.java:595)&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:35:02 Cell(c-100@sedsk01Domain) : runIO : java.lang.OutOfMemoryError: Java heap space&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:35:02 Cell(c-100@sedsk01Domain) : java.lang.OutOfMemoryError: Java heap space&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:35:02 Cell(c-100@sedsk01Domain) : java.lang.OutOfMemoryError: Java heap space&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: courier new;&quot;&gt;06/15 16:38:25 Cell(c-100@sedsk01Domain) : runIO : java.lang.OutOfMemoryError: Java heap space&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;dCache is started with those parameters:&lt;br /&gt; -server -Xmx512m -XX:MaxDirectMemorySize=512m&lt;br /&gt;&lt;br /&gt;We don&#39;t know what happened.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/4983196248231363165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/4983196248231363165' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/4983196248231363165'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4517963319103426980/posts/default/4983196248231363165'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/06/dcache-pools-went-down.html' title='dCache pools went down'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-7905908706407043898</id><published>2007-06-15T03:47:00.000+01:00</published><updated>2007-06-15T03:52:31.561+01:00</updated><title type='text'>Dataset access problem at IC-HEP</title><content type='html'>Some users are experiencing dataset access problems at IC-HEP. The ticket in question is GGUS 22106. The difficulty is that our own cms users don&#39;t see the problem with the same dataset.&lt;br /&gt;This raises the question of how to debug such problems when you don&#39;t have the affected users on hand. 
In this case the only solution will be to debug it interactively with the user.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/7905908706407043898/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/7905908706407043898' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/7905908706407043898'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/7905908706407043898'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/06/dataset-access-problem-at-ic-hep.html' title='Dataset access problem at IC-HEP'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-6332852403826164252</id><published>2007-06-15T03:28:00.000+01:00</published><updated>2007-06-15T03:47:15.891+01:00</updated><title type='text'>SAM Failures in London</title><content type='html'>Summary of SAM failures and solutions&lt;br /&gt;&lt;ul&gt;&lt;li&gt;mars-ce2: CA certificates updated but permissions were wrong for the lt2-lcg group and hence the certs were not readable. Fixed now&lt;/li&gt;&lt;li&gt;hep-ce:&lt;br /&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Update of the images. Missing ssl and uuid libraries caused the lcg-cp tools to fail. Matt solved this&lt;/li&gt;&lt;li&gt;Updated the CA but unfortunately the crl cronjob did not run since it is being run by mona. Now fixed&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;gw-2 (UCL-CENTRAL): Investigated intermittent failures and discovered that the sam jobs are sometimes killed by sge, which has a vmem limit of 2GB. 
The problem is that python, when creating a new thread, tries to use the max stack size of the parent process. Since sge sets this to a very high value, any new thread will try to create a big stack and the vmem limit will be reached. The solution is to change the max stack size in the sge configuration. We tried a ulimit -s 10 in the jobmanager, but since then gw-2 has been failing the ops test consistently. William has been contacted to revert this change and make the modification in the sge queue configuration instead.&lt;br /&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Note: this problem was seen on the ic-hep cluster (ce00) and fixed using the stack size limit.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;ce1.pp (RHUL): gatekeeper problem, it seems I cannot access it with the ssh keys I am using at home. Have to check from IC.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;It&#39;s a black week for availability in London...</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/6332852403826164252/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/6332852403826164252' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/6332852403826164252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/6332852403826164252'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/06/sam-failures-in-london.html' title='SAM Failures in London'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-2231965384220007836</id><published>2007-05-02T12:03:00.000+01:00</published><updated>2007-05-02T12:05:22.427+01:00</updated><title type='text'>London Tier2 Workshop</title><content type='html'>The London Tier2 Workshop took place on the 16th of April. &lt;br /&gt;It was a good opportunity to see which non-HEP applications are running on the grid. &lt;br /&gt;The slides of the workshop can be found &lt;a href=&quot;http://www.gridpp.ac.uk/workshops/LT2April07.html &quot;&gt;here&lt;/a&gt;</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/2231965384220007836/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/2231965384220007836' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2231965384220007836'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2231965384220007836'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/05/london-tier2-workshop.html' title='London Tier2 Workshop'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-2088586924389307073</id><published>2007-05-02T10:37:00.000+01:00</published><updated>2007-05-02T10:41:43.138+01:00</updated><title type='text'>New Grid Security Policy Document</title><content type='html'>The new Grid Security Policy Document can be found &lt;a 
href=&quot;https://edms.cern.ch/document/428008/4&quot;&gt;here&lt;/a&gt;. It is still a draft, and comments are welcome. See version 5.6.</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/2088586924389307073/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/2088586924389307073' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2088586924389307073'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/2088586924389307073'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/05/new-grid-security-policy-document.html' title='New Grid Security Policy Document'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4517963319103426980.post-5389611624694582469</id><published>2007-02-20T10:55:00.000+00:00</published><updated>2007-02-20T11:02:10.677+00:00</updated><title type='text'>RB Wrestling the comeback</title><content type='html'>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjotPgNQTAP8qXxKx6tY5r-QF6trJYz4k1XlMvNWn4BQpDW-oYfSg4YkR16w7lLRdWNiFc-cOl4U8rBsPzCdLCWv3iphLyO4c31IJsvTqHxwL2A8m3E78roocNVb-ShVTWIcjm1B2Tto4h1/s1600-h/backlog.png&quot;&gt;&lt;img style=&quot;margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;&quot; 
src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjotPgNQTAP8qXxKx6tY5r-QF6trJYz4k1XlMvNWn4BQpDW-oYfSg4YkR16w7lLRdWNiFc-cOl4U8rBsPzCdLCWv3iphLyO4c31IJsvTqHxwL2A8m3E78roocNVb-ShVTWIcjm1B2Tto4h1/s400/backlog.png&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5033569910494885666&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;Looking at the monitoring this morning, our RB does not look happy. You can judge for yourself from the plot above. It seems clear that when the submission rate is too high, the workload manager just cannot consume the jobs fast enough to reduce the queue length. I have asked Maarten for help; we&#39;ll see what he comes up with. I think I will have a look in the RB code to find out what is going on...</content><link rel='replies' type='application/atom+xml' href='http://londongrid.blogspot.com/feeds/5389611624694582469/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment/fullpage/post/4517963319103426980/5389611624694582469' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5389611624694582469'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4517963319103426980/posts/default/5389611624694582469'/><link rel='alternate' type='text/html' href='http://londongrid.blogspot.com/2007/02/rb-wrestling-comeback.html' title='RB Wrestling the comeback'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjotPgNQTAP8qXxKx6tY5r-QF6trJYz4k1XlMvNWn4BQpDW-oYfSg4YkR16w7lLRdWNiFc-cOl4U8rBsPzCdLCWv3iphLyO4c31IJsvTqHxwL2A8m3E78roocNVb-ShVTWIcjm1B2Tto4h1/s72-c/backlog.png" height="72" 
width="72"/><thr:total>0</thr:total></entry></feed>