<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:html="http://www.w3.org/1999/html" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><title>technovelty</title><link>http://www.technovelty.org</link><description>From the keyboard of Ian Wienand</description><language>en</language><ttl>60</ttl><dc:creator>Ian Wienand</dc:creator><admin:generatorAgent rdf:resource="http://roughingit.subtlehints.net/pyblosxom" /><admin:errorReportsTo rdf:resource="mailto:ianw@ieee.org" /><creativeCommons:license>http://creativecommons.org/licenses/by-sa/3.0/</creativeCommons:license><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/technovelty" type="application/rss+xml" /><feedburner:browserFriendly>This is an XML content feed. It is intended to be viewed in a newsreader or syndicated to another site, subject to copyright and fair use.</feedburner:browserFriendly><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item><title>Django toolchain on Debian</title><guid isPermaLink="false">linux/debian/django-toolchain.html</guid><link>http://www.technovelty.org/linux/debian/django-toolchain.html</link><description>Although Django is well packaged for Debian, I've recently come to the conclusion that the packages are really not what ...</description><content:encoded><![CDATA[
<p>Although Django is well packaged for Debian, I've recently come to
the conclusion that the packages are really not what I want.  The
problem is that my server runs Debian stable, while my development
laptop runs unstable, and Django revisions definitely fall into the
"unstable" category.  There really is no way to use a system Django
1.1 on one side, and a system Django 1.0 on the other.</p>

<p>After a bit of work, I think I've got something together that
works, and I post it here in the hope it is useful for someone else.
This info has been gleaned from similar references such as <a
href="http://www.danceric.net/2009/03/26/django-virtualenv-and-mod_wsgi/">this</a>
and <a
href="http://www.saltycrane.com/blog/2009/05/notes-using-pip-and-virtualenv-django/">this</a>.</p>

<p>This is aimed at running a server using Debian stable (5.0) for
production and an unstable environment for development.  You actually
need both to get this running.  This is based on a project called
"project" that lives in <tt>/var/www</tt>.</p>

<ol>
<li>First step is to install <tt>python-virtualenv</tt> on both.</li>

<li>Create a virtualenv on both, using the <tt>--no-site-packages</tt>
option to make it a stand-alone environment.  This is like a chroot for
Python.

<div class="codebox">
<pre>
$ virtualenv --no-site-packages project
New python executable in project/bin/python
Installing setuptools............done.
</pre>
</div>

</li>

<li>The <i>unstable</i> environment has a file you'll need to copy
into the <i>stable</i> environment - <tt>bin/activate_this.py</tt>.
The stable version of <tt>python-virtualenv</tt> isn't recent enough
to include this file, but you need it to essentially switch the system
python into the chrooted environment.  This will come in handy later
when setting up the webserver.</li>

<li>There are probably better ways to keep the two environments in
sync, but I simply take a manual approach of doing everything twice,
once in each.  So from now on, do the following in both
environments.</li>

<li>Activate the environment

<div class="codebox">
<pre>
/var/www$ cd project
/var/www/project$ . bin/activate
(project) /var/www/project$
</pre>
</div>

</li>

<li>Use <tt>easy_install</tt> to install <tt>pip</tt>

<div class="codebox">
<pre>
(project) /var/www/project$ easy_install pip
Searching for pip
Reading http://pypi.python.org/simple/pip/
Reading http://pip.openplans.org
Best match: pip 0.4
Downloading http://pypi.python.org/packages/source/p/pip/pip-0.4.tar.gz#md5=b45714d04f8fd38fe8e3d4c7600b91a2
Processing pip-0.4.tar.gz
Running pip-0.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Wu9O-U/pip-0.4/egg-dist-tmp-xjSdxq
warning: no previously-included files matching '*.txt' found under directory 'docs/_build'
no previously-included directories found matching 'docs/_build/_sources'
zip_safe flag not set; analyzing archive contents...
pip: module references __file__
Adding pip 0.4 to easy-install.pth file
Installing pip script to /var/www/project/bin

Installed /var/www/project/lib/python2.5/site-packages/pip-0.4-py2.5.egg
Processing dependencies for pip
Finished processing dependencies for pip
</pre>
</div>
</li>

<li>Install <tt>setuptools</tt>, also using <tt>easy_install</tt> (for
some reason, <tt>pip</tt> can't install it).  There is a trick here:
you need to specify at least version 0.6c9, or there will be issues
with the SVN version on Debian stable when you try to get Django in
the next step.

<div class="codebox">
<pre>
(project) /var/www/project$ easy_install setuptools==0.6c9
Searching for setuptools==0.6c9
Reading http://pypi.python.org/simple/setuptools/
Best match: setuptools 0.6c9
Downloading http://pypi.python.org/packages/2.5/s/setuptools/setuptools-0.6c9-py2.5.egg#md5=fe67c3e5a17b12c0e7c541b7ea43a8e6
Processing setuptools-0.6c9-py2.5.egg
Moving setuptools-0.6c9-py2.5.egg to /var/www/project/lib/python2.5/site-packages
Removing setuptools 0.6c8 from easy-install.pth file
Adding setuptools 0.6c9 to easy-install.pth file
Installing easy_install script to /var/www/project/bin
Installing easy_install-2.5 script to /var/www/project/bin

Installed /var/www/project/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg
Processing dependencies for setuptools==0.6c9
Finished processing dependencies for setuptools==0.6c9
</pre>
</div>
</li>

<li>Create a <tt>requirements.txt</tt> with the path to the Django
SVN for <tt>pip</tt> to install, and then install it.

<div class="codebox">
<pre>
(project) /var/www/project$ cat requirements.txt
-e svn+http://code.djangoproject.com/svn/django/tags/releases/1.0.3/#egg=Django
(project) /var/www/project$ pip install -r requirements.txt
Obtaining Django from svn+http://code.djangoproject.com/svn/django/tags/releases/1.0.3/#egg=Django (from -r requirements.txt (line 1))
  Checking out http://code.djangoproject.com/svn/django/tags/releases/1.0.3/ to ./src/django
... so on ...
</pre>
</div>
</li>

<li>Almost there!  You can keep installing more Python requirements
with <tt>pip</tt> if you need, but we've got enough here to
start.</li>

<li>Create a file in <tt>/var/www/project</tt> called
<tt>project-python.py</tt>.  This is the handler module the webserver
loads; it activates the virtualenv and then hands over to Django's
mod_python handler.  The file should contain the following:

<div class="codebox">
<pre>
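# Switch this interpreter over to the virtualenv's packages
activate_this = "/var/www/project/bin/activate_this.py"
execfile(activate_this, dict(__file__=activate_this))

# mod_python entry point, now imported from the virtualenv's Django
from django.core.handlers.modpython import handler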
</pre>
</div>
</li>

<li>Now it's time to start the Django project.  I like to create a new
directory called <tt>project</tt>, which will be the parent directory
kept in the SCM with all the code, media, templates, database (if
using SQLite) etc.  This way, to keep the two environments
up-to-date, I simply <tt>svn ci</tt> on one side, and <tt>svn co</tt>
on the other.

<div class="codebox">
<pre>
(project) /var/www/project$ mkdir project
(project) /var/www/project$ cd project
(project) /var/www/project/project$ mkdir db django media www
(project) /var/www/project/project$ cd django/
(project) /var/www/project/project/django$ django-admin.py startproject myproject
</pre>
</div>

</li>

<li>The last step is to wire up Apache to serve it all.  The magic
is making sure you specify the correct <tt>PythonHandler</tt> that you
made before to use the virtualenv, and include the right paths so you
can find it and all the required Django settings.

<div class="codebox">
<pre>
DocumentRoot /var/www/project

&lt;Location "/"&gt;
    SetHandler python-program
    PythonHandler project-python
    PythonPath "['/var/www/project/','/var/www/project/project/django/'] + sys.path"
    SetEnv DJANGO_SETTINGS_MODULE myproject.settings
    PythonDebug On
&lt;/Location&gt;

Alias /media /var/www/project/project/media
&lt;Location "/media"&gt;
    SetHandler none
&lt;/Location&gt;
&lt;Directory "/var/www/project/project/media"&gt;
    AllowOverride none
    Order allow,deny
    Allow from all
    Options FollowSymLinks Indexes
&lt;/Directory&gt;
</pre>
</div>

</li>

</ol>

<p>With all this, you should be up and running in a basic but stable
environment.  It's easy enough to update packages for security fixes,
etc. via <tt>pip</tt> after activating your virtualenv.</p>


]]></content:encoded><category domain="http://www.technovelty.org">linux/debian</category><dc:date>2009-09-11T12:49:00Z</dc:date></item><item><title>SIGTTOU and switching to canonical mode</title><guid isPermaLink="false">linux/tips/sigttou.html</guid><link>http://www.technovelty.org/linux/tips/sigttou.html</link><description>Here's an interesting behaviour that, as far as I can tell, is completely undocumented, slightly confusing but fairly logical. Your ...</description><content:encoded><![CDATA[
<p>Here's an interesting behaviour that, as far as I can tell, is
completely undocumented, slightly confusing but fairly logical.  Your
program should receive a SIGTTOU when it is running in the background
and attempts to output to the terminal, the idea being that you
shouldn't scramble the output by mixing it in while the shell is
trying to operate.  Here's what the bash manual has to say:</p>

<div class="codebox">
<pre>
Background processes are those whose process group ID differs from the
terminal's; such processes are immune to keyboard-generated signals.
Only foreground processes are allowed to read from or write to the
terminal.  Background processes which attempt to read from (write to)
the terminal are sent a SIGTTIN (SIGTTOU) signal by the terminal
driver, which, unless caught, suspends the process.
</pre>
</div>

<p>So, consider the following short program, which writes some output
and catches any SIGTTOUs, with an optional flag to switch between
canonical and non-canonical mode.</p>

<div class="codebox">
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;signal.h&gt;
#include &lt;termios.h&gt;
#include &lt;unistd.h&gt;

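static void sig_ttou(int signo) {
   /* printf() is not async-signal-safe; acceptable for a demo */
   printf("caught SIGTTOU\n");
   /* restore the default action and re-raise, so the job stops as usual */
   signal(SIGTTOU, SIG_DFL);
   kill(getpid(), SIGTTOU);
}

int main(int argc, char *argv[]) {

   signal(SIGTTOU, sig_ttou);

   /* any argument at all: switch the terminal to non-canonical mode */
   if (argc != 1) {
      struct termios tty;

      printf("setting non-canoncial mode\n");
      tcgetattr(fileno(stdout), &amp;tty);
      tty.c_lflag &amp;= ~(ICANON);
      tcsetattr(fileno(stdout), TCSANOW, &amp;tty);
   }

   /* print a counter once a second, forever */
   int i = 0;
   while (1) {
      printf("  *** %d ***\n", i++);
      sleep(1);
   }
}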
</pre>
</div>

<p>This program ends up operating in an interesting manner.</p>

<ol>
<li>Run in the background, canonical mode : no SIGTTOU and output gets multiplexed with shell.

<div class="codebox">
<pre>
$ ./sigttou &amp;
  *** 0 ***
[1] 26171
$   *** 1 ***
  *** 2 ***
  *** 3 ***
</pre>
</div>

</li>

<li>Run in the background, non-canonical mode : SIGTTOU delivered

<div class="codebox">
<pre>
$ ./sigttou 1 &amp;
[1] 26494
ianw@jj:/tmp$ setting non-canoncial mode
caught SIGTTOU


[1]+  Stopped                 ./sigttou 1
</pre>
</div>
</li>

<li>Run in the background, canonical mode, tostop set via stty :
SIGTTOU delivered, seemingly <i>after</i> a write proceeds

<div class="codebox">
<pre>
$ stty tostop
$ ./sigttou &amp;
[2] 26531
ianw@jj:/tmp$   *** 0 ***
caught SIGTTOU


[2]+  Stopped                 ./sigttou
</pre>
</div>

</li>
</ol>

<p>You can see a practical example of this by comparing
<code>cat file &amp;</code> and <code>more file &amp;</code>.
The semantics make some sense: anything switching off canonical mode
is likely to really scramble your terminal, so it's good to stop it
and let its terminal handling functions run.  I'm not sure why
canonical-mode background output mixed in with your prompt is
considered useful, but someone, somewhere must have decided it was so.</p>

<p><b>Update</b>: upon further investigation, it is the switching of
terminal modes that invokes the SIGTTOU.  To follow the logic through
more, see the various users of <a
href="http://lxr.linux.no/#linux+v2.6.31/drivers/char/tty_io.c#L345">tty_check_change</a>
in the tty driver.</p>
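<p>To make the three cases above concrete, here is a rough Python
sketch that checks the two conditions involved: whether the process is
in the terminal's background process group, and whether
<tt>tostop</tt> is set.  The rule it encodes (a background write stops
the job only with <tt>tostop</tt> set, while changing terminal
attributes from the background stops it regardless) is my reading of
the results above and of the POSIX terminal interface, not a
transcription of the kernel code.  It ignores corner cases such as
SIGTTOU being ignored or blocked, and orphaned process groups.</p>

```python
import os
import termios


def would_raise_sigttou(operation, in_background, tostop):
    """Predict whether a terminal operation sends SIGTTOU.

    operation is "write" or "tcsetattr".  Assumes SIGTTOU is neither
    ignored nor blocked by the process.
    """
    if not in_background:
        # Foreground jobs may always write and change terminal modes.
        return False
    if operation == "write":
        # Background writes only stop the job when tostop is set.
        return tostop
    if operation == "tcsetattr":
        # Changing terminal attributes from the background always does.
        return True
    raise ValueError("unknown operation: %r" % (operation,))


def terminal_state(fd):
    """Return (in_background, tostop) for the terminal open on fd."""
    in_background = os.getpgrp() != os.tcgetpgrp(fd)
    lflag = termios.tcgetattr(fd)[3]  # index 3 holds the local-mode flags
    return in_background, bool(lflag & termios.TOSTOP)


if __name__ == "__main__":
    import sys

    if sys.stdout.isatty():
        bg, tostop = terminal_state(sys.stdout.fileno())
        for op in ("write", "tcsetattr"):
            print(op, would_raise_sigttou(op, bg, tostop))
```

<p>Note that <tt>terminal_state</tt> only calls <tt>tcgetattr</tt>,
which reads the settings without changing them, so it does not itself
fall foul of the mode-switching rule.</p>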
]]></content:encoded><category domain="http://www.technovelty.org">linux/tips</category><dc:date>2009-08-21T01:02:00Z</dc:date></item><item><title>Using frozen chocolate to visualise microwave heat distribution</title><guid isPermaLink="false">humor/chocolate.html</guid><link>http://www.technovelty.org/humor/chocolate.html</link><description>My attempt at answering that most important of questions : where should one place their plate in the microwave to ...</description><content:encoded><![CDATA[
<p>My attempt at answering that most important of questions : where
should one place their plate in the microwave to achieve maximal
heating?</p>

<div style="text-align: center; margin: auto"><object type="application/x-shockwave-flash" style="width:640px; height:385px;" data="http://www.youtube.com/v/-4juszqMaYA">
<param name="movie" value="http://www.youtube.com/v/-4juszqMaYA" />
</object></div>
]]></content:encoded><category domain="http://www.technovelty.org">humor</category><dc:date>2009-07-23T02:23:00Z</dc:date></item><item><title>Review : The Race for a New Game Machine</title><guid isPermaLink="false">code/arch/game-machine.html</guid><link>http://www.technovelty.org/code/arch/game-machine.html</link><description>I recently finished The Race for a New Game Machine: Creating the Chips Inside the XBox 360 and the Playstation ...</description><content:encoded><![CDATA[
<p>I recently finished <a
href="http://www.amazon.com/Race-New-Game-Machine-Playstation/dp/0806531010">The
Race for a New Game Machine: Creating the Chips Inside the XBox 360
and the Playstation 3</a> (David Shippy and Mickie Phipps); an
interesting insight into the processor development process from some
of the lead architects.</p>

<p>The executive summary is : Sony, Toshiba and IBM (STI) decided to
get together to create the core of the Playstation 3 &mdash; the Cell
processor.  Sony, with their graphics and gaming experience, would do
the <i>Synergistic Processing Elements</i>; extremely fast but limited
sub-units specialising in doing 3D graphics and physics work
(i.e. great for games).  IBM would do a Power based core that handled
the general purpose computing requirements.</p>

<p>The twist came when Microsoft approached IBM looking for the
Xbox 360 processor, and someone at IBM mentioned the Power core that
was being worked on for the Playstation.  Unsurprisingly, the features
being built for the Playstation also interested Microsoft, and the next
thing you know, IBM is working on the same core for Microsoft and Sony
at the same time, without telling either side.</p>

<p>This whole chain of events makes for a very interesting story.  The
book is written for a general audience, but you'll probably get the
most out of it if you already have some knowledge of computer
architecture; if you're trying to understand some of the concepts
referred to from the two-line descriptions, you'll get a bit lost
(H&amp;P it is not).</p>

<p>The only small criticism is that it sometimes falls into reading a
bit like a long LinkedIn recommendation.  However, the book is very
well paced, and throws in just enough technical tidbits amongst the
corporate and personal dramas to make it a very fun read.</p>

<p>One thing that is talked about a bit is the <i>fan-out of four</i>
(FO4) metric used in the designers' quest to push the chip as fast as
possible (and, as mentioned many times in the book, faster than what
Intel could do!).  I thought it might be useful to expand on this
interesting metric a bit.</p>

<h4>FO4</h4>

<p>One problem facing chip architects is that, thanks to Moore's Law,
it is hard to find a constant to compare design versus implementation.
For example, you may design an amazing logic-block to factor large
integers into products of prime numbers, but somebody else with better
fabrication facilities might be able to essentially brute-force a
better solution by producing faster hardware using a much less
innovative design.</p>

<p>Some metric is needed that can compare the two designs discounting
who has the better fabrication process.  This is where the FO4 comes
in.</p>

<p>When you change the input to a logic gate, it is not like it
magically flips the output to the correct level instantaneously.
There is a latency while everything settles to its correct level
&mdash; the gate delay.  The more gates connected to the output of a
gate, the greater the load it has to drive, which further increases
latency.  The FO4 latency is defined as the time required to flip an
inverter gate connected to (fanned out to) four other inverter
gates.</p>

<center>
<img src="http://www.technovelty.org/images/fo4.png" alt="Fan-out of four"></img>
</center>

<p>Thus you can describe the latency of other logic blocks in
multiples of FO4 latencies.  As this avoids measuring against
wall-time it is an effective description of the relative efficiency of
logic designs.  For example, you may calculate that your factoriser
has a latency of 100 FO4.  Even if someone else's 200 FO4
factoriser gets a result a few microseconds faster thanks to their
fancy ultra-low-FO4-latency fabrication process, you can still show
that your design, at least a priori, is better.</p>

<p>The book refers several times to efforts to reduce the FO4 of the
processor as much as possible.  The reason this is important in this
context is that the maximum latency on the critical path will
determine the fastest clock speed you can run the processor at.  For
reasons explained in the book, high clock speed was a primary goal, so
every effort had to be made to reduce latencies.</p>

<p>All modern processors operate as a production line, with each stage
doing some work and passing it on to the next stage.  Clearly the
slowest stage determines the maximum speed that the production line
can run at (weakest link in the chain and all that).  For example, if
you clock at 1 GHz, that means each cycle takes 1 nanosecond (1 s /
1,000,000,000 Hz).  If you have an FO4 latency of, say, 10 picoseconds,
that means any given stage can have a latency of no more than 100 FO4
&mdash; otherwise that stage would not have enough time to settle and
actually produce the correct result.</p>
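<p>The arithmetic in this example can be written out as a small Python
sketch.  It works in integer picoseconds and treats the FO4 delay as
the only cost, ignoring latch overhead and clock skew, which real
designs also have to budget for.</p>

```python
PS_PER_SECOND = 10**12


def max_fo4_per_stage(clock_hz, fo4_ps):
    """Largest stage latency, in FO4, that still settles within one cycle."""
    cycle_ps = PS_PER_SECOND // clock_hz
    return cycle_ps // fo4_ps


def max_clock_hz(stage_fo4, fo4_ps):
    """Fastest clock for a pipeline whose slowest stage is stage_fo4 deep."""
    return PS_PER_SECOND // (stage_fo4 * fo4_ps)


print(max_fo4_per_stage(10**9, 10))  # 1 GHz, 10 ps FO4 -> 100
print(max_clock_hz(100, 10))         # 100 FO4 stage -> 1000000000 (1 GHz)
```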

<p>Thus the smaller you can get the FO4 latencies of your various
stages, the higher you can safely push the clock speed.  One way around
long latencies might be to split up your logic into smaller stages,
making a much longer pipeline (production line).  For example, split
your 100 FO4 block into two 50 FO4 stages.  You can now clock the
processor higher, but this doesn't necessarily mean you'll get actual
results out the end of the pipeline any faster (as Intel discovered
with the Pentium 4 and its notoriously long pipelines and
corresponding high clock rates).</p>
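<p>A quick way to see why a deeper pipeline does not automatically
deliver results sooner is to add a fixed per-stage overhead for the
pipeline latches.  This Python sketch is illustrative only:
<tt>LATCH_FO4 = 3</tt> is a made-up figure, not one from the book, and
it assumes the logic splits perfectly evenly between stages.</p>

```python
LATCH_FO4 = 3  # hypothetical per-stage latch/register overhead, in FO4


def pipeline(logic_fo4, stages):
    """Return (cycle_fo4, latency_fo4) for logic split evenly into stages."""
    cycle = logic_fo4 // stages + LATCH_FO4  # slowest stage sets the clock
    latency = cycle * stages                 # time for one result to emerge
    return cycle, latency


for n in (1, 2, 4):
    cycle, latency = pipeline(100, n)
    print(f"{n} stage(s): cycle {cycle} FO4, latency {latency} FO4")
```

<p>Under these assumptions the cycle time shrinks from 103 to 53 to 28
FO4, so the clock (and throughput) goes up, but the time for any single
result grows from 103 to 106 to 112 FO4.</p>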

<p>Of course, this doesn't even begin to describe the issues with
superscalar design, instruction level parallelism, cache interaction
and the myriad of other things the architects have to consider.</p>

<p>Anyway, after reading this book I guarantee you'll have an
interesting new insight the next time you fire-up Guitar Hero.</p>
]]></content:encoded><category domain="http://www.technovelty.org">code/arch</category><dc:date>2009-07-15T09:15:00Z</dc:date></item><item><title>Dig Jazz Applet, V2</title><guid isPermaLink="false">code/gnome/dig-jazz-applet-v2.html</guid><link>http://www.technovelty.org/code/gnome/dig-jazz-applet-v2.html</link><description>It seems the ABC updated the DIG Jazz now-playing list format, breaking V1. Some quick flash disassembly and a bit ...</description><content:encoded><![CDATA[
<p>It seems the ABC updated the <a href="http://www.abc.net.au/dig/jazz/">DIG
Jazz</a> now-playing list format, breaking V1.  Some quick flash
disassembly and a bit of hacking, and order is restored.  As a bonus,
it now shows the upcoming songs.</p>

<center>
<img src="http://www.technovelty.org/images/dig-jazz-applet-v2.png" alt="DIG Jazz now-playing Gnome applet" />
</center>

<p><a href="http://www.wienand.org/junkcode/dig-jazz-applet/">Source</a> or
<a href="http://www.wienand.org/junkcode/dig-jazz-applet/dig-jazz-applet_2.0-1_all.deb">Debian package</a>.</p>
]]></content:encoded><category domain="http://www.technovelty.org">code/gnome</category><dc:date>2009-05-18T13:20:00Z</dc:date></item><item><title>Quickly describing hash utilisation</title><guid isPermaLink="false">code/hash-ratio.html</guid><link>http://www.technovelty.org/code/hash-ratio.html</link><description>I think the most correct way to describe utilisation of a hash-table is using chi-squared distributions and hypothesis and degrees ...</description><content:encoded><![CDATA[
<p>I think the most correct way to describe utilisation of a hash-table
is using chi-squared distributions and hypothesis and degrees of
freedom and a bunch of other things nobody but an actuary remembers.
So I was looking for a quick method that was close-enough but didn't
require digging out a statistics text-book.</p>

<p>I'm sure I've re-invented some well-known measurement, but I'm not
sure what it is.  The idea is to add up the total steps required to
look-up all elements in the hash-table, and compare that to the
theoretical ideal of a uniformly balanced hash-table.  You can then
get a ratio that tells you if you're in the ball-park, or if you
should try something else.  A diagram should suffice.</p>

<img src="http://www.technovelty.org/images/hash-utilisation.png"
alt="Scheme for acquiring a hash-utilisation ratio" />

<p>This seems to give quite useful results with a bare minimum of
effort, and most importantly no tricky floating point math.  For
example, on the standard Unix <tt>words</tt> file with a 2048-entry
hash-table, the standard DJB hash came out very well (as expected):</p>

<div class="codebox">
<pre>
Ideal 2408448
Actual 2473833
----
Ratio 0.973569
</pre>
</div>

<p>To contrast, a simple "add each character" type hash:</p>

<div class="codebox">
<pre>
Ideal 2408448
Actual 6367489
----
Ratio 0.378241
</pre>
</div>
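<p>Since the diagram is only an image, here is a Python sketch of my
reading of it, not the author's <tt>hash-ratio.py</tt>.  It assumes
that finding the k-th entry in a bucket's chain costs k steps, that the
ideal table spreads the keys as evenly as possible over the buckets,
and that the ratio is ideal divided by actual.  That reading is at
least consistent with the figures above: 2408448 = 2048 &times; (48
&times; 49 / 2), i.e. 2048 buckets of 48 entries each.  The
<tt>djb2</tt> helper uses the common <tt>h * 33 + c</tt> form of the
DJB hash, which may not be exactly the variant used for those
results.</p>

```python
from collections import Counter


def lookup_steps(bucket_sizes):
    # A chain of length c costs 1 + 2 + ... + c steps to look up every entry.
    return sum(c * (c + 1) // 2 for c in bucket_sizes)


def ideal_steps(n_keys, n_buckets):
    # Perfectly balanced table: every bucket gets n_keys // n_buckets keys,
    # and the remainder are spread one apiece over the first few buckets.
    base, extra = divmod(n_keys, n_buckets)
    return lookup_steps([base + 1] * extra + [base] * (n_buckets - extra))


def hash_ratio(keys, hash_fn, n_buckets):
    counts = Counter(hash_fn(k) % n_buckets for k in keys)
    ideal = ideal_steps(len(keys), n_buckets)
    actual = lookup_steps(counts.values())
    return ideal, actual, ideal / actual


def djb2(s):
    h = 5381
    for byte in s.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF  # keep it a 32-bit value
    return h


def add_chars(s):
    return sum(s.encode())


if __name__ == "__main__":
    import os

    path = "/usr/share/dict/words"  # assumed location of the word list
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as f:
            words = [line.strip() for line in f if line.strip()]
        for fn in (djb2, add_chars):
            ideal, actual, ratio = hash_ratio(words, fn, 2048)
            print(fn.__name__, ideal, actual, round(ratio, 6))
```

<p>Empty buckets never appear in the <tt>Counter</tt>, but they
contribute zero steps anyway, so the actual total is unaffected.</p>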

<p>Example code is <a
href="http://www.wienand.org/junkcode/python/hash-ratio.py">hash-ratio.py</a>.
I expect this measurement is most useful when you have a largely
static bunch of data for which you are attempting to choose an
appropriate hash-function.  I guess if you are really trying to hash
more or less random incoming data and hence only have a random sample
to work with, you can't avoid doing the "real" statistics.</p>
]]></content:encoded><category domain="http://www.technovelty.org">code</category><dc:date>2009-05-07T06:37:00Z</dc:date></item><item><title>Relocation truncated to fit - WTF?</title><guid isPermaLink="false">code/c/relocation-truncated.html</guid><link>http://www.technovelty.org/code/c/relocation-truncated.html</link><description>If you code for long enough on x86-64, you'll eventually hit an error such as: (.text+0x3): relocation truncated to fit: ...</description><content:encoded><![CDATA[
<p>If you code for long enough on x86-64, you'll eventually hit an
error such as:</p>

<div class="codebox">
<pre>
(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `array' defined in foo section in ./pcrel8.o
</pre>
</div>

<p>Here's a little example that might help you figure out what you've
done wrong.</p>

<p>Consider the following code:</p>

<div class="codebox">
<pre>
<b>$ cat foo.s</b>
.globl foovar
  .section   foo, "aw",@progbits
  .type foovar, @object
  .size foovar, 4
foovar:
   .long 0

.text
.globl _start
 .type _start, @function
_start:
  movq $foovar, %rax
</pre>
</div>

<p>In case it's not clear, that would look something like:</p>

<div class="codebox">
<pre>
int foovar = 0;

void function(void) {
  int *bar = &amp;foovar;
}
</pre>
</div>

<p>Let's build that code, and see what it looks like</p>

<div class="codebox">
<pre>
<b>$ gcc -c foo.s</b>

<b>$ objdump --disassemble-all ./foo.o</b>

./foo.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 &lt;_start&gt;:
   0:		 48 c7 c0 00 00 00 00	mov    $0x0,%rax

Disassembly of section foo:

0000000000000000 &lt;foovar&gt;:
   0:		 00 00			add    %al,(%rax)
   ...
</pre>
</div>

<p>We can see that the <tt>mov</tt> instruction has only allocated 4
bytes (<tt>00 00 00 00</tt>) for the linker to put in the address of
<tt>foovar</tt>.  If we check the relocations:</p>

<div class="codebox">
<pre>
<b>$ readelf --relocs ./foo.o</b>

Relocation section '.rela.text' at offset 0x3a0 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  00050000000b R_X86_64_32S      0000000000000000 foovar + 0
</pre>
</div>

<p>The <tt>R_X86_64_32S</tt> relocation is indeed only a 32-bit
relocation.  Now we can tickle this error.  Consider the following
linker script, which puts the <tt>foo</tt> section about 5 gigabytes
away from the code.</p>

<div class="codebox">
<pre>
<b>$ cat test.lds</b>
SECTIONS
{
 . = 10000;
 .text : { *(.text) }
 . = 5368709120;
 .data : { *(.foo) }
}
</pre>
</div>

<p>This now means that we can not fit the address of <tt>foovar</tt>
inside the space allocated by the relocation.  When we try it:</p>

<div class="codebox">
<pre>
<b>$ ld -Ttest.lds ./foo.o</b>
./foo.o: In function `_start':
(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `foovar' defined in foo section in ./foo.o
</pre>
</div>

<p>What this means is that the full 64-bit address of <tt>foovar</tt>,
which now lives somewhere above 5 gigabytes, can't be represented
within the 32-bit space allocated for it.</p>
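<p>The range check the linker is effectively making can be sketched in
a few lines of Python.  The ranges follow the x86-64 psABI definitions
as I understand them: <tt>R_X86_64_32S</tt> holds a value that is
sign-extended to 64 bits, <tt>R_X86_64_32</tt> one that is
zero-extended.  The addresses are the ones from the linker script
above.</p>

```python
def fits_r_x86_64_32s(value):
    # Stored as 32 bits and sign-extended: must be in [-2**31, 2**31 - 1].
    return -2**31 <= value <= 2**31 - 1


def fits_r_x86_64_32(value):
    # Stored as 32 bits and zero-extended: must be in [0, 2**32 - 1].
    return 0 <= value <= 2**32 - 1


for name, addr in (("_start", 10000), ("foovar", 5368709120)):
    print(f"{name} at {addr:#x}: fits R_X86_64_32S? {fits_r_x86_64_32s(addr)}")
```

<p>An address between 2 GiB and 4 GiB is an interesting edge case: it
fits <tt>R_X86_64_32</tt> but not the sign-extended
<tt>R_X86_64_32S</tt> that this <tt>mov</tt> uses.</p>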

<p>For code optimisation purposes, the default immediate size for
<tt>mov</tt> instructions is a 32-bit value.  This makes sense
because, for the most part, programs can happily live within a 32-bit
address space, and people don't do things like keep their data so far
away from their code it requires more than a 32-bit address to
represent it.  Defaulting to using 32-bit immediates therefore cuts
the code size considerably, because you don't have to make room for a
possible 64-bit immediate for every <tt>mov</tt>.</p>

<p>So, if you want to <i>really</i> move a full 64-bit immediate into
a register, you want the <tt>movabs</tt> instruction.  Try it out with
the code above - with <tt>movabs</tt> you should get a
<tt>R_X86_64_64</tt> relocation and 64-bits worth of room to patch up
the address, too.</p>

<p>If you're seeing this and you're not hand-coding, you probably want
to check out the <tt>-mcmodel</tt> argument to <tt>gcc</tt>.</p>
]]></content:encoded><category domain="http://www.technovelty.org">code/c</category><dc:date>2009-03-12T12:20:00Z</dc:date></item><item><title>YUI ButtonGroup Notes</title><guid isPermaLink="false">web/yui-buttongroup.html</guid><link>http://www.technovelty.org/web/yui-buttongroup.html</link><description>Some tips and things to check if your YUI ButtonGroup isn't behaving as you wish it would. Double-check your &lt;body&gt; ...</description><content:encoded><![CDATA[
<p>Some tips and things to check if your <a
href="http://developer.yahoo.com/yui/button/#usingbuttongroup">YUI
ButtonGroup</a> isn't behaving as you wish it would.</p>

<ul>
<li><p>Double-check your <tt>&lt;body&gt;</tt> tag has <tt>class="yui-skin-sam"</tt></p></li>

<li><p>Unlike in the documentation example, you can't just put a call
to <tt>YAHOO.widget.ButtonGroup</tt> pointing to your <tt>div</tt>
anywhere in your HTML and expect it to work.  You've got to wait for
it to be ready with something like:</p>

<div class="codebox">
<pre>
&lt;script type="text/javascript"&gt;
YAHOO.util.Event.onContentReady("my_button_div", function() {
  var oButtonGroup = new YAHOO.widget.ButtonGroup("my_button_div");
});
&lt;/script&gt;
</pre>
</div>
</li>

<li><p>You can easily get an image in each button.  For example, if
your button is defined as:</p>

<div class="codebox">
<pre>
 &lt;span id="my-button-id" class="yui-button yui-radio-button yui-button-checked"&gt;
  &lt;span class="first-child"&gt;
    &lt;button type="button" hidefocus="true"&gt;&lt;/button&gt;
  &lt;/span&gt;
 &lt;/span&gt;
</pre>
</div>

<p>Simply add a CSS rule along the lines of:</p>

<div class="codebox">
<pre>
.yui-button#my-button-id button { background:url(http://server/image.jpg) 50% 50% no-repeat; }
</pre>
</div>

</li>

</ul>

<p>Hopefully, this will save someone else a few hours!</p>
]]></content:encoded><category domain="http://www.technovelty.org">web</category><dc:date>2009-03-02T12:36:00Z</dc:date></item><item><title>rdtsc - now even less useful!</title><guid isPermaLink="false">code/arch/rdtsc.html</guid><link>http://www.technovelty.org/code/arch/rdtsc.html</link><description>An interesting extract from the latest IA32 SDM (18.20.5) The TSC, IA32_MPERF, and IA32_FIXED_CTR2 operate at the same, maximum-resolved frequency ...</description><content:encoded><![CDATA[
<p>An interesting extract from the latest <a
href="http://download.intel.com/design/processor/manuals/253669.pdf">IA32
SDM</a> (18.20.5)</p>

<blockquote>

<p>The TSC, IA32_MPERF, and IA32_FIXED_CTR2 operate at the same,
maximum-resolved frequency of the platform, which is equal to the
product of scalable bus frequency and maximum resolved bus
ratio.</p>

</blockquote>

<blockquote>

<p>For processors based on Intel Core microarchitecture, the
scalable bus frequency is encoded in the bit field MSR_FSB_FREQ[2:0]
at (0CDH), see Appendix B, "Model-Specific Registers (MSRs)". The
maximum resolved bus ratio can be read from the following bit
field:</p>

<ul>
<li>If XE operation is disabled, the maximum resolved bus ratio can be
     read in MSR_PLATFORM_ID[12:8]. It corresponds to the maximum
     qualified frequency.</li>

<li>IF XE operation is enabled, the maximum resolved bus ratio is
     given in MSR_PERF_STAT[44:40], it corresponds to the maximum XE
     operation frequency configured by BIOS.</li>
</ul>
</blockquote>

<p>In summary, <tt>TSC frequency = (scalable bus frequency) * (maximum
resolved bus ratio)</tt>.  This implies the TSC is incrementing based
on some external bus source (can any hardware engineers explain what
happened for Core here?), and is a departure from simply assuming that
the TSC increments once for each CPU cycle.</p>
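<p>As a sketch of the SDM's arithmetic, this Python pulls the ratio
field out of a raw MSR value and multiplies it by the bus frequency.
It deliberately takes the bus frequency as an input rather than
decoding <tt>MSR_FSB_FREQ[2:0]</tt>, whose encoding table lives in the
SDM appendix, and the register values in the example are hypothetical.
On Linux, real values can be read as root through the <tt>msr</tt>
driver's <tt>/dev/cpu/N/msr</tt> files.</p>

```python
def bit_field(value, hi, lo):
    """Extract bits hi..lo (inclusive) of an integer, SDM-style [hi:lo]."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def tsc_hz(bus_hz, platform_id, perf_stat, xe_enabled):
    """TSC frequency = scalable bus frequency * maximum resolved bus ratio."""
    if xe_enabled:
        ratio = bit_field(perf_stat, 44, 40)   # MSR_PERF_STAT[44:40]
    else:
        ratio = bit_field(platform_id, 12, 8)  # MSR_PLATFORM_ID[12:8]
    return bus_hz * ratio


# Hypothetical example: a 200 MHz bus and a ratio of 12 in MSR_PLATFORM_ID.
print(tsc_hz(200_000_000, platform_id=12 << 8, perf_stat=0, xe_enabled=False))
```

<p>With XE disabled, nothing in this calculation depends on how fast the
core is actually being clocked.</p>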

<p>The interesting bit is that if <tt>XE</tt> operation is disabled,
the bus ratio is assumed to be the maximum <b>qualified</b> frequency.
This seems to mean that if you overclock your CPU and your processor
is running at <i>higher</i> than the qualified frequency, attempts to
measure the CPU speed by counting TSC ticks over a given time may
yield the wrong results (well, will yield the <i>rated</i> result;
i.e. the speed of the processor you bought out of the box).</p>

<p>While interesting, this divergence probably has little practical
impact, because using the TSC for benchmarking is already fraught
with danger.  You have to be super careful to make sure the compiler
and processor don't reschedule things around you and handle other
architectural nuances.  If you need this level of information, you're
much better off using the right tools to get it (my favourite is <a
href="http://perfmon2.sourceforge.net/pfmon_usersguide.html">perfmon2</a>).</p>
]]></content:encoded><category domain="http://www.technovelty.org">code/arch</category><dc:date>2009-02-26T04:54:00Z</dc:date></item><item><title>Converting DICOM images</title><guid isPermaLink="false">linux/tips/converting-dicom.html</guid><link>http://www.technovelty.org/linux/tips/converting-dicom.html</link><description>If you go for an ultrasound or some other imaging procedure, they may give you a CD with the images ...</description><content:encoded><![CDATA[
<p>If you go for an ultrasound or some other imaging procedure, they
may give you a CD with the images that requires some overly
complicated and under-featured Windows viewer.  Chances are these
images are in <a href="http://en.wikipedia.org/wiki/DICOM">DICOM</a>
format, which is like the AVI of the medical world.</p>

<p>Your first clue will be that <tt>file</tt> might report the file as
an unoptimised QuickTime movie, e.g.</p>

<div class="codebox">
<pre>
$ file ./QMAG0001
./QMAG0001: Apple QuickTime movie (unoptimized)
</pre>
</div>

<p>After figuring out the file type wasn't actually anything to do
with QuickTime, I tried some of the <b>many</b> different tools and
methods to convert this to something viewable.  Unfortunately, the
DICOM viewer in GIMP and ImageMagick (probably the same thing?) didn't
like the files at all, and neither did a range of other tools.  I
finally managed to do it with the <tt>dcm2pnm</tt> tool from the
Debian <a
href="http://packages.debian.org/sid/dcmtk"><tt>dcmtk</tt></a> package
-- just point it at the file and it spits out a PNM which is easily
converted by all graphics tools.</p>
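<p>You can also check the file type yourself.  Files written in the DICOM Part 10 file format start with a 128-byte preamble, followed by the four bytes <tt>DICM</tt>.  Below is a minimal Python check.  It assumes the file has that standard preamble; some older or vendor-specific files are written without one and will not match.</p>

```python
import sys

DICOM_PREAMBLE_LEN = 128   # bytes of (usually zero) preamble
DICOM_MAGIC = b"DICM"      # prefix that follows the preamble


def looks_like_dicom(path):
    """True if the file has the DICOM Part 10 'DICM' prefix after the preamble."""
    with open(path, "rb") as f:
        header = f.read(DICOM_PREAMBLE_LEN + len(DICOM_MAGIC))
    # A short file yields a short slice, which simply fails the comparison.
    return header[DICOM_PREAMBLE_LEN:] == DICOM_MAGIC


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "DICOM" if looks_like_dicom(name) else "not DICOM (or no preamble)")
```

<p>Save it under any name (say, <tt>isdicom.py</tt>; the name is just an example).  Running <tt>python isdicom.py ./QMAG0001</tt> then reports whether each file carries the marker.</p>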

<p>You can also encapsulate a series of images in a DICOM file, like a
little movie.  <tt>dcm2pnm</tt> extracts these easily, but requires
the <tt>--all-frames</tt> option.  An <tt>ffmpeg</tt> recipe to turn
these extracted files into a more easily viewable movie is:</p>

<div class="codebox">
<pre>
$ ffmpeg -r 20 -i foo.%d.ppm -qscale 5 -b 9600 movie.mp4
</pre>
</div>

<p>I certainly can't guarantee this will actually work for you, as
DICOM appears to be an extremely complicated format with many possible
vendor extensions.  But hopefully it's a starting point!</p>
]]></content:encoded><category domain="http://www.technovelty.org">linux/tips</category><dc:date>2009-02-08T03:11:00Z</dc:date></item><item><title>On Complexity</title><guid isPermaLink="false">code/complexity.html</guid><link>http://www.technovelty.org/code/complexity.html</link><description>Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it. Alan J. Perlis, Eipgrams on Programming , ...</description><content:encoded><![CDATA[
<blockquote><p>Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it.</p></blockquote>

<p>Alan J. Perlis, <i><a
href="http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html">Epigrams
on Programming</a></i>, SIGPLAN Notices Vol. 17, No. 9, September
1982, pages 7-13.</p>
]]></content:encoded><category domain="http://www.technovelty.org">code</category><dc:date>2009-02-04T12:23:00Z</dc:date></item><item><title>NoMachine NX - the missing non-manual</title><guid isPermaLink="false">linux/tips/nomachine.html</guid><link>http://www.technovelty.org/linux/tips/nomachine.html</link><description>I've been meaning to try NoMachine NX for a while. Its promise of fast remote X11 sessions sounded exactly like ...</description><content:encoded><![CDATA[
<p>I've been meaning to try <a
href="http://www.nomachine.com/">NoMachine NX</a> for a while.  Its
promise of fast remote X11 sessions sounded like exactly what I wanted
for logging into my work desktop remotely (when working remotely, I
really like having a desktop with saved state that I can just pick up
where I left off).  That was pretty much all I knew about the software, so I was
a completely blank slate.</p>

<p>The <a
href="http://www.nomachine.com/documents/getting-started.php">getting
started guide</a> is the perfect example of how <b>not</b> to write a
getting started guide.</p>

<p>Firstly, Section 1 - "Getting started" - gives me a full history of
the product, goes into significant depth about the challenges of
forwarding X11 requests, talks about the caching and compression
implementation, round-trip latency measurement, the details of the
two-way proxying system, and every other feature of the
software.</p>

<p>My eyes glazed over after about the first paragraph.  That's all
great -- I just want to know what to do!</p>

<p>At this point, I assume that I'm required to run some sort of
daemon at the remote end.  I download and install the server package
(it is explained that the server package requires the client and agent
packages as well, fine).</p>

<p>I'm paging down, looking for something to get me started.  I'm
happy to see Section 7 - "Set up your NX Server environment"
(remember, at this point I thought I needed some daemon running in the
background constantly).  It even has some commands to type,
so I tap away, running <tt>nxserver --useradd nxtest --system</tt>.
My server binary doesn't even seem to recognise these options.  I give
up, assuming that the server isn't running and nothing will work.  The
getting started guide has abruptly ended and I have no idea what to
do.</p>

<p>As it turns out, it's all completely trivial.  Here's the missing
"getting started guide".</p>

<ul>

<li><a
href="http://www.nomachine.com/select-package.php?os=linux&amp;id=1">Download</a>
and install the client, agent and server packages on the remote end.
You need to have ssh access to this box.</li>

<li>Install the client on your end.</li>

<li>Run <tt>/usr/NX/bin/nxclient</tt>.  It will start a wizard where
you input the remote host name.</li>

<li>The client will, under the hood, ssh to the remote end, open the
tunnel it needs, start the server and do all the magic required to
make things "just work".  A remote desktop will appear.</li>

<li>That's it!</li>
</ul>

<p>Additional tips:</p>

<ul>

<li>It's easy to tunnel this connection (for example, if you have to
bounce through an ssh gateway to your internal network).  Do something
like <tt>/usr/NX/bin/nxssh -o 'Compression=no' -L 2022:remote.host:22
-f -N user@sshgateway.company.com</tt> and then connect the client to
<tt>localhost:2022</tt>.  You don't want to compress this link, as NX
is already doing it.</li>

<li>The only way I can find to make a new session is to start
<tt>nxclient</tt> with the <tt>--wizard</tt> option.</li>

<li>Don't click "Disable encryption of all traffic" if you're
tunneling.  AFAICT this tries to redirect the client to a
non-encrypted port, which obviously won't get through.</li>

</ul>
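<p>Before pointing the client at the forwarded port, it can be handy to confirm the tunnel is actually up.  This minimal Python sketch just tries a TCP connection to the port.  It assumes the <tt>2022</tt> forward from the tip above, and the helper name is made up for illustration.</p>

```python
import socket


def tunnel_is_up(host="localhost", port=2022, timeout=3.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unresolvable host, etc.
        return False
```

<p>For example, <tt>tunnel_is_up()</tt> should return <tt>True</tt> once the <tt>nxssh</tt> command above is running.  It only proves that the local end of the forward is listening.  ssh accepts the local connection before trying the far side, so a wrong <tt>remote.host</tt> will still look "up" here.</p>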

<p>Other than the documentation, it really works as promised, making
remote X11 usable.  One really nice feature is that it is smart about
the resolution of the remote desktop, filling up your local screen.
Add to that the fact that you don't need anything set up but your normal ssh
connection, and it's a great remote desktop solution.</p>
]]></content:encoded><category domain="http://www.technovelty.org">linux/tips</category><dc:date>2009-02-04T04:10:00Z</dc:date></item><item><title>Facebook, API's, photos and IPTC data</title><guid isPermaLink="false">code/web/facebook-photos.html</guid><link>http://www.technovelty.org/code/web/facebook-photos.html</link><description>As a photo management application, Facebook sucks. But it is something that people actually look at (as opposed to Flickr, ...</description><content:encoded><![CDATA[
<p>As a photo management application, Facebook sucks.  But it is
something that people actually look at (as opposed to Flickr, which is
great, but getting people to log in or follow special guest pass links
is a PITA).</p>

<p>I like to keep all my raw photos locally, using IPTC for comments
(which Flickr reads -- I put them in using some custom scripts and the
Python bindings of <a
href="http://libiptcdata.sourceforge.net/">libiptcdata</a>) and
geo-tagged in the EXIF data (using my Google Maps <a
href="http://www.wienand.org/map/latlong.html">point locator</a>).  I
figure this way if Flickr goes bust/gets bought by Microsoft all I
need to do is re-upload somewhere else.</p>

<p>I was waiting for Flickr to integrate with Facebook in some good
way, but I then came across the <i>very</i> useful <a
href="http://code.google.com/p/pyfacebook/">pyfacebook</a> bindings,
which, although a little light on documentation, are a great way
to easily throw my photos into Facebook (it's pending in the NEW queue in
Debian, see <a
href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=511279">#511279</a>).</p>

<p>My <a
href="http://www.wienand.org/junkcode/python/fbupload/fbupload.py">fbupload.py</a>
script might be a useful starting point if you want to do the same
thing.  It batches up photos into lots of 60 (the maximum number of photos in
an album) and automatically creates the albums and uploads the photos,
reading the IPTC data for comments.  The only problem is that you'll
have to sign up for a developer key and start a new application to get
a secret key to talk to the API (if you're still reading this, I'm
sure you can figure it out!).</p>
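<p>The batching step is simple enough to sketch on its own.  This is not the code from <tt>fbupload.py</tt>; it only illustrates splitting a list of files into album-sized lots.  The 60-photo limit is the one mentioned above and may well have changed since.  The album naming scheme is made up.</p>

```python
import glob

ALBUM_LIMIT = 60  # maximum photos per Facebook album, per the text above


def batches(items, size=ALBUM_LIMIT):
    """Split items into consecutive lists of at most size elements."""
    return [items[start:start + size] for start in range(0, len(items), size)]


def plan_albums(photos, base_name):
    """Pair each batch of photos with a (hypothetical) album title."""
    groups = batches(sorted(photos))
    return [("%s (%d of %d)" % (base_name, n, len(groups)), group)
            for n, group in enumerate(groups, start=1)]


if __name__ == "__main__":
    for title, group in plan_albums(glob.glob("*.jpg"), "Holiday"):
        print(title, len(group), "photos")
```

<p>Each (title, photos) pair would then be handed to whatever album-creation and upload calls you use.  None of the pyfacebook API is shown here.</p>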
]]></content:encoded><category domain="http://www.technovelty.org">code/web</category><dc:date>2009-01-09T12:31:00Z</dc:date></item><item><title>Streaming various radio streams to FStream on the iPhone</title><guid isPermaLink="false">toys/iphone-streaming.html</guid><link>http://www.technovelty.org/toys/iphone-streaming.html</link><description>FStream is a really neat streaming radio program for the iPhone. Although it supports various WMA streams, I found that ...</description><content:encoded><![CDATA[
<p>FStream is a really neat streaming radio program for the iPhone.
Although it supports various WMA streams, I found that it did not
successfully work with some of the Australian <a
href="http://www.abc.net.au/">ABC</a> WMA streaming radio
services.</p>

<p>The most reliable method seems to be simply using a low-bandwidth MP3
stream over HTTP (24 kbps sounds fine and works great even over Edge).
I could find a number of other blogs, etc. with static methods for
streaming, but nothing that reliably did on-the-fly conversion of an
incoming stream.</p>

<p>My solution is a simple Python HTTP server I'm calling <a
href="http://www.wienand.org/junkcode/stream2mp3/stream2mp3.py">stream2mp3</a>.
It uses mplayer, lame and a few pipes to take the incoming stream
(which is pretty much anything mplayer can handle, which is pretty
much anything unencrypted) and spit it out as a low-bandwidth MP3
stream over HTTP.</p>

<p>It seems to reliably handle dropped and closed connections, and
clean up after itself.  I'd certainly be interested in any bug fixes
or suggestions.  I guess the major disadvantages are that you need a
dedicated server (get yourself a <a
href="http://www.linode.com">linode</a>!), it only handles one
connection at a time, and if you want multiple stations I guess you'd
run multiple instances on different ports.</p>
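<p>The serving half of such a setup can be sketched with Python's standard <tt>http.server</tt> module.  This is not the <tt>stream2mp3</tt> code itself.  It assumes <tt>lame</tt> is already writing MP3 data into the <tt>/tmp/output.mp3</tt> FIFO shown in the output below.  The handler copies that data to the client and treats a dropped connection as the end of the stream.  Using the plain (non-threading) <tt>HTTPServer</tt> gives the same one-connection-at-a-time behaviour described above.</p>

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CHUNK_SIZE = 4096


class StreamHandler(BaseHTTPRequestHandler):
    """Copy MP3 data from a FIFO (or any file) to one HTTP client."""

    source = "/tmp/output.mp3"  # assumed: the FIFO lame is writing into

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "audio/mpeg")
        self.end_headers()
        try:
            with open(self.source, "rb") as stream:
                while True:
                    chunk = stream.read(CHUNK_SIZE)
                    if not chunk:
                        break  # the writer closed its end of the pipe
                    self.wfile.write(chunk)
        except (BrokenPipeError, ConnectionResetError):
            # The client went away; any clean-up of the encoder would go here.
            print("connection lost")


def serve(port, source=StreamHandler.source):
    """Serve the stream on port, one connection at a time (no threading)."""
    StreamHandler.source = source
    HTTPServer(("", port), StreamHandler).serve_forever()
```

<p>Calling <tt>serve(8000)</tt> would then hand whatever the encoder produces to the next client that connects.  The port number is only an example; the real one is hidden in the output below.</p>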

<p>With this, you can be sitting in traffic on the 101 heading to San
Francisco and, with some local radio, it's just like you're sitting in
traffic on the M2 in Sydney!  Here's a screenshot:</p>

<div class="codebox">
<pre>
~/bin$ python stream2mp3.py
Creating WAV fifo /tmp/incoming.wav
Creating MP3 fifo /tmp/output.mp3
Serving &lt;mms://media3.abc.net.au/702Sydney&gt; on port XXXX
mplayer running as 8524
lame running as 8525
mobile-XXX-XXX-130-107.mycingular.net - - [23/Dec/2008 18:59:22] "GET / HTTP/1.1" 200 -
[radio plays until I stop it...]
connection lost
cleanup complete, ready
</pre>
</div>
]]></content:encoded><category domain="http://www.technovelty.org">toys</category><dc:date>2008-12-22T13:00:00Z</dc:date></item><item><title>Position Independent Code and x86-64 libraries</title><guid isPermaLink="false">code/c/amd64-pic.html</guid><link>http://www.technovelty.org/code/c/amd64-pic.html</link><description>If you've ever tried to link non-position independent code into a shared library on x86-64, you should have seen a ...</description><content:encoded><![CDATA[
<p>If you've ever tried to link non-position independent code into a
shared library on x86-64, you should have seen a fairly cryptic error
about invalid relocations and missing symbols.  Hopefully this will
clear it up a little!</p>

<p>Let's start with a small program to illustrate.</p>

<div class="codebox">
<pre>
<b>$ cat function.c</b>
int global = 100;

int function(int i) {
	return i + global;
}
<b>$ gcc -c function.c</b>
</pre>
</div>

<p>Firstly, inspect the disassembly of this function (e.g. with <tt>objdump --disassemble ./function.o</tt>):</p>

<div class="codebox">
<pre>
0000000000000000 &lt;function&gt;:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	89 7d fc             	mov    %edi,-0x4(%rbp)
   7:	8b 05 00 00 00 00    	mov    0x0(%rip),%eax        # d &lt;function+0xd&gt;
   d:	03 45 fc             	add    -0x4(%rbp),%eax
  10:	c9                   	leaveq
  11:	c3                   	retq
</pre>
</div>

<p>Let's just go through that for clarity:</p>
<ul>

<li><b>0</b>,<b>1</b>: save <tt>rbp</tt> to the stack and copy the
stack pointer (<tt>rsp</tt>) into <tt>rbp</tt>.  This common stanza
sets up the <i>frame pointer</i>, which is mostly used by debuggers to
keep track of the base of the current stack frame.  It's not important
for now.</li>

<li><b>4</b>: Move the value from <tt>edi</tt> to 4 bytes below the
stack pointer.  This is moving the first argument (<tt>int i</tt>)
into the "red-zone", a 128-byte scratch area each function has
reserved below the stack pointer.</li>

<li><b>7</b>,<b>d</b>: Move the value at offset 0 from the current
instruction pointer (<tt>rip</tt>) into <tt>eax</tt> (by convention
the return value is left in register <tt>eax</tt>).  Then add the
incoming argument to it (retrieved from the scratch area);
i.e. <tt>return global + i</tt></li>

</ul>

<p>The IP-relative move is really the trick here.  We know from the
code that it has to load the value of the <tt>global</tt> variable.
The zero value is simply a placeholder: at this stage the compiler
does not know the required address (i.e. how far away from the
instruction pointer the memory holding the <tt>global</tt> variable
will end up).  Instead, it leaves behind a <i>relocation</i> -- a note
that says to the linker "you should determine the correct address of
<i>foo</i> (<tt>global</tt> in our case), and then patch this bit of
the code to point to that address."</p>

<img src="http://www.technovelty.org/images/pic.png" alt="Relocations with addend" />

<p>The top portion of the image above gives some idea of how it works.
We can examine relocations in binaries with the <tt>readelf</tt>
tool.</p>

<div class="codebox">
<pre>
<b>$ readelf --relocs ./function.o</b>

Relocation section '.rela.text' at offset 0x518 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000009  000800000002 R_X86_64_PC32     0000000000000000 global + fffffffffffffffc
</pre>
</div>

<p>There are many different types of relocations for different
situations; the exact rules for each relocation type are described in
the ABI documentation for the architecture.  The
<tt>R_X86_64_PC32</tt> relocation is defined as "the symbol value,
plus the addend, minus the address of the place being relocated"
(<tt>S + A - P</tt> in the ABI's notation).  The addend makes it look
trickier than it is.  Remember that when an instruction is executing,
the instruction pointer points to the <i>next</i> instruction to be
executed.  Here the 4-byte field being patched ends exactly where the
next instruction starts, so to correctly find the data relative to the
instruction pointer we need to subtract those extra 4 bytes.  This can
be seen more clearly when laid out in a linear fashion (as in the
bottom of the above diagram).</p>
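<p>Here is a small Python model of what the linker does with this relocation.  The bytes are the ones from the disassembly above, and the offset (9) and addend (-4) come from the <tt>readelf</tt> output.  The address the code ends up at and the address of <tt>global</tt> are made-up values.</p>

```python
# Bytes of function() from the disassembly above; the 32-bit field at
# offset 9 is the zero placeholder the relocation will fill in.
code = bytearray.fromhex("55 48 89 e5 89 7d fc 8b 05 00 00 00 00 03 45 fc c9 c3")


def apply_pc32(section, offset, section_addr, symbol_value, addend):
    """Apply R_X86_64_PC32: store S + A - P as a signed 32-bit value."""
    place = section_addr + offset              # P: run-time address of the field
    value = symbol_value + addend - place      # S + A - P
    # to_bytes raises OverflowError if the result does not fit in 32 bits,
    # i.e. the symbol is more than about 2GB away from the instruction.
    section[offset:offset + 4] = value.to_bytes(4, "little", signed=True)
    return value


TEXT_ADDR = 0x400000     # hypothetical address the code is loaded at
GLOBAL_ADDR = 0x601020   # hypothetical address of `global`

disp = apply_pc32(code, 9, TEXT_ADDR, GLOBAL_ADDR, -4)
rip_after_mov = TEXT_ADDR + 0xd   # rip points at the next instruction (0xd)
assert rip_after_mov + disp == GLOBAL_ADDR
print(hex(disp))  # prints: 0x201013
```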

<p>If you try and build a shared object (dynamic library) with an
object file with this type of relocation, you should get something
like:</p>

<div class="codebox">
<pre>
<b>$ gcc -shared function.c</b>
/usr/bin/ld: /tmp/ccQ2ttcT.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/tmp/ccQ2ttcT.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
</pre>
</div>

<p>The specific problem is how this relocation interacts with
<i>Position Independent Code</i> (PIC, enabled with <tt>-fPIC</tt>).
PIC just means that the output binary does not expect to be loaded at
a particular base address, but is happy being put anywhere in memory
(compare the output of <tt>readelf --segments</tt> on a binary such as
<tt>/bin/ls</tt> to that of any shared library).  This is obviously
critical for implementing lazy-loading (i.e. only loaded when
required) shared libraries, where you may have many libraries loaded
in essentially any order.  Trying to pre-allocate where in memory they
would all live is completely impractical and just does not work (not
to mention every single library that might ever be used would be
competing for a spot in the limited address space of a 32-bit
process!).</p>
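<p>The difference that <tt>readelf --segments</tt> shows can also be pulled straight out of the ELF headers.  This minimal Python sketch only handles little-endian 64-bit ELF files.  It reports two things:</p>
<ul>
<li>the object type: <tt>ET_EXEC</tt> for a fixed-address executable, <tt>ET_DYN</tt> for a shared object;</li>
<li>the virtual address of the first <tt>PT_LOAD</tt> segment.</li>
</ul>
<p>Many current distributions build programs such as <tt>/bin/ls</tt> as position-independent executables, which also show up as <tt>ET_DYN</tt>.</p>

```python
import sys

ET_NAMES = {2: "ET_EXEC", 3: "ET_DYN"}
PT_LOAD = 1


def u(data, offset, size):
    """Read a little-endian unsigned integer of size bytes."""
    return int.from_bytes(data[offset:offset + size], "little")


def load_info(path):
    """Return (e_type name, vaddr of first PT_LOAD) for a 64-bit LE ELF file."""
    with open(path, "rb") as f:
        data = f.read()
    # e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    e_type = u(data, 16, 2)
    e_phoff, e_phentsize, e_phnum = u(data, 32, 8), u(data, 54, 2), u(data, 56, 2)
    kind = ET_NAMES.get(e_type, str(e_type))
    for i in range(e_phnum):
        ph = e_phoff + i * e_phentsize
        if u(data, ph, 4) == PT_LOAD:          # p_type
            return kind, u(data, ph + 16, 8)   # p_vaddr
    return kind, None


if __name__ == "__main__":
    for name in sys.argv[1:]:
        kind, base = load_info(name)
        print(name, kind, hex(base) if base is not None else "no PT_LOAD")
```

<p>Run it with a program and a shared library as arguments to compare them.  The script name and library paths vary from system to system.</p>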

<p>What's the specific problem with this relocation in a shared
library?  In a shared library situation, we can not depend on the
local value of <tt>global</tt> actually being the one we want.
Consider the following example, where we override the value of global
with a <tt>LD_PRELOAD</tt> library.</p>

<div class="codebox">
<pre>
<b>$ cat function.c</b>
int global = 100;

int function(int i) {
	return i + global;
}
<b>$ gcc -fPIC -shared -o libfunction.so function.c</b>

<b>$ cat preload.c</b>
int global = 200;
<b>$ gcc -shared preload.c -o libpreload.so</b>

<b>$ cat program.c</b>
#include &lt;stdio.h&gt;

int function(int i);

int main(void) {
   printf("%d\n", function(10));
}
<b>$ gcc program.c -L. -lfunction -o program</b>

<b>$ LD_LIBRARY_PATH=. ./program</b>
110
<b>$ LD_PRELOAD=libpreload.so LD_LIBRARY_PATH=. ./program</b>
210
</pre>
</div>

<p>If the code in <tt>libfunction.so</tt> has a fixed offset into its
own data section, it will not be able to see the overridden value
provided by <tt>libpreload.so</tt>.  This is <b>not</b> the case when
building a stand-alone executable, where references are satisfied
internally.</p>

<p>Of course, any problem in computer science can be solved with a
layer of abstraction, and that is what is done when compiling with
<tt>-fPIC</tt>.  To examine this case, let's see what happens with PIC
turned on.</p>

<div class="codebox">
<pre>
<b>$ gcc -fPIC -shared -c  function.c</b>
<b>$ objdump --disassemble ./function.o</b>

./function.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 &lt;function&gt;:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	89 7d fc             	mov    %edi,-0x4(%rbp)
   7:	48 8b 05 00 00 00 00 	mov    0x0(%rip),%rax        # e &lt;function+0xe&gt;
   e:	8b 00                	mov    (%rax),%eax
  10:	03 45 fc             	add    -0x4(%rbp),%eax
  13:	c9                   	leaveq
  14:	c3                   	retq
</pre>
</div>

<p>It's <i>almost</i> the same!  We set up the frame pointer with the
first two instructions as before.  We store the first argument into
memory in the pre-allocated "red-zone" as before.  Then, however, we
do an IP-relative load of an address into <tt>rax</tt>.  Next we
dereference this into <tt>eax</tt> (i.e. <tt>eax = *rax</tt> in C)
before adding the incoming argument to it and returning.</p>

<div class="codebox">
<pre>
<b>$ readelf --relocs ./function.o</b>

Relocation section '.rela.text' at offset 0x550 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000800000009 R_X86_64_GOTPCREL 0000000000000000 global + fffffffffffffffc
</pre>
</div>

<p>The magic here is again in the relocations.  Notice this time we
have an <tt>R_X86_64_GOTPCREL</tt> relocation.  This says "replace the
data at offset <tt>0xa</tt> with the offset, relative to the
instruction pointer, of the <i>global offset table</i> (GOT) entry for
<tt>global</tt>."</p>

<img src="http://www.technovelty.org/images/got.png" alt="Global Offset Table operation with data variables" />

<p>As shown above, the GOT ensures the abstraction required so symbols
can be diverted as expected.  Each entry is essentially a pointer to
the real data (hence the extra dereference in the code above).  Since
the GOT is at a fixed offset from the program code, it can use an IP
relative address to gain access to the table entries.</p>
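<p>The effect of that extra level of indirection can be modelled in a few lines of Python.  This is a toy, not how <tt>ld.so</tt> is implemented.  Memory is a dictionary and the addresses are invented.  Symbol lookup simply takes the first definition in load order, which is why an <tt>LD_PRELOAD</tt> library wins.  The model reproduces the 110/210 results from the example above.  It also shows that a direct, fixed-offset reference would never see the preloaded value.</p>

```python
GOT_SLOT = 0x1ff8  # hypothetical: libfunction.so's GOT entry for `global`

# Toy "shared objects": where each defines `global`, and its initial data.
libfunction = {"symbols": {"global": 0x2000}, "data": {0x2000: 100}}
libpreload = {"symbols": {"global": 0x3000}, "data": {0x3000: 200}}


def resolve(symbol, load_order):
    """Return the address of the first definition of symbol in load order."""
    for obj in load_order:
        if symbol in obj["symbols"]:
            return obj["symbols"][symbol]
    raise KeyError(symbol)


def load(load_order):
    """Map every object's data, then fill in the GOT slot for `global`."""
    memory = {}
    for obj in load_order:
        memory.update(obj["data"])
    memory[GOT_SLOT] = resolve("global", load_order)
    return memory


def function_pic(memory, i):
    # mov global@GOTPCREL(%rip),%rax ; mov (%rax),%eax ; add i
    return i + memory[memory[GOT_SLOT]]


def function_direct(memory, i):
    # mov global(%rip),%eax: a fixed offset into libfunction.so's own data
    return i + memory[0x2000]


plain = load([libfunction])
preloaded = load([libpreload, libfunction])  # LD_PRELOAD puts it first
print(function_pic(plain, 10), function_pic(preloaded, 10))        # 110 210
print(function_direct(plain, 10), function_direct(preloaded, 10))  # 110 110
```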

<p>This extra indirection is obviously slower; however, for the most
part I imagine the overhead would be essentially immeasurable, and it
is required for "generic" operation.  If you have figured out that the
cost of indirection through the GOT is the major bottleneck of your
program, I imagine you wouldn't be reading this and would already be
considering strategies to remove it!</p>

<p>The next question is why this works on plain old x86-32.
Inspecting the code reveals why:</p>

<div class="codebox">
<pre>
<b>$ objdump --disassemble ./function.o</b>
00000000 &lt;function&gt;:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	a1 00 00 00 00       	mov    0x0,%eax
   8:	03 45 08             	add    0x8(%ebp),%eax
   b:	5d                   	pop    %ebp
   c:	c3                   	ret
<b>$ readelf --relocs ./function.o</b>
Relocation section '.rel.text' at offset 0x2ec contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  00000701 R_386_32          00000000   global
</pre>
</div>

<p>We start out the same, with the first two instructions setting up
the frame pointer.  Next, we load a value from memory into
<tt>eax</tt>.  As we can see from the relocation information, the
4-byte operand at offset <tt>0x4</tt> carries an <tt>R_386_32</tt>
relocation; i.e. it will be filled in with the absolute address of
<tt>global</tt>.  Then we add the incoming argument from the stack
(<tt>0x8(%ebp)</tt>) and return.  Because this is an absolute address
rather than an instruction-pointer-relative offset, the dynamic linker
can simply patch it at load time.  It writes in the address of
whichever <tt>global</tt> symbol resolution picks, including one
provided by an <tt>LD_PRELOAD</tt> library.  The price is a <i>text
relocation</i>: the library's code has to be modified when it is
loaded, so those pages can no longer be shared between processes.  It
is the inability of the x86-32 architecture to do
instruction-pointer-relative data addressing that forces it to use
absolute addresses.  Those turn out to be something the dynamic linker
can fix up when you're making a shared library!</p>
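<p>The <tt>R_386_32</tt> calculation is simply the symbol value plus the addend, stored as an absolute address.  x86-32 uses <tt>.rel</tt> sections, which carry no explicit addend; it is read from the bytes already in place (zero here).  Below is a small Python model of that patch.  It uses the bytes from the disassembly above and a made-up run-time address for <tt>global</tt>.</p>

```python
# Bytes of the x86-32 function() from the disassembly above; the absolute
# address operand of "mov 0x0,%eax" occupies offsets 4..7.
code = bytearray.fromhex("55 89 e5 a1 00 00 00 00 03 45 08 5d c3")


def apply_r386_32(section, offset, symbol_value):
    """Apply R_386_32 (S + A); the addend A is the value already in place (REL)."""
    addend = int.from_bytes(section[offset:offset + 4], "little")
    value = (symbol_value + addend) % 2 ** 32
    section[offset:offset + 4] = value.to_bytes(4, "little")
    return value


# Hypothetical run-time address of the `global` chosen by symbol resolution.
RESOLVED_GLOBAL = 0xb7f0a010
apply_r386_32(code, 4, RESOLVED_GLOBAL)
print(code.hex())
```

<p>Because the result is an absolute address, it does not matter where the library itself was loaded.  The cost is that the library's code bytes are modified at load time.</p>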

<p>So, the executive summary: the ability of x86-64 to use
instruction-pointer relative offsetting to data addresses is a nice
optimisation, but in a shared-library situation assumptions about the
relative location of data are invalid and can not be used.  In this
case, access to global data (i.e. anything that might be changed
around on you) must go through a layer of abstraction, namely the
global offset table.</p>
]]></content:encoded><category domain="http://www.technovelty.org">code/c</category><dc:date>2008-11-26T02:53:00Z</dc:date></item></channel></rss>
