<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
    <title>Jakob Engblom</title>
    
    <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/" />
    <id>tag:typepad.com,2003:weblog-86673406891497227</id>
    <updated>2013-04-22T16:34:24-07:00</updated>
    <subtitle>Real Virtuality</subtitle>
    <generator uri="http://www.typepad.com/">TypePad</generator>
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/engblom" /><feedburner:info uri="engblom" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:browserFriendly></feedburner:browserFriendly><entry>
        <title>Serving Windows Files from a Simics Quick-Start Platform</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2013/04/serving-windows-files-from-a-simics-quick-start-platform.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2013/04/serving-windows-files-from-a-simics-quick-start-platform.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e2017eea7783d0970d</id>
        <published>2013-04-22T16:34:24-07:00</published>
        <updated>2013-04-22T16:34:24-07:00</updated>
        <summary>Windows file sharing has always felt a bit magical to me. I use it all the time, certainly, but I never quite understood how it worked; it was just this big chunk of Microsoft protocol that felt like it really did not want to talk to other types of operating systems. Sure, I have used the open-source "samba" server for a long time with great success… but it always seemed to suffer from issues with access rights (probably the fault of me and server setup and mixing Unix and Windows accounts, not a fault in the server itself). With this...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Networking" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Open Standards" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>Windows file sharing has always felt a bit magical to me. I
use it all the time, certainly, but I never quite understood how it worked;
it was just this big chunk of Microsoft protocol that felt like it really did not want to talk to
other types of operating systems. Sure, I have used the open-source "samba"
server for a long time with great success… but it always seemed to suffer from
issues with access rights (probably the fault of me and server setup and mixing Unix and Windows accounts, not a
fault in the server itself). </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d43032766970c-pi" style="float: right;"><img alt="Visualitylogo" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017d43032766970c" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d43032766970c-800wi" style="margin: 0px 0px 5px 5px;" title="Visualitylogo" /></a>With this background, I was really delighted and surprised when I managed
to almost effortlessly get a Simics target to share and serve files to my
Windows host. A <a href="http://blogs.windriver.com/tools/2012/07/inside-a-synthetic-simulation-platform.html" target="_self">big-endian Power Architecture QSP target </a>at that,
running VxWorks, how cool is that? The key to the puzzle was the <strong>Visuality NQ Server</strong>  from <a href="http://www.visualitynq.com/" target="_self">Visuality Systems </a>Ltd. in Israel. It
just worked, and let me try some interesting and educational setups. </p>

<p>The setup I used is shown in the picture below. I have a
Simics QSP for PPC (Power), running VxWorks 6.9.2. On top of this, I run the Visuality
<strong>CIFS NQ version 7.00</strong> Server
software. The QSP target is on a virtual network, and this network is also
connected to the host (otherwise, talking to it from the host would have been
kind of hard). In some development work, we also used Wind River Workbench
talking to a debug agent in the target VxWorks to debug the setup and load the
server as a loadable kernel module. Basically, Simics provides the equivalent
of a generic Power Architecture development board. The communication between
the targets and the host is done using the <strong>CIFS/SMB/SMB2</strong>
protocol. This protocol is natively supported in Windows, while NQ Server delivers it to the embedded/mobile
world.</p>
<p><strong>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e201901b7a1b4f970b-pi" style="display: inline;"><img alt="Nq-slide-1" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e201901b7a1b4f970b image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e201901b7a1b4f970b-800wi" title="Nq-slide-1" /></a><br /></strong></p>
<p>When I opened Windows Explorer and pointed it at
\\10.10.0.2\, I got the view below. We can see the default demo files
included on the tffs-formatted flash disk image used with the QSP VxWorks setup
(mounted as /tffs0 in VxWorks). The file listing from the VxWorks serial
console is the same as the file listing seen in Windows Explorer. My biggest
issue here was just to convince Windows that the new network that showed up at
10.10.0.x was a friendly one that should be allowed to connect over CIFS to the
host machine. For very good reasons, Windows does not necessarily consider an
unknown network friendly, so I had to tell it my Simics virtual network was really
my home network. </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017eea777f84970d-pi" style="display: inline;"><img alt="Nq-screenshot-1" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017eea777f84970d image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017eea777f84970d-800wi" title="Nq-screenshot-1" /></a></p>
<p>This setup also lets us modify the files on the target. We
can drag in new files, open files for editing and save the changes, and delete
files. Once the target file system was changed, we saved the contents of the
changed file system as a diff file which was added to the base image during the
system setup. This was used in the next step to provide multiple target machines
with individual persistent disk contents.</p>
<p>Indeed, the next step was to try to run a network of
machines inside of Simics. This was very easy to set up. Once the targets were
booted and we tried starting the Visuality NQ servers, however, we hit a snag.
If the server was started simultaneously on all the machines (using target
serial-port scripted input it is trivial to type the same command at the exact
same time on all target machines), all server instances immediately shut down.
Strange. If we started first one, and then a second server, the first one
stayed up and the second shut down. The reason turned out to be that all CIFS
nodes need to have a unique name (actually the guilty protocols are NetBIOS and DNS, but let’s
call this CIFS for the sake of simplicity), and the VxWorks
instances we had here did not have any name set at all. Thus, a name collision
ensued. The solution was simple enough: provide each target with a unique name
from the scripted command-line setup. </p>
<p>What this demonstrates is that a virtual platform and virtual
network let you discover properties of the software that you might not have
discovered until much later on hardware. Had we been working with hardware
boards, it is quite unlikely that we would have tried a multiple-machine test,
as that is a bit more work to do in hardware (scrounge a second board from some
other developer, get it configured, hook it into your little local test network,
and try to bring it up). In Simics, we just made the script loop setting up
target machines run until 2 instead of 1 (and we could just as well have made
it run to 10 or 100). </p>
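<p>The fix described above, giving every target its own name inside the setup loop, can be sketched roughly as follows. This is only an illustration in Python: the <code>TargetConfig</code> type, the <code>make_target_configs</code> function, and the <code>qsp</code> name prefix are hypothetical and are not part of Simics, VxWorks, or the NQ server. Only the 10.10.0.x address range is taken from the setup described here.</p>
<pre>
```python
# Hypothetical sketch: none of these names come from Simics or the NQ server.
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetConfig:
    name: str        # unique NetBIOS-style host name for this target
    ip_address: str  # address on the virtual network


def make_target_configs(count, base_name="qsp", subnet="10.10.0", first_host=2):
    """Return one configuration per target, each with a unique name and IP."""
    configs = [
        TargetConfig(name=f"{base_name}{i}", ip_address=f"{subnet}.{first_host + i}")
        for i in range(count)
    ]
    names = [c.name for c in configs]
    # Guard against the failure described above: two servers with the same name.
    if len(set(names)) != len(names):
        raise ValueError("target names must be unique")
    return configs


if __name__ == "__main__":
    # Going from 2 targets to 10 or 100 is just a different argument.
    for config in make_target_configs(2):
        print(config.name, config.ip_address)
```
</pre>
<p>Deriving both the name and the address from the loop index keeps the per-target differences in one place, which is what makes scaling the network up a one-line change.</p>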
<p>Once this was sorted, we could mount multiple servers from
Windows, each with their own individualized disk contents, as shown in this
screenshot:</p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e201901b7a1c65970b-pi" style="display: inline;"><img alt="Nq-screenshot-2" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e201901b7a1c65970b image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e201901b7a1c65970b-800wi" title="Nq-screenshot-2" /></a></p>
<p>You can see the scripting of the startup setting the IP
addresses and names of the boards, before starting the server with the nqStart
command. This command starts both of the NQ daemon processes: NetBIOS and SMB. The
setup looks like this in Simics:</p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d43032a68970c-pi" style="display: inline;"><img alt="Nq-slide-2" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017d43032a68970c image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d43032a68970c-800wi" title="Nq-slide-2" /></a></p>
<p>Apart from testing and demonstrating the nice software from
Visuality, this exercise provides excellent proof of some classic Simics
values:</p>
<ul>
<li>A Simics QSP synthetic platform is a perfect
tool to <a href="http://blogs.windriver.com/tools/2012/06/you-will-be-simulated-a-bit-quicker.html" target="_self">test and run applications and middleware that need the OS API and a particular
architecture</a>. </li>
<li>A Simics QSP provided a stable proven VxWorks on
top of which the Visuality software could be run immediately with network
access and a disk with a file system, with no need to worry about choosing and
configuring a particular target BSP and loading software and formatting a disk.</li>
<li>You can build <a href="http://blogs.windriver.com/tools/2012/05/teaching-networking-using-simics.html" target="_self">virtual networks of machines</a>, and
connect these to the real world either individually or as a whole group.</li>
<li>The contents of target file systems are easy to
customize, and the changes can be saved permanently as disk images (but <a href="http://blogs.windriver.com/engblom/2010/10/physical-or-virtual.html" target="_self">you do not have to save them </a>in case you badly messed up the data, like I did in my first attempts).
</li>
<li>Target scripting automates the boot and setup of the target systems, including individual scripts
for each target machine that make each machine different from the common baseline. </li>
</ul>
<p>Finally, for Visuality Systems, Simics provides simple
access to a big-endian cross-target platform to test their software and its
portability. It makes access to arbitrary architectures as easy as running a
software program, rather than having to procure, configure, and maintain hardware
development boards. </p>
</div>
</content>



    </entry>
    <entry>
        <title>Debug Quicker with Simics (video)</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2013/01/debug-quicker-with-simics-video.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2013/01/debug-quicker-with-simics-video.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e2017c357b7f2f970b</id>
        <published>2013-01-09T09:00:55-08:00</published>
        <updated>2013-01-09T09:00:55-08:00</updated>
        <summary>Late last year, I presented a one-hour webinar on how Simics lets you “resolve bugs in minutes instead of weeks.” Part of that webinar were two Simics demos that show Simics in action, from the first booting of a target system through loading software onto it and debugging a nasty crash in a server program. The webinar demos are now available as a single Youtube movie, on the Wind River Youtube channel. The target system used in the demo is a heterogeneous network of four machines. Two ARM-based and two PPC-based. Two of them are running a client application, and...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Wind River" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="debug" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Simics" />
        <category scheme="http://sixapart.com/ns/types#tag" term="virtual platform" />
        <category scheme="http://sixapart.com/ns/types#tag" term="virtual prototyping" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Wind River" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>Late last year, I presented a one-hour webinar on how Simics lets you “resolve bugs in minutes instead of weeks.” The webinar included two Simics demos that show Simics in action, from the first boot of a target system through loading software onto it and debugging a nasty crash in a server program. <a href="http://www.youtube.com/watch?v=3CTvtpMptlg" target="_self">The webinar demos are now available as a single YouTube video</a> on the <a href="http://www.youtube.com/user/windriverchannel" target="_self">Wind River YouTube channel</a>.</p>

<p>The target system used in the demo is a heterogeneous
network of four machines: two ARM-based and two PPC-based. Two of them are
running a client application, and two are running a server application. They are
all connected on the same virtual Ethernet network, and the client applications
can talk to the server applications over the network. </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c357b718c970b-pi" style="display: inline;"><img alt="Webinar-demo-setup" border="0" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c357b718c970b" style="display: block; margin-left: auto; margin-right: auto;" title="Webinar-demo-setup" /></a></p>
<p>In the demo, we test how the server and client applications
work when ported from PPC (where they originated) to ARM. All the possible
combinations of client to server connections are tested, and in the process we
make use of the key capabilities of Simics:</p>
<ul>
<li>Network simulation</li>
<li><a href="http://blogs.windriver.com/tools/2010/10/physical-or-virtual.html" target="_self">Checkpointing</a></li>
<li>Automated boot and setup of target systems</li>
<li>Scripted testing, automatically exploring all
combinations of (ARM, PPC) and (client, server)</li>
<li>Repeatability (see also my previous blog posts <a href="http://blogs.windriver.com/tools/2012/11/determinism-simics-and-flying-piggies.html" target="_self">here </a>and <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html" target="_self">here</a>)</li>
<li><a href="http://blogs.windriver.com/tools/2010/08/transporting-bugs-with-checkpoints.html" target="_self">Bug transportation </a>via checkpoints and scripting</li>
<li>Eclipse CDT debugger (with reverse)</li>
<li>Reverse execution</li>
<li>Reverse debugging, across the entire system of
four machines</li>
<li>Synchronous system stop</li>
<li><a href="http://blogs.windriver.com/tools/2012/05/teaching-networking-using-simics.html" target="_self">Network </a>traffic inspection, including feeding
data to Wireshark</li>
</ul>
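<p>As a rough illustration of what "automatically exploring all combinations" means, the four client/server pairings can be enumerated like this. It is a plain Python sketch, not the script used in the demo; <code>client_server_pairs</code> and <code>run_all</code> are made-up names, and the test passed in is a placeholder.</p>
<pre>
```python
from itertools import product

ARCHITECTURES = ("ARM", "PPC")


def client_server_pairs():
    """Return every (client architecture, server architecture) combination."""
    return list(product(ARCHITECTURES, repeat=2))


def run_all(test):
    """Run a test callable on every pairing and collect the results by pairing."""
    return {pair: test(*pair) for pair in client_server_pairs()}


if __name__ == "__main__":
    # Placeholder test that always passes; a real one would drive the targets.
    results = run_all(lambda client, server: True)
    for (client, server), passed in results.items():
        status = "ok" if passed else "FAILED"
        print(f"client on {client} -> server on {server}: {status}")
```
</pre>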
<p>This demo uses the Simics <a href="http://blogs.windriver.com/tools/2012/07/inside-a-synthetic-simulation-platform.html" target="_self">Quick Start Platforms</a>, showing how you can <a href="http://blogs.windriver.com/tools/2012/06/you-will-be-simulated-a-bit-quicker.html" target="_self">debug application software and resolve its bugs</a> without needing a model of the complete target system. See my <a href="http://blogs.windriver.com/tools/2012/06/you-will-be-simulated-a-bit-quicker.html" target="_self">previous blog post</a> for more on how you can use a QSP to quickly get started reaping the benefits of virtual platforms and simulation. </p>
<p>If you want to see more about reverse execution and debugging with Simics, we have <a href="http://www.youtube.com/watch?v=ZpNYW6pbV4U" target="_self">another demo movie available</a> that also shows how to debug with an integrated physics model. </p>
<p>You can watch the full seminar, recorded by Open Systems Media, <a href="http://ecast.opensystemsmedia.com/364" target="_blank">here</a>. </p></div>
</content>



    </entry>
    <entry>
        <title>Debugging Simics - on Simics</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2012/12/debugging-simics-on-simics.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2012/12/debugging-simics-on-simics.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e2017d3e6930c6970c</id>
        <published>2012-12-05T09:57:23-08:00</published>
        <updated>2012-12-05T09:57:23-08:00</updated>
        <summary>I often write and talk about how useful Simics is for debugging concurrency bugs and glitches in multithreaded and multicore systems. Recently, we had a case where we proved this on a very complex application, namely Simics itself. This nicely demonstrated both the recursive completeness of Simics, and its usefulness for conquering tricky bugs in complex software. The beginning of this story is a bug in Simics, triggered by a certain Simics configuration. The Simics target is a Power Architecture machine, running some bare-metal test code testing the processor simulation. Occasionally, this setup would crash Simics, due to some bug...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>
<img align="right" alt="Blurb" border="0" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3e69295d970c" style="margin: 0px 0px 5px 5px;" title="Blurb" />I often write and talk about how useful Simics is for
debugging concurrency bugs and glitches in multithreaded and multicore systems. Recently, we had a case where we proved this on a very complex application,
namely Simics itself. This nicely demonstrated both the recursive completeness
of Simics, and its usefulness for conquering tricky bugs in complex software. </p>

<p>The beginning of this story is a bug in Simics, triggered by
a certain Simics configuration. The Simics target is a Power Architecture
machine, running some bare-metal test code testing the processor simulation.
Occasionally, this setup would crash Simics, due to some bug in Simics or the
models. It was a difficult bug to track down, as it only happened in one run
out of 50 or so. When attaching a debugger to try to diagnose it, it invariably
did not happen (a classic Heisenbug). </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017ee5dde55a970d-pi" style="display: inline;"><img alt="Simics-on-simics-1" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017ee5dde55a970d" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017ee5dde55a970d-800wi" style="display: block; margin-left: auto; margin-right: auto;" title="Simics-on-simics-1" /></a></p>
<p>Simics is the perfect tool to diagnose these kinds of
issues, but in order to do that, we had to get the failing program into Simics,
i.e., run Simics on Simics. The first step was to create a duplicate of the
development host, inside of Simics. This was fairly simple, just a matter of
installing a Fedora 16 standard Linux on an 8-core Intel target. Once the Linux
was installed and booted, a checkpoint of the system was taken. </p>
<p>Next, the development code tree from the host was packaged
up as a tar file and put on a DVD image file. Simics was started from the
checkpoint of the booted target system, and the DVD image inserted into the
virtual DVD drive and mounted by the Fedora Linux running on Simics. The tar
file was copied to the file system on the target, and unpacked. A new
checkpoint was taken after the Simics installation was thus completed and
Simics could run on Simics. The result at this point was a completely
self-contained, controllable, and repeatable environment. </p>
<p>The screenshot below shows Simics running on Simics, with
the same desktop wallpaper being used for both the host and outer Simics Fedora
system:</p>
<p>
<img alt="Simics-on-simics-screenshot-600px" border="0" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c343a3bec970b" style="display: block; margin-left: auto; margin-right: auto;" title="Simics-on-simics-screenshot-600px" /></p>
<p>The next step was to replicate the bug inside of Simics. To
this end, a shell command was used that repeatedly ran the inner Simics until
the bug hit (obviously, this session was started from the checkpoint after the
Simics installation). </p>
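<p>The post does not show the shell command itself, so the following is only a Python equivalent of the idea: run a program over and over until one run fails. The function name <code>run_until_failure</code> and the <code>max_runs</code> limit are illustrative assumptions, and the demo uses a trivial child process rather than Simics.</p>
<pre>
```python
import subprocess
import sys


def run_until_failure(command, max_runs=100):
    """Run command repeatedly; return the 1-based run number that failed, or None."""
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(command)
        # A nonzero return code means failure. On POSIX systems a process
        # killed by a signal (such as a segfault) gets a negative code.
        if result.returncode != 0:
            return attempt
    return None


if __name__ == "__main__":
    # Stand-in for the program under test: a child process that always fails.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print("failed on run", run_until_failure(failing))
```
</pre>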
<p>The result was this setup, ready to run Simics until the bug
hit:</p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3e692b45970c-pi" style="display: inline;"><img alt="Simics-on-simics-2" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017d3e692b45970c" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3e692b45970c-800wi" style="display: block; margin-left: auto; margin-right: auto;" title="Simics-on-simics-2" /></a></p>
<p>To recap, we have Simics running on Simics. The “inner
Simics” is configured with the Power Architecture setup that resulted in a
crash on the host, and the “outer Simics” is running Fedora 16, providing a
virtual replica of the development host (but inside of Simics).  </p>
<p>Additional scripting in the outer Simics was used to make
the search for and replication of the bug more efficient. </p>
<p><img alt="Simics-on-simics-3" border="0" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c343a3d6b970b" style="display: block; margin-left: auto; margin-right: auto;" title="Simics-on-simics-3" /></p>
<ul>
<li>The Simics script varied the time slices given
to the processors in the IA target system. This caused greater variation in
scheduling of concurrent processes and threads in the Simics-simulated Fedora
16 OS, which in turn helped provoke the bug so that it appeared faster (after
fewer runs of the inner Simics).</li>
<li>A checkpoint was taken after the inner Simics
had been started and the timing variation applied to the IA processors – but
before it had started executing the test case. This meant that a checkpoint would
be available that led straight to the bug, with no need to do any warm-up of
the target or particular configuration of Simics. The checkpoint would in
effect be a self-contained bug report for the issue. </li>
<li>A magic instruction (blue star) was planted in
the segfault handler of the inner Simics, making it very simple to catch the crash
of the inner Simics. Often, using a magic instruction like this is simpler than
trying to capture the right page fault or putting a breakpoint at the right
place. A magic instruction is a live marker in the code that will always
trigger, regardless of debug information or OS awareness. Furthermore, it has
no overhead until it hits. </li>
</ul>
<p>Eventually, after some 20 runs of the inner Simics, the bug
was triggered. Thanks to the checkpoint and Simics repeatability, reproducing
the bug was trivial. The Simics crash could now be reproduced any number of
times, and it was time to debug and figure out why Simics crashed. An
occasional Heisenbug had been converted into a 100% reproducible Bohrbug.</p>
<p>The first step of debugging was to figure out the mapping of
the many dynamically loaded modules in the inner Simics. This was done by
running the outer Simics and sending a Ctrl-Z to the Fedora shell, pausing the
inner Simics. Then, the <span style="color: #a0ff40;">/proc</span> file system on the Fedora Linux running on Simics
was interrogated to find the load addresses. Since the checkpoint was taken
after Simics was started, we know that this is the mapping in the software
setup found in the checkpoint. Every time the checkpoint is opened, the same
mapping applies – so the information was saved and used to set up symbolic debug
information for the Simics modules used. </p>
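<p>On Linux, per-process load addresses are listed in <code>/proc/PID/maps</code>, one mapping per line (address range, permissions, offset, device, inode, and an optional path). A small Python sketch of pulling the lowest mapped address for each file out of that listing might look like this. The sample text and module paths are invented for illustration and are not the actual Simics module names.</p>
<pre>
```python
# Invented sample in the /proc/PID/maps line format; paths are hypothetical.
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 131 /opt/app/bin/app
7f3a10000000-7f3a10021000 r-xp 00000000 08:01 200 /opt/app/lib/module-a.so
7f3a10220000-7f3a10222000 rw-p 00020000 08:01 200 /opt/app/lib/module-a.so
7f3a20000000-7f3a20002000 rw-p 00000000 00:00 0
7ffc1a000000-7ffc1a021000 rw-p 00000000 00:00 0 [stack]
"""


def load_addresses(maps_text):
    """Map each file-backed path to the lowest start address it is mapped at."""
    addresses = {}
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) != 6 or not fields[5].startswith("/"):
            continue  # anonymous mapping or pseudo-entry such as [stack]
        start = int(fields[0].split("-")[0], 16)
        path = fields[5]
        addresses[path] = min(start, addresses.get(path, start))
    return addresses


if __name__ == "__main__":
    for path, address in sorted(load_addresses(SAMPLE).items()):
        print(f"{address:#x} {path}")
```
</pre>
<p>Because the checkpoint fixes the mapping, a table like this only has to be collected once and can then be reused in every debug session started from that checkpoint.</p>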
<p>The next step of debugging was to open the checkpoint again,
turn on reverse execution, and run forward until the magic instruction hit.
Then, OS awareness was used to back up until the last time that the inner
Simics was running prior to hitting the segfault handler. This placed the
execution of the outer Simics at the precise instruction where the inner Simics
crashed. </p>
<p>It turned out that Simics was trying to execute code in a
location (<span style="color: #a0ff40;">BCDE</span>) where no code was to be found.</p>
<p>Stepping back one instruction led to a <span style="color: #a0ff40;">JMP</span> instruction to
the location <span style="color: #a0ff40;">BCDE</span>. </p>
<p>So where did this <span style="color: #a0ff40;">JMP BCDE</span> come from? It was clearly not
part of the static code of Simics, but something that was generated at run time
by Simics itself (Simics contains a JIT compiler and thus modifying running
code at run time is perfectly expected behavior).</p>
<p><img alt="" border="0" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3e692f18970c" style="display: block; margin-left: auto; margin-right: auto;" /></p>
<p>To find out how the bad <span style="color: #a0ff40;">JMP</span> was created, a memory write
breakpoint was put on the instruction (<span style="color: #a0ff40;">JMP BCDE</span>), and execution reversed.
Simics stopped at the point where the “<span style="color: #a0ff40;">JMP</span>” part of the instruction was written
to memory. Doing a stack back trace at this point showed the code that was trying
to write a five-byte “<span style="color: #a0ff40;">JMP XYZQ</span>” instruction into the JIT-generated code stream.
The breakpoint had hit on the write of the byte containing the <span style="color: #a0ff40;">JMP</span> instruction
code, indicating that the other four bytes (containing the actual <span style="color: #a0ff40;">JMP</span> target
location of <span style="color: #a0ff40;">XYZQ</span>) were yet to be written when the instruction got executed and
Simics crashed. </p>
<p>Stepping forward (on the processor) revealed that a thread
switch happened in the inner Simics, and that the incoming thread immediately
executed the five-byte <span style="color: #a0ff40;">JMP</span> instruction, such as it was. Since only the <span style="color: #a0ff40;">JMP</span> byte
had been written, this was a jump to location <span style="color: #a0ff40;">BCDE</span>, rather than the intended <span style="color: #a0ff40;">XYZQ</span>
(it would also have been OK to execute the original <span style="color: #a0ff40;">ABCDE</span> code). Thus, the
issue was diagnosed to be a read-write race condition, with the twist that the
read was an execution of the memory as code and the write a regular data write.
As soon as the problem was identified it was of
course very easy to fix.</p>
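<p>The shape of this race can be shown with a toy model. The Python sketch below is not Simics code and does not model real x86 encoding: it treats the five bytes as letters, with <code>J</code> standing in for the JMP opcode, and uses a generator to mark the points where a thread switch could interrupt the writer.</p>
<pre>
```python
def patch_bytes(memory, new_code):
    """Write new_code into memory one byte at a time, yielding after each byte."""
    for index, value in enumerate(new_code):
        memory[index] = value
        yield  # a thread switch could happen here


def fetch(memory):
    """What an executing thread would see if it ran the code right now."""
    return bytes(memory).decode("ascii")


def observe_after(steps):
    """Return the instruction seen if the reader runs after `steps` byte writes."""
    memory = bytearray(b"ABCDE")            # original code
    writer = patch_bytes(memory, b"JXYZQ")  # J stands in for the JMP opcode
    for _ in range(steps):
        next(writer)
    return fetch(memory)


if __name__ == "__main__":
    for steps in (0, 1, 5):
        print(steps, "byte(s) written:", observe_after(steps))
```
</pre>
<p>Executing before any write or after all five is harmless; only the window after the first byte produces the mixed instruction, which is why the crash was so rare.</p>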
<p>With the same setup, another race condition in Simics was
also found and fixed, involving the more common case of multiple concurrent
threads updating and reading a shared data structure without sufficient
synchronization. </p>
<p>In summary, this blog post has described one instance where
Simics was used to find and fix concurrency bugs in a real-world complex
software system called Simics. The key to the success was the repeatability
that Simics provides, even for timing-related occasional events, along with checkpoints,
scripting, reverse execution, and debug facilities. </p>
</div>
</content>



    </entry>
    <entry>
        <title>Determinism, Simics, and Flying Piggies</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2012/11/determinism-simics-and-flying-piggies.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2012/11/determinism-simics-and-flying-piggies.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e2017d3e141d4f970c</id>
        <published>2012-11-23T06:36:27-08:00</published>
        <updated>2012-11-26T05:38:27-08:00</updated>
        <summary>In a recent Simics seminar, I was asked about repeatability, variability, determinism and Simics. This is a question that comes up almost every time I present about Simics in front of an audience with testing experience. The people asking the question intuitively think that determinism is a bad thing - since it sounds like it will limit the execution scenarios that will be explored in testing. For a tester, variation is a good thing. However, determinism is not in conflict with variation. And I think I found a perfect illustration of this in a setting that is a bit more...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>In a recent Simics seminar, I was asked about <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html" target="_self">repeatability, variability, determinism and Simics</a>. This is a question that comes up almost every time I present about Simics in front of an audience with testing experience. The people asking the question intuitively think that <em>determinism</em> is a bad thing - since it sounds like it will <em>limit</em> the execution scenarios that will be explored in testing. For a tester, variation is a good thing. However, determinism is not in conflict with variation. And I think I found a perfect illustration of this in a setting that is a bit more accessible and easy to understand than computer simulators: the computer game <a href="http://www.rovio.com/en/our-work/games/view/47/bad-piggies" target="_self">Bad Piggies</a>, from <a href="http://www.rovio.com/" target="_self">Rovio</a>.  </p>

<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017ee588fabc970d-pi" style="float: right;"><img alt="Balance matters-small" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017ee588fabc970d" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017ee588fabc970d-800wi" style="margin: 0px 0px 5px 5px;" title="Balance matters-small" /></a>
<p>Bad Piggies is a game based on a physics simulation (not particularly realistic, but still reasonably consistent with everyday experience), where you put together strange contraptions to guide an endearing little pig pilot from the starting point to a goal. What struck me about the game is its incredible sensitivity to inputs, while still being 100% deterministic. To me, this is absolutely analogous to how Simics works (and I guess drawing that analogy indicates a pretty warped mind too). </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c33fe224b970b-pi" style="float: left;"><img alt="Select-place-small-v2" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017c33fe224b970b" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c33fe224b970b-800wi" style="margin: 0px 5px 5px 0px;" title="Select-place-small-v2" /></a>There are two types of inputs in the game that matter: the (static) configuration of the vehicular contraption that you build, and the interactive inputs you provide during a level to turn on engines, pop balloons, or fire off boxes of TNT. An utterly minor change in the timing of an action can mean the difference between success and failure. On the static side, moving the center of gravity of a vehicle by putting the pilot pig somewhere else can have a huge impact on the behavior. There are levels where all you do is move the center of gravity around to see which location finally hits the perfect balance. </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3e142aa3970c-pi" style="float: right;"><img alt="Tnt-two" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017d3e142aa3970c" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3e142aa3970c-800wi" style="margin: 0px 0px 5px 5px;" title="Tnt-two" /></a>Still, the game is clearly deterministic. This is particularly obvious in levels where no interactive input is needed. All you do is put the vehicle together and watch it go (roll down a hill, float away on a balloon, or throw the pig by exploding a box of TNT underneath it). In these cases, each game plays out the exact same way, as the physics engine is intentionally and precisely crafted to be deterministic. Whether I play an HD version on an Android tablet or a regular version on an iPod touch, the outcome is the same and the way you clear the levels is identical. This is strong <em>determinism</em>. </p>
<p>It is not <em>predetermined</em>, however: when you put a new vehicle together and let it go, you really cannot predict what is going to happen. You might have an idea in mind, and you might hope it does the right thing. But before you play it through, you do not know. It most likely fails on the first attempt, and you go back and tweak the design (in computer programming, this is known as debugging). And try again. Each time, something different will most surely happen, since the setup input is different. There is huge <em>variability</em>, and if we see the game as a way to test the contraptions we put together, there is infinite room for variation on top of the fundamentally deterministic game engine.</p>
<p>Imagine how this would be if the game were really nondeterministic. It would be pretty much unplayable. It would turn into a die-rolling exercise, where you would create the exact same initial conditions, let it run, and hope that on some run, you would be lucky and get to the goal. Not my idea of fun, exactly. Determinism is clearly good and helpful for testing and development, as long as there is also variability and sensitivity to inputs. But if inputs do not change, you do not want behavior to change. Normally, computer systems are indeed random, and since this is a fact of life, many people make lemonade out of lemons and think of this randomness as a good tool for testing. But when you can do it, avoiding randomness is clearly superior as a way to approach systematic testing. And as a way to fly computer game pigs to the goal.</p>
<p>Simics works in the same way as the Bad Piggies game. For a static test case like a non-interactive boot of an OS, it will play out the same each time you run it (provided no configuration changes are made). Change the target OS image, and you will get something different. Change the number of processors or the size of memory, and something different will happen. Change the host machine (the machine on which Simics runs), but not the target, and you will get the same result. </p>
<p>If we add uncontrolled interactive inputs to the mix, we will get different results each time. We can make interactive inputs deterministic by scripting or recording them. And given deterministic inputs, the execution will be the same.  This is how Simics achieves perfect repeatability.  Not by limiting what executes and how, but by controlling inputs to a system so that any particular variation can be reproduced. </p>
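<p>The record-and-replay idea can be illustrated with a few lines of Python. This is only a toy sketch and has nothing to do with how Simics is implemented: <code>simulate</code> and <code>interactive_session</code> are hypothetical names made up for the example, and the "target" is just an arithmetic function whose final state depends on nothing but its inputs.</p>
<pre>
```python
import random


def simulate(inputs):
    """Toy deterministic 'target': the final state depends only on the inputs."""
    state = 0
    for step, value in enumerate(inputs):
        # Every input value and its position feed into the final state.
        state = (state * 31 + value * (step + 1)) % 1000003
    return state


def interactive_session(length):
    """Hypothetical stand-in for uncontrolled interactive input."""
    return [random.randint(0, 255) for _ in range(length)]


# Record one "interactive" session, then replay the recording.
recording = interactive_session(20)
first_run = simulate(recording)
replayed_run = simulate(list(recording))
print(first_run == replayed_run)  # True: same inputs, same execution

# Change a single recorded input slightly and the outcome changes.
perturbed = list(recording)
perturbed[3] = (perturbed[3] + 1) % 256
print(simulate(perturbed) == first_run)  # False
```
</pre>
<p>Each fresh call to <code>interactive_session</code> gives a different run, which is the variability. Once a session has been recorded, replaying it always gives the same result, which is the repeatability.</p>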
<p>Taking this one step further, a simulator also allows you to systematically explore issues by programming input variations into scripts. At one extreme, this is <a href="http://blogs.windriver.com/wind_river_blog/2010/10/the-virtual-basil-fawlty.html" target="_self">fault injection</a>, but even just automatically varying inputs to a program within legal bounds can be very interesting. For example, Simics has been used to <a href="http://blogs.windriver.com/wind_river_blog/2012/09/systematically-exposing-os-kernel-races-an-interview-with-ben-blum.html" target="_self">provoke OS bugs</a>, or test how a program runs as the <a href="http://blogs.windriver.com/wind_river_blog/2012/08/testing-manycore-scaling-with-simics.html" target="_self">number of processor cores</a> in a system is varied. </p>
<p>Repeatability and determinism just mean that we can repeat any execution we happen to chance upon - not that we limit the space that can be explored in testing. For any tester who ever had to try to replicate and report a failed test case that only was seen once, repeatability is a wonderful thing indeed. </p>
<p> </p></div>
</content>



    </entry>
    <entry>
        <title>Systematically Exposing OS Kernel Races – An Interview with Ben Blum</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2012/09/systematically-exposing-os-kernel-races-an-interview-with-ben-blum.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2012/09/systematically-exposing-os-kernel-races-an-interview-with-ben-blum.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e2017ee3bfb783970d</id>
        <published>2012-09-24T12:55:26-07:00</published>
        <updated>2012-09-24T12:58:25-07:00</updated>
        <summary>Full-system simulators like Simics provide unparalleled insight into what is going on in a target system. Indeed, better insight is one of the main features of simulation that we get regardless of what we simulate and how. In addition, if we want to, we can also exert control over the target system to make it take different execution paths than it otherwise would. Earlier this year, Ben Blum at Carnegie Mellon University (CMU) presented a Master’s thesis that provides a very good example of just what can be achieved by combining the insight and control of a simulator with intelligence and...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Testing" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Tools" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Wind River" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Simics" />
        <category scheme="http://sixapart.com/ns/types#tag" term="simulator" />
        <category scheme="http://sixapart.com/ns/types#tag" term="virtual platform" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Wind River" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>Full-system simulators like Simics provide unparalleled
insight into what is going on in a target system. Indeed, better insight is one
of the main features of simulation that we get regardless of what we simulate and
how. In addition, if we want to, we can also exert control over the target
system to make it take different execution paths than it otherwise would.
Earlier this year, Ben Blum at Carnegie Mellon University (CMU) presented a Master’s thesis that provides
a very good example of just what can be achieved by combining the insight and
control of a simulator with intelligence and domain knowledge. The system is
called Landslide, and it is used to expose race conditions inside of
operating-system kernels. </p>

<p>Landslide systematically explores the possible execution
paths of the kernel in order to provoke latent bugs that might only happen very
rarely in actual use of the operating system – but as we all know, such
glitches are the ones that tend to be found by customers in critical situations
and force engineering to spend months in reproduction attempts. </p>
<p>For this blog post, I interviewed the creator of Landslide, Ben Blum. </p>
<p><em><strong>Jakob Engblom: </strong>Please
introduce yourself!</em></p>
<p><strong>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017ee3bfb1f7970d-pi" style="float: right;"><img alt="Bblum-by-alanv" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017ee3bfb1f7970d" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017ee3bfb1f7970d-800wi" style="margin: 0px 0px 5px 5px;" title="Bblum-by-alanv" /></a>Ben Blum</strong>: I'm <a href="http://bblum.net" target="_self">Ben Blum</a>, a graduate student in the
<a href="http://www.csd.cs.cmu.edu/" target="_self">Computer Science Department</a> at <a href="http://cmu.edu" target="_self">Carnegie Mellon (CMU)</a>,
advised by <a href="http://www.cs.cmu.edu/~garth/" target="_self">Garth Gibson</a>.
I spent last year doing this work as part of the Fifth Year Master's program in
the Computer Science Department at CMU. Now I am staying on for <a href="http://www.pdl.cmu.edu/" target="_self">several more
years for a Ph.D</a>. </p>
<p><em><strong>JE</strong>: What was the
topic of your thesis?</em></p>
<p><strong>BB</strong>: Systematic
testing (or sometimes "exploratory testing" or "systematic
exploration") is the idea of testing for race conditions in a concurrent
system by forcing the system to execute as many different <strong>thread interleavings</strong> as possible.</p>
<p>Normally when you run a concurrent system, its threads run
in an unpredictable pattern (governed by timer events or other device
interrupts), and only one possible interleaving out of many is executed. And
(by contrast), when you use conventional stress testing, you exercise a random
set of thread interleavings, often a very small subset of the whole execution
space. A systematic testing tool tries to enumerate and explore the whole state
space of interleavings.</p>
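<p>To make the idea of enumerating interleavings concrete, here is a small Python sketch. It is not Landslide code; the helper names <code>interleavings</code> and <code>run</code> and the two-thread counter workload are assumptions made up for the illustration. Two threads each load a shared counter and store back the value plus one, and the sketch runs every possible ordering of those four steps.</p>
<pre>
```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        chosen = set(slots)
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if i in chosen else next(it_b) for i in range(total)]


def run(schedule):
    """Execute a schedule of (thread, op) steps against a shared counter."""
    shared = 0
    local = {}
    for thread, op in schedule:
        if op == "load":
            local[thread] = shared
        else:  # "store": write back the locally incremented value
            shared = local[thread] + 1
    return shared


t1 = [("T1", "load"), ("T1", "store")]
t2 = [("T2", "load"), ("T2", "store")]
results = [run(s) for s in interleavings(t1, t2)]
print(len(results))          # 6 interleavings of 2 + 2 steps
print(sorted(set(results)))  # [1, 2]: some orderings lose an update
```
</pre>
<p>Only the two fully serial orderings produce the correct final value of 2. A stress test that happens to hit only those would never see the lost update, while an exhaustive enumeration cannot miss it. The number of interleavings grows very quickly with the number of steps, which is the cost discussed next.</p>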
<p>This approach is very powerful: it can find arbitrarily
complicated race conditions, including TOCTTOU (<a href="http://en.wikipedia.org/wiki/Time_of_check_to_time_of_use" target="_self">Time of Check To Time of Use</a>) bugs. Lighter-weight tools such as data race detectors or static analyses are
often powerless against these, since simply adding locks around each access
might satisfy such a tool but not actually solve the race.</p>
<p>But systematic testing is also very computationally
expensive – you might be thinking "exponential explosion of
possibilities", and you'd be right. A big part of the research involves
figuring out how to reduce the coverage (and/or granularity) of possible
interleavings in a way that still gives meaningful results. Sometimes we use
sophisticated pruning algorithms; sometimes we use simple heuristics.</p>
<p><em><strong>JE</strong>: How did you
use Simics?</em></p>
<p><strong>BB</strong>: I wrote my
tool, called Landslide, as a Simics module. When Simics boots up a kernel, it
also loads Landslide, which then gets to monitor and control the kernel's
execution.</p>
<p>Being in a simulated environment gives a lot of power:
Landslide sees every instruction and memory access the kernel runs, and so
keeps fairly detailed internal representations of the kernel's state. There is
about an order of magnitude slowdown in doing this (compared to regular Simics
simulation speed), but it gets to know (for example) when each kernel thread is
runnable and whether heap accesses are invalid.</p>
<p><em><strong>JE</strong>: And just what
is Landslide and how does it work?</em></p>
<p><strong>BB</strong>:  During the first run of a test case,
Landslide identifies "decision points" - instructions during the
execution at which it thinks that if the kernel had context switched to another
thread, something interesting might have happened (the set of decision points
defines the state space of thread interleavings). Annotations in the kernel
(which I call "tell_landslide()") also help Landslide track what the
kernel is doing (especially thread lifecycle events).</p>
<p>Then, when the test case finishes, Landslide chooses which
decision point to execute differently (and chooses which thread to run
instead), rewinds to that state, injects timer interrupts to force the kernel's
scheduler to run that thread, and proceeds. (A pruning algorithm called Dynamic
Partial Order Reduction drives the exploration - it chooses which interleavings
to explore next in a way that skips over certain redundant ones.)</p>
<p>Simics has a really great feature that makes this possible:
it supports setting bookmarks when the simulation is stopped and later doing
reverse-execution back to that point. I used this to implement the
"rewinding", and it turns out to be a lot faster than rebooting the
kernel every time (especially in a depth-first search).</p>
<p><em><strong>JE</strong>: Cool, so you
actually make use of reverse execution to create a backtracking search through
the state space of the target system. Kind of obvious once you see it, but also
very clever.  Anyway, please continue.</em></p>
<p><strong>BB</strong>: Finally, when
Landslide finds a bug, it prints a "decision trace", which is a list
of each thread switch that happened and a stack trace at each point. Landslide
has a couple of bug-finding checks: tripped asserts, use-after-free heap
accesses, deadlocks, and (heuristically) livelocks/infinite loops.</p>
<p> </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c321bd49a970b-pi" style="display: inline;"><img alt="Landslide" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017c321bd49a970b image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c321bd49a970b-800wi" title="Landslide" /></a></p>
<p>Which events to consider "decision points" was
also a hard question. During the project, I used voluntary reschedules (such as
calls to yield()) and synchronisation operations (such as calls to
mutex_lock()/mutex_unlock()) as decision points. I felt this struck a good
balance between fine-grained interleavings (more likely to find races) and
coarse-grained interleavings (more feasible to explore the state space).</p>
<p>But I also wanted to harness the user's own intuition, to
focus the search space more intelligently. So the user can also configure
decision points of their own - they can write
"tell_landslide_decide()" in the kernel's code to say "hey, put
a decision point here!", and they can also use another config file to
whitelist/blacklist certain modules of the kernel (maybe you don't want to
waste any time exploring interleavings in the virtual memory subsystem).</p>
<p> </p>
<hr />
<h3>Partial-Order Reduction Explained</h3>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c321bd699970b-pi" style="display: inline;"><img alt="Landslide-reduction" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017c321bd699970b image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c321bd699970b-800wi" title="Landslide-reduction" /></a><br /><br /></p>
<p>Landslide uses an
existing algorithm called Dynamic Partial Order Reduction to prune redundant
subtrees out of the search space of different thread interleavings. The basic
idea of the algorithm is to use a "memory independence relation",
which identifies when certain state transitions of different threads have no
conflicting shared memory accesses, to know when it wouldn't matter which order
those transitions run in. In the picture above, the states are identical after
"1" and "2", regardless of the order that these operations
are done. Therefore, it is sufficient to explore only one of the subtrees marked in
yellow.</p>
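<p>The independence check at the heart of this idea can be sketched in a few lines of Python. This is a simplified illustration, not the Dynamic Partial Order Reduction algorithm itself and not Landslide's implementation: the representation of a transition as a pair of read and write sets, and the names <code>independent</code> and <code>orders_to_explore</code>, are assumptions made for the example.</p>
<pre>
```python
def independent(t1, t2):
    """Two transitions are independent if neither writes what the other touches."""
    reads1, writes1 = t1
    reads2, writes2 = t2
    return (writes1.isdisjoint(reads2 | writes2)
            and writes2.isdisjoint(reads1 | writes1))


def orders_to_explore(t1, t2):
    """Independent transitions commute, so one of the two orders is enough."""
    return 1 if independent(t1, t2) else 2


# Each transition is (set of locations read, set of locations written).
inc_x = ({"x"}, {"x"})    # one thread increments x
inc_y = ({"y"}, {"y"})    # another thread increments y
read_x = ({"x"}, set())   # another thread only reads x

print(orders_to_explore(inc_x, inc_y))   # 1: no shared locations
print(orders_to_explore(inc_x, read_x))  # 2: the read may see either value
```
</pre>
<p>When the check says two transitions are independent, both orders lead to the same state, so the subtree below the second order is redundant and can be skipped.</p>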
<hr />
<p><em><strong>JE</strong>: That sounds
like some pretty advanced Simics programming there (I would not call it
scripting, it is way beyond that level of sophistication)!</em> </p>
<p><strong>BB</strong>: Thanks! Yes,
Landslide is basically its own piece of software. It's about 4000 lines of
code, written mostly in C.</p>
<p><em><strong>JE</strong>: What types of
kernel errors do you detect with Landslide?</em></p>
<p><strong>BB</strong>: Well, in
theory, the beauty of systematic testing is that it should be able to find any
kind of race condition.</p>
<p>One example of a race that systematic testing can find that
a data race detector couldn't is the everyday TOCTTOU: let's say the kernel
validates a userspace buffer (with a lock held) then later accesses the buffer,
but in the meantime another thread changed the permissions on that memory (with
the same lock held), making the first thread's access fail. Because of the
locking, it's not a data race, but Landslide can easily force an interleaving
when the first thread drops the memory-validation lock (cf. mutex_unlock()
being a good decision point, as I noted above).</p>
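<p>The TOCTTOU scenario above can be modelled with a tiny Python sketch. It is a hypothetical model, not kernel or Landslide code: the step names <code>validate</code>, <code>revoke</code> and <code>access</code> are made up, and each step is assumed to run atomically, as if it took and released the lock around its own work. Thread A performs <code>validate</code> and then <code>access</code>; thread B performs <code>revoke</code>.</p>
<pre>
```python
def run(schedule):
    """Run one interleaving; every step is atomic, as if done under the lock."""
    writable = True      # current permission on the user buffer
    validated = False    # result of thread A's earlier check
    outcome = None
    for step in schedule:
        if step == "validate":    # thread A checks the buffer, then unlocks
            validated = writable
        elif step == "revoke":    # thread B removes the permission, then unlocks
            writable = False
        elif step == "access":    # thread A acts on its earlier check
            if not validated:
                outcome = "rejected"
            elif writable:
                outcome = "ok"
            else:
                outcome = "fault"  # the check passed, but it is now stale
    return outcome


# All three interleavings of A = [validate, access] with B = [revoke].
schedules = [
    ["revoke", "validate", "access"],
    ["validate", "revoke", "access"],
    ["validate", "access", "revoke"],
]
for schedule in schedules:
    print(schedule, run(schedule))
```
</pre>
<p>Only the middle schedule, where the other thread runs right after the check is released, ends in a fault. Every access in this model is properly locked, so a data race detector has nothing to report, but enumerating the schedules finds the failing one immediately.</p>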
<p>Landslide itself is a little more limited than "in theory",
of course. As implemented, the only source of nondeterminism it controls is the
timer interrupts; so it controls thread scheduling, but not device input
handling. As such, it's most useful for finding races in core or mostly-core
parts of the kernel, such as the thread lifecycle implementation or the virtual
memory system. Finding races in device drivers is beyond the scope of
Landslide's current model (though potentially in the path of future work).</p>
<p><em><strong>JE</strong>: Are you still
working on the tool?</em></p>
<p><strong>BB</strong>: Since I wrote my
MS dissertation on it, I've put it on hold for a while. I've been doing some
other concurrency-related projects in the meantime (which you could read about
if you want on my website, linked above).</p>
<p><em><strong>JE</strong>: What kinds of
code have you tested Landslide on?</em></p>
<p><strong>BB</strong>: I wrote
Landslide targeting "Pebbles", which is a UNIX-like kernel
architecture that students implement in six weeks in our undergrad operating
systems class at CMU. Pebbles kernels have to be fully preemptible and support
concurrent execution of standard thread-lifecycle events (fork, exec, exit,
wait, yield, etc.). Using Pebbles as a case study was a good proving ground for
potentially extending this work in the future to more mainstream kernels, such
as Linux.</p>
<p>Students implement these kernels from the ground up (we only
provide them enough starter code to boot up into an entrypoint function; none
of the design or virtual memory or anything is done for them), so each kernel
ends up with slightly different design and structure. Landslide had to be flexible
enough to be able to attach to and control any of these kernels.</p>
<p>The real test was meeting with some of the students towards
the end of the project to get them to try out Landslide, to see if it works
"in the wild". Sometimes it was a struggle - the average time spent
doing instrumentation was 100 minutes - but in the end, all four (out of five
total) groups that completed the prerequisite instrumentation were able to find
bugs in their kernels with Landslide.</p>
<p><em><strong>JE</strong>: That is a
very elegant twist to the project, I think. You both get to test your tool on
real code that you know nothing about, and you teach the students that there
are tools that can do almost magic things to help them check their code for
errors.</em> </p>
<p><em><strong>JE</strong>: Do you have
any fun examples from the work of bugs found?</em></p>
<p><strong>BB</strong>: Sure. My favorite story is of one of the project's defining
moments, a time when I just stopped in my tracks and thought "huh. I have
made something smarter than me."</p>
<p>I had written a new unit test case, called double_wait, in
which two threads in a parent process attempt to wait() on a single child
process. I was testing it out on my own kernel (the one I wrote when I was an
OS student), and I expected the search to complete with no bugs, each time
having one parent thread succeed and the other return failure. Instead, after a
bit of searching, Landslide made my kernel trip an assert in its condition
variable linked-list logic.</p>
<p>This was a bug nobody ever knew about before - I'd written
it three years before and didn't find it during stress-testing; and the TA who
manually graded my kernel (by looking over it with a red pen) also overlooked
it.</p>
<p><em><strong>JE</strong>: That just
goes to show how difficult reasoning about concurrency is for a human being.</em> </p>
<p><em><strong>JE</strong>: As you know,
I <a href="http://blogs.windriver.com/tools/2012/05/forcing-rare-bugs-to-appear-an-interview-with-tingting-yu.html" target="_self">previously interviewed the creator of SimTester</a>. At a high level, these two tools built on
Simics would seem to be similar, but I think that they are actually quite
distinct. Could you expand a bit on that?</em></p>
<p><strong>BB</strong>: Imagine my
surprise when the SimTester interview came out just when I was writing my
thesis's related work section. "Whoa, wait, another tool that runs in
Simics and identifies key decision points and injects interrupts?"</p>
<p>So our projects are quite similar in those regards, but at a
higher level my strategy is somewhat different from theirs. SimTester focuses
on races involving device interrupt handling code, whereas Landslide is geared
towards non-driver inter-thread races. Additionally, SimTester’s testing model involves
forcing only one interrupt per test run, which is a lighter-weight approach
than systematic exploration.</p>
<p><em><strong>JE</strong>: Thank you for
your time, this has been really interesting. Where can you go to find more
information about Landslide?</em> </p>
<p><strong>BB</strong>: The thesis
can be found at <a href="http://www.pdl.cmu.edu/PDL-FTP/associated/CMU-CS-12-118_abs.shtml" target="_self">http://www.pdl.cmu.edu/PDL-FTP/associated/CMU-CS-12-118_abs.shtml</a>
(and the slides from the MSc defense at <a href="http://bblum.net/landslide-defence.pdf">http://bblum.net/landslide-defence.pdf</a>).</p>
<p> </p>
<p> </p></div>
</content>



    </entry>
    <entry>
        <title>Analyzing Manycore Scaling with Simics</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2012/09/analyzing-manycore-scaling-with-simics.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2012/09/analyzing-manycore-scaling-with-simics.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e2017c31aefefd970b</id>
        <published>2012-09-12T11:14:42-07:00</published>
        <updated>2012-09-12T11:14:42-07:00</updated>
        <summary>In my previous blog post on multicore scaling investigations with Simics, I tested a simple parallel program on a variety of machines. The scaling obtained was not particularly impressive, especially not on a 60-core target machine. In this post, we will use the Simics timeline view to look a bit closer at what is going on inside the target machines. In particular, with respect to operating system scheduling of the target threads. Before we look at the runs that indicated a lack of scaling, we should look at a well-behaved case to make sure we have something to compare to....</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Multi-core" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Wind River" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="embedded" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Simics" />
        <category scheme="http://sixapart.com/ns/types#tag" term="simulation" />
        <category scheme="http://sixapart.com/ns/types#tag" term="virtual platform" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Wind River" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>In my <a href="http://blogs.windriver.com/wind_river_blog/2012/08/testing-manycore-scaling-with-simics.html" target="_self">previous blog post on multicore scaling
investigations with Simics</a>, I tested a simple parallel program on a variety
of machines. The scaling obtained was not particularly impressive, especially
not on a 60-core target machine. In this post, we will use the Simics timeline
view to look a bit closer at what is going on inside the target machines. In
particular, with respect to operating system scheduling of the target threads. </p>

<p>Before we look at the runs that indicated a lack of scaling,
we should look at a well-behaved case to make sure we have something to compare
to. For this, I brought up the 5-core heavy load experiment again, and reran
it. Some things in the target setup had changed compared to the last blog post,
so the scaling of the length 100 line is a bit different (once again showing
the sensitivity of this program to noise and initial conditions just like it would be on physical hardware). If
we look at the behavior for a nicely scaling line and compare it to the
timeline view plotting the active threads, we can see that nice scaling does
indeed correspond to good parallelism. </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e20177448cb640970d-pi" style="display: inline;"><img alt="Qsp-multicore-scale 5core40 200 pkt len in timeline" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e20177448cb640970d image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e20177448cb640970d-800wi" title="Qsp-multicore-scale 5core40 200 pkt len in timeline" /></a></p>
<p>In this plot, each processor core in the target has its own
color, and we see which threads run when on which core. In the picture, we see
the cases with two to five worker threads. For two and three threads, we have
each thread using a single processor core for the duration of the program. For
four threads, where the scaling is a bit less ideal, we can see that two threads
share a single processor. Finally, for five threads, we can see that we use
four target cores on average across the run, and that the OS decides to have
two threads share a core. Thus, we can conclude that scaling in the overall throughput
graph does indeed correspond to parallel execution (no great surprise). </p>
<p>Next, I looked at the behavior on the 60-core target for the
thread counts where scaling was essentially flat. Here, we have a rather
different picture from the above. At six worker threads, we see a scheduling
that only makes use of two cores, in a fairly erratic manner. It looks quite impressive, but it is not what you want from a scalable program. </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3bdd9a38970c-pi" style="display: inline;"><img alt="Rule30 6 worker threads two cores" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017d3bdd9a38970c image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3bdd9a38970c-800wi" title="Rule30 6 worker threads two cores" /></a></p>
<p>Things get even more interesting as we go and look at many
cores. When we use 19 worker threads, we see a regular pattern develop in the execution:</p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3bdd9c28970c-pi" style="display: inline;"><img alt="Rule30 19 worker threads totally serial" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017d3bdd9c28970c image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3bdd9c28970c-800wi" title="Rule30 19 worker threads totally serial" /></a></p>
<p>This was quite surprising. Note how each thread gets to run
in order, and how sometimes we get two threads running in parallel and
sometimes just a single one. It appears that the handling of the shared lock on
the work queue is mostly FIFO, where a thread gets a unit of work and then goes
to the end of the queue. This gives rise to the regular pattern seen, as each thread
gets to run in turn (with some exceptions, indicating that the system does
suffer from a little noise). Clearly, we are not getting anywhere near the
theoretical parallelism that this problem offers. </p>
<p>All this indicates that the communication between threads
needs to be fixed in order to increase performance. This is completely expected
– as we expand the available hardware parallelism, software needs to be
rewritten to minimize communication between threads and maximize independent
execution. </p>
<p>If we plot the behavior of our program as a speedup over the
case with one worker thread, we can see that it actually does not get worse as
we add threads; it just plateaus at a speedup between 3 and 4. Thus, this
program in its current form would not benefit from a platform that is any wider
than quad-core. </p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3bdd9d59970c-pi" style="display: inline;"><img alt="Qsp-multicore-60-load-10 speedup" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017d3bdd9d59970c image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017d3bdd9d59970c-800wi" title="Qsp-multicore-60-load-10 speedup" /></a></p>
<p>So what does this give us?</p>
<p>We have determined that the program in its current
incarnation, on a Linux OS, does not scale beyond four worker threads with any
kind of benefit. Thus, if we want to run this program on a wider target machine
(to either run bigger workloads or use more, slower processors in order to save
power), we need to first re-architect the software. Throwing hardware at the problem
will have no positive effect. </p>
<p>We did this by exploiting the configurability of a virtual
platform to change the number of cores, and the insight of the virtual platform
to gather statistics and information about how the program executes without
changing its behavior. </p>
<p> </p></div>
</content>



    </entry>
    <entry>
        <title>Testing Manycore Scaling with Simics</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2012/08/testing-manycore-scaling-with-simics.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2012/08/testing-manycore-scaling-with-simics.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e20177445c141b970d</id>
        <published>2012-08-27T09:53:28-07:00</published>
        <updated>2012-08-27T09:53:28-07:00</updated>
        <summary>A few years ago, I did a Simics demo where I tested the scalability of a multithreaded program as the target hardware went from two to four to eight cores. Unfortunately, I could not take it beyond that point, since the hardware platform that I used simply did not allow for more than eight cores. Now, with the Simics Quick Start Platforms (QSP), the situation is different. I picked up the demo again, and pushed it to sixty (60) cores with ease. The QSP platform that comes with Simics scales out to 128 processors by default (and more if you...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Multi-core" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Tools" />
        
        <category scheme="http://sixapart.com/ns/types#tag" term="Simics" />
        <category scheme="http://sixapart.com/ns/types#tag" term="simulation" />
        <category scheme="http://sixapart.com/ns/types#tag" term="virtual platform" />
        <category scheme="http://sixapart.com/ns/types#tag" term="Wind River" />
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>A few years ago, I did a Simics demo where I tested the
scalability of a multithreaded program as the target hardware went from two to
four to eight cores. Unfortunately, I could not take it beyond that point,
since the hardware platform that I used simply did not allow for more than
eight cores. Now, with the <a href="http://blogs.windriver.com/wind_river_blog/2012/06/you-will-be-simulated-a-bit-quicker.html" target="_self">Simics Quick Start Platforms (QSP)</a>, the situation is
different. I picked up the demo again, and pushed it to sixty (60) cores
with ease. </p>

<p>The QSP platform that comes with <a href="http://blogs.windriver.com/wind_river_blog/2012/07/inside-a-synthetic-simulation-platform.html" target="_self">Simics scales out to 128
processors by default </a>(and more if you are
willing to do some modifications to the model). It also comes with a Linux
image that works for 128 processors, providing the infrastructure for testing threaded programs. Armed with this, I compiled my original
pthread-based Linux program for the QSP, and ported my old demo scripts to the
QSP – taking the chance to generalize things and add some new bells and
whistles to the setup in the process. </p>
<p>To understand what I am going to show here, you need some understanding of the program I am testing. The program uses a worker pool architecture, with a single
data generator thread that pushes work units into a shared queue, and multiple worker threads pulling work units from this queue and computing on them. The amount of computation needed to process a unit of
work is highly scalable and tunable; it can easily be increased 100-fold from one run to
another. This allows us to explore the trade-off between communication and
computation in the program. In the graphs that you will see below, we scale one of the
parameters for the workload from 100 to 1000, but there is also a hidden
parameter inside the experiment, and this hidden parameter is multiplied by the visible parameter to produce the total amount of computation needed for each unit of work by a worker. </p>
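<p>As a rough illustration of this structure, here is a minimal sketch in Python rather than the original pthread-based code. All names (<code>run_pool</code>, <code>compute</code>, <code>visible_param</code>, <code>hidden_param</code>) are made up for this example, and the busy loop is only a stand-in for the real computation. Note that Python threads will not show the parallel speedups discussed below; the sketch only shows how one generator, one shared queue, and several workers fit together.</p>
<pre>
```python
import queue
import threading


def compute(visible_param, hidden_param):
    """Stand-in for per-unit work: cost grows with visible_param * hidden_param."""
    total = 0
    for i in range(visible_param * hidden_param):
        total += i % 7
    return total


def run_pool(num_workers, num_units, visible_param, hidden_param):
    """Run one generator thread and num_workers worker threads over a shared queue."""
    work_queue = queue.Queue()      # the single shared queue all threads contend on
    results = []
    results_lock = threading.Lock()

    def generator():
        # Push all work units, then one stop marker per worker.
        for _ in range(num_units):
            work_queue.put(visible_param)
        for _ in range(num_workers):
            work_queue.put(None)

    def worker():
        # Pull units until a stop marker arrives.
        while True:
            unit = work_queue.get()
            if unit is None:
                break
            value = compute(unit, hidden_param)
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=generator)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```
</pre>
<p>Every <code>put</code> and <code>get</code> goes through the same queue, so the less work each unit carries, the larger the share of time spent on communication instead of computation.</p>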
<p>If
you know anything about parallel programming, that single queue probably sounds
like a bad design, but bear with me: we will see how this works out in the
experiments. </p>
<p>In my experiments, I used a five-core, a sixty-core, and
later a twenty-core machine. I booted them, loaded the test application over the simicsfs
host-access system, and saved a checkpoint of the prepared machines. I then
created a set of scripts that started with a checkpoint, and ran experiments
with various parameters:</p>
<ul>
<li>Which machine to run on (five-core,
twenty-core, or sixty-core)</li>
<li>The thread counts to test </li>
<li>The computational workload per unit of work</li>
</ul>
<p>The experimental scripts use Simics OS awareness to detect
the start and end of each run of the test program, and compute the performance
of each run as the number of work units (packets) processed per (virtual) second.
They then plot the values in a graph, with the number of worker threads on the
horizontal axis and the performance in packets per second on the vertical axis. With ideal scaling, you would expect a nice straight line that ends up being <em>N</em> times the initial value when <em>N</em> worker threads are used. </p>
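<p>The arithmetic behind such a graph can be sketched as follows. This is not the actual experiment script: the function names are hypothetical and the numbers in <code>example</code> are invented for illustration, not measurements. Speedup is throughput relative to the one-thread run, and efficiency is speedup divided by the thread count, so ideal scaling gives an efficiency of 1.0.</p>
<pre>
```python
def packets_per_second(packets, start_time, end_time):
    """Throughput of one run, using virtual (simulated) timestamps in seconds."""
    elapsed = end_time - start_time
    if not elapsed > 0:
        raise ValueError("run must have a positive duration")
    return packets / elapsed


def scaling_table(throughput_by_threads):
    """Return (threads, speedup, efficiency) rows relative to the one-thread run."""
    baseline = throughput_by_threads[1]
    rows = []
    for threads in sorted(throughput_by_threads):
        speedup = throughput_by_threads[threads] / baseline
        efficiency = speedup / threads   # 1.0 means ideal N-times scaling
        rows.append((threads, speedup, efficiency))
    return rows


# Made-up numbers purely for illustration; not measurements from the experiments.
example = {
    1: packets_per_second(1000, 0.0, 10.0),
    2: packets_per_second(1000, 0.0, 5.0),
    4: packets_per_second(1000, 0.0, 4.0),
}
for threads, speedup, efficiency in scaling_table(example):
    print(threads, round(speedup, 2), round(efficiency, 2))
```
</pre>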
<p>The first experiment used the five-core machine,
testing from one to five worker threads. Note that this means that the program
contains at most six active threads – the five workers plus the data generator
thread. The resulting graph looked like this:</p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c317e8b32970b-pi" style="display: inline;"><img alt="Qsp-multicore-5-load-1" class="asset  asset-image at-xid-6a00d83451f5c369e2017c317e8b32970b" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c317e8b32970b-500wi" title="Qsp-multicore-5-load-1" /></a></p>
<p>A bit jumpy, but at least things are improving from left to right. The jumpiness can be attributed to two
factors. First, we only do a single run for each data point. Second, we are
using a very light load, which increases the program variability as overheads and
OS fluctuations dominate over the core program functionality. Still, what we can clearly see here is that the Simics simulation is not behaving like an ideal system. Rather, it behaves like a real machine would, showing variations induced by the environment in which a program runs. As a point of comparison, I once built an AMP (separate OS on each core) setup for this program, and there the scaling was totally linear and
ideal since there was no threading package, no synchronization of threads needed, and no OS parallelism involved. </p>
<p>The non-ideality here is really a feature, indicating that
I have to improve my experiments. The improved experiment increases the computation per packet
by a factor of 40 (using the hidden parameter), to emphasize the scalability of the core algorithms over the
system noise. The result looks like this:</p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017617759cf7970c-pi" style="display: inline;"><img alt="Qsp-multicore-5-load-40" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017617759cf7970c" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017617759cf7970c-800wi" title="Qsp-multicore-5-load-40" /></a><br /><br /></p>
<p>This does indeed result in much smoother graphs. The crossing between the 1000
and 800 lines does not matter that much, since they are pretty close to
each other in load. Overall, it seems that we scale about 3x to 4x when
going out to five cores, which is pretty decent. </p>
<p>But what happens if we try to push this program to the
limit? To try this, I used the QSP with 60 cores (a nice, round number). For most
hardware that I know, going to 60 cores is either infeasible (strict limit on
how many cores an SoC or interrupt architecture can support) or prohibitively
expensive (exotic multi-socket motherboards with highest-end Xeons). In Simics,
we just turn a virtual knob. The result looks like this, using a compute load per packet that is ten times the base load, so we should expect pretty smooth scaling (I skipped the 100 packet length in this
experiment as it does not add much information).</p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c317e9432970b-pi" style="display: inline;"><img alt="Qsp-multicore-60-load-10" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e2017c317e9432970b image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2017c317e9432970b-800wi" title="Qsp-multicore-60-load-10" /></a><br /><br /></p>
<p>However, this plot indicates that my program is
not that scalable at all once we move beyond the simple world of a few cores. What we seem to
get is about a 3x speedup from 60 worker threads on 60 cores. We get the best
performance around five to six threads, and then performance goes down and flattens out. This
program clearly needs to be rewritten to work for manycore machines, which is
what we wanted to test. </p>
<p>However, maybe not everything can be blamed on poor coding
in the program. To investigate the impact of the operating system itself, I
brought up a 20-core QSP and reran the experiments. This did indeed shine some
new light on the behavior of the program, as you can see:</p>
<p>
<a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e20177445c1f9f970d-pi" style="display: inline;"><img alt="Qsp-multicore-20-load-10" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e20177445c1f9f970d image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e20177445c1f9f970d-800wi" title="Qsp-multicore-20-load-10" /></a><br /><br /></p>
<p>This plot does not show the step
effects of the 60-core run, and it shows much less jumpiness (except the known jumpy
length 100 line). Note that this is the same program, with the same number of
threads as above. But the hardware size is different, and therefore the OS
behavior can be different for operations that involve scheduling and synchronization between cores. It indicates that to make a scalable program, I might have to both update my own code and check whether the operating system itself can be tuned in some way to work better with my workload.</p>
<p>To conclude, this blog post has shown how flexible virtual
hardware can help you understand the scalability of software to multiple cores.
A flexible, easily configurable virtual platform that is unconstrained by the
limits of physical hardware designs is a very useful tool for exploring the
fundamental behavior of software - both for operating system code and user-level code.  </p>
<p><span style="color: #80ff00;"><strong>Final note:</strong></span> If you wonder what limited the original demo, it was the
use of an <a href="http://jakob.engbloms.se/archives/633" target="_self">eight-bit field at the top of a 32-bit word to enable and disable
processors</a>. After we used these eight bits, there simply was no way to add a
ninth bit and thus a ninth core, without redesigning the hardware and the BSP.
At the time, that felt like serious overkill for what we were trying to
accomplish. </p></div>
</content>



    </entry>
    <entry>
        <title>Inside a Synthetic Simulation Platform</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2012/07/inside-the-simics-qsp.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2012/07/inside-the-simics-qsp.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e20168eba982fa970c</id>
        <published>2012-07-03T16:56:11-07:00</published>
        <updated>2012-07-03T16:56:11-07:00</updated>
        <summary>Recently, we introduced a synthetic simulation-only Simics target machine called QSP (Quick-Start Platform) and it's included in the latest version of Wind River Simics. The idea of QSP is to give every user a useful Simics target that allows them to immediately start using Simics and begin reaping the benefits of simulation for software development. In this blog post, we'll be taking a look under the hood of QSP to see how it works. The idea behind QSP is to design a piece of virtual-only hardware that is as simple as possible, while still running real operating systems similar to...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>Recently, we <a href="http://www.windriver.com/news/press/pr.html?ID=10801" target="_blank">introduced</a> a synthetic simulation-only Simics target machine called QSP (Quick-Start Platform) and it's included in the latest version of Wind River Simics. The idea of QSP is to give every user a useful Simics target that allows them to immediately start using Simics and begin reaping the benefits of simulation for software development. In this blog post, we'll be taking a look under the hood of QSP to see how it works.</p>

<p><img alt="" src="http://static.typepad.com/.shared:v20120626.01-0-g44ba6e9:typepad:en_us/tiny_mce/3.3.9.4/plugins/pagebreak/img/trans.gif" />The idea behind QSP is to design a piece of virtual-only hardware that is as simple as possible, while still running real operating systems similar to how they work on ordinary hardware.</p>
<p>This is not a new idea. The Simics team and other simulation providers have done similar things in the past - but never with the bottom-up design done with Simics QSP. Usually, designs have started with a system controller and core complex from an existing platform, and then used synthetic devices for various IO. Looking back at my own work history, I ran a student project back in 2006 to port SMP Linux onto a cut-down Simics PowerPC platform. More recently, some colleagues of mine have ported VxWorks to various simple uniprocessor Simics models that basically just contained a processor, memory, serial port, and interrupt controller (modeled closely on existing PICs). The VxWorks team has used a highly simplified platform in their <a href="http://blogs.windriver.com/engblom/2011/03/kick-starting-an-os-port.html" target="_blank">OS porting to new architectures</a>.</p>
<p>What is truly new with Simics QSP is the scope and depth of the design effort. Instead of cutting down an existing hardware model to something less complex, we started from a blank page and designed each device in the system to be as simple and easy to use as possible. We also wanted the devices to be flexible and scalable and not impose unnecessary accidental limitations on the system setup. After providing the specifications to engineering, I expected to get back a platform with at most a handful of simple devices. However, it turned out that a modern OS requires a surprising number of hardware services in order to work, and the resulting initial QSP design contains nine types of devices in addition to the processor cores.</p>
<p><a href="http://blogs.windriver.com/.a/6a00d83451f5c369e201630638f6c0970d-pi"><img alt="Qsp-block-diagram" src="http://blogs.windriver.com/.a/6a00d83451f5c369e201630638f6c0970d-700wi" title="Qsp-block-diagram" /></a></p>
<p>The picture above shows a block diagram of the QSP that we are now shipping. We can see that it contains quite a few pieces of hardware. To make an OS run, we need a processor. A processor needs RAM to store code and data. To get periodic interrupts, it needs a timer (even though the basic needs are taken care of by internal features within the Power Architecture, timers are needed for ARM and possibly other architectures). To handle interrupts from devices and inter-processor interrupts in a multi-core design, you need an interrupt controller. To start non-primary processors in a multi-core system, you need a system controller. A real-time clock (RTC) is needed for sane dates, which is especially important when using disks to store a large file system. </p>
<p>Apart from this core set of devices, serial, Ethernet, and LEDs are needed to provide input and output facilities. Disks and flash disks were added to support large file systems and flash-based applications. Not all of the I/O units will be present in all configurations, as that is up to the user to decide.</p>
<p>For added flexibility, we made sure that the number of processors and important functional devices can be varied in case some application needs several of these elements.  It is a virtual platform, and not subject to the physical limitations of real hardware. We wanted to make sure this benefit was passed on to our users. This is where writing both the BSP and the target hardware was most beneficial. Most real multicore hardware designs have arbitrary limitations in <a href="http://jakob.engbloms.se/archives/633" target="_self">system controller registers </a>or interrupt controller designs that limit how many cores can be used, even when a simulator should allow you to use any number of cores. By designing a very scalable hardware interface and the BSP to drive it, we can scale the system to theoretically any number of cores. The current hard stop is at 128, but this might change in the future.</p>
<p>If we look at the design of the devices themselves, it turned out that we needed more details than what I expected at the start of the project.</p>
<p>For instance, Ethernet requires both receive and transmit interrupts to be present.  In an ideal world, transmit would seem unnecessary, right?  But we have to model finite bandwidth, as infinite bandwidth is very confusing to existing software, and that means that packets can take time to transmit, and thus transmit-completed interrupts are needed (or equivalently, a transmit complete flag along with a polling driver).  We also have descriptor tables in RAM rather than a simple buffer inside the device, as that is what the existing driver stacks for networking would expect. </p>
<p>Another surprise to me was the fact that simulating (NOR) flash memory is best done by actually implementing a flash command set.  File systems and flash driver stacks expect a flash to follow a certain pattern of operation, such as requiring several writes to enter write mode.  Just modeling flash as persistent RAM does not work, even though it seems the obvious thing to do.</p>
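<p>To make that pattern concrete, here is a toy model of a command-driven flash in Python. It is only a sketch: the class name is hypothetical, and the unlock addresses and values are an assumption in the style of common AMD-type NOR parts, not the command set that QSP actually implements. A plain write does nothing useful on its own; only the full sequence of writes puts the device in a state where the next write programs a cell.</p>
<pre>
```python
import operator


class ToyNorFlash:
    """Toy NOR flash: reads are direct, programming needs an unlock sequence."""

    # Assumed unlock sequence (address, value), in the style of AMD-type parts.
    UNLOCK = [(0x555, 0xAA), (0x2AA, 0x55), (0x555, 0xA0)]

    def __init__(self, size):
        self.cells = [0xFF] * size   # erased flash reads as all ones
        self.step = 0                # how far into the unlock sequence we are

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        """Bus write: either advances the command sequence or programs a cell."""
        if self.step == len(self.UNLOCK):
            # Unlocked: programming can only clear bits (1 to 0), never set them.
            # operator.and_ is the bitwise AND of the old contents and the data.
            self.cells[addr] = operator.and_(self.cells[addr], value)
            self.step = 0
        elif (addr, value) == self.UNLOCK[self.step]:
            self.step += 1
        else:
            self.step = 0            # unexpected write: abandon the sequence

    def program(self, addr, value):
        """What a flash driver does: issue the unlock writes, then the data."""
        for unlock_addr, unlock_value in self.UNLOCK:
            self.write(unlock_addr, unlock_value)
        self.write(addr, value)
```
</pre>
<p>A driver written for this kind of device would misbehave on a persistent-RAM model, because the unlock writes themselves would land in the array as data.</p>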
<p>Overall, as we worked on QSP, we discovered just how much the structure of operating systems is shaped by the design of contemporary hardware interfaces.  The BSP or HAL (hardware abstraction layer) in an OS like Linux or VxWorks does not really allow just any hardware to run underneath the OS - it expects certain behavioral patterns that are not necessarily what one might think of as the simplest way to express a functionality in hardware. Indeed, if we had made QSP devices more lightweight, the BSP side of the QSP would have been much more complex and would have required much deeper changes to the OS kernels and device driver structures.</p>
<p>Ultimately, what we've released is a solid piece of virtual hardware that allows application development and other system development tasks to progress without the need to model or use any particular hardware model. Obviously, many tasks that Simics is used for do require a model of a specific hardware board, and in those cases QSP is intended as a companion, not a replacement, for traditional Simics models.</p>
<p><em>For additional information from Wind River, visit us on <a href="http://www.facebook.com/WindRiverSystems" target="_blank" title="Wind River Facebook page">Facebook</a>.</em></p>
<p> </p></div>
</content>



    </entry>
    <entry>
        <title>You Will be Simulated - A Bit Quicker</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2012/07/you-will-be-simulated-by-qsp.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2012/07/you-will-be-simulated-by-qsp.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e20168ebb871dc970c</id>
        <published>2012-07-03T16:55:00-07:00</published>
        <updated>2012-07-03T16:55:00-07:00</updated>
        <summary>We just released a new Simics feature, the QSP (Quick-Start Platform). This is a synthetic simulation-only Simics target machine that is included with the Simics base product. QSP provides every user with a useful Simics target that allows you to immediately start using Simics and reap the benefits of simulation for software development, without waiting for a target model to be ready. It provides a quick way to realize our old Simics slogan: "Resistance is Futile, you will be Simulated." With QSP, we provide a shortcut by offering a ready-to-use integration of a model and target software stack that can...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>We just released a new Simics feature, the QSP (Quick-Start Platform). This is a synthetic simulation-only Simics target machine that is included with the Simics base product. QSP provides every user with a useful Simics target that allows you to immediately start using Simics and reap the benefits of simulation for software development, without waiting for a target model to be ready. It provides a quick way to realize our old Simics slogan: "<a href="http://blogs.windriver.com/engblom/2011/05/twenty-thirty-and-sixty-years-ago.html" target="_self">Resistance is Futile, you will be Simulated</a>."</p>
<p>With QSP, we provide a shortcut by offering a ready-to-use integration of a model and target software stack that can be used to do application-level development from the moment a user receives Simics. QSP also makes simulation features like checkpointing, reverse execution, non-intrusive debug, repeatability, automation, and scripting easily available for application developers without having to first set up an OS and select a target machine.</p>


<p>A key use case for QSP is developing user-level application software. QSP presents an application developer with an operating system on top of a certain architecture, with a set of functions available in the hardware and supported by the software stack. All application software is compiled to the target architecture, not to the host, and application binaries from a real target should run on the QSP, provided that the OS APIs in the QSP are sufficient. </p>
<p>Let's go over some examples of how QSP can be configured and used.</p>
<p>Our first example will be Linux on a dual-core Power Architecture machine with Ethernet and serial, as shown in the picture below.</p>
<p><a href="http://blogs.windriver.com/.a/6a00d83451f5c369e2016305c2e531970d-pi" /><a href="http://blogs.windriver.com/.a/6a00d83451f5c369e2016766b6f489970b-pi"><img alt="Qsp-use-1" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2016766b6f489970b-550wi" style="display: block; margin-left: auto; margin-right: auto;" title="Qsp-use-1" /></a><br /><br />This type of target would be used to run a networked application. By instantiating multiple QSPs in a single simulation, the network application can be tested with communication between the boards.  It does not matter which Ethernet adapter is used or the precise nature of the board, as long as the application can run with two threads tied to a core each and communicate over Ethernet to its peer.</p>
<p>Thanks to Simics <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html" target="_self">repeatability and control</a> capabilities, network scenarios involving all the machines can be perfectly replicated, analyzed, and debugged, across the entire system. An instantaneous system state, including traffic sessions in progress, can be saved as <a href="http://blogs.windriver.com/engblom/2010/10/physical-or-virtual.html#more" target="_self">Simics checkpoints</a>, allowing the developer to come back later to a particular point in the execution. Checkpoints can also be used to <a href="http://blogs.windriver.com/engblom/2010/08/transporting-bugs-with-checkpoints.html#more" target="_self">capture and communicate bugs </a>between team members, replacing the tedious process of describing system setups in bug reporting systems. The Simics <a href="http://blogs.windriver.com/tools/2011/05/simics-46-initial-impressions.html" target="_self">system-level debugger </a>allows source-code debugging on all target systems at once, at the application level, with synchronous stepping of the entire system, not just an application at a time. Reverse execution and reverse debugging can be applied to the entire system, following the flow of control and packets between the machines backwards in time. <a href="http://blogs.windriver.com/tools/2010/05/analyzer-analyzed.html" target="_self">Simics Analyzer </a>makes it easy to see how programs on both sides of the network connection wait for data and get activated.</p>
<p><a href="http://blogs.windriver.com/.a/6a00d83451f5c369e2016305c2e6e1970d-pi"><img alt="Qsp-use-2" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2016305c2e6e1970d-300wi" style="display: block; margin-left: auto; margin-right: auto;" title="Qsp-use-2" /></a><br />Another crucial function is using a flash disk. In the picture above, we see an ARM machine with a flash disk running VxWorks.  A serial setup is added to have an interactive connection to the target. This setup would allow a programmer to develop and test software interacting with a flash-based file system, without caring about the make and model of the flash memory unit, or how to configure a driver stack (for a real flash-based system you tend to need to know the type of the flash to correctly configure the OS to drive it, while with the QSP, there is just a single type of flash and the driver configuration is given). The QSP flash disk is also much faster than a real one, as there is <a href="http://blogs.windriver.com/engblom/2011/02/working-faster-and-with-less-sweat.html" target="_self">no need to wait for contents to stabilize </a>after a write.</p>
<p>In Simics, saving and restoring multiple versions of the flash disk contents is trivial. Since <a href="http://blogs.windriver.com/engblom/2010/10/physical-or-virtual.html#more" target="_self">disk changes are handled as diffs</a>, any bad operations can be easily undone by restarting the simulation session (which discards all changes). They can also be interactively undone by using <a href="http://blogs.windriver.com/engblom/2010/10/the-giggle-effect.html#more" target="_self">reverse execution </a>in the simulation session.</p>
<p><a href="http://blogs.windriver.com/.a/6a00d83451f5c369e2016766b6fa46970b-pi"><img alt="Qsp-use-3" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2016766b6fa46970b-640wi" style="display: block; margin-left: auto; margin-right: auto;" title="Qsp-use-3" /></a></p>
<p>QSP was designed with multicore in mind, and it offers a more flexible target in terms of the number of cores available in hardware than any physical ARM or Power Architecture system available today (with the exception of IBM pSeries servers). As illustrated in the picture above, this can be used to <a href="http://www.windriver.com/whitepapers/whitepaper.php?f=WP_System_Arch_Exploration_using_Simics_0810.pdf" target="_self">test the scaling of </a>and <a href="http://blogs.windriver.com/tools/2010/12/debug-multicore-and-more-debug.html" target="_self">debug</a> the behavior of multithreaded multicore software, easily varying core counts to see what happens. With Simics repeatability and reverse execution and reverse debugging, concurrency bugs are much easier to find and fix than when using physical hardware.</p>
<p><a href="http://blogs.windriver.com/.a/6a00d83451f5c369e2016766b6fef2970b-pi"><img alt="Qsp-use-4" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2016766b6fef2970b-580wi" style="display: block; margin-left: auto; margin-right: auto;" title="Qsp-use-4" /></a><br />At this point, it is worth remembering that application binaries developed on a QSP will run on real machines too.  Indeed, a key value of using QSP for application development is that the same cross-build tools and the same OS API and ABI are used on the physical target. Thus, the application binaries compiled for and tested on a QSP will run unchanged on a physical target (with the same OS and architecture). If the application uses third-party binary-only libraries, these can be used as-is.  In this way, QSP reduces the distance between the development environment and the production environment, compared to using host-based development. </p>
<p>There is a real and relevant difference between Linux on Power and Linux on ARM and Linux on an x86 host. With QSP you find the target-related issues during application development rather than much later at hardware-software integration. If Simics is also used to model a real board before physical hardware appears, integration issues can even be found and resolved before the actual target exists.</p>
<p><a href="http://blogs.windriver.com/.a/6a00d83451f5c369e2016766be3615970b-pi"><img alt="Qsp-use-5" src="http://blogs.windriver.com/.a/6a00d83451f5c369e2016766be3615970b-700wi" title="Qsp-use-5" /></a></p>
<p>Above, we show an example of how QSP can be used to shorten the development time and time to market for a system based on custom hardware. The eventual goal of the project is to have the real OS running on a model of the actual physical hardware board used in a system, and after that on the physical board itself. With QSP configured with the relevant functions, application development can start concurrently with the development of the model of the physical platform. Once the platform model is finished and the OS runs on it, the application is available for execution and integration test. Without QSP, we would have had to wait until the virtual platform was complete enough to use before doing application development.</p>
<p>QSP is also a great teaching platform. It is an obvious candidate for setups like the one presented in my previous <a href="http://blogs.windriver.com/tools/2012/05/teaching-networking-using-simics.html" target="_self">blog on teaching networking with Simics</a> - you want plenty of Ethernet ports and a certain OS, along with ease of use and ease of portability across target architectures. The QSP design is a natural match for such use cases. QSP is also perfect for <a href="http://blogs.windriver.com/tools/2011/05/teaching-operating-systems-with-simics-an-interview-with-massimo-violante.html" target="_self">university-level OS courses. </a>It reminds me of a course that I used to teach where students were given the task to develop an OS kernel from scratch. In QSP, you have all the relevant interesting behaviors such as interrupts from devices, timers, multicore, etc., but none of the arbitrary complexities of real hardware platforms. For example, configuring an i8259 interrupt controller to get serial port interrupts is pretty complex, distracting students from the core subject material of how an OS kernel works.</p>
<p>This blog post has really only scratched the surface of what QSP can be used for. It offers a very quick way to get going with Simics simulation, and to apply the benefits of simulation for software developers for all application development - regardless of whether there is an actual platform model available for the actual target hardware.</p>
<p> </p>
<p><em>For additional information from Wind River, visit us on <a href="http://www.facebook.com/WindRiverSystems" target="_blank" title="Wind River Facebook page">Facebook</a>.</em></p>
<p> </p></div>
</content>



    </entry>
    <entry>
        <title>Teaching Networking using Simics</title>
        <link rel="alternate" type="text/html" href="http://blogs.windriver.com/engblom/2012/05/teaching-networking-using-simics.html" />
        <link rel="replies" type="text/html" href="http://blogs.windriver.com/engblom/2012/05/teaching-networking-using-simics.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451f5c369e20168eb133e58970c</id>
        <published>2012-05-25T09:41:53-07:00</published>
        <updated>2012-05-25T09:41:53-07:00</updated>
        <summary>Wind River Education Services provides user training for a variety of topics, including Wind River operating systems and tools, as well as more general topics like networking. Training always includes hands-on labs, which can complicate logistics for training sessions. Shipping boards and configuring networks is time-consuming and error-prone. For that reason, we are looking into using Simics as an alternative to physical hardware to streamline training logistics. It also makes it simple to encapsulate configurations, while maintaining the reality of cross-development. In this blog post, we will look at how Simics is used as a tool to facilitate network labs...</summary>
        <author>
            <name>Jakob Engblom</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Simics" />
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://blogs.windriver.com/engblom/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://windriver.com/customer_education/" target="_blank">Wind River Education Services </a>provides user training for a variety of topics, including Wind River operating systems and tools, as well as more general topics like networking. Training always includes hands-on labs, which can complicate logistics for training sessions. Shipping boards and configuring networks is time-consuming and error-prone. For that reason, we are looking into using Simics as an alternative to physical hardware to streamline training logistics. It also makes it simple to encapsulate configurations, while maintaining the reality of cross-development. In this blog post, we will look at how Simics is used as a tool to facilitate network labs in the context of education.</p>


<p>To make an interesting lab, we need a few machines, connected over a few networks. The starting setup currently used is shown below:</p>
<p><a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e20168eb1633d1970c-pi" style="display: inline;"><img alt="Networking class setup" class="asset  asset-image at-xid-6a00d83451f5c369e20168eb1633d1970c" src="http://blogs.windriver.com/.a/6a00d83451f5c369e20168eb1633d1970c-640wi" style="width: 625px; display: block; margin-left: auto; margin-right: auto;" title="Networking class setup" /></a></p>
<p>We have eight machines inside the Simics process, four used as routers ("Router N" in the picture, with yellow borders) and four used as end points for communications ("Target N" in the picture). These eight machines are connected using six Ethernet network links. In the real world, each such network link would have been an Ethernet switch, for a total of fourteen physical units.</p>
<p>One of the target machines connects out to the host to allow host-based software such as Wind River Workbench to control and monitor the operations. To monitor the traffic, we attach live <a href="http://www.wireshark.org/" target="_blank">Wireshark </a>capture points to two of the simulated networks. Each Wireshark capture point is connected to a Wireshark process on the host, which gets the packets flowing in the simulation and presents them just as if they had been captured from an Ethernet interface on the host. This lets students monitor and investigate the traffic flows in the labs. Compared to using a regular network sniffer on a host interface, the Simics Wireshark trace does not require any administrative rights on the host - it is just a stream of data from one user-level process to another.</p>
<p>The target machines shown here are all Power Architecture-based Wind River SBC8548 boards. In their physical incarnation, these boards only expose two of the four Ethernet controllers on the MPC8548 SoC to the outside. In Simics, all four Ethernets are available for connection, as making that change in a model is very simple. In this setup, most of the targets run VxWorks, but a few also run Linux to make the network more heterogeneous and realistic. The software stacks running on the targets are complete OS stacks, the same as would be running had this lab been configured from physical boards and cables. This includes aspects like IPv4 and IPv6 network stacks, SNMP, routing protocol handlers, and higher-level software like web servers. This makes the labs very realistic, and allows us to teach how to use a particular real software stack if that is desired.</p>
<p>Since this setup is created by Simics scripts, it is very easy to change the topology or nature of the machines. For example, we could add more machines and networks, or change the architecture from the Freescale MPC8548 to another SoC or another instruction set architecture altogether. We could also mix different architectures in the setup to emulate the effects of a mixed-endian or mixed-OS network.</p>
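<p>To illustrate why a scripted setup is easy to change, here is a minimal Python sketch of a topology described as data. It is not a Simics script and does not use any Simics API: the <code>Topology</code> class, the link names, and the wiring are all made up for illustration (the real wiring is the one in the picture above). It only reproduces the counts from the text: eight machines and six links, which would be fourteen physical units.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Hypothetical description of a lab network; not a Simics API."""
    machines: dict = field(default_factory=dict)  # machine name -> role
    links: dict = field(default_factory=dict)     # link name -> set of machine names

    def add_machine(self, name, role):
        self.machines[name] = role

    def connect(self, link, *names):
        # Refuse to attach machines that were never declared.
        for name in names:
            if name not in self.machines:
                raise KeyError(f"unknown machine: {name}")
        self.links.setdefault(link, set()).update(names)

    def physical_units(self):
        # In the real world each machine is a board and each link is a switch.
        return len(self.machines) + len(self.links)


def build_class_setup():
    """Four routers, four targets, six links. The wiring is an assumption."""
    topo = Topology()
    for n in range(1, 5):
        topo.add_machine(f"Router {n}", "router")
        topo.add_machine(f"Target {n}", "target")
        # One access network per target (assumed, for illustration only).
        topo.connect(f"lan{n}", f"Router {n}", f"Target {n}")
    # Two assumed backbone networks joining the routers.
    topo.connect("core_a", "Router 1", "Router 2", "Router 3")
    topo.connect("core_b", "Router 2", "Router 3", "Router 4")
    return topo


if __name__ == "__main__":
    print(build_class_setup().physical_units())
```
</pre>
<p>Adding a machine or a network is then a one-line change to the description, which is the property the scripted Simics setup gives you.</p>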
<p>With this setup, many different lab exercises can be performed, such as:</p>
<ul>
<li>Policy-based routing from the Linux targets to the VxWorks targets, making routing decisions based on the network protocol used for the traffic rather than on IP addresses. For example, TCP could be routed over "Router 6" and UDP over "Router 5", by making routing changes on "Router 7".</li>
<li>Dynamic routing protocols between subnets, by artificially disconnecting the link between "Router 7" and "Router 5". Such a network disconnect is trivial to achieve in Simics.</li>
<li>Watching the ARP broadcast distribution in the system when one of the Linux targets is trying to reach one of the VxWorks targets for the first time. </li>
<li>Injecting packet streams generated on the host with the pktgen utility into the Ethernet port of "Router 7" that is connected to the host machine. Other external traffic sources could also be used, such as IXIA and Agilent traffic generators.</li>
<li>Simulating the effects of long-distance WAN links by introducing jitter, delays, and dropped packets on the network traffic on certain links. Simics can be used to provide a model of the world, not just a local lab, to add realism to the training. </li>
</ul>
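<p>Two of the exercises above can be sketched in a few lines of Python to show the underlying ideas. This is only an illustration under stated assumptions, not how the labs are implemented: in the labs, the routing policy is configured on the target OS and the link effects are applied by Simics. The names <code>next_hop</code> and <code>impair_link</code>, the default route, and the uniform jitter model are assumptions made for the sketch.</p>
<pre>
```python
import random

# Policy table as it might be configured on "Router 7": protocol -> next hop.
POLICY = {"tcp": "Router 6", "udp": "Router 5"}
DEFAULT_NEXT_HOP = "Router 6"  # assumption: fallback for all other protocols


def next_hop(protocol):
    """Pick the next hop from the protocol alone, ignoring the destination IP."""
    return POLICY.get(protocol.lower(), DEFAULT_NEXT_HOP)


def impair_link(send_times, delay, jitter, drop_rate, seed=0):
    """Return arrival times for packets sent over a simulated WAN link.

    Each packet is dropped with probability drop_rate; surviving packets
    arrive after a fixed delay plus a random jitter in [0, jitter].
    Times are in seconds. Jitter can reorder closely spaced packets.
    """
    rng = random.Random(seed)  # seeded, so a lab run is repeatable
    arrivals = []
    for sent in send_times:
        if drop_rate > rng.random():
            continue  # packet lost on the link
        arrivals.append(sent + delay + rng.uniform(0.0, jitter))
    return arrivals


if __name__ == "__main__":
    for proto in ("TCP", "UDP", "ICMP"):
        print(proto, "goes via", next_hop(proto))
    print(impair_link([0.0, 1.0, 2.0], delay=0.1, jitter=0.02, drop_rate=0.25))
```
</pre>
<p>The fixed seed mirrors one benefit of doing this in simulation: the same delays and drops can be replayed for every student.</p>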
<p>With Simics, you get a "network in a box" that is ready to go at any training session, anywhere, needing nothing more than a laptop and a virtual machine image with Simics and the lab setups installed. Here Simics is a tool for achieving the goal of teaching networking: users do not need to learn much about Simics at all, and can just start the session and get going on the labs with minimal overhead. The Simics setup is stable and does not need to be double-checked as a physical setup would.</p>
<p><a class="asset-img-link" href="http://blogs.windriver.com/.a/6a00d83451f5c369e20168ebb13285970c-pi" style="display: inline;"><img alt="Networking class screen cap" border="0" class="asset  asset-image at-xid-6a00d83451f5c369e20168ebb13285970c image-full" src="http://blogs.windriver.com/.a/6a00d83451f5c369e20168ebb13285970c-800wi" title="Networking class screen cap" /></a><br /><br /></p>
<p>In action, the setup can look like the above screenshot (using Simics 4.6 and the stand-alone GUI). Note the commands to set up network adapters and their IP addresses in the router serial consoles along the bottom. This task is best scripted so that Simics does it for you, although in a teaching setting it might be pedagogical to have the students do it manually once, just to think about what they are doing. After this, a user would enter the routing commands manually to perform the configurations that are part of the labs.</p>
<p> </p>
<p><em>For additional information from Wind River, visit us on <a href="http://www.facebook.com/WindRiverSystems" target="_blank" title="Wind River Facebook page">Facebook</a>.</em></p></div>
</content>



    </entry>
 
</feed><!-- ph=1 -->
