<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://www.cs.washington.edu/homes/asampson/</id>
  <title>Adrian Sampson</title>
  <updated>2012-04-24T07:00:00Z</updated>
  <link rel="alternate" href="http://www.cs.washington.edu/homes/asampson/" />
  
  <author>
    <name>Adrian Sampson</name>
    <uri>http://www.cs.washington.edu/homes/asampson/</uri>
  </author>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/asampson" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="asampson" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
    <id>tag:www.cs.washington.edu,2012-04-24:/homes/asampson/blog/macroscalar.html</id>
    <title type="html">What Is Macroscalar?</title>
    <published>2012-04-24T07:00:00Z</published>
    <updated>2012-04-24T07:00:00Z</updated>
    <link rel="alternate" href="http://www.cs.washington.edu/homes/asampson/blog/macroscalar.html" />
    <content type="html">&lt;p&gt;A couple months ago, &lt;a href="http://arstechnica.com/apple/news/2012/02/apple-trademark-may-hint-at-processing-improvement-for-next-gen-a6-processor.ars"&gt;a story&lt;/a&gt; made the nerd-press rounds about Apple&amp;rsquo;s &lt;a href="http://tarr.uspto.gov/servlet/tarr?regser=serial&amp;amp;entry=85530375"&gt;trademark application&lt;/a&gt; and several patents (from &lt;a href="http://www.google.com/patents/US7395419"&gt;2004&lt;/a&gt;, &lt;a href="http://www.google.com/patents/US7617496"&gt;2005&lt;/a&gt;, and &lt;a href="http://www.google.com/patents/US8065502"&gt;2009&lt;/a&gt;) for something called a &amp;ldquo;macroscalar&amp;rdquo; processor architecture. Of course, like any tech company these days, Apple patents any old thing&amp;mdash;most things Apple patents are never even intended to see the light of day. But the surprising thing in this case is that the patents are about microarchitecture: a game that Apple has never played before.&lt;/p&gt;

&lt;p&gt;Over its 36-year history, Apple has depended on other companies&amp;rsquo; CPU designs. The Apple I and II used variants of the &lt;a href="http://www.6502.org/homebuilt"&gt;MOS Technology 6502&lt;/a&gt; microprocessor. The Mac&amp;rsquo;s processors &lt;a href="http://en.wikipedia.org/wiki/Motorola_68000_family"&gt;came from Motorola&lt;/a&gt;, and later from &lt;a href="http://en.wikipedia.org/wiki/PowerPC"&gt;IBM/Motorola/Freescale&lt;/a&gt;; now they&amp;rsquo;re &lt;a href="http://en.wikipedia.org/wiki/Apple–Intel_architecture"&gt;supplied by Intel&lt;/a&gt;. iOS devices today use Apple-designed systems-on-a-chip (SoCs) like the &lt;a href="http://www.anandtech.com/show/5686/apples-a5x-floorplan"&gt;A5X&lt;/a&gt;, but even this custom silicon sports CPU core designs licensed from ARM (namely, the &lt;a href="http://www.arm.com/products/processors/cortex-a/cortex-a9.php"&gt;Cortex A9&lt;/a&gt; used in nearly every current smartphone SoC on the market except Qualcomm&amp;rsquo;s &lt;a href="http://www.qualcomm.com/snapdragon"&gt;Snapdragon&lt;/a&gt;). Could Apple&amp;rsquo;s next step in its quest to control every aspect of its shiny products be to enter the formidable microarchitecture fray with Intel, AMD, ARM, Qualcomm, IBM, and Oracle? The company&amp;rsquo;s 2008 and 2010 acquisitions of small chip design firms &lt;a href="http://arstechnica.com/apple/news/2008/04/apple-disses-intels-atom-buys-powerpc-designer-pa-semi.ars"&gt;P. A. Semi&lt;/a&gt; and &lt;a href="http://arstechnica.com/apple/news/2010/04/apple-purchase-of-intrinsity-confirmed.ars"&gt;Intrinsity&lt;/a&gt; suggest, as unlikely as it may seem, that Apple might believe they can achieve significant power, performance, or functionality gains by going toe-to-toe with the microarchitectural establishment.&lt;/p&gt;

&lt;p&gt;If Apple were to get into the CPU design business, this move would represent a major shift in consumer electronics: as far as I know, Apple would be the only company to sell &amp;ldquo;whole widgets&amp;rdquo; to consumers featuring their own in-house microarchitecture. And, if this move proves not to be mere &lt;a href="http://en.wikipedia.org/wiki/Not_invented_here"&gt;NIH&lt;/a&gt;ism and the iPads of the future are way more awesome because of their new Apple-designed processor cores, it could lead to a sea change in the computer architecture landscape and the way that consumer electronics companies compete.&lt;/p&gt;

&lt;h2 id="decoding-macroscalar"&gt;Decoding Macroscalar&lt;/h2&gt;

&lt;p&gt;Because of the potential significance of the change, it&amp;rsquo;s worth looking closer at what Apple&amp;rsquo;s architects might be working on. We outsiders have no evidence either way about whether Apple intends to do anything at all with its &amp;ldquo;macroscalar architecture&amp;rdquo; patents and trademark, but even so, grokking the idea might help us understand what they might be up to.&lt;/p&gt;

&lt;p&gt;But some cursory googling reveals that news outlets have only reported on the idea in broad strokes. &lt;a href="http://www.zdnet.com/blog/storage/apples-macroscalar-architecture-what-it-is-what-it-means/1435"&gt;Robin Harris at ZDNet&lt;/a&gt; writes about the etymology of the &amp;ldquo;macroscalar&amp;rdquo; neologism (brand?) and gives some background on unrolling and vectorization; &lt;a href="http://arstechnica.com/apple/news/2012/02/apple-trademark-may-hint-at-processing-improvement-for-next-gen-a6-processor.ars"&gt;Chris Foresman at Ars Technica&lt;/a&gt; outlines the broad idea somewhat more technically; but I haven&amp;rsquo;t found any attempts at deeper understanding of the technique. I like to think of myself as a computer architect (I&amp;rsquo;m a third-year Ph.D. student in the area), so I&amp;rsquo;ve taken a stab at decoding the publicly available information about &amp;ldquo;macroscalar&amp;rdquo; architectures to give a coherent picture of the idea.&lt;/p&gt;

&lt;p&gt;First, a caveat: patents are terrible, monstrous documents full of &amp;ldquo;one or more&amp;rdquo;s and &amp;ldquo;according to another embodiment&amp;rdquo;s. They&amp;rsquo;re obtusely written and probably elide important details intentionally. (I even believe that patents can often &lt;a href="http://www.thisamericanlife.org/radio-archives/episode/441/when-patents-attack"&gt;hamper innovation more than they help&lt;/a&gt;.) But for the benefit of you, dear reader, I have slogged through the legalese to extract what I believe to be the core of the &amp;ldquo;macroscalar&amp;rdquo; idea. But because patents are all I have to work with, it&amp;rsquo;s possible I misunderstood the whole thing and have grossly mischaracterized it here. Please &lt;a href="&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#097;&amp;#115;&amp;#097;&amp;#109;&amp;#112;&amp;#115;&amp;#111;&amp;#110;&amp;#064;&amp;#099;&amp;#115;&amp;#046;&amp;#119;&amp;#097;&amp;#115;&amp;#104;&amp;#105;&amp;#110;&amp;#103;&amp;#116;&amp;#111;&amp;#110;&amp;#046;&amp;#101;&amp;#100;&amp;#117;"&gt;&amp;#103;&amp;#101;&amp;#116;&amp;#032;&amp;#105;&amp;#110;&amp;#032;&amp;#116;&amp;#111;&amp;#117;&amp;#099;&amp;#104;&lt;/a&gt; if you find any errors.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ll first summarize the design in a more general-interest way and then give a technical explanation, suitable for people familiar with computer architecture. Depending on who you are, you&amp;rsquo;ll probably want to read one section and skip the other.&lt;/p&gt;

&lt;h2 id="somewhat-less-technical-summary"&gt;Somewhat Less Technical Summary&lt;/h2&gt;

&lt;p&gt;Read this section if you&amp;rsquo;re a technical person but aren&amp;rsquo;t necessarily interested in hardcore architectural details. If you are familiar with the basics of computer architecture, you might find this section redundant and unsatisfying&amp;mdash;skip to the next one.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Out-of-order_execution"&gt;&lt;em&gt;Out-of-order&lt;/em&gt; (OoO) processing&lt;/a&gt; is a technique used by many CPU designs today to execute programs faster. OoO processors reorder instructions so that they run as soon as their inputs are available, taking advantage of the fact that two instructions that work on different pieces of data can run in any order without changing the meaning of the program. These processors can also take advantage of &lt;a href="http://en.wikipedia.org/wiki/Register_renaming"&gt;many extra registers&lt;/a&gt; (on-chip storage units) without requiring the program to be recompiled to use them. However, classic out-of-order designs are quite complex&amp;mdash;they have to dynamically look for communication (&amp;ldquo;dependencies&amp;rdquo;) between instructions to decide which order to execute them in.&lt;/p&gt;

&lt;p&gt;Macroscalar architecture takes some ideas from OoO design&amp;mdash;reordering instructions and using extra registers without recompilation&amp;mdash;to get some of the speedups of OoO execution without all the complexity. Complexity in processors is expensive: it consumes valuable design and testing time, it makes the processor more prone to bugs, it uses up valuable transistors, and it can make CPUs use more energy. So, even if macroscalar designs can&amp;rsquo;t fully match the performance of their OoO counterparts, they might be worth it because of their advantages in power efficiency, time-to-market, and silicon area.&lt;/p&gt;

&lt;p&gt;Specifically, macroscalar processors only accelerate tight loops that do a lot of similar computation over many different pieces of data. With some help from the compiler (but without requiring recompilation for every new chip design), the processor finds parts of a loop that are independent and interleaves them so that several iterations of the loop run in a single pass. (The process is similar to &lt;a href="http://cs.oberlin.edu/~jdonalds/317/lecture18.html"&gt;loop unrolling&lt;/a&gt;, an optimization traditionally performed by compilers rather than processors.) By relying on information from the compiler about instruction dependencies, the processor can perform this instruction reordering without checking for &lt;a href="http://en.wikipedia.org/wiki/Hazard_(computer_architecture)"&gt;communication problems&lt;/a&gt; as OoO processors must.&lt;/p&gt;

&lt;p&gt;Because the world doesn&amp;rsquo;t have any macroscalar processors to experiment on (even simulated ones), it&amp;rsquo;s not clear how close they can come to matching traditional OoO processors&amp;rsquo; performance or how much complexity they really save. But it&amp;rsquo;s plausible that, because tight loops make up important parts of many programs, macroscalar execution could get many of the benefits of OoO execution for some applications while not using as much energy.&lt;/p&gt;

&lt;h2 id="technical-description"&gt;Technical Description&lt;/h2&gt;

&lt;p&gt;Read this section if you&amp;rsquo;re a big nerd and know a little bit about modern computer architecture. If you&amp;rsquo;re an ordinary, curious nerd of the non-architect type, you may want to skip this one.&lt;/p&gt;

&lt;p&gt;Macroscalar design is a technique for extracting instruction-level parallelism (or, possibly, loop-level parallelism). It&amp;rsquo;s a high-level microarchitecture style: an &lt;em&gt;alternative&lt;/em&gt;, rather than an extension, to today&amp;rsquo;s pervasive out-of-order superscalar designs, VLIW architectures, or vector machines. So it&amp;rsquo;s appropriate to think of the proposal as an improvement over a base &lt;em&gt;in-order&lt;/em&gt; design rather than a modification of an OoO design. Some aspects of the macroscalar architecture look like structures found in familiar OoO or vector processors, but resist the temptation to dismiss these aspects as redundant—remember, we&amp;rsquo;re exploring &lt;em&gt;alternatives&lt;/em&gt; to these better-known designs.&lt;/p&gt;

&lt;p&gt;Another important takeaway: the technique depends on limited co-design with a compiler. Unlike superscalar processors, which transparently accelerate binaries compiled to target in-order cores, macroscalar architectures require lightweight compiler annotations on loop bodies.&lt;/p&gt;

&lt;p&gt;The macroscalar technique focuses on breaking false cross-iteration dependencies in loops to partially parallelize them and to take advantage of additional physical registers. This process is called &lt;em&gt;dynamic loop aggregation&lt;/em&gt;, but as a first approximation, let&amp;rsquo;s begin by thinking of it as dynamic &lt;a href="http://cs.oberlin.edu/~jdonalds/317/lecture18.html"&gt;loop unrolling&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id="loop-unrolling"&gt;Loop Unrolling&lt;/h3&gt;

&lt;p&gt;Unrolling, of course, is typically used by compilers to (among other things) keep pipelined functional units busy. Consider a loop that squares the elements of an array in place (here I&amp;rsquo;m borrowing an example used in the patent). Here&amp;rsquo;s some pseudo-assembly for such a loop:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;i = 0
L1:
val = arr[i]
val *= val
arr[i] = val
i++
cmp i max
jne L1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Here, all the &amp;ldquo;variables&amp;rdquo; I&amp;rsquo;m using are stored in registers; &lt;code&gt;arr&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; are inputs. Also, assume for the moment that the array loads and stores always hit in the first-level cache.) If the multiplier has a latency of, say, four cycles, then there are three wasted cycles between the multiply and the store. One iteration thus takes 7 cycles (not counting the compare and jump). An optimizing compiler might unroll this loop and issue four multiplies in sequence. Here&amp;rsquo;s an unrolled loop body (I&amp;rsquo;m omitting the head and tail of the loop):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;i1 = 0
i2 = 1
i3 = 2
i4 = 3
L1:
val1 = arr[i1]
val2 = arr[i2]
val3 = arr[i3]
val4 = arr[i4]
val1 *= val1
val2 *= val2
val3 *= val3
val4 *= val4
arr[i1] = val1
arr[i2] = val2
arr[i3] = val3
arr[i4] = val4
i1 += 4
i2 += 4
i3 += 4
i4 += 4
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, assuming the non-multiply instructions execute in a single cycle, the loop performs four iterations in 16 cycles because each multiply result (&lt;code&gt;val#&lt;/code&gt;) is ready when it&amp;rsquo;s consumed by the corresponding array store. Unrolling bought us a speedup of three cycles per iteration.&lt;/p&gt;
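
&lt;p&gt;To check the arithmetic, here is a minimal Python sketch of a single-issue, in-order pipeline. The &lt;code&gt;count_cycles&lt;/code&gt; and &lt;code&gt;loop_body&lt;/code&gt; helpers and the tuple encoding are made up for illustration. It assumes the latencies used above (four cycles for a multiply, one for everything else) and counts only the load, multiply, store, and increment of each iteration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy timing model (an illustration, not from the patents): one instruction
# issues per cycle, in program order, and stalls until its inputs are ready.
LATENCY = {"load": 1, "mul": 4, "store": 1, "add": 1}

def count_cycles(instrs, latency):
    ready = {}  # register name: first cycle in which its value can be read
    cycle = 0   # next free issue slot
    for op, dest, srcs in instrs:
        start = max([cycle] + [ready.get(s, 0) for s in srcs])
        if dest is not None:
            ready[dest] = start + latency[op]
        cycle = start + 1
    return cycle

def loop_body(factor):
    # Body of the squaring loop, unrolled `factor` times (compare and jump omitted).
    ids = range(1, factor + 1)
    body = [("load", f"val{k}", [f"i{k}"]) for k in ids]
    body += [("mul", f"val{k}", [f"val{k}"]) for k in ids]
    body += [("store", None, [f"val{k}", f"i{k}"]) for k in ids]
    body += [("add", f"i{k}", [f"i{k}"]) for k in ids]
    return body

print(count_cycles(loop_body(1), LATENCY))  # 7 cycles for one iteration
print(count_cycles(loop_body(4), LATENCY))  # 16 cycles for four iterations
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Under this model, the unrolled body averages 4 cycles per iteration instead of 7.&lt;/p&gt;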

&lt;p&gt;Of course, unrolling requires more available registers and a knowledge of operation latencies. If a compiler wants to get the most out of loop unrolling, it needs to know how many registers are available and statically allocate registers to different iterations of a loop. With loop aggregation, the &lt;em&gt;processor&lt;/em&gt; essentially performs loop unrolling instead of the compiler, relaxing the need for microarchitecture-specific optimization and freeing the design to transparently take advantage of large register files that are hidden from the ISA.&lt;/p&gt;

&lt;h3 id="dynamic-loop-aggregation"&gt;Dynamic Loop Aggregation&lt;/h3&gt;

&lt;p&gt;When targeting a macroscalar architecture, a compiler provides some metadata for each program loop that allows the architecture to perform loop unrolling on its own terms. Specifically, the processor gets to decide &lt;em&gt;how much&lt;/em&gt; to unroll the loop. In the jargon of the patents, the processor determines the &lt;em&gt;loop aggregation factor&lt;/em&gt; or &lt;em&gt;F&lt;/em&gt;. (In our example above, &lt;em&gt;F&lt;/em&gt; = 4.)&lt;/p&gt;

&lt;p&gt;Unrolling a loop requires a bunch of extra registers. To this end, macroscalar architectures reuse a fundamental idea from OoO design: the distinction between a small set of architected registers and a larger set of physical registers that are not exposed in the ISA. In macroscalar, the hidden registers are called &lt;em&gt;extended registers&lt;/em&gt; or XRs. Each architected register is also backed by a single physical register; the architected registers are not merely abstract constructs. The XR file is used only for loop aggregation.&lt;/p&gt;

&lt;p&gt;To enable aggregation, the compiler analyzes the loop body and identifies the (architected) registers that are used in an iteration-local way, called the &lt;em&gt;dynamic registers&lt;/em&gt; or DRs. Pedantically, DRs are those that are written before they are first read in the loop body; in our example, &lt;code&gt;val&lt;/code&gt; is a dynamic register. The register used for iteration, &lt;code&gt;i&lt;/code&gt;, is also considered a dynamic register. The remaining registers, which are read-only in the loop body, are called &lt;em&gt;static registers&lt;/em&gt; or SRs (&lt;code&gt;arr&lt;/code&gt; above, for example). (Note that neither category can express a cross-iteration dependency, which would exhibit a read followed by a write in the body. Cross-iteration dependencies will be dealt with later.) To unroll with factor &lt;em&gt;F&lt;/em&gt;, the processor needs enough XRs (physical registers) to store &lt;em&gt;F&lt;/em&gt; copies of each DR, so it calculates &lt;em&gt;F&lt;/em&gt; = &lt;em&gt;X&lt;/em&gt; ÷ &lt;em&gt;D&lt;/em&gt; (rounded down), where &lt;em&gt;D&lt;/em&gt; is the number of DRs and &lt;em&gt;X&lt;/em&gt; is the number of available XRs.&lt;/p&gt;
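
&lt;p&gt;As a rough sketch of this analysis, the Python below sorts registers into DRs and SRs and then picks the factor. The helper names and the (destination, sources) encoding are hypothetical, as is the figure of 8 available XRs. Because &lt;em&gt;F&lt;/em&gt; copies of each of the &lt;em&gt;D&lt;/em&gt; dynamic registers have to fit in the &lt;em&gt;X&lt;/em&gt; available XRs, the largest usable factor is &lt;em&gt;X&lt;/em&gt; divided by &lt;em&gt;D&lt;/em&gt;, rounded down:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def classify_registers(body, index_reg):
    # body: list of (written register or None, [registers read]), in program order.
    first_use = {}   # register: "read" or "write", whichever happens first
    written = set()
    for dest, srcs in body:
        for reg in srcs:
            first_use.setdefault(reg, "read")
        if dest is not None:
            first_use.setdefault(dest, "write")
            written.add(dest)
    dynamic, static, carried = [], [], []
    for reg, use in sorted(first_use.items()):
        if reg == index_reg or use == "write":
            dynamic.append(reg)   # iteration-local, or the loop index
        elif reg in written:
            carried.append(reg)   # read, then written: cross-iteration dependency
        else:
            static.append(reg)    # read-only in the loop body
    return dynamic, static, carried

def aggregation_factor(num_xrs, num_drs):
    # F copies of each DR must fit in the available XRs.
    return num_xrs // num_drs

# val = arr[i]; val *= val; arr[i] = val; i++; cmp i max
body = [
    ("val", ["arr", "i"]),
    ("val", ["val"]),
    (None, ["arr", "i", "val"]),
    ("i", ["i"]),
    (None, ["i", "max"]),
]
dynamic, static, carried = classify_registers(body, "i")
print(dynamic, static, carried)             # ['i', 'val'] ['arr', 'max'] []
print(aggregation_factor(8, len(dynamic)))  # 4
&lt;/code&gt;&lt;/pre&gt;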

&lt;p&gt;So the compiler provides two annotations for every aggregate-able loop:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Which (architected) registers are DRs and which are SRs. (For example, the compiler could use low-numbered registers as DRs and write down the index of the last DR.)&lt;/li&gt;
  &lt;li&gt;Which register is used as the loop index, so it can be initialized. Specifically, the compiler replaces the zero-initialization (something like &lt;code&gt;mov $0 rN&lt;/code&gt;) with a special instruction &lt;code&gt;index rN&lt;/code&gt;. When the loop is aggregated, the many XRs corresponding to the index DR will be initialized to the first few natural numbers (0, 1, 2, &amp;hellip;, &lt;em&gt;F&lt;/em&gt;-1).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The processor uses this information to determine the aggregation factor &lt;em&gt;F&lt;/em&gt;, translate DR references to XR references, and replicate instructions in the loop body.&lt;/p&gt;

&lt;h3 id="implementation"&gt;Implementation&lt;/h3&gt;

&lt;p&gt;To implement aggregation, the architecture uses elements called &lt;em&gt;iteration units&lt;/em&gt; coupled with each &lt;em&gt;execution unit&lt;/em&gt; (which you can think of as a functional unit for now). Iteration units are responsible for receiving &lt;em&gt;primary instructions&lt;/em&gt; (program instructions) and transforming each into a sequence of &lt;em&gt;secondary instructions&lt;/em&gt;, which are the unrolled copies of the original instructions adapted to use XRs. This approach mostly separates the loop aggregation logic from the ordinary instruction fetch and decode units, which only need to deal with primary instructions.&lt;/p&gt;
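
&lt;p&gt;Here is a minimal Python sketch of what an iteration unit does. The (opcode, operands) encoding and the &lt;code&gt;.x0&lt;/code&gt;-style XR names are invented for illustration; nothing here is the patents&amp;rsquo; actual naming. Each primary instruction becomes &lt;em&gt;F&lt;/em&gt; secondary instructions in which every DR operand is swapped for a per-iteration XR, while SRs pass through untouched:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def expand(primary, factor, dynamic_regs):
    # Turn one primary instruction into `factor` secondary instructions.
    op, operands = primary
    secondaries = []
    for k in range(factor):
        # DRs get a private XR per program iteration; SRs are shared.
        renamed = [f"{r}.x{k}" if r in dynamic_regs else r for r in operands]
        secondaries.append((op, renamed))
    return secondaries

def aggregate(body, factor, dynamic_regs):
    # Primary instructions are fed to the iteration units in program order.
    out = []
    for primary in body:
        out.extend(expand(primary, factor, dynamic_regs))
    return out

body = [
    ("load", ["val", "arr", "i"]),
    ("mul", ["val", "val"]),
    ("store", ["arr", "i", "val"]),
    ("inc", ["i"]),
]
for op, operands in aggregate(body, 4, {"val", "i"}):
    print(op, " ".join(operands))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The output has the same shape as the hand-unrolled listing above: four loads, then four multiplies, four stores, and four increments.&lt;/p&gt;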

&lt;p&gt;In each iteration of the aggregated loop, &lt;em&gt;F&lt;/em&gt; (the number of program iterations to be executed during the current aggregated iteration) is calculated. Then, the issue logic sends &lt;em&gt;F&lt;/em&gt; along with each primary instruction to the iteration units. The iteration units immediately begin creating the &lt;em&gt;F&lt;/em&gt; corresponding secondary instructions and sending them to the functional unit. Because the primary instructions are issued in program order, the secondary instructions execute in the same order, ensuring that dependencies are satisfied without explicit coordination between the iteration units.&lt;/p&gt;
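
&lt;p&gt;Functionally, the aggregated loop should compute exactly what the original loop does. The Python sketch below runs the squaring loop in aggregated passes. It rests on two guesses that the patents are not quoted as confirming here: that &lt;em&gt;F&lt;/em&gt; shrinks on the final pass when fewer iterations remain, and that each pass picks up the index where the previous one stopped. The names &lt;code&gt;square_in_place_aggregated&lt;/code&gt; and &lt;code&gt;max_factor&lt;/code&gt; are hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def square_in_place_aggregated(arr, max_factor):
    # Assumes max_factor is at least 1.
    done = 0  # program iterations completed so far
    while done != len(arr):
        # F for this pass: never more iterations than remain.
        factor = min(max_factor, len(arr) - done)
        index = [done + k for k in range(factor)]  # one index copy per iteration
        # Each primary instruction is applied to all F copies before the next one.
        val = [arr[i] for i in index]   # val = arr[i]
        val = [v * v for v in val]      # val *= val
        for i, v in zip(index, val):    # arr[i] = val
            arr[i] = v
        done += factor
    return arr

data = [1, 2, 3, 4, 5, 6]
print(square_in_place_aggregated(data, 4))  # [1, 4, 9, 16, 25, 36]
&lt;/code&gt;&lt;/pre&gt;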

&lt;p&gt;The relative independence of the iteration units may constitute a complexity advantage over OoO design: no explicit dependence management or scheduling is necessary to expose ILP.&lt;/p&gt;

&lt;h3 id="parallelization"&gt;Parallelization&lt;/h3&gt;

&lt;p&gt;So far, loop aggregation has bought us essentially the same performance benefits as loop unrolling but in a microarchitecture-independent way. The performance benefits come from dynamically choosing to interleave instructions in a way that avoids stalls due to register dependencies, which is reminiscent of the benefits offered by single-issue OoO design. However, the macroscalar design can go beyond instruction ordering and execute multiple instructions simultaneously&amp;mdash;a feature typically associated with superscalar designs and vector (SIMD) instructions.&lt;/p&gt;

&lt;p&gt;To do this, the patent proposes pairing every iteration unit with &lt;em&gt;multiple&lt;/em&gt; FUs. This way, the iteration unit can kick off several secondary instructions in each cycle. In this sense, loop aggregation can look like vectorization: if a macroscalar chip has four multipliers, then the corresponding iteration unit can multiply four adjacent numbers in an array &amp;ldquo;all at once&amp;rdquo; based on a single program (primary) instruction.&lt;/p&gt;

&lt;p&gt;Parallelizing the execution of secondary instructions necessarily complicates the instruction scheduling problem. The patent proposes a simplistic solution here: only use &lt;em&gt;N&lt;/em&gt;-way parallelism when &lt;em&gt;all&lt;/em&gt; the units involved in some loop body can issue at least &lt;em&gt;N&lt;/em&gt; secondary instructions at once (i.e., the degree of parallelism falls to that of the least-parallel execution unit).&lt;/p&gt;
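
&lt;p&gt;The rule is simple enough to sketch in Python. The unit names and FU counts below are made-up numbers, not taken from the patent. The usable width is the minimum FU count among the units the loop body touches, and the &lt;em&gt;F&lt;/em&gt; secondary instructions of each primary instruction are then issued in groups of that size:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def parallel_width(units_used, fus_per_unit):
    # The least-parallel unit involved in the loop body sets the width.
    return min(fus_per_unit[u] for u in units_used)

def issue_groups(secondaries, width):
    # Split the secondaries of one primary instruction into per-cycle groups.
    return [secondaries[s:s + width] for s in range(0, len(secondaries), width)]

fus_per_unit = {"load_store": 2, "multiply": 4, "integer": 2}  # hypothetical chip
width = parallel_width(["load_store", "multiply", "integer"], fus_per_unit)
print(width)  # 2
print(issue_groups(["mul.0", "mul.1", "mul.2", "mul.3"], width))
# [['mul.0', 'mul.1'], ['mul.2', 'mul.3']]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So even with four multipliers, this hypothetical chip would multiply only two elements per cycle in a loop that also loads and stores.&lt;/p&gt;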

&lt;h3 id="loop-carried-dependencies"&gt;Loop-Carried Dependencies&lt;/h3&gt;

&lt;p&gt;Until this point, we&amp;rsquo;ve only considered loops without data dependencies between iterations (sometimes called &amp;ldquo;DoAll&amp;rdquo; loops). In practice, many loops read variables that are updated in previous iterations. Loop aggregation clearly breaks in these cases: transposing instructions to use independent XRs instead of shared architected registers effectively isolates loop iterations from one another.&lt;/p&gt;

&lt;p&gt;To deal with loop-carried dependencies, a compiler for a macroscalar architecture delineates loop bodies into sections with and without such dependencies. The dependent parts of a loop are called &lt;em&gt;sequence blocks&lt;/em&gt; and the independent parts are called &lt;em&gt;vector blocks&lt;/em&gt;. The compiler uses a third kind of annotation to distinguish the instruction ranges that make up these blocks; at runtime, the processor only iterates the instructions in vector blocks. Sequence blocks are executed in order and skip the iteration units entirely&amp;mdash;the instruction issue logic sends primary instructions from sequence blocks directly to the execution units.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s unclear to me how effective this analysis can be on real programs. As a wild, unsubstantiated guess, I would venture that most loop-carried dependencies affect large portions of a loop&amp;mdash;that is, if a given loop has a single cross-iteration dependence, then the entire loop body is likely to be dependent. I can&amp;rsquo;t think of a real-world example where only a small, isolated portion of a loop carries a dependence but the rest is independent and thus vectorizable.&lt;/p&gt;

&lt;h3 id="other-topics"&gt;Other Topics&lt;/h3&gt;

&lt;p&gt;For the purposes of this (already-too-long) article, I&amp;rsquo;m omitting some less-interesting details from the patents that nonetheless are critical to making macroscalar designs viable:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Handling nested loops. Compiler instrumentation and hardware structures are used to efficiently support aggregating nested loops that draw from the same pool of XRs.&lt;/li&gt;
  &lt;li&gt;Loop control flow. Special flags are used to implement C&amp;rsquo;s &lt;code&gt;continue&lt;/code&gt;, &lt;code&gt;break&lt;/code&gt;, and &lt;code&gt;return&lt;/code&gt;. In the latter two cases, limited rollback is necessary to conceal the effects of partially-executed iterations after the loop is terminated.&lt;/li&gt;
  &lt;li&gt;Context switching and OS support.&lt;/li&gt;
  &lt;li&gt;Exception handling.&lt;/li&gt;
  &lt;li&gt;Predication. Because aggregated vector blocks are not allowed to contain control flow, predicated instructions and predicated blocks are used to avoid branching when the program loop contains conditionals.&lt;/li&gt;
  &lt;li&gt;Prefetch. The patents propose using an aggressive stride-based stream prefetcher to help ensure that regular memory accesses in vector blocks rarely miss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the curious, the patents do give a complete architectural picture (with the notable exception, of course, of empirical evaluation). Few details are omitted if you&amp;rsquo;re willing to wade through the legalese.&lt;/p&gt;

&lt;h2 id="my-opinion"&gt;My Opinion&lt;/h2&gt;

&lt;p&gt;Dynamic loop aggregation is a legitimately interesting idea and macroscalar could have a shot at being a viable high-level core design strategy. For several reasons, however&amp;mdash;its competitiveness with OoO, the unimportance of ISA opacity, and the fading relevance of instruction-level parallelism (ILP)&amp;mdash;I don&amp;rsquo;t believe that Apple will ever sell a macroscalar iPad.&lt;/p&gt;

&lt;p&gt;Macroscalar architecture is an enhancement to in-order designs. It is mutually exclusive with (or, at best, orthogonal to) out-of-order superscalar, VLIW, and vector machines&amp;mdash;traditional approaches to extracting ILP. So the relevant question is: What advantages might a macroscalar processor have over a traditional superscalar one or a statically scheduled ILP technique like SIMD or VLIW? The patents don&amp;rsquo;t address this comparison directly, but I&amp;rsquo;ll try to give a reasonable perspective here.&lt;/p&gt;

&lt;p&gt;Over OoO techniques, I believe the potential advantages are twofold:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Complexity (and, consequently, power and area): Macroscalar designs can be seen as a simpler, lower-power way to get ILP in constrained situations. It remains to be seen &lt;em&gt;how much&lt;/em&gt; simpler it is than OoO and how much ILP it can expose in real applications.&lt;/li&gt;
  &lt;li&gt;Fetch and decode: By iterating &amp;ldquo;secondary&amp;rdquo; instructions post-decode, loop aggregation may have a positive effect on code size (and thus I-cache pressure) as well as the power overhead of instruction decode logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The magnitude of both effects will never be clear without an empirical evaluation, but it&amp;rsquo;s hard to believe that a large advantage can be eked out over optimized OoO designs. The ever-popular &lt;a href="http://www.arm.com/products/processors/cortex-a/cortex-a9.php"&gt;Cortex A9&lt;/a&gt; is itself out-of-order and dual-issue, providing an existence proof of energy-efficient superscalar processing.&lt;/p&gt;

&lt;p&gt;Over VLIW and vector machines, macroscalar processors offer abstraction from microarchitectural parameters. Here macroscalar&amp;rsquo;s case is even weaker than it is against OoO: I believe ISAs of the future will trend toward &lt;em&gt;more&lt;/em&gt; compiler&amp;ndash;architecture co-design, not less. This goes doubly for Apple: they already have &lt;a href="http://llvm.org/"&gt;their own compiler infrastructure&lt;/a&gt; and &lt;a href="http://www.apple.com/mac/app-store/"&gt;a tightly controlled software deployment system&lt;/a&gt;. If Apple eventually also builds its own CPUs, it will be perfectly positioned to generate hardware-specific binaries even for third-party apps (using either offline or JIT compilation). If Apple controls the whole widget, including the compiler, it has little incentive to  carefully tailor an ISA for backward- and forward-compatibility. In this setting, explicit SIMD instructions (or GPGPUs) are likely to offer all the benefits of macroscalar at even lower complexity.&lt;/p&gt;

&lt;p&gt;Broadly, I am bearish on macroscalar because now is not the time to be making drastic architectural changes for the sake of a little ILP. There are many more critical problems to be addressed, such as &lt;a href="http://www.tilera.com/"&gt;manycores&lt;/a&gt; and &lt;a href="http://sampa.cs.washington.edu/sampa/Deterministic_MultiProcessing_(DMP)"&gt;their programmability&lt;/a&gt;, &lt;a href="http://www.arm.com/products/processors/technologies/bigLITTLEprocessing.php"&gt;heterogeneous SoCs&lt;/a&gt;, and &lt;a href="http://www.cs.utexas.edu/~hadi/doc/paper/2011-isca-dark_silicon.pdf"&gt;dark silicon&lt;/a&gt; constraints. Tweaks to single-threaded ILP exploitation solve none of them. If compiler&amp;ndash;architecture co-design is on the table, much more radical opportunities are available.&lt;/p&gt;

&lt;p&gt;While macroscalar may not be Apple&amp;rsquo;s future in-house microarchitecture, it seems clear that they will eventually have one. There are too many Apple job postings for &lt;a href="http://jobs.apple.com/index.ajs?method=mExternal.showJob&amp;amp;RID=105000"&gt;RTL designers&lt;/a&gt;, &lt;a href="http://jobs.apple.com/index.ajs?method=mExternal.showJob&amp;amp;RID=111908"&gt;circuit engineers&lt;/a&gt;, and &lt;a href="http://jobs.apple.com/index.ajs?method=mExternal.showJob&amp;amp;RID=100006"&gt;compiler designers with &amp;ldquo;experience with developing compilers for novel micro-architectures and instruction sets&amp;rdquo;&lt;/a&gt; for the macroscalar patents to be a one-time affair. I&amp;rsquo;m excited to see what Apple&amp;rsquo;s microarchitecture offers&amp;mdash;beyond progress in the company&amp;rsquo;s ongoing quest to control every last detail of its products.&lt;/p&gt;

</content>
    <summary type="html">A couple months ago, [a story][ars] made the nerd-press rounds about Apple's trademark application and several patents for something called a "macroscalar" processor architecture. I've taken a stab at decoding the publicly available information about macroscalar architectures to give a coherent picture of the idea.

[ars]: http://arstechnica.com/apple/news/2012/02/apple-trademark-may-hint-at-processing-improvement-for-next-gen-a6-processor.ars
</summary>
  </entry>
  <entry>
    <id>tag:www.cs.washington.edu,2012-04-17:/homes/asampson/blog/greenclouds.html</id>
    <title type="html">Green Clouds</title>
    <published>2012-04-17T07:00:00Z</published>
    <updated>2012-04-17T07:00:00Z</updated>
    <link rel="alternate" href="http://www.cs.washington.edu/homes/asampson/blog/greenclouds.html" />
    <content type="html">&lt;p&gt;&lt;a href="http://seattletimes.nwsource.com/html/businesstechnology/2017997234_greenpeace18.html"&gt;Janet Tu writes in the Seattle Times today&lt;/a&gt; about a new &lt;a href="http://www.greenpeace.org/international/Global/international/publications/climate/2012/iCoal/HowCleanisYourCloud.pdf"&gt;Greenpeace report&lt;/a&gt; analyzing the use of renewable energy in data centers (or, as they put it, &amp;ldquo;the cloud&amp;rdquo;). I provided a couple of comments for the Times story and I&amp;rsquo;ll expand a little bit here on the importance and feasibility of improving energy efficiency in a cloud-centric world.&lt;/p&gt;

&lt;p&gt;While energy efficiency is a hot topic in computer architecture currently, environmental impact is not. With &lt;a href="http://dl.acm.org/citation.cfm?id=2150976.2150980"&gt;notable exceptions&lt;/a&gt;, academics focus on other motivations for efficiency: cost, cooling, battery life, and fundamental power density limits. But &lt;a href="http://iopscience.iop.org/1748-9326/3/3/034008/"&gt;computers consume a large portion of the world&amp;rsquo;s energy&lt;/a&gt;, so it&amp;rsquo;s important to recognize that computational efficiency can have an impact on climate change and other negative effects of energy consumption.&lt;/p&gt;

&lt;p&gt;&amp;ldquo;Cloud&amp;rdquo; energy consumption is particularly relevant from this perspective. Personal computing is currently undergoing a shift from mostly local (on-device) computation to a local&amp;ndash;remote hybrid model. Gmail, Siri, Google Docs, iCloud, and Office 365 all rely on data center computation in cooperation with local devices. (Academic proposals like &lt;a href="http://research.microsoft.com/en-us/um/people/alecw/mobisys-2010.pdf"&gt;MAUI&lt;/a&gt; and &lt;a href="http://www.princeton.edu/~ekoukoum/papers/Koukoumidis_Pocket_Cloudlets_ASPLOS_2011.pdf"&gt;Pocket Cloudlets&lt;/a&gt; follow the same trend.) Modern personal computers, from laptops to iPhones, spend energy from two different sources: energy from their batteries and energy &amp;ldquo;in the cloud&amp;rdquo;. You can be energy-conscious&amp;mdash;buy green energy from your utility and choose modern, low-power devices&amp;mdash;while remaining unaware of the energy you&amp;rsquo;re using in data centers.&lt;/p&gt;

&lt;p&gt;So Greenpeace is right to highlight the sustainability of cloud infrastructures. But, in doing so, the report conflates two different aspects of the cloud energy problem: &lt;em&gt;sourcing&lt;/em&gt; and &lt;em&gt;efficiency&lt;/em&gt;. The report&amp;rsquo;s quantitative analysis mainly examines the cloud providers&amp;rsquo; renewable energy sources, but efficiency&amp;mdash;the amount of energy used in the first place&amp;mdash;is arguably the more important long-term issue for sustainability.&lt;/p&gt;

&lt;p&gt;When we think in terms of efficiency, the picture becomes more complex than who&amp;rsquo;s-buying-what-from-whom. For example, the report (rightly) praises Facebook for focusing on its servers&amp;rsquo; energy efficiency as demonstrated by its &lt;a href="http://opencompute.org/"&gt;Open Compute Project&lt;/a&gt;, but Facebook is not the only provider working aggressively on energy efficiency. In fact, cloud computing consolidates lots of work&amp;mdash;the work that we would once have done on desktops in our houses&amp;mdash;into data centers where a single company foots the bill for the energy used by that work. So cloud providers like Amazon and Microsoft have a powerful incentive to reduce their energy consumption and they take computational efficiency very seriously. This is one of the great benefits of cloud computing: you can let the experts at Microsoft or Rackspace focus on energy efficiency while you focus on the work you need to do. As &lt;a href="http://seattletimes.nwsource.com/html/businesstechnology/2017997234_greenpeace18.html"&gt;Amazon told Tu for the Times article&lt;/a&gt;, &amp;ldquo;cloud&amp;rdquo; computing is probably intrinsically more energy-efficient than operating many independent server rooms.&lt;/p&gt;

&lt;p&gt;The advantages of cloud consolidation apply to sourcing as well as efficiency.
&lt;a href="http://arstechnica.com/apple/news/2012/02/apple-confirms-plans-for-oregon-data-center-outlines-green-initiatives.ars"&gt;Apple is building massive on-site solar arrays&lt;/a&gt; to power its North Carolina and (upcoming) Oregon data centers. It&amp;rsquo;s much easier for Apple to convert iCloud to run on a private solar array than it is to mobilize millions of homes to do the same&amp;mdash;which would be necessary in a pre-cloud world where most computational work is local.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s also important to realize that much of the responsibility for energy-efficient computation lies with hardware designers rather than data center operators. Greenpeace doesn&amp;rsquo;t mention the role played by chip designers&amp;mdash;Intel, AMD, IBM, and Oracle&amp;mdash;in the energy consumption of clouds. Hardware design is at least as important as the deployment decisions made by cloud providers.&lt;/p&gt;

&lt;p&gt;Energy efficiency is a critical aspect of computing&amp;rsquo;s transition to a cloud-centric model&amp;mdash;for environmental reasons as well as others&amp;mdash;and Greenpeace&amp;rsquo;s report helps bring the issue into the public eye. In the long term, however, individual companies&amp;rsquo; power sourcing decisions are likely to be less important than the fundamental efficiency of computers. Computer architecture research needs to focus on issues like server energy proportionality, energy-based accounting, and energy-aware programming to address the need for efficient computing. The economic realities of cloud computing are such that cloud providers are incentivized to adopt innovations that curb energy consumption, so this is an area where we as researchers can hope for significant real-world impact.&lt;/p&gt;

</content>
    <summary type="html">[Janet Tu writes in the Seattle Times today][times] about a new [Greenpeace
report][report] analyzing the use of renewable energy in data centers (or,
as they put it, "the cloud"). I provided a couple of comments for the Times
story and I'll expand a little bit here on the importance and feasibility
of improving energy efficiency in a cloud-centric world.

[report]: http://www.greenpeace.org/international/Global/international/publications/climate/2012/iCoal/HowCleanisYourCloud.pdf
[times]: http://seattletimes.nwsource.com/html/businesstechnology/2017997234_greenpeace18.html
</summary>
  </entry>
  <entry>
    <id>tag:www.cs.washington.edu,2012-03-19:/homes/asampson/blog/kuow.html</id>
    <title type="html">On the Radio</title>
    <published>2012-03-19T07:00:00Z</published>
    <updated>2012-03-19T07:00:00Z</updated>
    <link rel="alternate" href="http://www.cs.washington.edu/homes/asampson/blog/kuow.html" />
    <content type="html">&lt;p&gt;Seattle&amp;rsquo;s local public radio station, &lt;a href="http://www.kuow.org/"&gt;KUOW&lt;/a&gt;, invited me for a short interview
on their daily show, &lt;a href="http://www.kuow.org/weekday"&gt;Weekday&lt;/a&gt; with Steve Scher. We talked about the &lt;a href="https://www.facebook.com/notes/facebook-engineering/announcing-the-2012-2013-facebook-fellows/10150558596698920"&gt;Facebook
fellowship&lt;/a&gt;, our group&amp;rsquo;s work on &lt;a href="/research.html"&gt;approximate computing&lt;/a&gt;, UW CSE&amp;rsquo;s
budget, and &lt;a href="http://www.hmc.edu/"&gt;Harvey Mudd&lt;/a&gt; (my alma mater). You can &lt;a href="http://www.kuow.org/program.php?id=26240"&gt;hear the program in
KUOW&amp;rsquo;s archives&lt;/a&gt; (my segment starts around 21:45). Steve is incredibly
friendly and a fantastic interviewer; I had a great time with my ten minutes of
public-radio fame.&lt;/p&gt;

&lt;p&gt;Also, I had the pleasure of meeting &lt;a href="http://en.wikipedia.org/wiki/Jack_Hitt"&gt;Jack Hitt&lt;/a&gt;, one of my very favorite
&lt;a href="http://www.thisamericanlife.org/"&gt;&lt;em&gt;This American Life&lt;/em&gt;&lt;/a&gt;-ites, briefly in the green room. He was talking
about his one-man show, &lt;a href="http://thejackhittplay.com/"&gt;&lt;em&gt;Making Up the Truth&lt;/em&gt;&lt;/a&gt;, after my interview.
His interview (also in the above audio archive) is way more interesting than
mine.&lt;/p&gt;

</content>
    <summary type="html">I appeared briefly on [Weekday][], a local public radio program, to talk
about the [Facebook fellowship][fbf] and energy-efficient computing. And I
met Jack Hitt!

[weekday]: http://www.kuow.org/weekday
[fbf]: https://www.facebook.com/notes/facebook-engineering/announcing-the-2012-2013-facebook-fellows/10150558596698920
</summary>
  </entry>
  <entry>
    <id>tag:www.cs.washington.edu,2012-02-12:/homes/asampson/blog/truffle.html</id>
    <title type="html">Truffle, an Architecture for Approximate Computing</title>
    <published>2012-02-12T08:00:00Z</published>
    <updated>2012-02-12T08:00:00Z</updated>
    <link rel="alternate" href="http://www.cs.washington.edu/homes/asampson/blog/truffle.html" />
    <content type="html">&lt;p&gt;In an extension of some &lt;a href="/blog/enerj.html"&gt;previous work on the EnerJ approximation-aware
programming language&lt;/a&gt;, I recently worked (with &lt;a href="http://www.cs.washington.edu/homes/hadianeh/"&gt;Hadi
Esmaeilzadeh&lt;/a&gt;, &lt;a href="http://www.cs.washington.edu/homes/luisceze/"&gt;Luis Ceze&lt;/a&gt;, and &lt;a href="http://research.microsoft.com/en-us/people/dburger/"&gt;Doug Burger&lt;/a&gt;) on designing a
hardware architecture to support this kind of approximate computing. The
architecture, which we call Truffle (a soft core with a hard shell&amp;mdash;this name
due to Hadi&amp;rsquo;s genius with nomenclature), allows parts of programs to run
precisely and parts in a low-power, approximate mode. &lt;a href="http://www.cs.washington.edu/homes/asampson/media/papers/truffle-asplos2012.pdf"&gt;Our paper about
Truffle&lt;/a&gt; was recently accepted to &lt;a href="http://research.microsoft.com/en-us/um/cambridge/events/asplos_2012/"&gt;ASPLOS 2012&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As we originally imagined in &lt;a href="http://www.cs.washington.edu/homes/asampson/media/papers/enerj-pldi2011.pdf"&gt;the EnerJ paper&lt;/a&gt;, we wanted a
processor that could alternate between precise and approximate operation in
several different structures:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Functional units (ALUs and FPUs).&lt;/li&gt;
  &lt;li&gt;On-chip caches.&lt;/li&gt;
  &lt;li&gt;Registers.&lt;/li&gt;
  &lt;li&gt;Main memory (not explored much in Truffle, which is a CPU design).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means that the CPU should have an ISA that allows approximate arithmetic
operations along with approximate loads and stores with carefully defined
effects on the registers and caches. Truffle&amp;rsquo;s ISA lets the compiler use
approximate instructions&amp;mdash;for example, an &lt;code&gt;ADD.A&lt;/code&gt; instruction that executes an
approximate addition&amp;mdash;that have &lt;em&gt;approximate semantics:&lt;/em&gt; they do not define any
particular output behavior, instead allowing arbitrary errors to occur in the
data they compute. The register file and caches also both have two modes; they
switch between approximate and precise mode based on the instructions that write
to them. This makes it simple for a compiler to keep track of the storage
structures&amp;rsquo; modes and to carefully avoid storing precise data in approximate
mode (which could lead to corruption of critical data).&lt;/p&gt;

&lt;p&gt;In fact, this co-design with the compiler is an important facet of Truffle&amp;rsquo;s
design: to avoid spending energy to save energy, Truffle relies on the compiler to
perform all safety checks off-line. No costly dynamic checks need
to be performed at runtime. This also makes Truffle substantially simpler than
other information-flow-inspired architectures, which generally need to tag and
track data movement across the processor.&lt;/p&gt;

&lt;p&gt;In the end, simulations of EnerJ benchmarks on Truffle demonstrate energy
savings up to the 40% range&amp;mdash;which is clearly encouraging. We found that
Truffle&amp;rsquo;s energy savings are bounded by the processor&amp;rsquo;s &amp;ldquo;front end&amp;rdquo;: the fetch,
decode, and register renaming logic that must be performed precisely. In the
future, we hope to look into approximate computing structures that bypass the
need for a fully precise front end to achieve even better energy efficiency.&lt;/p&gt;

</content>
    <summary type="html">I recently worked on a project, called Truffle, that lends some 
credibility to the architecture assumed by [EnerJ][enerjblog], the language
for approximate computing that I worked on previously. [The paper about
Truffle][trufflepaper] was recently accepted to [ASPLOS][asplos]! Woohoo! I
will give a talk about the project there in March.

[enerjblog]: /blog/enerj.html
[trufflepaper]: http://www.cs.washington.edu/homes/asampson/media/papers/truffle-asplos2012.pdf
[asplos]: http://research.microsoft.com/en-us/um/cambridge/events/asplos_2012/
</summary>
  </entry>
  <entry>
    <id>tag:www.cs.washington.edu,2011-12-09:/homes/asampson/blog/powermeasurement.html</id>
    <title type="html">Measuring Smartphone Energy on a Budget</title>
    <published>2011-12-09T08:00:00Z</published>
    <updated>2011-12-09T08:00:00Z</updated>
    <link rel="alternate" href="http://www.cs.washington.edu/homes/asampson/blog/powermeasurement.html" />
    <content type="html">&lt;p&gt;For a recent research project, I measured the power consumption of a smartphone.
Because battery life is a critical design constraint for smartphones, it&amp;rsquo;s important to
understand how their software behavior influences their power usage, but
there aren&amp;rsquo;t many tools available for actually measuring power&amp;mdash;especially ones
that are remotely affordable. I&amp;rsquo;ve seen many papers that use expensive,
special-purpose power measurement tools like the &lt;a href="http://www.msoon.com/LabEquipment/PowerMonitor/"&gt;$771 one from Monsoon
Solutions&lt;/a&gt; or cumbersome custom setups. Because I&amp;rsquo;m utterly clueless
when it comes to electronics, I needed a straightforward apparatus that could be
controlled mostly through software. This post describes a simple, relatively
cheap setup I used to get some reasonable power measurements.&lt;/p&gt;

&lt;h3 id="the-equipment"&gt;The Equipment&lt;/h3&gt;

&lt;p&gt;&lt;img src="http://www.cs.washington.edu/homes/asampson/media/apparatus/psup.jpeg" class="illus" width="350" height="179" /&gt;&lt;/p&gt;

&lt;p&gt;The only equipment I shelled out for in this setup was a DC
power supply: specifically, the &lt;a href="http://www.bkprecision.com/products/model/1696/programmable-dc-power-supply-1-20vdc-0-999a.html"&gt;BK Precision 1696&lt;/a&gt;. At $375, this power
supply has everything we need: a reasonable DC voltage and current range and,
crucially, a serial port for communication with a computer that can do the
measurement and control footwork. (I&amp;rsquo;m sure any similar equipment would work,
but I&amp;rsquo;ve written some software that works with this supply in particular&amp;mdash;see
below.) Unless you have an ancient serial-port-equipped host machine, you&amp;rsquo;ll
likely also want a USB serial interface like &lt;a href="http://www.amazon.com/TRENDnet-Serial-Converter-TU-S9-Blue/dp/B0007T27H8"&gt;this one&lt;/a&gt; (a very
affordable commodity part). The only other physical equipment necessary is a few
wires of the sort easily found in an electrical-engineering lab on any
university campus. (I cannot, of course, condone the &amp;ldquo;borrowing&amp;rdquo; of any such
materials&amp;hellip;)&lt;/p&gt;

&lt;h3 id="the-setup"&gt;The Setup&lt;/h3&gt;

&lt;p&gt;&lt;img src="http://www.cs.washington.edu/homes/asampson/media/apparatus/connection.jpeg" class="illus" width="350" height="219" /&gt;&lt;/p&gt;

&lt;p&gt;The next challenge is to get the smartphone to run off of the power supply so we
can measure its energy consumption. The most straightforward way I found to do
this was to replace the device&amp;rsquo;s battery with the power supply. This just
involves removing the battery, finding the appropriate terminals, and wiring the
supply&amp;rsquo;s output to them. In my case&amp;mdash;I was measuring an original Motorola
DROID&amp;mdash;there were two extra pins for communication with the battery&amp;rsquo;s charge
meter; the phone worked fine with these pins left disconnected from anything.&lt;/p&gt;

&lt;p&gt;After looking up the battery&amp;rsquo;s voltage, I configured the power supply to that
level in &amp;ldquo;constant voltage&amp;rdquo; mode and turned on the phone. Shockingly, this
works! The DROID was slightly confused about its battery capacity level, but it
operated normally.&lt;/p&gt;

&lt;p&gt;To control the measurements, I connected a host laptop via USB to both the
phone (to control the software) and the power supply (to take measurements).&lt;/p&gt;

&lt;h3 id="the-software"&gt;The Software&lt;/h3&gt;

&lt;p&gt;&lt;img src="http://www.cs.washington.edu/homes/asampson/media/apparatus/netbook.jpeg" class="illus" width="350" height="261" /&gt;&lt;/p&gt;

&lt;p&gt;On the host machine, there are two main software components: one that talks to the
smartphone and one that talks to the power supply. The former is pretty
straightforward&amp;mdash;I manage the &amp;ldquo;rooted&amp;rdquo; Android software via SSH
(&lt;a href="https://market.android.com/details?id=berserker.android.apps.sshdroid&amp;amp;hl=en"&gt;SSHDroid&lt;/a&gt;)&amp;mdash;so I&amp;rsquo;ll just describe the power supply management here.&lt;/p&gt;

&lt;p&gt;First, I needed a driver for the USB/serial adapter. Most Linux boxes probably
have this installed; on Mac OS X, &lt;a href="http://osx-pl2303.sourceforge.net/"&gt;the osx-pl2303 project&lt;/a&gt; covers most
devices.&lt;/p&gt;

&lt;p&gt;I used a somewhat hard-to-find &lt;a href="http://kb.bkprecision.com/getattachment.php?data=MjB8UmVtb3RlX2NvbW11bmljYXRpb25fMTY5Nl8xNjk4LnBkZg%3D%3D"&gt;description of the BK Precision 1696&amp;rsquo;s serial
protocol&lt;/a&gt; to build a &lt;a href="https://github.com/sampsyo/bkp1696"&gt;Python library for communicating with the power
supply&lt;/a&gt;. The library takes care of the command encoding, conversion
between ordinary floating-point numbers and the protocol&amp;rsquo;s arcane fixed-point
formats, and other mundanities. To get started, &lt;a href="https://github.com/sampsyo/bkp1696"&gt;download the library&lt;/a&gt;
and start coding:&lt;/p&gt;

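&lt;pre&gt;&lt;code&gt;import psup

# Open a connection to the supply; it is cleaned up when the block exits.
with psup.Supply() as sup:
    sup.voltage(1.3)  # set the output voltage
    volts, amps = sup.reading()  # sample the present voltage and current
    print('%f V, %f A' % (volts, amps))
&lt;/code&gt;&lt;/pre&gt;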

&lt;p&gt;The &lt;code&gt;Supply&lt;/code&gt; class represents a connection to the power supply. It&amp;rsquo;s also a
&lt;a href="http://docs.python.org/library/stdtypes.html#typecontextmanager"&gt;context manager&lt;/a&gt;, so if you wrap your code in a Python &lt;code&gt;with&lt;/code&gt; statement,
the connection will be automatically opened and then cleaned up when you&amp;rsquo;re
done. Once you have a connection, you can call &lt;code&gt;voltage&lt;/code&gt; to set the voltage,
&lt;code&gt;reading&lt;/code&gt; to sample the present voltage and current, &lt;code&gt;maxima&lt;/code&gt; to get the
acceptable parameter ranges, et cetera. I haven&amp;rsquo;t implemented the entire serial
protocol, but I found these commands to be enough to conduct reasonable
power-measurement experiments. If you want to add any functionality, please feel
free to fork the project on GitHub or &lt;a href="&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#097;&amp;#115;&amp;#097;&amp;#109;&amp;#112;&amp;#115;&amp;#111;&amp;#110;&amp;#064;&amp;#099;&amp;#115;&amp;#046;&amp;#119;&amp;#097;&amp;#115;&amp;#104;&amp;#105;&amp;#110;&amp;#103;&amp;#116;&amp;#111;&amp;#110;&amp;#046;&amp;#101;&amp;#100;&amp;#117;"&gt;&amp;#115;&amp;#101;&amp;#110;&amp;#100;&amp;#032;&amp;#109;&amp;#101;&amp;#032;&amp;#112;&amp;#097;&amp;#116;&amp;#099;&amp;#104;&amp;#101;&amp;#115;&lt;/a&gt;.&lt;/p&gt;

</content>
    <summary type="html">For a recent research project, I measured the power consumption of a
smartphone. I am clueless when it comes to electronics and I didn't want to
drop a lot of (my advisor's) cash, so I needed a simple, relatively cheap
setup to get reasonable power measurements. This post describes how you can
get a similar apparatus up and running with a [custom Python
library][pylib] I wrote for controlling a DC power supply.

[pylib]: https://github.com/sampsyo/bkp1696
</summary>
  </entry>
</feed>

