<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Future Chips</title>
	
	<link>http://www.futurechips.org</link>
	<description />
	<lastBuildDate>Wed, 19 Jun 2013 06:25:20 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/FutureChips" /><feedburner:info uri="futurechips" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Meet Flux7 Labs (update + shameless marketing)</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/yiufNLZu2ks/quick-post-meet-flux7-labs-blatant-marketing.html</link>
		<comments>http://www.futurechips.org/thoughts-on-latest-happenings/quick-post-meet-flux7-labs-blatant-marketing.html#comments</comments>
		<pubDate>Tue, 18 Jun 2013 16:56:13 +0000</pubDate>
		<dc:creator>Aater Suleman</dc:creator>
				<category><![CDATA[Thoughts on Latest Happenings]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[big data training]]></category>
		<category><![CDATA[cassandra]]></category>
		<category><![CDATA[cassandra training]]></category>
		<category><![CDATA[consulting]]></category>
		<category><![CDATA[couchdb]]></category>
		<category><![CDATA[flux7 labs]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hadoop training]]></category>
		<category><![CDATA[mongodb training]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1372</guid>
		<description><![CDATA[I made a major career decision two months ago to leave my day job and start Flux7 Labs, a consulting and training firm. I made this decision because Flux7 Labs allows me to follow my passion for teaching (providing Big Data/NoSQL trainings) and solving challenges in the latest technologies like Hadoop, Cassandra, Twitter Storm, etc. Please <a href='http://www.futurechips.org/thoughts-on-latest-happenings/quick-post-meet-flux7-labs-blatant-marketing.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>I made a major career decision two months ago to leave my day job and start Flux7 Labs, a consulting and training firm. I made this decision because Flux7 Labs allows me to follow my passion for teaching (providing Big Data/NoSQL trainings) and solving challenges in the latest technologies like Hadoop, Cassandra, Twitter Storm, etc. Please visit the <a href="http://www.linkedin.com/company/flux7-labs">LinkedIn profile</a> to learn more.</p>
<p>One of the best parts is that I will have more time and incentive to write blog posts. For those of you who read my post on Linked Lists, I am finally doing a performance benchmark to generate data that proves my points. For those interested in Big Data, I am writing a qualitative and quantitative comparison of the different Hadoop distributions out there. Exciting times ahead.</p>
<p><span id="more-1372"></span></p>
<p><span style="font-size: 1.5em;">About Flux7 Labs</span></p>
<p>I have set up Flux7 Labs like a research lab in my home office (a.k.a. the garage), where my team and I partner with selected clients to work on new and innovative technologies. Our website (<a href="http://www.flux7.com">http://www.flux7.com</a>) is a work in progress, but you can visit our <a href="http://www.linkedin.com/company/flux7-labs">LinkedIn profile</a> to learn more (followers and likes will be highly appreciated). Our work focuses on measuring and improving the performance of Big Data and Cloud Computing systems. We also teach the same topics as on-site or online trainings, either corporate or independent.</p>
<p>In the lab, we are working on some very exciting projects on measuring, simulating, and increasing the performance of Hadoop, Cassandra, CouchDB, etc. using hardware+software techniques in the context of bioinformatics, renewable energy, banking, and video analytics applications. Our clients/partners include Xockets Inc., Bioaxial Inc., Horan &amp; Bird, Bank of America, Cloudant Inc, and Automaton Inc.</p>
<div><em>Some philosophy: I have learned a lot about this field in the last two years that I want to share, so expect more posts from me on Big Data performance. I see people getting many things wrong in their Big Data deployments from an architecture standpoint. I believe this is because the field is new and not everyone thinks about scaling and performance upfront. What is intriguing is that many of the problems seen in the Big Data space, and their solutions, are the same as what we see in CPU architecture, just at a different scale. For example, the challenges I faced at Intel while designing a coherence network for Intel Xeon Phi, or during my thesis on the ACMP, apply directly in the NoSQL domain without changes. </em></div>
<p>Broadly, at Flux7 Labs, we accept projects related to:</p>
<div>
<ul>
<li>Big Data (Hadoop, Twitter Storm, and related technologies like HDFS, HBase, Pig, Hive, etc.)</li>
<li>Databases (CouchDB, MongoDB, Cassandra, MySQL)</li>
<li>Cloud Computing (AWS, OpenStack)</li>
<li>Video processing systems</li>
<li>Anything else that's challenging, innovative, and needs performance</li>
</ul>
<p>We can handle optimization across all seven performance determinants: app software, OS, compiler, disk, network, CPU, and memory. Having worked on low-power throughput computing hardware for the last seven years, I have a specific knack for Xeon Phi and Microservers (Intel Atom and ARM-based servers from Calxeda and other vendors), so any consulting or projects in that area are particularly interesting to me.</p>
</div>
<p>Any technical or sales/marketing feedback, tips, guidance, introductions, and references will be much appreciated.</p>
<p>Thank you for your time and sorry for putting you through the shameless marketing.</p>
<p><em>Side note: Shoot me an email if you are in a situation where you have to decide between your job and venturing into a new career. I made a comparison spreadsheet that you may like. </em></p>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/yiufNLZu2ks" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/thoughts-on-latest-happenings/quick-post-meet-flux7-labs-blatant-marketing.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/thoughts-on-latest-happenings/quick-post-meet-flux7-labs-blatant-marketing.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=quick-post-meet-flux7-labs-blatant-marketing</feedburner:origLink></item>
		<item>
		<title>ARM Virtualization – ARM vs x86 (Part 5)</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/GKwcZPWxsFs/arm-virtualization-arm-x86-part-5.html</link>
		<comments>http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-arm-x86-part-5.html#comments</comments>
		<pubDate>Tue, 30 Apr 2013 15:01:30 +0000</pubDate>
		<dc:creator>Ali Hussain</dc:creator>
				<category><![CDATA[Software for Hardware guys]]></category>
		<category><![CDATA[Thoughts on Latest Happenings]]></category>
		<category><![CDATA[Tips for Power Coders]]></category>
		<category><![CDATA[arm]]></category>
		<category><![CDATA[CISC]]></category>
		<category><![CDATA[hypervisor]]></category>
		<category><![CDATA[RISC]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1344</guid>
		<description><![CDATA[Sorry for the delay in this post. I could not get to this post in time and wanted to be sure it is well-researched. The final post in this series is a comparison of the hardware support in the ARM and x86 world. As mentioned in the previous post the biggest reason for ARM to <a href='http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-arm-x86-part-5.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>Sorry for the delay in this post. I could not get to it in time and wanted to be sure it was well researched. The final post in this series is a comparison of the hardware support in the ARM and x86 worlds. As mentioned in the previous post, the biggest reason for ARM to include virtualization in its architecture is to be viable in the server market against x86. So I think a comparison of x86 and ARM hardware support for virtualization is warranted.</p>
<p><span id="more-1344"></span></p>
<p>The extensions added to both ISAs have a similar scope, which isn&#8217;t surprising since they have the same goals. These goals, first formalized by <a href="http://en.wikipedia.org/wiki/Popek_and_Goldberg_virtualization_requirements">Popek and Goldberg</a> in 1974, are:</p>
<ol>
<li>A program running under the VMM should exhibit a behavior essentially identical to that demonstrated when running on an equivalent machine directly.</li>
<li>The VMM must be in complete control of the virtualized resources.</li>
<li>A statistically dominant fraction of machine instructions must be executed without VMM intervention.</li>
</ol>
<div>Still, the implementations differ, and the tradeoffs each one chose make an interesting case study.</div>
<p><strong>The Players</strong><br />
The ARM architecture is owned by ARM. ARM designs its own processors and licenses its architecture to some other chip manufacturers. In the x86 world both Intel and AMD have their own offerings. Intel’s offerings fall under the umbrella marketing term Virtualization Technology, or VT. VT is composed of VT-x, which encompasses the core-side features; VT-d, which encompasses the IOMMU; and VT-c, which covers the network interface. AMD markets its core-side virtualization offerings under the label AMD-V, while the IOMMU is marketed under AMD-Vi. The Intel and AMD offerings differ in implementation details but are architecturally closer to each other than either is to ARM’s offering.</p>
<p><strong>Design Philosophies</strong><br />
The support for virtualization in both architectures is similar but the implementation details are colored by the design philosophies of each architecture. So before we start comparing the specifics of each architecture we need to talk about the general philosophies guiding the design of each.</p>
<p><em>RISC vs CISC</em><br />
When we compare x86 virtualization with ARM virtualization, their respective pedigrees as CISC and RISC machines become visible. In CISC (Complex Instruction Set Computer), instructions are more powerful and try to bridge the semantic gap between an assembly programmer and the machine. In RISC (Reduced Instruction Set Computer), instructions are simpler, giving more flexibility to the software. Each philosophy makes tradeoffs in implementation difficulty, code density, and compiler support.</p>
<p><em>Backwards Compatibility</em><br />
The extensions in both ARM and x86 mostly offer backwards compatibility to existing software. Intel has historically gone the extra mile in providing backward compatibility. It is this effort that allows experiments like the one in the video below:</p>
<p><iframe src="http://www.youtube.com/embed/vPnehDhGa14" frameborder="0" width="420" height="315"></iframe></p>
<p>A commitment to backwards compatibility makes software porting significantly easier. But over time it results in cruft in the architecture that makes implementation more difficult. It also prevents a major paradigm shift. While ARM provides backwards compatibility, it is generally more willing to make major architectural changes.</p>
<p><em>Ecosystem</em><br />
Many of the differences between the two architectures stem from the different business models of ARM and Intel. ARM is an IP company that does no manufacturing. It designs an architecture, processor cores, and system IP blocks. Customers may choose to buy all or part of its offerings and customize their SoC according to their needs.</p>
<p>Intel, on the other hand, manufactures its own cores and its own chipsets, and provides reference designs to OEMs. There is very little customization, and Intel is the owner of all the blocks in the system except for external devices that communicate through a standard interface. AMD for the most part operates in the same manner, with a top-down design philosophy.</p>
<p><strong>Feature Implementation Comparison</strong><br />
At a high level the features supported in each architecture are very similar. All three support two-stage translation. Intel calls this Extended Page Tables (EPT); AMD calls it Rapid Virtualization Indexing (RVI). They all support interception of exceptions and interrupts by the hypervisor and triggering of interrupts in the guest. They maintain host state separately from VM state.</p>
<p><em>Privilege Levels</em><br />
In x86 the expectation is for the processor to run Windows or any other OS right out of the box without any major changes. The hypervisor needs to be a trusted program that can be installed on the operating system. So the hypervisor works as a part of privileged mode using special virtualization instructions, including instructions for enabling and disabling the hypervisor. Once the hypervisor is enabled a privilege separation is created for host versus VM level.</p>
<p>This is very different from how the privileges are organized in the ARM world. Developers don’t have the same expectations of ARM. Until very recently most commercial ARM devices did not allow the user to purchase and use software of their choice. This allowed the device maker to make major changes to the software. So when implementing virtualization, ARM does not have the same restrictions that are present in the x86 world. Because of this, ARM&#8217;s approach to privilege levels is different from x86. In ARM the hypervisor is a new mode at a higher privilege level than the regular kernel. Using a hypervisor requires changes to the boot code, and the design is incompatible with running the hypervisor as an application on an unchanged OS.</p>
<p><em>Virtual Machine Entry</em><br />
The VT-x extensions define an in-memory area called the VMCS (Virtual Machine Control Structure) for each VM. This area is set up with the state of all the registers, and a VM launch instruction (VMLAUNCH) loads this state onto the processor. AMD has a similar instruction with a similar data structure.</p>
<p>ARM, on the other hand, does not make these software simplifications and expects the hypervisor to manually modify any registers it needs to modify. There isn’t even a special instruction to launch a VM. In the ARM architecture, to launch a VM the hypervisor sets the Exception Link Register (ELR) to the desired PC (the program counter, called the instruction pointer in x86) and performs an exception return.</p>
<p>This is in line with ARM being a RISC architecture and x86 being a CISC architecture. The same split shows up in process handling: x86 defines a hardware structure for holding process details, analogous to the one used to launch a VM, while in ARM that state is maintained entirely by the OS.</p>
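<p>The contrast between the two launch styles can be sketched with a toy model. This is only an illustration under loud assumptions: <code>ToyCpu</code>, <code>x86_style_launch</code>, <code>arm_style_launch</code>, and the register names are all hypothetical, and nothing here models the real VMCS layout or real ARM system registers.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ToyCpu:
    """A toy register file; not a model of any real core."""
    pc: int = 0
    regs: dict = field(default_factory=dict)
    elr: int = 0           # toy exception link register
    in_guest: bool = False


def x86_style_launch(cpu, control_area):
    """CISC flavor: one launch step copies the whole saved state onto the CPU."""
    cpu.regs = dict(control_area["regs"])
    cpu.pc = control_area["pc"]
    cpu.in_guest = True


def exception_return(cpu):
    """Toy exception return: resume execution at the address held in ELR."""
    cpu.pc = cpu.elr
    cpu.in_guest = True


def arm_style_launch(cpu, guest_regs, guest_pc):
    """RISC flavor: software writes each register, then returns into the guest."""
    for name, value in guest_regs.items():
        cpu.regs[name] = value   # explicit writes done by the hypervisor
    cpu.elr = guest_pc           # where the exception return will land
    exception_return(cpu)
```

<p>Both functions leave the toy CPU in the same state. The difference is who does the work: a single hardware-defined step in the first case, ordinary hypervisor code in the second.</p>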
<p><em>Exception Intercepts</em><br />
ARM allows most exceptions, including interrupts, to be redirected to the hypervisor. There is also a special mode that allows running an application directly under the hypervisor; this mode allows intercepting undefined instruction and supervisor call exceptions, but enabling it disables the kernel level. However, the ARM architecture has no mechanism for redirecting page faults from the kernel to the hypervisor.</p>
<p>In x86 a page fault can be redirected to the hypervisor. This allows the hypervisor to page out the exception vectors for the guest and helps in providing shadow page tables. This feature was necessary in x86 because hardware support for virtualization, when first introduced, did not include a two-stage table walk. The ARM architecture requires two-stage table walks to be implemented whenever virtualization is implemented. Since a two-stage table walk usually performs better and is easier to implement than shadow page tables, the lack of page fault redirection isn’t a problem for the ARM architecture.</p>
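<p>To make the difference between the two approaches concrete, here is a minimal sketch that models single-level page tables as Python dictionaries. The page size, the table contents, and the function names are assumptions made for illustration; real page tables are multi-level and carry permission bits.</p>

```python
PAGE = 4096  # assumed page size for this toy model


def translate(table, addr):
    """Translate an address through one single-level table (page -> page)."""
    page, offset = divmod(addr, PAGE)
    if page not in table:
        raise LookupError(f"fault at {addr:#x}")
    return table[page] * PAGE + offset


def two_stage(guest_table, hyp_table, guest_va):
    """Two-stage walk: the guest owns stage 1, the hypervisor owns stage 2."""
    intermediate = translate(guest_table, guest_va)  # guest VA to guest "physical"
    return translate(hyp_table, intermediate)        # guest "physical" to real PA


def build_shadow(guest_table, hyp_table):
    """Shadow table: the hypervisor precomputes the combined mapping in software."""
    return {
        va_page: hyp_table[mid_page]
        for va_page, mid_page in guest_table.items()
        if mid_page in hyp_table
    }


guest_table = {0: 5, 1: 6}    # maintained by the guest OS
hyp_table = {5: 40, 6: 41}    # maintained by the hypervisor
shadow = build_shadow(guest_table, hyp_table)
```

<p>In this sketch <code>two_stage(guest_table, hyp_table, 0x1234)</code> and <code>translate(shadow, 0x1234)</code> resolve to the same physical address. The catch with the shadow table is that it has to be rebuilt whenever the guest edits <code>guest_table</code>, which is why a hypervisor using shadow page tables needs to intercept guest page faults.</p>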
<p><em>IOMMU and Devices</em><br />
Both ARM and x86 chipsets have support for the features we talked about in <a title="ARM Virtualization – I/O Virtualization (Part 3)" href="http://www.futurechips.org/chip-design-for-all/arm-virtualization-part-3-iommu.html">Post 3</a>. One thing to note is that an IOMMU does not need core support, so in the ARM world customers can choose their own IOMMU implementation rather than go with ARM&#8217;s offering. In addition to the IOMMUs already mentioned, there are other important implementations. There is a <a href="http://www.pcisig.com/specifications/iov/">standard defined</a> for an IOMMU for PCI-Express devices. In addition, graphics cards have supported the <a href="http://en.wikipedia.org/wiki/Graphics_address_remapping_table">GART</a> IOMMU since AGP.</p>
<p>Different devices may offer different amounts of support for virtualization. That may include sharing a device between multiple virtual machines. For example, Intel&#8217;s VT-c allows the Ethernet controller to be shared between multiple virtual machines.</p>
<p>In conclusion, both ARM and x86 have similar hardware extensions for virtualization, which is expected since both architectures are competing for the same market. The implementations differ in several ways, as described above, and the differences reflect the pedigree of each architecture. ARM&#8217;s support for virtualization points to an interesting future.</p>
<p><em>For further reading please check out the links below:</em></p>
<p><em><a href="http://support.amd.com/us/Processor_TechDocs/24593_APM_v2.pdf">http://support.amd.com/us/Processor_TechDocs/24593_APM_v2.pdf</a></em><br />
<em> <a href="http://www.intel.com/content/www/us/en/virtualization/virtualization-technology-connectivity-technology-brief.html">http://www.intel.com/content/www/us/en/virtualization/virtualization-technology-connectivity-technology-brief.html</a></em><br />
<em> <a href="http://download.intel.com/products/processor/manual/325462.pdf">http://download.intel.com/products/processor/manual/325462.pdf</a></em></p>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/GKwcZPWxsFs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-arm-x86-part-5.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-arm-x86-part-5.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=arm-virtualization-arm-x86-part-5</feedburner:origLink></item>
		<item>
		<title>ARM Virtualization – Applications (Part 4)</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/ubL7weTq1Yw/arm-virtualization-part-4-applications.html</link>
		<comments>http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-part-4-applications.html#comments</comments>
		<pubDate>Mon, 08 Apr 2013 14:51:26 +0000</pubDate>
		<dc:creator>Ali Hussain</dc:creator>
				<category><![CDATA[Software for Hardware guys]]></category>
		<category><![CDATA[Thoughts on Latest Happenings]]></category>
		<category><![CDATA[big.LITTLE]]></category>
		<category><![CDATA[cortex a15]]></category>
		<category><![CDATA[cortex a7]]></category>
		<category><![CDATA[Exynos 5]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1324</guid>
		<description><![CDATA[In the last few posts we discussed the hardware support needed to provide virtualization. In this post we look at how virtualization can empower the user. We&#8217;ll discuss the use cases we already see in the server and desktop space, and mobile specific applications like big.LITTLE and lowering production costs for handsets. The first post in this series <a href='http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-part-4-applications.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p dir="ltr">In the last few posts we discussed the hardware support needed to provide virtualization. In this post we look at how virtualization can empower the user. We&#8217;ll discuss the use cases we already see in the server and desktop space, and mobile specific applications like big.LITTLE and lowering production costs for handsets.</p>
<p><strong><strong><span id="more-1324"></span></strong></strong></p>
<p dir="ltr">The <a title="ARM Virtualization Extensions — Introduction (Part 1)" href="http://www.futurechips.org/understanding-chips/arm-virtualization-extensions-introduction-part-1.html">first post</a> in this series had an overview of virtualization. The <a title="ARM Virtualization Extensions – Memory and Interrupts (Part 2)" href="http://www.futurechips.org/understanding-chips/arm-virtualization-part-2-memory-interrupts.html">second post</a> went deeper into the features added to support virtualization in the core. The <a title="ARM Virtualization – I/O Virtualization (Part 3)" href="http://www.futurechips.org/chip-design-for-all/arm-virtualization-part-3-iommu.html">third post</a> talked about support needed at the system level for virtualization. This post will focus on use cases of virtualization.</p>
<p dir="ltr"><strong>Server Space Applications</strong></p>
<p dir="ltr">The biggest impact of virtualization has been in redefining the modern datacenter. Virtualization provides fault tolerance, VM migration, and sandboxing. Without virtualization we wouldn’t have services like EC2 that power many of the world’s biggest websites. With ARM and Intel rattling sabers for the last few years and Intel's release of the Atom processor line, it was only a matter of time before ARM had to make a play for the server space, and to do this ARM needed a competing offering in virtualization. This is the biggest reason for providing hardware support for virtualization.</p>
<p dir="ltr"><strong>Using Cross Platform Applications</strong></p>
<p dir="ltr">Many of us already use virtualization for this purpose. You can run a different OS on top of the current OS, or just run select services to support the current application. There can be extensions to this idea. A thin OS can be packed with an application to provide compatibility while running on a variety of platforms, similar to running a Java application but not requiring binary translation.</p>
<p dir="ltr"><strong>big.LITTLE</strong></p>
<p dir="ltr">You may have heard of Samsung’s Exynos 5 Octa SoC. The highlight of this SoC is that it provides a cluster of four high-performance, high-power Cortex-A15 cores and a cluster of four low-power, low-performance Cortex-A7 cores. Most of the time the performance requirements of a cellphone can be met by the low-power Cortex-A7 cores, and during those periods the high-power Cortex-A15 cores are powered down. When more performance is needed, the running software migrates to the Cortex-A15 cores. This migration can be done without virtualization, but with virtualization it falls naturally into VM migration and is significantly easier to implement in software.</p>
<div class="wp-caption aligncenter" style="width: 835px"><img title="Exynos 5 Octa Block Diagram" src="http://images.anandtech.com/galleries/2637/Screen%20Shot%202013-02-20%20at%2012.42.46%20PM_575px.png" alt="Courtesy of Anandtech" width="825" height="600" /><p class="wp-caption-text">Exynos 5 Octa SoC block diagram</p></div>
<p dir="ltr">The figure below, courtesy of <a href="http://www.anandtech.com/show/6768/samsung-details-exynos-5-octa-architecture-power-at-isscc-13">Anandtech</a>, gives an idea of the potential power savings from a big.LITTLE configuration. Although the slide gives an idea of the power and performance tradeoffs, a few points need to be made about the methodology. The first is that the tradeoff we&#8217;re measuring is not power vs performance; it is energy vs performance. The energy consumed to complete a task is the average power divided by the performance for that task (that is, power multiplied by the time the task takes), provided the core can enter a low-power mode once the task is complete.</p>
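<p dir="ltr">As a quick worked example of the energy relation, the sketch below computes energy per task for two hypothetical cores. The power and performance figures are made up for illustration and are not read off the slide; <code>energy_joules</code> is a name introduced here.</p>

```python
def energy_joules(avg_power_watts, perf_tasks_per_sec, tasks=1.0):
    """Energy = power * time, where time = tasks / performance."""
    return avg_power_watts * tasks / perf_tasks_per_sec


# Made-up numbers: the "big" core is 4x faster but draws 8x the power.
little = energy_joules(avg_power_watts=0.25, perf_tasks_per_sec=1.0)
big = energy_joules(avg_power_watts=2.0, perf_tasks_per_sec=4.0)

print(f"little: {little} J per task, big: {big} J per task")
```

<p dir="ltr">With these assumed figures the big core finishes the task four times sooner but spends twice the energy doing so, which is the kind of tradeoff a big.LITTLE system tries to exploit.</p>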
<p dir="ltr">The second thing that caught my eye in that graph was the use of Dhrystone as a benchmark, as indicated by using the unit DMIPS on the y-axis. Dhrystone is an old synthetic benchmark that no longer reflects real world usage. For example the entire workload fits in the L1 cache so it does not see any performance benefits from the larger L2 cache in the Cortex-A15, yet it also doesn&#8217;t reflect the increased power consumption from the L2 cache. There is no right or wrong answer to what is the correct benchmark to use. It always depends on the particular use case, but using SPECint would have been a more realistic indicator of system performance and power consumption.</p>
<p dir="ltr"><img class="alignnone" title="Exynos 5 power vs performance graph" src="http://images.anandtech.com/doci/6768/Screen%20Shot%202013-02-20%20at%2012.42.41%20PM.png" alt="" width="1076" height="772" /></p>
<p dir="ltr"><strong>Separating OS Kernel From Device Drivers</strong></p>
<p dir="ltr">Virtualization can be used to separate the operating system from the underlying hardware. This separation can be done using one of the methods mentioned in the <a title="ARM Virtualization Extensions – Memory and Interrupts (Part 2)" href="http://www.futurechips.org/understanding-chips/arm-virtualization-part-2-memory-interrupts.html">last post</a>. This separation has several practical applications. The first is reducing development time and costs by making it easier to port an OS to a newer device. But the more important advantage is in rolling out OS updates. This is a commonly seen problem with Android. After an Android update is released, each device manufacturer has to port the changes to its own system. This significantly slows down the delivery of updates. Separating the OS from the underlying hardware can speed up the process of rolling out updates.</p>
<p dir="ltr"><strong>Separating Business From Pleasure</strong></p>
<p dir="ltr">Our cellphones have become the biggest extension of our identity. They hold our contacts, our photos, our daily schedules, our emails, and our text communications. This mixing of personal and business information can cause problems. Right now we give IT administrators extensive control over our personal information. If IT believes my phone is stolen they have the authority to issue a remote wipe of it. On the flip side, IT is still not able to protect corporate data appropriately. For example, they cannot stop me from installing a malware app that compromises the data stored on my phone. This problem can be solved by creating separate VMs for work and personal usage on phones, so that compromising the personal VM does not affect the corporate data. This point is discussed in this <a href="http://blogs.arm.com/smart-connected-devices/615-importance-of-hw-virtualization-in-arm-cortex-a7-biglittle-processing/">blog post</a> by ARM (video below):</p>
<p><iframe src="http://www.youtube.com/embed/Tx4volfYWT0" frameborder="0" width="560" height="315"></iframe></p>
<p dir="ltr"><strong>Allow Different OSes On Same Device</strong></p>
<p dir="ltr">Going back to reducing R&amp;D costs, manufacturers can also create a single handset that runs different OSes, say Android or Windows Phone 8. Devices can be sold preloaded with installers for both OSes, and on first startup the user can choose which OS they want to use. This would reduce development and deployment costs at the price of a slight waste of Flash memory. But the idea can be extended further. A try-before-you-buy policy can be implemented, allowing a user to try a different OS than the one they&#8217;re used to; if they don&#8217;t enjoy the experience they can revert to the system they are familiar with.</p>
<p dir="ltr"><strong>Security Features</strong></p>
<p dir="ltr">The hypervisor provides a layer of privilege higher than the OS. The hypervisor can exclude the OS from accessing regions of memory. So the hypervisor can be used to store high-security data like keys for financial accounts, hashes for checking application integrity, and, yes, DRM. While the geek in me protests at walling off parts of the system I paid for, the pragmatist in me accepts that these features improve the user experience of someone like my grandmother and enable features that wouldn&#8217;t be implemented otherwise.</p>
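<p dir="ltr">The idea of memory the OS cannot reach can be sketched with a toy model. Everything here is hypothetical (the <code>ToyHypervisor</code> class, its methods, and the page numbers); a real hypervisor would enforce this by leaving the pages out of the guest's second-stage translation rather than with a Python check.</p>

```python
class ToyHypervisor:
    """Toy model: pages listed as hidden are never readable by the guest."""

    def __init__(self, memory, hidden_pages):
        self._memory = memory              # page number mapped to contents
        self._hidden = set(hidden_pages)   # pages withheld from the guest

    def guest_read(self, page):
        """A guest access; hidden pages fault instead of returning data."""
        if page in self._hidden:
            raise PermissionError(f"page {page} is not mapped for the guest")
        return self._memory[page]

    def verify(self, page, candidate):
        """Hypervisor-side comparison; the stored value never leaves."""
        return self._memory[page] == candidate


hv = ToyHypervisor({0: b"app data", 7: b"secret key"}, hidden_pages=[7])
```

<p dir="ltr">In this sketch the guest can ask the hypervisor to check a value against the protected page through <code>verify</code>, but a direct <code>guest_read(7)</code> raises an error, which mirrors how keys or integrity hashes could be used without being exposed to the OS.</p>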
<div>That concludes this post about the potential applications of virtualization. If you have any other ideas for applications of virtualization, I&#8217;d like to hear about them in the comments. The final <a title="ARM Virtualization – ARM vs x86 (Part 5)" href="http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-arm-x86-part-5.html">post</a> in this series compares ARM and x86.</div>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/ubL7weTq1Yw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-part-4-applications.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-part-4-applications.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=arm-virtualization-part-4-applications</feedburner:origLink></item>
		<item>
		<title>ARM Virtualization – I/O Virtualization (Part 3)</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/JOX51hGiENw/arm-virtualization-part-3-iommu.html</link>
		<comments>http://www.futurechips.org/chip-design-for-all/arm-virtualization-part-3-iommu.html#comments</comments>
		<pubDate>Mon, 01 Apr 2013 15:25:25 +0000</pubDate>
		<dc:creator>Ali Hussain</dc:creator>
				<category><![CDATA[Chip Design for All]]></category>
		<category><![CDATA[Tips for Power Coders]]></category>
		<category><![CDATA[Understanding Chips]]></category>
		<category><![CDATA[AMD-Vi]]></category>
		<category><![CDATA[arm]]></category>
		<category><![CDATA[cortex a15]]></category>
		<category><![CDATA[Cortex A57]]></category>
		<category><![CDATA[drivers]]></category>
		<category><![CDATA[emulation]]></category>
		<category><![CDATA[Intel VT-d]]></category>
		<category><![CDATA[IOMMU]]></category>
		<category><![CDATA[paravirtualization]]></category>
		<category><![CDATA[System MMU]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1296</guid>
		<description><![CDATA[In the second part of the series we introduced memory management and interrupt handling support provided by virtualization hardware extensions. But effective virtualization solutions need to reach beyond the core to communicate with peripheral devices. In this post we discuss the various techniques used for virtualizing I/O, the problems faced, and the hardware solutions to <a href='http://www.futurechips.org/chip-design-for-all/arm-virtualization-part-3-iommu.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p dir="ltr">In the <a title="ARM Virtualization Extensions – Memory and Interrupts (Part 2)" href="http://www.futurechips.org/understanding-chips/arm-virtualization-part-2-memory-interrupts.html">second part</a> of the series we introduced memory management and interrupt handling support provided by virtualization hardware extensions. But effective virtualization solutions need to reach beyond the core to communicate with peripheral devices. In this post we discuss the various techniques used for virtualizing I/O, the problems faced, and the hardware solutions to mitigate these problems.</p>
<p><strong><strong><span id="more-1296"></span></strong></strong></p>
<p dir="ltr"><strong>The Difficulty Of Virtualizing I/O</strong></p>
<p dir="ltr">Before we talk about the system-level hardware solutions for virtualization, we need to motivate what is driving these features. To appreciate the problems we have to recognize that in some ways communicating with I/O in a virtualized environment is a paradox. We want to run an operating system in a sandboxed environment where it is oblivious to the system outside the virtual environment. But I/O cannot be oblivious to the outside environment because it is communicating with that environment. So, understandably, virtualizing I/O becomes a difficult problem.</p>
<p dir="ltr">Moving away from the philosophical questions, what is the goal of virtualization and how does I/O fit into that goal? In my view it is to provide a managed environment for hosting a VM that improves the overall user experience. To achieve this goal, ideally we’d like I/O in a VM to have the following properties:</p>
<ol>
<li>The guest has access to the same I/O devices it would use in a native environment.</li>
<li>The guest OS cannot affect the I/O operations or memory of other guests.</li>
<li>The software changes to the guest OS must be minimal.</li>
<li>The guest OS needs to be able to recover from a failure of the hardware or migration of the VM.</li>
<li>The I/O operations on the guest OS should have similar performance to running natively.</li>
</ol>
<p dir="ltr">Several items on this list compete with one another, so the final solution will require trade-offs based on the particular use-case. With these goals in mind, let us look at the various techniques for implementing I/O virtualization and the problems faced.</p>
<p dir="ltr"><strong>Emulated Or Paravirtualized Devices</strong></p>
<p dir="ltr">When implementing full virtualization, one of the simplest options is for the hypervisor to emulate a virtual device for the guest OS. The guest communicates with this virtual device and the hypervisor detects the guest’s communication. This can be done by trapping device accesses or by restricting permissions on certain pages of memory. The hypervisor understands the operations by the guest OS on the virtual device and performs the corresponding operation on the physical device. This technique is called hosted or split I/O.</p>
<p style="text-align: center;"><strong><strong><img class="aligncenter" src="https://lh4.googleusercontent.com/jeUpythafFs2QMn5dUlBJcdhpb-ta0UFdyRE8Ukls2l1hc2PeSERqr_maFt3fSsQssSb37d-HELdLSluF0akwJgq5cEQI3Z_ZXtitDwfpK0zuokeNitwTlhQ" alt="" width="388px;" height="497px;" /><br />
</strong></strong></p>
<p dir="ltr">The advantage of this technique is that since every call goes through the hypervisor, the hypervisor can provide the desired functionality. For example the hypervisor can track every I/O operation the device is presently waiting on. Similarly restricting a guest from affecting other guests becomes simplified because all physical device accesses are managed by the hypervisor. But this technique has a high CPU overhead. The data needs to be copied multiple times, processed through multiple I/O stacks, etc.</p>
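<p dir="ltr">The trap-and-emulate flow described above can be sketched in a few lines of Python. This is a toy model rather than a real hypervisor: the class names, the register offsets, and the single read command are all hypothetical and exist only to show that every guest access to the virtual device costs one trap.</p>

```python
# Toy model of hosted (emulated) I/O. Every guest register access "traps"
# into the hypervisor, which updates the virtual device and drives the
# physical one. All names and register offsets are hypothetical.

REG_SECTOR = 0x0   # guest writes the sector number here
REG_COMMAND = 0x4  # guest writes a command here
CMD_READ = 1


class PhysicalDisk:
    def __init__(self, sectors):
        self.sectors = sectors

    def read(self, sector):
        return self.sectors[sector]


class Hypervisor:
    def __init__(self, disk):
        self.disk = disk
        self.virtual_regs = {}   # per-guest virtual device state
        self.trap_count = 0      # every access costs one trap
        self.pending = []        # I/O the hypervisor is currently tracking

    def trap_mmio_write(self, guest, offset, value):
        self.trap_count += 1
        regs = self.virtual_regs.setdefault(guest, {})
        regs[offset] = value
        if offset == REG_COMMAND and value == CMD_READ:
            sector = regs[REG_SECTOR]
            self.pending.append((guest, sector))
            data = self.disk.read(sector)      # the only real device access
            self.pending.remove((guest, sector))
            return data
        return None


hv = Hypervisor(PhysicalDisk({7: b"hello"}))
hv.trap_mmio_write("guest0", REG_SECTOR, 7)
result = hv.trap_mmio_write("guest0", REG_COMMAND, CMD_READ)
print(result, hv.trap_count)
```

<p dir="ltr">Even this one-sector read takes two traps, which hints at where the CPU overhead of emulation comes from.</p>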
<p dir="ltr">The performance can be improved by using paravirtualization. In this case the device drivers in the OS implement an ABI with the hypervisor. The device drivers interface with the hypervisor and the hypervisor directly communicates with the physical device as is shown in the figure below.</p>
<p style="text-align: center;"><strong><strong><br />
<img class="aligncenter" src="https://lh3.googleusercontent.com/lclIUi7RHzpTxniuvimIRc8ME4a8nn_HgbZzj27LO2hRn_fe0moAxEt-v8GbI4CopPrXePaddpCT66En6k1T9mqkekiKFYf-cikPyA6vW7FC_45T0mpT2RZQ" alt="" width="217px;" height="484px;" /><br />
</strong></strong></p>
<p style="text-align: left;" dir="ltr">This technique provides better performance with similar control but there is still a significant performance overhead, for example, in trapping to the hypervisor. The figure below shows the difference observed by IBM in using an emulated IDE controller vs IBM’s virtio-blk paravirtualized device drivers in KVM.</p>
<p style="text-align: left;" dir="ltr"><img class="aligncenter" src="https://lh5.googleusercontent.com/30qwIFHl0x1t6p2gJU_e9PbS4Af6r16xCbv4UIDlmb-jvsGLKP0hhOFZtVlFCrureQ3p8q8Z0AqOCOpRGcHzx_YFsFqo587MGJaLguZ9SyLtRQoXpA8kCvK5" alt="" width="655px;" height="384px;" /></p>
<p>When looking at this overhead it is important to keep in mind that it is very use-case dependent. A CPU-bound benchmark will not show much sensitivity to the virtualization of I/O, whereas for an I/O-heavy benchmark this overhead can be significant. As an example, the conjugate-gradient method for solving a system of linear equations spends around 70% of CPU cycles in user mode and spends the remaining time in the hypervisor kernel engaged in disk I/O.</p>
<p dir="ltr"><strong>Passthrough I/O</strong></p>
<p dir="ltr">Passthrough I/O greatly improves performance by remapping the guest page tables to directly write to the physical device. This eliminates most of the overhead in trapping to the hypervisor for every operation. This technique brings the bulk of I/O processing to near-native speeds.</p>
<p style="text-align: center;"><strong><strong><br />
<img class="aligncenter" src="https://lh3.googleusercontent.com/2T6Lxa9d9n7owQjS_YlU6bFWmf9StpF8f9TbuvYMO06P6SN-ttEdleHlE9ihpsPojfe4YiToF2dON_EjT9gGdfYuO3uWMWahgJ0o3iUglLspLi6evlsOa6Ix" alt="" width="256px;" height="511px;" /><br />
</strong></strong></p>
<p dir="ltr">There are several issues that need to be addressed to effectively virtualize I/O using this technique. Consider the case of a guest using DMA accesses to communicate with a device. In this scenario we need to account for the following issues.</p>
<p dir="ltr"><strong>Isolation</strong></p>
<p dir="ltr">The goal of virtualization is to sandbox the guest OS to keep it from accessing the data of other guest OSes. We do this for the guest by adding a second stage translation. However, DMA devices operate on physical addresses and are not aware of second stage translations. So if a guest is given unrestricted access to a DMA device, it can read or write any physical address in memory and corrupt the memory of other guests. There needs to be a protection mechanism to make sure that DMA requests issued on behalf of a particular guest only reach memory associated with that guest.</p>
<p dir="ltr">Furthermore, more than one guest may need to access the same device. The device needs to be able to distinguish between the accesses coming from different guests and redirect them correctly.</p>
<p dir="ltr"><strong>Physical Address</strong></p>
<p dir="ltr">To complete the DMA transaction the guest OS needs to provide the device with the proper physical address in memory to find the data. But the guest does not know the physical address of the data, only the Intermediate Physical Address (IPA) which is in essence a virtual address. For the DMA access to work the device must be able to translate the IPA to the correct physical address.</p>
<p dir="ltr"><strong>Contiguous Memory Blocks</strong></p>
<p dir="ltr">The problem cannot be solved by just providing the device with the correct PA. The device expects the DMA target region to be located in a contiguous region of memory. In a virtualized environment this is not guaranteed. The hypervisor may back guest memory with blocks as small as 4K that are not contiguous in physical memory. So the device must be able to do this translation across the entire DMA region.</p>
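<p dir="ltr">To illustrate the contiguity problem, the following Python sketch splits a DMA buffer that is contiguous in guest (IPA) space into the physical chunks that actually back it. The page map and the addresses are made up for the example; only the 4K page size comes from the text.</p>

```python
# Hypothetical example: a DMA buffer that is contiguous in guest (IPA)
# space but backed by scattered 4 KB physical pages.

PAGE_SIZE = 4096

# IPA page number -> PA page number, as chosen by the hypervisor.
ipa_to_pa_page = {0x100: 0x9A2, 0x101: 0x013, 0x102: 0x7F0}


def physical_chunks(ipa_start, length):
    """Split [ipa_start, ipa_start + length) into per-page physical chunks."""
    chunks = []
    ipa = ipa_start
    end = ipa_start + length
    while ipa != end:
        page, offset = divmod(ipa, PAGE_SIZE)
        # A chunk stops at the page boundary or at the end of the buffer.
        size = min(PAGE_SIZE - offset, end - ipa)
        chunks.append((ipa_to_pa_page[page] * PAGE_SIZE + offset, size))
        ipa += size
    return chunks


chunks = physical_chunks(0x100 * PAGE_SIZE + 0x800, 8192)
for pa, size in chunks:
    print(hex(pa), size)
```

<p dir="ltr">A single 8K guest buffer turns into three unrelated physical ranges, so translating only the start address is not enough.</p>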
<p dir="ltr"><strong>32 Bit Devices In Larger Address Spaces</strong></p>
<p dir="ltr">This problem is similar to the problem with a 32 bit guest on a 64 bit host discussed in the previous post. The system may have older devices that cannot access the complete larger address spaces of newer systems. An address translation is necessary to use these devices with a DMA outside their normal addressable range.</p>
<p dir="ltr"><strong>Hardware Support</strong></p>
<p dir="ltr">The problems mentioned above are not easily solved in software and need a hardware solution that correctly maps device addresses to the correct guest. Most platforms have hardware solutions for this. This mechanism is called <a href="http://en.wikipedia.org/wiki/IOMMU">IOMMU</a> for I/O Memory Management Unit. Intel calls their implementation VT-d, AMD calls their implementation AMD-Vi, and ARM calls their implementation System MMU.</p>
<p dir="ltr">The basic idea for the IOMMU is simple. An address translation unit is placed between memory and any devices that may be used by a guest OS. When the hypervisor is setting up second stage page tables for a guest OS to access the device, it sets up the IOMMU too. As with tablewalks in the core, these address translations are expensive, so TLBs are implemented to reduce the overhead of address translations.</p>
<div id="attachment_1301" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.futurechips.org/wp-content/uploads/2013/04/system-using-smmu.png"><img class="wp-image-1301 " title="An Example ARM system using System MMU" src="http://www.futurechips.org/wp-content/uploads/2013/04/system-using-smmu-300x204.png" alt="" width="500" height="340" /></a><p class="wp-caption-text">An example system showing where the System MMU can be located. Transactions with the device are translated through the system MMU.</p></div>
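<p dir="ltr">As a rough illustration of the idea, here is a toy IOMMU in Python. It keeps one IPA to PA page map per guest, as programmed by the hypervisor, and a small TLB that caches translations. The class and method names are hypothetical and do not correspond to any real IOMMU programming interface.</p>

```python
# Toy IOMMU: each guest gets its own IPA -> PA page map, programmed by the
# hypervisor, plus a small TLB that caches recent translations.
# Class and method names are hypothetical.

PAGE_SIZE = 4096


class IOMMUFault(Exception):
    pass


class ToyIOMMU:
    def __init__(self):
        self.tables = {}     # guest id -> {ipa page: pa page}
        self.tlb = {}        # (guest id, ipa page) -> pa page
        self.walks = 0       # number of (expensive) table lookups

    def map_page(self, guest, ipa_page, pa_page):
        self.tables.setdefault(guest, {})[ipa_page] = pa_page

    def translate(self, guest, ipa):
        page, offset = divmod(ipa, PAGE_SIZE)
        key = (guest, page)
        if key not in self.tlb:
            self.walks += 1
            table = self.tables.get(guest, {})
            if page not in table:
                raise IOMMUFault(f"guest {guest} has no mapping for IPA {ipa:#x}")
            self.tlb[key] = table[page]
        return self.tlb[key] * PAGE_SIZE + offset


iommu = ToyIOMMU()
iommu.map_page("A", 0x10, 0x200)
iommu.map_page("B", 0x10, 0x300)   # same IPA, different physical page

pa_a = iommu.translate("A", 0x10004)
pa_b = iommu.translate("B", 0x10004)
iommu.translate("A", 0x10008)      # TLB hit, no extra walk
print(hex(pa_a), hex(pa_b), iommu.walks)
```

<p dir="ltr">The same guest address lands in different physical pages for guests A and B, and an access to an unmapped page raises a fault instead of reaching memory. That covers both the isolation and the physical address problems in one mechanism.</p>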
<p dir="ltr"><strong>System MMU</strong></p>
<p dir="ltr">The ARM System MMU is programmed with different translation contexts. It maps each transaction to the corresponding context by matching against expected streams. Based on the context the System MMU may either bypass the translation, cause a fault, or perform a translation. The System MMU in the ARM architecture provides full two-stage translation support (as described in the previous post) and depending on the context we may either do a first stage translation or a second stage translation. To perform the translation the System MMU has registers analogous to the TTBRs and other control registers for each context.</p>
<p dir="ltr">The System MMU may also receive faults during its translation process or if a context is not set up. Depending on the type of fault and how the System MMU is configured it may take certain actions. A translation fault can trigger an interrupt. This allows an opportunity for the hypervisor to service the interrupt and restart the translation so it can come to completion. The System MMU may also send a BUSERROR to the appropriate requestor. There are syndrome registers present to ease the process of diagnosing and fixing the problem.</p>
<p dir="ltr">Some advantages of the System MMU don&#8217;t even need virtualization. Since the System MMU enables every device to perform VA to PA translations, I/O operations can be performed by drivers in user-space using VAs. The permission checking and translation maps can ensure one user application does not corrupt the memory of another application. This would eliminate the traps to the kernel presently required, further reducing I/O overhead. Another problem it addresses is dealing with contiguous memory. Many operations result in very large DMA accesses that cannot be allocated a single chunk of memory by the OS. Presently they need to either be split into multiple DMA requests or performed with complex DMA scatter-gather operations. The System MMU enables the device to communicate via a DMA based on a contiguous VA instead of fragmented PAs. This both reduces the CPU overhead and simplifies the software and device.</p>
<p dir="ltr">It should be noted that the System MMU is a part of the platform rather than a part of the core architecture. This means it only affects the drivers. Because of this, many features are implementation defined. For example the bits used to match a stream and map it to a context are implementation defined. Since there is no user code that is aware of this part of the system, changes to the System MMU architecture wouldn&#8217;t raise as many legacy code issues.</p>
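<p dir="ltr">The stream-to-context lookup can be pictured with a small Python sketch. The stream IDs, the dictionary layout of a context, and the function name are hypothetical, since the text notes later that the actual stream-matching bits are implementation defined. Only the three outcomes (bypass, fault, translate) come from the description above.</p>

```python
# Toy stream matching. The stream IDs and the context layout are
# hypothetical; real stream-matching bits are implementation defined.

from enum import Enum


class Action(Enum):
    BYPASS = "bypass"
    FAULT = "fault"
    TRANSLATE = "translate"


PAGE_SIZE = 4096

# stream id -> translation context
contexts = {
    1: {"action": Action.BYPASS},
    2: {"action": Action.TRANSLATE, "pages": {0x40: 0x880}},
}


def handle_transaction(stream_id, address):
    """Return (action taken, output address or None)."""
    context = contexts.get(stream_id)
    if context is None or context["action"] is Action.FAULT:
        return Action.FAULT, None            # no context set up for stream
    if context["action"] is Action.BYPASS:
        return Action.BYPASS, address        # address passes through as-is
    page, offset = divmod(address, PAGE_SIZE)
    if page not in context["pages"]:
        return Action.FAULT, None            # translation fault
    return Action.TRANSLATE, context["pages"][page] * PAGE_SIZE + offset


print(handle_transaction(1, 0x1234))
print(handle_transaction(2, 0x40010))
print(handle_transaction(3, 0x0))
```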
<p>So using these techniques the hypervisor can provide an appropriate implementation of virtualized I/O according to the use-case. This concludes the third installment of this series on virtualization. This series continues in the <a href="http://www.futurechips.org/thoughts-on-latest-happenings/arm-virtualization-part-4-applications.html" title="ARM Virtualization Part 4 – Applications">next post</a> discussing the use-cases for virtualization especially the use cases targeted in the mobile space by ARM.</p>
<p dir="ltr"><strong>References</strong></p>
<p dir="ltr">For more information check out the following resources.</p>
<ul>
<li><a href="http://xpgc.vicp.net/course/svt/TechDoc/ch12-IOArchitecturesForVirtualization.pdf">http://xpgc.vicp.net/course/svt/TechDoc/ch12-IOArchitecturesForVirtualization.pdf</a></li>
<li><a href="http://nowlab.cse.ohio-state.edu/NOW/dissertations/huang.pdf">http://nowlab.cse.ohio-state.edu/NOW/dissertations/huang.pdf</a></li>
<li><a href="http://www.ibm.com/developerworks/linux/library/l-virtio/">http://www.ibm.com/developerworks/linux/library/l-virtio/</a></li>
<li><a href="http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaatbestpractices_pdf.pdf">http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaatbestpractices_pdf.pdf</a></li>
<li><a href="http://www.mulix.org/lectures/xen-iommu/xen-io.pdf">http://www.mulix.org/lectures/xen-iommu/xen-io.pdf</a></li>
<li><a href="http://developer.amd.com/wordpress/media/2012/10/IOMMU-ben-yehuda.pdf">http://developer.amd.com/wordpress/media/2012/10/IOMMU-ben-yehuda.pdf</a></li>
<li><a href="http://www.arm.com/files/pdf/System-MMU-Whitepaper-v8.0.pdf">http://www.arm.com/files/pdf/System-MMU-Whitepaper-v8.0.pdf</a></li>
<li><a href="http://software.intel.com/en-us/articles/intel-virtualization-technology-for-directed-io-vt-d-enhancing-intel-platforms-for-efficient-virtualization-of-io-devices">http://software.intel.com/en-us/articles/intel-virtualization-technology-for-directed-io-vt-d-enhancing-intel-platforms-for-efficient-virtualization-of-io-devices</a></li>
<li><a href="http://support.amd.com/us/Processor_TechDocs/48882.pdf">http://support.amd.com/us/Processor_TechDocs/48882.pdf</a></li>
</ul>
<div></div>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/JOX51hGiENw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/chip-design-for-all/arm-virtualization-part-3-iommu.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/chip-design-for-all/arm-virtualization-part-3-iommu.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=arm-virtualization-part-3-iommu</feedburner:origLink></item>
		<item>
		<title>ARM Virtualization Extensions – Memory and Interrupts (Part 2)</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/WPO6LacfMCw/arm-virtualization-part-2-memory-interrupts.html</link>
		<comments>http://www.futurechips.org/understanding-chips/arm-virtualization-part-2-memory-interrupts.html#comments</comments>
		<pubDate>Mon, 25 Mar 2013 00:02:40 +0000</pubDate>
		<dc:creator>Ali Hussain</dc:creator>
				<category><![CDATA[Tips for Power Coders]]></category>
		<category><![CDATA[Understanding Chips]]></category>
		<category><![CDATA[arm]]></category>
		<category><![CDATA[cortex a15]]></category>
		<category><![CDATA[hypervisor]]></category>
		<category><![CDATA[interrupts]]></category>
		<category><![CDATA[memory management]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[VTTBR]]></category>
		<category><![CDATA[x86 hardware virtualization technology]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1260</guid>
		<description><![CDATA[In the first part of this series, I introduced the topic of virtualization. Today I will venture deeper into the ARM virtualization extensions for memory management and handling of interrupts. Within the core, virtualization mostly provides controls over the system registers. But as we move further from the core, and start to communicate with the outside <a href='http://www.futurechips.org/understanding-chips/arm-virtualization-part-2-memory-interrupts.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p dir="ltr">In the <a title="ARM Virtualization Extensions — Introduction (Part 1)" href="http://www.futurechips.org/understanding-chips/arm-virtualization-extensions-introduction-part-1.html">first part of this series</a>, I introduced the topic of virtualization. Today I will venture deeper into the ARM virtualization extensions for memory management and handling of interrupts. Within the core, virtualization mostly provides controls over the system registers. But as we move further from the core, and start to communicate with the outside world, difficulties and nuances in the problem start to emerge and the need for hardware support for virtualization becomes apparent.</p>
<p><strong><strong><span id="more-1260"></span> </strong></strong></p>
<p dir="ltr">As a note, this post will gloss over parts of the ARM architecture. To go deeper into the details of the implementation you can check out the ARM Architecture Reference Manuals. These can be found at <a href="http://infocenter.arm.com">infocenter.arm.com</a> but require a free registration.</p>
<p dir="ltr"><strong>&#8220;Virtual&#8221; Virtual Memory </strong></p>
<p dir="ltr">Virtualization requires that the guest OS cannot have access to the hypervisor&#8217;s memory space. Without virtualization extensions, this requirement is enforced with a technique called <a href="http://stackoverflow.com/a/9834730/745944">“shadow page tables”</a>. In this technique, the OS maintains its page tables but the hypervisor prevents the OS from programming the <em>Memory Management Unit</em> (MMU) registers itself. Instead, the hypervisor creates its own page tables with its own mappings. When it sees a page fault, it reads the page tables created by the OS, recognizes the address as being mapped by the OS, and creates the actual mapping in the page tables that are used by the hardware MMU. So only the hypervisor has the true <em>Virtual Address</em> (VA) to <em>Physical Address</em> (PA) translation rather than the guest OS. This approach has two problems: it unnecessarily complicates the hypervisor and has performance overhead.</p>
<p dir="ltr">In ARM Virtualization extensions, the above is performed by hardware. This simplifies the hypervisor while adding functionality and improving performance. In the ARM virtualization extensions, the hypervisor essentially sets the hardware to treat the “physical” addresses generated by the guest OS as virtual addresses and adds another level of translation on these addresses. The ARM terminology for the guest OS address is <em>Intermediate Physical Address</em> (IPA). The translation from IPA to PA is called a stage 2 translation and the translation from VA to IPA is called stage 1 translation.</p>
<p dir="ltr">The figure below shows how this translation proceeds. When virtualization is disabled, only stage 1 translation is performed. Stage 1 translation has 1 to 3 levels of tablewalks (depending on the page size, e.g., 4KB pages require 3 levels but 1GB pages require only one level). The first level uses the TTBR and the VA to create the PA/IPA for the first descriptor access. Subsequent levels use the descriptor and the VA to generate the address of the next level descriptor, until the PA/IPA is known. The unshaded boxes in the figure show this process.</p>
<div id="attachment_1288" class="wp-caption aligncenter" style="width: 532px"><a href="http://www.futurechips.org/wp-content/uploads/2013/03/stage-1-translation-3.png"><img class="size-full wp-image-1288" title="ARM virtual memory translation" src="http://www.futurechips.org/wp-content/uploads/2013/03/stage-1-translation-3.png" alt="ARM virtual memory translation" width="522" height="1040" /></a><p class="wp-caption-text">ARM virtual memory translation</p></div>
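<p dir="ltr">As a small worked example of how the VA feeds each level of the walk, the Python sketch below splits a 32 bit VA into the index used at each of three levels for 4KB pages. The field widths (9 index bits per level and a 12 bit page offset, leaving 2 bits for the first level) are my assumption based on the long-descriptor format. Check the Architecture Reference Manual before relying on them.</p>

```python
# Split a 32-bit VA into table indices for a three-level walk with 4 KB
# pages. The field widths are an assumption based on the long-descriptor
# (LPAE) format: 12 offset bits, 9 index bits for levels 2 and 3, and the
# remaining 2 bits for level 1.

def split_va(va):
    offset = va % 4096               # VA[11:0], byte within the page
    level3 = (va >> 12) % 512        # VA[20:12]
    level2 = (va >> 21) % 512        # VA[29:21]
    level1 = va >> 30                # VA[31:30] for a 32-bit VA
    return level1, level2, level3, offset


print(split_va(0xC0201ABC))
```

<p dir="ltr">For this address the walk uses entry 3 of the first level table, entry 1 at each of the next two levels, and finally byte offset 0xABC within the page.</p>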
<p dir="ltr">When virtualization is enabled, each of the translation levels has two stages. Stage 1 works as described above, and outputs an IPA (intermediate physical address). Stage 2 takes the IPA and the VTTBR to create the descriptor address for the stage 2 tablewalk (as shown in the figure below). Similar to the stage 1 tablewalk, the stage 2 tablewalk can have 1 to 3 levels.</p>
<div id="attachment_1290" class="wp-caption aligncenter" style="width: 886px"><a href="http://www.futurechips.org/wp-content/uploads/2013/03/stage-2-translation-1.png"><img class=" wp-image-1290 " title="ARM IPA to PA translation" src="http://www.futurechips.org/wp-content/uploads/2013/03/stage-2-translation-1.png" alt="Stage 2 translation" width="876" height="718" /></a><p class="wp-caption-text">Stage 2 translation</p></div>
<p dir="ltr"><em>That’s A Lot Of Descriptors</em></p>
<p dir="ltr">The indirection added by a second stage tablewalk can quickly get very expensive. Memory accesses are already among the most expensive operations, and with virtualization we may need up to 16 accesses to get the required data. Without virtualization, the results of translations are saved in Translation Lookaside Buffers (TLBs) to reduce the cost of page tablewalks. With virtualization, the cost of a tablewalk goes up significantly with the additional stage. In a processor with virtualization extensions enabled, not only do the TLBs need to be bigger, but partial results of the tablewalks are also saved to accelerate later tablewalks.</p>
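<p dir="ltr">The figure of 16 accesses follows from simple counting, assuming three levels in each stage and no help from TLBs or cached partial walks. The Python function below is a sketch of that arithmetic. Its name and parameters are mine, not ARM terminology.</p>

```python
def nested_walk_accesses(stage1_levels, stage2_levels):
    """Worst-case memory accesses for one load with two-stage translation.

    Each stage 1 descriptor address is an IPA, so fetching it costs a full
    stage 2 walk plus the fetch itself. The final IPA needs one more stage 2
    walk, and then comes the data access.
    """
    per_descriptor = stage2_levels + 1
    descriptor_fetches = stage1_levels * per_descriptor
    final_translation = stage2_levels
    data_access = 1
    return descriptor_fetches + final_translation + data_access


print(nested_walk_accesses(3, 3))  # two-stage translation
print(nested_walk_accesses(3, 0))  # stage 1 only
```

<p dir="ltr">With three levels per stage this gives 3 × 4 + 3 + 1 = 16 accesses, compared with 4 when only stage 1 is active.</p>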
<p dir="ltr"><em>You Shall Not Pass</em></p>
<p dir="ltr">The second stage of translation includes permission controls to fault to the hypervisor. This includes the standard controls for disabling any combination of reads, writes, or execution on a page by the guest OS or the applications running on it.</p>
<p>Example use cases for each permission setting:</p>
<table border="1">
<colgroup>
<col width="149" />
<col width="282" /></colgroup>
<tbody>
<tr>
<td>
<p dir="ltr">No access</p>
</td>
<td>
<p dir="ltr">Waiting on an external update</p>
</td>
</tr>
<tr>
<td>
<p dir="ltr">Rd only</p>
</td>
<td>
<p dir="ltr">Lazy VM copy</p>
</td>
</tr>
<tr>
<td>
<p dir="ltr">Wr only</p>
</td>
<td>
<p dir="ltr">Output device</p>
</td>
</tr>
<tr>
<td>
<p dir="ltr">All access</p>
</td>
<td>
<p dir="ltr">Normal execution</p>
</td>
</tr>
</tbody>
</table>
<p dir="ltr"><em>All Memory Is Not Equal</em></p>
<p dir="ltr">The second stage also allows the hypervisor to modify the memory attributes for a particular location including marking it as device memory. Keep in mind that the ARM architecture only allows memory mapped I/O, so isolating I/O operations can be done through the page table mappings. The general rule is that the final attribute of memory is going to be the more restrictive of the first stage and second stage attributes. For example, if the first stage marks memory as device and second stage marks it as normal cacheable, the processor treats the memory as device memory, i.e., disables caching and enforces the ordering requirements for device memory. The hypervisor can also arrange for a trap to itself when the guest accesses memory that is marked as device memory. The hypervisor can use this trap to communicate with the physical external hardware.</p>
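<p dir="ltr">The "more restrictive wins" rule can be written down in a few lines of Python. This is a deliberately simplified model: the three attribute names and their ordering are my own shorthand, and the real architecture has a richer set of memory types and cacheability options.</p>

```python
# Simplified model of combining stage 1 and stage 2 memory attributes:
# the more restrictive type wins. The three-way ordering below is a
# simplification of the architecture's full attribute set.

RESTRICTIVENESS = {"normal-cacheable": 0, "normal-noncacheable": 1, "device": 2}


def combine_attributes(stage1, stage2):
    """Return whichever of the two attributes is more restrictive."""
    return max(stage1, stage2, key=RESTRICTIVENESS.__getitem__)


print(combine_attributes("device", "normal-cacheable"))
print(combine_attributes("normal-cacheable", "normal-noncacheable"))
```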
<p dir="ltr"><strong>Virtual Interrupts</strong></p>
<p dir="ltr">Dealing with device attributes is the first part of helping a guest OS communicate with the outside world. The next part is properly directing interrupts. The ARM architecture provides multiple ways for dealing with interrupts. The system can be configured so that all interrupts are routed to the hypervisor. The hypervisor also has access to bits in the HCR for triggering a virtual interrupt in the guest OS. When an interrupt is asserted in the ARM architecture, the device holds the interrupt pin high until the OS tells the device to lower the interrupt pin. When running a guest OS, this protocol must be maintained. If the interrupt is triggered by the hypervisor using the HCR, a trap to the hypervisor is needed to clear the virtual interrupt.</p>
<p dir="ltr">The hypervisor may handle the interrupt in concert with the Generic Interrupt Controller (GIC, pronounced &#8220;gick&#8221;). The GIC is implemented as a memory mapped I/O device handling the priority and distribution of all the interrupts coming into the system. Without virtualization, the OS communicates the completion of an interrupt to the GIC. With virtualization, the physical interrupt is rerouted to the hypervisor. The hypervisor determines the correct VM to redirect the interrupt to and sets up the corresponding virtual interface for the GIC. It also sets up the second stage page tables for the guest OS to map to the virtual CPU interface of the GIC. The GIC will from then on know how to reroute that interrupt and does not need hypervisor intervention.</p>
<p dir="ltr">Memory management and interrupt virtualization both scratch the surface of the problems involved with sandboxing a virtual machine and providing the guest OS access to physical devices. The <a title="ARM Virtualization – I/O Virtualization (Part 3)" href="http://www.futurechips.org/chip-design-for-all/arm-virtualization-part-3-iommu.html">next post</a> in this series discusses extensions to the system to support virtualization.</p>
<p dir="ltr"><strong>Resources</strong></p>
<p dir="ltr"><a href="http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-arm-zyngier.pdf">http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-arm-zyngier.pdf</a></p>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/WPO6LacfMCw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/understanding-chips/arm-virtualization-part-2-memory-interrupts.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/understanding-chips/arm-virtualization-part-2-memory-interrupts.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=arm-virtualization-part-2-memory-interrupts</feedburner:origLink></item>
		<item>
		<title>ARM Virtualization Extensions — Introduction (Part 1)</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/4E_BGVTw9m0/arm-virtualization-extensions-introduction-part-1.html</link>
		<comments>http://www.futurechips.org/understanding-chips/arm-virtualization-extensions-introduction-part-1.html#comments</comments>
		<pubDate>Mon, 18 Mar 2013 16:05:46 +0000</pubDate>
		<dc:creator>Ali Hussain</dc:creator>
				<category><![CDATA[Tips for Power Coders]]></category>
		<category><![CDATA[Understanding Chips]]></category>
		<category><![CDATA[arm virtualization]]></category>
		<category><![CDATA[ASID]]></category>
		<category><![CDATA[cortex a15]]></category>
		<category><![CDATA[CPACR]]></category>
		<category><![CDATA[HCPTR]]></category>
		<category><![CDATA[hypervisor]]></category>
		<category><![CDATA[OS]]></category>
		<category><![CDATA[TTBCR]]></category>
		<category><![CDATA[VMID]]></category>
		<category><![CDATA[VTTBR]]></category>
		<category><![CDATA[x86 hardware virtualization technology]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1244</guid>
		<description><![CDATA[Sorry guys for another hiatus, my job at Calxeda keeps me busy. I was recently discussing ARM’s virtualization support with my friend Ali Hussain (yup, that&#8217;s our idea of a fun dinner conversation) and found some very interesting facts. I requested Ali to share his knowledge in a blog post series on this topic, so here <a href='http://www.futurechips.org/understanding-chips/arm-virtualization-extensions-introduction-part-1.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p><em>Sorry guys for another hiatus, my job at Calxeda keeps me busy. I was recently discussing ARM’s virtualization support with my friend <a href="http://www.linkedin.com/pub/ali-hussain/41/34/666/" target="_blank">Ali Hussain</a> (yup, that&#8217;s our idea of a fun dinner conversation) and found some very interesting facts. I requested Ali to share his knowledge in a blog post series on this topic, so here you go. Ali is in ARM’s performance modeling team and has been working on ARM cores since 2008.</em></p>
<p>The idea for this blog post stemmed from talking to people that had the impression that ARM’s virtualization support, even with the virtualization extensions in Cortex-A15, is limited. I plan to write a few posts exploring virtualization, and the support for it in the ARM and x86 ISAs. This post will draw heavily on my understanding of the ARM architecture and operating systems.</p>
<p><span id="more-1244"></span><strong>What is Virtualization?</strong><br />
Before we can look at virtualization, we need to define a few key things. The first is virtualization itself. Virtualization in general is creating an environment in software to emulate something physical. More specifically, when we talk about hardware virtualization, it is running an operating system on a sandboxed virtual machine (VM) as opposed to having access to the physical hardware. This is done through the hypervisor which manages guest operating systems the same way an operating system manages applications.</p>
<p><img title="ARM virtualization" src="https://lh5.googleusercontent.com/eM3iitZxbhZUfu1LvpSLlgXSragorGGQj_xksIzC0RxsT4KEj8qvoiXM_LMFZ83ZSnud8yRh51rRrCndbUVoz18iC-til_zW3KNebNpv6CtdUFRN7X5XUk0E" alt="Privilege levels" /></p>
<p>Virtualization does not require hardware support. It can be performed completely in software. An existing operating system can be patched to work at a lower privilege level and trap to the hypervisor. This is called <a href="http://en.wikipedia.org/wiki/Paravirtualization" target="_blank">paravirtualization</a>. The advantage of hardware support is in simplifying the software work for the hypervisor and providing performance improvements. For this post, I want to discuss how hardware can help the hypervisor in performing its functions. Those interested in paravirtualization may find this VMware talk about the problems faced while implementing paravirtualization on ARM interesting: <a href="http://labs.vmware.com/academic/arm-core-virtualization">http://labs.vmware.com/academic/arm-core-virtualization</a></p>
<p>Let’s explore some of the duties of the hypervisor and how the hardware accomplishes them to better understand hardware-assisted virtualization.</p>
<p><strong>Management Has Its Privileges</strong><br />
To sandbox an OS, the hypervisor has to be at a higher privilege level than the guest OS. ARM, for example, creates a higher privilege level called hypervisor mode. The hypervisor mode has access to its own set of system registers that are analogous to the registers present in the system mode. For example, just like the OS tracks process IDs using an Address Space IDentifier (ASID, which is a part of the TTBR), the hypervisor tracks the current VM using a VMID (which is a part of the VTTBR).</p>
<div id="attachment_1245" class="wp-caption alignnone" style="width: 647px"><a href="http://www.futurechips.org/wp-content/uploads/2013/03/Screen-Shot-2013-03-17-at-5.40.24-PM.png"><img class="size-full wp-image-1245" title="TTBR bit fields" src="http://www.futurechips.org/wp-content/uploads/2013/03/Screen-Shot-2013-03-17-at-5.40.24-PM.png" alt="64 bit LPAE version of TTBR" width="637" height="84" /></a><p class="wp-caption-text">ARM TTBR bit fields</p></div>
<div id="attachment_1248" class="wp-caption alignnone" style="width: 645px"><a href="http://www.futurechips.org/wp-content/uploads/2013/03/Screen-Shot-2013-03-17-at-5.39.19-PM.png"><img class="size-full wp-image-1248" title="VTTBR bit fields" src="http://www.futurechips.org/wp-content/uploads/2013/03/Screen-Shot-2013-03-17-at-5.39.19-PM.png" alt="ARM VTTBR" width="635" height="80" /></a><p class="wp-caption-text">ARM VTTBR bit fields</p></div>
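<p>To make the two bit-field figures concrete, here is a minimal Python sketch that pulls the identifier out of a raw 64-bit register value. It assumes the LPAE layout in which both the ASID (in the TTBR) and the VMID (in the VTTBR) occupy bits [55:48]; that bit position is my reading of the format, and the helper names <code>id_field</code> and <code>with_id</code> are made up for illustration.</p>

```python
# Toy decoder for the 64-bit (LPAE) TTBR and VTTBR formats shown above.
# ASSUMPTION: the ASID and the VMID both sit in bits [55:48]; check the
# architecture manual before relying on this.
ID_SHIFT = 48
ID_MASK = 0xFF  # 8-bit identifier


def id_field(reg_value: int) -> int:
    """Extract the 8-bit ID (ASID or VMID) from a 64-bit register value."""
    return (reg_value >> ID_SHIFT) & ID_MASK


def with_id(base_addr: int, ident: int) -> int:
    """Compose a register value from a table base address and an ID."""
    return ((ident & ID_MASK) << ID_SHIFT) | base_addr


if __name__ == "__main__":
    ttbr = with_id(0x80000000, 5)    # the OS tags a process with ASID 5
    vttbr = with_id(0x40000000, 2)   # the hypervisor tags a guest with VMID 2
    print(id_field(ttbr), id_field(vttbr))
```

<p>The point of the sketch is only that the same extraction works for both registers: the OS and the hypervisor use the same mechanism, one level apart.</p>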
<p>Hypervisor mode also has access controls similar to those of supervisor mode. In addition, the hypervisor can read and write the guest OS&#8217;s system control registers.</p>
<p>Having two layers of access control does create a lot of interesting scenarios where the hypervisor and the guest OS compete for a trap. ARM has gone with the philosophy that a good manager is one that intrudes into your work as little as possible. So, if an exception occurs, the guest OS is typically given the opportunity to service it before the hypervisor because the OS is better equipped to handle the applications’ requirements. Let me explain this with an example. Both the OS and the hypervisor provide a similar feature for disabling access to the floating point and SIMD units, using the CPACR (Coprocessor Access Control Register) and HCPTR (Hypervisor Coprocessor Trap Register) respectively. When an application running in the guest OS is disallowed access by both the CPACR and HCPTR, the CPACR has priority.</p>
<p>This is an interesting design choice. It has two advantages. First, it improves the response time of the exception. Second, it allows the OS to behave as normal. However, it also makes it extremely difficult to provide certain functionality, e.g., making the floating point unit invisible to the guest.</p>
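<p>The floating-point priority rule described above can be written down as a tiny decision function. This is a toy model of the behavior as I described it, not of real hardware; the function name and the two boolean inputs are invented for the example.</p>

```python
def fp_trap_target(cpacr_denies: bool, hcptr_traps: bool) -> str:
    """Who handles a guest application's floating-point/SIMD access?

    Toy model of the priority rule described in the post, not of real hardware.
    """
    if cpacr_denies:
        # The guest OS's own control (CPACR) wins, even if HCPTR also traps.
        return "guest OS"
    if hcptr_traps:
        # Only the hypervisor's control (HCPTR) is set.
        return "hypervisor"
    return "no trap"


if __name__ == "__main__":
    for cpacr in (False, True):
        for hcptr in (False, True):
            print(cpacr, hcptr, fp_trap_target(cpacr, hcptr))
```

<p>In this model the hypervisor never sees the access when the guest OS has also disabled the unit, which is exactly why hiding the floating point unit from the guest is hard.</p>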
<p>Check out the next post in this series, talking about memory management and interrupt handling in a hypervisor <a title="ARM Virtualization Extensions – Memory and Interrupts (Part 2)" href="http://www.futurechips.org/understanding-chips/arm-virtualization-part-2-memory-interrupts.html">here</a>.</p>
<p>Disclaimers:</p>
<p>1. All opinions expressed here are my own and not of ARM or any other entity.</p>
<p>2. I haven’t implemented a hypervisor or operating system myself so I would love to hear comments from experts in the field.</p>
<p>Resources:<br />
<a href="http://en.wikipedia.org/wiki/Virtualization">http://en.wikipedia.org/wiki/Virtualization</a><br />
<a href="http://en.wikipedia.org/wiki/Operating_system">http://en.wikipedia.org/wiki/Operating_system</a><br />
<a href="http://infocenter.arm.com/help/index.jsp">http://infocenter.arm.com/help/index.jsp</a><br />
<a href="http://www.linaro.org/documents/download/d7fe510d8eb46775afc3953d217b15224fbb93086598a">http://www.linaro.org/documents/download/d7fe510d8eb46775afc3953d217b15224fbb93086598a</a></p>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/4E_BGVTw9m0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/understanding-chips/arm-virtualization-extensions-introduction-part-1.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/understanding-chips/arm-virtualization-extensions-introduction-part-1.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=arm-virtualization-extensions-introduction-part-1</feedburner:origLink></item>
		<item>
		<title>Which little PC should I buy? Raspberry Pi? Mele A1000? or …</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/w-j3NoNb-JM/comparison-small-pcs-rasberry-pi.html</link>
		<comments>http://www.futurechips.org/thoughts-for-researchers/comparison-small-pcs-rasberry-pi.html#comments</comments>
		<pubDate>Mon, 16 Jul 2012 13:32:20 +0000</pubDate>
		<dc:creator>Aater Suleman</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[Thoughts for Researchers]]></category>
		<category><![CDATA[cortex a8]]></category>
		<category><![CDATA[cortex a9]]></category>
		<category><![CDATA[mele A1000]]></category>
		<category><![CDATA[mk802]]></category>
		<category><![CDATA[pandaboard]]></category>
		<category><![CDATA[raspberry pi]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1213</guid>
		<description><![CDATA[Raspberry Pi, Mele A1000, MK802, and &#8230; . the market is getting filled with these low price geek toys. I personally see a lot of potential here. These &#8220;devicelets&#8221; can do to hardware what apps did to software. Some readers may remember that I posted a tutorial to create a simple evaluation board out of <a href='http://www.futurechips.org/thoughts-for-researchers/comparison-small-pcs-rasberry-pi.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>Raspberry Pi, Mele A1000, MK802, and &#8230; the market is filling up with these low-priced geek toys. I personally see a lot of potential here. These &#8220;devicelets&#8221; can do to hardware what apps did to software. Some readers may remember that I posted <a title="Tips for iPhone App developers: Development board setup (Part 1)" href="http://www.futurechips.org/chip-design-for-all/tips-optimizing-iphone-apps-development-board-setup-part-1.html">a tutorial to create a simple evaluation board out of an iPhone 3GS</a> last year. Back then, <a href="http://pandaboard.org/node/300/#Panda">Pandaboard</a> was the only ARM computer you could get on the market, and it was never available. Now there are so many vendors and sellers that it has become difficult to choose. This post is just a concise summary of all the available choices I have come across so far.</p>
<p><span id="more-1213"></span></p>
<h3>Comparison data<br />
(also available as a <a href="https://docs.google.com/spreadsheet/ccc?key=0Aiujyz31zczMdDRtd1dMTHVJNVdobVNISGllbHBJZUE">google spreadsheet</a>):</h3>
<p>&nbsp;</p>

<!-- Iframe plugin v.2.2 (wordpress.org/extend/plugins/iframe/) -->
<iframe width="650" height="650" frameborder="0" src="https://docs.google.com/spreadsheet/pub?key=0Aiujyz31zczMdDRtd1dMTHVJNVdobVNISGllbHBJZUE&amp;output=html&amp;widget=true" scrolling="no" class="iframe-class"></iframe>
<p>&nbsp;</p>
<p>Disclaimer: This is just the result of my informal research. Please let me know if I am missing your device or have gotten some detail wrong.</p>
<div></div>
<div>Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.</div>
<div></div>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/w-j3NoNb-JM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/thoughts-for-researchers/comparison-small-pcs-rasberry-pi.html/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/thoughts-for-researchers/comparison-small-pcs-rasberry-pi.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=comparison-small-pcs-rasberry-pi</feedburner:origLink></item>
		<item>
		<title>Generating passes for iOS6′s Passbook</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/6xFfAM8MoEE/generating-passes-ios6s-passbook.html</link>
		<comments>http://www.futurechips.org/thoughts-on-latest-happenings/generating-passes-ios6s-passbook.html#comments</comments>
		<pubDate>Fri, 13 Jul 2012 21:19:26 +0000</pubDate>
		<dc:creator>Aater Suleman</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[Thoughts on Latest Happenings]]></category>
		<category><![CDATA[Tips for Power Coders]]></category>
		<category><![CDATA[ios programming]]></category>
		<category><![CDATA[ios6]]></category>
		<category><![CDATA[ios6 passbook]]></category>
		<category><![CDATA[passbook]]></category>
		<category><![CDATA[pkpass]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1178</guid>
		<description><![CDATA[After I downloaded  iOS6 on my iPhone last week, the first icon I clicked on was Passbook only to find that Apple had not put any example passes in there. Since Passbook was the primary reason I had downloaded iOS6, I dug into the API and learned how to create a pass myself. It was a <a href='http://www.futurechips.org/thoughts-on-latest-happenings/generating-passes-ios6s-passbook.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>After I downloaded <a href="http://www.apple.com/ios/ios6/">iOS6</a> on my iPhone last week, the first icon I clicked on was <a href="http://www.apple.com/ios/ios6/#passbook">Passbook</a>, only to find that Apple had not put any example passes in there. Since Passbook was the primary reason I had downloaded iOS6, I dug into the API and learned how to create a pass myself. It was a great learning experience that I want to share with others. I also provide a <a href="https://github.com/maater/TCSH-PKPass">shell script</a> to automate the pass generation process and present <a href="http://iPass.pk">iPass.pk</a>, a user-friendly GUI-based service to create passes.</p>
<p><span id="more-1178"></span></p>
<h2><strong>What is a pass? </strong></h2>
<p>A pass is in fact a regular zip archive with a .pkpass extension. It contains the following files:</p>
<p><strong>1. logo.png (optional): </strong>The file to be used as the logo.</p>
<p><strong>2. background.png (optional): </strong>The file to be used as the pass&#8217; background.</p>
<p><strong>3. icon.png (optional): </strong>The file to be used as the icon.</p>
<p><strong>4. manifest.json (required): </strong>This file contains a JSON object with the SHA1 hashes of <em>all</em> files in the pass <em>except</em> the signature file (as it is generated from manifest.json) and the manifest.json file itself. The file has the following format:</p>
<pre>{
  "pass.json" : "&lt;sha1 of pass.json file&gt;",
  "logo.png" : "&lt;sha1 of logo.png file&gt;",
  ...
}</pre>
<p>For reference, the easiest way to create a SHA1 hash in Linux is to use the command:</p>
<pre>openssl sha1 &lt;filename&gt;</pre>
<p><strong>5. signature (required):</strong> File created by signing the manifest.json file with the certificate.</p>
<p><strong>6. pass.json (required): </strong>It contains a JSON object which specifies the type, formatting, and data to be displayed on the pass. Apple&#8217;s reference is available <a href="http://upassbook.com/documents/PassKit_Bundle.pdf">here</a>; however, it is far from complete and is missing several important details. The only real way to understand pass.json is trial and error: generate lots of passes with different options and look at the results on iOS6 itself. That is what I have been doing for the last two weeks. I developed two tools to help me iterate through these options quickly. The first is a C-shell script that generates passes from the command line, and the second is a pass designer at <a href="http://www.ipass.pk">iPass.pk</a> that lets you change the values/labels of fields in a reasonable GUI.</p>
<p>Let me give you a few examples of what I have learned that was not mentioned anywhere in Apple&#8217;s docs. First, to provide some context, let&#8217;s take a look at a sample pass file and its JSON object with a description of each field.</p>
<p><a href="http://www.futurechips.org/wp-content/uploads/2012/07/photo-5.png"><img class="aligncenter" title="photo 5" src="http://www.futurechips.org/wp-content/uploads/2012/07/photo-5-150x150.png" alt="Screenshot of a boarding pass" width="150" height="150" /></a></p>
<dl id="attachment_1207" class="wp-caption aligncenter" style="width: 160px;">
<dt class="wp-caption-dt" style="text-align: -webkit-auto;"></dt>
<dd class="wp-caption-dd">Boarding Pass</dd>
</dl>

<!-- Iframe plugin v.2.2 (wordpress.org/extend/plugins/iframe/) -->
<iframe src="http://pastebin.com/embed_iframe.php?i=ZaUAfW0e" style="border:none;width:100%" width="100%" height="480" scrolling="no" class="iframe-class" frameborder="0"></iframe>
<p>The most interesting fields are logoText and the primary, secondary, auxiliary, and header fields. They are all labelled in this annotated screenshot. I have filled each field with &lt;field type abbreviation&gt;&lt;field number in the corresponding fields array&gt;. Thus, P1 means Primary Field 1, i.e., the first primary field in the primaryFields array. Similarly, S* denotes secondary fields, A* auxiliary fields, and H* header fields. Note their positions, font sizes, and styles. To check out the other fields, try creating some passes using the following procedure or use the GUI at <a href="http://www.ipass.pk">iPass.pk</a>.</p>
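<p>If you want to reproduce the labelling scheme from the screenshot, a short Python sketch like the one below can generate the placeholder fields. Only primaryFields is named in this post; the other array names (headerFields, secondaryFields, auxiliaryFields), the key/label/value members, and the boardingPass wrapper are my recollection of the pass.json format and should be treated as assumptions, as should the helper name <code>placeholder_fields</code>.</p>

```python
import json

# Abbreviation used in the screenshot, mapped to the array it stands for.
# ASSUMPTION: array names other than primaryFields are from memory.
FIELD_ARRAYS = {
    "H": "headerFields",
    "P": "primaryFields",
    "S": "secondaryFields",
    "A": "auxiliaryFields",
}


def placeholder_fields(count: int = 2) -> dict:
    """Build the four field arrays, each filled with values like P1, P2."""
    arrays = {}
    for abbrev, array_name in FIELD_ARRAYS.items():
        arrays[array_name] = [
            {
                "key": f"{abbrev.lower()}{n}",
                "label": f"{abbrev}{n} label",
                "value": f"{abbrev}{n}",
            }
            for n in range(1, count + 1)
        ]
    return arrays


if __name__ == "__main__":
    # Print a fragment that could be pasted into a pass.json under test.
    print(json.dumps({"boardingPass": placeholder_fields()}, indent=2))
```

<p>Generating the placeholders this way makes it easy to see which array a label on the rendered pass came from.</p>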
<p>&nbsp;</p>
<h2><strong>How to generate passes manually?</strong></h2>
<div>In case you don&#8217;t want to use the GUI to iterate through passes, you can use my <a href="https://github.com/maater/TCSH-PKPass">C-shell script</a>. In effect, the script performs the following steps.</div>
<h4><strong>Step 1: Generate the Apple Developer Certificate</strong></h4>
<p>Go to the Apple Developer Portal and request a pass type identifier and your certificate. Then export the certificate as a .p12 file from Keychain Access.</p>
<h4><strong>Step 2:  Generate key.pem and certificate.pem</strong></h4>
<pre>openssl pkcs12 -in "My PassKit Cert.p12" -clcerts -nokeys -out certificate.pem
openssl pkcs12 -in "My PassKit Cert.p12" -nocerts -out key.pem</pre>
<h4><strong>Step 3: Gather files </strong></h4>
<p>Copy pass.json and any logo/icon/background image files into a folder.</p>
<h4><strong>Step 4: Generate manifest.json </strong></h4>
<p>The goal is to generate a JSON object which contains the SHA1 hashes of all the files you gathered in Step 3. I used the following csh script to generate manifest.json:</p>
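<pre> # Write the manifest one level up so the loop below does not hash it
 set MANIFEST = ../manifest.json
 echo '{' &gt; $MANIFEST
 set sep = ' '
 foreach i (*)
   set sha1 = `openssl sha1 $i | cut -d' ' -f2`
   # JSON needs double-quoted strings and no comma after the last entry
   echo "$sep"'"'$i'" : "'$sha1'"' &gt;&gt; $MANIFEST
   set sep = ','
 end
 echo '}' &gt;&gt; $MANIFEST
 cp $MANIFEST .</pre>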
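<p>For readers who would rather not fight csh quoting, here is a rough Python equivalent of the loop above. It is a sketch, not the script from my repository: the function names <code>build_manifest</code> and <code>write_manifest</code> are made up, and it assumes all pass files sit directly in one folder.</p>

```python
import hashlib
import json
from pathlib import Path

# Files that must not appear in the manifest.
EXCLUDED = {"manifest.json", "signature"}


def build_manifest(pass_dir: str) -> dict:
    """Map each file name in pass_dir to the SHA1 hex digest of its contents."""
    manifest = {}
    for path in sorted(Path(pass_dir).iterdir()):
        if path.is_file() and path.name not in EXCLUDED:
            manifest[path.name] = hashlib.sha1(path.read_bytes()).hexdigest()
    return manifest


def write_manifest(pass_dir: str) -> Path:
    """Write manifest.json into pass_dir and return its path."""
    out = Path(pass_dir) / "manifest.json"
    out.write_text(json.dumps(build_manifest(pass_dir), indent=2))
    return out
```

<p>Calling <code>write_manifest("mypass")</code> on the folder from Step 3 (the folder name is just an example) leaves a manifest.json next to pass.json, ready for signing.</p>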
<h4><strong>Step 5: Generate signature</strong></h4>
<p>Execute the following csh commands to get a signature file ($CWD is assumed to point at the directory that holds certificate.pem and key.pem from Step 2):</p>
<pre>set PASSWORD = xxxx
openssl smime -passin pass:$PASSWORD -binary -sign -signer $CWD/certificate.pem -inkey $CWD/key.pem -in manifest.json -out signature -outform DER</pre>
<p>The signature file now sits next to manifest.json, ready to be zipped into the pass.</p>
<h4><strong>Step 6: Zip the pass</strong></h4>
<pre>zip test.pkpass *</pre>
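<p>The same step can be done from Python with the standard zipfile module, which is handy if the rest of your pass generation is already scripted. This is a minimal sketch; the function name <code>zip_pass</code> is my own, and it assumes a flat folder like the one used by the zip command above.</p>

```python
import zipfile
from pathlib import Path


def zip_pass(pass_dir: str, out_file: str) -> list:
    """Zip every file in pass_dir into out_file with no directory prefix."""
    target = Path(out_file).resolve()
    names = []
    with zipfile.ZipFile(out_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(pass_dir).iterdir()):
            # Skip sub-folders and the archive itself if it lives in pass_dir.
            if path.is_file() and path.resolve() != target:
                archive.write(path, arcname=path.name)
                names.append(path.name)
    return names
```

<p>For example, <code>zip_pass("mypass", "test.pkpass")</code> returns the list of file names it packed, which is a quick way to confirm that manifest.json and signature made it in.</p>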
<p>&nbsp;</p>
<p>Now email test.pkpass to yourself and see the magic.</p>
<p>&nbsp;</p>
<p>Let me know if you have any questions/comments about passes, this tutorial, <a href="http://www.ipass.pk">iPass.pk</a>, or the <a href="https://github.com/maater/TCSH-PKPass">shell script</a>.</p>
<p>&nbsp;</p>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/6xFfAM8MoEE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/thoughts-on-latest-happenings/generating-passes-ios6s-passbook.html/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/thoughts-on-latest-happenings/generating-passes-ios6s-passbook.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=generating-passes-ios6s-passbook</feedburner:origLink></item>
		<item>
		<title>Clarifying Throughput vs. Latency</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/u7QpTV9w1Ho/clarifying-throughput-vs-latency.html</link>
		<comments>http://www.futurechips.org/thoughts-for-researchers/clarifying-throughput-vs-latency.html#comments</comments>
		<pubDate>Sat, 30 Jun 2012 05:47:54 +0000</pubDate>
		<dc:creator>Aater Suleman</dc:creator>
				<category><![CDATA[Thoughts for Researchers]]></category>
		<category><![CDATA[latency]]></category>
		<category><![CDATA[queuing]]></category>
		<category><![CDATA[throughput]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=601</guid>
		<description><![CDATA[Yet another hiatus. Sorry, I was very busy with my job as a performance architect at Calxeda. Will try to be regular again.  I have recently been interviewing people at Calxeda, my new employer. There are a few fundamental concepts I expect every engineer/CS major to understand, regardless of what position they are applying for. One <a href='http://www.futurechips.org/thoughts-for-researchers/clarifying-throughput-vs-latency.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p><em>Yet another hiatus. Sorry, I was very busy with my job as a performance architect at <a href="http://www.calxeda.com">Calxeda</a>. Will try to be regular again. </em></p>
<p>I have recently been interviewing people at <a href="http://www.calxeda.com">Calxeda</a>, my new employer. There are a few fundamental concepts I expect every engineer/CS major to understand, regardless of what position they are applying for. One of them is the difference between a channel&#8217;s throughput and its latency. It is surprising how many candidates get it wrong. I will not only try to explain the concepts of latency and throughput using a simple analogy, but also try to hypothesize why IMO most people get them confused.</p>
<p><span id="more-601"></span></p>
<p>Quick background: I was first asked to differentiate between latency and throughput when I was a college sophomore. It was during an internship interview at nVidia Austin in Spring 2002, more than a decade ago. Sure enough, I was not very clear about it myself and the interviewer had to explain it to me. I wish I remembered his name to give him credit since I am using the same analogy that he used.</p>
<p>When you go to buy a water pipe, there are two completely independent parameters that you look at: the diameter of the pipe and its length. The diameter determines the <em>throughput</em> of the pipe and the length determines the <em>latency</em>, i.e., the time it will take for a water droplet to travel across the pipe. The key point to note is that the length and diameter are independent; thus, so are the latency and throughput of a communication channel.</p>
<p>More formally, <strong>throughput</strong> is defined as the amount of water entering or leaving the pipe every second and <strong>latency</strong> is the average time required for a droplet to travel from one end of the pipe to the other.</p>
<p>Let&#8217;s do some math:</p>
<p>For simplicity, assume that our pipe is a 4inch x 4inch square and its length is 12inches. Now assume that each water droplet is a 0.1in x 0.1in x 0.1in cube. Thus, in one cross section of the pipe, I will be able to fit 1600 water droplets. Now assume that water droplets travel at a rate of 1 inch/second.</p>
<p><strong>Throughput</strong>: Each set of droplets (one full cross section) will move into the pipe in 0.1 seconds. Thus, 10 sets will move in 1 second, i.e., 16,000 droplets will enter the pipe per second. Note that this is independent of the length of the pipe.<br />
<strong>Latency</strong>: At one inch/second, it will take 12 seconds for a droplet to get from one end of the pipe to the other, regardless of the pipe&#8217;s diameter. Hence the latency will be 12 seconds.</p>
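<p>The arithmetic above is easy to check in a few lines of Python. The sketch below uses the numbers from the example; the function and constant names are mine, chosen only for illustration.</p>

```python
PIPE_SIDE_IN = 4.0      # side of the square cross section, inches
PIPE_LENGTH_IN = 12.0   # pipe length, inches
DROPLET_SIDE_IN = 0.1   # side of one cubic droplet, inches
SPEED_IN_PER_S = 1.0    # droplet speed, inches per second


def droplets_per_cross_section(pipe_side, droplet_side):
    """Number of droplets that fit side by side in one cross section."""
    per_side = round(pipe_side / droplet_side)
    return per_side * per_side


def throughput(pipe_side, droplet_side, speed):
    """Droplets entering the pipe per second; pipe length does not appear."""
    layers_per_second = speed / droplet_side
    return droplets_per_cross_section(pipe_side, droplet_side) * layers_per_second


def latency(pipe_length, speed):
    """Seconds for one droplet to cross the pipe; cross section does not appear."""
    return pipe_length / speed


if __name__ == "__main__":
    print(droplets_per_cross_section(PIPE_SIDE_IN, DROPLET_SIDE_IN))
    print(round(throughput(PIPE_SIDE_IN, DROPLET_SIDE_IN, SPEED_IN_PER_S)))
    print(latency(PIPE_LENGTH_IN, SPEED_IN_PER_S))
```

<p>Notice that <code>throughput</code> never takes the pipe length as an argument and <code>latency</code> never takes the cross section: the independence of the two is visible in the function signatures.</p>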
<p><strong>Queuing: </strong>Note that droplets may arrive at a rate faster than 16,000/second, say 16,100/second. Since the pipe cannot let more than 16,000 droplets enter each second, the extra 100 droplets will have to wait. Said another way, they are put in a queue, from which they get to enter the pipe in the next second. The time a droplet waits in the queue before entering the pipe is called the <strong>queuing delay</strong>.</p>
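<p>To see how the queue behaves if the faster arrival rate is sustained, here is a small Python sketch. It assumes droplets keep arriving at 16,100/second and are served first come, first served; both assumptions and the function names are mine, not part of the original analogy.</p>

```python
CAPACITY = 16000   # droplets the pipe accepts per second
ARRIVALS = 16100   # droplets arriving per second


def backlog_after(seconds, arrivals=ARRIVALS, capacity=CAPACITY):
    """Droplets still waiting outside the pipe after the given number of seconds."""
    waiting = 0
    for _ in range(seconds):
        # Each second the pipe drains up to `capacity` droplets from the queue.
        waiting = max(0, waiting + arrivals - capacity)
    return waiting


def queuing_delay(waiting, capacity=CAPACITY):
    """Seconds a newly arrived droplet waits behind `waiting` others."""
    return waiting / capacity


if __name__ == "__main__":
    for t in (1, 10, 60):
        queued = backlog_after(t)
        print(t, queued, queuing_delay(queued))
```

<p>Under these assumptions the backlog grows by 100 droplets every second, so the queuing delay keeps climbing, while the time spent inside the pipe stays fixed at the 12-second latency.</p>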
<p><strong>Possible source of confusion:</strong> Queuing delay is related to throughput, NOT latency. Many scientists/engineers nevertheless confuse queuing delay with latency and conclude that latency is related to throughput, which is wrong. Latency itself is independent of throughput; it&#8217;s the queuing delay which isn&#8217;t.</p>
<p>This and many other cool concepts about queuing are explained by Queuing theory.</p>
<p>&nbsp;</p>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/u7QpTV9w1Ho" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/thoughts-for-researchers/clarifying-throughput-vs-latency.html/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/thoughts-for-researchers/clarifying-throughput-vs-latency.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=clarifying-throughput-vs-latency</feedburner:origLink></item>
		<item>
		<title>What’s Wrong With GNU make?</title>
		<link>http://feedproxy.google.com/~r/FutureChips/~3/ngzwPau6I5k/what%e2%80%99s-wrong-gnu-make.html</link>
		<comments>http://www.futurechips.org/tips-for-power-coders/what%e2%80%99s-wrong-gnu-make.html#comments</comments>
		<pubDate>Wed, 24 Aug 2011 19:12:41 +0000</pubDate>
		<dc:creator>Aater Suleman</dc:creator>
				<category><![CDATA[Tips for Power Coders]]></category>

		<guid isPermaLink="false">http://www.futurechips.org/?p=1137</guid>
		<description><![CDATA[I typically do not share articles on this blog but I found this white paper today which was very enlightening and doesn&#8217;t seem to have gotten the deserved attention. The author has done an excellent job of explaining the shortcomings of GNU Make. I now question why I use Make:-) &#160; Below is excerpt and <a href='http://www.futurechips.org/tips-for-power-coders/what%e2%80%99s-wrong-gnu-make.html' class='excerpt-more'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>I typically do not share articles on this blog but I found this white paper today which was very enlightening and doesn&#8217;t seem to have gotten the deserved attention. The author has done an excellent job of explaining the shortcomings of GNU Make. I now question why I use Make:-)</p>
<p>&nbsp;</p>
<p>Below is an excerpt and a link to the article. Since the original post doesn&#8217;t have space for comments, we can use this post for our discussion.</p>
<p>&nbsp;</p>
<blockquote><p>GNU <code>make</code> is a widely used tool for automating software builds. It is the de facto standard build tool on Unix. It is less popular among Windows developers, but even there it has spawned imitators such as Microsoft’s <code>nmake</code>.</p>
<p>Despite its popularity, <code>make</code> is a deeply flawed tool. Its reliability is suspect; its performance is poor, especially for large projects; and its makefile language is arcane and lacks basic language features that we take for granted in other programming languages.</p>
<p>Admittedly, <code>make</code> is not the only automated build tool. Many other tools have been built to address <code>make</code>’s limitations. Some of these tools are clearly better than <code>make</code>, but <code>make</code>’s popularity endures. The goal of this document is, very simply, to educate you about some of the issues with <code>make</code>—to increase awareness of these problems.</p>
<p>&nbsp;</p>
<p><em><strong><a href="http://www.conifersystems.com/whitepapers/gnu-make/">Read more</a></strong></em></p></blockquote>
<img src="http://feeds.feedburner.com/~r/FutureChips/~4/ngzwPau6I5k" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.futurechips.org/tips-for-power-coders/what%e2%80%99s-wrong-gnu-make.html/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<feedburner:origLink>http://www.futurechips.org/tips-for-power-coders/what%e2%80%99s-wrong-gnu-make.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=what%25e2%2580%2599s-wrong-gnu-make</feedburner:origLink></item>
	</channel>
</rss>
