<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
<channel>
<title>The NVIDIA Blog</title>
<link>http://blogs.nvidia.com/ntersect/</link>
<description>The official NVIDIA blog</description>
<language>en-US</language>
<lastBuildDate>Tue, 30 Nov 2010 13:39:27 -0800</lastBuildDate>
<generator>http://www.typepad.com/</generator>

<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/ntersect/parallel-world" /><feedburner:info uri="ntersect/parallel-world" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
<title>The World is Parallel - GPU Processing for Options Pricing</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/TI2lNFpnRd4/the-world-is-parallel-gpu-processing-for-options-pricing.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/07/the-world-is-parallel-gpu-processing-for-options-pricing.html</guid>
<description>Financial derivatives and quantitative analysis of securities took something of a reputational hit in the 2008 crash. But they are still very much a part of the mathematically sophisticated Wall Street scene. And it turns out that the intense computations needed for trading financial options are well suited to GPU processing, and a number of CUDA-based tools have been developed to make that work easier. The idea of an option is relatively simple. Basically, an option gives you the right to buy (a call option) or sell (a put option) some good or security at a fixed price by some date in...</description>


<content:encoded><![CDATA[<p>Financial derivatives and quantitative analysis of securities took something of a reputational hit in the 2008 crash. But they are still very much a part of the mathematically sophisticated Wall Street scene. And it turns out that the intense computations needed for trading financial options are well suited to GPU processing, and a number of CUDA-based tools have been developed to make that work easier.
</p>

<p>The idea of an option is relatively simple. Basically, an option gives you the right to buy (a call option) or sell (a put option) some good or security at a fixed price by some date in the future. A call is a bet that the price will go up. If you have an option to buy 100 shares of stock X at $100 per share and the stock hits $110 before the option expires, you score a profit of $1,000 (100 shares × $10 per share) less the initial cost of the option. A put works the same but in reverse; you profit if the price of the stock falls below the option’s “strike price.”</p>
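<p>The arithmetic in that example can be written out as a small Python sketch. The function names and the $250 premium are made up for illustration; they do not come from any trading system.</p>

```python
def call_profit(spot, strike, shares, premium):
    """Profit from a call: gain per share above the strike, times shares, less the premium paid."""
    return max(spot - strike, 0.0) * shares - premium


def put_profit(spot, strike, shares, premium):
    """Profit from a put: gain per share below the strike, times shares, less the premium paid."""
    return max(strike - spot, 0.0) * shares - premium


# The example from the text: 100 shares, strike price $100, stock at $110.
# The $250 premium is a hypothetical figure chosen for illustration.
print(call_profit(110.0, 100.0, 100, 250.0))  # prints 750.0
```

<p>With a premium of zero the call returns the full $1,000 from the example; the put on the same stock at the same price would simply lose its premium.</p>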

<p>While the concept is straightforward, the pricing of options and decisions on when to buy or exercise them are anything but. Quantitative analysts, or quants, use statistical models to consider large numbers of possibilities over time and to find the likeliest outcome.
</p>


<p>One of the oldest methods for assessing prices over time is the Monte Carlo simulation, a technique that requires generating large numbers of random data points following some distribution. According to <a href="http://www.tradingmarkets.com/.site/news/Stock%20News/2285232/" target="_blank">TradingMarkets.com</a>, the big French bank BNP Paribas ran up against capacity constraints running its Monte Carlo pricing models on a cluster of CPUs and moved to a CUDA-based GPU system to get higher performance.</p>
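<p>To give a feel for the technique, here is a minimal Monte Carlo sketch in plain Python. It assumes a European call whose final price follows a lognormal distribution (geometric Brownian motion); that model, the function name and the parameters are assumptions for illustration, not a description of BNP Paribas’s pricing code.</p>

```python
import math
import random


def monte_carlo_call(spot, strike, rate, vol, years, n_paths, seed=0):
    """Estimate a European call price by averaging discounted payoffs over random draws."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    drift = (rate - 0.5 * vol * vol) * years
    diffusion = vol * math.sqrt(years)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)  # one standard normal draw per simulated outcome
        terminal = spot * math.exp(drift + diffusion * z)
        total += max(terminal - strike, 0.0)  # call payoff at expiry
    return math.exp(-rate * years) * total / n_paths


estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, 100000, seed=1)
print(round(estimate, 2))
```

<p>Each pass through the loop is independent of every other pass, which is why this kind of workload maps so naturally onto thousands of GPU threads.</p>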

<p>Probably the best-known approach to financial modeling is the Black-Scholes equation, for which two of its developers, Myron Scholes and Robert Merton, won the 1997 Nobel Prize in economics (the third, Fischer Black, had died in 1995). Black-Scholes is a differential equation that assumes security prices follow a random walk, that is, that day-to-day price fluctuations are random.</p>
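<p>For a European call, the Black-Scholes equation has a well-known closed-form solution, sketched below in Python. The function names and sample inputs are illustrative assumptions; this is not the OnEye or NVIDIA implementation.</p>

```python
import math


def norm_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, years):
    """Closed-form Black-Scholes price of a European call option."""
    sigma_root_t = vol * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / sigma_root_t
    d2 = d1 - sigma_root_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Hypothetical inputs: stock at $100, strike $100, 5% rate, 20% volatility, one year.
print(round(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0), 4))
```

<p>Because each option is priced from its own handful of inputs, a large batch of options can be spread across GPU threads with one option per thread.</p>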

<p><a href="http://www.oneye.com.au/" target="_blank">OnEye</a>, an Australian firm specializing in high-performance financial computing, found that running a CUDA-based Black-Scholes model on an NVIDIA GPU produced results almost 700 times faster than running the same analysis on a 2.21 GHz AMD Athlon 64 X2 CPU alone. The GPU system was able to price over 4 billion options per second.</p>

<div style="text-align: center;"><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e201348569ee98970c-pi" style="display: inline;"><img alt="Binomial-tree" class="asset asset-image at-xid-6a00d834515fca69e201348569ee98970c " src="http://blogs.nvidia.com/.a/6a00d834515fca69e201348569ee98970c-320wi" /></a> <br />
</div><p>Another way to analyze options is to look at prices over time. In the binomial model, a price goes up or down by a fixed amount at each time interval, with each direction of movement having its own probability. The result is a tree-like structure in which each level represents a time period and each leaf a possible price. Marayam Ganesan, Roger D. Chamberlain, and Jeremy Buhler at Washington University in St. Louis <a href="http://saahpc.ncsa.illinois.edu/09/papers/Ganesan_paper.pdf" target="_blank">ran a CUDA binomial model</a> on an NVIDIA GPU. “We theorize an optimal speedup of 15× over a comparable parallel implementation for a problem size of 1000 time steps,” they wrote. “In general the expected speed-up is proportional to the square root of the problem size.”</p>
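<p>A binomial tree can be sketched in a few lines of Python. This version assumes the common Cox-Ross-Rubinstein parameterization, in which the price moves by a fixed factor at each step, and prices a European call; it is an illustration, not the Washington University code.</p>

```python
import math


def binomial_call(spot, strike, rate, vol, years, steps):
    """Price a European call on a Cox-Ross-Rubinstein binomial tree."""
    dt = years / steps
    up = math.exp(vol * math.sqrt(dt))  # factor applied on an up-move
    down = 1.0 / up                     # factor applied on a down-move
    p_up = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability
    discount = math.exp(-rate * dt)
    # Payoffs at the leaves: leaf j has j up-moves and (steps - j) down-moves.
    values = [max(spot * up ** j * down ** (steps - j) - strike, 0.0)
              for j in range(steps + 1)]
    # Walk back toward the root, one level of the tree at a time.
    for level in range(steps, 0, -1):
        values = [discount * (p_up * values[j + 1] + (1.0 - p_up) * values[j])
                  for j in range(level)]
    return values[0]


print(round(binomial_call(100.0, 100.0, 0.05, 0.2, 1.0, 1000), 2))
```

<p>Within a single level every node can be computed independently, but each level depends on the one after it, which helps explain why the speedup grows more slowly than the problem size.</p>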

<p>The trinomial model is a little more complicated in that it lets a price go up, down, or stay the same at each time interval. Two French researchers, Gregoire Jauvion and Tuan Nguyen, built a <a href="http://www.arbitragis-research.com/cuda-in-computational-finance/coxross-gpu.pdf/view" target="_blank">trinomial option pricing model</a> for a system with an Intel quad-core 3.2 GHz Core 2 Duo CPU and an NVIDIA GPU. The CUDA code on the GPU ran nearly 32 times faster for a model pricing 64 options over 1,024 time periods.</p>
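<p>The three-way branching can be sketched in Python as well. The probabilities below follow one standard textbook trinomial parameterization for a European call; that choice is an assumption for illustration and is not taken from the Jauvion and Nguyen paper.</p>

```python
import math


def trinomial_call(spot, strike, rate, vol, years, steps):
    """Price a European call on a recombining trinomial tree (up, unchanged, down)."""
    dt = years / steps
    up = math.exp(vol * math.sqrt(2.0 * dt))  # factor applied on an up-move
    growth = math.exp(rate * dt / 2.0)
    half_up = math.exp(vol * math.sqrt(dt / 2.0))
    half_down = 1.0 / half_up
    p_up = ((growth - half_down) / (half_up - half_down)) ** 2
    p_down = ((half_up - growth) / (half_up - half_down)) ** 2
    p_mid = 1.0 - p_up - p_down  # probability that the price stays the same
    discount = math.exp(-rate * dt)
    # Leaf k has a net of (k - steps) up-moves, so there are 2 * steps + 1 leaves.
    values = [max(spot * up ** (k - steps) - strike, 0.0)
              for k in range(2 * steps + 1)]
    # Each node one level closer to the root combines its three children.
    for level in range(steps, 0, -1):
        values = [discount * (p_down * values[k]
                              + p_mid * values[k + 1]
                              + p_up * values[k + 2])
                  for k in range(2 * level - 1)]
    return values[0]


print(round(trinomial_call(100.0, 100.0, 0.05, 0.2, 1.0, 200), 2))
```

<p>Pricing many options at once, as in the 64-option test described above, adds a second layer of independent work on top of the per-level parallelism inside each tree.</p>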

<p>One theme that runs through all the research reports of the use of CUDA for options pricing models is that getting the maximum performance gain requires careful planning to take advantage of the strengths and minimize the weaknesses of GPU processing. But beyond that, it doesn’t seem to have posed any particular challenges. As John Nelson, who writes the Path Dependent blog on programming, complex systems, and trading, wrote: </p><blockquote><p>“I started learning CUDA yesterday; I wrote my first simple CUDA program today. The library does have a non-negligible learning curve, but it is not steep. It largely is a matter of learning the most efficient ways to work with CUDA (e.g. shared, local, or constant memory). Happily, this is an incremental process; You can learn to write bad yet working CUDA applications while slowly learning to write them better; And, as a bonus, even your bad code is likely to run laps around your CPU (for finance apps anyway.)”</p></blockquote>

<p><span style="font-size: 12px;"><span style="font-size: 11px;"><em>This post is an entry in </em><a href="http://blogs.nvidia.com/ntersect/parallel-world/" s_oc="null"><font color="#76b900"><em>The World Isn’t Flat, It’s Parallel</em></font></a><em> series running on nTersect, focused on the GPU’s importance and the future of parallel processing. Today, GPUs can operate faster and more cost-efficiently than CPUs in a range of increasingly important sectors, such as medicine, national security, natural resources and emergency services. For more information on GPUs and their applications, keep your eyes on </em><a href="http://feeds.feedburner.com/nTersect/parallel-world" s_oc="null"><font color="#76b900"><em>The World Isn’t Flat, It’s Parallel</em></font></a><em>.</em></span></span></p><img src="http://feeds.feedburner.com/~r/ntersect/parallel-world/~4/TI2lNFpnRd4" height="1" width="1"/>]]></content:encoded>



<category>CUDA</category>

<category>GPGPU</category>

<category>Parallel World</category>

<category>Steve Wildstrom</category>

<dc:creator>Steve Wildstrom</dc:creator>
<pubDate>Wed, 14 Jul 2010 09:00:00 -0700</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/07/the-world-is-parallel-gpu-processing-for-options-pricing.html</feedburner:origLink></item>

<item>
<title>The World Is Parallel: Teaching CUDA</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/JuxFsL0Wa8Q/the-world-is-parallel-teaching-cuda.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/06/the-world-is-parallel-teaching-cuda.html</guid>
<description>High-performance programming on GPUs has broad potential in a wide array of fields, from chemical research to image processing to business analytics. But bringing these advantages to potential beneficiaries in fields other than computer science requires the creation of applications to use the power of parallel processing. And creating those programs requires educating a new generation of software developers in the techniques of general purpose computing on GPUs. One way that NVIDIA helps this process along is by sponsoring graduate fellowships to students working on Ph.D.s in computer science or allied fields “who are researching topics that...</description>


<content:encoded><![CDATA[High-performance programming on GPUs has broad potential in a wide array of fields, from chemical research to image processing to business analytics. But bringing these advantages to potential beneficiaries in fields other than computer science requires the creation of applications to use the power of parallel processing. And creating those programs requires educating a new generation of software developers in the techniques of general purpose computing on GPUs. One way that NVIDIA helps this process along is by sponsoring graduate fellowships to students working on Ph.D.s in computer science or allied fields “who are researching topics that will lead to major advances in a number of fields, and are investigating innovative ways of leveraging the power of the GPU.”<br /><br />
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e2013483f98836970c-pi" style="display: inline;"><img alt="Patney" class="asset asset-image at-xid-6a00d834515fca69e2013483f98836970c " src="http://blogs.nvidia.com/.a/6a00d834515fca69e2013483f98836970c-320wi" /></a> </center>

<p>Anjul Patney and Duane Merrill are two of this year’s NVIDIA graduate fellows. <a href="http://idav.ucdavis.edu/%7Eanjul/main.htm#begin" target="_blank">Patney</a> is a third-year doctoral student at the University of California at Davis, working under John Owens, a specialist in general purpose programming on GPUs. “I have a strong interest in understanding the evolution of programmable graphics pipelines, a possibility recently enabled by the flexible CUDA, OpenCL and DirectCompute programming models,” Patney says. “I am interested in identifying both software and hardware principles that will define these pipelines in the future. In this pursuit I often run into some very intriguing problems in graphics and parallel computing. It&#39;s a lot of fun.”</p>
<p>His area of interest is graphics rendering. As he wrote in his research statement in applying for the fellowship: “My research goal is to explore infrastructures that allow complex, dynamic, yet fully customizable data structures and algorithms to help build new rendering schemes. I wish to deploy them in an open-source library, to serve as an abstraction over which flexible graphics applications can efficiently interact with the underlying hardware.”</p>
<p>Patney was first exposed to parallel processing techniques as an undergraduate at the Indian Institute of Technology, Delhi. “Early training in parallel and multicore programming is very important for interested undergrads,” he says. </p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20133f0cf3a8e970b-pi" style="display: inline;"><img alt="Merrill" class="asset asset-image at-xid-6a00d834515fca69e20133f0cf3a8e970b " src="http://blogs.nvidia.com/.a/6a00d834515fca69e20133f0cf3a8e970b-320wi" /></a></center>
<p><a href="http://www.cs.virginia.edu/%7Edgm4d/" target="_blank">Merrill</a> is a doctoral student at the University of Virginia. He is working on applying the techniques of GPU processing to some of the basic building blocks of software, things such as sorting, the efficient processing of lists, and graph problems that he describes as “the nuts and bolts of data-intensive computer science, yet are the hardest to make scream on the GPU.” </p>
<p>One of the challenges of GPU programming is that it is relatively new: the first version of CUDA was released only three years ago, so programmers have not had time to create the big libraries of pre-tested, ready-to-run code that can make software development much faster and easier. It is a situation Merrill hopes to help remedy. “We simply don&#39;t have many collections of reusable software components at our disposal, particularly at [the level] where the brunt of cooperative performance gains are derived,” he says. “This is the hardest software to construct, make fast, and get correct,” he adds. “This is why the prospect of writing extremely high performance, high quality code for GPUs is such a challenge.” </p>
<p>Merrill did his undergraduate work at Virginia, then spent several years in industry before returning to graduate school. In his undergraduate days, GPU programming “wasn’t on the radar screen,” but that has changed. “The experience is much different these days,” he says. “Many of the undergraduate computer science students in our department opt to take a parallel-computing elective that has units on SPMD [single-program, multiple data] computing. This course is usually instructed by my advisor, Andrew Grimshaw, and I teach the GPGPU unit, in which we use the CUDA framework exclusively.” </p>
<p>In addition to a $25,000 stipend for the 2010-11 academic year, the fellowship also provides engineering and technical support to nine graduate students at nine different universities. “The relationship NVIDIA fosters with university researchers through its fellowship program provides a conduit for ideas and technology to flow between academia and industry,” says NVIDIA researcher and former NVIDIA fellow Jared Hoberock. “In addition to financial sponsorship, the fellowship affords student researchers a unique opportunity to cultivate a dialogue with the finest engineering minds in the industry.”</p>
<p>NVIDIA will be accepting applications for the 2011-12 graduate fellowships beginning in November. Details on the program and on application procedures are on the <a href="http://research.nvidia.com/relevant/graduate-fellowship-program" target="_blank">NVIDIA website</a>.</p><img src="http://feeds.feedburner.com/~r/ntersect/parallel-world/~4/JuxFsL0Wa8Q" height="1" width="1"/>]]></content:encoded>



<category>CUDA</category>

<category>GPGPU</category>

<category>Parallel World</category>

<category>Steve Wildstrom</category>

<dc:creator>Steve Wildstrom</dc:creator>
<pubDate>Fri, 11 Jun 2010 10:59:25 -0700</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/06/the-world-is-parallel-teaching-cuda.html</feedburner:origLink></item>

<item>
<title>UsefulProgress and NVIDIA GPUs Generating 3D Medical Images</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/f12Qs4rENhE/usefulprogress-and-nvidia-gpus-generate-3d-medical-imaging.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/05/usefulprogress-and-nvidia-gpus-generate-3d-medical-imaging.html</guid>
<description>Periodically, we’re using this blog to profile some of the companies that participated in NVIDIA’s Emerging Companies Summit (ECS 2009). ECS 2010 will take place Sept. 20 - 23 at the San Jose Convention Center in San Jose, California, during the GPU Technology Conference. We’ve written before about how GPUs are transforming the field of medical imaging. In the case of Paris-based startup UsefulProgress, GPUs are enabling a kind of real-life X-ray vision. From a single scan, UsefulProgress technology can produce a high-definition 3D digital anatomy that reveals the underlying layers of bones, vessels, tissues and muscles in a body....</description>


<content:encoded><![CDATA[<em>Periodically, we’re using this blog to profile some of the companies that participated in NVIDIA’s Emerging Companies Summit (ECS 2009). <a href="http://www.nvidia.com/object/emerging_companies_summit.html" target="_blank">ECS 2010 will take place Sept. 20 - 23</a> at the San Jose Convention Center in San Jose, California, during the <a href="http://www.nvidia.com/object/gpu_technology_conference.html" target="_blank">GPU Technology Conference</a>.&#0160; </em><p>We’ve written before about how GPUs are <a href="http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallel-gpus-speed-medical-imaging.html" target="_blank">transforming the field of medical imaging</a>. In the case of Paris-based startup UsefulProgress, GPUs are enabling a kind of real-life X-ray vision. From a single scan, UsefulProgress technology can produce a high-definition 3D digital anatomy that reveals the underlying layers of bones, vessels, tissues and muscles in a body.&#0160;<a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20133edd096ba970b-pi"><br /></a> <br />


<a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20134810229aa970c-pi"><img alt="ContorsionisteV2[1]" class="asset asset-image at-xid-6a00d834515fca69e20134810229aa970c " src="http://blogs.nvidia.com/.a/6a00d834515fca69e20134810229aa970c-320wi" style="display: block; margin-left: auto; margin-right: auto;" /></a> <br /> </p>

<p>The imagery that the UsefulProgress technology produces is stunning. You don’t have to be a medical student to appreciate a detailed 3D fly-through of the human brain or skeletal system (see the <a href="http://www.usefulprogress.com/">UsefulProgress website</a> for examples). Surgeons can use UsefulProgress’s images as a pre-operative dry run, while doctors can employ them for non-invasive diagnostics. Students at the University of Paris Descartes medical school use the technology to study the human body and learn surgical techniques. </p>

<p>At the Emerging Companies Summit, the head of UsefulProgress, Sylvain Ordureau, met with NVIDIA Vice President of Business Development Jeff Herbst to talk about the technology. The proprietary software-hardware solution works with medical imaging technology such as CT, MRI and X-ray. The UsefulProgress solution takes the hundreds of image “slices” produced from such scans and stitches them together into a 3D volume. NVIDIA GPUs and CUDA are used for the image processing as well as the image display. The high resolution (8000x8000 pixels) of the images and large file sizes make this the sort of computational problem ideally suited to parallel processing. 
</p>
<p><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20133edd096ba970b-pi"></a><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20133edd0b4f0970b-pi"><img alt="Brain_18.02.2009_1000000719[1]" class="asset asset-image at-xid-6a00d834515fca69e20133edd0b4f0970b " src="http://blogs.nvidia.com/.a/6a00d834515fca69e20133edd0b4f0970b-320wi" style="display: block; margin-left: auto; margin-right: auto;" /></a> <span style="text-decoration: underline;"><br /></span></p><p>Doctors aren’t the only ones who want a way to peer inside the human body without performing surgery. Archaeologists who work with human mummies have the same need – in many cases their subjects are too fragile to withstand an autopsy. For instance, after scanning an enigmatic mummy at Paris’s Musée de l’Homme, the UsefulProgress technology can create the 3D images within seconds, allowing archaeologists to peer inside the skull cavity of the thousand-year-old subject, discovering clues to its origin and history.</p>

<p>Although medical imaging is its primary use case, UsefulProgress is finding other applications for its 3D volume rendering, including materials scanning, pharmaceutical research and even helping gemologists get a glimpse inside precious stones like diamonds before the stones are cut. </p><p><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20133edd096ba970b-pi"><img alt="CVMain[1]" class="asset asset-image at-xid-6a00d834515fca69e20133edd096ba970b 
 " src="http://blogs.nvidia.com/.a/6a00d834515fca69e20133edd096ba970b-320wi" style="display: block; margin-left: auto; margin-right: auto;" /></a></p><img src="http://feeds.feedburner.com/~r/ntersect/parallel-world/~4/f12Qs4rENhE" height="1" width="1"/>]]></content:encoded>



<category>3D</category>

<category>ECS</category>

<category>GTC</category>

<category>Parallel World</category>

<dc:creator>Alain Tiquet</dc:creator>
<pubDate>Mon, 17 May 2010 17:19:33 -0700</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/05/usefulprogress-and-nvidia-gpus-generate-3d-medical-imaging.html</feedburner:origLink></item>

<item>
<title>The World Is Parallel: Mining Data on GPUs</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/tNITS8XrJ18/the-world-is-parallel-mining-data-on-gpus.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/05/the-world-is-parallel-mining-data-on-gpus.html</guid>
<description>In this series we’ve discussed software that takes advantage of GPU processing in the field of traditional “high-performance computing” or scientific computing domains, such as molecular dynamics, climate modeling, remote sensing and medical imaging. These applications tend to lend themselves naturally to parallel processing, and have a need for serious compute capability. Data mining, on the other hand, may not seem to be a natural fit for parallel processing. Yet at least one data mining software maker is scoring impressive performance gains using GPU processing for online analytical processing (OLAP). OLAP is a technique for taking a deep dive...</description>


<content:encoded><![CDATA[<p>In <a href="http://blogs.nvidia.com/ntersect/parallel-world/" target="_blank">this series</a> we’ve discussed software that takes advantage of GPU processing in the field of traditional “high-performance computing” or scientific computing domains, such as <a href="http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallelgpus-in-chemistry-research.html" target="_blank">molecular dynamics</a>, climate modeling, <a href="http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallel-gpu-computing-tames-satellite-image-processing.html" target="_blank">remote sensing</a> and&#0160;<a href="http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallel-gpus-speed-medical-imaging.html" target="_blank">medical imaging</a>. These applications tend to lend themselves naturally to parallel processing, and have a need for serious compute capability. Data mining, on the other hand, may not seem to be a natural fit for parallel processing. Yet at least one data mining software maker is scoring impressive performance gains using GPU processing for <a href="http://en.wikipedia.org/wiki/Online_analytical_processing" target="_blank">online analytical processing</a> (OLAP). </p>
<p>OLAP is a technique for taking a deep dive into a subset of what may be a very large database. Say, for example, you wanted to analyze sales by product, store location, and time of day. Data for those three variables are compacted into a “cube,” so called because each of the variables can be regarded as a dimension or axis and the entire data space can be visualized as a cube. In the real world, analyses typically involve more variables and the cube becomes an impossible-to-visualize hypercube of many dimensions. </p>
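<p>As a rough illustration of the cube idea, the Python sketch below collapses a handful of made-up sales records into cells keyed by product, store and hour, then sums along one axis. The records and function names are hypothetical and have nothing to do with how Palo stores its data.</p>

```python
from collections import defaultdict

# Hypothetical sales records: (product, store, hour of day, amount).
sales = [
    ("widget", "north", 9, 20.0),
    ("widget", "north", 9, 5.0),
    ("widget", "south", 14, 12.5),
    ("gadget", "north", 14, 30.0),
]


def build_cube(records):
    """Compact raw records into one cell per (product, store, hour) combination."""
    cube = defaultdict(float)
    for product, store, hour, amount in records:
        cube[(product, store, hour)] += amount
    return dict(cube)


def roll_up(cube, axis):
    """Total the cube along one axis (0 = product, 1 = store, 2 = hour)."""
    totals = defaultdict(float)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)


cube = build_cube(sales)
print(roll_up(cube, 0))  # sales totals per product
print(roll_up(cube, 1))  # sales totals per store
```

<p>The four records shrink to three cells because two of them share the same product, store and hour; real cubes gain far more from this kind of compaction.</p>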
<p>The data compression is important because efficient processing requires that the data being analyzed be held completely in memory. Mattias Krämer, vice president for technology at OLAP software maker Jedox AG, says that multidimensional OLAP techniques can allow 20 GB of data to be compacted into a 2 GB cube, small enough to be stored in memory even on a relatively modest system. </p>
<p>Jedox, based in Freiburg, Germany, makes a set of tools called Palo Suite that, among other things, lets analysts run OLAP using familiar tools such as Microsoft Excel. The newest version of Palo, developed in cooperation with the University of Freiburg and the University of Western Australia, uses NVIDIA’s CUDA C to boost OLAP performance through GPU processing. </p>
<p style="text-align: center;"><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20133ed5b726c970b-pi" style="display: inline;"><img alt="Jedox" class="asset asset-image at-xid-6a00d834515fca69e20133ed5b726c970b " src="http://blogs.nvidia.com/.a/6a00d834515fca69e20133ed5b726c970b-320wi" title="Jedox" /></a><br />Credit: Jedox AG </p>

<p></p>
<p>The company explains the advantage of the GPU approach by analogy. Say you had to deliver newspapers to a large number of homes. You could use a truck and find the most efficient route to visit the houses one after another. Or you could use a fleet of bicycles, each delivering a paper to one house. The bicycles (GPU processor cores) are slower and less powerful than the truck (the CPU), but the combination of sheer numbers and the elimination of the need to find an optimal route makes this method more efficient. </p>
<p>Interestingly, Palo GPU is part of the component of the suite that runs on a server. Servers are typically equipped with the most minimal graphics systems. They often lack displays altogether since they are frequently administered remotely, and even when a local display is present, it is rarely used for anything more demanding than server administration or scanning logs. </p>
<p>But Palo GPU makes a case for equipping a server with one or more high-end GPUs exclusively to get the benefit of GPU parallel processing. In fact, a number of server makers just announced integrated CPU-GPU servers and blade systems using <a href="http://blogs.nvidia.com/ntersect/2010/05/the-first-wave-of-tesla-20series-based-server-products.html" target="_blank">NVIDIA Tesla 20-series (Fermi) GPUs</a>.</p>
<p>Depending on the structure of the OLAP cube, the company has been seeing performance boosts of 40X to 100X compared to CPU processing. “As performance [has been] the key factor in business intelligence applications for years now, these are really good numbers,” Krämer says. </p>
<p>“The tremendous computing power of today’s GPUs is achieved by using an array of processor cores that outnumbers current CPU cores by almost two orders of magnitude,” Jedox says in a soon-to-be-published white paper. “The massive parallelism offered by a GPU has been used to solve many problems with speedups ranging from tens to hundreds compared to a single processor. The popularity of general-purpose computing on graphics processing units (GPGPU) has gained further momentum with the releases of programming interfaces such as NVIDIA’s CUDA C and OpenCL or ATI Stream SDK which allow programmers to develop algorithms for GPUs using common languages such as C with only minimal extensions. The CUDA framework in particular has led to a dramatic increase in applications implemented for GPUs. Apart from graphics applications, GPUs are nowadays utilized in many other areas of computing, such as physics simulations, protein folding, cryptanalysis, and many more.” </p>
<p>While Krämer calls CUDA “a very good development tool for our purposes,” there are some improvements he would like to see. One is better support for the C++ programming language, the object-oriented extension of C. Another, more basic, change would be to make it easier for CUDA applications to run as “services” under Windows, an approach that makes them more reliable and easier to administer. NVIDIA recently released a new Tesla Compute Cluster driver that addresses this issue for Tesla products. </p>
<img src="http://feeds.feedburner.com/~r/ntersect/parallel-world/~4/tNITS8XrJ18" height="1" width="1"/>]]></content:encoded>



<category>HPC</category>

<category>Parallel World</category>

<category>Steve Wildstrom</category>

<category>Tesla</category>

<category>Visual computing</category>

<dc:creator>Steve Wildstrom</dc:creator>
<pubDate>Fri, 07 May 2010 09:00:00 -0700</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/05/the-world-is-parallel-mining-data-on-gpus.html</feedburner:origLink></item>

<item>
<title>The World Is Parallel: GPUs Speed Medical Imaging</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/sKwKcaUU9J8/the-world-is-parallel-gpus-speed-medical-imaging.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallel-gpus-speed-medical-imaging.html</guid>
<description>Medical imaging began with the discovery of X-rays in the 19th century and the old-fashioned X-ray remains a powerful diagnostic tool. These days, however, doctors can often do much, much better with modern imaging technologies. While the traditional X-ray requires nothing more than a source of the rays and a sheet of film, today’s imaging is computationally intense. Increasingly, that means the graphics systems of the computers at the heart of imaging technology end up doing double duty. High-performance graphics processors drive the super high-resolution displays that these devices demand. And the enormous processing power of those GPUs is used...</description>


<content:encoded><![CDATA[<p><a href="http://blogs.nvidia.com/ntersect/medical/" target="_blank">Medical</a> imaging began with the discovery of <a href="http://en.wikipedia.org/wiki/X-ray" target="_blank">X-rays</a> in the 19th century and the old-fashioned X-ray remains a powerful diagnostic tool. These days, however, doctors can often do much, much better with modern imaging technologies. While the traditional X-ray requires nothing more than a source of the rays and a sheet of film, today’s imaging is computationally intense.</p>
<p>Increasingly, that means the graphics systems of the computers at the heart of imaging technology end up doing double duty. <a href="http://blogs.nvidia.com/nTersect/hpc/" target="_blank">High-performance graphics</a> processors drive the super high-resolution displays that these devices demand. And the enormous processing power of those GPUs is used for the computational heavy lifting needed to create the images.</p>
<p><a href="http://en.wikipedia.org/wiki/X-ray_computed_tomography" target="_blank">Computerized tomography</a> (CT) is the oldest and still one of the most useful of the computerized imaging techniques. A CT scanner takes a series of X-ray pictures by rotating around the body. These pictures are then assembled to create an image of a cross-section of the body. A series of these slices can be used to construct a 3D image of a section of the body. </p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20133ecd8fc82970b-pi" style="display: inline;"><img alt="Ct_scan[1]" class="asset asset-image at-xid-6a00d834515fca69e20133ecd8fc82970b " src="http://blogs.nvidia.com/.a/6a00d834515fca69e20133ecd8fc82970b-320wi" /></a> <br /><em>National Institutes of Health</em><br /></center>
<p>This sounds simple, but it takes a tremendous amount of math to get from individual exposures to the reconstructed image. Specifically, it involves the computation of a large number of Fourier transforms whose effect is to use data about neighboring points to increase the information about each point in the image. 
</p>

<p>If you have been <a href="http://blogs.nvidia.com/ntersect/parallel-world/" target="_blank">following this series</a>, you know that this is the sort of computation that profits tremendously from parallel processing, since the computation for each point can be carried out independently of any others. The effect can be dramatic. <a href="http://www.cs.sunysb.edu/%7Emueller/" target="_blank">Klaus Mueller’s</a> RapidCT project at the State University of New York-Stony Brook found that using GPU processing could reduce the time needed for a CT scan reconstruction from 135 seconds to less than seven seconds. And that was on an <a href="http://www.nvidia.com/page/geforce_8800.html" target="_blank">NVIDIA GeForce 8800 GTX GPU</a>, a fairly modest unit by current high-performance standards.</p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e2013480090f64970c-pi" style="display: inline;"><img alt="Fmri" class="asset asset-image at-xid-6a00d834515fca69e2013480090f64970c " src="http://blogs.nvidia.com/.a/6a00d834515fca69e2013480090f64970c-320wi" /></a> <br /><em>UC San Diego</em></center>
<p>X-rays, including CT scans, depend on the fact that certain tissues, such as bones, absorb far more radiation than others. Magnetic resonance imaging (MRI) depends on the different responses of tissue types to strong magnetic fields and is more effective than CT in resolving soft tissue. Like CT, MRI depends on mathematics to reconstruct the data into an image. MRI math is a lot more complicated, but again, it can be set up for efficient parallel processing.</p>
<p>The <a href="http://www.cabiatl.com/CABI/" target="_blank">Georgia Tech Center for Advanced Brain Imaging</a> (CABI) uses Jacket software from Accelereyes, which supports GPU processing in MathWorks MATLAB. CABI uses functional MRI (fMRI) scans to observe brain activity as it happens. </p>
<p>The key challenge in analyzing the scans is “segmentation,” the process of dividing the brain image into tissue types, such as blood vessels, grey matter, white matter and cerebro-spinal fluid. CABI uses Accelereyes Statistical Parametric Mapping for b-spline interpolation, a mathematical technique that accurately fits smooth curves to data points. </p>
<p>“The main factor that makes b-spline interpolation work nicely with parallelization is that it can be performed independently over individual data points,” says Ani Dasgupta, a Georgia Tech graduate student at CABI. “So as opposed to a CPU performing interpolation one data point at a time, a GPU would take up a chunk of points and interpolate them in parallel, since the interpolation of one point does not depend on the outcome of others.” </p>
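<p>The independence Dasgupta describes can be illustrated with a minimal Python sketch. It is not CABI's code and does not use Jacket or CUDA: the sample values, the function names, and the use of simple linear interpolation as a stand-in for b-splines are all assumptions made for illustration. The point is only that each output value depends on its own input and nothing else, so the work can be handed out to workers in any order.</p>

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: signal values measured at integer positions 0..4.
SAMPLES = [0.0, 2.0, 4.0, 3.0, 1.0]


def interpolate(x):
    """Linearly interpolate SAMPLES at position x (a stand-in for b-splines)."""
    # Clamp to the last interval so x == len(SAMPLES) - 1 stays in range.
    i = min(int(x), len(SAMPLES) - 2)
    t = x - i
    return (1.0 - t) * SAMPLES[i] + t * SAMPLES[i + 1]


def interpolate_serial(points):
    # One point at a time, as a single CPU core would do it.
    return [interpolate(x) for x in points]


def interpolate_chunked(points, workers=4):
    # No point depends on another, so workers can take them in any order;
    # pool.map still returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(interpolate, points))


points = [0.5, 1.25, 2.0, 3.5, 4.0]
print(interpolate_serial(points))   # [1.0, 2.5, 4.0, 2.0, 1.0]
print(interpolate_chunked(points))  # same values, computed independently
```

<p>Python threads will not make this toy example faster; the sketch only shows that the serial and the farmed-out versions give identical answers, which is the property a GPU exploits across thousands of points at once.</p>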
<p>Parallel processing on the GPU has resulted in a 3.6X speedup of segmentation time, compared with CPU-only processing on an Intel quad-core Nehalem-class processor. More recent tests point to a speedup of up to 15X.</p>
<p>Dasgupta also says that GPU-based parallel processing has gotten much easier: “Another important factor in favor of this is the ease of programming with GPUs after the advent of CUDA from nVIDIA. Traditionally, solving non-graphics problems on the GPU used to be very difficult, involving treating non-graphics data like vertices or pixel data points and using complicated graphics APIs to process this information. But with CUDA, a person who knows the C language can get around with writing a program for the GPU. So things are now better and easier for scientists from other fields, should they wish to do such work, than it was before.”</p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e2013480091d4f970c-pi" style="display: inline;"><img alt="Ultrasound" class="asset asset-image at-xid-6a00d834515fca69e2013480091d4f970c " src="http://blogs.nvidia.com/.a/6a00d834515fca69e2013480091d4f970c-320wi" /></a> <br /><em>Spencer Technologies</em></center>
<p>Ultrasound imaging uses high-frequency sound waves to create pictures of internal tissues. But again, it takes a lot of calculation to convert what is basically a pattern of echoes into a usable image. <a href="http://www.spencertechnologies.com/" target="_blank">Spencer Technologies</a>, another Accelereyes customer, uses Doppler ultrasound, which, like Doppler radar, uses frequency shifts to detect motion, to analyze blood flow in the brain or motion of the brain relative to the skull. Using an NVIDIA GeForce GTX 280 and Accelereyes Jacket software, Spencer was able to calculate brain displacement <a href="http://www.accelereyes.com/resources/braindisplacement" target="_blank">12 times faster</a> than with CPU processing alone. Spencer thinks it will soon be able to get a 16X improvement. </p>
<p>The bottom line on all these projects is that GPU-based parallel computing is producing better medical images faster. And as the tools get easier to use, the benefits should spread.</p>
]]></content:encoded>



<category>GPGPU</category>

<category>HPC</category>

<category>Medical</category>

<category>Parallel World</category>

<category>Steve Wildstrom</category>

<category>Tesla</category>

<dc:creator>Steve Wildstrom</dc:creator>
<pubDate>Wed, 21 Apr 2010 16:55:44 -0700</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallel-gpus-speed-medical-imaging.html</feedburner:origLink></item>

<item>
<title>The World Is Parallel: GPU Computing Tames Satellite Image Processing</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/LpWQMX44kjw/the-world-is-parallel-gpu-computing-tames-satellite-image-processing.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallel-gpu-computing-tames-satellite-image-processing.html</guid>
<description>Web mapping services and programs such as Google Earth have made all of us users of satellite images that not very long ago were available mainly to government intelligence agencies. There is a great deal more to these images than meets the eye, and extracting that information turns out to be a major computational challenge. But like most problems involving graphical data, the processing of remote sensing data such as satellite imagery is ideally suited for parallel processing and can be accomplished at relatively low cost by doing much of the computation on graphics processing units. Adobe Photoshop is an...</description>


<content:encoded><![CDATA[<p>Web mapping services and programs such as <a href="http://earth.google.com/" target="_blank">Google Earth</a> have made all of us users of satellite images that not very long ago were available mainly to government intelligence agencies. There is a great deal more to these images than meets the eye, and extracting that information turns out to be a major computational challenge. But like most problems involving graphical data, the processing of remote sensing data such as satellite imagery is ideally suited for parallel processing and can be accomplished at relatively low cost by doing much of the computation on graphics processing units. </p>
<p>Adobe Photoshop is an example of an image enhancement tool that many computer users are familiar with (and one that uses GPU processing when available to speed its often intense computations). Photoshop, however, is primarily intended for photographers and other creative artists, while researchers generally need specialized, and often automated, tools to extract the information they need. </p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e201347fbd0f80970c-pi" style="display: inline;"><img alt="Nvidiahq[1]" class="asset asset-image at-xid-6a00d834515fca69e201347fbd0f80970c " src="http://blogs.nvidia.com/.a/6a00d834515fca69e201347fbd0f80970c-320wi" /></a> <br /><em> Satellite view of NVIDIA’s Santa Clara, CA, campus in Google Earth</em></center>
<p>Intelligence work remains one of the important uses of image enhancement. The unmanned Predator aircraft over Afghanistan and Iraq are flown by remote control from Nellis Air Force Base in Nevada, and depend on real-time processing of images from onboard cameras, satellites, and other sources to identify and verify targets. The capabilities of the U.S. National Reconnaissance Office’s Keyhole image intelligence satellites are classified, but it is generally believed that they are capable, with enhancement, of resolving features as small as a few centimeters (the U.S. government generally limits civilian satellite imagery to a resolution of one-half meter). </p>
<p>Scientific research is also a huge user of image processing, and climate change is an area of particular concern. Tracking the shrinking of polar ice caps and the extent of sea ice from space is relatively easy, but researchers are often interested in much more subtle alterations, such as the changes in species of trees and other plants that occur with variation in temperature and rainfall. This requires much better resolution than spotting an iceberg. The mathematical techniques used to process these remote sensing images are computationally intensive and can benefit greatly from GPU computing techniques. 
</p>

<p>A number of image enhancement techniques can make the information hidden in satellite photos usable. The laws of physics limit how small an object can be resolved depending on the size of the lens or mirror and the distance to the target. But image processing can do better by bringing out features that the optics alone leave fuzzy. One way to improve on that is through edge detection, a mathematical technique that can be used to sharpen out-of-focus images by figuring out where sharp edges should be. </p>
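<p>As a rough illustration of what edge detection computes, here is a minimal Python sketch of the Sobel operator, one common edge detector. The article does not say which method any particular tool uses, so the choice of Sobel, the tiny 5x5 test image, and the function name are assumptions made for illustration.</p>

```python
import math

# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
IMAGE = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

# Sobel kernels estimate the brightness gradient across and down the image.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude per pixel; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            # Weighted sum over the 3x3 neighborhood centered on (r, c).
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


edges = sobel_magnitude(IMAGE)
print(edges[2])  # [0.0, 36.0, 36.0, 0.0, 0.0]
```

<p>The large values mark the columns where dark meets bright, and the flat regions come out as zero. Each output pixel depends only on its own 3x3 neighborhood in the input, which is why this kind of filter maps so naturally onto the many cores of a GPU.</p>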
<p>False color imaging is another technique that can be used to discover hidden data in images. Sometimes it is used to highlight features, such as differing types of vegetation, that would be hard to spot in a true color image, where all green plants, for example, look pretty much alike. A common use of false color is to display data that would otherwise be invisible because it was captured outside the spectrum of visible light, usually in the infrared or ultraviolet. </p>
<p><a href="http://mcanty.homepage.t-online.de/" target="_new">Mort Canty</a> is a physicist at the Jülich Research Center in southwestern Germany, where he does research in remote sensing and image processing, including geometric and radiometric corrections and map projections of satellite data. He has become a convert to GPU processing to get the maximum computational bang for the buck. (For the technically inclined, he mostly uses the IDL image processing language and ENVI software from <a href="http://www.ittvis.com/ProductServices/ENVI.aspx" target="_blank">ITT Visual Information Solutions</a> and is the author of <a href="http://www.crcpress.com/product/isbn/9781420087130" target="_blank">Image Analysis, Classification, and Change Detection in Remote Sensing: With Algorithms for ENVI/IDL</a>. He’s also been experimenting with Tech-X’s <a href="http://blogs.nvidia.com/ntersect/2010/03/the-world-is-parallel-techx-makes-gpu-processing-accessible.html" target="_blank">GPUlib</a> as another way to run his computations on an NVIDIA GPU using CUDA.)</p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e201347fbd12c8970c-pi" style="display: inline;"><img alt="Kkmeans[1]" class="asset asset-image at-xid-6a00d834515fca69e201347fbd12c8970c " src="http://blogs.nvidia.com/.a/6a00d834515fca69e201347fbd12c8970c-320wi" /></a> <br /><em> ENVI used to process false color image of nonlinear clustering (Mort Canty) </em></center>
<p>“Recently I have been working with nonlinear image transformations, which are quite computationally intensive,” he says. “GPULib is giving me a tenfold speedup on an old NVIDIA GeForce graphics card (which I hope to replace soon with a Fermi GPU).” He gives the main credit to CUDA’s extremely fast execution of matrix operations: “For someone with limited time or programming abilities, a high-level package like GPULib can open a door to parallel processing which might otherwise have remained closed.” </p>
<p>With satellite data now available either free (from Google and Microsoft) or in specialized forms at low cost, image enhancement software proliferating, and GPUs providing massive processing power, the use of image enhancement to extract information from remote sensing data is likely to explode. </p>

]]></content:encoded>



<category>GPGPU</category>

<category>HPC</category>

<category>Parallel World</category>

<category>Steve Wildstrom</category>

<category>Tesla</category>

<category>Visual computing</category>

<dc:creator>Steve Wildstrom</dc:creator>
<pubDate>Thu, 08 Apr 2010 14:00:00 -0700</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallel-gpu-computing-tames-satellite-image-processing.html</feedburner:origLink></item>

<item>
<title>The World Is Parallel—GPUs in Chemistry Research</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/v500U5OpfgM/the-world-is-parallelgpus-in-chemistry-research.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallelgpus-in-chemistry-research.html</guid>
<description>We don’t usually think of chemicals as having shapes. But from the simple V of a water molecule to the intricate folds of a protein, the shape of a chemical compound plays a critical role in how it reacts with other molecules. For example, many drugs work by binding to specific receptors in cells, a process that depends on a precise match between the shape of the drug molecule and of the receptor. The shape of a molecule is determined by the interactions of the electrons in its constituent atoms, ultimately at the level of quantum physics. In the simple...</description>


<content:encoded><![CDATA[<p>We don’t usually think of chemicals as having shapes. But from the simple V of a water molecule to the intricate folds of a protein, the shape of a chemical compound plays a critical role in how it reacts with other molecules. For example, many drugs work by binding to specific receptors in cells, a process that depends on a precise match between the shape of the drug molecule and of the receptor.
</p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20133ec631fc1970b-pi" style="display: inline;"><img alt="Gtx280_folding_2" class="asset asset-image at-xid-6a00d834515fca69e20133ec631fc1970b " src="http://blogs.nvidia.com/.a/6a00d834515fca69e20133ec631fc1970b-320wi" /></a> <br /></center>
<p>
The shape of a molecule is determined by the interactions of the electrons in its constituent atoms, ultimately at the level of quantum physics. In the simple case of water, the physics causes the two hydrogen atoms to bond to oxygen at a 105° angle. Proteins, however, can contain thousands of atoms arranged in a helix that twists and turns into complex shapes.
</p>

<p>As in so many other fields, the immense power of computers to complete in seconds computations that would take human beings lifetimes has revolutionized chemistry, giving rise to the collaboration of chemistry, physics, mathematics and computer science known as <a href="http://en.wikipedia.org/wiki/Computational_chemistry" target="_blank">computational chemistry</a>. Pharmaceutical research, for example, used to be a hit-or-miss process testing thousands of chemicals for pharmacological effects, with far more misses than hits. Now researchers are more likely to figure out what sort of molecule they need, set out to design it, and then figure out a way to synthesize it.
</p>

<p>For large molecules--and proteins and other biologically active molecules are often very large--this becomes a daunting computational task. Fortunately, it is one that lends itself well to the efficiencies of parallel computing. Most of this work used to be carried out on supercomputers or custom-designed clusters of workstations and servers. More recently, the work is moving to massively multi-core graphics processing units, such as NVIDIA’s <a href="http://www.nvidia.com/object/tesla_computing_solutions.html" target="_blank">Tesla</a>. </p>



<p>The results of this can be very impressive. TeraChem, from <a href="http://www.petachem.com/" target="_blank">PetaChem LLC</a>, is a quantum chemistry software package optimized for GPUs. Running analyses of several molecules on a workstation with four Tesla GPUs, TeraChem performed 8 to 50 times faster than the widely used General Atomic and Molecular Electronic Structure System (GAMESS) software running on a cluster of 256 quad-core CPUs. A quad Tesla workstation is hardly your garden-variety desktop (the 240-core Tesla C1060s go for about $1,300 apiece), but the setup outperformed far more expensive and complex hardware.
</p>

<p>
Harvard chemist <a href="http://aspuru.unix.fas.harvard.edu/About/" target="_blank">Alán Aspuru-Guzik</a> is a convert to GPU computing. His quantum chemistry research group analyzes molecules using electron correlation. This approach requires solutions to Schrödinger equations, differential equations that describe changes in the state of the system over time. An exact solution to a Schrödinger equation requires knowing all possible quantum states at the same time, a complicated version of the famous thought experiment of Schrödinger’s cat, which may be alive, dead, or both inside a sealed box. That’s a problem that can only be solved on a quantum computer, a device that unfortunately exists only in computer science labs, and there only in a primitive and not very usable state. Eventually, quantum computers will be available to computational chemists. But, says Aspuru-Guzik, “it will take a decade, maybe two decades. It’s hard to predict.”</p>
<p>Lacking quantum computers, researchers have to settle for close approximations to the exact solutions, but even these require tremendous computational effort. “In the meantime we have the GPU,” says Aspuru-Guzik. “The GPU is a very attractive alternative because it is cheap. It’s the future of computing.”</p>

<p>Aspuru-Guzik’s toolkit includes Q-Chem 3.1, a commercial quantum chemistry program, and <a href="http://developer.download.nvidia.com/compute/cuda/1_0/CUBLAS_Library_1.0.pdf" target="_blank">CUBLAS</a>, a high-powered linear algebra system based on CUDA, NVIDIA’s technology for general computing on GPUs. His group found that using GPUs to assist in the multiplication of large matrices, another job ideally suited to parallel processing, sped the task by a factor of 13 over the use of the CPU alone.</p>
<p>Most of us, of course, will never try to figure out the interactions of electrons in a molecule, nor will we have any use for a quantum computer. But the result of this work is important to all of us because it means better understanding of basic chemical processes and ultimately such things as faster development of new drugs. You can even get involved through a Stanford University project called <a href="http://folding.typepad.com/" target="_blank">Folding@home</a>, which uses idle time on thousands of computers to compute protein folding. And if your computer has a modern GPU, you’ll become part of the parallel revolution.</p>
]]></content:encoded>



<category>CUDA</category>

<category>GPGPU</category>

<category>HPC</category>

<category>Parallel World</category>

<category>Steve Wildstrom</category>

<category>Tesla</category>

<dc:creator>Steve Wildstrom</dc:creator>
<pubDate>Thu, 01 Apr 2010 12:20:05 -0700</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/04/the-world-is-parallelgpus-in-chemistry-research.html</feedburner:origLink></item>

<item>
<title>The World Is Parallel: Tech-X Makes GPU Processing Accessible</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/JzLcM6r8e8w/the-world-is-parallel-techx-makes-gpu-processing-accessible.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/03/the-world-is-parallel-techx-makes-gpu-processing-accessible.html</guid>
<description>Lots of researchers who do computationally intense work could use more processing power. Many of them actually have that power available on their computers, but haven’t found a way to take advantage of it. The computational clout is in the multiple processor cores of the computer’s graphics system, where it is not easily accessible. A tool like NVIDIA’s CUDA parallel computing model makes the GPU cores, up to 240 of them on the latest NVIDIA Tesla GPUs, available to programs. But to take maximum advantage of it, you have to be a skilled C or C++ programmer. The problem is...</description>


<content:encoded><![CDATA[<p>Lots of researchers who do computationally intense work could use more processing power. Many of them actually have that power available on their computers, but haven’t found a way to take advantage of it. The computational clout is in the multiple processor cores of the computer’s graphics system, where it is not easily accessible.</p>

<p>A tool like NVIDIA’s <a href="http://www.nvidia.com/object/GPU_Computing.html" target="_blank">CUDA parallel computing model</a> makes the GPU cores, up to 240 of them on the latest NVIDIA Tesla GPUs, available to programs. But to take maximum advantage of it, you have to be a skilled C or C++ programmer. The problem is that many of the people who would benefit most from high-performance computing are not software developers by profession. They write customized code out of necessity, but their primary work is in chemistry, geology, astronomy, physics or biology.</p>
<p>
<a href="http://www.txcorp.com" target="_blank">Tech-X Corp.</a>, a Boulder, CO, software and consulting company specializing in high-performance scientific computing, is working to change that. Its GPUlib is a tool that brings GPU-based computing into the high-level tools used by researchers, including <a href="http://www.ittvis.com/ProductServices/IDL.aspx" target="_blank">ITT Visual Information Solutions’ IDL</a>, Mathworks’ <a href="http://www.mathworks.com/" target="_blank">MATLAB</a>, and that trusty old laboratory standby, Fortran.</p>

<p>“Parallel computing used to be a very elite field,” says Peter Messmer, vice president for space applications at Tech-X. “Few applications are designed to take advantage of it. GPU processing makes it much more mainstream.” Until GPU processing came along, the cheapest way to get very high performance in the lab was by building a cluster of relatively inexpensive PCs, but this took skills that researchers who weren’t computer scientists or electrical engineers often lacked. “The GPU makes it much more mainstream,” says Messmer.</p>


<p>GPU cores are best at <a href="http://en.wikipedia.org/wiki/Vector_processor" target="_blank">vector processing</a>, math in which large arrays of data are manipulated simultaneously, since that is what is needed for the GPU’s primary task of rendering graphics. That made Tech-X’s choice of working with IDL and MATLAB a natural, since these tools are already optimized for manipulating vector data. Typical uses include image processing for astronomical and remote sensing data and medical imaging.</p>

<p>Messmer says a major challenge is just getting researchers to try GPU computing. “People have heard a lot about GPU computing, but they are skeptical,” he says. “They remember the field-programmable gate array hype from a few years ago, where it turned out to be too complicated for people to do anything. GPUlib helps because it maps to how people already think about problems.”</p>

<p>GPUlib is free for academic use and $495 for commercial use. It is available for Windows, Mac, and Linux.</p>

]]></content:encoded>



<category>CUDA</category>

<category>GPGPU</category>

<category>HPC</category>

<category>Parallel World</category>

<category>Steve Wildstrom</category>

<dc:creator>Steve Wildstrom</dc:creator>
<pubDate>Thu, 11 Mar 2010 16:00:00 -0800</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/03/the-world-is-parallel-techx-makes-gpu-processing-accessible.html</feedburner:origLink></item>

<item>
<title>The World is Parallel: The Opportunity and Challenge of Parallel Computing </title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/q9YfBhnjzMs/the-world-is-parallel-the-opportunity-and-challenge-of-parallel-computing-.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/03/the-world-is-parallel-the-opportunity-and-challenge-of-parallel-computing-.html</guid>
<description>This post is an introduction to a series of reports on computer scientists and other researchers who are unlocking the high-performance computing potential of parallel programming using large numbers of processor cores. But first some background on the opportunity and the challenge of parallel computing. Some time around the middle of the last decade, the race to ever-faster computing hit the wall. Until then, designers had delivered soaring performance through three well-understood technologies: shrinking the already microscopic transistors, cramming more of them into each processor, and running them at higher speeds. The problem was that faster processor performance translated into...</description>


<content:encoded><![CDATA[<p>This post is an introduction to a <a href="http://blogs.nvidia.com/ntersect/parallel-world/" target="_blank">series of reports</a> on computer scientists and other researchers who are unlocking the high-performance computing potential of parallel programming using large numbers of processor cores. But first some background on the opportunity and the challenge of parallel computing.</p>
<p>Some time around the middle of the last decade, the race to ever-faster computing hit the wall. Until then, designers had delivered soaring performance through three well-understood technologies: shrinking the already microscopic transistors, cramming more of them into each processor, and running them at higher speeds.</p>
<p>The problem was that faster processor performance translated into higher power consumption and more heat, and even if you could find a way to get rid of the excess heat before the chips fried, continuation of the trend posed unacceptable economic and environmental costs.</p>
<p>An alternative route to faster computing had been around for some time. Instead of driving the processors harder, use more of them. Mainframe computers and servers had long used multiple processors to handle heavy loads, but advances in chip technology made it possible to combine multiple processors on a single chip, an approach that is both more efficient and much cheaper. Today, high-performance computing is a story of dividing computational workloads over multiple processor cores. In the case of personal computers, this means both a handful of cores in the CPU and dozens, sometimes hundreds, of cores in the graphics processing unit (GPU).</p>

<p>But multiprocessor hardware brings with it a significant software challenge. From the beginning of modern computing in the 1940s, programs had been designed to work sequentially. Funding, mostly by the Defense Advanced Research Projects Agency, produced some successes in systems with large numbers of processors designed to solve computational problems by breaking them into many pieces that could be run simultaneously, but these massively parallel systems never achieved commercial viability.</p>
<p>One reason is that most common computing problems, and the algorithms used to solve them, are not well suited to this sort of breakup. And sequential thinking seems to be wired into our brains. Neuroscientist Jill Bolte Taylor says the right hemisphere of the brain, which processes sensory signals, does parallel processing but the left hemisphere, which is responsible for analytic thinking, “functions like a serial processor.” For better or worse, programming is a left-brain activity.</p>
<center>
<object height="326" width="446"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" /><param name="allowFullScreen" value="true" /><param name="wmode" value="transparent" /><param name="bgColor" value="#ffffff" /><param name="flashvars" value="vu=http://video.ted.com/talks/dynamic/JillBolteTaylor_2008-medium.flv&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/JillBolteTaylor-2008.embed_thumbnail.jpg&amp;vw=432&amp;vh=240&amp;ap=0&amp;ti=229&amp;introDuration=16500&amp;adDuration=4000&amp;postAdDuration=2000&amp;adKeys=talk=jill_bolte_taylor_s_powerful_stroke_of_insight;year=2008;theme=medicine_without_borders;theme=top_10_tedtalks;theme=master_storytellers;theme=how_the_mind_works;event=TED2008;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" />
 <embed allowfullscreen="true" bgcolor="#ffffff" flashvars="vu=http://video.ted.com/talks/dynamic/JillBolteTaylor_2008-medium.flv&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/JillBolteTaylor-2008.embed_thumbnail.jpg&amp;vw=432&amp;vh=240&amp;ap=0&amp;ti=229&amp;introDuration=16500&amp;adDuration=4000&amp;postAdDuration=2000&amp;adKeys=talk=jill_bolte_taylor_s_powerful_stroke_of_insight;year=2008;theme=medicine_without_borders;theme=top_10_tedtalks;theme=master_storytellers;theme=how_the_mind_works;event=TED2008;" height="326" pluginspace="http://www.macromedia.com/go/getflashplayer" src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" type="application/x-shockwave-flash" width="446" wmode="transparent" /></object></center>
<p>The biggest mathematical impediment to parallel approaches is that many processes are recursive: each step depends on the result of previous steps. Consider the simple problem of finding the greatest common divisor of two integers. The standard method of doing this, the Euclidean algorithm, has been known for over 2,000 years and uses repeated subtraction.</p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e20120a8fc07eb970b-pi" style="display: inline;"><img alt="Euclid" class="asset asset-image at-xid-6a00d834515fca69e20120a8fc07eb970b " src="http://blogs.nvidia.com/.a/6a00d834515fca69e20120a8fc07eb970b-320wi" /></a> <br /></center>
<p>For example, if you want to find the greatest common divisor of 2,987 and 1,751, start by subtracting 1,751 from 2,987. Then keep subtracting the smaller of the two remaining numbers from the larger until the result is 0. In this case, the last nonzero value is 103, the greatest common divisor of the two numbers. It’s a beautiful and efficient process, but it is inherently sequential because each subtraction depends on the previous result.</p>
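<p>The subtraction procedure can be sketched in a few lines of Python. This is only an illustration; the function name <code>gcd_by_subtraction</code> is made up for this example, and the loop stops when the two values are equal, which is the step just before the result would reach 0.</p>

```python
def gcd_by_subtraction(a, b):
    """Greatest common divisor of two positive integers by repeated subtraction."""
    if a <= 0 or b <= 0:
        raise ValueError("both inputs must be positive integers")
    while a != b:
        # Always subtract the smaller value from the larger one,
        # so the result never goes negative.
        if a > b:
            a -= b
        else:
            b -= a
    # When the two values are equal, one more subtraction would give 0,
    # so this shared value is the greatest common divisor.
    return a


print(gcd_by_subtraction(2987, 1751))
```

<p>Every pass through the loop needs the values left behind by the pass before it, so the work cannot be handed out to several processors at once.</p>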
<p>The great exception to the dominance of serial thinking is graphics. A simple and very common need in graphics is rotating an image. If you remember some trigonometry, you may recall a simple formula to rotate a point counterclockwise through an angle &Theta;:</p>
<center><a href="http://blogs.nvidia.com/.a/6a00d834515fca69e201310f62ee22970c-pi" style="display: inline;"><img alt="Rotate" class="asset asset-image at-xid-6a00d834515fca69e201310f62ee22970c " src="http://blogs.nvidia.com/.a/6a00d834515fca69e201310f62ee22970c-500wi" /></a> <br /></center>
<p>The importance of this is that each point can be processed independently of every other point. If you had as many processors as points, the entire transformation could be computed in a single, massively parallel operation. And the same is true of many more complex graphics tasks. </p>
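<p>As a rough sketch of that independence, the standard rotation about the origin, x' = x cos &Theta; - y sin &Theta; and y' = x sin &Theta; + y cos &Theta;, can be written as a function of a single point. The names <code>rotate_point</code> and <code>rotate_image</code> are illustrative only, and this plain Python version runs the points one after another; on a GPU, each point could instead be given to its own thread.</p>

```python
import math


def rotate_point(point, theta):
    """Rotate one (x, y) point counterclockwise about the origin by theta radians."""
    x, y = point
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (x * cos_t - y * sin_t, x * sin_t + y * cos_t)


def rotate_image(points, theta):
    # Each output depends only on its own input point, so these calls
    # could run in any order, or all at once on separate cores.
    return [rotate_point(p, theta) for p in points]


corners = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
rotated = rotate_image(corners, math.pi / 2)  # quarter turn counterclockwise
print(rotated)
```

<p>Unlike the subtraction loop in Euclid's algorithm, no call to <code>rotate_point</code> needs the result of any other call.</p>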
<p>The parallel-friendly nature of graphics work led to the early incorporation of a multi-processor architecture into graphics processing units (GPUs). NVIDIA’s top-of-the-line Tesla GPUs currently feature 240 processor cores. While these cores are not as flexible as CPU cores, they excel at certain tasks, such as the vector operations that lie at the heart of many intense computational problems.</p>
<p>Software for effective utilization of large numbers of cores, both CPU and GPU, remains a challenge, but things are getting better. NVIDIA helped lead the way with the <a href="http://www.nvidia.com/object/GPU_Computing.html" target="_blank">CUDA parallel programming model</a>, which enabled general-purpose computation on NVIDIA GPUs, and with extensions to the C programming language that made the processors accessible. Developers can thus program NVIDIA’s CUDA GPUs in C and C++ via the CUDA toolkit, in Fortran via PGI’s CUDA Fortran compiler, and through driver-level APIs such as OpenCL and DirectCompute.</p>
<p>One of the biggest challenges facing software developers is that, to get more performance from existing applications and to develop new, more compute-intensive ones, they have no choice but to consider parallelizing their code, whether they choose multi-core CPUs or many-core GPUs. Based on the last few years of development, the CUDA parallel programming model has established itself as an “easier” way to do parallel programming (it still isn’t easy, but CUDA does make certain things easier). Also, GPUs can offer a tremendous performance advantage over CPUs, so combining these two elements offers developers a way to develop more innovative applications.</p><img src="http://feeds.feedburner.com/~r/ntersect/parallel-world/~4/q9YfBhnjzMs" height="1" width="1"/>]]></content:encoded>



<category>CUDA</category>

<category>HPC</category>

<category>Parallel World</category>

<category>Steve Wildstrom</category>

<category>Tesla</category>

<dc:creator>Steve Wildstrom</dc:creator>
<pubDate>Thu, 04 Mar 2010 13:57:21 -0800</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/03/the-world-is-parallel-the-opportunity-and-challenge-of-parallel-computing-.html</feedburner:origLink></item>

<item>
<title>A 3D Ultrasound for Better Cancer Detection</title>
<link>http://feedproxy.google.com/~r/ntersect/parallel-world/~3/WW93C25fzzQ/a-3d-ultrasound-for-better-cancer-detection.html</link>
<guid isPermaLink="false">http://blogs.nvidia.com/ntersect/2010/02/a-3d-ultrasound-for-better-cancer-detection.html</guid>
<description>How can GPUs help women who are facing a possible diagnosis of breast cancer? The same computational speed-ups that GPUs bring to everything from oil exploration to drug discovery are also taking place in the field of medical imaging. In the case of breast imaging, TechniScan Medical Systems is using parallel computing to give women and their doctors fast, accurate information about breast health. Breast cancer detection today relies on 2D imaging systems and invasive procedures like biopsy. The former is often imprecise due to various factors such as operator expertise, the latter is painful and often unnecessary. But because...</description>


<content:encoded><![CDATA[<p>How can GPUs help women who are facing a possible diagnosis of breast cancer? The same computational speed-ups that GPUs bring to everything from oil exploration to drug discovery are also taking place in the field of medical imaging. In the case of breast imaging, <a href="http://www.techniscanmedicalsystems.com/" target=_blank>TechniScan Medical Systems</a> is using parallel computing to give women and their doctors fast, accurate information about breast health.</p>

<center><object width="480" height="295"><param name="movie" value="http://www.youtube.com/v/QswrcIbchnA&hl=en_US&fs=1&color1=0x234900&color2=0x4e9e00"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/QswrcIbchnA&hl=en_US&fs=1&color1=0x234900&color2=0x4e9e00" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="295"></embed></object></center>

<p>Breast cancer detection today relies on 2D imaging systems and invasive procedures like biopsy. The former is often imprecise due to various factors such as operator expertise; the latter is painful and often unnecessary. But because doctors don’t want to be wrong on something like cancer, they’ll act aggressively on anything that looks suspicious – even though 80% of biopsies come back negative. That’s a lot of unnecessary fear and distress for patients.</p>

<p>TechniScan’s solution uses 3D ultrasound to create a detailed picture that may help doctors in the process of finding and treating breast cancer. (This video gives a detailed look at how the system works.) Basically, the machine’s scanner rotates all the way around a patient’s breast, capturing a scan every 2 degrees, and then composites a detailed 3D image. Each image is around 8 to 9 million voxels (the 3D equivalent of a pixel) and requires more than 120 million Fast Fourier Transform (FFT) calculations to build. </p>



<p>Now comes the hard part – tackling a computational load of this size in a real-world hospital environment. For 3D ultrasound to make sense for hospitals, the process needed to be both cost effective and time effective. </p>

<p>TechniScan realized that – great as 3D ultrasound promised to be – it wouldn’t get broad traction unless calculations could be completed within 30 minutes. That’s the magic number that allows a patient to get her results during her appointment and keeps the device running every hour.</p>

<p>Throw CPUs at this problem, and they could work – but not in the footprint, not at the cost and definitely not at the same speed as GPUs. TechniScan’s system uses two Tesla C1060 GPUs to process images in less than 30 minutes. It would take more than an hour for a quad-core CPU cluster to do the same job. For TechniScan – and for patients everywhere – that’s not a viable result. </p>

<p>TechniScan Medical Systems is in the process of applying for 510(k) clearance of the WBU System. The system is not cleared for sale at this time.</p>

<img src="http://feeds.feedburner.com/~r/ntersect/parallel-world/~4/WW93C25fzzQ" height="1" width="1"/>]]></content:encoded>



<category>3D</category>

<category>CUDA</category>

<category>HPC</category>

<category>Medical</category>

<category>Parallel World</category>

<dc:creator>Andy Walsh</dc:creator>
<pubDate>Thu, 25 Feb 2010 12:00:00 -0800</pubDate>

<feedburner:origLink>http://blogs.nvidia.com/ntersect/2010/02/a-3d-ultrasound-for-better-cancer-detection.html</feedburner:origLink></item>

</channel>
</rss>

