<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>GPU Software Blog</title>
	
	<link>http://blog.accelereyes.com/blog</link>
	<description>Helpful posts about GPU computing. Discussion of Jacket and ArrayFire. Real speedups on real code!</description>
	<lastBuildDate>Wed, 22 Feb 2012 18:05:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/gpu_software" /><feedburner:info uri="gpu_software" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>gpu_software</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Jacket Continues to Crush the Clone</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/yKvrh7MGOJ4/</link>
		<comments>http://blog.accelereyes.com/blog/2012/02/22/crushing-the-clone/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 18:05:46 +0000</pubDate>
		<dc:creator>melonakos</dc:creator>
				<category><![CDATA[MATLAB®]]></category>
		<category><![CDATA[Parallel computing]]></category>
		<category><![CDATA[Jacket]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=2258</guid>
		<description><![CDATA[This morning, I woke up to find the following comment in the MATLAB® Newsgroup: Over two years ago, MathWorks® started to build a clone of Jacket, which you now know as the GPU computing support in the Parallel Computing Toolbox (TM).  At the time, there were many naysayers suggesting that Jacket would somehow be eclipsed [...]]]></description>
			<content:encoded><![CDATA[<p>This morning, I woke up to find the following comment in the <a href="http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/3f96dbdffc13cb23">MATLAB® Newsgroup</a>:</p>
<p><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/bsxfun_post.png"><img class="aligncenter size-full wp-image-2259" title="bsxfun_post" src="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/bsxfun_post.png" alt="" width="572" height="400" /></a></p>
<p>Over two years ago, MathWorks® started to build a clone of Jacket, which you now know as the GPU computing support in the Parallel Computing Toolbox (TM).  At the time, there were many naysayers suggesting that Jacket would somehow be eclipsed by the clone.  Made sense, right?</p>
<p>Wrong!  Here we are 2 years later and the clone is still a poor imitation. There are <a href="http://accelereyes.com/compare">several technical reasons</a> for this, and if you are serious about getting great performance from your GPU, Jacket is the better option.  Look at all the <a href="http://accelereyes.com/examples">real customers</a> that are seeing big benefits. Here are some other recent benchmarks from the <a href="http://www.walkingrandomly.com/?p=4062">Walking Randomly Blog</a> that show Jacket on a laptop running faster than PCT (TM) on a Tesla:</p>
<p><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/wr_bench.png"><img class="aligncenter size-full wp-image-2285" title="wr_bench" src="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/wr_bench.png" alt="" width="532" height="121" /></a></p>
<p>If it were easy to imitate Jacket, then MathWorks® would have siphoned away all the Jacket users.  The truth is that it is not easy to build great GPU software, and the Jacket user base continues to explode.  Jacket is not only better than PCT (TM) but is also getting better at a <a href="http://wiki.accelereyes.com/wiki/index.php/Release_Notes">faster rate</a>.  Here&#8217;s to another 2, 5, and 10+ years of great speeds for all of you Jacket programmers!</p>
<p>To Juliette and others out there, if you really want PCT (TM) to get better, you might consider asking MathWorks® to spend less time cloning and more time working with others who are adding value to the MATLAB® ecosystem.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2012/02/22/crushing-the-clone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2012/02/22/crushing-the-clone/</feedburner:origLink></item>
		<item>
		<title>CUDA and OpenCL Benchmarks – Keeneland Workshop Day 1</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/ziJptiq-43k/</link>
		<comments>http://blog.accelereyes.com/blog/2012/02/20/cuda-and-opencl-benchmarks/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 02:27:52 +0000</pubDate>
		<dc:creator>melonakos</dc:creator>
				<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Parallel computing]]></category>
		<category><![CDATA[cuda]]></category>
		<category><![CDATA[keeneland]]></category>
		<category><![CDATA[ornl]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=2249</guid>
		<description><![CDATA[Today was Day 1 of the Keeneland Workshop.  Many great talks were given, across a broad range of GPU computing topics. With last week&#8217;s ArrayFire Webinar fresh in mind, it was interesting to see similar conclusions drawn in a presentation by Kyle Spafford of Oak Ridge National Laboratory.  Kyle independently ran a number of benchmarks over [...]]]></description>
			<content:encoded><![CDATA[<p>Today was Day 1 of the <a href="http://keeneland.gatech.edu/2012-02-20-workshop">Keeneland Workshop</a>.  Many great talks were given, across a broad range of GPU computing topics.</p>
<p>With last week&#8217;s <a href="http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/">ArrayFire Webinar</a> fresh in mind, it was interesting to see similar conclusions drawn in a presentation by Kyle Spafford of Oak Ridge National Laboratory.  Kyle independently ran a number of benchmarks over a period of time which show how quickly OpenCL has matured and where it yet has room for improvement.  The slide below comes from Kyle&#8217;s <a href="http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2012-02-20/190-shoc.pdf">presentation</a>.  For numbers &gt;1, CUDA is faster.  For numbers &lt;1, OpenCL is faster.  Performance in most cases is close to equivalent.</p>
<p><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/190-shoc_Page_071.png"><img class="aligncenter size-full wp-image-2251" title="190-shoc_Page_07" src="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/190-shoc_Page_071.png" alt="" width="600" height="450" /></a></p>
<p>Just as we showed in the ArrayFire webinar, OpenCL performance is quite comparable with CUDA performance.  The Achilles heel for OpenCL right now seems to be in the FFT and a few other cases related to texture memory optimizations.</p>
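<p>To make the reading of the slide concrete, here is a minimal Python sketch. It assumes each number is a ratio of CUDA throughput to OpenCL throughput (higher throughput is better). The function name <code>faster_platform</code>, the 5% tolerance band, and the sample inputs are illustrative assumptions of ours, not values taken from Kyle&#8217;s slide.</p>

```python
def faster_platform(cuda_perf, opencl_perf, tolerance=0.05):
    """Classify one benchmark by its CUDA/OpenCL throughput ratio.

    Both arguments are throughputs (higher is better), e.g. GFLOPS.
    The tolerance band is an assumption used to label near-ties.
    """
    ratio = cuda_perf / opencl_perf
    if tolerance >= abs(ratio - 1.0):
        verdict = "roughly equivalent"
    elif ratio > 1.0:
        verdict = "CUDA faster"
    else:
        verdict = "OpenCL faster"
    return ratio, verdict


if __name__ == "__main__":
    # Made-up throughput pairs, purely to exercise the three cases.
    for cuda, opencl in [(200.0, 100.0), (50.0, 100.0), (101.0, 100.0)]:
        print(faster_platform(cuda, opencl))
```

<p>A tolerance band is one way to express &#8220;close to equivalent&#8221;; how wide it should be depends on how noisy the benchmark is.</p>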
<p>Many other great talks were given at the workshop. ArrayFire and Jacket were also covered in the library talks.</p>
<p>If you are looking at solving HPC type problems with GPUs, you should follow the activities of the Keeneland group.  These guys are leaders in GPU computing for HPC.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2012/02/20/cuda-and-opencl-benchmarks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2012/02/20/cuda-and-opencl-benchmarks/</feedburner:origLink></item>
		<item>
		<title>ArrayFire Webinar Recap – OpenCL vs CUDA Comparisons</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/KuyJeaoQqUo/</link>
		<comments>http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/#comments</comments>
		<pubDate>Fri, 17 Feb 2012 17:44:45 +0000</pubDate>
		<dc:creator>vishy</dc:creator>
				<category><![CDATA[CUDA]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Parallel computing]]></category>
		<category><![CDATA[Videos]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[cuda]]></category>
		<category><![CDATA[tradeoffs]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=2227</guid>
		<description><![CDATA[In case you missed it, we recently held an ArrayFire Webinar, focused on exploring the tradeoffs of OpenCL vs CUDA. This webinar is part of an ongoing series of webinars held each month to present new GPU software topics as well as programming techniques with Jacket and ArrayFire. For those of you who missed it, we [...]]]></description>
			<content:encoded><![CDATA[<p>In case you missed it, we recently held an ArrayFire Webinar, focused on exploring the tradeoffs of OpenCL vs CUDA.</p>
<p>This webinar is part of an ongoing <a href="http://blog.accelereyes.com/blog/2012/01/12/accelereyes_webinars_2012q1/">series of webinars</a> held each month to present new GPU software topics as well as programming techniques with Jacket and ArrayFire.</p>
<p>For those of you who missed it, we provide a recap here. Lots of questions were fielded by our team, so it&#8217;s a must-watch. We hope to see you at <a href="https://accelereyes.webex.com/mw0306ld/mywebex/default.do?siteurl=accelereyes">the next one</a>!</p>
<h3>Recap</h3>
<p><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/CUDAvsOpenCL.pdf">Download the slides</a>.  Here is a transcript of the content portion of the webinar:</p>
<p>AccelerEyes is pleased to present today&#8217;s ArrayFire webinar looking at OpenCL and CUDA Trade-offs and Comparisons. Every day, we interact with many programmers in various stages of GPU development projects. In making GPU project decisions, there is a lot of information to absorb from a variety of sources. The intent of this webinar is to condense this information into the key points that matter, to help you digest GPU computing software decisions.</p>
<p>Over the last 5 years, as we&#8217;ve collected information from our GPU computing customers, we&#8217;ve determined that there are 5 core GPU software features that programmers seek in a technology. These include:</p>
<ul>
<li>Performance:  This is the core motivation for GPU computing and relates to quality. &#8220;How good will my code end up?&#8221; is the question here. &#8220;How fast can I push it?&#8221;</li>
<li>Scalability:  This is related to labor costs and quality of results. &#8220;If I develop my code on a workstation, can I launch it on a cluster without major headaches?&#8221; is the question here.</li>
<li>Portability:  This is related to flexibility and costs, both labor costs and fixed hardware costs. It can also be a question of quality: the freedom to move code to better or newer hardware as it emerges is important.</li>
<li>Community:  This is a broad term meant to encompass other terms like support, longevity, commitment, etc. It is important that the technology platform has a lot of users and momentum. It is important that investments made today persist and pay dividends down the road.</li>
<li>Programmability:  A sense of how much labor and effort are required to get good performance. This includes a sense of the software platform&#8217;s maturity, including robustness to bugs and availability of functionality.</li>
</ul>
<div><span id="more-2227"></span></div>
<p>We&#8217;ve arranged this webinar to address these five GPU Software Features for the two major GPU computing platforms, CUDA and OpenCL. For each of these features, we will share some of our thoughts on how CUDA and OpenCL compare. Then we will conclude the discussion of each feature with some comments on how the feature relates to ArrayFire (those slides have an ArrayFire logo in the upper right hand corner of the slide).</p>
<p>The first feature is Performance. Both CUDA and OpenCL are fast, and on GPU devices they are much faster than the CPU for data-parallel codes, with 10X speedups commonly seen on data-parallel problems.</p>
<p>Both CUDA and OpenCL <em>can</em> fully utilize the hardware. They are both entirely sufficient to extract all the performance available in a given hardware device. But we italicized the word &#8220;<em>can</em>&#8221; here, because the devil is really in the details. Performance depends upon a slew of variables, including hardware type, algorithm type, and code quality. It is nearly impossible to guess how much speedup you can extract from a piece of code. In our experience, nearly all science, engineering, and financial codes can get great benefit out of GPU hardware, but the big question is how difficult it is to transform your algorithm to realize those benefits.  We&#8217;ll discuss programmability later on.</p>
<p>We will present ArrayFire performance results in CUDA and OpenCL at the end of the webinar.</p>
<p>The second feature we&#8217;ll discuss is Scalability. Scalability can mean many things, so we&#8217;ve broken this discussion down into 3 kinds of scaling.</p>
<p>From Laptops to Single GPU machines, both CUDA and OpenCL codes scale without any code change. This is a very common use case. We see nearly half of GPU computing users leverage a laptop at some stage of GPU development and later move the code to a different &#8220;performance&#8221; hardware setup, like a workstation or cluster. Both CUDA and OpenCL make life easy for this use case.</p>
<p>From a Single GPU Machine to a Multi-GPU Machine, both CUDA and OpenCL require user-managed code for low-level synchronization of communication between the multiple GPUs. This is a headache, but with patience, it is manageable in both CUDA and OpenCL.</p>
<p>From a Multi-GPU machine to a Cluster, neither CUDA nor OpenCL really offers much assistance. Rather, programmers tend to write their own MPI code that handles all the cluster communication and then use CUDA and OpenCL directly in each node.</p>
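<p>As a rough illustration of the hand-written cluster code described above, the sketch below shows the kind of index arithmetic each process typically does to find its share of the data before handing that share to CUDA or OpenCL on its own node. It is plain Python with no MPI calls. <code>block_range</code> is a hypothetical helper of ours, and in a real program the rank and the worker count would come from the MPI runtime rather than a loop.</p>

```python
def block_range(n_items, n_workers, rank):
    """Return the half-open [start, stop) slice owned by worker `rank`.

    Items are split as evenly as possible; the first `extra` workers
    each take one additional item when the split is not exact.
    """
    base, extra = divmod(n_items, n_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if extra > rank else 0)
    return start, stop


if __name__ == "__main__":
    # Stand-in for 4 cluster processes sharing 10 work items.
    for rank in range(4):
        print(rank, block_range(10, 4, rank))
```

<p>The same arithmetic applies one level down, when a single node divides its slice among several GPUs.</p>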
<p>With respect to Scalability, there are some other interesting developments of note. The first is that there is some new technology in CUDA called GPUDirect that is aimed at reducing memory transfer overheads when communicating between multiple GPUs. It has optimizations to reduce overhead by allowing peer-to-peer memory transfers between GPUs on the same PCI Express bus. It also has optimizations to reduce the overhead of moving data from GPU memory to a network interface card. This is certainly an interesting development, but it is too new for us to say if it is actually something people will use.</p>
<p>The second interesting development is in Mobile GPU Computing. OpenCL has quickly become the most pervasive way to do GPU computing on mobile devices, including smartphones and tablets. Companies like ARM, Imagination Technologies, Freescale, Qualcomm, Samsung, and others are all enabling their mobile GPUs to run OpenCL codes. There are more mobile devices sold each year than there are PCs, so this is a huge community that is beginning to put its support behind OpenCL. At AccelerEyes, we have done several GPU consulting projects on mobile GPUs and are believers that there is big benefit to accelerating apps, especially computer vision and video processing apps, directly on the phone or tablet.</p>
<p>In scaling from Laptops to Single GPU machines, ArrayFire&#8217;s just-in-time compiler automatically makes optimizations for the GPU type, without any code change. In this sense, both the CUDA and OpenCL versions of ArrayFire enjoy scalability here.</p>
<p>From a single GPU machine to a multi-GPU machine, ArrayFire has a big advantage. The ArrayFire deviceset() function makes multi-GPU computing super simple. No need to mess with synchronization issues. ArrayFire automatically manages memory and queues up all GPUs in your system with a full workflow, ensuring good resource utilization.</p>
<p>From multi-GPU machines to clusters, ArrayFire is the same as CUDA and OpenCL. There is not really any added benefit and users write and manage their own MPI code.</p>
<p>The third feature is Portability. This is perhaps the most recognizable difference between CUDA and OpenCL. CUDA only runs on NVIDIA GPUs, while OpenCL is the open industry standard and runs on AMD, Intel, NVIDIA, and other hardware devices.</p>
<p>With respect to CUDA, there was a recent announcement at NVIDIA&#8217;s GPU Technology Conference in Asia that said CUDA would become more open, and the press carried it as saying that CUDA would become open source. This is definitely a step that GPU programmers are happy to see. But it remains to be seen what this actually means. There are two comments that we&#8217;ll make on this announcement:</p>
<ol>
<li>From what we can tell, parts of the CUDA compiler will be open sourced to a limited number of groups. These groups will likely try to build compiler adaptations that enable CUDA code to run on other devices, if you use their compilers. From the announcement, it appears that the CUDA libraries, like CUBLAS and CUFFT, will not be open sourced. This is a critical distinction, because as we&#8217;ll show later on, libraries are key.</li>
<li>Creating a compiler that can automatically generate tuned code for various hardware devices with very different architectures is extremely difficult. These kinds of projects remain popular in hardcore academic research, but have yet to mature into widespread practical use. Ocelot is an example research project in this category.</li>
</ol>
<p>Also, with respect to portability, CUDA does not provide CPU fallback. Currently, developers using CUDA typically put if-statements in their code that distinguish between the presence or absence of a GPU device at runtime. In contrast, with OpenCL, CPU fallback is supported and makes code maintenance much easier.</p>
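<p>The if-statement pattern described above can be sketched in a few lines of Python. This is only a sketch of the control flow, not CUDA code: <code>gpu_kernel</code> is a hypothetical stand-in for whatever function wraps a real GPU launch, and <code>saxpy</code> and <code>saxpy_cpu</code> are names we made up for the example.</p>

```python
def saxpy_cpu(a, x, y):
    """Reference CPU path: a*x + y, element by element."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy(a, x, y, gpu_kernel=None):
    """Use the GPU path when one was found at startup, else the CPU path.

    `gpu_kernel` is a hypothetical callable; None means no GPU was detected.
    """
    if gpu_kernel is not None:   # the hand-written "is a GPU present?" branch
        return gpu_kernel(a, x, y)
    return saxpy_cpu(a, x, y)    # fallback the application maintains itself


if __name__ == "__main__":
    # No GPU callable supplied, so this takes the CPU branch.
    print(saxpy(2.0, [1.0, 2.0], [10.0, 20.0]))
```

<p>The maintenance cost comes from keeping both branches correct and in sync for every routine, which is the burden a built-in CPU fallback removes.</p>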
<p>ArrayFire is fully portable. The same ArrayFire code runs on CUDA or OpenCL. The only difference is the version of the ArrayFire library that you link against in your code.</p>
<p>The main caveat here is that today, the OpenCL version of ArrayFire only supports a subset of the functionality available in the CUDA version. This is due to the fact that our CUDA code base has been around much longer. It is also due to the fact that there is less of an OpenCL software ecosystem, as we&#8217;ll discuss next.</p>
<p>The fourth feature is Community. This is the feature that encompasses support, longevity, commitment, etc. As those things are hard to measure, we put together a proxy. It is interesting to look at the number of forum topics on NVIDIA&#8217;s CUDA forums, at nearly 27 thousand, and on AMD&#8217;s OpenCL forums, at 4 thousand. Also, the neutral third-party site Stack Overflow (btw, if you don&#8217;t use Stack Overflow, you should) has tags for CUDA and OpenCL, with CUDA-tagged questions numbering over 3X the OpenCL-tagged ones. As you would expect, there are many more people doing CUDA programming today due to the great investment NVIDIA has put into building the ecosystem for GPU computing.</p>
<p>With respect to AccelerEyes, we have over 14 hundred GPU topics on our forums, which is the largest community of GPU programmers supported by any software company. The next largest is the veteran HPC company, PGI, which has 485 topics on their GPU forums.</p>
<p>Community also has to do with ecosystem and other tools. We will cover a discussion of those as they relate to libraries in a moment.</p>
<p>The fifth and final feature is Programmability. Both CUDA and OpenCL are low-level. It is time consuming to do GPU kernel development in either of those platforms. The bulk of that time is often spent in redesigning algorithms to exploit data-parallelism.</p>
<p>This is why the entire GPU computing market has lately shifted a major focus towards programmability.</p>
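<p>To give a flavor of the redesign work mentioned above, here is a small Python sketch of one classic case: summing a list. The serial loop carries a running total from step to step, while the pairwise version is restructured so that every addition within a pass is independent. It runs on the CPU and only illustrates the shape of the algorithm; the function names are ours, and a real GPU kernel would also have to deal with memory layout and synchronization.</p>

```python
def serial_sum(values):
    """Serial form: each step depends on the previous running total."""
    total = 0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Data-parallel form: every addition within one pass is independent."""
    data = list(values)
    if not data:
        return 0
    while len(data) > 1:
        if len(data) % 2:        # pad odd lengths with the additive identity
            data.append(0)
        # On a GPU, each of these pair-additions could go to its own thread.
        data = [data[i] + data[i + 1] for i in range(0, len(data), 2)]
    return data[0]


if __name__ == "__main__":
    numbers = list(range(1, 11))
    print(serial_sum(numbers), pairwise_sum(numbers))
```

<p>Both functions return the same result for integers; the difference is that the second needs only about log2(n) passes when the additions in each pass run concurrently.</p>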
<p>To understand the landscape, let&#8217;s look at this simple two-by-two, where we have Faster vs Slower technologies on the y-axis, and time-consuming vs easy-to-use technologies on the x-axis. As a baseline, you can consider SSE or AVX instructions on the CPU as something that is time consuming to write and won&#8217;t end up giving you the data-parallel performance that you can expect out of a GPU.</p>
<p>Writing GPU kernels in CUDA or OpenCL leads to much faster code, but is likewise very time-consuming to develop.</p>
<p>In the opposite corner, compiler directives have recently become popular. The claim of these is that you can sprinkle a few pragmas into your code and that the compiler will figure out how to get the code to run well on the GPU. While in some simple cases you might get a little benefit, there is simply no compiler today that is capable of automatically generating good, fast code for GPUs from standard serial CPU code. Compilers simply can&#8217;t figure out how to morph serial algorithms into data-parallel algorithms.</p>
<p>Which is why libraries are so key to GPU computing. In a library, you get access to a set of functions that have already been hand-optimized and tuned to exploit data-parallelism. Libraries include within them the benefits of speed that come from writing kernels. But they are also written with ease-of-use in mind and merely require a similar level of intrusion as is required by the compiler directive. This is why ArrayFire has been and continues to be so successful.</p>
<p>Libraries really make all the difference in GPU computing. To compare and contrast CUDA versus OpenCL, it is important to look at the comparison of libraries available. Raw math libraries available in CUDA include CUBLAS, CUFFT, CULA, and MAGMA. These are pretty much complete, providing the majority of all routines necessary for dense matrix operations.</p>
<p>CUDA also has CUSPARSE which is a good start for sparse linear algebra routines, but still needs to mature.</p>
<p>CUDA libraries only run on NVIDIA GPUs. NVIDIA does not provide libraries for OpenCL.</p>
<p>Raw math libraries in AMD&#8217;s OpenCL have matured a lot recently. With clAmdBlas and clAmdFft, you get most of the important BLAS routines and radix 2, 3, and 5 FFT routines (which cover the most common cases). There is no LAPACK function support and there are no sparse data support libraries.</p>
<p>But a very important point is that AMD&#8217;s libraries run not only on AMD devices, but on all OpenCL-compliant devices, including NVIDIA GPUs.</p>
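<p>For the FFT coverage mentioned above, a quick way to see whether a given transform length falls into the &#8220;radix 2, 3, and 5&#8221; case is sketched below in Python. It assumes that phrase means lengths whose only prime factors are 2, 3, and 5. That is our reading of the slide, not a statement of the library&#8217;s documented limits, and <code>is_radix_235_length</code> is a hypothetical helper name.</p>

```python
def is_radix_235_length(n):
    """True if n is a positive integer with no prime factors other than 2, 3, 5."""
    if 1 > n:
        return False
    for p in (2, 3, 5):
        while n % p == 0:    # strip out every factor of p
            n //= p
    return n == 1            # anything left over is a larger prime factor


if __name__ == "__main__":
    for length in (1024, 360, 1000, 7, 14):
        print(length, is_radix_235_length(length))
```

<p>Lengths that fail this check would need padding or a different algorithm under the assumption stated above.</p>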
<p>Due to these developments at AMD, AccelerEyes is proud to have recently supported OpenCL in both our Jacket product (which applies to MATLAB code) and our ArrayFire product (which applies to C/C++, Fortran, and Python).</p>
<p>Our OpenCL support is new and not nearly as mature as our support of CUDA. But our initial OpenCL support is better than our initial CUDA support was when we first launched our CUDA products. And we expect OpenCL to continue to mature rapidly in the near future.</p>
<p>This concludes the slide portion of the presentation. In what follows, we will spend some time showing benchmarks of OpenCL versus CUDA in ArrayFire code, with a particular focus on the raw math libraries we just discussed.</p>
<p>Check out the video below for the benchmarks and Q/A session.  Enjoy!</p>
<h3>Video</h3>
<p><center><iframe src="http://www.youtube.com/embed/Pp8tTIIfVvU" frameborder="0" width="560" height="315"></iframe></center></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/</feedburner:origLink></item>
		<item>
		<title>ArrayFire Support for CUDA 4.1</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/RyUZELMdbKw/</link>
		<comments>http://blog.accelereyes.com/blog/2012/02/15/arrayfire-support-for-cuda-4-1/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 21:58:52 +0000</pubDate>
		<dc:creator>melonakos</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[4.1]]></category>
		<category><![CDATA[ArrayFire]]></category>
		<category><![CDATA[cuda]]></category>
		<category><![CDATA[Jacket]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=2200</guid>
		<description><![CDATA[The question above comes from María (@turbonegra).  She follows us @accelereyes.  Many of you are wondering when ArrayFire support for new CUDA version 4.1 will be released.  The answer: work is currently under way. CUDA 4.1 includes a new Fermi compiler, and many people in the GPU ecosystem have reported slowdowns from upgrading to the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/maria_tweet1.png"><img class="aligncenter size-full wp-image-2202" title="maria_tweet" src="http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/maria_tweet1.png" alt="" width="525" height="76" /></a></p>
<p>The question above comes from María (<a href="http://twitter.com/turbonegra">@turbonegra</a>).  She follows us <a href="http://twitter.com/accelereyes">@accelereyes</a>.  Many of you are wondering when ArrayFire support for the new CUDA version 4.1 will be released.  The answer: work is currently under way.</p>
<p>CUDA 4.1 includes a <a href="http://www.tomshardware.com/news/nvidia-cuda-gpu-developer-llvm,14579.html">new Fermi compiler</a>, and many people in the GPU ecosystem have <a href="http://forums.nvidia.com/index.php?showtopic=222547">reported slowdowns</a> from upgrading to the new CUDA version. So we&#8217;ve delayed releasing ArrayFire and Jacket support for CUDA 4.1 because we want to verify performance and reliability across all our unit tests, performance regression tests, and customer code samples.  Our tests sweep across various driver versions and everything from mobile GeForce cards through server-grade Tesla and Fermi chips.</p>
<p>We are still working through the testing and verification at the moment. While <a href="http://developer.download.nvidia.com/CUDA/training/CUDA_4_1_Webinar_v11-11-22.pdf">NVIDIA&#8217;s general claim of a 10% performance benefit</a> is a bit fluffy, things do look <strong>good</strong> overall, and we believe CUDA 4.1 is ready for prime time. Various FFT, BLAS, and LAPACK routines show solid performance enhancements. In the few places where we have observed a performance degradation (e.g. linear texturing), we are working closely with NVIDIA to ensure these are addressed in coming driver and toolkit releases.</p>
<p>We&#8217;ll post here and on <a href="http://twitter.com/accelereyes">twitter</a> when CUDA 4.1 support is ready in ArrayFire and Jacket. Stay tuned!</p>
<p>PS:  Thanks to María for prompting this post <img src='http://blog.accelereyes.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2012/02/15/arrayfire-support-for-cuda-4-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2012/02/15/arrayfire-support-for-cuda-4-1/</feedburner:origLink></item>
		<item>
		<title>AccelerEyes Webinar Video – Medical Image Segmentation</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/gcTybajZ8ak/</link>
		<comments>http://blog.accelereyes.com/blog/2012/01/19/accelereyes-webinar-video-medica/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 21:59:37 +0000</pubDate>
		<dc:creator>vishy</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[MATLAB®]]></category>
		<category><![CDATA[Parallel computing]]></category>
		<category><![CDATA[Videos]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=2173</guid>
		<description><![CDATA[In case you missed it, we recently held a webinar on how to accelerate common medical imaging applications using an easy, powerful programming library with Jacket for MATLAB®. This webinar was part of an ongoing series of webinars that will help you learn more about the many applications of Jacket and ArrayFire, while interacting with AccelerEyes [...]]]></description>
			<content:encoded><![CDATA[<p>In case you missed it, we recently held a webinar on how to accelerate common medical imaging applications using an easy, powerful programming library with Jacket for MATLAB®.</p>
<p>This webinar was part of an ongoing <a href="http://blog.accelereyes.com/blog/2012/01/12/accelereyes_webinars_2012q1/">series of webinars</a> that will help you learn more about the many applications of Jacket and ArrayFire, while interacting with AccelerEyes GPU computing experts.  Gallagher Pryor, CTO of AccelerEyes, used the Bayesian Image Segmentation algorithm as a simple use-case to show how easy it is to convert CPU code to GPU code with Jacket (only 4 lines of CPU code needed to be changed!).</p>
<p>For those of you who missed it, we have uploaded the webinar to YouTube. We hope to see you at <a href="https://accelereyes.webex.com/mw0306ld/mywebex/default.do?siteurl=accelereyes">the next one</a>!<br />
<center><iframe align="center" src="http://www.youtube.com/embed/yWaibjgdOEg" frameborder="0" width="420" height="315"></iframe></center></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2012/01/19/accelereyes-webinar-video-medica/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2012/01/19/accelereyes-webinar-video-medica/</feedburner:origLink></item>
		<item>
		<title>Jacket over Remote Desktop for Tesla and Quadro GPUs</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/kII0CjucuOM/</link>
		<comments>http://blog.accelereyes.com/blog/2012/01/17/jacket-over-remote-desktop-for-tesla-and-quadro-gpus/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 22:19:30 +0000</pubDate>
		<dc:creator>vishy</dc:creator>
				<category><![CDATA[CUDA]]></category>
		<category><![CDATA[MATLAB®]]></category>
		<category><![CDATA[Parallel computing]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=2166</guid>
		<description><![CDATA[We recently reported that Jacket could be used over Windows Remote Desktop connections as long as you had an NVIDIA Tesla device in TCC mode. With the latest NVIDIA driver updates, Tesla and Quadro devices can be put into TCC mode, making it possible to use Jacket over Remote Desktop with both Tesla and Quadro [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">We recently reported that Jacket <a title="CUDA over Remote Desktop now available for Tesla GPUs" href="http://blog.accelereyes.com/blog/2011/02/10/cuda_remote_desktop_for_tesla_gpus/">could be used over</a> Windows Remote Desktop connections as long as you had an NVIDIA Tesla device in TCC mode. With the latest NVIDIA driver updates, Tesla <em>and</em> Quadro devices can be put into TCC mode, making it possible to use Jacket over Remote Desktop with both Tesla <em>and</em> Quadro devices.</p>
<p style="text-align: justify;">We tested this with the NVIDIA Quadro 4000 and Quadro 6000 GPUs. The test system had a Tesla C2050 connected to the display and the Quadro in TCC mode. Here&#8217;s the ginfo output:</p>
<pre>&gt;&gt; ginfo
Jacket v2.0 (build 80c7ba4) by AccelerEyes (64-bit Windows)
License Type: Designated Computer ([JACKET_ROOT]\jacket\engine\jlicense.dat)
Addons: MGL4, JMC, SDK, DLA, SLA
CUDA toolkit 4.0, driver 285.62
GPU1 Quadro 4000, 2048 MB, Compute 2.0 (single,double)
Memory Usage: 1977 MB free (2048 MB total)</pre>
<p style="text-align: justify;">Jacket over Remote Desktop is documented extensively on the <a title="AccelerEyes Wiki - Jacket Over Remote Connections" href="http://wiki.accelereyes.com/wiki/index.php/Jacket_Over_Remote_Connections">AccelerEyes Wiki</a>. Please check that page for more information.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2012/01/17/jacket-over-remote-desktop-for-tesla-and-quadro-gpus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2012/01/17/jacket-over-remote-desktop-for-tesla-and-quadro-gpus/</feedburner:origLink></item>
		<item>
		<title>AccelerEyes Webinar Series</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/bpu1dEeBFQs/</link>
		<comments>http://blog.accelereyes.com/blog/2012/01/12/accelereyes_webinars_2012q1/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 15:51:10 +0000</pubDate>
		<dc:creator>scott</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[MATLAB®]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[announcement]]></category>
		<category><![CDATA[ArrayFire]]></category>
		<category><![CDATA[cuda]]></category>
		<category><![CDATA[Jacket]]></category>
		<category><![CDATA[matlab]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=2133</guid>
		<description><![CDATA[AccelerEyes invites you to participate in a series of webinars designed to help you learn more about Jacket for MATLAB® and ArrayFire for C/C++/Fortran/Python, a comprehensive library of GPU-accelerated functions. GPU Programming for Medical Image Segmentation: January 18, 2012 at 3:00 p.m. EST There&#8217;s a huge volume of data generated using acquisition modalities like computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography or [...]]]></description>
			<content:encoded><![CDATA[<p>AccelerEyes invites you to participate in a series of <a title="Register for Webinar Series" href="https://accelereyes.webex.com/mw0306ld/mywebex/default.do?nomenu=true&amp;siteurl=accelereyes&amp;service=6&amp;rnd=0.4385461764274333&amp;main_url=https%3A%2F%2Faccelereyes.webex.com%2Fec0605ld%2Feventcenter%2Fprogram%2FprogramDetail.do%3FtheAction%3Ddetail%26siteurl%3Daccelereyes%26cProgViewID%3D0" target="_blank">webinars</a> designed to help you learn more about <a title="Learn about Jacket for MATLAB" href="http://www.accelereyes.com/products/jacket" target="_blank">Jacket</a> for MATLAB® and <a title="Learn about ArrayFire" href="http://www.accelereyes.com/products/arrayfire" target="_blank">ArrayFire</a> for C/C++/Fortran/Python, a comprehensive library of GPU-accelerated functions.</p>
<p><strong>GPU Programming for Medical Image Segmentation: <a title="Register for Webinar" href="https://accelereyes.webex.com/mw0306ld/mywebex/default.do?siteurl=accelereyes" target="_blank">January 18, 2012</a> at 3:00 p.m. EST</strong></p>
<p style="text-align: justify;"><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2012/01/brainimagesm.jpg"><img class="alignright  wp-image-2054" title="brainimagesm" src="http://blog.accelereyes.com/blog/wp-content/uploads/2012/01/brainimagesm-225x300.jpg" alt="" width="203" height="270" /></a>There&#8217;s a huge volume of data generated using acquisition modalities like computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), or nuclear medicine. A common need is to manipulate and transmit this data using compression techniques in as little time as possible. During this webinar, we will show Jacket’s superior speed in handling volumes, from subscripting to convolutions.  Come and learn how to accelerate common medical imaging applications using an easy, powerful programming library with Jacket for MATLAB®.</p>
<p><strong>OpenCL and CUDA Trade-Offs and Comparison: <a title="Register for Webinar" href="https://accelereyes.webex.com/mw0306ld/mywebex/default.do?siteurl=accelereyes" target="_blank">February 15, 2012</a> at 3:00 p.m. EST</strong></p>
<p style="text-align: justify;"><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2011/12/raindrop.png"><img class="alignleft size-medium wp-image-2040" title="raindrop" src="http://blog.accelereyes.com/blog/wp-content/uploads/2011/12/raindrop-300x140.png" alt="" width="300" height="140" /></a>The OpenCL standard continues to mature and is now (or soon will be) supported by a variety of GPUs and manycore processors. At AccelerEyes, we remain at the forefront of OpenCL development. ArrayFire OpenCL is a fast software library for GPU computing with a simple API.  In this informative webinar, our team of GPU experts will discuss OpenCL and CUDA trade-offs and comparisons.  In addition, you&#8217;ll get to see ArrayFire OpenCL in action with real code.</p>
<p><strong>GPU Programming for Financial Computing: <a title="Register for Webinar" href="https://accelereyes.webex.com/mw0306ld/mywebex/default.do?siteurl=accelereyes" target="_blank">March 15, 2012</a> at 3:00 p.m. EST</strong></p>
<p style="text-align: justify;">Quantitative analysts are discovering the benefits of leveraging GPUs in tackling complex financial computing models. Using Jacket&#8217;s computational horsepower, analysts can employ a variety of functions to achieve speedups in trade signal generation, complex derivative pricing, evaluating risk scenarios, and more. In this webinar, we&#8217;ll discuss the latest developments in GPU programming for financial computing.</p>
<p><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2012/01/financial_modeling-300x225.jpg"><img class="aligncenter size-full wp-image-2069" title="financial_modeling-300x225" src="http://blog.accelereyes.com/blog/wp-content/uploads/2012/01/financial_modeling-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p style="text-align: justify;">Each webinar will be conducted by AccelerEyes’ team of GPU computing experts and will include live demos of Jacket and ArrayFire.   We hope you will <a title="Register for Webinar" href="http://accelereyes.webex.com/mw0306ld/mywebex/default.do?siteurl=accelereyes" target="_blank">join us</a> as we discuss exciting developments in GPU computing software!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2012/01/12/accelereyes_webinars_2012q1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2012/01/12/accelereyes_webinars_2012q1/</feedburner:origLink></item>
		<item>
		<title>GPU Computing with Python</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/60KJqvzCyI4/</link>
		<comments>http://blog.accelereyes.com/blog/2011/12/15/gpu-computing-with-python/#comments</comments>
		<pubDate>Thu, 15 Dec 2011 23:11:28 +0000</pubDate>
		<dc:creator>pavan</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[Parallel computing]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[ArrayFire]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=2009</guid>
		<description><![CDATA[One of the biggest areas where GPUs are providing benefit is scientific computing. With libraries like Sage and SciPy providing a huge collection of functions and algorithms for free, Python has become one of the favorite tools for developers around the world. Even though these libraries have C/C++ back-ends, performance on large problems quickly [...]]]></description>
			<content:encoded><![CDATA[<p>One of the biggest areas where GPUs are providing benefit is scientific computing. With libraries like <a title="Sage" href="http://www.sagemath.org/">Sage</a> and <a title="SciPy" href="http://www.scipy.org/">SciPy</a> providing a huge collection of functions and algorithms for free, Python has become one of the favorite tools for developers around the world. Even though these libraries have C/C++ back-ends, performance on large problems quickly becomes an issue and can kill productivity.</p>
<p>On the heels of our free release of <a title="ArrayFire" href="http://www.accelereyes.com/products/arrayfire">ArrayFire C/C++</a>, we&#8217;re excited to release <a href="http://www.accelereyes.com/arrayfire_cuda/afpy.html">ArrayFire Python</a>. All of this is <strong>FREE</strong> for most users (see below for clarification)!</p>
<p>The structure of ArrayFire/Python is loosely based on <a title="NumPy" href="http://numpy.scipy.org/">NumPy</a> in that it uses a single <tt>array</tt> object that can contain multiple data types. You can convert NumPy arrays to ArrayFire arrays and vice versa. If you already have your application using NumPy arrays, this is a quick way to jump in and tweak critical sections.</p>
<pre lang="python">import numpy as np
import arrayfire as af
a = np.random.rand(5,5)
b = af.array(a)
c = b.host() # c is the same as a</pre>
<p>Alternatively, you can generate data on the device:</p>
<pre lang="python">r = af.randu(5, 5)
o = af.ones(5,5)
z = af.zeros(5,5)</pre>
<p>Once you have the data you need, you can utilize <a href="http://www.accelereyes.com/arrayfire_cuda/afpy.html">hundreds of functions</a> to convert your code entirely onto the GPU. You&#8217;ll find much of the API follows directly from <a href="http://www.scipy.org/Numpy_Functions_by_Category">NumPy</a> itself.</p>
<div class="wp-caption aligncenter" style="width: 600px">
	<a href="http://blog.accelereyes.com/blog/wp-content/uploads/2011/12/raindrop1.png"><img title="Raindrop Example" src="http://blog.accelereyes.com/blog/wp-content/uploads/2011/12/raindrop1-1024x478.png" alt="" width="600" height="300" /></a>
	<p class="wp-caption-text">Screenshot taken while running raindrop example</p>
</div>
<p>Jumping right in, here is an example showing the Monte Carlo calculation of pi:</p>
<pre lang="python">from arrayfire import *
def pi(samples=20000000):
    x = randu(samples, 1)
    y = randu(samples, 1)
    return 4 * sum(mul(x, x) + mul(y, y) &lt; 1) / samples</pre>
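<p>As a point of comparison, the same estimate can be sketched on the CPU with plain NumPy. This is only an illustrative reference and not part of ArrayFire: the function name <tt>estimate_pi_cpu</tt> and its <tt>seed</tt> parameter are our own choices, and <tt>np.random.default_rng</tt> assumes a reasonably recent NumPy release. The sketch draws points in the unit square, counts those that land inside the quarter circle, and scales the hit fraction by 4.</p>
<pre lang="python">import numpy as np

def estimate_pi_cpu(samples=1000000, seed=None):
    """Monte Carlo estimate of pi on the CPU (NumPy reference, not ArrayFire)."""
    rng = np.random.default_rng(seed)
    # Draw points uniformly from the unit square [0, 1) x [0, 1).
    x = rng.random(samples)
    y = rng.random(samples)
    # A point lies inside the quarter circle when x*x + y*y is below 1.
    inside = np.less(x * x + y * y, 1.0)
    # The hit fraction approximates pi / 4, so scale it by 4.
    return 4.0 * np.count_nonzero(inside) / samples

print(estimate_pi_cpu(seed=0))</pre>
<p>Running a CPU version like this next to the GPU version with the same sample count is a simple sanity check: the two estimates should agree to within ordinary sampling error, which shrinks roughly with the square root of the number of samples.</p>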
<p>You can visit our website to <a href="http://www.accelereyes.com/products/arrayfire">download</a> the latest version of ArrayFire. You can find the Python wrapper in the <tt>arrayfire/python</tt> directory. Installation instructions are in the <tt>README</tt>, and a few examples are included that show off both compute and visualizations.</p>
<p>Our <a href="http://forums.accelereyes.com/forums/viewforum.php?f=17">Forums</a> are the best place to get the latest info and help.</p>
<p>AccelerEyes provides this software for free in the hope that some of you might be interested in hiring us to port your code to the GPU.  If that interests you, <a href="https://www.accelereyes.com/company/contact_us">let us know</a>!</p>
<p>* ArrayFire is free for use on a single GPU.  To run ArrayFire on larger hardware systems, contact <a href="mailto:sales@accelereyes.com">sales@accelereyes.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2011/12/15/gpu-computing-with-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2011/12/15/gpu-computing-with-python/</feedburner:origLink></item>
		<item>
		<title>Jacket v2.0 Now Available</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/cEK8Tp2cJGY/</link>
		<comments>http://blog.accelereyes.com/blog/2011/12/08/jacket-version-2-0/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 19:03:23 +0000</pubDate>
		<dc:creator>scott</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[MATLAB®]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Jacket]]></category>
		<category><![CDATA[matlab]]></category>
		<category><![CDATA[parallel computing toolbox]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=1984</guid>
		<description><![CDATA[New Multi-GPU functionality, added support for OpenCL devices, and much more&#8230; AccelerEyes announces the release of Jacket version 2.0, adding GPU computing capabilities for use with MATLAB®.  Version 2.0 delivers even more speed through a host of new improvements, maximizing GPU device performance and utilization. Notable new features include a multi-GPU interface and support [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;" align="center"><strong><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2011/12/jacket_logo.png"><img class="alignright" title="jacket_logo" src="http://blog.accelereyes.com/blog/wp-content/uploads/2011/12/jacket_logo-300x235.png" alt="" width="240" height="188" /></a></strong></p>
<p style="text-align: left;" align="center"><strong>New Multi-GPU functionality, added support for OpenCL devices, and much more&#8230;</strong></p>
<p style="text-align: left;" align="center">AccelerEyes announces the release of Jacket version 2.0, adding GPU computing capabilities for use with MATLAB®.  Version 2.0 delivers even more speed through a host of new improvements, maximizing GPU device performance and utilization.</p>
<p>Notable new features include a multi-GPU interface and support for OpenCL devices. With Jacket v2.0, your M-code is now portable across all major GPU devices, including AMD/ATI, Intel, and NVIDIA chips.</p>
<p><a title="About Jacket" href="http://www.accelereyes.com/products/jacket">Jacket</a> is the premier GPU software plugin for MATLAB®, <a title="Compare Jacket" href="http://www.accelereyes.com/products/compare">better</a> than alternative solutions.  It is relied upon by thousands of organizations for rapid prototyping and problem solving across a range of government, manufacturing, energy, media, biomedical, financial, and scientific research applications.</p>
<p><strong>Multi-GPU Details:</strong></p>
<ul>
<li>Control over all GPUs in your program through simple, fast GPU selection functions.  Jacket automatically handles communication between the GPU devices, without the need to launch bulky parallel computing workers</li>
<li>GINFO, GSELECT, GSYNC all extended to handle multiple devices</li>
</ul>
<p><strong>OpenCL Details:</strong></p>
<ul>
<li>Supports single-precision floating-point real and complex types</li>
<li>Supports array math, FFTs, element-wise operations, and more</li>
<li>Selection of any OpenCL compliant device listed in GINFO</li>
<li>Currently available as a FREE beta feature, <a title="Download Jacket" href="https://accelereyes.com/licenses_jacket">download now</a></li>
</ul>
<p><strong>Other Notable Improvements:</strong></p>
<ul>
<li>New Base Jacket Functions, such as MEDIAN and PROD</li>
<li>Additional Image Processing Library functions</li>
<li>Additional Statistics Library functions</li>
<li>Additional Signal Processing Library functions</li>
<li>Support for binary to decimal conversion with BI2DE, DE2BI</li>
<li>New Demos (included in every download):
<ul>
<li>Defense Optical Flow Tracking example, Music Visualizer example, and new Jacket CPU v. GPU demo</li>
</ul>
</li>
<li>Financial example of Black-Scholes with GCOMPILE is 35X faster than CPU</li>
</ul>
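<p>For readers unfamiliar with the Black-Scholes benchmark mentioned in the list above, here is a minimal CPU sketch in plain Python of the closed-form price of a European call option. It is not the Jacket GCOMPILE demo itself: the function names <tt>norm_cdf</tt> and <tt>black_scholes_call</tt> and the parameter names are assumptions made for illustration. The point is that the formula is pure element-wise arithmetic, which is the kind of workload that tends to map well onto a GPU when applied to many options at once.</p>
<pre lang="python">import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def black_scholes_call(spot, strike, rate, volatility, years):
    """Closed-form Black-Scholes price of a European call (illustrative only)."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * years) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    # Discount the strike at the risk-free rate over the option's lifetime.
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)

# At-the-money call, 5% rate, 20% volatility, one year to expiry.
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05,
                           volatility=0.2, years=1.0)
print(round(price, 4))</pre>
<p>For these inputs the price comes out at roughly 10.45. A benchmark of this kind typically evaluates the same expressions over large arrays of spot prices and strikes rather than one option at a time.</p>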
<p>Visit our <a href="http://www.accelereyes.com/">company website</a> and see the <a href="http://wiki.accelereyes.com/wiki/index.php/Release_Notes">v2.0 release notes</a> for the full list of enhancements.</p>
<p><strong>Pricing and Availability</strong></p>
<p>Jacket v2.0 is available for download on the AccelerEyes website.  Pricing for a Jacket base license with support for a single GPU is $999.00 USD for commercial and $350.00 USD for academic customers.  AccelerEyes provides 12 months of software maintenance and updates with each software license.  Volume packages and development bundles are also <a href="http://www.accelereyes.com/purchase/special_offers"><span style="color: #0000ff;">now available</span></a> at special price points.</p>
<p><strong>Try our Professional Services</strong></p>
<p>AccelerEyes provides professional GPU consulting services.  Our team of engineers guarantees great results from GPU computing.  Equipped with Jacket and years of experience, our experts deliver results in fewer hours than any other consulting firm.  Set up a <a href="mailto:support@accelereyes.com?subject=FREE%20GPU%20Computing%20Consultation&amp;body=I%20would%20like%20to%20request%20a%20FREE%20GPU%20computing%20consultation%20session%20with%20one%20of%20your%20GPU%20experts.%20%20I%20am%20available%20for%20a%20phone%20call%20during%20the%20following%20times%3A%0A%0A%3Clist%20available%20times%3E%0A">free GPU consultation</a> today.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2011/12/08/jacket-version-2-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2011/12/08/jacket-version-2-0/</feedburner:origLink></item>
		<item>
		<title>Jacket on Lenovo Systems</title>
		<link>http://feedproxy.google.com/~r/gpu_software/~3/eKqTejSYeGQ/</link>
		<comments>http://blog.accelereyes.com/blog/2011/11/23/jacket-on-lenovo-systems/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 21:47:52 +0000</pubDate>
		<dc:creator>scott</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[MATLAB®]]></category>
		<category><![CDATA[Parallel computing]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[Jacket]]></category>
		<category><![CDATA[Lenovo]]></category>
		<category><![CDATA[matlab]]></category>

		<guid isPermaLink="false">http://blog.accelereyes.com/blog/?p=1891</guid>
		<description><![CDATA[Lenovo and AccelerEyes have a joint solution for optimizing M-code on Lenovo workstations.  The HPC solution pairs high Intel Xeon CPU performance for daily productivity with unprecedented NVIDIA graphics (GPU) performance for parallel computing with Jacket. Jacket’s comprehensive benchmark suite, when run on Lenovo ThinkStation systems, shows substantial speedups for a [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">Lenovo and AccelerEyes have a joint solution for optimizing M-code on Lenovo workstations.  The HPC solution pairs high Intel Xeon CPU performance for daily productivity with unprecedented NVIDIA graphics (GPU) performance for parallel computing with Jacket. Jacket’s comprehensive benchmark suite, when run on Lenovo ThinkStation systems, shows substantial speedups for a wide variety of computationally-intensive applications.</p>
<div class="mceTemp">
<div class="mceTemp">
<p><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2011/11/Lenovo-ThinkStations.png"><img class="aligncenter size-full wp-image-1901" title="Lenovo ThinkStations" src="http://blog.accelereyes.com/blog/wp-content/uploads/2011/11/Lenovo-ThinkStations.png" alt="" width="682" height="312" /></a></p>
<p style="text-align: justify;">Jacket is the world’s fastest and broadest GPU software accelerating the M-language commonly found in MATLAB®.  Thousands of customers around the world have used Jacket to accelerate their MATLAB code.</p>
<p style="text-align: justify;">Lenovo ThinkStation systems are ideally suited for running real-world high-performance applications using Jacket. While the high-end CPUs are ideal for daily productivity tasks, Jacket and the Quadro GPUs perform HPC operations with ease.</p>
<p style="text-align: justify;">To demonstrate the value gained by upgrading to a ThinkStation with an NVIDIA Quadro, benchmarks were run on the E20, S20 and D20 systems with Jacket and a variety of GPUs. We combined each of the three systems with three different GPUs in a good-better-best configuration, to create 9 different hardware test environments for the Jacket benchmark suite.</p>
<p style="text-align: justify;">The resulting speedups over the baseline system show speed advantages that widen as the configuration improves.  It is worth noting that with the Jacket MGL add-on, you can run code on multiple GPUs on the same machine. We observed a performance boost of up to 90% with each additional GPU added to the system.</p>
<p style="text-align: justify;"><a href="http://blog.accelereyes.com/blog/wp-content/uploads/2011/11/Lenovo-Measured-Speedups-e1320954198284.png"><img class="aligncenter size-full wp-image-1905" title="Lenovo Measured Speedups" src="http://blog.accelereyes.com/blog/wp-content/uploads/2011/11/Lenovo-Measured-Speedups-e1320954198284.png" alt="" width="600" height="348" /></a></p>
<p style="text-align: justify;">Jacket has a wide range of domain-specific library functions available for free. Functions for image, signal, and video processing, statistics, and graphics are included with the Jacket package. This allows domain professionals to get going right away without the added hassle of choosing which packages to buy.</p>
<p style="text-align: justify;">Jacket combines high-level programmability in M-code with the ability to control the nuts and bolts. Using the Jacket SDK, you can create customized computational kernels for your domain-specific algorithms using the same code that many of Jacket’s functions are written in. Functions that use the Jacket SDK plug in effortlessly to Jacket’s core and benefit from Jacket’s automated optimizations.</p>
<p style="text-align: justify;">Jacket code is deployable to machines without a MATLAB or Jacket license. Using the Jacket JMC add-on, your code can be compiled either into an executable package or a library that can be linked into other programs.</p>
<p style="text-align: justify;">The Lenovo ThinkStation with Jacket is a high-performance, power-efficient advanced workstation HPC platform solution that brings supercomputing power to MATLAB users for a fraction of the cost. With its demonstrated ability to achieve high speedups across a variety of applications, Jacket for MATLAB will help you harness the ThinkStation’s full computing potential.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.accelereyes.com/blog/2011/11/23/jacket-on-lenovo-systems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.accelereyes.com/blog/2011/11/23/jacket-on-lenovo-systems/</feedburner:origLink></item>
	</channel>
</rss>

