IEEE Spectrum: The Risk Factor blog recent content Thu, 14 Apr 2016 16:00:00 GMT Remembering the Technology Glitches and Failures of Tax Years Past A look back at some of the notable failures that have occurred when mixing taxes and IT
Photo-Illustration: iStockphoto

We're pleased to note that on 1 April, IEEE Spectrum won the Jesse H. Neal Best Infographics Award for our series "Lessons from a Decade of Failures." To celebrate, it seemed like a good time to once again take a dive into the Risk Factor archives and search for additional historical lessons. Because we're nearing the end of tax season here in the United States, I decided to examine the often volatile combination of tax policy and IT systems. Tax-related problems are some of the most painful IT failures, because they tend to hit citizens right where it hurts most: their bank accounts.

Below you'll find some of the most noteworthy operational glitches of the past decade, but as with previous timelines, the incidents listed here are merely the tip of the iceberg, and should be viewed as representative of tax-related IT problems rather than comprehensive. The timeline doesn't even include incidents of tech-assisted fraud, data breaches, or failed modernization projects (like the cancellation of the My IRS Account Project). It's not always easy to identify the exact impact of tax-related glitches: in some cases it's easier to measure the number of people affected, while in others, the monetary cost is more straightforward. Use the dropdown menus to navigate to other incidents that might be hidden in the default view.

In reviewing this list of failures, a few lessons jumped out at me:

  • For ongoing, excruciating, cringe-worthy tax-tech pain, no one beats Her Majesty's Revenue and Customs. As my colleague Bob Charette has chronicled, the multiyear rollout of the Pay-As-You-Earn computerized tax system is a textbook case of technological and bureaucratic hubris in the face of a challenging IT problem. You can see from the timeline the number of people affected by calculation errors, which grew over time.
  • Data validation, verification, and sanity checks remain poor. Increasing computerization has meant an increase in mistakes that should have been caught by common sense. Tax systems need better safety checks and governments need to be more skeptical of sudden, unexpected windfalls.
  • Don't automatically trust automatically generated notices. It seems like the software in tax systems that generates letters and notices is subject to even less scrutiny and oversight than the rest of the systems' components.
  • There's danger in waiting to file your taxes at the last minute, but doing them early can also cause problems. There are many examples of tax services simply being unprepared to process early returns, whether because of last-minute changes to the tax code, or because of data that has not yet been updated.
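The data-validation lesson above can be surprisingly cheap to act on: even a handful of bounds tests run before a notice or payment is generated automatically would have caught many of the absurd results in the timeline. Here is a minimal, purely illustrative sketch; the field names and thresholds are hypothetical, not taken from any real tax system:

```python
# Illustrative sanity checks a tax system might run before issuing any
# automatically generated notice or payment. All names and thresholds
# below are hypothetical.

def sanity_check(return_record):
    """Return a list of human-readable warnings for an assessed return."""
    warnings = []
    income = return_record["reported_income"]
    tax_due = return_record["computed_tax"]
    refund = return_record["computed_refund"]

    if tax_due < 0 or refund < 0:
        warnings.append("negative amount: likely a calculation error")
    if income > 0 and tax_due > income:
        warnings.append("tax due exceeds reported income")
    if refund > 100_000:  # arbitrary threshold for human review
        warnings.append("unusually large refund: hold for manual review")
    return warnings

record = {"reported_income": 40_000, "computed_tax": 55_000, "computed_refund": 0}
print(sanity_check(record))  # -> ['tax due exceeds reported income']
```

The point is not the specific rules but that any implausible result is routed to a human before a letter or a payment goes out the door.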

Clearly there are lots of advantages to digitizing tax calculation and collection, including efficiency and accuracy. But it's worth keeping in mind that in all likelihood, our IT systems are bound to fail occasionally, so we need to make sure our laws and systems are better prepared for those contingencies. In the past decade, our ability to cause harm with tax systems has often outpaced our ability to make things right.

If there's a notable tax-related glitch you'd like to see represented on the timeline, let me know in the comments, and I'll try to add it.

Thu, 14 Apr 2016 16:00:00 GMT
We Need Better IT Project Failure Post-Mortems It's hard to find trustworthy data about IT debacles
Illustration: Getty Images

In pulling together this special interactive report on a decade’s worth of IT development projects and operational failures, the most vexing aspect of our efforts was finding trustworthy data on the failures themselves.

We initially started with a much larger set than the 200 or so projects depicted in this report, but the project failure pool quickly shrank as we tried to get reliably documented, quantifiable information explaining what had occurred, when and why, who was affected, and most importantly, what the various economic and social impacts were.

This was true not only for commercial IT project failures—which one would expect, given that corporations are extremely reticent to advertise their misfortunes in detail if at all—but also for government IT project failures. Numerous times, we reviewed government audit reports and found that a single agency had inexplicably used different data for a project’s initial and subsequent costs, as well as for its schedule and functional objectives. This project information volatility made getting an accurate, complete, and consistent picture of what truly happened on a project problematic to say the least.

Our favorite poster child for a lack of transparency regarding a project’s failure is the ill-fated $1 billion U.S. Air Force Expeditionary Combat Support System (ECSS) program (although the botched rollout of is a strong second). Even after multiple government audits, including a six-month, bipartisan Senate Armed Services Committee investigation into the high-profile fiasco, the full extent of what this seven-year misadventure in project management was trying to accomplish could not be uncovered. Nor could the final cost to the taxpayer be ascertained.

With that in mind, we make our plea to project assessors and auditors asking that they apply a couple of lessons learned the hard way over the past decade of IT development project and operational failures: 

In future assessments or audit reports of IT development projects, would you please publish with each one a very simple chart or timeline? It should show, at a glance: an IT project’s start date (i.e., the time money is first spent on the project); a list of the top three to five functional objectives the project is trying to accomplish; and the predicted versus actual cost, completion date, and delivered functionality at the critical milestones where the project was reviewed, delivered, or canceled.

Further, if the project has been extended, re-scoped or reset, please make the details of such a change absolutely clear. Don’t forget to indicate how this deviation affects any of the aforementioned statistics. Finally, if the project has been canceled, account for the opportunity costs in the final cost accounting. For example, the failure of ECSS is currently costing the Air Force billions of dollars annually because of the continuing need to maintain legacy systems that should have been retired by now. You’d think that this type of project status information would be routinely available. But unfortunately, it is rarely published in its totality; when it is, it’s even less likely to be found all in one place.

Similarly, for records related to IT system operational failures, would you please include all of the consequences being felt—not only financially but to the users of the system, both internally and externally?  Too often an operational failure is dismissed as just a “teething problem,” when it feels more like a “root canal” to the people dependent upon the system working properly. 

A good illustration is Ontario’s C$242 million Social Assistance Management System (SAMS), which was released more than a year ago and is still not working properly. The provincial government remains upbeat and positive about the system’s operation while callously downplaying the impact of a malfunctioning system on the poor in the province.

More than 100 years ago, future U.S. Supreme Court Justice Louis Brandeis argued that, “Publicity is justly commended as a remedy for social and industrial diseases. Sunlight is said to be the best of disinfectants; electric light the most efficient policeman.” Hopefully, the little bit of publicity we have tried to bring to this past decade of IT project failures will help to reduce their number in the future.

Mon, 21 Dec 2015 20:13:00 GMT
Wishful Thinking Plagues IT Project Plans Delusional estimates for time and cost plague large IT projects
Photo: Ralf Hiemisch/Getty Images

“An extraordinary failure in leadership,” a “masterclass in sloppy project management,” and a “test case in maladministration” were a few of the more colorful descriptions of UK government IT failures made by Edward Leigh, MP for Gainsborough, England, when he was Chairman of the Public Accounts Committee.

Leigh repeatedly pointed out that government departments were not only wholly unrealistic about their IT projects’ costs, schedules, and technical feasibility, but also didn’t take any responsibility for the consequences of those unrealistic assumptions.

This same theme appeared frequently during our review of the past decade of IT project development and operational failures. The “over-optimism” disease, aka “Hubble Psychology,” is frequently cited in audit reports as a primary root cause of IT failures. Hubble Psychology is the term NASA Inspector General Paul Martin used a few years ago in his report into the space agency’s project troubles (pdf) to describe the:

“[E]xpectation among NASA personnel that projects that fail to meet cost and schedule goals will receive additional funding and that subsequent scientific and technological success will overshadow any budgetary and schedule problems. They pointed out that although Hubble greatly exceeded its original budget, launched years after promised, and suffered a significant technological problem that required costly repair missions, the telescope is now generally viewed as a national treasure and its initial cost and performance issues have largely been forgotten.”

In other words, as long as you can keep your program alive, you have a very good chance of continuing to receive enough money (and time) to make it work sooner or later. The expectation that “all will be forgiven” doesn’t always come true, as even the government will eventually run out of money and patience, but it works often enough, especially in defense programs, to make it a belief worth acting on. If you have the time to dig into the six governmental IT projects we highlighted in our “Life Cycle of Failed Projects,” you’ll soon discover that each suffered from a version of NASA’s Hubble Psychology.

A skewed bias toward extreme optimism doesn’t just affect program development plans; it also infects decisions about when to take an IT system live. Thumbing through the myriad Risk Factor blog posts will quickly turn up a plethora of IT projects deployed long before they were ready, due to unfounded, if not delusional, optimism concerning their operational state.

Take, for instance, the Los Angeles Unified School District’s (LAUSD) disastrous decision last year to roll out its new $10 million integrated student information system, called MISIS. Dozens of operational snags with MISIS immediately cropped up: Thousands of students did not receive class schedules for weeks; an untold number of teachers were assigned 70 or more students per class; students were placed in classes they had already completed; middle school students were placed in high school classes; high school seniors were unable to send transcripts to the colleges they were applying to; and so forth. It has taken more than a year of hard effort, plus an additional $100 million-plus, to make MISIS stable in operation, though it is still far from delivering the functionality originally promised.

What makes the MISIS debacle so mind-boggling and yet so unsurprising is that the original project schedule, which was already aggressive (two years for $29 million), was compressed by half while the project’s budget was cut by two-thirds. Predictably, severe operational problems began appearing in the weeks before system rollout, caused by an acknowledged lack of system testing. LAUSD teachers, school administrators, and others warned LAUSD senior administration that MISIS was nowhere near ready to deploy, not only because of the technical hitches, but also because only a small minority of LAUSD teachers had been fully trained on how to use the system. Even the LAUSD chief information officer acknowledged a few days before the rollout that it might be “bumpy,” but that did not matter. The LAUSD superintendent, who admitted IT was not his strong suit, was confident that MISIS was ready to be deployed, so it was deployed.

What makes this situation even more incredible was that the MISIS disaster almost exactly mirrored another massive LAUSD IT project disaster involving the botched rollout of a new payroll system back in 2007 that caused a year of pain. For whatever reason, the lessons from that event were ignored completely.

Not learning from failure seems to go hand in hand with being overconfident about your IT project’s status. This seemed especially true over the past ten years in the airline industry. For instance, US Airways was brimming with confidence in March 2007 when it switched over to a new reservation system following its 2005 merger with America West Airlines. Such was its confidence that just before the cut-over to its new system, a US Airways senior vice president of customer service boasted, “We get to demonstrate that these transitions aren't as big and as difficult as historically has been proclaimed.” Well, the new reservation system melted down on the day it went live; it took nearly six months to get everything back to normal.

Then there was the case of British Airways’ new Terminal 5 baggage system at London’s Heathrow Airport. Avoiding a repeat of the ignominy of Denver International Airport’s failed IT baggage system was uppermost in system integrator BAA’s mind. While more successful than DIA’s baggage system, Terminal 5’s baggage system still face-planted spectacularly on its opening day of 27 March 2008 and for days afterwards, with some 430 flights cancelled and more than 20,000 bags mishandled in the first eight days of its operation.

It later came out in a UK Parliamentary inquiry into the baggage system mayhem that BA senior management went ahead with the opening even though it knew that the baggage system wasn’t fully ready and would likely need another six months as its test program and staff training were both “compromised.” But waiting would cost BA money, so a “calculated risk” was taken to open Terminal 5 as planned and hope for the best. Of course, BA didn’t bother to tell its thousands of passengers using Terminal 5 that tidbit of information, instead proclaiming to one and all everything was “tried, tested and ready to go.”

United Airlines was similarly self-assured when it moved to a single passenger service system and website in March 2012 to complete its 2010 merger with Continental Airlines. Then-CEO Jeff Smisek said he was confident the transition would go smoothly, proclaiming that the airline was “exceedingly well prepared for it.” Again, things didn’t go as calmly as expected, to say the least, and United Airlines is still suffering the financial and reputational after-effects to this day.

We should note, in fairness, that the recent cut-over of the merged American Airlines-US Airways reservation system did go well, so perhaps, finally, a measure of humility has been gained to offset the overweening hubris that usually accompanies the implementation of these types of IT systems.

Healthcare IT systems also seem to be prone to the optimism bug. The UK’s £12 billion and Australia’s A$566 million electronic health record system fiascos are prime examples of the belief that a righteous idea alone will create a successful IT system. And of course, the various botched attempts in the US at creating federal and state health exchanges to support the Affordable Care Act (ACA) are case studies in hope over experience and good sense. A fitting description of the entire situation in late 2013 and early 2014 was unwittingly given by then HHS Secretary Kathleen Sebelius when she told a Congressional committee that the federal exchange “works unless you try to use it.”

While the botched rollout of the ACA health exchanges might arguably be the greatest example of IT hubris coupled with self-denial over the past decade, I personally think the development of New York City’s personnel management system, CityTime, is an even better one. Originally slated in 1998 to cost $63 million and be completed within five years, the project had ballooned to over $722 million by March 2010, with a completion date set for June 2011. A government investigation in December 2010 uncovered what at the time looked like $80 million in fraudulent billing, but that figure soon exploded to more than $500 million.

CityTime’s prime contractor, SAIC, agreed to forfeit $500 million of the $690 million it was paid to avoid prosecution for defrauding New York City. It admitted that it failed to investigate internal warnings that things were amiss, a well-practiced lack of curiosity no doubt helped along by the firehose of money it was showered with.

What intrigues me more is how long CityTime stayed alive and unexamined by New York City officials as the project’s costs rapidly climbed: $224 million in 2006, $348 million in 2007, and $628 million in 2009, before breaching the $700 million mark a year later. Even though irregularities in billing were raised back in 2003 and many times over the years thereafter, these warnings were studiously ignored or played down. Not until the very end did New York City’s comptroller audit or question the program, which is astonishing given that the project’s overall benefit was estimated in 1998 at saving the city only $60 million in timesheet fraud!

In fact, there seemed to be a collective shrug-of-the-shoulders acceptance by the Bloomberg Administration that big government IT projects always overrun their estimates, so why be overly concerned about CityTime’s increasing cost? Even as the fraud was being exposed, Mayor Michael Bloomberg cavalierly dismissed the project’s problems and exploding cost as just one of those things that fell through the oversight cracks. Some crack!

If future government IT project failures are ever to be minimized, the Hubble Psychology is going to need to be directly addressed. Making individuals in both government and contractor organizations accountable in a meaningful way is a good start. However, even this will not be easy. As exemplified by US Air Force leadership in the aftermath of the $1 billion spent on the Expeditionary Combat Support System (ECSS) with nothing to show for it, government doesn’t seem to believe in personal accountability when it comes to IT project failure.

If we can’t hold people accountable, maybe the next-best thing is to shed more light on these failures. That will be the topic of the final blog post in our special report on the Lessons of a Decade of IT Failures.

Thu, 17 Dec 2015 16:14:00 GMT
Sorry for the Inconvenience The empty apologies trotted out by companies and governments in the wake of IT debacles add insult to injury
Photo: iStockphoto

After looking back at the project failures chronicled in the Risk Factor for our recent set of interactive features “Lessons From a Decade of IT Failures,” I became intrigued by the formulaic apologies that organizations put out in their press releases when something untoward happens involving their IT systems.  For instance, below are three typical IT mea culpas:

“We would like to apologize for the inconvenience this has caused.”

“We regret any inconvenience this may have caused patients.”

“We apologize for the inconvenience to those affected.”

The first apology came about after Nationwide, the UK’s largest building society, charged 704,426 customers’ debit cards twice for the same transaction, which it blamed on a batch file that was processed twice. The second apology was in response to the crash of Northern California-based Sutter Health’s $1 billion electronic health record system, which went down for more than a day across seven of its major medical facilities because of a faulty system upgrade. The last apology was prompted by the shambolic mess that marked the rollout of California’s EDD unemployment system, which affected at least 185,000 claimants, many of them for weeks.
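Nationwide’s doubly processed batch file is a failure mode with a well-known guard: make batch application idempotent by recording an identifier for every batch already applied and refusing to apply the same one twice. The sketch below is hypothetical (the names, amounts, and in-memory store are illustrative, not Nationwide’s actual system):

```python
# Hypothetical sketch of an idempotency guard against double-processing
# a payments batch. In a real system, processed_batches would be a
# durable store checked and updated in the same transaction as the debits.

processed_batches = set()
accounts = {"alice": 100.0}

def apply_batch(batch_id, charges):
    """Apply a batch of debits exactly once, keyed by batch_id."""
    if batch_id in processed_batches:
        return False  # already applied: a replay is a no-op, not a re-debit
    for account, amount in charges:
        accounts[account] -= amount
    processed_batches.add(batch_id)
    return True

apply_batch("2012-07-14-debits", [("alice", 25.0)])
apply_batch("2012-07-14-debits", [("alice", 25.0)])  # replay is ignored
print(accounts["alice"])  # -> 75.0
```

Had the settlement pipeline enforced a check like this, running the same file twice would have charged no one a second time.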

Apparently, regardless of the impact or duration of an IT failure, the offending organization views it as merely an inconvenience to those experiencing it. Ontario’s Social Services Minister Helena Jaczek went so far as to liken the months of havoc resulting from the botched rollout of the province’s new C$240 million welfare and disability system to “when I have my BlackBerry telling me that there are issues and I need an update. . . it is often very inconvenient.”

Hmm, not receiving a desperately needed disability or subsistence check equates to a BlackBerry software update. Who knew?

Most apologetic public statements by corporate or government officials at least attempt to provide the pretense that executive management feels bad for the consequences of their organization’s IT oofta. However, there have been other times where perhaps not apologizing would have been a better strategy than the apology given. Below are a few of my favorite “best of the worst apologies” from the Risk Factor blog files which clearly indicate that the organization would have been a lot happier if it didn’t have to deal with those pesky customers or taxpayers.

We start off with two of the largest organizations in Ireland, both of which worked overtime to antagonize their customers in the wake of billing system errors. The first is Irish Rail, which discovered that a software upgrade to its vending machines had caused tickets to be issued to some 9,000 passengers over several days without actually deducting the fares from their debit cards. On the Friday a week after the discrepancy was discovered, Irish Rail decided to notify the affected customers by way of a press release on its website, which mentioned that it would begin deducting the fare amounts due it on the following Monday.

Irish Rail’s press release also stated, “We apologies [sic] for any inconvenience this fault causes customers,” which for many could include incurring hefty penalty charges for unauthorized overdrafts on their debit cards. When asked why it couldn’t wait a week so its customers could ensure that their accounts had the funds to cover the charges, Irish Rail responded it had every right to collect the money immediately and was going to do so. Unsurprisingly, the affected Irish Rail customers didn’t think much of the company’s apology.

Another “show me the money” demand disguised as an apology came from Eircom, the largest telecom provider in the Republic of Ireland. It, like Irish Rail, had a billing “system error,” one that failed to correctly direct-debit some 30,000 customer bank accounts even though those customers’ bills indicated otherwise. Eircom deemed the incident “regrettable” and further stated that “it’s embarrassing and we're very sorry that it's happened.” However, Eircom was neither too embarrassed nor too sorry to insist that, although it planned to reimburse customers the €18.45 failed-direct-debit fee, they would still have to pay all monies owed the telecom in their next billing cycle. Ireland’s telecom regulator was as unhappy with Eircom’s payment demand as its customers were, even more so because the utility had also failed to inform it of the billing error.

“Teething issues” also featured prominently in several apologies for IT foul-ups. Take, for instance, EnergyAustralia, which claimed that ongoing “teething problems” with its newly introduced accounting system were why 145,000 of its customers had not been billed for their electricity or gas usage on time, including 21,000 who had never received a bill from the utility. In the apology the company issued, it tried to downplay the extent of the foul-up by saying, “We are sorry to the small number of customers who haven't had the best experience and we're working round the clock to improve our service.” However, for some 12,500 EnergyAustralia customers, the clock spun around for more than a year before their billing issues were finally corrected.

Automotive manufacturer McLaren also apologized that its $229,000 MP4-12C supercar was suffering from “teething problems.” Ron Dennis, the executive chairman of McLaren Automotive and McLaren Group, sent out a letter to customers that stated in part, “As you will have already heard from my staff, we are experiencing some early software bugs resulting in unnecessarily sensitive warning lights, battery drainage in certain conditions and IRIS [infotainment] performance issues. My team and the McLaren retailers are working with the pace and intensity that the McLaren brand demands to fully resolve these bugs rapidly and effectively to ensure that any inconvenience to you is kept to a minimum.” Dennis, however, tried to make up for the inconvenience by promising customers that he was going to give them “a pre-release copy of the new McLaren: The Wins coffee-table book." I wonder how many software bugs it would take to get a personally signed copy of the book.

Additionally, a couple of organizations had to make so many apologies that customers just stopped listening to them, deciding to head for the exits instead. Take, for example, the UK’s RBS Group, which had a major system meltdown in the summer of 2012, caused by a routine software update gone bad, that kept millions of its bank customers from accessing their accounts for days, and some even for months. At the time, then-CEO Stephen Hester apologized, saying, “Our customers rely on us day in and day out to get things right. On this occasion we have let them down… Once again I am very sorry for the inconvenience.”

Various RBS Group spokespersons had to apologize several more times that summer as they promised everything would soon be made right, which quickly turned out not to be true.  At the time, RBS promised to invest hundreds of millions of pounds into upgrading its IT systems to keep major disruptions from happening again.

However, RBS has suffered significant glitches since, including on Cyber Monday in December 2013 and once more in June of this year. Although after each incident RBS management stated that it was “sorry for the inconvenience caused” and that the incident was “unacceptable,” tens of thousands of its discomfited customers have decided to do their banking elsewhere.

While RBS may have seen droves of customers desert it over its IT failures, that is nothing compared to Australia’s Vodafone, which has seen millions of customers leave because of the company’s persistent IT ooftas. The root of the problem can be traced to 2009, when Vodafone merged its network with rival Hutchison’s “3” network. Not surprisingly, the merger’s objective of creating a high-quality, unified network across Australia wasn’t as easy or seamless to achieve as envisioned. Customer complaints about poor Vodafone service grew throughout 2010, but really came to a head when a network software upgrade in late 2010 didn’t work as expected. Instead of speeding up network traffic, the upgrade slowed it down. That problem took weeks to fix, angering legions of Vodafone customers.

Then a different, concurrent software issue caused additional problems across the network. Vodafone, which by now was being referred to in the press as “Vodafail,” had to apologize multiple times to its angry customers, saying the company was “truly sorry” for the continued “dropped calls, delayed SMS and voicemails, slow data speeds, inconsistent coverage, and long waits when you called us.” For the more than 2 million fed-up customers who left Vodafone between 2010 and 2013, the company didn’t improve fast enough. Finally, after spending AU$3 billion to upgrade its networks and customer support, Vodafone Australia announced earlier this year that it had started adding customers again.

There was also an interesting apologetic non-apology in my home state of Virginia. In the summer of 2010, a server problem at the Virginia Information Technologies Agency (VITA) knocked out the IT systems used by 27 of Virginia’s 89 state agencies for several days; a number of agencies were affected for over a week. At the time, the state’s IT infrastructure was in the midst of a $2.3 billion upgrade that was a constant source of contention between Virginia and its contractor, Northrop Grumman.

When the server problem was finally fixed, Northrop Grumman vice president Linda Mills put out the expected pabulum and said the company “deeply regrets the disruption and inconvenience this has caused state agencies and Virginia citizens.” However, Grumman’s “regrets” were immediately undercut by a company spokesperson who, when asked by a Richmond Times-Dispatch newspaper reporter whether Mills’ statement was an apology, declined to comment. Whatever little goodwill that was left in state government for Northrop Grumman quickly vanished.

In May 2011, after an investigation into the outage, Northrop Grumman agreed to pay a $4.7 million fine for the outage, which is an apology of the best kind, in my opinion.

Our final apology was given through firmly clenched teeth. For years, Her Majesty's Revenue and Customs (HMRC) in the UK worked to upgrade (pdf) its troubled computer systems. HMRC promised that when complete, the new PAYE (pay-as-you-earn) system would significantly reduce both over- and underpayments of taxes. When the system was fully introduced in 2010, HMRC announced that some 4.3 million UK taxpayers would receive letters stating that they had paid on average £400 too much in taxes between 2008 and April 2010. Additionally, another 1.5 million would receive letters just before Christmas stating that they had paid on average £1,428 too little over the same period, and HMRC wanted its money now. Furthermore, HMRC indicated that another 6 million taxpayers were likely owed refunds for taxes paid prior to 2008, and another 1.7 million possibly owed more taxes; this group would be receiving letters soon, too.

The underlying reason for the millions of over- and underpayments was that taxpayers had been placed in incorrect tax brackets for years because of errors in the new HMRC PAYE computer system database.

Needless to say, the UK public was not a happy bunch at the news. Fueling their unhappiness was the attitude of HMRC Permanent Secretary Dave Hartnett, who stated in a radio interview that there was no need for him or his department to apologize for the errors or the demands for quick payment of owed taxes, because, at least to him, the PAYE system was working as designed: “I'm not sure I see a need to apologise... We didn’t get it wrong.”

Politicians of both parties were appalled by that statement, calling Hartnett out of touch and arrogant, especially in light of all the reported PAYE system foul-ups. Hartnett was forced by senior Conservative party leaders to retract his statement the following day, saying that he was “deeply sorry that people are facing an unexpected bill.”

A few days later, however, HMRC CEO Dame Lesley Strathie made it very clear that Hartnett’s apology was really a non-apology when she insisted that HMRC staff had made “no mistakes,” and that any and all errors were due to taxpayer mistakes. Dame Strathie also said the criticism of HMRC's performance was unfair since it couldn't pick and choose which customers to serve: it had to deal with everyone, whether her government department liked it or not. That bit of insight didn’t go over well, either.

HMRC’s PAYE system continues to cause grief for UK taxpayers, and HMRC is taking yet another crack at updating its computer systems. Unfortunately, UK taxpayers can’t choose their tax department the way customers of RBS or Vodafone can choose their bank or telecom company.

If you have some “worst IT foul-up apologies of all time” stories to add, please let me know.

Mon, 14 Dec 2015 18:30:00 GMT
The Making of "Lessons From a Decade of IT Failures" Why and how we're looking back at a decade's worth of IT debacles Why and how we're looking back at a decade's worth of IT debacles
Photo: Randi Klett

In the fall of 2005, IEEE Spectrum published a special report on software that explored the question of why software fails and possible ways to prevent such failures. Not long afterwards, we started this blog, which documents IT project and operational failures, ooftas, and other technical hitches from around the world.

The tenth anniversary of that report seems an appropriate time to step back, look through 1,750 blog posts, and give an overall impression of what has and hasn’t changed vis-à-vis software crashes and glitches. Obviously, the rise in both the frequency and cost of cybercrime is one major change, but we decided, at least for now, to concentrate our limited resources on past unintended IT system development and operations failures.

I deliberated at length with Spectrum’s former senior interactive editor Josh Romero, who’s responsible for the data visualizations in our interactive survey “Lessons From a Decade of IT Failures,” about which IT project and operational failures to include. This was a non-trivial task considering that many Risk Factor blog posts were roundups discussing multiple project failures and operational problems.

To make our work manageable, we decided to include those IT projects and systems that experienced significant trouble. For development projects, this meant being cancelled, suffering a major cost or schedule blowout, or delivering far less than promised. For operational IT systems, suffering a major disruption of some kind qualified it for consideration.

To help winnow down the number of possibilities further, we concentrated on those project failures or incidents where there was reliable documentation of what happened, why it happened, and the consequences, in terms of cost and/or people affected. If there is one characteristic that hasn’t changed in terms of IT project development or operational failures, it is the lack of reliable and detailed incident data publicly available. We will discuss this particular issue more thoroughly in a future blog post.

This highlights another aspect of the data we’re using in “Lessons From a Decade of IT Failures.” The data is skewed not only by our choices of what to include and what to leave out, but also by which incidents actually make it into the public domain. The majority of project failures shown are government projects because they tend to be visible thanks to government accountability mechanisms. Private companies tend to bury their IT failures, so except for the rare lawsuit, their operational failures rarely make it into the news unless they impact a significant number of customers or government regulators become involved. It should also be obvious that the data is skewed by our dependence upon English-language news reporting of project failures and operational meltdowns.

Even given the limitations of the data, the lessons we draw from them indicate that IT project failures and operational issues are occurring more regularly and with bigger consequences. This isn’t surprising, as IT in all its various forms now permeates every aspect of global society. It is easy to forget that Facebook launched in 2004, YouTube in 2005, and Apple’s iPhone in 2007, or that there have been three new versions of Microsoft Windows released since 2005. IT systems are definitely getting more complex and larger (in terms of data captured, stored, and manipulated), which means not only are they increasingly difficult and costly to develop, but they’re also harder to maintain. Further, when an operational IT system experiences an outage, many more people are affected now than ever before, sometimes “inconveniencing” (to borrow from the lexicon of PR types who have to try to explain these messes) millions or even tens of millions of people globally, a magnitude of technological carnage that prior to 2005 was a relatively rare event.

On top of that, during the past decade we have seen major IT modernization efforts in the airline, banking, financial and healthcare industries, and especially in government, generally aimed at replacing legacy IT systems that went live in the 1980s and 1990s, if not earlier. Many of these efforts have sought to replace multiple disparate IT systems with a single system, which has typically proven to be much more technically and managerially difficult, let alone expensive, than imagined.

There isn’t one right way to look at the interactive graphs and charts we’ve crafted from the documentation we have available. We suggest you just wander through them and then follow the links to more detailed explanations as the mood strikes you. You may be surprised by how many major IT failures you have never heard about, or have forgotten. Let us know if you think we should add other development or operational failures in future releases, or if you have better data relating to a project’s cost or impact. We will be releasing more charts and graphs over the next few weeks that will provide other perspectives on the IT failures and ooftas gleaned from the Risk Factor blog archives.

Fri, 16 Oct 2015 20:53:00 GMT
Stuxnet-Style Virus Failed to Infiltrate North Korea's Nuclear Program A cousin of the Stuxnet virus that crippled Iran's nuclear program failed to do the same to North Korea, Reuters reports A cousin of the Stuxnet virus that crippled Iran's nuclear program failed to do the same to North Korea, Reuters reports
Illustration: iStockphoto

The famous Stuxnet computer virus that sabotaged Iran’s nuclear program apparently had a cousin designed to do the same to North Korea. But this other U.S. cyber attack failed because agents could not physically access the isolated computers of North Korea’s nuclear program.

Several U.S. intelligence sources told  Reuters that the operation aimed at North Korea took place at the same time as the Stuxnet attack that crippled Iran’s nuclear program in 2009 and 2010. The Stuxnet virus worked by hijacking the control software of fast-spinning centrifuges belonging to Iran’s nuclear program. Once activated, Stuxnet caused physical destruction by forcing the centrifuges to spin out of control and tear themselves apart.  The U.S. National Security Agency led a similar, unsuccessful effort with a modified Stuxnet aimed at taking down North Korean centrifuges.

Both Iran and North Korea likely use similar centrifuges that can enrich uranium for either civilian purposes or to become weapons-grade nuclear material. That means North Korea probably also uses control software developed by Siemens AG running on some version of Microsoft Windows, experts told Reuters.

Stuxnet, a joint effort between the United States and Israel, worked in three phases to infiltrate and attack its targets. First, it continually infected Microsoft Windows machines and networks while replicating itself. Second, it looked for Siemens Step7 software that forms the foundation of industrial control systems for operating equipment such as centrifuges. Third, it hijacked the programmable logic controllers so that it could provide secret surveillance on the centrifuges or command the centrifuges to act in a self-destructive manner.

The computers running the control systems for centrifuges belonging to the nuclear programs of Iran and North Korea are isolated from the Internet to avoid providing easy access for cyber attacks. That’s why Stuxnet relied on first spreading stealthily across many Internet-connected machines, in hopes of a worker sticking a USB thumb drive into an infected machine. Stuxnet could then infect the USB drive and eventually make its way to computers isolated from the Internet. (See IEEE Spectrum’s feature “The Real Story of Stuxnet.”)

But North Korea presented an even greater challenge than Iran because of its extreme isolation. Relatively few North Koreans have access to the open Internet, and computer ownership requires registration with the police. In any case, the United States failed to get its Stuxnet-style virus onto the machines controlling North Korea’s centrifuges.

Many experts interviewed by  Reuters doubted that a Stuxnet-style virus would have made much impact on North Korea’s nuclear program even if it had succeeded. That’s in part because North Korea likely has at least one other hidden nuclear facility beyond the known Yongbyon nuclear complex. But it’s also because North Korea likely has access to plutonium, which does not depend on the complex uranium enrichment process.

Mon, 1 Jun 2015 21:00:00 GMT
Fuzzy Math Obscures Pentagon's Cybersecurity Spending The U.S. military's cybersecurity budgets make it tough to gauge the effectiveness of such spending The U.S. military's cybersecurity budgets make it tough to gauge the effectiveness of such spending
Illustration: Getty Images

U.S. military spending has increasingly focused on cybersecurity in recent years. But some fuzzy math, plus the fact that funding is spread out among many military services, makes it tough to figure out exactly how much money is going toward cybersecurity. That in turn makes it difficult to understand whether each dollar spent really improves the U.S. military’s cyber capabilities.

The U.S. military plans to invest an estimated $5.5 billion in cybersecurity for 2015. But such “cyber budget numbers are squishy” in part because authority over the military’s cyber mission is split among many different organizations and military services, according to a  Nextgov  analysis. Budget analysts also point to confusion in how certain military services define cybersecurity spending within their individual budgets.

The lack of central authority over the military’s overall cybersecurity spending and some unclear budgetary definitions of what counts as cybersecurity could complicate efforts to assess the effectiveness of military spending on cybersecurity, said Peter Singer, coauthor of “Cybersecurity and Cyberwar” and the upcoming novel “Ghost Fleet.” In an interview with IEEE Spectrum, he added:

“This is the next stage. You can no longer keep using the terms ‘cyber 9/11’ or ‘cyber wake-up call.’ That discourse has passed. If you’re still using that discourse, you’re well behind the times. Now is the time for serious conversation; that’s what comes with creating organizations. Now we get to questions of how do we know we’re spending effectively on cybersecurity.”

In 2010, the Pentagon created U.S. Cyber Command, also known as CYBERCOM, as a central organization to coordinate cyber warriors from the Army, Navy, Air Force, and other military branches. Cyber Command is located at Fort Meade, Maryland, next door to the National Security Agency. Both organizations are led by Admiral Michael Rogers, a Navy officer who wears two hats as commander of CYBERCOM and director of the NSA.

But Cyber Command does not have a single line item for its budget, because its funding comes from multiple sources. That proved a recipe for confusion when a Pentagon budget chart gave the initial impression that Cyber Command’s projected 2015 budget was growing by 92 percent,  according to  Nextgov . In fact the budget represented a 7 percent cut compared to the previous year.

To add to the confusion, Cyber Command’s projected budget of $509 million represents just one piece of the U.S. military’s estimated $5.5 billion investment in cybersecurity. That overall number seems to have risen over the past several years. But it’s tough to tell exactly what defense dollars are being spent on because different military organizations and services define cybersecurity differently. For instance, a report by the Federation of American Scientists pointed out that the U.S. military’s cybersecurity spending appeared to increase by $1 billion from 2013 to 2014, but added the cautionary note that “this increase may reflect changes in how DOD programmatic elements have defined ‘cybersecurity’ programs.”

In another example, the U.S. Air Force submitted a $4.6 billion cybersecurity funding request in 2011, roughly 10 times the U.S. Department of Defense’s own estimate of $440 million for Air Force cybersecurity spending. Defense officials explained that the Air Force estimate included “things” that are not typically considered cybersecurity.

Part of that difference in defining cybersecurity within budgets may simply come from internal reorganization of military personnel and resources, explained Singer, a strategist and senior fellow at the  New America Foundation , a nonprofit think tank in Washington, D.C. Other cases may involve military officials relabeling certain programs as “cyber” because that boosts their chances of getting funding. “You have some relabeling for political and budgetary purposes,” said Singer.

It’s natural for the U.S. military to “keep piling people and money” into Cyber Command and other cybersecurity initiatives as it builds up its capabilities, Singer said. But he added that the military and policymakers need to be able to understand whether military cybersecurity spending is delivering bang for the buck in terms of capability. Does raising the budget 1 percent lead to a 1 percent gain in capability? 10 percent? 100 percent? Or has it reached the point of diminishing returns, where it leads to only a 0.5 percent gain in capability?

There is also the question of what cyber capabilities the U.S. military should focus on funding for research and development (R&D) in cybersecurity. R&D accounts for approximately $1 billion of the military’s overall $5.5 billion projected budget for cybersecurity. Until now, U.S. military spending has heavily favored R&D efforts aimed at developing offensive cyber capabilities such as Stuxnet, the computer virus that targeted Iran’s nuclear program and was discovered in 2010.

But Singer prefers rebalancing the U.S. military’s R&D spending in favor of developing breakthroughs or game-changers in cyber defense. He pointed out that the United States currently has a huge strategic vulnerability as perhaps the country most vulnerable to cyber attacks; boosting U.S. cyber defenses could make a big difference. By comparison, the U.S. military already possesses some of the most advanced physical and cyber capabilities for attacking enemies around the world. Developing “Stuxnet 2.0” might only represent a relatively minor increase in offensive capability.

“If we’re looking for more gamechangers, we’d get more out of being less vulnerable than by being a bit better at reaching out and attacking enemies,” Singer said.

Wed, 8 Apr 2015 12:21:00 GMT Is the Lenovo/Superfish Debacle a Call to Arms for Hacktivists? Proposed exemptions to the DMCA could free white hats to make networked devices more secure Proposed exemptions to the DMCA could free white hats to make networked devices more secure
Image: Kutay Tanir/Getty Images

As Lenovo has come under fire for pre-installing on their computers the intrusive Superfish adware — and as lawsuits are now being filed against the laptop-maker for compromising its users’ security — one solution to the problem may have been given short shrift. Maybe it’s time, in other words, to release the hackers.

To be clear, nothing here should be read as an inducement to any sort of malicious hacking or other nefarious cyber-activities. The call to arms is instead to hacking in the old Homebrew Computer Club, touch-of-code-and-dab-of-solder sense. After all, when pop-up ads became a scourge of the late 1990s Internet, coders behind the smaller Opera and Mozilla browsers rolled out their pop-up blockers to restore a touch of sanity. Major commercial web browsers like Internet Explorer and Safari only rushed in after the nimbler first responders proved the consumer demand.

Over the nearly half-century of the modern amateur computing movement, makers, modders and homemade tech enthusiasts have never come up short on creative solutions to big marketplace challenges. What’s needed in response to the proliferation of Lenovo/Superfish, Samsung Smart TV, and many other security debacles in recent months is more openness and encouragement to let hackers (in the old-school sense of hackers as above) be hackers.

“It comes down to device autonomy, whether users have control over the software and hardware they run,” says Parker Higgins, director of copyright activism at the Electronic Frontier Foundation. “I worry that people may lose the understanding that they deserve that kind of autonomy and that level of privacy and that entitlement to be left alone when they want to.”

In fact, just this month the EFF completed its latest round of petitions to the U.S. Copyright Office for exemptions to the Digital Millennium Copyright Act that would allow for car repairs involving a car’s onboard computers, fair-use video remixes, jailbreaking of phones and tablets, and modifying older video games that require authentication from servers that no longer exist.

“There’s a rulemaking process that happens every three years,” Higgins says. “Every three years you have to submit your exemptions de novo. It doesn’t carry over. We’ve gotten exemptions for jailbreaking phones in the past, and we’ve had to apply it completely from scratch this year.”

So as dry as the DMCA’s exemption-making process may be, he says, it’s still necessary to carve out spaces in the marketplace where consumers can continue to develop new and productive uses for technology whose original manufacturers might otherwise try to shut it down via claims of copyright infringement.

Higgins adds that with enough groundswell of frustration at the proliferation of adware, bloatware and consumer snooping in tech today, legislation like the Unlocking Technology Act of 2013 (which would allow for more hacking of the kind described here — but also died in committee) might one day make it onto the books.

And the reason this matters to aggrieved Lenovo or Samsung SmartTV owners (among numerous known and suspected privacy violations in consumer electronics) is that owners of these devices should be able to build and distribute their own workarounds to spyware or other unrequested and unadvertised technologies they find onerous. And maybe then some smart appliance equivalent of the popup ad blocker will bubble up to restore a touch of sanity again. 

Thu, 26 Feb 2015 16:07:00 GMT
Should Data Sharing Be More Like Gambling? A Microsoft Research team explores transforming permissions into accepting some "privacy risk" A Microsoft Research team explores transforming permissions into accepting some "privacy risk"
Photo-illustration: Randi Klett; Images: Getty Images

When you install a new app on your phone, you might find yourself facing a laundry list of things the software says it needs to access: your photos folder, for example, along with your camera, address book, phone log, and GPS location.

In many cases, it’s an all-or-nothing deal.

Eric Horvitz of Microsoft Research says companies could do better. Instead of asking users to provide wholesale access to their data, they could instead ask users to accept a certain level of risk that any given piece of data might be taken and used to, say, improve a product or better target ads.

“Certainly user data is the hot commodity of our time,” Horvitz said earlier this week at the American Association for the Advancement of Science (AAAS) meeting in San Jose. But there is no reason, he says, that services “should be sucking up data willy-nilly.”

Instead, he says, companies could borrow a page from the medical realm and look for a minimally invasive option. Horvitz and his colleagues call their approach “stochastic privacy.” Instead of choosing to share or not to share certain information, a user would instead sign on to accept a certain amount of privacy risk: a 1 in 30,000 chance, for example, that their GPS data might be fed into real-time traffic analysis on any given day. Or a 1 in 100,000 chance that any given Internet search query might be logged and used.
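The mechanics of that idea reduce to a per-event coin flip. Here is a minimal sketch (my own illustration, not the protocol from Horvitz's paper): each day the service samples against the risk level the user accepted, and collects the data only when the draw succeeds.

```python
import random

def collect_today(rng, accepted_risk):
    """Collect this user's data today only with probability `accepted_risk`."""
    return rng.random() < accepted_risk

def days_collected(days, accepted_risk, rng):
    """Count how many days out of `days` this user's data would be taken."""
    return sum(collect_today(rng, accepted_risk) for _ in range(days))

# A user who accepted a 1-in-30,000 daily risk will almost never be sampled
# over a single year...
rng = random.Random()
year = days_collected(365, 1 / 30_000, rng)
```

Yet across a large user base the aggregate is still useful: with 10 million users at that risk level, the service would expect roughly 10,000,000 × 365 / 30,000 ≈ 122,000 GPS-day samples per year, which is the trade stochastic privacy proposes.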

Horvitz and colleagues outlined the approach in a paper presented at a conference of the Association for the Advancement of Artificial Intelligence (AAAI) last year.

If companies were to implement stochastic privacy, they’d likely need to engage in some cost-benefit calculations. What are the benefits of knowing certain information? And how willing would a user be to share that information? 

This sort of exercise can turn up surprising results. In an earlier study, Horvitz and Andreas Krause (then at Caltech, but now at ETH Zurich) surveyed Internet search users to gauge their sensitivity to sharing different kinds of information. More sensitive than marital status, occupation, or whether you have children? Whether the search was conducted during work hours.

Of course, even if a company works out what seem to be reasonable risks for sharing different kinds of data, what it might look like on the user end is still an open question. How do you communicate the difference between a 1/30,000 and a 1/100,000 probability? 

Horvitz said that would be a good problem to have. “Would you want to live in a world where the challenge is to explain these things better,” he asked, “or where companies scarf up everything?”

Fri, 20 Feb 2015 19:00:00 GMT
Rooting Out Malware With a Side-Channel Chip Defense System A new software-agnostic malware detection tool detects cyberattacks by their power consumption A new software-agnostic malware detection tool detects cyberattacks by their power consumption
Photo: John Lamb/Getty Images

The world of malware has been turned on its head this week, as a company in Virginia has introduced a new cybersecurity technology that at first glance looks more like a classic cyberattack. 

The idea hatched by PFP Cybersecurity of Vienna, Va., is taken from the playbook of a famous cryptography-breaking scheme: the side-channel attack. All malware, no matter the details of its code, authorship, or execution, must consume power. And, as PFP has found, the signature of malware’s power usage looks very different from the baseline power draw of a chip’s standard operations.

So this week, PFP is announcing a two-pronged technology (called P2Scan and eMonitor) that physically sits outside the CPU and sniffs the chip’s electromagnetic leakage for telltale signatures of power consumption patterns indicating abnormal behavior.

The result, they say, is a practically undetectable, all-purpose malware discovery protocol, especially for low-level systems that follow a predictable and standard routine. (Computers with users regularly attached to them, like laptops and smartphones, often have no baseline routine from which abnormal behavior can be inferred. So, PFP officials say, their technology is at the moment better suited to things like routers, networks, power grids, critical infrastructure, and other more automated systems.)

“On average, malware exists on a system for 229 days before anyone ever notices anything is there,” Thurston Brooks, PFP’s vice president of engineering and product marketing, told IEEE Spectrum. “What’s really cool about our system is we tell you within milliseconds that something has happened.”

PFP—an acronym for “power fingerprinting”—requires that its users establish a firm baseline of normal operations for the chips the company will be monitoring. So they begin with P2Scan, a credit-card-size physical sensor that monitors a given chip, board, device, embedded system, or network router for its electromagnetic fingerprints when running normally.

Unlike most anti-malware strategies in the marketplace today, PFP takes a strikingly software-agnostic tack to besting malware, hardware Trojans, and other cyberattacks.

“We’re not trying to actually understand what’s going on inside the machine, like the hackers are,” says Brooks. “We’re trying to define what normal behavior looks like. Then, knowing [that], we can detect abnormal behavior.”

The view of malware as seen from outside the chip, in other words, can be a refreshing one. Hackers can’t detect this type of surveillance, because the scanning tools never actually interact with the chip’s operations. And hackers can be as clever as the most sophisticated programmers in the world. Yet, their code will still very likely be detected because, simply by virtue of performing different tasks than the chip normally performs, it will have a different power profile.
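In spirit, the detection reduces to learning a statistical baseline from known-good power traces and flagging any run that strays too many standard deviations from it. The sketch below is my own simplified illustration with synthetic numbers, not PFP's actual signal-processing pipeline:

```python
import statistics

def fit_baseline(traces):
    """Learn per-sample mean and spread from power traces of known-good runs."""
    means = [statistics.mean(col) for col in zip(*traces)]
    spreads = [statistics.pstdev(col) for col in zip(*traces)]
    return means, spreads

def is_anomalous(trace, means, spreads, threshold=4.0):
    """Flag a trace if any sample deviates more than `threshold` sigmas."""
    for sample, mean, spread in zip(trace, means, spreads):
        if spread > 0 and abs(sample - mean) / spread > threshold:
            return True
    return False

# Synthetic baseline: three power traces from a device running its normal routine
good_runs = [[1.0, 2.0, 1.0], [1.1, 2.1, 0.9], [0.9, 1.9, 1.1]]
means, spreads = fit_baseline(good_runs)

# A run whose middle sample draws far more power than normal gets flagged
print(is_anomalous([1.0, 9.0, 1.0], means, spreads))   # anomalous
print(is_anomalous([1.05, 2.05, 0.95], means, spreads))  # within baseline
```

The appeal of this approach, as the article notes, is that the monitored code never sees the monitor: the comparison happens entirely outside the chip.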

“I am a signal processing guy,” says PFP president Jeff Reed, who is also a professor in the ECE department at Virginia Tech. “Our approach is a very different approach than a person who’s normally schooled in security…We’re trying to understand a disturbance in the signal due to the inclusion of malware.”

Reed and Brooks also point out that counterfeit chips are a vast problem in IT, as Spectrum has documented in recent years. By the FBI’s estimates, for instance, chip counterfeiting costs U.S. businesses some $200 to $250 billion annually.

The problem is just as daunting for the U.S. military, as Spectrum has also chronicled. For example, an investigation by the U.S. Senate Committee on Armed Services uncovered counterfeit components in the supply chains for the CH-46 Sea Knight helicopter, C-17 military transport aircraft, P-8A Poseidon sub hunter and F-16 fighter jet.

The problems were expensive but ultimately rooted out. Yet other dangers remain—especially in such high-security realms, where substandard components could endanger troops and missions, or compromised chips could be used to carry out malicious plots.

But any compromised chip—whether hardware-Trojan-laden or part of a single lot of subpar chips coming from the foundry—can be discovered using their system, PFP says.

The trick, says Brooks, is to grab a sample chip from a lot and perform a (typically expensive) decapping, x-ray analysis, and reverse-engineering of the chip’s code. Then, once it’s been confirmed that the chip works as designed and is within spec, it is run through a standard operation, providing an electromagnetic baseline for P2Scan and eMonitor.

Every other chip in the lot can then be rapidly and cheaply tested against the “gold standard” chip by running the same standard operation and comparing the resulting electromagnetic signature to that of the first chip.

“You determine whether you have a good chip or not,” Brooks says. “You only spend the money to do that on one chip…So you amortize the cost of the forensics across all the chips. So if you have a few million chips, you’re talking about a few pennies [per chip] to do the whole thing—and know that you have a million chips that are all good.”
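The arithmetic behind that claim is straightforward. As a hedged sketch with made-up numbers (Brooks didn't give a specific forensics price), a one-time teardown cost amortized over a large lot quickly drops to pennies per chip:

```python
def per_chip_cost(forensics_cost_usd, lot_size):
    """Spread the one-time gold-standard chip forensics across the whole lot."""
    return forensics_cost_usd / lot_size

# Hypothetical: $50,000 of decapping, x-ray analysis, and reverse engineering
# for one reference chip, amortized over a lot of 1 million parts
cost = per_chip_cost(50_000, 1_000_000)
print(f"${cost:.2f} per chip")
```

The same $50,000 spread over a lot of only 500 chips would cost $100 per part, which is why the scheme pays off mainly at production volumes.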

Tue, 27 Jan 2015 15:00:00 GMT
Cyber Espionage Malware Taps Smartphones, Sends Chills Sophisticated malicious code hasn't gotten the notice that the Sony hack has, but that's the point Sophisticated malicious code hasn't gotten the notice that the Sony hack has, but that's the point
Photo-illustration: John Lund/Getty Images

A mysterious malware campaign resembling an attack on Russian officials from earlier this year could be the most sophisticated cyberattack yet discovered.

This fall, around the time hackers were draining crucial digital lifeblood from Sony Pictures, one of the most sophisticated malware attacks in history (completely separate from the Sony hack) was coming to a close. Presumably retreating after being exposed by security researchers, the cyber espionage campaign targeted smartphones of business, government, and embassy officials around the world. Its structure parallels an earlier attack aimed primarily at Russian executives, diplomats, and government officials, but the extent of the damage inflicted by the recent campaign, as well as its prospects of returning under a new guise, is still unknown.

Waylon Grange, senior malware researcher at Blue Coat Labs in Sunnyvale, Calif., says he’s taken apart both the malware that infected Sony Pictures’ internal networks and the malicious code behind the Russian hack. And in terms of the relative complexity and sophistication of the designs—though of course not by the level of damage—there’s no contest.

“In terms of sophistication, the Sony malware is really low on the pecking order,” he says. “The Sony malware was more destructive. This one is very selective. When it runs, this one does very well tracking its steps. If anything is wrong or the system is not configured just right, this malware detects it, quietly backs off, doesn’t make any errors, cleans itself up and is gone.”

As a result, Grange says, it's been a difficult cyber infection to study and trace. And its code base and Internet routing are so full of false leads and red herrings that it has, to date, proved impossible to source back to any group, nation, or band of hackers. Whoever it is, Grange says, has assembled a next-generation attack that should make security researchers sit up and pay attention.

And, especially in light of how much horrible mischief the far simpler Sony attack has wrought, businesses and governments should also be educating their workforces on cybersecurity and installing more and better locks on their cyber doors and windows.

In a blog post earlier this month, Grange’s colleagues at Blue Coat unveiled the details of the attack, whose infection route begins with a spearphishing e-mail to targeted business, government, and diplomatic users in at least 37 countries. The e-mail poses as an update or special offer for users to download the latest version of WhatsApp. Unfortunate users who click the link download infected Android, BlackBerry, and iOS versions of the popular messaging app.

An infected smartphone then records calls made by the user and awaits instructions telling it the Internet address to which it should upload the surreptitiously recorded phone calls.

Such an attack would already be remarkable and impressive, Grange says. But it’s only the first of at least two more layers of command and control structure for the malware campaign.

In the second step, apps check a redundant list of hacked public blogs whose posts contain legitimate text at the top (presumably in order to avoid being de-listed by search engines or otherwise flagged) followed by long strings of encrypted code. The malware then decrypts the code, providing itself a list containing a second set of command and control websites.

These sites, the researchers found, are often compromised Web pages run on outdated content management software in Poland, Russia, and Germany. It’s at these second-tier websites that the malware then decodes its rapidly changing list of drop-zones for offloading the phone call recordings.

Earlier this year, Blue Coat also detected and studied a similar multilayered Windows-based attack that was carried out primarily in Russia. It began with an infected Microsoft Word document that then infected a PC, causing it to follow an even more carefully guarded and circuitous route for receiving instructions. Subsequently infected PCs would first search a series of hacked cloud service accounts, which in turn would point to hacked embedded devices around the world (including wireless routers, satellite TV boxes and digital TV recording devices). Those compromised devices would in turn point back to virtual private networks that contained the instructions for the malware.

Disassembling the infected code, Grange says, led security researchers to multiple conflicting conclusions about its authors. One piece of the infected Android app contained the Hindi character for "error." Several of the infected blog profiles have set their location to Iran. Many infected home routers are in South Korea. Text strings in the Blackberry malware are in Arabic. Another contained the comment “God_Save_The_Queen.”

It was the many layers of red herrings and command and control, Grange says, that inspired Blue Coat to call the original (Russian) malware “Inception,” in homage to the 2010 thriller that contains onion-like layers of story to be peeled away. Blue Coat hasn’t explicitly named the smartphone cyberespionage attack, though its researchers strongly suspect it was carried out either by the same hackers or by someone strongly inspired by the “Inception” malware.

“These people are going to great lengths to protect who they are,” he says. “We’ve seen [attackers] use the cloud. But we’ve never seen routers, and we’ve never seen anyone use cloud, router, and private services to hide their identity.”

Grange says the smokescreens have worked so far; he has yet to establish any solid leads on who could have conducted these sophisticated attacks. Yet the lessons learned from the attacks, he says, are not nearly as mysterious. Among them:

• Don’t click links in your e-mail client—especially in any e-mail from an unknown sender, or in strange e-mails from known senders.

• Don’t root or jailbreak your phone. Because the iPhone, for instance, doesn’t allow app installs from outside the App Store, Inception wouldn’t work on a non-jailbroken iPhone.

• Only update mobile apps through your trusted app store (e.g. iTunes or Google Play).

• Always change the default passwords (“admin,” “password,” etc.) for your household devices.

“We probably haven’t seen the end of these guys,” Grange says. “I’m sure they’ll come back. It’s just a matter of how long have we set them back—and what will be their new toys when they come back.”

Mon, 29 Dec 2014 14:00:00 GMT
New Jersey Finally Cancels $118 Million Social Welfare Computer System State’s auditor questions why incompetently managed project wasn’t canned long ago
Photo: Corbis

IT Hiccups of the Week

We end this year’s IT Hiccups of the Week series much like how we began it, with yet another expensive, incompetently managed, and ultimately out-of-control U.S. state government IT project spiraling into abject failure. This one involves the New Jersey Department of Human Services’ six-year, $118.3 million Consolidated Assistance Support System (CASS). It was supposed to modernize the management of the state’s social welfare programs, but it was CASS itself that was in dire need of assistance.

The Department of Human Services decided to announce that it had pulled the project’s plug over the Thanksgiving holiday—no doubt to try to reduce the bad publicity involved while people were enjoying their much-easier-to-swallow, non-IT turkey. A DHS spokesperson would not explain why the CASS contract was terminated; her only related comment made to a reporter was that “an analysis is in progress to determine next steps.”

Hewlett-Packard, which was the CASS project prime contractor (the contract was originally awarded to EDS in 2007; HP acquired the firm in 2008), was equally mum on the subject. However, an HP spokesperson did seem to hint strongly that any and all project problems were the fault of New Jersey’s DHS, when he stated that, “Out of respect, HP does not comment on customer relationships.”

Last week, an audit report (pdf) by Stephen Eells, New Jersey’s state auditor, showed why both DHS and HP did not want to discuss why a system touted as “New Jersey's comprehensive, cutting-edge social service information system” had turned into a debacle. According to the report, both DHS and HP botched the project nearly from its outset in August 2009. The audit report, for example, found HP’s overall technical performance “poor,” due in part to the company’s “absentee management.” HP has changed project managers on the eight-phase CASS effort three times since 2010; the state rejected one of those managers, Eells stated, for lacking the qualifications “to manage such a large project.”

The audit report also notes that while the CASS contract cost was $118 million (it was originally $83 million), the state’s own project-related costs added up to an additional $109 million. According to one news account, Eells, in testimony last week before New Jersey’s Human Services Committee, made it clear that the state botched its CASS oversight role as well. DHS senior management, he indicated, consistently ignored red flags that the project was in deep trouble, and apparently failed to bring “concerns over the contract to the Department of Treasury, which is responsible for ensuring that problems with contracts are resolved.”

Eells also ruefully noted that the state’s contract with HP didn’t “allow the state to recoup damages from the failure to complete the contracted work.” A minor oversight, one might say.

The Human Services Committee wasn’t able to find out why DHS ignored the warnings that the CASS project was in trouble, or why it failed to report the contract troubles to the state department that really needed to know about them. This void in the record is because DHS Commissioner Jennifer Velez “declined to speak at the hearing, citing the ongoing talks with Hewlett-Packard,” according to news reports.

I tend to doubt that the Commissioner will ever explain why her department’s IT managers chose to ignore the facts screaming out to them that the CASS project was on the fast track to failure, or why her department’s contract managers failed to protect state taxpayers from the cost of failure, as is routinely done. It’s not like the Commissioner is personally accountable for what happens on her watch or anything.

In Other News…

Ontario and IBM Locked in Court Battle Over Bungled Transportation System Project

Fixing Ontario’s Social Services’ Buggy Computer System Will Be Costly

Profits for UK’s Brewin Dolphin Drop on IT Debacle

LA DWP Says Billing Mess Over After Inflicting Customers With Year of Pain

Hertz Car Rental Blames Computer Issues for Failing to Pay $435,777 in Taxes

LAUSD Gets $12 Million More to Fix Wayward School Information Management System

6,000 Health Exchange Insurance Plans in Washington State Canceled by Mistake

Robotic Cameras Go Rogue, Irritate BBC News Presenters

Software Bungles in Oregon Child Welfare Data System Cost State $23 Million

Amazon UK Erroneously Selling Hundreds of Products for a Penny

Second Major Air Traffic Computer Problem in Year Cancels, Delays Scores of UK Flights

MPs Demand Investigation into UK Air Traffic System Meltdown

UK Air Traffic Chief Blames Unprecedented Software Issue for Shutdown

Mon, 15 Dec 2014 14:00:00 GMT
How the Internet-Addicted World Can Survive on Poisoned Fruit The world faces tough tradeoffs in reaping the benefits versus risks of the Internet
Illustrations: Getty Images

There is no “magic bullet” for cybersecurity to ensure that hackers never steal millions of credit card numbers or cripple part of a country’s power grid. The conveniences of living in an interconnected world come with inherent risks. But cybersecurity experts do have ideas for how the world can “survive on a diet of poisoned fruit” and live with its dependence upon computer systems.

Cybersecurity risks have grown with both stunning scale and speed as the global economy has become increasingly dependent upon the Internet and computer networks, according to Richard Danzig, vice chair of the RAND Corporation and former U.S. Secretary of the Navy. He proposed that the United States must prepare to make hard choices and tradeoffs—perhaps giving up some conveniences—in order to tackle such risks. Such ideas became the focus of a cybersecurity talk and panel discussion hosted by New York University’s Polytechnic School of Engineering on Dec. 10.

“You are trading off the virtue in order to buy security,” Danzig said. “To the degree that you indulge in virtue, you breed insecurity. The fruit is poisonous, but also nutritious.”

The Internet and its related computer networks represent incredibly useful technological tools that provide open communication and speedy transfer of digital information across the world. Cybersecurity risks arise because such useful tools can easily be misused. That means countries and corporations face some tough choices. Danzig cited an online commentator’s analogy: Would we ban automobiles from driving around banks just because they’re sometimes used in bank heists? Such added security comes with costs.

For instance, the U.S. National Security Agency adopted a new rule that requires two people’s passwords to download certain files—a belated countermeasure that only came after former NSA contractor Edward Snowden downloaded as many as 1.7 million documents exposing the U.S. intelligence agency’s worldwide surveillance programs. Such a measure provides some added security, but also sacrifices the ability of individuals to download documents by themselves for normal work purposes.

There is also the potentially huge problem of tracking and securing the electronics hardware found in everything from Internet servers to smartphones. As an example, Danzig asked Intel researchers to calculate how many transistors are manufactured worldwide every second. Intel came back with a “disorienting” estimate: 8 trillion.

“I don’t believe policymakers, when talking about Moore’s Law and hardware, have any grasp of the magnitude of the challenge in tracking items that go into these systems,” Danzig said.

So how can U.S. lawmakers and CEOs deal with such daunting challenges? Danzig laid out some recommendations found in his Center for a New American Security report titled “Surviving on a Diet of Poisoned Fruit: Reducing the National Security Risks of America’s Cyber Dependencies”—recommendations that generated both encouragement and debate among the cybersecurity experts gathered at the NYU talk.

One possible defense involves going the “Battlestar Galactica” route of isolating some computers and networks, or reducing dependence upon digital systems in favor of returning to analog. Danzig suggested merging digital systems with analog and human systems so that a cyber attack by itself can’t compromise the security of a nuclear launch facility or power plant—it might still require a human somewhere to throw a physical switch or perform another action.

Other cybersecurity ideas include making “lean systems” that don’t have extra exploitable features such as getting rid of printer features that digitally track all the documents you’ve printed. Or creating more air-gapped system “enclaves” that don’t have any Internet or local network connections to safeguard certain information.

Danzig also recommended the U.S. government take the steps of recognizing the private sector is “too important to fail” in terms of cybersecurity. Rather than have one cyber czar official applying a “one size fits all” solution, he suggested individual government departments could work with their industry counterparts. For instance, the Department of Energy could work with utilities on cybersecurity risks relevant to the power grid.

Both U.S. government agencies and private companies could also consider sharing anonymous data on cybersecurity threats in a collaborative database—not unlike what U.S. airlines do with a shared database on near misses and other risky incidents that didn’t lead to accidents.

Danzig also suggested that the U.S. could also approach China and Russia to discuss agreements on preventing common cyber attacks from morphing into future “cyber-physical attacks” that directly damage power plants or take down airplanes, the Stuxnet worm attack on Iran’s nuclear centrifuges being the most famous such case. Danzig urged the U.S. to encourage clear agreements on “red lines” for cyber attack behavior that is in everyone’s best interests, such as not launching cyber attacks that penetrate the nuclear missile commands of various countries.

The threat of cyber-physical attacks will only grow as more “non-computer” systems such as cars, medical devices, industrial machines and household appliances become connected to the Internet. That future “Internet of Things” could leave individuals, homes and entire economies vulnerable to hackers whose cyber attacks play havoc with physical objects.

Many industrial systems don’t even have authentication or authorization codes, so that hackers could metaphorically walk in the front door of a power plant’s control system. Other vulnerabilities may exist for everything from heart implants to the car systems that may someday become part of the Internet of Things.

“For those things not [yet] Internet-connected, we have enormous vulnerabilities that are not backdoor,” said Andy Ozment, assistant secretary of the Office of Cybersecurity and Communications in the U.S. Department of Homeland Security, during the panel discussion following Danzig’s talk. “They are built-in system vulnerabilities that we are going to really struggle to fix.”

One way to tackle that daunting problem is to narrow the focus of the challenges involved. A Windows operating system may be open to all sorts of malware, but the scope of cyber-physical attacks on a power plant are necessarily limited by what they’re trying to physically accomplish, said Ralph Langner, director and founder of Langner Communications.

“Not all hope is lost, because we don’t have to analyze millions of samples of malware,” Langner explained. “We just need to analyze promising cyber-physical attack vectors, which is much easier than you might think.”

It’s also important for cybersecurity experts to not lose sight of the human element in the threat, Danzig said. He and other experts recommended more behavioral studies that look at the incentives driving the people behind certain cyber attacks—whether those people are Chinese military hackers or Eastern European criminal gangs. By understanding the motives and incentives, cybersecurity experts could come up with defenses better tailored for deterring such attacks.

Cybersecurity experts can often fall into the trap of believing there is a technical fix for everything, said Stefan Savage, professor of computer science and engineering at the University of California, San Diego. He pointed out that computer systems just represent the medium through which human conflict takes place. Therefore experts might do better to consider the who and why of cyber attacks.

“You could spend all day looking for every threat that could exist or every vulnerability that could happen,” Savage said. “To limit that to what’s likely to happen, you have to understand your adversary.”

Thu, 11 Dec 2014 22:23:00 GMT
How Not to Be Sony Pictures Lessons learned from the recent Sony Pictures hack
Photo: Getty Images

The scope of the recent hack of Sony Pictures — in which unidentified infiltrators breached the Hollywood studio’s firewall, absconded with many terabytes of sensitive information and now regularly leak batches of damaging documents to the media — is only beginning to be grasped. It will take years and perhaps some expensive lawsuits too before anyone knows for certain how vast a problem Sony’s digital Valdez may be. 

But the take-away for the rest of the world beyond Sony and Hollywood is plain: Being cavalier about cybersecurity, as Sony’s attitude in recent years has been characterized, is like playing a game of corporate Russian roulette.

According to a new study of the Sony hack, one lesson learned for the rest of the world is as big as the breach itself. Namely, threat-detection is just the first step.

Snuffing out malware, trojans, and phishing attacks is of course an important front-line battle, but it is only one front in a multi-front war. Any organization that thinks cybersecurity is as simple as installing and regularly updating antivirus software risks a nightmare scenario like the one Sony Pictures now stares down.

Fengmin Gong, chief strategy officer and co-founder of Santa Clara, Calif.-based security firm Cyphort, says the best security strategies today also include continuously monitoring networks for suspicious movements of an organization’s most carefully guarded data. The best security, in a sense, presumes that security will sometimes fail.

“The new approach today that people have shifts away from prevention — which everyone knows is not achievable — to a focus on attack sequence and consequence,” he says.

So a company that follows his approach, he says, might build a security strategy in which some leakiness is expected. After all, in an age of pervasive connectivity—from laptops and servers to smartphones and tablets to wearables and smart appliances—it’s increasingly pie-in-the-sky to suppose that a group of determined hackers couldn’t find holes somewhere in a target company’s networks.

Instead, Gong says, the smart company expects occasional hacks to get through but also knows what digital assets it values most. And those are the nodes, computers and networks it monitors most closely. The reported terabytes worth of Sony Pictures scripts, films, spreadsheets, marketing and sales data and communications that hackers downloaded — clearly a centerpiece of the company’s revenues — would never be shipped out through company networks without network monitors also discovering such a massive breach, he says.

And it’s not just Hollywood studios that need to shift their thinking, he says. (Though Gong says he has also been consulting lately with another prominent Hollywood studio, which he says is applying similar lessons to develop smarter cybersecurity practices.)

For instance, Target and Home Depot suffered recent security breaches in their point-of-sale (POS) networks, leading to many customers’ credit card numbers and other sensitive information being released.

“Today we have to make assumptions that something could fail,” he says. “Continuous monitoring allows you to watch what is the data movement into and out of your POS system. That’s what we mean by focusing on consequences. [Y]ou want your organization to be the first one to realize something just happened or is happening. Then you can contain the damage and anything else. Right now the problem is people getting told by someone else many months later that something happened. Then the damage is already done.”
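As a toy illustration of the consequence-focused monitoring Gong describes, here is a minimal Python sketch that baselines outbound traffic from a sensitive segment (say, a POS network) and flags transfers that dwarf the recent norm. The window size and spike threshold are arbitrary placeholders, not anything Cyphort's products actually use.

```python
from collections import deque

class EgressMonitor:
    """Flag data egress that spikes far above the recent baseline."""

    def __init__(self, window: int = 24, spike_factor: float = 10.0):
        # Rolling window of recent egress totals (e.g., one entry per hour, in MB).
        self.history = deque(maxlen=window)
        self.spike_factor = spike_factor

    def record(self, megabytes_out: float) -> bool:
        """Record one interval's egress; return True if it looks anomalous."""
        baseline = (sum(self.history) / len(self.history)) if self.history else None
        self.history.append(megabytes_out)
        return baseline is not None and megabytes_out > baseline * self.spike_factor

monitor = EgressMonitor()
for hour in range(24):
    assert not monitor.record(50.0)    # normal traffic builds the baseline
assert monitor.record(500_000.0)       # terabytes leaving at once trips the alarm
```

The point of the design, per Gong, is detection speed: the organization itself notices the abnormal outflow as it happens, rather than learning of the breach from a third party months later.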

In Sony Pictures’ case, Gong says, the structure of the malware itself also points to a larger systemic security failure at the company. Some of the malware files, as Cyphort’s report details, actually contain Sony Pictures employees’ usernames and passwords hard-coded into the malware scripts.

That means there was at least one earlier round of security breaches at Sony that hasn’t yet been fully uncovered—the malware’s authors must have somehow obtained those usernames and passwords before they could write and deploy the malware used in the current breach.
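To illustrate the forensic point, a strings-style scan is often enough to surface hard-coded credentials inside a binary. The Python sketch below is purely illustrative; the regex patterns and sample bytes are invented and don't reflect the actual Sony malware or Cyphort's analysis tooling.

```python
import re

# Credential-looking assignments such as "username=..." or "password: ...".
# The pattern is a deliberately simple illustration, not a production rule.
CRED_PATTERN = re.compile(rb"(user(name)?|pass(word)?|pwd)\s*[:=]\s*(\S+)", re.IGNORECASE)

def find_hardcoded_credentials(binary: bytes):
    hits = []
    # Pull printable ASCII runs (what the Unix `strings` utility does),
    # then match credential-shaped assignments inside each run.
    for run in re.findall(rb"[\x20-\x7e]{6,}", binary):
        for match in CRED_PATTERN.finditer(run):
            hits.append(match.group(0).decode())
    return hits

# Invented sample "binary" with two embedded credentials.
sample = b"\x00\x01MZ\x90junk\x00username=jdoe\x00more\x00password=Hunter2!\x00"
assert find_hardcoded_credentials(sample) == ["username=jdoe", "password=Hunter2!"]
```

Finding such strings in captured malware is what tells analysts the attackers had already harvested working credentials in an earlier, undetected intrusion.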

“When this [breach] happened, it happened over multiple points in time,” Gong says. “We see the hope that if people start adopting these new approaches to their security posture, we feel confident these things would have been discovered and stopped earlier than what is happening now.”

Thu, 11 Dec 2014 20:00:00 GMT
Amazon Plays Santa after IT Glitch, Singapore Airlines Plays Scrooge Student gets to keep unexpected packages, Singapore Airlines wants its airfare money
Photo: Getty Images; Bow: iStockphoto

IT Hiccups of the Week

This week’s edition of IT Hiccups focuses on the two different customer service reactions to IT errors, a nice one by Amazon UK and a not so nice one on the part of Singapore Airlines.  

According to the Daily Mail, a student at the University of Liverpool by the name of Robert Quinn  started to receive a plethora of packages from Amazon at his family’s home in Bromley, South London that he hadn’t ordered. The 51 packages included a baby buggy, a Galaxy Pro tablet, a 55-inch 3-D Samsung television set, a Sony PSP console, an electric wine cooler, a leaf blower, a bed, a bookcase and a chest of drawers, among other things. In total, the 51 items were worth some £3,600 (US $5,650).

The Daily Mail reported that Quinn called up Amazon and asked what was going on. According to Quinn, Amazon told him that people must be “gifting” the items to him. That surprised Quinn, since he didn’t know the people who were supposedly gifting him the items. Quinn told the Mail that he speculated that some sort of computer glitch was affecting Amazon’s purchase-return address labels, since the items all looked as though they were meant to be sent back to Amazon by their original purchasers.

Quinn told the Mail:

 I was worried that people were losing out on their stuff so I phone Amazon again and said I’m happy to accept these gifts if they are footing the cost, but I’m not happy if these people are going to lose out. But Amazon said ‘it’s on us.’

The Mail checked with Amazon, who confirmed Quinn’s story. While not confirming that a computer problem affecting its return labels was the cause for the errant packages, Amazon didn’t go out of its way to deny it.

Quinn, who is an engineering student, later told the Mail that packages were still arriving. Quinn indicated that he was going to give some of the items he has received to charity, and then sell the rest to fund “an ‘innovative’ new [electric] cannabis grinder” he was designing.

Whereas Amazon played Santa, Singapore Airlines decided instead to take on the role of Scrooge last week. According to the Sydney Morning Herald, when Singapore Airlines uploaded its business class fares for trips from Australia to Europe into a global ticket distribution system, it instead mistakenly uploaded its economy fare prices. As a result, instead of paying US $5,000 for a business class ticket, travel agents sold over 900 tickets for $2,900 before Singapore Airlines fixed the problem.

Singapore Airlines decided that its mispricing mistake wasn’t, in fact, its problem, but the travel agents’.  The Herald reported that the airline, “told travel agents who sold the cheap tickets that they will have to seek the difference between the actual price and what they should have sold for from their customers, or foot the bill themselves,” if their customers want to fly in business class.

Singapore Airlines admitted, according to a Fox News story, that while it had “recently reassigned a booking subclass originally designated to economy class bookings to be used for business class bookings from December 8, 2014,” which could cause confusion, “the airfare conditions for the fare clearly stated that it was only valid for economy class travel.” In other words: we may have screwed up, but the travel agents should have caught our error anyway.

Scrooge would indeed be proud.

Last year, both Delta and United Airlines decided to honor online fare errors, in the latter case even when fares were priced at $0.

Update: The Daily Mail is now reporting that Singapore Airlines has decided to honor the mispriced tickets after all. Tiny Tim must be rejoicing.

In Other News ….

Coding Issue Forces 10,000 New York Rail Commuters to Buy New Fare Cards

Microsoft Experiences Déjà vu Update cum Human Azure Error

New $240 Million Ontario Welfare System Pays Out Too Much and Too Little

New Jersey Social Services Glitch Causes Wrong Cash Payments

Singapore Stock Exchange Suffers Third Outage of Year

Air India Suffers Check-in Glitch

Best Buy Website Crashes Twice on Black Friday

Mazda Issues Recall to Fix Tire Pressure Monitoring Software

Washington Health Insurance Exchange Glitches Continue

Mon, 8 Dec 2014 14:00:00 GMT
Blob Front-End Bug Bursts Microsoft Azure Cloud 11-hour intermittent global outages helped along by operator error
Illustration: Getty Images

IT Hiccups of the Week

It being the Thanksgiving holiday week in the United States, I was tempted to write once more about the LA Unified School District’s MiSiS turkey of a project, which the LAUSD Inspector General fully addressed in a report [pdf] released last week. If you like your IT turkey burnt to a crisp, over-stuffed with project management arrogance, served with heapings of senior management incompetence, and topped off with a ladleful of lumpy gravy of technical ineptitude, you’ll feast mightily on the IG report. However, if you are a parent of one of the more than 1,000 LAUSD students who still have not received a class schedule nearly 40 percent of the way into the academic year—or a Los Angeles taxpayer, for that matter—you may get extreme indigestion from reading it.

However, the winner of the latest IT Hiccup of the Week award goes to Microsoft for the intermittent outages that hit its Azure cloud platform last Wednesday, disrupting an untold number of customer websites along with Microsoft Office 365, Xbox Live, and other services across the United States, Europe, Japan, and Asia. The outages occurred over an 11-hour (and in some cases longer) period.

According to a detailed post by Microsoft Azure corporate vice president Jason Zander, the outage was caused by “a bug that got triggered when a configuration change in the Azure Storage Front End component was made, resulting in the inability of the Blob [Binary Large Object] Front-Ends to take traffic.”

The configuration change was made as part of a “performance update” to Azure Storage that, when applied, exposed the bug and “resulted in reduced capacity across services utilizing Azure Storage, including Virtual Machines, Visual Studio Online, Websites, Search and other Microsoft services.” The bug, which had escaped detection during “several weeks of testing,” caused the storage Blob Front-Ends to go into an infinite loop, Zander stated. “The net result,” he wrote, “was an inability for the front ends to take on further traffic, which in turn caused other services built on top to experience issues.”

Once the error was detected, the configuration change was rolled back immediately. However, the Blob Front-Ends needed a restart to halt their infinite looping, which slowed the recovery time, Zander wrote.

The effects of the bug could have been contained, except that Zander indicated someone apparently didn’t follow standard procedure in rolling out the performance update.

“Unfortunately the issue was wide spread, since the update was made across most regions in a short period of time due to operational error, instead of following the standard protocol of applying production changes in incremental batches.”
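For illustration, the staged-rollout discipline Zander refers to can be sketched in a few lines of Python. The region names, health check, and batching below are placeholders, not Azure's actual deployment tooling; the point is simply that a bad change halted after one batch never becomes a global outage.

```python
import time

REGIONS = ["us-west", "us-east", "eu-north", "japan-east", "asia-south"]

def staged_rollout(apply_change, healthy, regions=REGIONS, batch_size=1, soak_seconds=0):
    """Apply a change region-by-region, stopping at the first unhealthy batch.

    Returns (regions successfully updated, regions in the failed batch).
    """
    done = []
    for i in range(0, len(regions), batch_size):
        batch = regions[i:i + batch_size]
        for region in batch:
            apply_change(region)
        time.sleep(soak_seconds)          # let the change "soak" before judging it
        if not all(healthy(r) for r in batch):
            return done, batch            # halt: only these regions are affected
        done.extend(batch)
    return done, []

# Simulate a change that breaks every region it touches (like the Blob bug):
applied = []
result = staged_rollout(lambda r: applied.append(r), lambda r: False)
assert applied == ["us-west"]             # the bad change never left the first batch
assert result == ([], ["us-west"])
```

Skipping this protocol, as the operational error did, is equivalent to calling the function with `batch_size=len(regions)`: the health check comes too late to contain anything.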

Zander apologized for the “inconvenience” and said that Microsoft will “closely examine what went wrong and ensure it never happens again.”

In Other News…

Polish President Says Voting Glitch Doesn’t Warrant Vote Rerun

RBS Hit With £56 Million Fine for “Unacceptable” 2012 IT Meltdown

Wal-Mart Ad Match Scammed for $90 PS4s

Computer Problems Close South Australian Government Customer Service Centers

British Columbia Slot Machines’ Software Fixed After Mistaken $100K Payout

Washington State Temporarily Closes Health Exchange Due to Computer Issues

Software Bug in Washington State Department of Licensing Fails to Alert Drivers to Renew Licenses

Mon, 24 Nov 2014 14:00:00 GMT
RBS Group Facing Huge Fine over Massive 2012 IT System Meltdown Bank still fixing decades-long neglect of IT system infrastructure
Photo: eyevine/Redux

IT Hiccups of the Week

We turn our attention in this week’s IT Hiccups to one of the truly major IT ooftas of the past decade—one that was back in the news this week: the meltdown of the IT systems supporting the RBS banking group. (That group includes NatWest, Northern Ireland’s Ulster Bank, and the Royal Bank of Scotland.) The meltdown began in June 2012 but wasn’t fully resolved until nearly two months later. The collapse kept 17 million of the Group’s customers from accessing their accounts for a week, while thousands of customers at Ulster Bank reported access issues for more than six weeks.

Last week, Sky News reported that the UK’s Financial Conduct Authority (FCA) informed RBS that it was facing record-breaking fines in the “tens of millions of pounds” for the malfunction, which was blamed on a faulty software upgrade. In addition, the Sky News story states that the Central Bank of Ireland is looking at imposing fines on Ulster Bank over the same issue. The meltdown has already cost RBS some £175 million in compensation and other corrective costs.

In the wake of another major RBS IT system failure last December, RBS CEO Ross McEwan admitted the bank had neglected its IT infrastructure for decades. Last year, RBS said it would be spending some £450 million to upgrade its IT systems, but that figure was upped to over £1 billion this past June.

According to Sky News, RBS “could receive a discount of up to 30 percent on the proposed penalty if it agrees to settle within the 28-day window under FCA rules.” Seeing how RBS has already admitted that it had been short-changing its IT investment, it is hard to see why the bank would decide to contest the fine.

In a separate story, Sky News reported that the UK’s Prudential Regulation Authority (PRA), which is part of the Bank of England, has sent a letter to the UK’s biggest banks “demanding” that they improve the resilience of their IT systems. The PRA has given the banks until mid-December to report on what they are doing to ensure their IT systems are robust.

Some wags wondered, however, whether the PRA was going to be conducting a resilience assessment of the Bank of England’s IT systems. That bank suffered a highly embarrassing outage of its own a few weeks ago.

In Other News…

eVoting Problems Crop Up Across US in Mid-Term Elections

LA Unified School District IT Problems Just Keep Mounting With No End in Sight

Singapore Exchange Goes Down Due to Power Outage

Computer Failure Halts Deutsche Börse Trading

PayPal Experiences Server Problems

Hundreds of Parents Panic When California School Sends Out Erroneous Missing Student Message

Kansas-based Spirit AeroSystems Has ERP troubles

Ticketmaster Declines to Honor Mispriced Online Circus Tickets

Pawtucket City Rhode Island Sends Out Erroneous Car Tax Bills

New Microsoft Band Has Daylight Saving Time Glitch

Xfinity New X1 Cable Box Users Suffer Multi-state Outage

Mon, 10 Nov 2014 14:00:00 GMT
FCC Chairman Calls April's Seven State Sunny Day 911 Outage "Terrifying" Highlights current fragility of nation’s 911 emergency systems
Photo: Debi Bishop / iStockphoto
IT Hiccups of the Week

This edition of IT Hiccups of the Week revisits the 911 emergency call system outages that affected all of Washington State and parts of Oregon just before midnight, 9 April 2014. As I wrote at the time, CenturyLink—a telecom provider from Louisiana that is contracted by Washington State and the three affected counties in Oregon to provide 911 communication services—blamed the outages, which lasted several hours each, on a “technical error by a third party vendor.”

CenturyLink gave few details in the aftermath of the outages other than to say that the Washington State and Oregon outages were merely an “uncanny” coincidence, and to send out the standard “sorry for the inconvenience” press release apology. The company estimated that approximately 4,500 emergency calls to 911 call centers went unanswered during the course of the Washington State outage. No details were available regarding the number of 911 calls that failed during the two-hour Oregon outage, which affected some 16,000 phone customers.

Well, 10 days ago, the U.S. Federal Communications Commission released its investigative report into the emergency system outages. It cast a much different light on the Washington State “sunny day” outage (i.e., not caused by bad weather or a natural disaster) that CenturyLink initially tried to play down. FCC Chairman Tom Wheeler even went so far as to call the report’s findings “terrifying.”

As it turns out, while the 911 system outages that hit Oregon and Washington State were indeed coincidental, they were also connected in a strange sort of way that caused a lot of confusion at the time, as we will shortly see. More importantly, the 911 outage that affected Washington State on that April night didn’t just affect that state, but also emergency calls being made in California, Florida, Minnesota, North Carolina, Pennsylvania and South Carolina. In total, some 6,600 emergency calls made over a course of six hours across the seven states went unanswered.

As the FCC report notes, because of the multi-state emergency system outage, “Over 11 million Americans … or about three and a half percent of the population of the United States, were at risk of not being able to reach emergency help through 911.” Since the outage happened very late at night into the early morning and there was no severe weather in the affected regions, emergency call volume was very low; luckily, no one died because of an inability to reach 911.

The cause of the outage, the FCC says, was a preventable “software coding error” in a 911 Emergency Call Management Center (ECMC) automated system in Englewood, Colorado, operated by Intrado, a subsidiary of West Corporation. Intrado, the FCC report states, “is a provider of 911 and emergency communications infrastructure, systems, and services to communications service providers and to state and local public safety agencies throughout the United States… Intrado provides some level of 911 function for over 3,000 of the nation’s approximately 6,000 PSAPs.”

As succinctly explained in an article in the Washington Post, “Intrado owns and operates a routing service, taking in 911 calls and directing them to the most appropriate public safety answering point, or PSAP, in industry parlance. Ordinarily, Intrado's automated system assigns a unique identifying code to each incoming call before passing it on—a method of keeping track of phone calls as they move through the system.”

“But on April 9, the software responsible for assigning the codes maxed out at a pre-set limit [at 11:54 p.m. PDT]; the counter literally stopped counting at 40 million calls. As a result, the routing system stopped accepting new calls, leading to a bottleneck and a series of cascading failures elsewhere in the 911 infrastructure,” the Post article went on to state.

All told, 81 PSAPs across the seven states were unable to receive calls; dialers to 911 heard only “fast busy” signals.
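The failure mechanism the Post describes can be sketched in a few lines. This is a purely hypothetical illustration, not Intrado's actual code: a router that hands out tracking IDs from a counter with a hard ceiling and, once that ceiling is hit, silently rejects new calls instead of raising a loud alarm.

```python
# Hypothetical sketch of the counter-limit failure mode; all names and
# structure here are invented for illustration.
MAX_TRACKED_CALLS = 40_000_000  # the pre-set limit cited in the FCC report

class CallRouter:
    def __init__(self, limit=MAX_TRACKED_CALLS):
        self.limit = limit
        self.next_id = 0

    def route(self, call):
        if self.next_id >= self.limit:
            # The real system logged only a "low level" alarm here, so
            # operators never learned that routing had stopped entirely.
            return None  # the caller hears only a "fast busy" signal
        self.next_id += 1
        return (self.next_id, call)

router = CallRouter(limit=3)  # tiny limit so the failure is visible
results = [router.route(f"call-{i}") for i in range(5)]
print(results)  # the 4th and 5th calls are silently dropped
```

The lesson generalizes: any hard resource ceiling needs a loud, correctly prioritized alarm when it is reached, or the system fails exactly as quietly as this sketch does.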

When the software hit its 40 million call limit, the FCC report says, the emergency call-routing system did not send out an operator alarm for over an hour. When it finally did, the system monitoring software indicated that the problem was a “low level” problem; surprisingly, it did not immediately alert anyone that emergency calls were no longer being processed. 

As a result, Intrado’s emergency call management center personnel did not realize the severity of the outage, nor did they get any insight into its cause, the FCC report goes on to state. In addition, the ECMC personnel were already distracted by alarms involving the Oregon outage, which also involved CenturyLink.

Worse still, says the FCC, the low-level alarm designation not only failed to get ECMC personnel’s attention, but it also prevented an automatic rerouting of 911 emergency calls to Intrado’s ECMC facility in Miami.

It wasn’t until 2:00 a.m. PDT on 10 April that ECMC personnel became aware of the outage. That, it seems, happened only because CenturyLink called to alert them that its PSAPs in Washington State were complaining of an outage. After the emergency call management center personnel received the CenturyLink call, both they and CenturyLink thought the Washington State and Oregon outages were somehow closely interconnected. It took several hours for them to realize that they were entirely separate and unrelated events, the FCC report states. Apparently, it wasn’t until several other states’ PSAPs and 911 emergency call providers started complaining of outages that call management center personnel and CenturyLink realized the true scope of the 911 call outage and were finally able to zero in on the cause.

Once the root cause was discovered, the Colorado-based ECMC personnel initiated a manual failover of 911 call traffic to Intrado’s ECMC Miami site at 6:00 a.m. PDT. When problems plaguing the Colorado site were fixed later that morning, traffic was rerouted back.

The FCC report states that, “What is most troubling is that this is not an isolated incident or an act of nature. So-called ‘sunny day’ outages are on the rise. That’s because, as 911 has evolved into a system that is more technologically advanced, the interaction of new [Next Generation 911 (NG911)] and old [traditional circuit-switched time division multiplexing (TDM)] systems is introducing fragility into the communications system that is more important in times of dire need.”

IEEE Spectrum published an article in March of this year that explains the evolution of 911 in the U.S. (and Europe) and provides good insights into some of the difficulties of transitioning to NG911. The FCC’s report also goes into some detail on how the transition from traditional 911 service to NG911 can create subtle problems that are difficult to unravel when a problem does occur.

According to one news report, Rear Admiral David Simpson, chief of the FCC’s Public Safety and Homeland Security Bureau, told the FCC during a hearing into the outage that there were three additional major “sunny day” outages in 2014, though none were reported before this year. All three—which I believe involved outages in Hawaii, Vermont, and Indiana—involved NG911 implementations or time division multiplexing–to-IP transitions, Simpson said.

The FCC report indicates that Intrado has made changes to its call routing software and monitoring systems to prevent this situation from happening again, but it also said that 911 emergency service providers need to examine their system architecture designs. The hope is that they’ll better understand how and why their systems may fail, and what can be done to keep the agencies operating when they do. In addition, the communication of outages among all the emergency service providers and PSAPs needs to be improved; the April incident highlighted how miscommunications hampered finding the extent and cause of the outage.

Finally, the five FCC Commissioners unanimously agreed that such an outage was “simply unacceptable” and that future “lapses cannot be permitted.” While no one died this time, they note that next time everyone may not be so lucky.

In Other News…

Sarasota Florida Schools Plagued by Computer Problems

Weather Forecasts Affected as National Weather Satellite Goes Dark?

Bad Software Update Hits Aspen Colorado Area Buses

Bank of England Suffers Embarrassing Payments Crash

Google Drive for Work Goes Down

Google Gmail Experiences Global Outage

Cut Fiber Optic Cables Knock-out Air Surveillance in East India for 13 Hours

Bank of America Customers Using Apple Pay Double Charged

iPhone Owners Complain of Troubles with iOS 8.1

UK Bank Nationwide Apologizes Once More for Mobile and Online Outages

Vehicle Owners Seeking Info on Takata Airbag Recall Crash NHTSA Website

West Virginia Delays Next Phase of WVOASIS Due to Testing Issues

UK’s Universal Credit Program Slips At Least Four Years

Heathrow Airport Suffers Yet Another Baggage System Meltdown

Mon, 27 Oct 2014 13:00:00 GMT
LA School District Superintendent Resigns in Wake of Continuing MiSiS Woes Superintendent may be gone, but hundreds of system problems remain unresolved
Photo: Michael Kovac/Getty Images

We turn our IT Hiccups of the Week attention once again to the Los Angeles Unified School District’s shambolic rollout of its integrated student educational tracking system, called My Integrated Student Information Systems (MiSiS). I first wrote about MiSiS a few months ago, and it has proved nothing but trouble, to the point that it became a major contributing factor in “encouraging” John Deasy to resign his position last week as superintendent of the second largest school system in the United States. He’d been on the job three and a half years.

Deasy claimed in interviews after his resignation that the MiSiS debacle “played no role” in his decision, and instead blamed district teachers and their unions for opposing his crusading efforts to modernize the LAUSD school system. That is a charitable spin on the situation, to put it mildly.

Why? You may recall from my previous post that LAUSD has been under a 2003 federal district court–approved consent decree to implement an automated student tracking system so that disabled and special-needs students’ educational progress can be assessed and tracked from kindergarten to the end of high school. Headway toward complying with the obligations agreed to under the consent decree is assessed by a court-appointed independent monitor who publishes periodic progress reports. Deasy repeatedly failed to deliver on the school district’s promises to the independent monitor over the course of his tenure.

What really helped seal Deasy’s fate was the latest progress report [pdf] from the independent monitor released last week. The report essentially said that despite numerous “trust me” promises by LAUSD officials (including Deasy), MiSiS was still out of compliance. The officials had promised that MiSiS would be completely operationally tested and ready at the beginning of this school year. But, said the report, the system’s incomplete functionality, the ongoing poor reliability due to inadequate testing, and the misunderstood and pernicious data integrity issues were causing unacceptable educational hardships to way too many LAUSD students—especially to those with special educational needs.

An LA Times story, for one, stated that the monitor found that MiSiS, instead of helping special needs students, made it difficult to place them in their required programs. A survey conducted by the independent monitor of 201 LAUSD schools trying to use MiSiS found that “more than 80% had trouble identifying students with special needs and more than two-thirds had difficulty placing students in the right programs,” the Times article stated.

Deasy’s fate had been hanging by a thread for a while. For instance, at several LAUSD schools—especially at Thomas Jefferson High School in south Los Angeles—hundreds of students were still without correct class schedules nearly two months after the school year had started. 

Another story in the LA Times reported that continuing operational issues with MiSiS meant that some Jefferson students were being “sent to overbooked classrooms or were given the same course multiple times a day. Others were assigned to ‘service’ periods where they did nothing at all. Still others were sent home.”

The problems at Jefferson made Deasy’s insistence that issues with MiSiS were merely a matter of “fine tuning” look disingenuous at best.

The MiSiS-fueled difficulties at Jefferson, which extended to several other LAUSD schools, prompted a California Superior Court judge about two weeks ago to intervene and order the state education department to work with LAUSD officials to rectify the situation immediately. In issuing the order, the judge damningly wrote that “there is no evidence of any organized effort to help those students” at Jefferson by LAUSD senior officials.

As a result of the judge’s order, the LAUSD school board last week quickly approved a $1.1 million plan to try to eliminate the disarray at Jefferson High. Additionally, the school board is now undertaking an audit of other district high schools to see how many other students are being impacted by the MiSiS mess and what additional financial resources may be needed to eliminate it.

Fraying Deasy’s already thin thread further was his admission that MiSiS needs some 600 enhancements and bug fixes (up from a reported 150 or so when the system was rolled out in August), which would likely cost millions of dollars on top of the $130 million already spent. Further, he acknowledged that one of the core functions solemnly promised to the independent monitor for this school year—the proper recording of student grades—could take yet another year to debug fully, the LA Times reported.

According to the LA Daily News, LAUSD teachers complain that they not only have a hard time accessing the grade book function, but when they finally do, they find that student grades or even entire courses have disappeared from MiSiS. Hundreds if not thousands of student transcripts could be in complete shambles, which is causing major concern for seniors applying to colleges. Their parents are also unamused, to say the least.

Probably the last fiber of Deasy’s thread was pulled away last week when it turned out that even if MiSiS had been working properly, a majority of LAUSD schools likely wouldn’t have been able to access all of its functionality anyway. According to a story at Contra Costa Times, LAUSD technology director Ron Chandler informed the district’s school board last week that most of the LAUSD schools’ administrative desktop computers were incapable of completely accessing MiSiS because of known compatibility problems.

A clearly frustrated school board wanted to know why this situation was only being disclosed now; Chandler told the board that the initial plan was for the schools to use the Apple iPads previously purchased by the school board to access MiSiS. But questions over Deasy's role in that $1 billion contract put that approach on hold. The school board was more than a bit incredulous at that explanation, since it had not approved the purchase of iPads with the intent that they be used by teachers and school administrators as the primary means of accessing MiSiS.

Reluctantly, the school board approved $3.6 million in additional funding to purchase 3,340 new desktop computers for 784 LAUSD schools to allow them unfettered access to MiSiS.

While Deasy’s resignation will alleviate some of the immediate political pressure on LAUSD officials caused by the MiSiS fiasco, the technical issues will undoubtedly last throughout this academic year and possibly well into the next. For many unlucky LAUSD students, however, the impacts may last for years beyond that.

In Other News…

Baltimore County Maryland Teachers Tackling Student Tracking System Glitches

Tallahassee’s New Emergency Dispatch System Offline Again

Washington State’s Computer Network Suffers Major Outage

Software Glitch Hits Telecommunications Services of Trinidad and Tobago

New Mexico Utility Company Incorrectly Bills Customers

Software Issue Means Oklahoma Utility Company Overbills Customers

Computer Error Allows Pink Panther Gang Member Early Out of Austrian Jail

Dropbox Bug Wipes Out Some Users’ Files

Generic Medicines Might Have Been Approved on Software Error

Australia’s iiNet Apologizes to Hundreds of Thousands of Customers for Three-day Email Outage

Spreadsheet Error Costs Tibco Investors $100 Million

Mon, 20 Oct 2014 13:00:00 GMT
Duke Energy Falsely Reports 500,000 Customers as Delinquent Bill Payers Since 2010 Utility says sorry about trashing credit scores and admits number of customers affected may climb
Photo: Nell Redmond/AP Photo

IT Hiccups of the Week

There were several IT Hiccups to choose from last week. Among them were: problems with the Los Angeles Unified School District’s fouled-up new student information and management system that are so egregious that a judge ordered the district to address them immediately; and the UK Revenue and Customs department’s embarrassing admission that its trouble-plagued modernized tax system has again made multiple errors in computing thousands of tax bills. However, the winner of this week’s title as the worst of the worst was an oofta by Duke Energy, the largest electric power company in the U.S. Duke officials apologized in a press release to over 500,000 of the utility’s 800,000-plus current and former customers (including 5,000 non-residential customers) across Indiana, Kentucky, and Ohio for erroneously reporting them as being delinquent in paying their utility bills since 2010.

Duke Energy admitted that the root cause of the problem was a coding error that occurred when customers opted to pay their monthly utility bills via the utility’s Budget Billing or Percentage of Income Payment Plan Plus (in Ohio only).  A company spokesperson told Bloomberg BusinessWeek that while customers were sent the correct invoices and their on-time payments were properly credited, the billing system indicated that the customers’ bills were paid late.
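Duke Energy hasn't said what the coding error actually was, so the following is a purely hypothetical sketch of one way this class of bug arises: the delinquency flag is computed against the ordinary billing cycle's due date rather than the payment plan's own later due date, so payments that are on time under the plan still get flagged late. The function names, dates, and logic are all invented for illustration.

```python
# Hypothetical illustration only: how an on-time payment can be flagged
# delinquent when the flag checks the wrong due-date field.
from datetime import date

def is_delinquent_buggy(payment_date, standard_due, plan_due):
    # BUG: compares against the standard cycle's due date and ignores
    # the budget-billing plan's own (later) due date.
    return payment_date > standard_due

def is_delinquent_fixed(payment_date, standard_due, plan_due):
    return payment_date > plan_due

payment = date(2014, 3, 20)       # customer pays on time under the plan
standard_due = date(2014, 3, 15)  # ordinary billing-cycle due date
plan_due = date(2014, 3, 25)      # budget-billing plan due date

print(is_delinquent_buggy(payment, standard_due, plan_due))  # True (wrongly flagged)
print(is_delinquent_fixed(payment, standard_due, plan_due))  # False
```

Whatever the real defect was, the pattern matches the reported symptoms: invoices and payment credits were correct, but a separate code path marked the bills as paid late.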

 As a result, that late payment information for residential customers was sent by formal agreement to the National Consumer Telecom & Utilities Exchange (NCTUE). The NCTUE is a consortium of over 70 member companies from the telecommunications, utilities and pay TV industries that serves as a credit data exchange service for its members. Holding over 325 million consumer records, NCTUE provides information to its members regarding the credit risk of their current and potential customers. For non-residential customers, the “late payment” snafu had worse consequences: the delinquency reports were sent to the business credit rating agencies Dun & Bradstreet and Equifax Commercial Services.

Duke Energy’s press release said that the company “deeply regretted” the error that has effectively trashed the credit scores of hundreds of thousands of its residential and business customers for years. The utility says the erroneous information has now been “blocked” for use by the NCTUE, Dun & Bradstreet and Equifax, and it has dropped its membership in all three.

The press release mentioned that the company is still investigating whether additional customers who had “unique” billing circumstances were affected by the coding error.

But what the written statement failed to mention is that the utility found the error only after a former customer discovered that she was having trouble setting up service at another NCTUE utility member because of a supposedly poor payment history at Duke Energy. After contacting Duke Energy and asking why she was being shown as a delinquent bill payer when she was not, the utility realized that the woman’s erroneous credit information was only the tip of a very large IT oofta iceberg.

While Duke Energy claims that “we take responsibility” for the error, it is being rather quiet about explaining what exactly “taking responsibility” means for the hundreds of thousands of customers who may have been unjustly financially affected by the erroneous information sent to the three credit agencies over the past four years. It wouldn’t surprise me to see a class action lawsuit filed against Duke Energy in the near future to help the company gain greater clarity on what its responsibility is.

In Other News…

Judge Orders California to Help LAUSD Fix School Computer Fiasco

UK’s Tax Agency Admits it Can’t Compute Taxes Properly

Tahoe Ski Resort Withdraws Erroneous $1 Season Pass

UK NHS Hospital Patients Offered Harry Potter Names

Florida Utility Insists New Billing System is Right: Empty House Used 614,000 Gallons of Water in 18 Days

Audit Explains How Kansas Botched Its $40 Million DMV Modernization Effort

Indiana BMV Finally Sending Out Overbilling Refund Checks

Nielsen Says Software Error Skews Television Viewer Stats for Months

Mon, 13 Oct 2014 13:00:00 GMT
Japan Trader's $617 Billion “Fat Finger” Near-Miss Rattles Tokyo Market Bad memories of 2005 fat finger failure revived
Photo: Yoshikazu Tsuno/AFP/Getty

IT Hiccups of the Week

This week’s IT Hiccup of the Week concerns yet another so-called “fat finger” trade embroiling the Tokyo Stock Exchange (TSE). This time it involved an unidentified trader who last week mistakenly placed orders for shares in 42 major Japanese corporations.

According to a story at Bloomberg News, the trader placed over-the-counter (OTC) orders adding up to a total value of 67.78 trillion yen ($617 billion) in companies such as Canon, Honda, Toyota and Sony, among others. The share order for Toyota alone was for 1.96 billion shares—or 57 percent of the car company—amounting to about $116 billion.

Bloomberg reported that its analysis “shows that someone traded 306,700 Toyota shares at 6,399 yen apiece at 9.25 a.m. ... The total value of the transaction was 1.96 billion yen. The false report was for an order of 1.96 billion shares. [The Japan Securities Dealers Association] said the broker accidentally put the value of the transaction in the field intended for the number of shares.”

The $617 billion order, which Bloomberg said was “greater than the size of Sweden’s economy and 16 times the Japanese over-the-counter market’s traded value for the entire month of August,” was quickly canceled before the orders could be completed. Given the outsized orders and the fact that OTC orders can be canceled anytime during market hours, it is unlikely that the blunder would have gone unfixed for very long, but the fact that it happened resurrected bad memories for the Tokyo Stock Exchange.
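A simple pre-trade sanity check can catch this class of error: reject any order whose share count exceeds a sensible fraction of the company's shares outstanding. The sketch below is a hedged illustration, not any exchange's actual rule; the threshold and the shares-outstanding figure are rough assumptions.

```python
# Hypothetical pre-trade size check. The 5 percent threshold and the
# Toyota shares-outstanding figure are illustrative assumptions.
def validate_order(shares: int, shares_outstanding: int,
                   max_fraction: float = 0.05) -> bool:
    """Reject orders larger than a set fraction of shares outstanding."""
    return 0 < shares <= shares_outstanding * max_fraction

TOYOTA_SHARES_OUTSTANDING = 3_400_000_000  # rough figure, for illustration

print(validate_order(306_700, TOYOTA_SHARES_OUTSTANDING))        # intended order: passes
print(validate_order(1_960_000_000, TOYOTA_SHARES_OUTSTANDING))  # fat-finger order: rejected
```

Note that the erroneous order, roughly 57 percent of the company, fails the check by an enormous margin, which is exactly why such coarse limits are effective against value-in-the-shares-field mistakes.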

Back in 2005, Mizuho Financial Group made a fat finger trade on the TSE that could not be canceled out. A Financial Times of London story states that, “Mizuho Securities mistakenly tried to sell 610,000 shares in recruitment company J-Com at ¥1 apiece instead of one share at ¥610,000. The brokerage house said it had tried, but failed, to cancel the J-Com order four times.” The mistaken $345 million trade cost the president of the TSE along with two other exchange directors their jobs.

Then in 2009, a Japanese trader for UBS ordered $31 billion worth of bonds instead of buying the $310,000 he had intended, the London Telegraph reported.  Luckily, the order was sent after hours, so it was quickly discovered and corrected.

A little disconcerting, however, was a related Bloomberg News story from last week that quoted Larry Tabb, founder of research firm Tabb Group LLC. According to Tabb, despite all the recent efforts by US regulators and the exchanges themselves to keep rogue trades from occurring (e.g., the Knight Capital implosion), fat finger trades still “could absolutely happen here.”

“While we do have circuit breakers and pre-trade checks for items executed on exchange,” Tabb told Bloomberg, “I do not believe that there are any such checks on block trades negotiated bi-laterally and are just displayed to the market.”

Don’t insights like that from a Wall Street insider just give you a warm and fuzzy feeling about the reliability of financial markets?

In Other News…

Computer Glitch Affects 60,000 Would-be Organ Donors in Canada

Korean Air New Reservations System Irritates Customers

Ford Recalls 850,000 Vehicles to Fix Electronics

Mitsubishi i-MiEV Recalled to Fix Software Brake Issue

Doctors’ “Open Payments” Website Still Needs Many More Government Fixes

Apple iOS 8 Hit by Bluetooth Problems

Electronic Health Record System Blamed for Missing Ebola at Dallas Hospital

Mon, 6 Oct 2014 13:00:00 GMT
JP Morgan Chase: Contacts for 76 Million Households and 7 Million Small Businesses Compromised That's about half of U.S. households, in case you were wondering
Photo: Spencer Platt/Getty Images

Banking giant JP Morgan Chase filed an official notice yesterday to the U.S. Securities and Exchange Commission (SEC) updating the material information concerning the cyberattack the bank uncovered during the summer. According to the bank’s Form 8-K, for customers using its and JPMorganOnline websites as well as the Chase and J.P. Morgan mobile applications:

  • User contact information—name, address, phone number and email address—and internal JPMorgan Chase information relating to such users have been compromised.
  • The compromised data impacts approximately 76 million households and 7 million small businesses.
  • However, there is no evidence that account information for such affected customers—account numbers, passwords, user IDs, dates of birth or Social Security numbers—was compromised during this attack.

To give you some perspective on the size of the breach, there are approximately 112 million households in the United States, along with 29.7 million small businesses.

The bank also reported in its SEC filing that it hasn’t seen any unusual customer fraud related to the data breach and that customers will not be liable for any unauthorized transactions on their accounts, provided that they promptly alert the bank to the bogus transactions.

JP Morgan goes on to say in a customer notice that it is “very sorry that this happened and for any uncertainty this may cause you.” Additionally, it says that, “There are always lessons to be learned, and we will learn from this one and use that knowledge to make our defenses even stronger.”

In the bank's 2013 annual report, JP Morgan CEO Jamie Dimon stated  that the firm was going to be spending $250 million annually on cybersecurity and employ some 1,000 people to help ensure security at the bank.

Cybersecurity experts all seem to agree that the breach of JP Morgan, considered one of—if not the—most sophisticated and best cyber-protected banks in the world, is highly worrying. Less clear is whether the reason customer personal data wasn’t taken was accidental or on purpose. (The Wall Street Journal reports that the bank’s marketing systems rather than its operational banking systems were penetrated.)

A story at the New York Times, for instance, says that the cybercriminals had deep and pervasive access to JP Morgan IT systems for months, even obtaining “the highest level of administrative privilege” to 90 of the bank’s computer servers.  However, the Times states, “investigators in law enforcement remain puzzled” since there is no evidence that money has been taken from customer accounts, nor has there been any launch of a major phishing campaign using the stolen contact information. Phishing a JP Morgan employee seems to be the way the cybercriminals got access to JP Morgan systems, by the way.

Speculation runs the gamut, including that the attack was sponsored by elements of the Russian government as a warning about Western government interference in the Ukrainian Conflict and that it could be a search for confidential information on high value targets, such as President Obama, who is said to be a JP Morgan customer. Other security experts speculate that this attack may have been just an initial foray into the bank’s IT system to understand how it works. If so, they likely will be back, in which case, expect more than contact information to be compromised.

Whatever the real reason, the bottom line is that as the recent compromise of 56 million U.S. and Canadian payment cards at Home Depot exemplifies, cyber-insecurity is pervasive. Security maven Brian Krebs probably said it best when he told the Guardian, “Reality is dawning among regular corporations that you can’t keep these guys out. The most you can do is stop the bleeding.”

Fri, 3 Oct 2014 17:00:00 GMT
FBI’s Sentinel System Still Not In Total Shape to Surveil Bureau promises fixes to $500 million system are on the way but DOJ inspector general wants hard proof
Illustration: iStockphoto

IT Hiccups of the Week

Other than the rather entertaining kerfuffle involving Apple’s new iPhone OS and its initial (non)corrective update (along with the suspicious “bendy phone” accusations), the IT Hiccups front was rather quiet this past week. Luckily, an “old friend” came by to rescue us from writing a post on some rather mundane IT snarl, snag or snafu.

Just in the nick of time, the U.S. Department of Justice's Inspector General released his latest in an ongoing series of reports [pdf] about Sentinel, the FBI’s electronic information and case management system. In this report, the IG focused on how Sentinel users felt about working with the system. Sadly yet unsurprisingly, the IG found that Sentinel is still suffering from some serious operational deficiencies two years after it went live.

Sentinel, you may remember, was finally deployed in 2012 as a replacement for the FBI’s legacy Automated Case Support (ACS) system, which was originally supposed to be superseded by the infamous US $170-million Virtual Case File system in late 2003 or early 2004. The VCF was itself a component of a larger FBI effort called Trilogy that was begun in 2001 to upgrade and modernize the FBI's IT systems. To best understand how and why VCF became infamous, you owe it to yourself to read “Who Killed the Virtual Case File?” Written by IEEE Spectrum's digital editorial director Harry Goldstein, it is a classic government IT project failure story that appeared in the September 2005 issue of Spectrum. For Agatha Christie murder mystery fans, Goldstein's story will immediately remind you of Murder on the Orient Express.

Anyway, the origins of Sentinel—which the DOJ Inspector General succinctly describes as providing “records management, workflow management, evidence management, search and reporting capabilities, and information sharing with other law enforcement agencies and the intelligence community”—go back to 2005. The system has carried with it the FBI's hopes that it would permanently erase the unpleasant memories associated with the Virtual Case File debacle. At one time, those hopes looked like they might be fulfilled, as even the usually highly critical U.S. Government Accountability Office, which lambasted the FBI for its VCF risk mismanagement, called the original $425 million Sentinel acquisition a procurement model for the FBI.

A Rocky Start and a Surprise Recovery

Alas, those hopes faded as troubles began to plague the Sentinel project. Lockheed Martin, which won the prime contractor role in March 2006, was removed from its leadership position in 2010 after a devastating independent assessment by MITRE. The assessment indicated that the project, which at that point was into the taxpayers for $461 million and already past its original 2009 completion date, would take at least another $351 million and six more years to complete.

In the wake of the MITRE assessment, plus withering criticism of the project [pdf] by the IG and others, the FBI decided its best course of action was to take over control of the project itself. The FBI decided to take an agile programming approach to try to finish the project quickly and without blowing the budget. To everyone’s surprise, the Bureau successfully deployed Sentinel in July 2012 at a total cost of around $500 million.

I say around $500 million because even the IG doesn’t seem to have a really good handle on Sentinel’s true costs: IT development, operations and maintenance, plus FBI personnel and other internally-borne IT costs are not easily accounted for. For example, the FBI is still operating legacy IT systems that were supposed to be replaced by Sentinel, and those costs are not counted against the cost of Sentinel. The IG reports today’s obligated cost of Sentinel is $551.4 million, not counting the tens of millions of dollars in related FBI internal costs that were and are still being incurred.

The Sentinel project’s unexpectedly smooth roll-out won it many laurels and accolades (including its own). It was even named a ComputerWorld Honors Program 2013 Laureate. And since its 2012 roll-out, there has been nary a perturbing peep about Sentinel in the press.

It Glitters, But It's Not Gold

However, according to the inspector general's latest report, while “Sentinel has had an overall positive impact on the FBI’s operations, making the FBI better able to carry out its mission, and better able to share information,” there remain major problems with Sentinel’s two critical functions of searching for and indexing case information.

The IG report states that Sentinel’s search function is supposed to provide users the ability to locate cases and specific case-related information within Sentinel, while the indexing function's role is to designate and modify the relationship between any two identifiers, such as the relationship between a person and that person’s address. The proper indexing of Sentinel records is critical if FBI agents are to be able to “connect the dots,” the IG states.

For instance, the IG provides the following example:

“Indexing allows Sentinel to add structure to the data it contains, which in turn enables improved search results. [For example], a search for white males who drive black cars using a search engine like those used for internet searches would return all documents that mention any of the following: white males, black males, white cars, or black cars. By adding structure to the data through indexing, Sentinel’s search function is able to return only white males who drive a black car. When a user indexes an entity, the system will suggest potential matches already indexed in Sentinel.”
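The contrast the IG is drawing, between bag-of-words keyword matching and structured, field-aware search, can be sketched in a few lines of Python. The records and field names below are invented for illustration; they are not how Sentinel actually stores data.

```python
# Toy contrast between keyword search and indexed (structured) search,
# using the IG's "white males who drive black cars" example.
records = [
    {"person": "white male", "car": "black"},
    {"person": "white male", "car": "white"},
    {"person": "black male", "car": "white"},
]

def keyword_search(terms):
    """Bag-of-words matching: any record mentioning any term is a hit."""
    return [r for r in records
            if any(t in " ".join(r.values()) for t in terms)]

def indexed_search(person, car):
    """Structured matching: each term is checked against its own field."""
    return [r for r in records
            if r["person"] == person and r["car"] == car]

print(len(keyword_search(["white", "black"])))     # 3: every record matches
print(len(indexed_search("white male", "black")))  # 1: only the true match
```

The structure is what lets a query tell a white male driving a black car apart from a black male driving a white car, which is exactly the improvement the IG describes.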

But the IG found in a survey of FBI agents that only 42 percent stated they “often received the results they needed” from Sentinel. Some 59 percent reported that they “sometimes, rarely, or never received the results they needed.” The IG also said that some survey respondents commented that the search function of the old ACS system was better than Sentinel’s! Furthermore, the IG stated that two issues kept frustrating the system’s users: “Sentinel returned too many search results for a person to reasonably review or no results at all for a document the user knew existed.”

In addition, the IG noted that soon before Sentinel was rolled out in July 2012 to all that acclaim, FBI management was boasting to the IG [pdf] that “the search function is both flexible and powerful enough to accommodate the substantial volume and wide variety of information available for retrieval.” However, according to information uncovered by the IG, even as FBI management was singing the search function’s praises, it knew the function had major deficiencies.

The inspector general didn’t outright state it in his report, but reading between the lines you can see an IG clearly peeved that the FBI wasn’t honest in 2012 (or over the past two years, for that matter) about the true operational state of Sentinel's search capability and how it has hindered FBI personnel. The IG also cast some indirect doubt on how well the FBI’s highly touted agile approach worked: it may have helped save money and get Sentinel up and running quickly, but the question one has to ask now is, at what operational cost?

The IG also found that Sentinel’s indexing function wasn’t popular with users either. FBI Special Agents who now have to index their own case files (they used to hand the function off to an administrative staff member) complain that the process is a major administrative burden, and are frustrated that it “leaves less time for investigative activities.” A full 41 percent of survey respondents “reported that they spent more time indexing in Sentinel than they did in ACS.”

I can’t be sure, but I'd bet that with Sentinel (and probably going back to VCF), FBI management wanted to reduce personnel costs by moving the task of indexing from administrative staff to Special Agents. They also probably reasoned that the move would improve the accuracy of the indexing. But as often happens in these cases, the “cost savings” usually turn out to be an illusion.

In fact, FBI management initially told the IG “that the FBI is not currently able to provide Special Agents in the field with assistance in reducing the time it takes to index large structured documents such as bank records, or unstructured documents such as a report of investigation form … or email.” In other words, there is no money to hire admins to do the case indexing grunge work. The IG responded that the FBI had better find a technological solution, and soon.

There are several other operational deficiencies listed in the IG report, which I won’t go over. Interestingly, some of these seem to be less design flaws and more a combination of comfort with technology and organizational memory of how things used to be. For instance, users with fewer than 10 years of FBI tenure, and especially those with fewer than 5 years, found it easier to use Sentinel’s search function than those with much longer tenures at the Bureau. That said, 25 percent of “tech-savvy” users still found the search function difficult to learn.

The IG made one telling statement in his report that should trouble all taxpayers in the United States: “Based on the feedback received from Sentinel users, we are concerned that Sentinel does not appear to have met users’ expectations and needs.”

FBI management admitted that there are indeed some issues with Sentinel, but also told the IG that fixes are on the way. In October, the FBI said, there is going to be a major Sentinel software release that will help address many of the concerns the IG raised in his report.

The problems with Sentinel’s search may take longer to resolve, however. FBI management promised the IG that it will begin soliciting user feedback in regard to how to improve the search function. The Bureau will then develop and implement solutions to increase search functionality and operational efficiency. The IG agreed to the FBI's proposed approach but indicated that he was now going to be taking a “trust, but verify” perspective on the statements made by FBI management. The IG stated that in the future he wanted from the FBI “a detailed description of the change[s] made and how the search function was improved as a result.”

As the old saying goes, fool me once, shame on you; fool me twice, shame on me. The IG has made it crystal clear to FBI management that he isn’t going to be fooled again.

In Other News...

Apple Says Sorry for iPhone Software Update Mess

Dallas Police Officials Dispute Claims of New Computer Problems

Morgan Stanley Agrees to Pay for Prospectus IT Snafu

UK Tax Department Demands Taxes from Thousands of Firms that Don’t Owe Anything

Barclays Software Maintenance Issue Causes Online, Telephone, and Mobile Banking Failures

Louisiana State Police Computer Failure Keeps Bonded Prisoners in Jail

Software Issues Blamed for Voting Results Problem in New Brunswick, Canada

Cover Oregon Health Insurance Exchange Finds Yet Another Tax Credit Problem

Website Cost Jumps $1 Billion Plus

Mon, 29 Sep 2014 13:00:00 GMT
Home Depot: Everything is Secure Now, Except Maybe in Canada Hardware retailer finally fesses up to a data breach that compromised 56 million payment cards
Photo: Daniel Acker/Bloomberg/Getty

This past Thursday, after weeks of speculation, Home Depot, which calls itself the world’s largest home improvement retailer, finally announced [pdf] the total damage from a breach of its payment system: At its 1,157 stores in the U.S. and Canada, 56 million unique credit and debit cards were compromised. This is said to be among the three largest IT security breaches of a retailer, and ranks with some of the largest security breaches of all time.

According to Home Depot’s press release, the company confirmed that the criminal cyber intrusion began in April and ran into September, and “used unique, custom-built malware to evade detection. The malware had not been seen previously in other attacks, according to Home Depot’s security partners.”

The company says that it has now removed all the malware that infected its payment terminals, and that it “has rolled out enhanced encryption of payment data to all U.S. stores.” The enhanced encryption approach, Home Depot states, “takes raw payment card information and scrambles it to make it unreadable and virtually useless to hackers.” It is a bit curious that the company says “virtually useless” and not “completely useless,” though.

Canadian stores, on the other hand, will have to wait a bit longer. While Home Depot’s Canadian stores have point-of-sale EMV chip-and-PIN card terminals, “the rollout of enhanced encryption to Canadian stores will be completed by early 2015,” the company says. Canadian Home Depot stores were at first thought to be less vulnerable because the chip-and-PIN terminals were already in place, but that apparently hasn't been the case. For some reason, the company is refusing to disclose the number of Canadian payment cards compromised, the Globe and Mail says. The Globe and Mail estimates the total number of cards compromised to be around 4 million.

Home Depot goes on to say in its press release that it has no evidence “that debit PIN numbers were compromised or that the breach has impacted stores in Mexico or customers who shopped online at or”

As usual in these situations, Home Depot “is offering free identity protection services [for one year], including credit monitoring, to any customer who used a payment card at a Home Depot store in 2014, from April on.” The company also apologized to its customers “for the inconvenience and anxiety this has caused.”

Home Depot’s data breach was first made public on 2 September by Brian Krebs, the former longtime Washington Post reporter with amazing IT security contacts, who now publishes a must-read security website called Krebs on Security. Several banking sources told Krebs about “a massive new batch of stolen credit and debit cards that went on sale [that] morning in the cybercrime underground,” with Home Depot looking like the source. Krebs went on to write that:

There are signs that the perpetrators of this apparent breach may be the same group of Russian and Ukrainian hackers responsible for the data breaches at Target, Sally Beauty and P.F. Chang’s, among others. The banks contacted by this reporter all purchased their customers’ cards from the same underground store — rescator[dot]cc — which on Sept. 2 moved two massive new batches of stolen cards onto the market.

In fact, it wasn’t until 8 September that Home Depot confirmed that it had in fact suffered a breach. Krebs, who has since written about the breach several times, recently wrote that the breach may not be as severe as indicated (nor as severe as it could have been). Sources have indicated that the malware used — which looks like a variant of what smacked Target late last year — was “installed mainly on payment systems in the self-checkout lanes at retail stores.” The reasoning is that if the malware had penetrated Home Depot’s payment system to the extent that Target’s systems were breached, many more than 56 million payment cards would have been compromised.

Sellers of compromised Home Depot card data are targeting specific states and ZIP codes in the hopes that buyers of the stolen cards will raise fewer red flags in the credit card and banking fraud algorithms. For instance, some 52,000 cards from Maine Home Depot stores, 282,000 from stores in Wisconsin, and 12,000 from stores in Minnesota have been offered for sale. Card prices seem to be ranging mostly from $9 to $52 apiece, although for $8.16 million, one could purchase all of the stolen payment card numbers from Wisconsin, the Milwaukee-Wisconsin Journal Sentinel reported. The Journal Sentinel noted that its investigation found that:

Prices start at $2.26 for a Visa debit card with an expiration date of September 2014. The most valuable cards are MasterCard platinum debit cards and business credit cards. The most expensive card compromised in Wisconsin, a MasterCard valid through December 2015, was advertised at $127.50.

Interestingly, while Home Depot’s 56 million payment card breach is larger than Target’s 40 million payment card breach, the severity of the blowback so far is much more muted on the part of customers and investors. Part of the reason seems to be that the discovery of the breach happened at the end of summer, a slow shopping time for Home Depot, while Target’s was announced during the prime holiday buying period, which spooked its customers.

Further, investors have figured that Target’s breach cost the company some $150 million, excluding the $90 million in insurance reimbursements—a sum the company could ill afford given its ongoing retail difficulties. A similar sum may dent Home Depot’s bottom line, but the company is better placed financially to absorb the damage. The company stated in its press release that it has spent at least $62 million in dealing with the breach so far, with some $27 million of it covered by insurance. Home Depot says it doesn’t know how much more it will need to spend, but I suspect it could be an additional couple of hundred million dollars before all is said and done.

A third reason for the muted response may be that customers are now becoming inured after so many point-of-sale data breaches. For example, last May, the Ponemon Institute was cited in a CBS News report as stating that some 47 percent of adult Americans have had their personal information compromised in the past year. Given the Home Depot breach, as well as many others since, the number is probably even higher now. How many people had their personal information compromised multiple times is unknown, but I suspect it isn’t an insignificant number.

Home Depot’s financial and reputational pain might increase significantly, however, if the joint Connecticut, Illinois, and California state attorneys general investigation into the breach decides there is sufficient cause to sue Home Depot. As expected, at least one class action lawsuit each has been filed in both the United States and Canada, and more can be expected. Banks may also decide to sue Home Depot to cover the cost of any credit or debit cards they have to replace and for other financial damages, like some did against Target and earlier against TJX.

As reported by both The New York Times and Bloomberg Businessweek, Home Depot had been repeatedly warned by its own IT security personnel about its poor and outdated IT security since 2008. Corporate management reportedly decided not to immediately upgrade the company’s security capabilities using readily available systems, even in the aftermath of the Target breach and the hacking of a couple of Home Depot stores last year, incidents that were not publicly disclosed until now. While the company did eventually decide to upgrade its payment security systems, the implementation effort didn’t get started until April, the same month as the breach. In addition, the papers report, Home Depot seemed to have weak security monitoring of its payment system, even though company management knew it was highly vulnerable to attack.

That Home Depot’s payment system was left vulnerable is interesting, because the company spent hundreds of millions of dollars improving its IT infrastructure over the past decade. Perhaps with revenues of $79 billion in 2013 the company felt it could easily afford the costs of an attack, and therefore, there was no urgent rush to improve its security posture. Brian Krebs notes this apparent lack of urgency as well. He says that even though the company was alerted by banks to something being massively amiss, “thieves were stealing card data from Home Depot’s cash registers up until Sept. 7, 2014, a full five days after news of the breach first broke.”

That alone speaks of an arrogance that belies Home Depot's public statements about how it takes the privacy and security of its customers’ personal information “very seriously.” Local Home Depot store personnel I have spoken with seem very ill-informed concerning the breach and what customers should do about it, which also strikes me as a sign of less than the customer-caring attitude Home Depot advertises.

Home Depot’s seemingly cavalier IT security attitude isn’t unique, of course. Target didn’t bother to investigate alerts from its advanced warning system showing that it was being hacked until it was JTL — just too late. Just last week, eBay was being slammed again for its “lackadaisical attitude” toward IT security after multiple instances of malicious cross-site scripting, unabated since February, were found on its UK website. Only after the BBC started asking eBay questions about the scripting issue did it decide that perhaps it should take them seriously. You may remember, it was only last March when eBay, which also proclaims to take customer security “very seriously,” asked all of its users to change their passwords after a cyberattack compromised its database of 233 million usernames, contact information, dates of birth, and encrypted passwords.

To tell you the truth, every time I read or hear a company or government agency claim in a press release that, “We take your security seriously,” in the wake of some security breach, I shake my head in disbelief.  Why not just state honestly, “We promised to take your security seriously and we obviously failed to take it seriously enough. We’re sorry and we will be better prepared from now on.” Alas, that level of candor is probably much too much to ask.

Tue, 23 Sep 2014 13:00:00 GMT
Indiana’s Bureau of Motor Vehicles Overcharged 180,000 Customers for 10 years BMV says it will refund $29 million plus interest over the coming months
Illustration Credit: iStockphoto

IT Hiccups of the Week

Put aside, for a moment, the record theft of credit card accounts from Home Depot. I'll tell you all about that in a later post. Instead let me pick another interesting IT Hiccup from last week's hodgepodge of IT problems, snarls, and screw-ups: Indiana’s Bureau of Motor Vehicles (BMV) plans to refund some US $29 million plus interest to 180,000 customers for charging them an incorrectly calculated excise tax when they registered their vehicles. The BMV claimed the problem began during the initial changeover in 2004 to its then-new $32 million System Tracking and Record Support (STARS) computer system.

According to the BMV’s press release [pdf], when a car is registered with the agency, state law requires that the vehicle be placed in a specific tax classification category based on its value. The value is calculated “using the price of the vehicle and applying an adjustment factor based upon Consumer Price Index [CPI] data related to increases in new automobile price.” The result of the value calculation is then entered into STARS which in turn uses it to automatically determine the excise tax that needs to be paid by the vehicle’s owner. For reasons that the BMV did not disclose, the STARS programming seems to have failed to take into account the adjustment factor for 180,000 out of over 60 million vehicles registered using the software system, resulting in that small subset of registrations being overcharged.
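The failure mode the BMV describes can be sketched in a few lines, assuming hypothetical tax brackets and adjustment factors (Indiana's real tables are more involved): the tax is keyed to a CPI-adjusted value, so skipping the adjustment factor can put a vehicle in the wrong, more expensive classification.

```python
# Hypothetical excise-tax brackets: (minimum adjusted value, annual tax).
# The real Indiana tables differ; these numbers only illustrate the bug.
TAX_BRACKETS = [
    (0, 50),
    (15_000, 150),
    (30_000, 300),
]

def excise_tax(sticker_price: float, cpi_factor: float = 1.0) -> int:
    """Classify the vehicle by its CPI-adjusted value, then look up the tax."""
    adjusted_value = sticker_price * cpi_factor
    tax = 0
    for floor, amount in TAX_BRACKETS:
        if adjusted_value >= floor:
            tax = amount
    return tax

# Correct calculation: the adjustment factor deflates the taxable value.
print(excise_tax(20_000, cpi_factor=0.7))  # 50
# The reported bug: the factor is never applied, so the owner lands in a
# higher bracket and is overcharged.
print(excise_tax(20_000))                  # 150
```

In this toy version the unadjusted value crosses a bracket boundary, which is presumably why only a small subset of registrations, those near a boundary, were affected.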

The BMV says it will be mailing out letters to those customers affected within the next month or so. However, those overcharged customers will still need to fill out the form enclosed in the letter and send it back to the state if they want to receive their refunds.

The miscalculation of the excise tax doesn’t just affect BMV customers, either. A percentage of the excise tax collected by the BMV is transferred to Indiana’s local and county governments, which are now going to have to repay an estimated $6 million back to the state. The BMV press release states that to help reduce the financial impact on local and county government budgets, the state will pay back the overcharged BMV customers and the interest they are owed. The local and county governments will then see their excise tax distributions reduced over the next two years to make up for the previous incorrect tax payments they received.

Like most of these types of incidents, this one has an interesting back story: According to the Post-Tribune, the excise tax error was only recently discovered when the latest Consumer Price Index data was being entered into STARS. Someone apparently started asking how the excise tax was being calculated, and this led to the discovery of the tax overcharges going back to 2004.

In addition, the BMV press release states that Indiana Governor Mike Pence has authorized the BMV to appoint an independent consulting company to audit the agency to ensure there are no more fee miscalculations. The reason, BMV Commissioner Don Snemis told the Associated Press, was that, “I don't want to discover any more errors after the damage has been done.” Snemis might have added that what he really didn’t want were any more errors disclosed as a result of yet another lawsuit against the BMV.

You see, the BMV settled a $30 million class action lawsuit last year for also overcharging the fees paid to it by some 4.5 million BMV customers when they obtained or renewed their driver's licenses between 7 March 2007 and 27 June 2013. The BMV was publicly embarrassed by the suit, and agreed to credit customer accounts for what it called the “inadvertent” overcharges. The lawyers suing the BMV, however, said the fees were actually imposed on Indiana drivers on purpose, something the BMV strongly denied.

Then in September of last year, the BMV was even more embarrassed when it had to admit that it had also discovered that it been overcharging customers on some 30 other fees. The BMV said that it would immediately be crediting customer accounts for any charges imposed going back six years, which was the statute of limitations. The BMV blamed “the errors on misapplying complex state laws governing more than 300 fees for various BMV services,” the Indianapolis Star reported.

Soon after the BMV admission, the same law firm of Cohen and Malad LLP that forced the earlier settlement (and earned $6 million off of it) brought a second class-action lawsuit against the BMV in October of last year asking the courts to force BMV to not merely issue credits, but instead issue full refunds. The lawsuit also demanded that there be a full accounting of all BMV overcharges, including when the overcharges were first discovered by BMV management. The law firm contended that BMV knew about the 30-plus fee overcharges for some time, and did nothing about them until the first lawsuit was filed. The BMV once more denied the charge, and countered that BMV customers were already made whole by the issuing of the credits to their accounts. Snemis also accused the law firm of Cohen and Malad of bringing the lawsuit as just a way of “seeking a very big fee.”

However, the Indianapolis Star reported in June that a videotaped deposition given by a former BMV Deputy Director in fact indicated that the BMV had known about the overcharging for quite a while, but “secretly kept doing it for at least two years to avoid budget troubles.” The Journal Gazette further reported that the BMV is now trying to keep secret other depositions taken in the latest lawsuit against it, supposedly because the BMV claims that allowing them to become public would discourage others from testifying in the lawsuit, an excuse no one seems to be buying.

Another wrinkle to this story is that the STARS system itself has a long, controversial history of its own. The STARS system, as was mentioned, was initially introduced into the BMV in early 2004. However, continuing problems with it meant that the system it was supposed to replace, called BOSS, had to be used concurrently with STARS until July 2006, when BMV Commissioner Joel Silverman decided that the agency was going to use STARS exclusively. It may have been during this cross-over period that the latest fee fiasco occurred.

Silverman’s decision did not turn out well. Angry BMV customers quickly found themselves waiting in long lines as many routine transactions could not be completed; online and self-service options were also unavailable. Indiana state police were also said to be worried that they might arrest drivers without just cause, or have to let drivers who should be arrested go free, because the information being sent to them from the BMV appeared to be inaccurate.

It wasn’t until late August that BMV operations started resembling anything approaching normal. By then, however, former Gov. Mitch Daniels was forced to apologize for the ongoing BMV problems in an attempt to cool the ensuing political firestorm raised by the debacle.

By early September, Silverman decided that perhaps it was in everyone’s—and especially the governor’s—best interest to resign. No doubt helping Silverman’s decision along was his loud and forceful declaration within a week of STARS going live that the system was fixed when it obviously was not, as well as admitting (pdf) that none of the extensive system testing that was supposedly done showed any indication of problems later encountered. Obviously, all that extensive testing missed the current excise tax problem as well.

It took over a year for STARS to finally start to operate satisfactorily.

In Other News….

Software Failure Crashes Rice University Network

Fat Finger Trade Hits BP Shares

Canadian Geography Vexes Apple IPhone 6

Johannesburg Stock Exchange Experiences Network Issues

Canadian Hospital Computer Snafu Delays 2,700 Test Results

Australian Dept. of Human Services Fixing Aged Care Processing Glitch

Technology Errors with New Parking System Hit Train Commuters in Perth Australia

New Smart Parking Meters in Walnut Creek California Dole Out Fines Even for Fully Paid Up Parking

Microsoft Pulls OneDrive for Business Patch

Computer and Human Errors Allow Passenger to Board Plane with Wrong Boarding Pass

Flawed Federal Health Insurance Calculator Allows Large Employers to Offer Substandard Health Plans

US Labor Department Releases Important CPI Data Early Due to Unknown "Technical Issues"

Mon, 22 Sep 2014 04:58:00 GMT
GM: The Number of Models That Could Shut Off While You’re Driving Has Tripled Faulty ignition switches could be a hazard in 100 million vehicles
Photo: Mark Wilson/Getty Images
Kim Langley [left], Laura Christian [second from left], Randal Rademaker [second from right] and Mary Ruddy [right] stand with other family members of drivers who were killed while driving GM cars during a news conference at the U.S. Capitol on 1 April 2014. The families want to know why it took GM so long to recall the faulty ignition switch on certain models.

Guess what I got in the mail yesterday! Nope. But that was a good guess. The letter in my mailbox was a safety recall notice from General Motors, the manufacturer of the car I drive. Why should you care, you ask? I'm one of half a million people who have received the notice about the problem, but we represent less than one percent of the number of drivers affected.

In mid-March, I wrote a post for the Risk Factor blog discussing how GM belatedly got religion over its egregious failure to remedy a dangerous problem. Here’s how I described it then:

Millions of cars were equipped with a part that didn’t provide enough resistance to, say, a key ring swinging and rotating the car key so that the ignition was suddenly turned from the on (run) position to the off (accessory) position. There’s nothing to prevent that turn from happening except the tension provided by the spring in the part, known as a detent plunger.

What’s so bad about that? When the car suddenly turns off, power assist for steering and braking are lost, leaving a driver desperately struggling to keep the car from crashing.

That having been said, two things puzzled me about the recall notice I received:

1) The fact that it is now mid-September and the letter notes that the automaker expects “to have sufficient parts to begin repairs by October 1, 2014.” Only then, the letter says, should I contact my local GM dealer to arrange a service appointment. For those keeping score at home, that’s six months since I reported about GM’s CEO making a halfhearted, quasi-admission that the company had dragged its feet and let more than a dozen people die from sudden engine shutoffs. (According to the Wall Street Journal, the official death toll has risen from 13 to 19, but 125 wrongful death claims have been filed in court. Additionally, 445 claims have been made against a US $400 million fund GM has set up to cover such claims.) I can’t count the number of trips I’ve made between my home and the park-and-ride on my way to and from work, or the number of times I’ve picked my sons up from school, taken them on camping trips or to ballgames. GULP!

(Okay. I know what you’re thinking: If you wrote about this, you were the LAST person who should have been unaware of the danger! But that brings me to the second thing that caused me to cock my head in bewildered surprise.)

2) The GM models that had been linked to deaths because of faulty ignition switches were the Chevy Cobalt, Chevy HHR, Pontiac G5, Pontiac Pursuit, Pontiac Solstice, the Saturn Ion, and the Saturn Sky. But it’s now abundantly clear that the Chevrolet Malibu, the GM vehicle I drive, should have been on that list as well.

It didn’t take much digging to discover a few mind-blowing things. That list of seven GM models known to have problems with faulty ignition switches has now ballooned to 20. (Click here to go to the GM ignition recall safety information website; the front page contains a list of the models and affected model years.)

I called the Chevrolet Customer Assistance Center toll-free number provided on the recall notice. A representative told me that the problem with the 13 additional models was discovered in May. He indicated that, as of last week, the automaker had sent out 500,000 notices related to the ignition switch recall for the models added to the list at that point. Asked why it took four months to send me the piece of paper notifying me that my vehicle could potentially suffer sudden shutoff, he said GM has been sending letters around to the states and was just getting around to me. But I guess I should count myself lucky. The customer service representative revealed that although he couldn’t state categorically the exact number of affected vehicles, he estimates that it is “100 million, perhaps.” That’s a lot of letters—many of which have yet to be posted—and a lot of drivers unaware of the potentially deadly problem they encounter each time they get behind the wheel.

Wed, 17 Sep 2014 18:00:00 GMT
Looking for the Key to Security in the Internet of Things New standards will be necessary to keep the coming horde of devices from introducing myriad problems
Photo-illustration: iStockphoto

As the number of Internet-connected devices in any home skyrockets from a few, to a few dozen, to perhaps even a few hundred—including interconnected thermostats, appliances, health and fitness monitors, and personal accessories like smart watches—security concerns for this emerging Internet of Things (IoT) will skyrocket too. Cisco projects that there will be 50 billion connected devices by 2020; each such node should ideally be protected against malware, spyware, worms, and trojans, as well as overzealous government and commercial interests who might produce their own privacy-compromising intrusions.

It’s a tall order, says Allen Storey, product director at the UK security firm Intercede. But the biggest challenges today are not so much technical problems as matters of awareness and education. Consumers need to know, says Storey, that IoT security is a real concern as the first wave of gadgets rolls out into the marketplace. And unlike faster processors or bigger memories, security is a product feature that the marketplace may not reward on its own.

Writing in the journal Network Security in July, Storey said that “Without the threat of end-user backlash, there is no strong business case for manufacturers to add a ubiquitous security element into the development process.” Moreover, he said, commercial pressures could actually erode IoT security as many small players rush to be first to market. It's also likely that players will pursue siloed security standards, leaving substantial security holes as those devices interconnect with still other Internet-enabled devices (e.g. routers, smartphones, smart watches).

In the absence of any clear industry-wide IoT security standards, Intercede CTO Chris Edwards says consumers should shop for devices that rely on tried and tested security schemes, especially public key cryptography.

“When you’re looking at authenticating devices, the only real standards at the moment that offer any real interoperability tend to be Public Key Infrastructure (PKI),” he says. “The idea here is that you have a secure hardware element in that device that is able to generate and store and use private cryptographic keys that cannot be exported. So you can’t clone that device.”

So PKI chips, like those found in most smart cards, can help secure IoT communications. One other security standard that could be important in the IoT’s early years, Edwards says, is that of the FIDO (Fast IDentity Online) Alliance.

FIDO, a commercial consortium whose members include Microsoft, Google, PayPal, Lenovo, BlackBerry, and MasterCard, offers a lower-overhead variation of PKI that authenticates users and devices in part via biometrics (e.g. fingerprint-sensing chips) and PINs. This in turn makes FIDO more readily scalable to home networks with many devices on them, some of which may not have the battery or processor power to do classic private-public key cryptography for every communication.
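The flow Edwards describes, in which a verifier issues a fresh random challenge and a device proves possession of a secret that never leaves it, can be sketched in a few lines. This is a simplified illustration, not the FIDO protocol: for brevity it uses a shared-secret HMAC from Python's standard library in place of the hardware-backed private key and digital signature that FIDO and PKI actually use, and the `Device` and `Verifier` classes are hypothetical.

```python
import hashlib
import hmac
import secrets


class Device:
    """A 'thing' holding a secret that never leaves it. This stands in
    for a hardware-backed private key; real FIDO/PKI would answer the
    challenge with an asymmetric signature instead of an HMAC."""

    def __init__(self, key: bytes):
        self._key = key  # never exported

    def respond(self, challenge: bytes) -> bytes:
        # Prove possession of the secret by MACing the fresh challenge.
        return hmac.new(self._key, challenge, hashlib.sha256).digest()


class Verifier:
    """The relying party, e.g. the 'front door' that trusts the watch."""

    def __init__(self, key: bytes):
        self._key = key

    def authenticate(self, device: Device) -> bool:
        challenge = secrets.token_bytes(32)  # fresh randomness defeats replay
        expected = hmac.new(self._key, challenge, hashlib.sha256).digest()
        # Constant-time comparison avoids a timing side channel.
        return hmac.compare_digest(device.respond(challenge), expected)


shared = secrets.token_bytes(32)  # provisioned when watch and door are paired
watch, door = Device(shared), Verifier(shared)
print(door.authenticate(watch))             # True: correct secret
print(door.authenticate(Device(b"clone")))  # False: wrong secret
```

In real FIDO the response would be a signature checkable against the device's public key, so the verifier holds no secret worth stealing; the shared-key version here only illustrates the message flow.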

“I don’t want the whole world to trust my watch,” Edwards says. “I just want to make sure the front door trusts my watch.”

Apple is conspicuously absent from FIDO's membership roll, which means that the Apple Watch's security will involve a yet-to-be-disclosed set of proprietary security standards. Those protocols will thus probably form an important second web of security standards for the most secure IoT devices.

As an example of an IoT network that uses both PKI and FIDO, Edwards imagines a smartphone that communicates with a smart refrigerator in its owner’s home. The phone and refrigerator have already been introduced to each other and thus don’t need the highest PKI security levels. In that situation, FIDO would suffice for communications between the two devices, such as the smartphone telling the fridge to go into low-power mode when the family goes on vacation, or the fridge reporting to the phone that it's time to pick up some milk from the grocery store.

On the other hand, if the fridge communicates directly to the store to order more milk, the grocery store isn’t going to want to deal with FIDO certifications for its hundreds of customers. It’s more likely to insist on PKI security and authentication when a nearby fridge orders a gallon of milk or a case of beer.

In all, Storey says, the landscape of IoT security standards demands a company that can manage all such secure transactions behind the scenes for the cornucopia of third-party IoT device makers—perhaps like antivirus software today is managed and regularly updated by a small set of private, specialized companies.

“Given the absence of one standards agency producing cover-all protocols, an opportunity has emerged for security vendors and service providers to offer their own umbrella solutions that enable the individual to take control,” Storey wrote. “This is an exciting new dawn, but the industry must first come together to ensure it is a secure one for everyone concerned.”

Tue, 16 Sep 2014 14:00:00 GMT
Detroit's IT Systems “Beyond Fundamentally Broken”
Being only fundamentally broken would be an improvement, city CIO says
Photo: Tom Szczerbowski/Getty Images

IT Hiccups of the Week

Last week’s IT Hiccups parade was a bit slower than normal, but there were a couple of IT snafus that caught my eye. For instance, there was the embarrassed admission by Los Angeles Unified School District (LAUSD) chief strategic officer Matt Hill that the new-but-still-problem-plagued MiSiS student tracking system I wrote about a few weeks ago should have had “a lot more testing” before it was ever rolled out. There also was the poorly thought out pasta promotion by Olive Garden restaurants that ended up crashing its website. However, what sparked my curiosity most was the disclosure by Beth Niblock, Detroit’s Chief Information Officer, that the city’s IT systems were broken.

How broken are they? According to Niblock:

“Fundamentally broken, or beyond fundamentally broken. In some cases, fundamentally broken would be good.”

Niblock’s comment was part of her testimony during Detroit’s bankruptcy hearings. Detroit filed for bankruptcy last July and has since been in bankruptcy court trying to work out debt settlements with its creditors, some of whom are unhappy over the terms the city offered. Niblock was a witness at a court hearing looking into whether the city’s bankruptcy plan was feasible and fair to its many creditors, and whether the plan would put the city on more sound financial and operational footing.

Critical to Detroit returning to financial and operational soundness is the state of the city’s IT systems. However, since the 1990s the city’s IT systems have generally been a shambles, and that is putting it charitably. Currently, according to Niblock (who took on the CIO job in February after turning it down twice, and who may be wishing she had a third time), the city’s IT systems are “atrocious,” “unreliable,” and “deficient,” Reuters reported.

Reuters went on to report Niblock's testimony that the city’s Unisys mainframe systems are “so old that they are no longer updated by their developers and have security vulnerabilities.” She added that the desktop computers, which mostly use Windows XP or something older, “take 10 minutes” to boot. It probably doesn’t matter anyway, since the computers run so many different versions of software that city workers can’t share documents or communicate, Niblock says. That also may not be so bad, given that city computers have apparently been infected several times by malware.

Detroit’s financial IT systems are so bad that the city really hasn’t known what it is owed, or in turn what it owes, for years. A Bloomberg News story last year, for example, described a $1 million check from a local school district that wasn’t deposited by Detroit for over a month; during that time, the check sat in a city hall desk drawer. That isn’t surprising, the Bloomberg story noted, as the city has a hard time keeping track even of funds electronically wired to it. The financial systems are so poor that city income-tax receipts need to be processed by hand; in fact, some 70 percent of all of the city’s financial accounting entries are still done manually. The costs of doing things manually are staggering: it costs Detroit $62 to process each city paycheck, as opposed to the $18 or so it should cost. Bloomberg stated that a 2012 Internal Revenue Service audit of the city’s tax collection system termed it “catastrophic.”

While the financial IT system woes are severe, the fire and police departments' IT systems may be in even worse shape. According to the Detroit Free Press, there is no citywide computer-aided dispatch system to communicate emergency alerts to fire stations. Instead, fire stations receive the alerts by fax machine. To make sure the alarm is actually heard, firefighters have rigged Radio Shack buzzers and doorbells, among other homemade Rube Goldberg devices, to be triggered by the paper coming out of the fax machine. Detroit's Deputy Fire Commissioner told the Free Press that, “It sounds unbelievable, but it’s truly what the guys have been doing and dealing with for a long, long time.”

You really need to check out the video accompanying the Free Press story, which shows firefighters balancing a soda can filled with coins and screws on the edge of the fax machine so that it will be knocked off by the paper coming out of the machine when an emergency alert is received at the fire station. Makes one wonder what happens if the fax runs out of paper.

The Detroit police department's IT infrastructure, what there is of it, isn’t in much better shape. Roughly 300 of its 1,150 computers are less than three years old. Apparently even those “modern” computers have not received software updates, and in many cases, the software the police department relies on is no longer supported by vendors. The police lack an automated case management system, which means officers spend untold hours manually filling out, filing, and later trying to find paperwork. Many Detroit police cars also lack basic Mobile Data Computers (MDC), which means officers have to rely on dispatchers to perform even basic functions they should be able to do themselves. An internal review (pdf) of the state of Detroit’s police department was published in January, and it makes for very sad, if not scary, reading.

If you are interested in how Detroit’s IT systems became “beyond fundamentally broken,” there is a great case study that appeared in a 2002 issue of Baseline magazine. It details Detroit’s failed attempt, beginning in 1997, to upgrade and integrate its various payroll, human resources, and financial IT systems into a single be-all Detroit Resource Management System (DRMS) that went by the name “Dreams.” The tale told is a familiar one to Risk Factor readers: attempting to replace 22 computer systems used across 43 city departments with one city-wide system resulted in a massive cost overrun and little to show for it five years on. Crain’s Detroit Business also took a look back at the DRMS implementation nightmare in a July article.

Detroit hopes, the Detroit News reports, that the bankruptcy judge will approve its proposed $101 million IT “get well” plan, which includes $84.8 million for IT upgrades and $16.3 million for additional IT staff. (In February, according to a story in the Detroit Free Press, the city wanted to invest $150 million, but that amount apparently had to be scaled back because of budgetary constraints.) Spending $101 million, Niblock admitted, will not buy world-class IT systems, but ones that are, “on the grading scale… a ‘B’ or a B-minus” at best. And Niblock concedes that getting to a “B” grade will require a lot of things going perfectly right, which is not likely to happen.

On one final note, I’d be remiss not to mention that last week was also the 25th anniversary of the infamous Parisian IT Hiccup. For those who don’t remember, in September 1989, some 41,000 Parisians who were guilty of simple traffic offenses were mailed legal notices that accused them of committing everything from manslaughter to hiring prostitutes or both.  As a story in the Deseret News from the time noted:

“A man who had made an illegal U-turn on the Champs-Élysées was ordered to pay a $230 fine for using family ties to procure prostitutes and ‘manslaughter by a ship captain and leaving the scene of a crime.’”

Local French officials blamed the problem on “human error by computer operators.”

Plus ça change, plus c'est la même chose.

In Other News ….

Coding Error Exposes Minnesota Students' Personal Information

Computer Glitch Sounds Air Raid Sirens in Polish Town

Computer Problems Change Florida County Vote Totals

Billing Error Affects Patients at Tennessee Regional Hospital

Dallas Police Department Computer Problems Causing Public Safety Concerns

New York Thruway Near Albany Overbills 35,000 EZ‐Pass Customers

Olive Garden Shoots Self in Foot With Website Promotion

Apple Store Crashes Under iPhone6 Demand

Scandinavian Airlines says Website Now Fixed After Two Days of Trouble

Housing New Zealand Tenants Shocked by $10,000 a Week Rent Increases

GM's China JV Recalling 38,328 Cadillacs to Fix Brake Software

LAUSD MiSiS System Still Full of Glitches

Mon, 15 Sep 2014 14:00:00 GMT
FCC Fines Verizon $7.4 Million Over Six-Year Privacy Rights “IT Glitch”
2 million customers not told they could opt out of targeted marketing
Photo: Denis Doyle/Bloomberg/Getty Images

IT Hiccups of the Week

The number of IT snafus, problems and burps moved back to a more normal rate last week. There were a surprising number of coincidental outages that hit Apple, eBay, Tumblr and Facebook, but other than these, the most interesting IT Hiccup of the Week was the news that the U.S. Federal Communications Commission (FCC) fined Verizon Communications a record $7.4 million for failing to notify two million customers of their opt-out rights concerning the use of their personal information for certain company marketing campaigns.

According to the Washington Post, Verizon is supposed to inform new customers via a notice in their first bill that they could opt-out of having their personal information used by the company to craft targeted marketing campaigns of products and services to them. However, since 2006, Verizon failed to include the opt-out notices.

A Verizon spokesperson said the oversight was “largely due to an inadvertent IT glitch,” the Post reported. The spokesman, however, didn’t make clear why the company failed to notice the problem until September 2012, nor why it didn’t inform the FCC of the problem until 18 January 2013, some 121 days later than the agency requires. (Companies are required to inform the FCC of issues like this within five business days of their discovery.)

The FCC’s press release announcing the fine showed that the agency was clearly irritated by Verizon’s tardiness. Travis LeBlanc, the acting chief of the FCC Enforcement Bureau, said that, “In today’s increasingly connected world, it is critical that every phone company honor its duty to inform customers of their privacy choices and then to respect those choices. It is plainly unacceptable for any phone company to use its customers’ personal information for thousands of marketing campaigns without even giving them the choice to opt out.”

Of course, a better solution would be for the FCC to force companies to allow customers only to opt-in to the use of their personal information, but that discussion is for another day.

On top of the $7.4 million fine, which the FCC took pains to point out is the “largest such payment in FCC history for settling an investigation related solely to the privacy of telephone customers’ personal information,” Verizon will have to include opt-out notices in every bill, as well as put a system in place to monitor and test its billing system to ensure that they actually go out.

Verizon tried to downplay the privacy rights violation, of course, even implying that its customers benefited from the glitch by being able to receive “marketing materials from Verizon for other Verizon services that might be of interest to them.”

Readers of the Risk Factor may remember another inadvertent Verizon IT glitch, disclosed in 2010, in which Verizon admitted that it had over-billed customers $52.8 million in “mystery fees” over three years. During that time, Verizon customers who called the company to complain about the fees were basically told to shut up and pay them. The FCC smacked Verizon with a then-record $25 million fine for that little episode of customer non-service and IT ineptitude.

Last year, Verizon agreed to pay New York City $50 million for botching its involvement in the development of a new 911 emergency system. Alas, that wasn’t a record-setting settlement; SAIC owns that honor after paying the city $466 million to settle fraud charges related to its CityTime system development.

In Other News…

eBay Access Blocked by IT Problems

Facebook Experiences Third Outage in a Month

Tumblr Disrupted by Outage

Apple iTunes Outage Lasts 5 Hours

Twitter Sets Up Software Bug Bounty Program

Children Weight Entry Error Placed Australian Jet at Risk

Spanish ATC Computer Problem Scrambles Flights

Yorkshire Bank IT Problems Affects Payments

Computer Problem Hits Boston MBTA Corporate Pass Tickets

Unreliable Washington, DC Health Exchange Still Frustrates Users

South African Standard Bank Systems Go Offline

New Zealand Hospital Suffers Major Computer Crash

Computer Crash Forces Irish Hospital to Re-Check Hundreds of Blood Tests

Fiji Airways Says No to $0 Tickets Caused by Computer Glitch

Portugal’s New Court System Still Buggy

Hurricane Projected Landfall Only 2,500 Miles Off

Mon, 8 Sep 2014 13:00:00 GMT
Vulnerable "Smart" Devices Make an Internet of Insecure Things
The weakest security links in the Internet of Things are often the Things themselves
Photo: Robyn Beck/AFP/Getty Images

According to recent research [PDF], 70 percent of Americans plan to own, in the next five years, at least one smart appliance like an internet-connected refrigerator or thermostat. That's a skyrocketing adoption rate considering the number of smart appliance owners in the United States today is just four percent. 

Yet backdoors and other insecure channels have been found in many such devices, opening them to possible hacks, botnets, and other cyber mischief. Although the widely touted hack of smart refrigerators earlier this year has since been debunked, there’s still no shortage of vulnerabilities in the emerging, so-called Internet of Things.

Enter, then, the Horst Görtz Institute for IT Security at Ruhr-University Bochum in Germany, one of the world’s top research centers devoted to IT security, boasting 700 students in this growing field. A research group at HGI, led by Christof Paar—professor and chair for embedded security at the Institute—has been discovering and helping manufacturers patch security holes in Internet-of-Things devices like appliances, cars, and the wireless routers they connect with.

Paar, who is also adjunct professor of electrical and computer engineering at the University of Massachusetts at Amherst, says there are good engineering, technological, and even cultural reasons why security of the Internet of Things is a very hard problem.

For starters, it’s hard enough to get people to update their laptops and smartphones with the latest security patches. Imagine, then, a world where everything from your garage door opener to your coffeemaker, your eyeglasses, and even your running shoes has possible vulnerabilities. And the onus is entirely on you to download and install firmware updates—if there are any.

Furthermore, most Internet-connected “things” are net-savvier iterations of designs with long pre-Internet legacies—legacies in which digital security had never been a major concern. But, Paar says, security is not just another new feature to be added onto an Internet-connected device. Internet security requires that designers and engineers embrace a different culture altogether.

“There’s essentially no tolerance for error in security engineering,” Paar says. “If you write software, and the software is not quite optimum, you might be ten percent slower. You’re ten percent worse, but you still have pretty decent results. If you make one little mistake in security engineering, and the attacker gets in, the whole system collapses immediately. That’s kind of unique to security and crypto-security in general.”

Paar’s research team, which published some of its latest findings in Internet-of-Things security this summer, spends a lot of time on physical and electrical engineering-based attacks on IoT devices, also called side-channel attacks.

For instance, in 2013 Paar and six colleagues discovered an exploit in an Internet-connected digital lock made by Simons Voss. It involved a predictable, non-random number the lock’s algorithm used when challenging a user for the passcode. And the flaws in the security algorithm were discoverable, they found, via the wireless link between the lock and its remote control.

The way they handled the discovery was how they handle all security exploit discoveries at the Institute, Paar says. They first revealed the weakness to the manufacturers and offered to help patch the error before they publicized the exploit.

“They fixed the system, and the new generation of their tokens is better,” he says. “They had homegrown crypto, which failed. And they had side-channel [security], which failed. So we had two or three vulnerabilities which we could exploit. And we could repair all of them."
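To see why a predictable challenge is fatal, consider a toy sketch. This is an illustration of the general flaw, not the actual Simons Voss protocol: a standard-library HMAC with a hypothetical shared key stands in for the lock's homegrown crypto. If the lock derives its "random" challenge from a counter, an attacker with brief access to a legitimate token can precompute the answer to an upcoming challenge and present it later, without ever learning the key:

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # secret shared by the lock and legitimate token


def response(challenge: bytes) -> bytes:
    # The token's reply to a challenge; the attacker never learns KEY,
    # it can only observe or solicit responses.
    return hmac.new(KEY, challenge, hashlib.sha256).digest()


def predictable_challenge(counter: int) -> bytes:
    # Flawed design: the "random" challenge is derived from a counter.
    return counter.to_bytes(8, "big")


# The lock's next challenge is predictable, so an attacker with brief
# access to the legitimate token precomputes the answer in advance...
n = 41
upcoming = predictable_challenge(n + 1)
precomputed = response(upcoming)  # attacker queries the real token once

# ...and later presents it when the lock actually issues that challenge.
assert precomputed == response(upcoming)

# With an unpredictable challenge, precomputation is useless: the stored
# answer will not match a fresh, randomly drawn challenge.
fresh = secrets.token_bytes(8)
assert precomputed != response(fresh)
```

Drawing challenges from a cryptographically strong source, as in the last two lines, closes this particular hole; the broader lesson of the Simons Voss case is that homegrown crypto tends to hide exactly this kind of flaw.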

Among the findings in the scores of papers and research reports the Embedded Security group publishes, Paar says, one of the most often overlooked factors behind hacking is not technological vulnerability but economics.

“There’s a reason that a lot of this hacking happens in countries that are economically not that well off,” Paar says. “I think most people would way prefer having a good job in Silicon Valley or in a well-paying European company—rather than doing illegal stuff and trying to sell their services.”

But as long as there are hackers, whatever their circumstances and countries of origin, Paar says smart engineering and present-day technology can stop most of them in their tracks.

“Our premise is that it’s not that easy to do embedded security right, and that essentially has been confirmed,” he says. “There are very few systems we looked at that we couldn’t break. The shocking thing is the technology is there to get the security right. If you use state of the art technology, you can build systems that are very secure for practical applications.”

Wed, 3 Sep 2014 17:00:00 GMT
310,000 Enrollees Must Provide Proof Now or Lose Insurance
Government claims eligibility info missing, but insurance agents say online glitches are preventing information from being received by CMS
Photo: Getty Images

IT Hiccups of the Week

Last week, there were so many reported IT snags, snarls and snafus that I felt like the couple who finally won the 20-year jackpot on the Lion’s Share slot machine at the Las Vegas MGM Grand casino. Among the IT Hiccups of note: the routine-maintenance oofta at Time Warner Cable Wednesday morning that knocked out Internet and on-demand service for over 11 million of its customers across the US and continued to cause other service issues for several days afterward; the “coding error” missed for six years by Germany’s Deutsche Bank that caused 29.4 million equity swaps to be misreported to the UK government, with buys reported as sales and vice versa; and the rather hilarious software bugs in the new Madden NFL 15 American football game, which has players flying around the field in interesting ways.

However, for this week, we just can’t ignore yet another snafu of major proportions. Last week, USAToday reported that the Centers for Medicare and Medicaid Services sent letters to 310,000 people who enrolled for health insurance through the federal website asking for proof of citizenship or immigration status by 5 September or they were going to lose their health insurance at the end of September.

Unfortunately, an untold number of the enrollees sent letters had indeed previously provided the information to CMS, but the information either hasn’t been properly processed or has been lost by CMS. Even resending the requested information electronically doesn’t always work because of ongoing technical issues with the website, USAToday reports. Insurance agents, for example, complain that when they upload the documentation CMS requested, it seems to be accepted without an issue, but CMS later says it never received it.

Other enrollees are affected because of data integrity issues with Department of Homeland Security and Social Security Administration databases, for instance, first and last name transpositions. Compounding the problem, USAToday says, is that back in April all user passwords were reset because of the Heartbleed bug. Not unexpectedly, lots of people can’t remember the answers to their security questions, which means that they can’t reset their passwords, which in turn means no online access to their accounts and no way to upload their documentation.  

The problem of losing information is not confined to the federal site, either. Last week, the Las Vegas Review-Journal reported that some 10,000 of the 30,000 enrollees in Nevada’s health insurance exchange are having billing and other coverage issues with their insurance plans. It is not uncommon, the Review-Journal stated, for enrollees to have paid their premiums in full and on time, and to have bank statements showing the withdrawals to prove it, yet still be told they never paid their premiums. Inevitably, the premium “nonpayment” leads to a cancellation of their insurance policies, with the enrollees unfairly stuck paying their medical bills.

In addition, while the state knows of the problems, the Review-Journal says, it doesn't know how to deal with them. It seems that the health exchange developer, Xerox, has fouled up the enrollee premium billing system so badly that the issue may not be fully resolved until next year, when Xerox’s contract expires and the federal government takes over Nevada’s health insurance exchange. In May, Nevada fired Xerox over its inability to fix over 1,500 software issues with the insurance exchange.

Also last week, things heated up between Oracle and the state of Oregon over the “who’s at fault” argument concerning the $305 million Cover Oregon health insurance exchange debacle. In early August, Oracle sued (pdf) Oregon in federal district court for $23 million over unpaid bills and interest, alleging that Oregon breached its contract with the company.

Well, Oregon has fired back with a lawsuit of its own. Saying that, “Oracle sold the State of Oregon a lie,” Oregon Attorney General Ellen Rosenblum filed a fraud and racketeering lawsuit (pdf) against Oracle for a whopping $5.5 billion. The state also wants to keep Oracle from contracting with any of Oregon’s public corporations or government organizations in the future.

Predictably, each party asserts that the claims against it have no merit and that it will prevail in court. I doubt either party really wants to go to trial, since a trial would no doubt highly embarrass them both for their utter IT project management amateurism and risk mismanagement. I expect — but not before a pack of ravenous lawyers gets paid a whole bunch of money — an out-of-court settlement where both sides can claim victory. However, that won’t likely happen before each side tries to cause as much grief for the other as possible.

In one last note of interest, the U.S. Department of Health and Human Services Inspector General published a report (pdf) on the support contracts awarded to build the less-than-stellar federal health insurance exchange. The IG report says that as of February, 33 contractors operating under 60 different contracts had been used to implement the exchange. The contractors had been paid more than $500 million as of February, with more payments expected before the end of the year. Some 20 of the 60 contracts have exceeded their contracted amounts, with 7 of those contracts (so far) exceeding the amount by over 100 percent.

At least $700 million more is estimated to be needed for system maintenance and upgrades, but I suspect that number is highly underestimated given the system’s complexity, the unruly herd of uncoordinated contractors, and the overmatched government oversight involved.

In Other News …

Database Update Problem Affects 8,200 Walgreen Pharmacies

CME Group Says Trading Disruption Caused by Maintenance Gone Bad

Android App Claims National Weather Service website

New IT System Glitch at Ireland’s Bus Éireann Means 30,000 Students Have No Bus Tickets

Computer Problems Keep Alaska’s Credit Union 1 Customers from Accessing Accounts

UK’s East Midland Ambulance Service Loses 41,000 Patient Records

EDF Pays Millions in Fines over UK Customer Complaints Traced to Botched IT System

Florida Alachua County Promises to Fix Voting Glitches

Two Arizona Counties Struggle with Voting Troubles

Electronic Voting Loses Favor in Japan

TennCare Management Explains Ineptitude in Failed System Rollout

StatsCanada Says Being Asleep at Switch Led to Errors

Computer Fault Blamed for US Hypersonic Weapons Test Failure

6,000 LA Residents Still Don’t Have Correct Water Bills after Seven Months

Glitch Causes California State Workers to Receive 200,000 Hours of Unearned Leave Credits

LA Unified School District Students Walk Out of School over Ongoing Student Information Problems

San Diego Unified School District Admits Computer Problems Hurt Students’ College Admission Chances

Computer Glitch Makes Everyone on Modest UK Town Street Property Millionaires

UK’s Virgin Media Repairs Email Glitch

Bank of Ireland Promises to Pay Public Servants After New Processing Issues Surface

South Carolina DMV Experiences Computer Problems

40,000 Speed Camera Tickets Dismissed in Nassau County, New York over Camera Malfunctions

Despite Claims, Florida’s Unemployment System Still Not Fixed

Microsoft Reissues Security Patch After Incapacitating Numerous Windows PCs

Google Software Bug Causes Repeat-Image Issue to Appear in Search

Drivers Unhappy over Unreliable Car Technology

Software Bug Blamed for Two EU Satellites Ending up in Wrong Orbit

Tue, 2 Sep 2014 13:00:00 GMT
LA School District Continues Suffering MiSiS Misery
New legally mandated student tracking system plagued with problems
Photo: Robyn Beck/AFP/Getty Images

IT Hiccups of the Week

With schools starting to open for the 2014-2015 academic year across the United States, one can confidently predict that there will be several news stories of snarls, snafus, and hitches as new academic IT support systems go live for the first time. (You may recall that happening in Maryland, New York, and Illinois a few years ago.)

While most of these “teething problems” are resolved during the first week or so of school, significant IT issues affecting the performance of the new integrated student educational tracking system recently rolled out in the Los Angeles Unified School District—the second largest in the country, with 650,000 students—have already stretched beyond the first few weeks of the school term with no definitive end in sight. Furthermore, many of the software bugs being encountered were known to LAUSD administrators, who decided to roll out the system anyway.

The new system, called My Integrated Student Information Systems (MiSiS), was launched during the teachers' preparatory week before classes officially began  on 11 August. Unsurprisingly, it did not go well. In fact, a few days prior to launch, LAUSD’s chief information officer Ron Chandler was quoted in the LA Times as saying that he expected the MiSiS launch “to be bumpy.” Chandler also added for emphasis, “It’s going to be messy. The question is just how messy.”

While some modules of MiSiS were rolled out last year, a more complete version was piloted during summer school this year and was claimed by LAUSD administrators as performing acceptably. Even under these more benign conditions, summer school staff noted several operational defects that needed fixing. While many were apparently fixed by early this August, a large number still remained open as MiSiS went live. In addition, MiSiS apparently was not fully stress tested as a complete system and under expected load conditions before its launch.

Image: Los Angeles Unified School District

According to the Contra Costa Times, soon after MiSiS went live, staff across the school district began reporting issues ranging from painfully slow system performance to being unable to access student records at all. Other staff members reported finding that many of their students had been given the wrong class assignments or, worse, had not been assigned any classes at all.

Compounding the reported technical problems, the required user training on MiSiS for the more than 29,000 LAUSD teachers and school administrators was not completed before classes started, meaning many school staff were not fully trained on how to use MiSiS at its launch. So when problems were reported to the LAUSD IT department, it wasn’t clear how many were true technical problems versus user-unfamiliarity problems.

During the first week of school, local newspapers started reporting even more problems cropping up with MiSiS. Some middle school students, for instance, were placed in high school classes, while some teachers found 70 students assigned to a single class. At the end of the week, LAUSD administrators told teachers not to use MiSiS to take attendance until this week, so LAUSD IT specialists could have time to work on the system. LAUSD acknowledged at the same time that there were 130 known issues with MiSiS that needed to be worked through.

The second full week of school was less chaotic, but school staff and teachers were still reporting problems, even though LAUSD administrators insisted that MiSiS was working acceptably. School staff, on the other hand, reported they were still finding students with incorrect or missing class assignments, which created classroom disruptions. Special education teachers reported that MiSiS issues were proving especially troublesome. MiSiS performance wasn’t markedly faster, either.

However, LAUSD administrators tried to downplay the problems being reported, calling them merely a “blip” that everyone would soon forget. Administrators insisted that only about 6,500 students were affected by MiSiS-related issues, which they said wasn’t bad considering that MiSiS, according to CIO Chandler, was “easily one of the most complex technology programs going on in the planet right now.”

A bit of hyperbole that, I think.

LAUSD teachers unions strongly challenged the administrators’ figure of 6,500, calling it a gross under-reporting of the true number of students affected. They demanded that MiSiS immediately be scrapped and that LAUSD revert to the old student record management system, which the unions claimed worked much better.

That is not likely to happen unless MiSiS completely falls over dead. The reason is that MiSiS is the end product of a very long and convoluted series of court cases going back to 1993. In that year, the mother of a student named Chanda Smith sued the LAUSD for allowing her daughter to reach—and twice flunk—the 10th grade when it was known that her daughter had a documented learning disability for which she received special tutoring in middle school. Although Smith’s mother tried repeatedly to get Chanda into special education classes in high school, the LA school administrators refused to do so. The lawsuit soon turned into a class action suit as other parents of disabled students also complained about their children not receiving the educational help they required or had received previously.

To make a long story short, in 1996, LA’s school board acknowledged that the school system had violated state and federal laws on the treatment of disabled students, and entered into a court-approved consent decree [pdf] detailing how it would improve the education of the school district’s disabled students. There was a list of about a dozen and a half improvements LAUSD agreed it would make over the next five years, including the implementation of an automated student tracking system so that abled and disabled children’s educational progress could be assessed and tracked from kindergarten to the end of high school.

However, LAUSD was slow to implement the changes it had promised, saying they were too expensive. In 2001, it was sued again, this time for non-compliance with the 1996 consent decree. In 2003, after much legal wrangling, a modified consent decree (pdf) was signed, under which LAUSD was to make good on a new set of agreed improvements. The LAUSD promised these would be implemented by the end of 2006, including (once again) the implementation of a comprehensive student tracking system. The courts appointed an independent monitor with significant power to assess and say whether the LAUSD was indeed meeting the terms of the modified consent decree.

Progress on meeting the improvement objectives was steady, but still extremely slow. One of the bottlenecks was the implementation of that comprehensive student tracking system. From 2003 to 2009, LAUSD worked to implement an integrated student information system (ISIS), purchasing a commercial product called SchoolMax as a way to speed the process along. The LA Times reports that LAUSD spent $112 million on this effort.

However, LAUSD found, in its words, “many challenges with software development and SchoolMAX’s performance.” So, in 2012 LAUSD approached the independent monitor with a plan to internally redevelop the student tracking system used at the Fresno Unified School District, which he approved. The LAUSD claims that the new system—now called MiSiS, and which the LA Times said cost $20 million to develop—would offer “greater flexibility, user-friendliness, and cost effectiveness.”

The hope is—assuming that MiSiS can be fixed—that LAUSD might finally get out from under the consent decrees stretching back to 1996.

This helps explain why the LAUSD administrators made the decision to roll out MiSiS a few weeks ago, warts and all, and why it will never revert to a student tracking system that isn’t compliant with the 2003 modified consent decree. It will be interesting to see whether the independent monitor believes MiSiS actually does now meet the terms of the decree’s mandate. If the system is still buggy by the end of this year, and if special education teachers are still complaining about it, I suspect he will be none too sympathetic to lifting the modified consent decree.

One person who is definitely not pleased at the moment is LAUSD school board member Tamar Galatzan, who is calling for a full audit of MiSiS and the decisions behind its “bumpy” launch. Galatzan claims the school board was not informed of the potential MiSiS problems until the news hit the media.

MiSiS isn’t the only IT issue involving the LAUSD, either. Last week, the LA Times reported that an internal LAUSD analysis of the school district’s $1 billion initiative to provide an Apple iPad to every student “was beset by inadequate planning, a lack of transparency and a flawed bidding process.”

That sounds surprisingly similar to what occurred a few years ago in regard to the LAUSD payroll system fiasco. A school district that doesn’t learn from past mistakes—more than a bit ironic, wouldn’t you say?

In Other News…

Computer Problem Affects Multiple  Irish Hospitals

17,000 New Jersey Red Light Tickets Voided After “Technical Glitch”

Australian Commonwealth Bank Beset by Online Issues

Microsoft Suffers Multiple Azure Cloud Outages

Microsoft’s Update Comes with Blue Screen of Death

Iowa’s Workforce Development Office Tries to Hide News of Computer Glitch

Finance Company Fined Over Trying to Exploit Computer Flaw

FBI Spied on Wrong Suspects on Account of Typos

Passport Control System Traps Miami Arriving International Passengers

Manila’s Jollibee Restaurants Reopen after Supply Chain System Problems Fixed

Mon, 25 Aug 2014 13:00:00 GMT
The Routing Wall of Shame
Internet traffic hit bumps as IPv4 “limit” on older routers’ addressing capability reached
Photo-illustration: Randi Klett; Images: Getty Images

IT Hiccups of the Week

While I have been en vacances the past few weeks, there have been several potential IT Hiccups of the Week stories of interest, including the 200-to-500 year old Indian women getting free sewing machines and Philippine fast food giant Jollibee Foods having to temporarily close 72 of its restaurants in the Manila region because of problems the company experienced migrating to a new IT system—much to the disappointment of its Chickenjoy fans. However, the one hiccup that stands above the rest was the Internet difficulties reportedly experienced last week by the likes of eBay, Amazon, and LinkedIn, among many others.

One of the first inklings that something was amiss was the UK’s Inquirer story Tuesday reporting that eBay was experiencing spotty service and in some cases complete outages in parts of the UK and Europe for several hours. eBay said without elaboration that the performance issue had to do with “third party internet service provider access issues” and that it was “sorry for the inconvenience" being caused.

UK eBay sellers were furious at the reportedly tenth disruption of the year and were demanding compensation, the London Telegraph reported on Wednesday. However, the Telegraph also reported in an accompanying story that Internet difficulties not only affected the auction company, but hit the newspaper’s online operations as well. A story at SiliconANGLE reported that Amazon and LinkedIn were also affected by service disruptions, while ZDNet indicated that U.S. Internet service providers Level 3, AT&T, Cogent, Comcast, Sprint, Time-Warner and Verizon all experienced sporadic performance problems across the United States and parts of Canada. The Register also reported that Canadian ISP Shaw Communications had suffered from fairly severe network disruptions as well.

The reported culprit behind last week’s disruptions appears to be a well-known network technical risk that turned into a not-unexpected annoying problem: the global Internet routing table apparently exceeded 512,000 routes. As a result, many older routers that cannot support more than that number of routes because of memory and other limitations are at risk of sporadically causing some level of local Internet service instability until they are upgraded or replaced to handle the ever increasing number of Internet routes [pdf]. Speculation was that Tuesday’s disruptions were caused in part—or at least exacerbated—by the network activity of Verizon, which pushed the routing table to exceed the 512K threshold for a short time. The 512K mark is expected to be crossed permanently any time now, however.
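The failure mode at work here can be sketched in a few lines: a router with a fixed-capacity hardware forwarding table must handle any overflow routes on a much slower software path, and that slow path is where the instability comes from. The following is a toy model only—the capacity constant and function name are mine, not any vendor's actual implementation:

```python
# Illustrative model of a fixed-capacity hardware forwarding table (TCAM).
# Routes that fit are handled on the fast hardware path; overflow routes
# fall back to slow software processing. Toy sketch, not real router logic.

TCAM_CAPACITY = 512_000  # a common default allocation on older routers

def install_routes(num_routes, capacity=TCAM_CAPACITY):
    """Return how many routes land in hardware vs. the slow software path."""
    in_hardware = min(num_routes, capacity)
    in_software = max(0, num_routes - capacity)
    return in_hardware, in_software

hw, sw = install_routes(511_000)
print(hw, sw)  # 511000 0 -- everything still fits in hardware

hw, sw = install_routes(512_500)
print(hw, sw)  # 512000 500 -- 500 routes overflow to the slow path
```

The point of the sketch is that nothing fails outright at the threshold; instead, a small and shifting subset of destinations suddenly gets much slower, which matches the sporadic, hard-to-diagnose disruptions reported last week.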

Router supplier Cisco warned about the need to upgrade routers on its blog back in May, when the global routing table passed 500,000 routes. It also laid out what its customers could do to upgrade their Cisco kit or perform workarounds. Last week, with the 512K milestone seemingly reached, Cisco posted another blog saying that it was really time to take action to “avoid any performance degradation, routing instability, or impact to availability.”

In highly simplified terms, the root issue is not the IPv4 protocol itself but the hardware that routes it: many older routers allocate a fixed amount of fast memory for the IPv4 routing table, with 512K routes being a common default, and an unknown number of them are unable (even with workarounds) to hold more routes than that. The newer IPv6 protocol provides a vastly larger address space of 340 undecillion unique addresses, but using it requires investing in new equipment. If you want to know how big 340 undecillion is, and what some of the concerns are in moving from IPv4 to IPv6, there is a great interview from 2011 done by former IEEE Spectrum editor Steven Cherry with IPv6 evangelist Owen DeLong of Hurricane Electric, a company which claims to be “the largest IPv6 backbone in the world as measured by number of networks connected.” I also suggest reading a Spectrum story from 2006 on some of the reasons why migration to IPv6 has been a slow process, as well as this piece at the Register from last week.
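For a sense of scale, 340 undecillion is simply 2^128—the size of the IPv6 address space—written out in short-scale number names (one undecillion is 10^36). A quick check:

```python
# IPv6 addresses are 128 bits wide, versus 32 bits for IPv4.
# "340 undecillion" is 2**128 in short-scale names (1 undecillion = 10**36).
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

print(ipv4_addresses)              # 4294967296 (about 4.3 billion)
print(ipv6_addresses)              # 340282366920938463463374607431768211456
print(ipv6_addresses // 10 ** 36)  # 340 -> "340 undecillion"
```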

Whether passing the 512K routing milestone will cause many more Internet disruptions remains to be seen, although the betting seems to be that it won’t. For instance, a story at the Guardian quotes James Blessing, chair of the UK Internet Service Providers Association, as saying, “In the grand scheme of things, it’s tiny. It’s a glitch, glitches happen…  If someone at an ISP hasn’t noticed it by now, it’s too late as the default table is over 512,000, so nothing that had this problem is now connected to the internet and working… We’ve had the glitch and nothing further will happen now concerning the 512,000 bug.”

Others, however, are not nearly as sanguine, and warn that further service disruptions shouldn’t be discounted until routers are upgraded or replaced, as happened when the 128K and 256K table limits were reached.

It won’t take long to find out whether last week's Internet hiccup was a one-time event or the beginning of a few weeks of Internet service burps.

In Other News…

U.S. Slowly  Making Progress on Passport Visa System Issue

Texas Toll Road IT System Still a Shambles

San Francisco Bay Area Bridges Hit by Plate-reader Glitches

Oracle Sues Oregon over Health Exchange Fiasco

Billing Problems Vex Vermont Health Exchange

Hawaii Health Exchange Still in Trouble

Washington State Slowly Fixing its Health Exchange Software Errors

CMS Finally Fixes Pharma Website Problems; IT Costs Approach $1 Billion

UK Power Company Sees 62,000 Customers Leave From Billing Debacle

UK Driver Vehicle Agency Move from North Ireland to Wales Creates Car Tax Frustrations

Law Students Sue over Bar Exam Computer Problems

40,000 Pennsylvania Taxpayers Still Waiting for Refund Checks

Colorado State Computer Issues Hamper Unclaimed Property Payments

Australian Hospital Issues 200 Erroneous Death Notices

LA School Officials’ Prediction of New IT System Glitches Proves Highly Accurate

Mon, 18 Aug 2014 13:00:00 GMT
Black Hat 2014: How to Hack the Cloud to Mine Crypto Currency
Cyber security researchers devise a hack to demonstrate the need for improved anti-botnet security measures
Illustration: iStockphoto

Using a combination of faked e-mail addresses and free introductory trial offers for cloud computing, a pair of security researchers have devised a shady crypto currency mining scheme that they say could theoretically net hundreds of dollars a day in free money using only guile and some clever scripting.

The duo, who are presenting their findings at this week’s Black Hat 2014 cyber security conference in Las Vegas, shut down their proof-of-concept scheme before it could yield any more than a token amount of Litecoins (an alternative to Bitcoin). The monetary value of both virtual currencies is based on enforced scarcity that comes from the difficulty of running processor-intensive algorithms.

Rob Ragan, senior security associate at the consulting firm Bishop Fox in Phoenix, Ariz., says the idea for the hack came to him and his collaborator Oscar Salazar when they were hired to test the security around an online sweepstakes.

“We figured if we could get 100,000 e-mails entered into the sweepstakes, we could have a really good chance of winning,” he says. “So we generated a script that would allow us to generate unique e-mail addresses and then automatically click the confirmation link.”

Once Ragan and Salazar had finished securing the sweepstakes against automated attacks, they were still left with all those e-mail addresses.

“We realized that … for about two-thirds of cloud service providers, their free trials only required a user to confirm an e-mail address,” he says. So the duo discovered they effectively had the keys to many thousands of separate free trial offers of cloud service providers’ networked storage and computing.

In other words, they had access to many introductory accounts at sites like Google’s Cloud Platform, Joyent, CloudBees, iKnode, CloudFoundry, CloudControl, ElasticBox and Microsoft Windows Azure.

Some of these sites, each offering its own enticement of free storage and free computing as a limited introductory offer, could be spoofed, the researchers discovered. Using a hard-to-detect automated process they developed, troves of unique e-mail addresses could be readily made on the fly and then used to get free storage and processor time.

A spoof e-mail address, of course, has two components, Ragan says: the local part (the stuff to the left of the “@“ sign) and the domain (to the right). To appear like a random stream of e-mail addresses signing up for any given service, Ragan says they scraped real local addresses from legit e-mail address dumps on sites like Pirate Bay. The domain side they set up using “FreeDNS” servers that attach e-mail addresses to existing domains, a service that can be exploited for domains that have poor security measures in place.

So, say there’s an address dump file on the Internet containing the legit e-mail addresses “CatLover290 at gmail” and “CarGuy909 at Yahoo.” Ragan and Salazar’s algorithm would attach “CatLover290” and “CarGuy909” to one of thousands of spoof URLs they’d set up through the FreeDNS sites. The original e-mail accounts would then be unaffected. But the resulting portmanteau e-mail addresses would appear to be coming from a random stream of humans on the Internet.
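The recombination step described above is simple to sketch: harvested local parts are crossed with attacker-controlled domains so that every signup presents a distinct, plausible-looking address. The local parts and domains below are invented examples, and the function is mine, not the researchers' actual tooling:

```python
import itertools

# Sketch of the address-pairing technique: local parts harvested from a
# public dump are recombined with attacker-controlled domains, so each
# signup appears to come from a distinct human. All names are made up.

scraped_local_parts = ["CatLover290", "CarGuy909", "SkiFan41"]
spoof_domains = ["example-mail.net", "mail-example.org"]  # hypothetical FreeDNS-style domains

def portmanteau_addresses(local_parts, domains):
    """Yield every local-part/domain combination as an e-mail address."""
    for local, domain in itertools.product(local_parts, domains):
        yield f"{local}@{domain}"

addresses = list(portmanteau_addresses(scraped_local_parts, spoof_domains))
print(len(addresses))  # 6 -- three local parts times two domains
print(addresses[0])    # CatLover290@example-mail.net
```

Even this toy version shows the economics: each new spoof domain multiplies the pool of working addresses without touching the original accounts.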

Thus, Ragan says, not even a human observer watching the e-mails registering for free cloud computing accounts—none appearing to be produced by a simple algorithm or automated process—would detect anything overtly suspicious. And to further throw off suspicion, they used Internet anonymizing software like Tor and virtual private networks to disguise where the trial account requests were coming from. (Ragan says that generating real-seeming names using name-randomizing algorithms would probably be good enough.)

“A lot of the e-mail confirmation and authentication features rely on the old concept that one person has one e-mail address—and that is simply not the case anymore,” Ragan says. “We’ve developed a platform that would allow anyone to have 30,000 e-mail addresses.”

So they signed up for hundreds of free cloud service trial accounts and, in the process, strung together a free, ersatz virtual supercomputer.

“We demonstrated that we could generate a high amount of crypto hashes for a high return on Litecoin mining, using these servers that didn’t belong to us,” Ragan says. “We didn’t have an electricity bill, and we were basically able to generate money for free out of thin air.”

Ragan says at their scheme’s peak, they had 1000 accounts that were each generating 25 cents per day: $250 of free Litecoin. He says they shut the system down before it generated any real monetary value or made any noticeable performance dent in the cloud service systems.

And Ragan stressed that the devious schemes he and Salazar developed are being disclosed in order to raise awareness of problems in security measures that real criminal elements around the world can and probably already are taking advantage of.

“Not planning for and anticipating automated attacks is one of the biggest downfalls a lot of online services are currently experiencing,” Ragan says.

One measure Ragan says he and Salazar wanted to see that would combat their scheme’s spoofing of cloud service providers was the introduction of random anti-automation controls. Captchas, credit card verification, and phone verification can all be spoofed, he says, if they’re at predictable places in the cloud service signup and setup process.

“Some services don’t want to add a Captcha, because it annoys users,” Ragan says. “But…there are compromises that can be [employed], like once an abnormal behavior is detected from a user account, they then prompt for a Captcha. Rather than prompting every user for a Captcha every time, they can find that balance. There’s always a balance to be made between security and usability.”
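The compromise Ragan describes—challenging users only when behavior looks abnormal, rather than on every request—can be sketched as a simple risk check. The signals and threshold below are invented for illustration; a real service would use many more:

```python
# Sketch of a risk-based anti-automation control: instead of showing every
# user a Captcha, challenge an account only after its behavior crosses an
# abnormality threshold. The signals and limit here are illustrative only.

SIGNUPS_PER_HOUR_LIMIT = 5

def needs_captcha(signups_last_hour, from_known_anonymizer):
    """Challenge only when behavior looks automated, not on every request."""
    if from_known_anonymizer:
        return True
    return signups_last_hour > SIGNUPS_PER_HOUR_LIMIT

print(needs_captcha(1, False))   # False -- a normal user is never interrupted
print(needs_captcha(40, False))  # True -- burst signups trigger a challenge
```

The design point is exactly the balance Ragan names: ordinary users see no friction, while the cost of automation rises sharply once a script's volume gives it away.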

Ragan says that’s what he and Salazar want the takeaway from their talk to be: that a lot more consideration is given to how to better implement anti-automation controls and features.

Fri, 8 Aug 2014 19:00:00 GMT
Black Hat 2014: A New Smartcard Hack
Researchers hack chip-based credit and debit cards. Banks hack terms-of-service changes so consumers would be stuck with the bill
Photo: Getty Images

According to new research, chip-based “Smartcard” credit and debit cards—the next-generation replacement for magnetic stripe cards—are vulnerable to unanticipated hacks and financial fraud. Stricter security measures are needed, the researchers say, as well as increased awareness of changing terms-of-service that could make consumers bear more of the financial brunt for their hacked cards. 

The work is being presented at this week’s Black Hat 2014 digital security conference in Las Vegas. Ross Anderson, professor of security engineering at Cambridge University, and co-authors have been studying the so-called Europay-Mastercard-Visa (EMV) security protocols behind emerging Smartcard systems.

Though the chip-based EMV technology is only now being rolled out in North America, India, and elsewhere, it has been in use since 2003 in the UK and in more recent years across continental Europe as well. The history of EMV hacks and financial fraud in Europe, Anderson says, paints not nearly as rosy a picture of the technology as its promoters may claim.

“The idea behind EMV is simple enough: The card is authenticated by a chip that is much more difficult to forge than the magnetic strip,” Anderson and co-author Steven Murdoch wrote in June in the Communications of the ACM [PDF]. “The card-holder may be identified by a signature as before, or by a PIN… The U.S. scheme is a mixture, with some banks issuing chip-and-PIN cards and others going down the signature route. We may therefore be about to see a large natural experiment as to whether it is better to authenticate transactions with a signature or a PIN. The key question will be, ‘Better for whom?’”

Neither is ideal, Anderson says. But signature-based authentication does put a shared burden of security on both bank and consumer and thus may be a fairer standard for consumers to urge their banks to adopt.

“Any forged signature will likely be shown to be a forgery by later expert examination,” Anderson wrote in his ACM article. “In contrast, if the correct PIN was entered the fraud victim is left in the impossible position of having to prove that he did not negligently disclose it.”

And PIN authentication schemes, Anderson says, have a number of already discovered vulnerabilities, a few of which can be scaled up by professional crooks into substantial digital heists.

In May, Anderson and four colleagues presented a paper at the IEEE Symposium on Security and Privacy on what they called a “chip and skim” (PIN-based) attack. This attack takes advantage of some ATMs and credit card payment stations at stores that unfortunately take shortcuts in customer security: The EMV protocol requires ATMs and point-of-sale terminals to broadcast a random number back to the card as an ID for the coming transaction. The problem is many terminals and ATMs in countries where Smartcards are already used issue lazy “random” numbers generated by things like counters, timestamps, and simple homespun algorithms that are easily hacked.
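The weakness the researchers exploited is easy to illustrate: a counter- or timestamp-derived "unpredictable number" can be enumerated by anyone who knows roughly when the transaction happened, whereas the protocol intends a genuinely random 32-bit value. This is an illustrative contrast, not the researchers' actual code:

```python
import secrets
import time

# Contrast between the lazy "unpredictable numbers" the researchers found
# in deployed terminals (derived from counters or timestamps, so guessable)
# and a cryptographically random one. The EMV unpredictable number is a
# 32-bit value; everything else here is illustrative.

def weak_unpredictable_number(counter):
    # e.g. a per-terminal transaction counter -- trivially guessable
    return counter & 0xFFFFFFFF

def timestamp_unpredictable_number(ts=None):
    # timestamp-derived -- guessable to within a small search window
    return int(ts if ts is not None else time.time()) & 0xFFFFFFFF

def strong_unpredictable_number():
    # what the protocol intends: 32 bits from a cryptographic RNG
    return secrets.randbits(32)

# An attacker who knows the counter was near 1000 needs only a few guesses:
candidates = [weak_unpredictable_number(c) for c in range(995, 1005)]
print(len(candidates))  # 10 guesses cover the whole search space
```

Against a proper random source, the same attacker would face roughly four billion possibilities per transaction instead of a handful.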

As a result, a customer can—just in buying something at one of these less-than-diligent stores or using one of these corner-cutting ATMs—fall prey to an attack that nearby criminals could set up. The attack would allow them to “clone” a customer’s Smartcard and then buy things on the sly with the compromised card. Worse still, some banks’ terms and conditions rate card cloning—which EMV theoretically has eliminated—as the customer’s own fault. So this sort of theft might leave an innocent victim with no recourse and no way of refunding their loss.

“At present, if you dispute a charge, the bank reverses it back to the merchant,” Anderson says. “Merchants are too dispersed to go after customers much. But EMV shifts the liability to the bank, and the banks in anticipation are rewriting their terms and conditions so they can blame the customer if they feel you might have been negligent. I suggest you check out your own bank's terms and conditions.”

Thu, 7 Aug 2014 19:00:00 GMT
U.S. State Department Global Passport, Visa Issuing Operations Disrupted
Computer problems could create backlog of hundreds of thousands of applicants
Photo: iStockphoto

IT Hiccups of the Week

Last week saw an overflowing cornucopia of IT problems, challenges, and failures being reported. From these rich pickings, we decided to focus this week’s edition of IT Hiccups first on a multi-day computer problem affecting the U.S. Department of State’s passport and visa operations, followed by a quick rundown of the numerous US and UK government IT project failures that were also disclosed last week.

According to the Associated Press, beginning on Saturday, 19 July, the U.S. Department of State has been experiencing unspecified computer problems, including “significant performance issues, including outages” with its Consular Consolidated Database [pdf], which has interfered with the “processing of passports, visas, and reports of Americans born abroad.” A story at ComputerWorld indicates that the problems began after maintenance was performed on the database. State Department spokeswoman Marie Harf told the AP that the effects of the computer problem were being felt across the globe.

The AP story says that a huge passport and visa application backlog is already forming, with one unidentified country reporting that its backlog of applications had reached 50,000 as of Wednesday. The growing backlog has also “hampered efforts to get the system fully back on line,” Harf told the AP.

The rapidly expanding backlog is easy to understand, as the Oracle-based database, which was completed in 2010, “is the backbone of all consular applications and services and supports domestic and overseas passport and visa activities,” according to a State Department document [pdf]. In 2013, for example, the database was used in the issuing of some 13 million passports and 9 million visitor visas.

Department spokeswoman Harf was quoted by the AP as saying, “We apologize to applicants and recognize this may cause hardship to applicants waiting on visas and passports. We are working to correct the issue as quickly as possible.” However, she did not give any indications when the problems would be fixed or the backlog would be erased. Stories of families stuck overseas and not able to return to the US are rapidly growing.

Earlier this summer, the UK saw a similar passport backlog develop over the mismanagement of the closures of passport offices at British Embassies during the past year. The backlog, which blossomed into a political embarrassment to Prime Minister Cameron’s Government, is still not fully under control. It remains to be seen whether the U.S. passport and visa problems will do the same for the Obama Administration—if it lasts for a couple of weeks, it very well could.

More likely to cause embarrassment to the Obama and the Cameron administrations are the numerous government IT failures reported last week. For example, the AP reported that the U.S. Army had to withdraw  its controversial Distributed Common Ground System (DCGS-A) from an important testing exercise later this year because of “software glitches.” DCGS-A, the Army website says, “is the Army’s primary system for posting of data, processing of information, and disseminating Intelligence, Surveillance and Reconnaissance information about the threat, weather, and terrain to all components and echelons.”

The nearly $5 billion spent on DCGS-A so far has not impressed many of its Army operational users in Afghanistan, who have complained that the system is complex to use and unreliable, among other things. They also point out that there is a less costly and more effective system available called Palantir, but the Army leadership is not interested in using it after spending so much money and effort on DCGS-A.

The AP also reported last week that a six-year, $288 million U.S. Social Security Administration Disability Case Processing System (DCPS) project had virtually collapsed, and that the SSA was trying to figure out how to salvage it. DCPS, which was supposed to replace 54 legacy computer systems, was intended to allow SSA workers across the country “to process claims and track them as benefits are awarded or denied and claims are appealed,” the AP said.

The AP story says that the SSA may have tried to keep quiet a June report [pdf] by McKinsey and Co. into the program’s problems so as not to embarrass Acting Social Security Commissioner Carolyn Colvin, whom President Obama recently nominated to head the SSA. The McKinsey report indicates that one reason for the mess is that no one could be found to be in charge of the project. The report also states that “for past 5 years, Release 1.0 [has been] consistently projected to be 24-32 months away.” Colvin was deputy commissioner for 3½ years before becoming acting commissioner in February 2013, the AP says, so the DCPS debacle is squarely on her watch.

Then there was a story in the Fiscal Times concerning a Department of Homeland Security (DHS) Inspector General report [pdf] indicating that the Electronic Immigration System (ELIS), which was intended to “provide a more efficient and higher quality adjudication [immigration] process,” was doing the opposite. The IG wrote that, “instead of improved efficiency, time studies conducted by service centers show that adjudicating on paper is at least two times faster than adjudicating in ELIS.”

Why, you may ask? The IG states that, “Immigration services officers take longer to adjudicate in ELIS in part because of the estimated 100 to 150 clicks required to move among sublevels and open documents to complete the process. Staff also reported that ELIS does not provide system features such as tabs and highlighting, and that the search function is restricted and does not produce usable results.”

Hey, what did those immigration service officers expect for the $1.7 billion spent so far on ELIS, something that actually worked?  DHS is now supposed to deploy an upgraded version of ELIS later this year, the IG says, but he is also warning that major improvements in efficiency should not be expected.

As I mentioned, reports of project failure were the story of the week in the UK as well. Computing published an article concerning the UK National Audit Office’s report into the 10-year-and-counting Aspire outsourcing contract for the ongoing modernization and operation of some 650 HM Revenue & Customs tax systems. While the NAO has said that the work performed by the consortium led by Capgemini has resulted in a “high level of satisfactory implementations,” the cost of doing so has been staggering.

HMRC let the Aspire contract in 2004, after ending a ten-year outsourcing contract with EDS (now HP) when the relationship soured. HMRC said at the time that the ten-year cost of the Aspire contract would be between £3.6 billion and £4.9 billion; however, the NAO says the cost has topped £7.9 billion through the end of March this year, and may reach £10.4 billion by June 2017, when the contract, which was extended in 2007, expires. Public Accounts Committee (PAC) chair Margaret Hodge MP says the cost overrun is an example of HMRC’s management of the Aspire contract being “unacceptably poor.”
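A quick back-of-the-envelope check of those figures (my arithmetic, using the numbers quoted above, not the NAO's own calculation) shows how far the contract has drifted even from HMRC's high-end estimate:

```python
# Aspire contract figures as reported, in billions of pounds
original_low, original_high = 3.6, 4.9   # HMRC's 2004 ten-year estimate
cost_to_date = 7.9                       # NAO figure through March 2014
projected_total = 10.4                   # NAO projection through June 2017

# Compare against the *high* end of HMRC's original range
print(round(cost_to_date / original_high, 1))      # ~1.6x the high estimate so far
print(round(projected_total / original_high, 1))   # ~2.1x: roughly a doubling
```

Even measured against the most generous original estimate, the projected final cost is more than double.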

On top of being unhappy about the doubling in contract costs, and the high level of profits the suppliers made on it, the NAO also warned HMRC that it needs to get serious about a replacement contract when the Aspire contract ends. Hodge says that while HMRC has started planning Aspire’s replacement, “its new project is still half-baked, with no business case and no idea of the skills or resources needed to make it work.”

Apparently the NAO found another half-baked UK government IT project as well. According to the London Telegraph, the NAO published a report [pdf] describing how the UK Home Office has managed to waste nearly £347 million since 2010 on its “flagship IT programme,” the Immigration Case Work system, which is intended to deal “with immigration and asylum applications.” The NAO says that the Home Office has now abandoned the effort, thereby “forcing staff to revert to using an old system that regularly freezes.”

In addition, the NAO says that the Home Office is planning to spend at least another £209 million by 2017 on what it hopes will be a working immigration case work system. Until that new system comes online, however, the Home Office will need to spend an undetermined amount of money trying to keep the increasingly unreliable legacy immigration system from falling over dead. The legacy system support contract ends in 2016, the NAO states, so the Home Office doesn’t have a lot of wiggle room to get its replacement immigration system operational.

Finally, the London Telegraph reported that the UK National Health Service may have reached a deal to pay Fujitsu £700 million as compensation for the NHS unilaterally changing the terms of its National Programme for IT (NPfIT) electronic health record contract with the Japanese company. The changes sought by the NHS led Fujitsu to walk off the program (as did Accenture) in 2008. The NPfIT project, a brainchild of then Prime Minister Tony Blair in 2002, was cancelled in 2011 after burning through some £7.5 billion.

In Other News…

Vancouver’s SkyTrain Suffers Failures over Multiple Days

North Carolina’s Fayetteville Public Works Commission Experiences New System Billing Problems

UK Nationwide Bank Customers Locked Out of Accounts

Nebraska Throws Out Writing Test Scores in Wake of Computer Testing Problems

GAO Finds It Easy to Fraudulently Sign up for Obamacare

Washington State Obamacare Exchange Glitches Hit 6,000 Applicants

Pennsylvania State Payroll Computer Glitch Fixed

UK Couple Receives £500 Million Electricity Bill

Mon, 28 Jul 2014 14:30:00 GMT
Senate Condemns US Air Force ECSS Program Management’s Incompetence Underwhelming executive leadership studiously ignored its own risk mitigation plans
Photo: Paul J. Richards/Getty Images

IT Hiccups of the Week

With no compelling IT system snafus, snags, or snarls last week to report on, we thought we’d return to an oldie but goodie project failure of the first order: the disastrous U.S. Air Force Expeditionary Combat Support System (ECSS) program.

The reason for our revisit is the public release a short time ago of the U.S. Senate staff report [pdf] into the fiasco. Last December, Senators Carl Levin and John McCain, respectively the chairman and ranking member of the Senate Armed Services Committee, requested the report. The request was made in the wake of the Air Force’s publication of the executive summary [pdf] of its own investigative report, with which the Senators apparently were not altogether happy. You may recall that Levin and McCain christened the billion-dollar program failure—which the Air Force admitted failed to produce any significant military capability after almost eight years in development—as being “one of the most egregious examples of mismanagement in recent memory.” Given the number of massive DoD IT failures to choose from, that is saying something.

Not surprisingly, the Senate staff report identified basically the same contributing factors for the debacle as the internal Air Force report, albeit with different emphasis. Whereas the Air Force report listed four contributing factors for the ECSS program’s demise (poor program governance; inappropriate program management tactics, techniques, and procedures; difficulties in creating organizational change; and excessive personnel and organizational churn), the Senate staff report condensed them into three contributing factors:

  • Cultural resistance to change within the Air Force;
  • Lack of leadership to implement needed changes; and
  • Inadequate mitigation of identified risks at the outset of the procurement.

The Senate report focused much of its attention on the last bullet concerning ECSS program risk mismanagement. In large part, the report blamed the calamity on the Air Force’s failure to adhere to business process reengineering guidelines “mandated by several legislative and internal DOD directives and [that] are designed to ensure a successful and seamless transition from old methods to new, more efficient ways of doing business.” From reading the report, one gets the image of an exasperated parent scolding a recalcitrant child: Congress seemed as miffed at the Air Force for ignoring its many IT-related best practices directives as for the failure itself.

Clearly adding to the sense of frustration is that the Air Force “identified cultural resistance to change and lack of leadership as potential [ECSS] problems in 2004” when the service carried out a mandated risk assessment as the program was being initially planned. Nevertheless, the risk mitigation approaches the service ended up developing were “woefully inadequate.” In fact, the report said that the Air Force identified cultural resistance as an ongoing risk issue throughout the program. However, the lack of action to address it permitted the “potential problem” to become an acute problem.

To its credit, the ECSS program did set out an approach in 2006 to contain the technical risks involved in developing an integrated logistics system to replace hundreds of legacy systems then in use across the Air Force. Two key risk reduction aspects of the plan were to “forego any modifications” to the Oracle software selected for ECSS and to “conduct significant testing and evaluation” of the system. However, by the time the ECSS project was canceled in 2012, the report notes, Oracle’s software was not only being heavily customized, but it also wasn’t being properly tested.

Several things contributed to this 180-degree turn in project risk reduction, according to the report. One was the Air Force conducting what can only be called a bait-and-switch procurement. As the report states:

"In its March 2005 solicitation, the Air Force requested an “integrated product solution.” The Air Force solicitation stated that it wanted to obtain “COTS [commercial off-the-shelf] software [that is] truly ‘off-the-shelf’: unmodified and available to anyone.” Oracle was awarded the software contract in October 2005, and provided the Air Force with three stand-alone integratable COTS software components that were “truly off the shelf.” Oracle also provided the Air Force with tools to put the three components together into a single software “suite,” which would “[require] a Systems Integrator (SI) to integrate the functions of the three [components].” Essentially, this meant the various new software pieces did not initially work together as a finished product and required additional integration to work as intended.


"In December 2005, the Air Force issued its solicitation for a systems integrator (SI) … portrayed the three separate Oracle COTS software components as a single, already-integrated COTS product which was to be provided to the winning bidder as government funded equipment (GFE). Confusion about the software suite plagued ECSS, contributing significantly to program delays. Not only was time and effort dedicated to integrating the three separate software components into a single integrated solution, but there were disagreements about who was responsible for that integration. While CSC [the system integrator] claimed in its bid to have expertise with Oracle products, the company has said that it assumed that the products it would receive from the Air Force would already be integrated. Among the root causes of the integration-related delay was the Air Force’s failure to clearly understand and communicate program requirements."

Adding to the general confusion was the small issue of exactly how many legacy systems were going to be replaced. The report states:

"When the Air Force began planning for ECSS, it did not even know how many legacy systems the new system would replace. The Air Force has, on different occasions, used wildly different estimates on the number of existing legacy programs, ranging from “175 legacy systems” to “hundreds of legacy systems” to “over 900 legacy systems.”"

Curiously, the Senate report doesn’t note that even if the Air Force was trying to get rid of “only” 175 legacy systems, that was still some 20 times more than in the Air Force’s last failed ERP attempt a few years earlier. The staff report seems to assume that such a business process reengineering undertaking was feasible from the start (and during a period of conflict as well), which is a highly dubious assumption.

Probably the most damning sentence in the whole report is the following:

"To date, the Air Force still cannot provide the exact number of legacy systems ECSS would have replaced."

Two years after ECSS was terminated, after two major investigations into why ECSS failed, and while the Air Force is actively engaged in planning for another try, this fact is still rather amazing.

I’ll let you read the report to dig through the other gory details involving the risk-related issues of cultural resistance and lack of leadership, but suffice it to say you have to wonder where top Air Force and Department of Defense leadership was during the eight years this project blunder unfolded. As I have noted elsewhere, the DoD CIO at the time claimed to be “closely” monitoring the program, and up to the day ECSS was terminated, the CIO viewed it as being only a moderately risky program.

Congress showed the same lack of curiosity, however. DoD ERP system developments have been well documented by the US Government Accountability Office [pdf] for over two decades as being prone to self-immolation. But Congress has kept the money flowing to them anyway without bothering to perform much in the way of oversight. Predictably, the Senate report avoids looking into Congress's own role in permitting the ECSS failure to occur.

The Senate report goes on to list several other DoD ERP programs that are trying their best to imitate ECSS. In this time of tight government budgets, that list might actually move Congress to quit acting as an uninterested party to their future outcomes. In fact, Federal Computer Week ran an article last week indicating that the Senate Appropriations Defense Subcommittee was slicing $500 million off of DoD’s IT budget, which is clearly a warning shot across DoD’s bow.

Another warning shot of note: Senators Levin and McCain have jointly stated that “No one within the Air Force and the Department of Defense has been held accountable for ECSS’s appalling mismanagement. No one has been fired. And, not a single government employee has been held responsible for wasting over $1 billion dollars in taxpayer funds.” The Senators have stated they plan to introduce legislation to hold program managers more accountable in the future.

I suspect—and dearly hope—that if another ECSS happens in defense (or in other governmental agencies or departments, for that matter), more than a few civil and military careers will be, like ECSS, terminated.

In Other News …

Birmingham England Traffic Wardens Unable to Issue Tickets

Chicago Car Sticker Enforcement Delayed After Computer Glitch

Ohio’s Lorain City Municipal Court Records are Computer "Nightmare"

Immigration System Crash Leads to Chaos at Santo Domingo’s Las Americas Airport

Texas TxTag Toll System Upgrade Causes Problems

Melbourne Members Equity Bank System Upgrade Issues Vex Customers

Reservation System Issue Hits Las Vegas-based Allegiant Air Flights

Vancouver’s SkyTrain Shutdown Angers Commuters

Computer Assigns Univ of Central Florida Freshman to Live in Bathrooms and Closets

Australia’s Woolworths Stores Suffer Store-wide Checkout Glitch

Mon, 21 Jul 2014 13:00:00 GMT
UK Retailer Marks & Spencer’s Revenue Results Smacked by Website Woes Company promises improvements, but will they be in time?
Photo: Marks & Spencer

IT Hiccups of the Week

We concentrate this week’s edition of IT snarls, snags, and snafus on the lessons being learned the hard way by Marks & Spencer—the U.K.'s largest clothing retailer and one of the top five retailers in the country—on what happens when your online strategy goes awry. What makes this more than a run-of-the-mill website goes bad story, at least in the U.K., is that as London's Daily Mail put it late last year, “Marks & Spencer, to coin a phrase, is not just any shop. It is the British shop, as much a part of our cultural heritage as the Women’s Institute, the BBC and the Queen.”

M&S launched with great fanfare a new £150 million website in February as a primary means to stem declining sales and profitability, as well as to accelerate the 128-year-old company’s objective of becoming an international multichannel retailer. However, last week, CEO Marc Bolland announced shortly before the company’s annual meeting that ongoing “settling in” problems with its website contributed to an 8.1 percent drop in online sales over the previous quarter. The decline in online sales, which was larger than expected, helped M&S chalk up its 12th quarter in a row of declining sales in its housewares and clothing division.

Bolland reiterated a statement he initially made in May that the company expected website issues to keep weighing on online sales until the 2014 holiday season, an admission that shareholders were no doubt unhappy to hear. Share prices in M&S have declined some 20 points to the mid-420s on the London exchange since late May.

Paradoxically, Bolland also insisted to RetailWeek that despite the admitted negative impact the new website has had on online sales, “there is no problem with our new website.” M&S Chairman Robert Swannell further tried to put a positive spin on the situation at the annual meeting, according to a story at City A.M. While granting that “every project of this scale will cause some disruption,” Swannell contended that the new website had, in fact, created the “newest and biggest flagship store, open to everyone, 365 days a year.” Alan Stewart, the company’s finance director, even went so far as to claim at the annual meeting that nothing “had gone wrong” with the website’s launch. In May, the London Telegraph reported, Stewart set out to trivialize the website problems M&S customers were complaining about, comparing them to shopping at a supermarket where the milk has been moved and not being able to find it right away. Stewart repeated the misplaced-milk analogy at the annual meeting, which was probably not a good idea since it didn’t go over well with M&S online shoppers the first time.

M&S customers (and shareholders) can be forgiven for being more than a bit jaded after hearing the three executives’ upbeat utterances at the annual meeting, especially knowing that the company spent three years developing and testing the website, all the while promising a unique as well as reliable shopping experience. Yet, within days of the website’s 18 February launch, it crashed spectacularly. Some M&S customers were so angry, the Daily Mail reported at the time, that they promised to boycott M&S until the old website, which had been operated by Amazon for the previous seven years, was brought back. M&S executives seem to have conveniently forgotten about that incident in their "no problems with the website" remarks last week.

While the website crash was exasperating, adding to M&S online customers’ angst was the discovery that they would have to re-register their personal credentials on the new site. That would be a minor annoyance in the general scheme of things, except that many long-time customers reported that they were having (and continue to have) difficulties re-registering. In fact, M&S admits that only half of the 6 million previously registered customers have re-registered on its new site.

Of course, the 3 million who haven’t re-registered may have decided it wasn’t worth the effort. According to the Guardian, things are so bad that many M&S online shoppers are giving up on the new site because they view it as “tricky to use” and “unfriendly.” Customers have complained, among other things, that searching for M&S products has proven not only clumsy and slow but also erroneous. Adding to shoppers’ frustration is the fact that a non-trivial amount of merchandise offered for sale online seems to be tagged as being “out of stock.”

In May, an article in the Economist indicated that Bolland blamed the current merchandise stocking issues on a “15-year legacy of not investing” in IT and logistics by the company’s previous management. Shareholders might want to ask, then, what was the outcome of the company’s £400 million IT investment strategy begun in 2009 specifically aimed at improving IT and logistics? Was the investment strategy a complete bust?

M&S doesn’t have a lot of time to overcome its existing customers’ exasperation with its website, as well as begin the process of wooing new customers to it. If the site isn’t fixed before the holiday season as promised and it continues to be a drag on company revenue, shareholders may lose patience with the company’s management and call for their resignations. M&S executives may have decided to beat shareholders to the punch, however.

Soon after the poor quarterly results were declared at the annual meeting, M&S announced that its financial director Alan Stewart had decided to “defect” to rival U.K. retailer Tesco.  Financial analysts told the Financial Times that the move was not a good sign: It may be an indication that despite all of Stewart’s positive talk, he was really “not optimistic about the success of the turnaround at M&S.”

Maybe Stewart’s misplaced milk actually really is out of stock, and no one knows how to re-order it.

In Other News …

Hilo Hawaii Dud 4th of July Fireworks Display Blamed on Computer Issues

UK Domino’s Pizza Charges £180,000 for a Large Margherita

Nevada USGS Earthquake Reporting System Had “Glitch”

South African First National Bank Has Transactions Trouble

Danish Railway Experiences IT Problems

Billing Troubles Hit Virginia Tunnel Users Again

Ticket Woes Anger Irish Hurling Fans

US Government Tells Six States to Fix Medicaid Systems or Else

Computer Glitch Causes Thousands in Connecticut to Lose Health Insurance

Software Flaw Can Cause Unintended Acceleration in Honda Fit and Vezel Hybrids

14,000 Men Aged 117 to 121 Ordered to Register for Draft

Mon, 14 Jul 2014 13:00:00 GMT
Thousands of Bags Miss Flights at Heathrow Terminal 5 Again Awakens bad memories of T5 baggage system meltdown of 2008
Photo: Nathan King/Alamy

IT Hiccups of the Week

Here's some glitch déjà vu from 2008, namely another baggage system miscue involving British Airways (BA) at Heathrow International Airport in London. As you may remember, in March 2008, BA and Heathrow operator British Airports Authority (now known as Heathrow Airport Holdings) opened the long-awaited BA Terminal 5 with great fanfare, with BAA loudly proclaiming the “world-class” baggage system was “tried, tested and ready to go.” No Denver International Airport baggage system-like problems for them! And BA would finally shed its deservedly poor reputation as the top airline for losing luggage.

Of course, such publicly stated optimism about the reliability of automation rarely goes unpunished. Almost immediately, a massive meltdown of the baggage system on the first day of T5’s operation led to more than 28,000 passenger bags piled high across the terminal, hundreds more being lost, and some 15 percent of BA flights being cancelled over the course of nearly a week. It took three weeks before the majority of bags were reunited with passengers. The embarrassment for both BA and Heathrow management was acute, as was BA passenger rage, to say the least.

The nightmares of that week have slowly receded from BA passengers' memories. That is, until Friday, 27 June, when London papers like the Daily Mail reported that T5’s automated baggage system had suffered another major IT failure, with bags having to be handled manually again. As a result, thousands of BA passengers were sent (unknowingly) on their way without their luggage, including those passengers transiting through London via T5. The Mail quoted a BA spokesperson as saying, “On Thursday morning, the baggage system in Terminal 5 suffered an IT problem which affected how many bags could be accepted for each flight… We are very sorry for the difficulties this has caused and we have been working hard with the airport to make sure we reunite all of our customers with their luggage as quickly as possible.”

The BA spokesperson failed to point out that the phrase “how many bags could be accepted for each flight” actually meant no bags were accompanying their owners on an untold number of BA flights. BA also insisted to the press that they stop saying that passenger bags were lost; the bags merely “missed” their flights, BA pouted.  

A short two-paragraph Heathrow Airport Holdings press release did BA one better at trying to downplay the baggage system problem, stating that it affected only “some bags,” and that flights were in fact operating “normally.” You have to love press statements that are totally true but also totally disingenuous.

BA passengers on Thursday were naturally displeased at traveling without their bags, but at least they got to their destinations, unlike those flying out of T5 last September, when another, very short-lived IT problem with the baggage system prevented hundreds of passengers from ever boarding their flights; they had to be rebooked onto new ones, many the next day.

While BA passengers from June 27 were naturally miffed, what BA and Heathrow’s operator failed to make clear until early this week was that the “intermittent” IT problems with T5’s baggage system had actually begun on Thursday, 26 June and continued well into Sunday, 29 June. I am sure that many BA passengers flying out of T5 on June 28 and 29 would have changed airlines if they knew the full extent of the baggage problems. Conveniently, neither BA nor the airport operator came forward with the information about the multi-day operational problem until Tuesday, 1 July. Nor have they disclosed the total number of bags or passengers inconvenienced.

Both BA and Heathrow Airport Holdings are in damage control mode as BA passengers, many of them famous, have taken to social media to lambast them both. Many passengers, for example, have complained that when they finally did receive their bags, they had been ransacked with items stolen from them. Others complained that their journeys were over by the time their bags finally reached them.

BA put out another press release blaming international airline security rules for bags being opened as well as delayed, and further promised to look into the ransacking claims. A BA spokesperson went on to apologize, stating, “We are very sorry that this process is taking longer than anticipated, and we fully understand the frustration that this is causing.” Heathrow Airport Holdings’ new CEO John Holland-Kaye also apologized, saying the IT problem had taken too long to resolve and that the airport needs “to do better.” Disclosing IT problems while they are occurring would be a good start.

The BA spokesperson went on to warn that it would still take “several days” before all the bags that “missed” their flights are reunited with their owners. BA also indicated that because of the number of bags involved, its bag tracking system was not working as it should, which could further add to the delays.

BA is reminding its customers flying out of T5 that, “You may wish to carry essential items in hand baggage where possible.” That is probably good advice. ComputerWorldUK reports that Heathrow Airport Holdings is remaining very tight-lipped over what caused the baggage system fault and why it took four days to fix, which is rarely a sign that everything is under control.

In Other News…

Florida’s DMV Computer System Back Online

Bombay Stock Exchange Recovers from Outage

New Zealand Exchange Suffers IT Glitch

DNS Error Hits British Telecom

Irish Drivers Avoid Parking Fines in County Clare Due to Computer Error

PayPal Error Blocks CERN and MIT anti-Spying ProtonMail Fundraising Efforts

Microsoft Anti-crime Operation Disrupts Legitimate Servers

UK Adult Content Filters Hit 20 Percent of Legal Popular Sites

Goldman Sachs Gets Court to Order Google to Block Misdirected Email

HHS IG Reports Say Federal and State Health Insurance Exchange Controls Very Weak


Thu, 3 Jul 2014 15:11:00 GMT
Outages Galore: Microsoft, Facebook, Oz Telecom Users Are an Unhappy Lot Multiple IT systems fall down and have a hard time getting back up
Illustration: Randi Klett

IT Hiccups of the Week

We go on an IT Hiccups hiatus for a week and, wouldn’t you know it, Facebook does a worldwide IT face plant for thirty minutes while mobile phone users of two of the three largest telecom providers in Australia, Optus and Vodafone, coincidentally suffer concurrent nationwide network outages for hours on the same day. Microsoft follows that with back-to-back Office 365-related outages, each lasting more than six hours. In addition, there were system operational troubles in Finland, India and New York, to name but a few. So, we decided to focus this week’s edition of IT problems, snafus and snarls on the recent outbreak of reported service disruptions around the world, as well as the sincere-sounding but ultimately vacuous apologies that now always accompany them.

Our operational oofta review begins last Tuesday, when Microsoft’s Exchange Online was disrupted for some users from around 0630 until almost 1630 East Coast time, leaving those affected without email, calendar and contact information capability. The disruption was somewhat embarrassing for Microsoft, which likes to tout that its cloud version of Office 365 is effectively always available (or at least 99.9 percent of the time).
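For context, a 99.9 percent availability promise is a tight budget. A quick back-of-the-envelope calculation (my arithmetic, not the wording of Microsoft's actual SLA) shows how badly a roughly ten-hour disruption overshoots it:

```python
# What does "available 99.9% of the time" allow in a 30-day month?
MINUTES_PER_MONTH = 30 * 24 * 60               # 43,200 minutes
allowed_downtime = MINUTES_PER_MONTH * 0.001   # the 0.1% that may be down

# The reported disruption ran from roughly 0630 to 1630 East Coast time
outage_minutes = 10 * 60

print(f"Monthly downtime budget: {allowed_downtime:.0f} minutes")
print(f"This outage: {outage_minutes} minutes, "
      f"about {outage_minutes / allowed_downtime:.0f}x the budget")
```

In other words, one such incident burns through more than a year's worth of 99.9-percent downtime allowance.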

PC Advisor reported that Microsoft's investigation into the outage “determined that a portion of the networking infrastructure entered into a degraded state. Engineers made configuration changes on the affected capacity to remediate end-user impact.” Microsoft later explained that the failure uncovered a “previously unknown error” that took some time to correct. Microsoft has chosen to remain mum, however, about how many Exchange users were affected by the interruption of service.

Redmond Magazine published Microsoft’s upbeat apology for the disruption which stated, “We sincerely apologize to customers for any inconvenience this incident may have caused and continuously strive to improve our service and using these opportunities to drive even greater excellence in our service delivery.”

One service improvement suggestion Exchange users vigorously made to Microsoft during the outage was to actually indicate on its service health dashboard that there was, in fact, an outage occurring—something that apparently wasn’t effectively done. Microsoft later admitted sheepishly that the lack of timely notice was due to a problem with the publishing process for the dashboard itself. Users also strongly suggested that Microsoft refrain from using the words “delay” and “opportunities” together when describing future day-long outages, since those words didn’t seem to fit their experience.

Microsoft’s problems on Tuesday were preceded on Monday by “issues” experienced in North America (and reportedly by some outside NA) by users of its Lync instant messaging service, which is also part of the Office 365 suite (which, we should note, is available as a standalone product, as is Exchange).  Microsoft indicated that the issues involved its “network routing infrastructure.” The outage began at about 0700 East Coast time, with service disruptions still being reported into early Monday evening.

Interestingly, no one seemed to report on Microsoft’s apology for the Lync outage, which may be because the Exchange outage came so quickly on its heels, or perhaps there are so few users solely dependent upon Lync for their communications that there was really no one around to apologize to.

Microsoft’s dual outages were themselves preceded by a global outage of Facebook that took place the previous week on Thursday, 19 June. That oofta took place at about 0400 East Coast time, and lasted for only about 30 minutes. However, even that short period of time apparently left some of its 1.2 billion users “frustrated,” at least according to London’s Daily Mail.

Facebook initially sent out the expected pro-forma apology, “We're sorry for any inconvenience this may have caused.” Later, however, Facebook spokesman Jay Nancarrow expanded on Facebook's “Sorry, something went wrong” website message users encountered during the service interruption:

“Late last night, we ran into an issue while updating the configuration of one of our software systems. Not long after we made the change, some people started to have trouble accessing Facebook. We quickly spotted and fixed the problem...  This doesn't happen often, but when it does we make sure we learn from the experience so we can make Facebook that much more reliable.”

Also on that same Thursday and in a weird coincidence, first Vodafone and then Optus mobile phone customers in Australia reported that they were unable to make calls or send text messages for most of the day. According to a story at the Sydney Morning Herald, Vodafone’s problem began Thursday morning in Western Australia and then soon spread across the country. The paper said that the service disruption stemmed from a combination of a faulty repeater on the primary fiber link connecting Western Australia to the rest of Australia and a back-up cable failing as well. It took until 1830 AEST for the outage to be completely fixed, the Herald reported.

According to a story in the Financial Review, the Optus problem involved an issue with the IMEI (International Mobile Equipment Identity) numbers, which are the distinctive identifiers given each mobile phone. The Review stated that beginning at about 1300 AEST, “the unique IMEI numbers of several thousand mobile handsets were accidentally deemed to be stolen” in the Optus central database, thus, “blocking them from using Optus's mobile network for several hours.” The problem wasn’t completely sorted until after 2100 AEST. Unfortunately for us curious types, how the IMEI error occurred was not explained.
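The mechanics are simple to sketch: mobile networks keep a register of IMEIs flagged as stolen and refuse service to any handset on it, so a database error that wrongly flags a phone locks its owner out. A minimal illustration of that kind of check (all names and numbers here are hypothetical, not Optus's actual system):

```python
# Hypothetical sketch of an equipment-identity blocklist check, the
# kind of lookup a carrier runs against its central database before
# letting a handset onto the network.

BLOCKLIST = {"356938035643809"}  # IMEIs flagged as stolen (made-up value)

def may_attach(imei: str) -> bool:
    """Allow a handset onto the network unless its IMEI is flagged."""
    return imei not in BLOCKLIST

# A database error that wrongly flags an IMEI blocks an innocent phone:
BLOCKLIST.add("490154203237518")      # accidental "stolen" flag
print(may_attach("490154203237518"))  # the handset is now refused service
```

Multiply that one bad flag by "several thousand mobile handsets" and you get an afternoon's worth of blocked Optus customers.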

Optus said it was “sorry” for the problem and indicated that the company would give a service credit to compensate customers for their troubles. Similarly, Vodafone’s Chief Technology Officer said he was “sorry” for the telecom's service oofta and indicated that the company's customers would receive unlimited data usage within Australia for a weekend as its compensation.

And in even more of a fluke, a hardware failure that same Thursday afternoon in the Melbourne region of Australia took out Telstra's ADSL service for a couple of hours as well. This outage meant that the three largest telecom companies in Australia were experiencing some sort of newsworthy IT-related failure simultaneously.

Returning to last week, the Finnish broadcasting company (Yle) news program Yle Uutiset reported Tuesday that the Finland-wide government passport system fault had been resolved. Yle stated that the fault involved “upgrades to the encryption system over midsummer [that] had prevented the passage of information between police and the Population Register Centre.” While Finnish officials at first feared that the fault would prevent the issuance of new passports for possibly weeks, Yle reported that a solution to the problem had been found within a day.

Then last Thursday morning, the India Times reported that the “technical problems” that knocked out the websites of several Indian government ministries, including finance and defense, as well as the Prime Minister’s Office for several hours on Wednesday evening had been fixed. There were no details in the Times as to the root cause of the outage, other than an indication that it was equipment-related and not a cyber-attack.

Finally, Time Warner Cable announced last week that the email problem that affected many of its customers in Central New York State for several weeks was finally resolved. The problem was eventually traced to a “database used by its email servers.” Even though Time Warner early on flagged the issue as a “top priority” to be fixed, a solution proved frustratingly harder to implement than first thought.

Time Warner spokesman Scott Pryzwansky said in a statement to its customers that, “We apologize if you have been affected by this inconvenience. Providing you with reliable service is our top priority.”

Time Warner may want to work on improving the sincerity of its apology, though. Pryzwansky neglected to mention that Time Warner would embrace the opportunity the weeks-long service oofta offered to improve its customers’ service delivery, à la Facebook and Microsoft.

Well, maybe next time.

In Other News…

Australian Westpac Subsidiary St George Bank Loses Online Banking Services

GLONASS April Outage Explained

Intercontinental Exchange Glitches Twice Suspend Trading

Illinois Driver License Facilities Shut Due to Mainframe Issues

UK Energy Regulator Tells Npower to Fix Persistent Billing Problems or Else

Computer Problems Delay Flights From Israel’s Ben-Gurion Airport

Tennessee Department of Children’s Services IT System Still Suffering Costly Glitches

Problems Linger in New Hampshire's Medicaid Billing Computer System

Target Experiences Nationwide Checkout Problem

Nebraska Racetrack Overpays $6,000 on Wager

Sprint Cable Cut Takes out Emergency 911 in Parts of Oregon and Washington State

Computer Glitch Frees 25 Inmates from Dallas Jail

UK Tax Agency Computer System Riddled with Errors

Dutch Safety Agency Warns About Potential ILS Interference with Autopilots

Glitch and Bad Decisions Led to Drone Crashing into US Navy Ship

GM Recalls 425,000 SUVs and Trucks for Software Fix

USA-Germany World Cup Match Overloads ESPN Streaming Services

Veteran Affairs Computer Insists Veteran is Dead; He Disagrees

Mon, 30 Jun 2014 12:00:00 GMT