Math

Huge Math Error Corrected In Black Plastic Study (arstechnica.com) 105

Ars Technica's Beth Mole reports: Editors of the environmental chemistry journal Chemosphere have posted an eye-catching correction to a study reporting that toxic flame retardants from electronics wind up in some household products made of black plastic, including kitchen utensils. The study sparked a flurry of media reports a few weeks ago that urgently implored people to ditch their kitchen spatulas and spoons. Wirecutter even offered a buying guide for what to replace them with. The correction, posted Sunday, will likely take some heat off the beleaguered utensils. The authors made a math error that put the estimated risk from kitchen utensils off by an order of magnitude.

Specifically, the authors estimated that if a kitchen utensil contained middling levels of a key toxic flame retardant (BDE-209), the utensil would transfer 34,700 nanograms of the contaminant a day based on regular use while cooking and serving hot food. The authors then compared that estimate to a reference level of BDE-209 considered safe by the Environmental Protection Agency. The EPA's safe level is 7,000 ng -- per kilogram of body weight -- per day, and the authors used 60 kg as the adult weight (about 132 pounds) for their estimate. So, the safe EPA limit would be 7,000 multiplied by 60, yielding 420,000 ng per day. That's 12 times more than the estimated exposure of 34,700 ng per day. However, the authors missed a zero and reported the EPA's safe limit as 42,000 ng per day for a 60 kg adult. The error made it seem like the estimated exposure was nearly at the safe limit, even though it was actually less than a tenth of the limit.
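
The corrected arithmetic is simple enough to verify in a few lines; a minimal sketch using only the figures quoted above:

```python
# Checking the corrected math from the Chemosphere erratum.
exposure = 34_700          # estimated utensil exposure, ng/day
epa_reference = 7_000      # EPA reference dose, ng per kg body weight per day
body_weight = 60           # adult body weight in kg used by the authors

safe_limit = epa_reference * body_weight
print(safe_limit)                 # 420000 ng/day, not the 42000 first reported
print(safe_limit / exposure)      # ~12.1: exposure is under a tenth of the limit
```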
"We regret this error and have updated it in our manuscript," the authors said in a correction.

"This calculation error does not affect the overall conclusion of the paper," the correction reads. The study maintains that flame retardants "significantly contaminate" the plastic products, which have "high exposure potential."
AI

Microsoft Announces Phi-4 AI Model Optimized for Accuracy and Complex Reasoning (computerworld.com) 31

An anonymous reader shared this report from Computerworld: Microsoft has announced Phi-4 — a new AI model with 14 billion parameters — designed for complex reasoning tasks, including mathematics. Phi-4 excels in areas such as STEM question-answering and advanced problem-solving, surpassing similar models in performance. Phi-4, part of the Phi small language models (SLMs), is currently available on Azure AI Foundry under the Microsoft Research License Agreement and will launch on Hugging Face [this] week, the company said in a blog post.

The company emphasized that Phi-4's design focuses on improving accuracy through enhanced training and data curation.... "Phi-4 outperforms comparable and even larger models on tasks like mathematical reasoning, thanks to a training process that combines synthetic datasets, curated organic data, and innovative post-training techniques," Microsoft said in its announcement. The model leverages a new training approach that integrates multi-agent prompting workflows and data-driven innovations to enhance its reasoning efficiency. The accompanying report highlights that Phi-4 balances size and performance, challenging the industry norm of prioritizing larger models... Phi-4 achieved a score of 80.4 on the MATH benchmark and has surpassed other systems in problem-solving and reasoning evaluations, according to the technical report accompanying the release. This makes it particularly appealing for domain-specific applications requiring precision, like scientific computation or advanced STEM problem-solving.
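
Once the Hugging Face release lands, trying the model should look like any other transformers checkpoint; a minimal sketch, assuming the repository id is microsoft/phi-4 (an assumption, not confirmed in the announcement):

```python
# Hypothetical usage sketch; the repo id "microsoft/phi-4" is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "If 3x + 5 = 20, what is x? Show your reasoning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```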

Microsoft emphasized its commitment to ethical AI development, integrating advanced safety measures into Phi-4. The model benefits from Azure AI Content Safety features such as prompt shields, protected material detection, and real-time application monitoring. These features, Microsoft explained, help users address risks like adversarial prompts and data security threats during AI deployment. The company also reiterated that Azure AI Foundry, the platform hosting Phi-4, offers tools to measure and mitigate AI risks. Developers using the platform can evaluate and improve their models through built-in metrics and custom safety evaluations, Microsoft added... With Phi-4, Microsoft continues to evolve its AI offerings while promoting responsible use through robust safeguards. Industry watchers will observe how this approach shapes adoption in critical fields where reasoning and security are paramount.

AI

Are People Starting to Love Self-Driving Robotaxis? (marketplace.org) 106

"In a tiny handful of places..." Wired wrote last month, "you can find yourself flanked by taxis with no one in the drivers' seats." But they added that "Granted, practically everyone has been numbed by the hype cycle."

Wired's response? "[P]ile a few of us into an old-fashioned, human-piloted hired car, then follow a single Waymo robotaxi wherever it goes for a whole workday" to "study its movements, its relationship to life on the streets, its whole self-driving gestalt. We'll interview as many of its passengers as will speak to us, and observe it through the eyes of the kind of human driver it's designed to replace."

This week Wired senior editor John Gravois discussed the experience on the business-news radio show Marketplace (with host Kai Ryssdal): Ryssdal: What kinds of reactions did you get from people once you tracked them down? What did they say about their experience in this driverless car?

Gravois: It was pretty uniform and impressive how much people just love it. They just like the experience of the drive; I guess it's a little bit less herky-jerky than a human driver. But I think a lot of it just comes down to people being kind of relieved not to have to talk to somebody else, as sad as that is...

Ryssdal: Tell me about Gabe, your Uber driver, and his thoughts on this whole thing, because that was super interesting.

Gravois: So Gabe, this is a guy whose labor is directly at stake. You know, he's a guy whose labor is going to be replaced by a Waymo. He's had 30 years of experience as a professional driver, first as a taxi driver. He even organized a taxi driver strike in the days before Uber. His first impression, I think his prejudice with Waymo from having shared the road with them sort of sporadically, was that they were kind of dopey, rule-following, frustrating vehicles to share the road with. But over the course of the day, he started to recognize that the Waymo was driving a lot like a taxi driver. The Waymo was doing things that were aggressive, that are exactly the kinds of things that a taxi driver is trained to be aggressive with, and doing things that were cautious that are exactly the kinds of things that taxi drivers are trained to be cautious with.

Ryssdal: Can we talk unit economics here? According to the math from a study you guys cite, Waymo is not making a whole lot of money per vehicle, right? And eventually they're going to scale, and it's going to work out, but for the moment, even though they've gotten 11-billion-something dollars, they're not turning a whole lot of profit here.

Gravois: Yeah, that's a big question, and the math, even in that study, is based on a lot of guesswork. It's really hard to say what the unit economics are. What we can say is that the ridership rates are going up so fast that that study is already well out of date. When we were doing our chase, I think the monthly ridership for Waymo was 100,000 rides a month. By October, it was already 150,000 rides a month. So, the economics are just shifting under our feet a lot.

AI

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft (wired.com) 27

Harvard University announced Thursday it's releasing a high-quality dataset of nearly one million public-domain books that could be used by anyone to train large language models and other AI tools. From a report: The dataset was created by Harvard's newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta's Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to "level the playing field" by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly-refined and curated content repositories that normally only established tech giants have the resources to assemble. "It's gone through rigorous review," he says.

Leppert believes the new public domain database could be used in conjunction with other licensed materials to build artificial intelligence models. "I think about it a bit like the way that Linux has become a foundational operating system for so much of the world," he says, noting that companies would still need to use additional training data to differentiate their models from those of their competitors.

AI

OpenAI Releases 'Smarter, Faster' ChatGPT - Plus $200-a-Month Subscriptions for 'Even-Smarter Mode' (venturebeat.com) 64

Wednesday OpenAI CEO Sam Altman announced "12 Days of OpenAI," promising that "Each weekday, we will have a livestream with a launch or demo..." And sure enough, today he announced the launch of two things:

- "o1, the smartest model in the world. Smarter, faster, and more features (e.g. multimodality) than o1-preview. Live in ChatGPT now, coming to API soon."

- "ChatGPT Pro. $200/month. Unlimited usage and even-smarter mode for using o1. More benefits to come!"

Altman added this update later: For extra clarity: o1 is available in our plus tier, for $20/month. With the new pro tier ($200/month), it can think even harder for the hardest problems. Most users will be very happy with o1 in the plus tier!
VentureBeat points out that subscribers "also gain access to GPT-4o, known for its advanced natural language generation capabilities, and the Advanced Voice feature for speech-based interactions."

And even for non-subscribers, ChatGPT can now also analyze images, points out VentureBeat, "a hugely helpful feature upgrade as it enables users to upload photos and have the AI chatbot respond to them, giving them detailed plans on how to build a birdhouse entirely from a single candid photo of one, for one fun example." In another, potentially more serious and impressive example, it is now capable of helping design data centers from sketches... o1 represents a significant evolution in reasoning model capabilities, including better handling of complex tasks, image-based reasoning, and enhanced accuracy. Enterprise and Education users will gain access to the model next week... OpenAI's updates also include safety enhancements, with the o1-preview scoring 84 on a rigorous safety test, compared to 22 for its predecessor...

To encourage the use of AI in societal-benefit fields, OpenAI has announced the ChatGPT Pro Grant Program. The initiative will initially award 10 grants to leading medical researchers, providing free access to ChatGPT Pro tools.

In a video, Altman displays graphs showing o1 dramatically outperforming GPT-4o on math questions, on competition coding at Codeforces, and on PhD-level science questions.
Earth

The New Climate Math on Hurricanes 136

Climate change has intensified hurricane wind speeds by an average of 19 mph in 84% of North Atlantic hurricanes between 2019 and 2024, according to new research that links warming ocean temperatures to storm intensity for individual hurricanes.

This year, Hurricanes Helene and Milton slammed into Florida, breaking meteorological records and causing catastrophic damage. The study by Climate Central found that higher sea surface temperatures elevated most hurricanes by an entire category on the Saffir-Simpson scale, with three storms, including Hurricane Rafael, seeing wind speeds increase by 34 mph due to warming.

Researchers calculated storm intensity using models of pre-warming ocean temperatures. "It's really the evolution of our science on sea surface temperature attribution that has allowed this work to take place," said lead author Daniel Gilford, noting that hurricane damage climbs steeply, as a high power of wind speed. For example, a storm with double the wind speed can cause 256 times as much damage. The methodology enables scientists to determine climate change impacts on hurricanes in near-real time.
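
The quoted factor pins down the power: if damage D scales as the eighth power of wind speed v (an exponent inferred here from the quoted numbers, not stated explicitly in the study), doubling the wind speed gives

\[
\frac{D(2v)}{D(v)} = \left(\frac{2v}{v}\right)^{8} = 2^{8} = 256
\]
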
Math

Does Casio's New Calculator Watch Take You Back To 6th Grade Math Class? (techspot.com) 78

Slashdot reader jjslash brings word that Casio "has reintroduced its iconic calculator watch featuring a retro design with green text on a negative LCD and a classic keypad layout."

TechSpot reports that the watch was based on the Casio Mini personal calculator first released in the early 1970s — even offering a keypad using the original fonts (with numbers separated by grid lines): Even the mode button, colored red, is a nod to the calculator's power indicator. The watch's calculator function can add, subtract, multiply, and divide up to eight digits. As for watch functions, you get dual time, an alarm, stopwatch functionality, and more...

Casio's original personal calculator debuted in 1972 and cost $59.95. It featured a six-digit display and was a quarter the size of its competitors at a third of their price. The calculator was an instant hit for Casio, selling a million units in the first 10 months on the market and more than six million units over the span of the series.

Long-time Slashdot reader antdude says "I still wear one! Casio Data Bank 150 model...!"

Share your own vintage calculator memories in the comments...
AI

AI Systems Solve Just 2% of Advanced Maths Problems in New Benchmark Test 82

Leading AI systems are solving less than 2% of problems in a new advanced mathematics benchmark, revealing significant limitations in their reasoning capabilities, research group Epoch AI reported this week.

The benchmark, called FrontierMath, consists of hundreds of original research-level mathematics problems developed in collaboration with over 60 mathematicians, including Fields Medalists Terence Tao and Timothy Gowers. While top AI models like GPT-4 and Gemini 1.5 Pro achieve over 90% accuracy on traditional math tests, they struggle with FrontierMath's problems, which span computational number theory to algebraic geometry and require complex reasoning.

"These are extremely challenging. [...] The only way to solve them is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages," Tao said. The problems are designed to be "guessproof," with large numerical answers or complex mathematical objects as solutions, making it nearly impossible to solve without proper mathematical reasoning.

Further reading: New secret math benchmark stumps AI models and PhDs alike.
Math

Australian Mathematicians Debunk 'Infinite Monkey Theorem' 124

Australian mathematicians have shown that the famous "infinite monkey theorem" cannot play out within the lifespan of our universe. The theorem suggests monkeys typing randomly would eventually produce Shakespeare's complete works. Scientists Stephen Woodcock and Jay Falletta calculated that even 200,000 chimpanzees typing one character per second until the universe's heat death would fail to reproduce Shakespeare's writings.

A single chimp has only a 5% chance of typing "bananas" in its lifetime, with more complex phrases facing astronomically lower odds. "This finding places the theorem among other probability puzzles and paradoxes... where using the idea of infinite resources gives results that don't match up with what we get when we consider the constraints of our universe," Associate Prof Woodcock was quoted as saying by BBC.
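
That 5% figure can be roughly reproduced with a back-of-the-envelope calculation; a minimal sketch, assuming a 30-key typewriter, one keystroke per second, and a roughly 30-year lifespan (illustrative parameters, not necessarily the paper's exact values):

```python
# Rough check of the single-chimp "bananas" probability (assumed parameters).
keys = 30                              # assumed typewriter keys
target_len = len("bananas")            # 7 characters
p_window = (1 / keys) ** target_len    # chance one 7-keystroke window matches
lifespan_s = 30 * 365.25 * 24 * 3600   # ~30 years, one keystroke per second

# Windows overlap, so treating them as independent is an approximation.
p_lifetime = 1 - (1 - p_window) ** (lifespan_s - target_len + 1)
print(f"{p_lifetime:.1%}")             # about 4%, close to the quoted 5%
```
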
Math

Former Nvidia Engineer Discovers 41-Million-Digit Prime (tomshardware.com) 29

Former Nvidia engineer Luke Durant, working with the Great Internet Mersenne Prime Search (GIMPS), recently discovered the largest known prime number: (2^136,279,841)-1 or M136279841 (where the number following the letter M represents the exponent). The achievement was detailed on Mersenne.org. Tom's Hardware reports: This is the largest prime number we've seen so far, with the last one, M82589933, being discovered six years prior. What makes this discovery particularly fascinating is that this is the first GIMPS discovery that used the power of data center GPUs. Mihai Preda was the first one to harness GPU muscle in 2017, says the GIMPS website, when he "wrote the GpuOwl program to test Mersenne numbers for primality, making his software available to all GIMPS users." When Luke joined GIMPS in 2023, they built the infrastructure needed to deploy Preda's software across several GPU servers available in the cloud.

While it took a year of testing, Luke's efforts finally bore fruit when an A100 GPU in Dublin, Ireland gave the M136279841 result last October 11. This was then corroborated by an Nvidia H100 located in San Antonio, Texas, which confirmed its primality with the Lucas-Lehmer test.
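
The Lucas-Lehmer test mentioned above is remarkably compact; a minimal sketch (practical only for small exponents; GIMPS uses heavily optimized FFT-based arithmetic for exponents like 136,279,841):

```python
# Lucas-Lehmer test: for an odd prime p, M_p = 2**p - 1 is prime
# iff s == 0 after p - 2 iterations of s -> s**2 - 2 (mod M_p), s0 = 4.
def lucas_lehmer(p: int) -> bool:
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

print([p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19] -- M_11 = 2047 = 23 * 89 is composite
```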

AI

Anthropic's AI Can Now Run And Write Code (techcrunch.com) 23

Anthropic's Claude chatbot can now write and run JavaScript code. TechCrunch: Today, Anthropic launched a new analysis tool that helps Claude respond with what the company describes as "mathematically precise and reproducible answers." With the tool enabled -- it's currently in preview -- Claude can perform calculations and analyze data from files like spreadsheets and PDFs, rendering the results as interactive visualizations.

"Think of the analysis tool as a built-in code sandbox, where Claude can do complex math, analyze data, and iterate on different ideas before sharing an answer," Anthropic wrote in a blog post. "Instead of relying on abstract analysis alone, it can systematically process your data -- cleaning, exploring, and analyzing it step-by-step until it reaches the correct result." Anthropic gives a few examples of where this might be useful. For instance, a product manager could upload sales data and ask Claude for country-specific performance analysis, while an engineer could give Claude monthly financial data and have it create a dashboard highlighting key trends.

Math

Physicist Reveals Why You Should Run in The Rain (sciencealert.com) 116

Theoretical physicist Jacques Treiner, from the University of Paris Cité, explains why you should run in the rain: ... Let p represent the number of drops per unit volume, and let a denote their vertical velocity. We'll denote Sh as the horizontal surface area of the individual (e.g., the head and shoulders) and Sv as the vertical surface area (e.g., the body). When you're standing still, the rain only falls on the horizontal surface, Sh. This is the amount of water you'll receive on these areas. Even if the rain falls vertically, from the perspective of a walker moving at speed v, it appears to fall obliquely, with the angle of the drops' trajectory depending on your speed. During a time period T, a raindrop travels a distance of aT. Therefore, all raindrops within a shorter distance will reach the surface: these are the drops inside a cylinder with a base of Sh and a height of aT, which gives:
p · Sh · a · T

As we have seen, as we move forward, the drops appear to move with an oblique velocity that combines the vertical velocity a and the walking velocity v. The number of drops reaching Sh remains unchanged, since velocity v is horizontal and therefore parallel to Sh. However, the number of drops reaching surface Sv -- which was previously zero when the walker was stationary -- has now increased. This is equal to the number of drops contained within a horizontal cylinder with a base area of Sv and a length of v · T. This length represents the horizontal distance the drops travel during this time interval. In total, the walker receives a number of drops given by the expression:
p · (Sh · a + Sv · v) · T

Now we need to take into account the time interval during which the walker is exposed to the rain. If you're covering a distance d at constant speed v, the time you spend walking is d/v. Plugging this into the equation, the total amount of water you encounter is:
p · (Sh · a + Sv · v) · d/v = p · (Sh · a/v + Sv) · d
This equation proves that the faster you move, the less water hits your head and shoulders, but the amount of water hitting the vertical part of your body remains constant. To stay drier, it's best to move quickly and lean forward. However, you'll have to increase your speed to offset the exposed surface area caused by leaning.
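
The trade-off is easy to see numerically; a minimal sketch of the model above, with invented values for the constants:

```python
# Water collected over a fixed distance d at speed v, per the model above:
#   W(v) = p * (Sh * a / v + Sv) * d   (all constants below are assumptions)
p = 5000.0            # drops per cubic metre
a = 6.0               # vertical raindrop speed, m/s
Sh, Sv = 0.06, 0.40   # horizontal and vertical surface areas, m^2
d = 100.0             # distance to cover, m

def drops(v: float) -> float:
    """Total drops hitting the walker over distance d at speed v."""
    return p * (Sh * a / v + Sv) * d

for v in (1.0, 2.0, 4.0, 8.0):
    print(f"v = {v} m/s: {drops(v):12,.0f} drops")
# The Sh*a/v term shrinks with speed; the p*Sv*d term is a floor you
# can only reduce by leaning forward (shrinking Sv).
```
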
Math

A Calculator's Most Important Button Has Been Removed (theatlantic.com) 108

Apple's latest iOS update has removed the "C" button from its Calculator app, replacing it with a backspace function. The change, part of iOS 18, has sparked debate among users accustomed to the traditional clear function. The removal of the "C" button represents a significant departure from decades-old calculator design conventions, The Atlantic writes. From the story: The "C" button's function is vestigial. Back when calculators were commercialized, starting in the mid-1960s, their electronics were designed to operate as efficiently as possible. If you opened up a desktop calculator in 1967, you might have found a dozen individual circuit boards to run and display its four basic mathematical functions. Among these would have been an input buffer or temporary register that could store an input value for calculation and display. The "C" button, which was sometimes labeled "CE" (Clear Entry) or "CI" (Clear Input), provided a direct interface to zero out -- or "clear" -- such a register. A second button, "AC" (All Clear), did the same thing, but for other parts of the circuit, including previously stored operations and pending calculations. (A traditional calculator's memory buttons -- "M+," "M-," "MC" -- would perform simple operations on a register.)

By 1971, Mostek and Texas Instruments had developed a "calculator on a chip," which condensed all of that into a single integrated circuit. Those chips retained the functions of their predecessors, including the ones that were engaged by "C" and "AC" buttons. And this design continued on into the era of pocket calculators, financial calculators, and even scientific calculators such as the ones you may have used in school. Some of the latter were, in essence, programmable pocket computers themselves, and they could have been configured with a backspace key. They were not.
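
A toy register model makes the three behaviors concrete; a minimal sketch (illustrative only, not Apple's or any vendor's implementation):

```python
# Toy model of the clear-vs-backspace semantics described above.
class Calculator:
    def __init__(self):
        self.entry = ""       # input buffer ("temporary register")
        self.pending = []     # stored operations awaiting evaluation

    def press(self, ch: str):
        self.entry += ch

    def clear_entry(self):    # "C" / "CE": zero out only the input register
        self.entry = ""

    def all_clear(self):      # "AC": also discard pending operations
        self.entry = ""
        self.pending.clear()

    def backspace(self):      # iOS 18-style: drop one character at a time
        self.entry = self.entry[:-1]
```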

Math

52nd Known Mersenne Prime Found (mersenne.org) 61

chalsall writes: After more than six years of work since the last discovery, the Great Internet Mersenne Prime Search (GIMPS) has found the 52nd known Mersenne Prime number. This is also the largest prime number known to humans.

The number is 2^136,279,841-1, which is 41,024,320 decimal digits long.
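
The digit count follows directly from the exponent, since a positive integer N has floor(log10 N) + 1 decimal digits:

```python
from math import floor, log10

p = 136_279_841
# 2**p - 1 has the same digit count as 2**p (2**p is never a power of 10).
print(floor(p * log10(2)) + 1)  # 41024320
```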

Luke Durant, a researcher from San Jose, CA, found it after contributing a fantastic amount of compute to the GIMPS project.

News

'A Nobel For the Big Big Questions' (noahpinion.blog) 15

In a rather critical analysis of the 2024 Economics Nobel, commentator Noah Smith has questioned the prize's shift back to "big-think" theories. He argues that the influential work on institutions and development by Acemoglu, Johnson, and Robinson, the winners of the 2024 Economics Nobel, while intriguing, lacks robust empirical validation. From his blog: The science prizes rely very heavily on external validity to determine who gets the prize -- your theory or your invention has to work, basically. If it doesn't, you can be the biggest genius in the world, but you'll never get a Nobel. The physicist Ed Witten won a Fields Medal, which is even harder to get than a Nobel, for the math he invented for string theory. But he'll almost certainly never get a Physics Nobel, because string theory can't be empirically tested.

The Econ Nobel is different. Traditionally, it's given to economists whose ideas are most influential within the economics profession. If a whole bunch of other economists do research that follows up on your research, or which uses theoretical or empirical techniques you pioneered, you get an Econ Nobel. Your theory doesn't have to be validated, your specific empirical findings can already have been overturned by the time the prize is awarded, but if you were influential, you get the prize.

You could argue that this is appropriate for what Thomas Kuhn would call a "pre-paradigmatic" science -- a field that's still looking for a set of basic concepts and tools. But it's been 55 years since they started giving the prize, and that seems like an awfully long time for a field to still be tooling up. Meanwhile, making "influence within the economics profession" the criterion for successful research seems a little too much like a popularity contest. It's how you end up with prizes like the one in 2004, which was given to some macroeconomic theorists whose theory said that recessions are caused by technological slowdowns and that mass unemployment is a voluntary vacation.

In recent years, that looked like it might be changing. Often, the prize was given to empirical economists associated with the so-called "credibility revolution" -- basically, quasi-experiments. Those cases include Goldin in 2023, Card/Angrist/Imbens in 2021, and Banerjee/Duflo/Kremer in 2019. And when it was given to theorists, they tended to be game theorists whose theories are very predictive of real-world outcomes -- Milgrom/Wilson in 2020, Hart/Holmstrom in 2016, Tirole in 2014, and Roth/Shapley in 2012. Even when the prize was given to macro -- a field where validity is much harder to establish -- it was given to economists whose theories have seen immediate application to pressing problems of the day, such as Bernanke/Diamond/Dybvig in 2022 and Nordhaus in 2018. In other words, the recent Nobels have made it seem like economics might be becoming more like a natural science, where practical applications and external validity are the ultimate arbiter of the value of research, rather than cultural influence within the economics profession. But this year's prize seems like a step away from that, and back toward the sort of big-think that used to be more popular in the prize's early years.

AI

Study Done By Apple AI Scientists Proves LLMs Have No Ability to Reason (appleinsider.com) 233

Slashdot reader Rick Schumann shared this report from the blog AppleInsider: A new paper from Apple's artificial intelligence scientists has found that engines based on large language models, such as those from Meta and OpenAI, still lack basic reasoning skills.

The group has proposed a new benchmark, GSM-Symbolic, to help others measure the reasoning capabilities of various large language models (LLMs). Their initial testing reveals that slight changes in the wording of queries can result in significantly different answers, undermining the reliability of the models. The group investigated the "fragility" of mathematical reasoning by adding contextual information to their queries that a human could understand, but which should not affect the fundamental mathematics of the solution. This resulted in varying answers, which shouldn't happen...

The study found that adding even a single sentence that appears to offer relevant information to a given math question can reduce the accuracy of the final answer by up to 65 percent. "There is just no way you can build reliable agents on this foundation, where changing a word or two in irrelevant ways or adding a few bits of irrelevant info can give you a different answer," the study concluded... "We found no evidence of formal reasoning in language models," the new study concluded. The behavior of LLMs "is better explained by sophisticated pattern matching" which the study found to be "so fragile, in fact, that [simply] changing names can alter results."
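
The perturbation idea behind the benchmark is easy to illustrate; a minimal sketch in the spirit of GSM-Symbolic, with an invented template (not drawn from the paper):

```python
# Generate variants of one word problem by swapping names and numbers;
# a model that truly reasons should stay accurate across all of them.
import random

TEMPLATE = ("{name} picks {n} apples on Monday and {m} more on Tuesday. "
            "How many apples does {name} have now?")
NAMES = ["Sophie", "Liam", "Mei", "Omar"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    n, m = rng.randint(2, 40), rng.randint(2, 40)
    return TEMPLATE.format(name=rng.choice(NAMES), n=n, m=m), n + m

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```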

Math

Researchers Claim New Technique Slashes AI Energy Use By 95% (decrypt.co) 115

Researchers at BitEnergy AI, Inc. have developed Linear-Complexity Multiplication (L-Mul), a technique that reduces AI model power consumption by up to 95% by replacing energy-intensive floating-point multiplications with simpler integer additions. This method promises significant energy savings without compromising accuracy, but it requires specialized hardware to fully realize its benefits. Decrypt reports: L-Mul tackles the AI energy problem head-on by reimagining how AI models handle calculations. Instead of complex floating-point multiplications, L-Mul approximates these operations using integer additions. So, for example, instead of multiplying 123.45 by 67.89, L-Mul breaks it down into smaller, easier steps using addition. This makes the calculations faster and uses less energy, while still maintaining accuracy. The results seem promising. "Applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by element wise floating point tensor multiplications and 80% energy cost of dot products," the researchers claim. Without getting overly complicated, what that means is simply this: If a model used this technique, it would require 95% less energy to think, and 80% less energy to come up with new ideas, according to this research.
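
The flavor of the trick can be sketched with the classic logarithmic-multiplication shortcut, in which adding the integer bit patterns of two floats approximates their product because the exponents add exactly; this Mitchell-style sketch is for illustration only and is not the paper's L-Mul kernel:

```python
# Approximate float multiply via one integer addition (Mitchell-style).
import struct

BIAS = 0x3F800000  # bit pattern of 1.0 as float32

def to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_float(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def approx_mul(x: float, y: float) -> float:
    """Approximate x * y for positive floats; worst-case error ~11%."""
    return to_float(to_bits(x) + to_bits(y) - BIAS)

print(approx_mul(123.45, 67.89))  # close to, but not exactly, the true product
print(123.45 * 67.89)             # 8381.0205...
```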

The algorithm's impact extends beyond energy savings. L-Mul outperforms current 8-bit standards in some cases, achieving higher precision while using significantly less bit-level computation. Tests across natural language processing, vision tasks, and symbolic reasoning showed an average performance drop of just 0.07% -- a negligible tradeoff for the potential energy savings. Transformer-based models, the backbone of large language models like GPT, could benefit greatly from L-Mul. The algorithm seamlessly integrates into the attention mechanism, a computationally intensive part of these models. Tests on popular models such as Llama, Mistral, and Gemma even revealed some accuracy gain on certain vision tasks.

At an operational level, L-Mul's advantages become even clearer. The research shows that multiplying two float8 numbers (the way AI models would operate today) requires 325 operations, while L-Mul uses only 157 -- less than half. "To summarize the error and complexity analysis, L-Mul is both more efficient and more accurate than fp8 multiplication," the study concludes. But nothing is perfect, and this technique has a major Achilles' heel: it requires a special type of hardware, so current hardware isn't optimized to take full advantage of it. Plans for specialized hardware that natively supports L-Mul calculations may already be in motion. "To unlock the full potential of our proposed method, we will implement the L-Mul and L-Matmul kernel algorithms on hardware level and develop programming APIs for high-level model design," the researchers say.

Space

Stephen Hawking Was Wrong - Extremal Black Holes Are Possible (quantamagazine.org) 44

"Even black holes have edge cases," writes Astronomy magazine contributing editor Steve Nadis, in an article in Quanta magazine (republished today by Wired): Black holes rotate in space. As matter falls into them, they start to spin faster; if that matter has charge, they also become electrically charged. In principle, a black hole can reach a point where it has as much charge or spin as it possibly can, given its mass. Such a black hole is called "extremal" — the extreme of the extremes. These black holes have some bizarre properties. In particular, the so-called surface gravity at the boundary, or event horizon, of such a black hole is zero. "It is a black hole whose surface doesn't attract things anymore," said Carsten Gundlach, a mathematical physicist at the University of Southampton. But if you were to nudge a particle slightly toward the black hole's center, it would be unable to escape.

In 1973, the prominent physicists Stephen Hawking, James Bardeen and Brandon Carter asserted that extremal black holes can't exist in the real world — that there is simply no plausible way that they can form. Nevertheless, for the past 50 years, extremal black holes have served as useful models in theoretical physics. "They have nice symmetries that make it easier to calculate things," said Gaurav Khanna of the University of Rhode Island, and this allows physicists to test theories about the mysterious relationship between quantum mechanics and gravity. Now two mathematicians have proved Hawking and his colleagues wrong. The new work — contained in a pair of recent papers by Christoph Kehle of the Massachusetts Institute of Technology and Ryan Unger of Stanford University and the University of California, Berkeley — demonstrates that there is nothing in our known laws of physics to prevent the formation of an extremal black hole.

Their mathematical proof is "beautiful, technically innovative and physically surprising," said Mihalis Dafermos, a mathematician at Princeton University (and Kehle's and Unger's doctoral adviser). It hints at a potentially richer and more varied universe in which "extremal black holes could be out there astrophysically," he added. That doesn't mean they are. "Just because a mathematical solution exists that has nice properties doesn't necessarily mean that nature will make use of it," Khanna said. "But if we somehow find one, that would really [make] us think about what we are missing." Such a discovery, he noted, has the potential to raise "some pretty radical kinds of questions." Before Kehle and Unger's proof, there was good reason to believe that extremal black holes couldn't exist.

Hawking, Bardeen, and Carter believed there was no way an extremal black hole could form, according to the article, and "in 1986, a physicist named Werner Israel seemed to put the issue to rest."

But the two mathematicians, studying the formation of electrically charged black holes, stumbled into a counterexample — and along the way "also constructed two other solutions to Einstein's equations of general relativity that involved different ways of adding charge to a black hole. Having disproved Bardeen, Carter and Hawking's hypothesis in three different contexts, the work should leave no doubt, Unger said... "This is a beautiful example of math giving back to physics," said Elena Giorgi, a mathematician at Columbia University....

In the meantime, a better understanding of extremal black holes can provide further insights into near-extremal black holes, which are thought to be plentiful in the universe. "Einstein didn't think that black holes could be real [because] they're just too weird," Khanna said. "But now we know the universe is teeming with black holes."

For similar reasons, he added, "we shouldn't give up on extremal black holes. I just don't want to put limits on nature's creativity."

AI

OpenAI Releases o1, Its First Model With 'Reasoning' Abilities 108

OpenAI has launched a new AI model, named "o1", designed for improved reasoning and problem-solving skills. o1, part of a new series of models and available in ChatGPT and the API, can tackle complex tasks in science, coding, and math more effectively than its predecessors. Notably, o1 models have shown promising results in standardized tests and coding competitions. While o1 models represent a significant advancement in AI capabilities, they currently lack features like web browsing and file uploading. The Verge adds: But it's also more expensive and slower to use than GPT-4o. OpenAI is calling this release of o1 a "preview" to emphasize how nascent it is.

ChatGPT Plus and Team users get access to both o1-preview and o1-mini starting today, while Enterprise and Edu users will get access early next week. OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT but hasn't set a release date yet. Developer access to o1 is really expensive: In the API, o1-preview is $15 per 1 million input tokens, or chunks of text parsed by the model, and $60 per 1 million output tokens. For comparison, GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens.
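
At those prices, the cost gap is easy to quantify; a quick sketch using the per-token rates quoted above (the workload numbers are invented for illustration):

```python
# USD per 1M input/output tokens, as quoted above.
PRICES = {"o1-preview": (15, 60), "gpt-4o": (5, 15)}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    rate_in, rate_out = PRICES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# Example workload: 2M input tokens, 0.5M output tokens.
for model in PRICES:
    print(f"{model}: ${cost(model, 2_000_000, 500_000):.2f}")
# o1-preview: $60.00 vs. gpt-4o: $17.50
```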

The training behind o1 is fundamentally different from its predecessors, OpenAI's research lead, Jerry Tworek, tells me, though the company is being vague about the exact details. He says o1 "has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it."
Education

College Grades Have Become a Charade. It's Time To Abolish Them. (msn.com) 234

When most students get As, grading loses all meaning as a way to encourage exceptional work and recognize excellence. From a report: Grade inflation at American universities is out of control. The statistics speak for themselves. In 1950, the average GPA at Harvard was estimated at 2.6 out of 4. By 2003, it had risen to 3.4. Today, it stands at 3.8. The more elite the college, the more lenient the standards. At Yale, for example, 80% of grades awarded in 2023 were As or A minuses. But the problem is also prevalent at less selective colleges. Across all four-year colleges in the U.S., the most commonly awarded grade is now an A. Some professors and departments, especially in STEM disciplines, have managed to uphold more stringent criteria. A few advanced courses attract such a self-selecting cohort of students that virtually all of them deserve recognition for genuinely excellent work. But for the most part, the grading scheme at many institutions has effectively become useless. An A has stopped being a mark of special academic achievement.

If everyone outside hard-core engineering, math or pre-med courses can easily get an A, the whole system loses meaning. It fails to make distinctions between different levels of achievement or to motivate students to work hard on their academic pursuits. All the while, it allows students to pretend -- to themselves and to others -- that they are performing exceptionally well. Worse, this system creates perverse incentives. To name but one, it actively punishes those who take risks by enrolling in truly challenging courses. All of this contributes to the strikingly poor record of American colleges in actually educating their students. As Richard Arum and Josipa Roksa showed in their 2011 book "Academically Adrift," the time that the average full-time college student spent studying dropped by half in the five decades after 1960, falling to about a dozen hours a week. A clear majority of college students "showed no significant progress on tests of critical thinking, complex reasoning and writing," with about half failing to make any improvements at all in their first two years of higher education.
