AI

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft (wired.com) 27

Harvard University announced Thursday it's releasing a high-quality dataset of nearly one million public-domain books that could be used by anyone to train large language models and other AI tools. From a report: The dataset was created by Harvard's newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta's Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to "level the playing field" by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly refined and curated content repositories that normally only established tech giants have the resources to assemble. "It's gone through rigorous review," he says.

Leppert believes the new public domain database could be used in conjunction with other licensed materials to build artificial intelligence models. "I think about it a bit like the way that Linux has become a foundational operating system for so much of the world," he says, noting that companies would still need to use additional training data to differentiate their models from those of their competitors.

Programming

Amazon Says Developers Spend 'Just One Hour Per Day' on Actual Coding (fortune.com) 152

An anonymous reader shares a report: Amazon Web Services said in a post earlier this month that developers report spending an average of "just one hour per day" on actual coding. But that doesn't mean these workers twiddle their thumbs the remaining seven hours per day. Instead, developers spend the majority of their time on "tedious, undifferentiated tasks such as learning codebases, writing and reviewing documentation, testing, managing deployments, troubleshooting issues or finding and fixing vulnerabilities," according to Amazon Web Services.

Businesses

Startup Will Brick $800 Emotional Support Robot For Kids Without Refunds (arstechnica.com) 144

Startup Embodied is closing down, and its product, an $800 robot for kids ages 5 to 10, will soon be bricked. From a report: Embodied blamed its closure on a failed "critical funding round." On its website, it explained: "We had secured a lead investor who was prepared to close the round. However, at the last minute, they withdrew, leaving us with no viable options to continue operations. Despite our best efforts to secure alternative funding, we were unable to find a replacement in time to sustain operations."

The company didn't provide further details about the pulled funding. Embodied's previous backers have included Intel Capital, Toyota AI Ventures, Amazon Alexa Fund, Sony Innovation Fund, and Vulcan Capital, but we don't know who the lead investor mentioned above is. When it first announced Moxie in April 2020, Embodied described the robot as a "safe and engaging animate companion for children designed to help promote social, emotional, and cognitive development."

AI

Photobucket Sued Over Plans To Sell User Photos, Biometric Identifiers To AI Companies (arstechnica.com) 22

Photobucket was sued Wednesday after a recent privacy policy update revealed plans to sell users' photos -- including biometric identifiers like face and iris scans -- to companies training generative AI models. From a report: The proposed class action seeks to stop Photobucket from selling users' data without first obtaining written consent, alleging that Photobucket either intentionally or negligently failed to comply with strict privacy laws in states like Illinois, New York, and California by claiming it can't reliably determine users' geolocation.

Two separate classes could be protected by the litigation. The first includes anyone who ever uploaded a photo between 2003 -- when Photobucket was founded -- and May 1, 2024. Another potentially even larger class includes any non-users depicted in photographs uploaded to Photobucket, whose biometric data has also allegedly been sold without consent.

Photobucket risks huge fines if a jury agrees with Photobucket users that the photo-storing site unjustly enriched itself by breaching its user contracts and illegally seizing biometric data without consent. As many as 100 million users could be awarded untold punitive damages, as well as up to $5,000 per "willful or reckless violation" of various statutes.

The Internet

Russia Tests Cutting Off Access To Global Web, and VPNs Can't Get Around It (pcmag.com) 123

An anonymous reader shares a report: Russia has reportedly cut some regions of the country off from the rest of the world's internet for a day, effectively siloing them, according to reports from European and Russian news outlets reshared by the US nonprofit Institute for the Study of War (ISW) and Western news outlets.

Russia's communications authority, Roskomnadzor, blocked residents in Dagestan, Chechnya, and Ingushetia, which have majority-Muslim populations, ISW says. The three regions are in southwest Russia near its borders with Georgia and Azerbaijan. People in those areas couldn't access Google, YouTube, Telegram, WhatsApp, or other foreign websites or apps -- even if they used VPNs, according to a local Russian news site.

Russian digital rights NGO Roskomsvoboda told TechRadar that most VPNs didn't work during the shutdown, but some apparently did. It's unclear which ones or how many actually worked, though. Russia has been increasingly blocking VPNs more broadly, and Apple has helped the country's censorship efforts by taking down VPN apps on its Russian App Store. At least 197 VPNs are currently blocked in Russia, according to Russian news agency Interfax.

Stats

'The Dying Language of Accounting' (wsj.com) 177

Paul Knopp, KPMG US CEO, writing in an op-ed on WSJ: According to a United Nations estimate, 230 languages went extinct between 1950 and 2010. If my profession doesn't act, the language of business -- accounting -- could vanish too. The number of students who took the exam to become certified public accountants in 2022 hit a 17-year low. From 2020 to 2022, bachelor's degrees in accounting dropped 7.8% after steady declines since 2018.

While the shortage isn't yet an issue for the country's largest firms, it's beginning to affect our economy and capital markets. In the first half of 2024, nearly 600 U.S.-listed companies reported material weaknesses related to personnel. S&P Global analysts last year warned that many municipalities were at risk of having their credit ratings downgraded or withdrawn due to delayed financial disclosures.

Our profession must remove hurdles to learning the accounting language while preserving quality. In October, KPMG became the first large accounting firm to advocate developing alternate paths to CPA licensing. We want pathways that emphasize experience, not academic credits, after college. Most people today must earn 30 credits after their bachelor's degrees -- the so-called 150-hour rule -- work under a licensed CPA for a year, and pass the CPA exam to become licensed.

Research by the Center for Audit Quality finds that the 150-hour rule is among the top reasons people don't pursue CPA licensure. A December 2023 study found that the requirement causes a 26% drop in interest among minorities. There is a consensus for change, but we can't waste time. Many state CPA societies are working on legislation to create an alternative path to licensure. State boards of accountancy should replace the extra academic requirement with more on-the-job experience. A person who is licensed in one state should be able to practice in another even if reforms create different licensing requirements.

Google

Google's New Jules AI Agent Will Help Developers Fix Buggy Code (theverge.com) 19

Google has announced an experimental AI-powered code agent called "Jules" that can automatically fix coding errors for developers. From a report: Jules was introduced today alongside Gemini 2.0, and uses the updated Google AI model to create multi-step plans to address issues, modify multiple files, and prepare pull requests for Python and JavaScript coding tasks in GitHub workflows.

Microsoft introduced a similar experience for GitHub Copilot last year that can recognize and explain code, alongside recommending changes and fixing bugs. Jules will compete against Microsoft's offering, and also against tools like Cursor and even Claude and ChatGPT's coding abilities. Google's launch of a coding-focused AI assistant is no surprise -- CEO Sundar Pichai said in October that more than a quarter of all new code at the company is now generated by AI.

"Jules handles bug fixes and other time-consuming tasks while you focus on what you actually want to build," Google says in its blog post. "This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding."

Businesses

WordPress Chief Quits Community Forum After Court Loss (404media.co) 133

Automattic CEO Matt Mullenweg abruptly left a key WordPress community platform after a federal court ordered his company to restore rival WP Engine's access to WordPress.org and remove a controversial login requirement. The preliminary injunction mandates Automattic eliminate a checkbox that forced users to declare they had no connection to WP Engine before accessing the platform.

Mullenweg departed the Post Status Slack forum following the ruling, writing he was "sick and disgusted to be legally compelled to provide free labor" to WP Engine, according to 404 Media. "It's hard to imagine wanting to continue to working on WordPress after this," he added. The order gives Automattic 72 hours to comply, including reinstating WP Engine's employee credentials and plugin access. The ruling marks a significant development in an escalating dispute between the WordPress parent company and the web hosting provider.

Science

New Magnetic Flow Has Potential To Revolutionise Electronic Devices (ft.com) 40

An international research team has for the first time imaged and controlled a type of magnetic flow called altermagnetism, which physicists say could be used to develop faster and more reliable electronic devices. Financial Times: A groundbreaking experiment at a powerful X-ray microscope in Sweden provides direct proof of the existence of altermagnetism, according to a paper published in Nature on Wednesday. Altermagnetic materials can sustain magnetic activity without themselves being magnetic.

The team from the UK's Nottingham university that led the research said the discovery has revolutionary potential for the electronics industry. "Altermagnets have the potential to lead to a thousand-fold increase in the speed of microelectronic components and digital memory, while being more robust and energy-efficient," said senior author Peter Wadley, Royal Society research fellow at Nottingham.

Hard disks and other components underpinning the modern computer industry process data in ferromagnetic materials, whose intrinsic magnetism limits their speed and packing density. Using altermagnetic materials will allow current to flow in non-magnetic products.

Google

Google Unveils Project Mariner: AI Agents To Use the Web For You 49

Google today unveiled Project Mariner, its first AI agent capable of autonomously navigating web browsers, operating through a Chrome extension that controls cursor movements and form-filling to replicate human interactions online.

The Gemini-powered prototype, developed by Google's DeepMind division, is initially available to a select group of testers. During demos, the agent performed tasks like creating shopping carts on grocery websites, though with noticeable five-second delays between actions. The system captures browser screenshots and processes them through Gemini in the cloud to generate navigation commands.

It operates only in Chrome's active tab, requiring users to observe its actions rather than running in the background. Project Mariner achieved an 83.5% success rate on the WebVoyager benchmark for web-based tasks. The agent has built-in limitations, including inability to complete purchases, accept cookies, or agree to terms of service. Google Labs Director Jaclyn Konzelmann described the project as a "fundamentally new UX paradigm shift" that could transform how users interact with websites. The company said it is engaging with web ecosystem stakeholders as development continues.

The Military

'Modern War Cannot Be Won Without Software,' Palantir Executive Says (calcalistech.com) 151

Software has become essential for winning modern wars, a senior Palantir executive told a defense conference in Israel this week. "Modern war cannot be won without software," said Noam Perski, executive vice president at the data analytics company.

"Seeing software as a defense system, as a weapon system, and the most malleable weapon system we have, is really important as we build the next generation's capabilities." Speaking at Tel Aviv University's first DefenseTech Summit, Perski said human factors still determine military success.

Security

Researchers Uncover Chinese Spyware Used To Target Android Devices (techcrunch.com) 34

Security researchers have uncovered a new surveillance tool that they say has been used by Chinese law enforcement to collect sensitive information from Android devices in China. From a report: The tool, named "EagleMsgSpy," was discovered by researchers at U.S. cybersecurity firm Lookout. The company said at the Black Hat Europe conference on Wednesday that it had acquired several variants of the spyware, which it says has been operational since "at least 2017."

Kristina Balaam, a senior intelligence researcher at Lookout, told TechCrunch the spyware has been used by "many" public security bureaus in mainland China to collect "extensive" information from mobile devices. This includes call logs, contacts, GPS coordinates, bookmarks, and messages from third-party apps including Telegram and WhatsApp. EagleMsgSpy is also capable of initiating screen recordings on smartphones, and can capture audio recordings of the device while in use, according to research Lookout shared with TechCrunch.

A manual obtained by Lookout describes the app as a "comprehensive mobile phone judicial monitoring product" that can obtain "real-time mobile phone information of suspects through network control without the suspect's knowledge, monitor all mobile phone activities of criminals and summarize them."

United Kingdom

UK Low-Carbon Renewable Power Set To Overtake Fossil Fuels For First Time 62

Rising renewables, low demand and cheaper power imports all helped reduce fossil fuel use in the UK power system to record lows. From a report: For the first full year, wind, solar, and hydropower will generate more electricity than all fossil fuels combined. Homegrown UK renewable power will cross a significant threshold in 2024, overtaking fossil fuel generation for the first full year. Wind, solar and hydropower are set to generate a combined 37% of UK electricity in 2024 (103 TWh), compared to 35% from fossil fuels (97 TWh). Just three years ago, in 2021, fossil fuels generated 46% of UK electricity, while low-carbon renewables generated 27%.

Including biomass, renewables overtook fossil fuels in the UK in 2020, fell below fossil power the following year as biomass production fell, and again overtook in 2023. However, Ember's analysis raises concerns about biomass being categorised as clean power in the UK, given the significant emissions risks and lack of domestic pellet production. Bioenergy, which includes biomass and biogas power, is set to provide 14% of UK electricity in 2024.

Fossil generation in 2024 has fallen by two-thirds since 2000, with the long-awaited phase-out of coal power, and gas increasingly displaced by cheaper, cleaner power sources. Coal started to decline rapidly from 2012, and since 2020 coal power has made up only 2% of generation in the UK, dropping to zero by October 2024. Gas has seen a gradual decline since 2016. Across 2024 there has been a large decrease in fossil gas power, which provided 30% of electricity in 2024 (85 TWh), down from 34% in 2023 (98 TWh).

AI

AI App Gold Rush Floods Apple Store With Low-Quality Offerings (theverge.com) 23

AI-powered apps have flooded Apple's App Store, with AI-branded tools dominating top rankings across multiple categories, particularly in graphics and design. An investigation by The Verge reveals significant quality concerns among these applications.

Turkey-based developer HUBX controls three of the top 10 graphics apps, including DaVinci AI, which offers limited free features while charging up to $30 annually for full access. The app produces low-quality images and forces watermarks on paid users' downloads, The Verge writes. According to Sensor Tower data, four of the top 10 most downloaded iOS graphics apps in the U.S. this year include "AI" in their titles.

While established photo editing apps like Photoshop Express saw downloads drop 21%, AI-focused app Photoroom's downloads surged 160% year-over-year. Professional creative apps continue to dominate iPad and paid iPhone categories, suggesting the AI app trend primarily targets casual users seeking free alternatives to paid creative services.

Transportation

Cruise Employees 'Blindsided' By GM's Plan To End Robotaxi Program (techcrunch.com) 70

An anonymous reader shares a report: The news came by Slack message. Cruise CEO Marc Whitten, who took the top post in June, posted a message Tuesday afternoon in the company's announcements channel along with a link to a press release entitled "GM to refocus autonomous driving development on personal vehicles."

GM, which acquired the self-driving car startup in 2016, would no longer fund the company, ending a mission that hundreds of Cruise engineers had worked on for years. Minutes later, during an all-hands meeting, Cruise employees learned a few more details. The self-driving car company would be absorbed into parent company GM and combined with the automaker's own efforts to develop driver assistance features -- and eventually fully autonomous personal vehicles. Whether their jobs would be safe or cut was, and still is, unclear.

That meeting was short and unsatisfactory, according to one source, who noted that the senior leadership team was also surprised by this turn of events. Whitten, president and chief technology officer Mo Elshenawy, and chief administrative officer Craig Glidden led the all-hands. Several Cruise employees who spoke to TechCrunch on condition of anonymity said they were "surprised" and "blindsided" by the decision. One source told TechCrunch that employees learned about GM's plans at the same time the media did.

The Courts

WordPress Parent Company Must Stop Blocking WP Engine, Judge Rules (theverge.com) 66

WP Engine just won a preliminary injunction against WordPress.com parent company Automattic. On Tuesday, a California District Court judge ordered Automattic to stop blocking WP Engine's access to WordPress.org resources and interfering with its plugins. From a report: The preliminary injunction comes after WP Engine, a third-party WordPress hosting service, filed a lawsuit that accused Automattic and its CEO, Matt Mullenweg, of "multiple forms of immediate irreparable harm." It later asked the court to stop Mullenweg from restricting WP Engine's access to WordPress.org.

Mullenweg waged a public campaign against WP Engine in September, accusing the service of misusing the WordPress trademark and not contributing enough to the WordPress community. After blocking WP Engine from WordPress.org's servers, Automattic took control of WP Engine's ACF Plugin.

Google

Google Asks FTC To Kill Microsoft's Exclusive Cloud Deal with OpenAI (theinformation.com) 17

An anonymous reader shares a report: Google recently asked the U.S. government to break up Microsoft's exclusive agreement to host OpenAI's technology on its cloud servers, according to a person who has been directly involved in the effort. The conversation took place after the Federal Trade Commission, one of the primary federal antitrust enforcement agencies, asked Google about Microsoft's business practices as part of a broader investigation, this person said.

Firms that compete with Microsoft in renting out cloud servers, including Google and Amazon, want to host OpenAI's artificial intelligence themselves so their cloud customers don't need to also tap Microsoft servers to get access to the startup's technology, this person said.

Businesses

Amazon is Officially in the Online Car Sales Business (techcrunch.com) 32

Amazon expanded Tuesday into online car sales with the launch of Amazon Autos, an e-commerce business that lets customers find, order, and buy new cars, trucks, and SUVs from dealerships. From a report: Amazon is kicking off the new endeavor with Hyundai in 48 U.S. cities, including Atlanta, Boston, Chicago, Los Angeles, and New York. The launch comes a little more than a year since the e-commerce giant announced plans to start selling vehicles on its website in the second half of 2024. Amazon said it will add more cities and additional auto manufacturers in 2025.

Amazon Autos will function, in many ways, like the rest of the broader Amazon e-commerce ecosystem. Shoppers will be able to search for available vehicles from participating dealers by model, trim, color, and features. Notably, customers will also be able to secure financing and e-sign paperwork via the Amazon Autos site. Once the payment is finalized, customers can schedule when to pick up their vehicle from that dealership. When vehicles go on sale at Amazon, the local dealer (for now just Hyundai dealers) will be the seller of record. Amazon Autos will even handle trade-ins.

Australia

Investigation Launched Into Queensland Lab Breach, With Vials of Deadly Viruses Missing (abc.net.au) 59

An anonymous reader shares a report: Nearly 100 live samples of the deadly Hendra virus have been lost in a biosecurity bungle at a state-run Queensland laboratory. An investigation has been launched after it was revealed 323 virus samples went missing from the Virology Laboratory in 2021 in a "major breach" of biosecurity protocol, Health Minister Tim Nicholls announced on Monday.

The material, which included samples of Hendra virus, lyssavirus and hantavirus, appears to have gone missing after a freezer storing the samples broke down. Mr Nicholls said the breach was uncovered in August 2023. The lab has been unable to say whether the materials were removed or destroyed. "It's this part of the transfer of those materials that is causing concern," Mr Nicholls said.

Programming

Open Source Maintainers Are Drowning in Junk Bug Reports Written By AI (theregister.com) 91

An anonymous reader shares a report: Software vulnerability submissions generated by AI models have ushered in a "new era of slop security reports for open source" -- and the devs maintaining these projects wish bug hunters would rely less on results produced by machine learning assistants. Seth Larson, security developer-in-residence at the Python Software Foundation, raised the issue in a blog post last week, urging those reporting bugs not to use AI systems for bug hunting.

"Recently I've noticed an uptick in extremely low-quality, spammy, and LLM-hallucinated security reports to open source projects," he wrote, pointing to similar findings from the Curl project in January. "These reports appear at first glance to be potentially legitimate and thus require time to refute." Larson argued that low-quality reports should be treated as if they're malicious.

As if to underscore the persistence of these concerns, a Curl project bug report posted on December 8 shows that nearly a year after maintainer Daniel Stenberg raised the issue, he's still confronted by "AI slop" -- and wasting his time arguing with a bug submitter who may be partially or entirely automated.
