Programming

'Real' Programming Is an Elitist Myth (wired.com) 283

When people build a database to manage reading lists or feed their neighbors, that's coding -- and culture. From an essay: We are past the New York City Covid-19 peak. Things have started to reopen, but our neighborhood is in trouble, and people are hungry. There's a church that's opened space for a food pantry, a restaurant owner who has given herself to feeding the neighborhood, and lots of volunteers. [...] It's a complex data model. It involves date fields, text fields, integers, notes. You need lots of people to log in, but you need to protect private data too. You'd think their planning conversations would be about making lots of rice. But that is just a data point. The tool the mutual aid group has settled on to track everything is Airtable, a database-as-a-service program. You log in and there's your database. There are a host of tools like this now, "low-code" or "no-code" software with names like Zapier or Coda or Appy Pie. At first glance these tools look like flowcharts married to spreadsheets, but they're powerful ways to build little data-management apps. Airtable in particular keeps showing up everywhere for managing office supplies or scheduling appointments or tracking who at WIRED has their fingers on this column. The more features you use, the more they charge for it, and it can add up quickly. I know because I see the invoices at my company; we use it to track projects.
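The "complex data model" the essay gestures at -- date fields, text fields, integers, notes, and per-user access to private data -- can be sketched in ordinary code. A minimal illustration; all field and type names below are invented, not taken from the group's actual Airtable base:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch of the kind of data model the essay describes:
# a mutual aid group tracking meal deliveries in an Airtable-style base.

@dataclass
class Delivery:
    recipient: str          # text field (private data -- restrict who can read it)
    address: str            # text field, also private
    meals: int              # integer field
    scheduled: date         # date field
    notes: str = ""         # long-text field for volunteers

@dataclass
class Volunteer:
    name: str
    can_view_private: bool = False  # crude stand-in for per-user permissions

def visible_fields(d: Delivery, v: Volunteer) -> dict:
    """Hide private columns from volunteers without elevated access."""
    record = {"meals": d.meals, "scheduled": d.scheduled.isoformat(), "notes": d.notes}
    if v.can_view_private:
        record.update({"recipient": d.recipient, "address": d.address})
    return record
```

The point of the low-code tools is that nobody has to write this: the schema and the permission rules are drawn in a UI instead.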

"Real" coders in my experience have often sneered at this kind of software, even back when it was just FileMaker and Microsoft Access managing the flower shop or tracking the cats at the animal shelter. It's not hard to see why. These tools are just databases with a form-making interface on top, and with no code in between. It reduces software development, in all its complexity and immense profitability, to a set of simple data types and form elements. You wouldn't build a banking system in it or a game. It lacks the features of big, grown-up databases like Oracle or IBM's Db2 or PostgreSQL. And since it is for amateurs, the end result ends up looking amateur. But it sure does work. I've noticed that when software lets nonprogrammers do programmer things, it makes the programmers nervous. Suddenly they stop smiling indulgently and start talking about what "real programming" is. This has been the history of the World Wide Web, for example. Go ahead and tweet "HTML is real programming," and watch programmers show up in your mentions to go, "As if." Except when you write a web page in HTML, you are creating a data model that will be interpreted by the browser. This is what programming is. Code culture can be solipsistic and exhausting. Programmers fight over semicolon placement and the right way to be object-oriented or functional or whatever else will let them feel in control and smarter and more economically safe, and always I want to shout back: Code isn't enough on its own. We throw code away when it runs out its clock; we migrate data to new databases, so as not to lose one precious bit. Code is a story we tell about data.

Security

Pen Test Partners: Boeing 747s Receive Critical Software Updates Over 3.5" Floppy Disks (theregister.com) 113

Boeing 747-400s still use floppy disks for loading critical navigation databases, Pen Test Partners has revealed to the infosec community after poking about one of the recently abandoned aircraft. From a report: The eye-catching factoid emerged during a DEF CON video interview of PTP's Alex Lomas, where the man himself gave a walkthrough of a 747-400, its avionics bay and the flight deck. Although airliners are not normally available to curious infosec researchers, a certain UK-based Big Airline's decision to scrap its B744 fleet gave Pen Test Partners a unique opportunity to get aboard one and have a poke about before the scrap merchants set about their grim task.

"Aircraft themselves are really expensive beasts, you know," said Lomas as he filmed inside the big Boeing. "Even if you had all the will in the world, airlines and manufacturers won't just let you pentest an aircraft because [they] don't know what state you're going to leave it in." While giving a tour of the aircraft on video, Lomas pointed out the navigation database loader.

Intel

Will We Someday Write Code Just By Describing It? (zdnet.com) 158

Using millions of programs in online repositories, Intel, Georgia Tech, and MIT researchers created a tool called MISIM (Machine Inferred code Similarity) with a database of code scored by the similarity of its outcomes to suggest alternatives (and corrections) to programmers.

The hope is "to aid developers with nitty-gritty choices like 'what is the most efficient way to use this API' or 'how can I correctly validate this input',"Ryan Marcus, scientist at Intel Labs, told ZDNet. "This should give engineers a lot more time to focus on the elements of their job that actually create a real-world impact..." Justin Gottschlich, the lead for Intel's "machine programming" research team, told ZDNet that as software development becomes ever-more complex, MISIM could have a great impact on productivity. "The rate at which we're introducing senior developers is not on track to match the pace at which we're introducing new chip architectures and software complexity," he said. "With today's heterogeneous hardware — CPUs, GPUs, FPGAs, ASICs, neuromorphic and, soon, quantum chips — it will become difficult, perhaps impossible, to find developers who can correctly, efficiently, and securely program across all of that hardware."

But the long-term goal of machine programming goes even further than assisting software development as it stands today. After all, if a technology can assess intent and come up with relevant snippets of code in response, it doesn't seem far-fetched to imagine that the algorithm could one day be used by any member of the general public with a good software idea. Combined with natural language processing, for example, MISIM could in theory react to verbal clues to one day let people write programs simply by describing them. In other words, an Alexa of sorts, but for software development.

Gottschlich explained that software creation is currently limited to the 27 million people around the world who can code. It is machine programming's ultimate goal to expand that number and one day, let people express their ideas in some other fashion than code — be it natural language, visual diagrams or even gestures.

Intel currently plans to use the new tool internally.

Government

Government's PACER Fees Are Too High, Federal Circuit Says (bloomberglaw.com) 17

An anonymous reader quotes a report from Bloomberg Law: The U.S. government charges too much for access to an electronic database of federal court records, the Federal Circuit ruled in a decision curbing a revenue stream the court system uses to help fund other programs. The U.S. Court of Appeals for the Federal Circuit affirmed a lower court's decision that the government was not authorized under federal law to spend $192 million in Public Access to Court Electronic Records (PACER) system fees on court technology projects. The lower court "got it just right" when it limited the government's use of PACER revenues to the costs of operating the system, the court said in a precedential opinion Thursday.

"We agree with plaintiffs and amici that the First Amendment stakes here are high," the court said. But it said it doesn't foresee the lower court's interpretation "as resulting in a level of user fees that will significantly impede public access to courts." The ruling is a win for public access to court information, as PACER fees will go down if the ruling withstands a possible government appeal. But access still won't be free, despite calls for the government to stop charging for it. The Federal Circuit said it was up to Congress to decide whether to require free access. Challengers said PACER fees were too high, while the government said the middle ground reached by the lower court made the fees too low. Fees for downloading a copy of a filing run 10 cents per page, up to $3 per document. The Administrative Office of the U.S. Courts collected more than $145 million in fees in 2014 alone, according to the complaint in the case. Under a 2020 change to the fee waiver rules, about 75% of users pay nothing each quarter.

Security

LastPass Will Warn You If Your Passwords Show Up On the Dark Web (engadget.com) 34

LastPass is updating its Security Dashboard with a feature that provides an overview of all your accounts, highlighting any passwords that could pose a security risk. The password manager is also introducing dark web monitoring, although it will require you to be a paid LastPass subscriber. Engadget reports: If you already use LastPass and the Security Dashboard sounds familiar, it's because it builds on the Security Challenge functionality LastPass developer LogMeIn added in 2010. As before, grading is a major aspect of the interface. When you first navigate to the Security Dashboard, you'll see a score of all your logins, followed by a breakdown of passwords that are either old, inactive, weak or reused. You can click or tap on a problematic password to change it, and LastPass will automatically take you to the webpage where you can update your login information. LogMeIn hasn't changed how the app calculates the overall score it gives to each user. But one significant improvement the Security Dashboard brings over the Security Challenge is that you don't need to manually run it each time you want to see the security of your online accounts. The score and steps you can take to improve your online security are there each time you visit that part of the software's interface.
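The dashboard's categories (reused, weak, old, inactive) suggest a straightforward audit pass over a vault. A toy sketch of the reused/weak checks -- this is not LastPass's actual scoring, which is more sophisticated, and the length/character heuristics below are illustrative only:

```python
from collections import Counter

def audit(vault: dict, min_length: int = 12) -> dict:
    """Flag reused and weak entries in a site -> password mapping."""
    counts = Counter(vault.values())
    report = {"reused": [], "weak": []}
    for site, pw in vault.items():
        if counts[pw] > 1:                 # same password on multiple sites
            report["reused"].append(site)
        if len(pw) < min_length or pw.isalpha() or pw.isdigit():
            report["weak"].append(site)    # short or single-character-class
    return report

vault = {"mail": "hunter2", "bank": "hunter2", "forum": "c0rrect-horse-battery"}
print(audit(vault))
```

Production tools use calibrated strength estimators (zxcvbn-style) rather than length cutoffs, but the reused-password check really is this simple: it is just counting duplicates.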

With today's update, LogMeIn is also introducing dark web monitoring. When you enable the feature, LastPass will proactively check your online accounts against Enzoic's compromised credentials database. If it detects an issue, it will notify you through both email and the app. Dark web monitoring is available to LastPass Premium, Family and Business subscribers. The dashboard, by contrast, is available to all LastPass users.

China

Will China's AI Surveillance State Go Global? (theatlantic.com) 109

China already has hundreds of millions of surveillance cameras in place, reports the Atlantic's deputy editor, and "because a new regulation requires telecom firms to scan the face of anyone who signs up for cellphone services, phones' data can now be attached to a specific person's face."

But the article also warns that when it comes to AI-powered surveillance, China "could also export it beyond the country's borders, entrenching the power of a whole generation of autocrats" and "shift the balance of power between the individual and the state worldwide..." The country is now the world's leading seller of AI-powered surveillance equipment.... China uses "predatory lending to sell telecommunications equipment at a significant discount to developing countries, which then puts China in a position to control those networks and their data," Michael Kratsios, America's CTO, told me. When countries need to refinance the terms of their loans, China can make network access part of the deal, in the same way that its military secures base rights at foreign ports it finances. "If you give [China] unfettered access to data networks around the world, that could be a serious problem," Kratsios said...

Having set up beachheads* in Asia, Europe, and Africa, China's AI companies are now pushing into Latin America, a region the Chinese government describes as a "core economic interest." China financed Ecuador's $240 million purchase of a surveillance-camera system. Bolivia, too, has bought surveillance equipment with help from a loan from Beijing. Venezuela recently debuted a new national ID-card system that logs citizens' political affiliations in a database built by ZTE.

* The article provides these additional examples:
  • In Malaysia, the government is working with Yitu, a Chinese AI start-up, to bring facial-recognition technology to Kuala Lumpur's police...
  • Chinese companies also bid to outfit every one of Singapore's 110,000 lampposts with facial-recognition cameras.
  • In South Asia, the Chinese government has supplied surveillance equipment to Sri Lanka.
  • On the old Silk Road, the Chinese company Dahua is lining the streets of Mongolia's capital with AI-assisted surveillance cameras.
  • In Serbia, Huawei is helping set up a "safe-city system," complete with facial-recognition cameras and joint patrols conducted by Serbian and Chinese police aimed at helping Chinese tourists to feel safe.
  • Kenya, Uganda, and Mauritius are outfitting major cities with Chinese-made surveillance networks...

Medicine

COVID-19 Hospital Data Is a Hot Mess After Feds Take Control (arstechnica.com) 174

slack_justyb shares a report from Ars Technica: As COVID-19 hospitalizations in the US approach the highest levels seen in the pandemic so far, national efforts to track patients and hospital resources remain in shambles after the federal government abruptly seized control of data collection earlier this month. Watchdogs and public health experts were immediately aghast at the switch to the HHS database, fearing the data would be manipulated for political reasons or hidden from public view altogether. However, the real threat so far has been the administrative chaos. The switch took effect July 15, giving hospitals and states just days to adjust to the new data collection and submission process.

As such, hospitals have been struggling with the new data reporting, which involves reporting more types of data than the CDC's previous system. Generally, the data includes stats on admissions, discharges, beds and ventilators in use and in reserve, as well as information on patients. For some hospitals, that data has to be harvested from various sources, such as electronic medical records, lab reports, pharmacy data, and administrative sources. Some larger hospital systems have been working to write new scripts to automate the new data mining, while others are relying on staff to compile the data manually into Excel spreadsheets, which can take multiple hours each day, according to a report by Healthcare IT News. The task has been particularly onerous for small, rural hospitals and hospitals that are already strained by a crush of COVID-19 patients.
"It seems the obvious of going from a system that is well tested, to something new and alien to everyone is happening exactly as everyone who has ever done these kinds of conversions predicted," adds Slashdot reader slack_justyb.
Security

Hackers Stole GitHub and GitLab OAuth Tokens From Git Analytics Firm Waydev (zdnet.com) 28

Waydev, an analytics platform used by software companies, has disclosed a security breach that occurred earlier this month. From a report: The company says that hackers broke into its platform and stole GitHub and GitLab OAuth tokens from its internal database. Waydev, a San Francisco-based company, runs a platform that can be used to track software engineers' work output by analyzing Git-based codebases. To do this, Waydev runs a special app listed on the GitHub and GitLab app stores. When users install the app, Waydev receives an OAuth token that it can use to access its customers' GitHub or GitLab projects. Waydev stores this token in its database and uses it on a daily basis to generate analytical reports for its customers. Waydev CEO and co-founder Alex Circei told ZDNet today in a phone call that hackers used a blind SQL injection vulnerability to gain access to its database, from where they stole GitHub and GitLab OAuth tokens. The hackers then used some of these tokens to pivot to other companies' codebases and gain access to their source code projects.
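The root cause reported here, a blind SQL injection, arises when user input is spliced directly into a query string. A minimal sqlite3 demonstration of the vulnerable pattern and the parameterized fix; the schema and token values are invented for illustration, not Waydev's:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tokens (customer TEXT, oauth_token TEXT)")
db.execute("INSERT INTO tokens VALUES ('acme', 'gho_secret')")

attacker_input = "nobody' OR '1'='1"

# Unsafe: the attacker's quote breaks out of the string literal, and the
# OR clause matches every row.
unsafe = f"SELECT oauth_token FROM tokens WHERE customer = '{attacker_input}'"
print(db.execute(unsafe).fetchall())        # leaks every token

# Safe: the driver binds the value; the input is never parsed as SQL.
safe = "SELECT oauth_token FROM tokens WHERE customer = ?"
print(db.execute(safe, (attacker_input,)).fetchall())  # []
```

In a blind injection the attacker never sees the leaked rows directly, instead inferring data from response differences, but the underlying fix is the same: bind parameters, never concatenate.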

Science

NIST Study Finds That Masks Defeat Most Facial Recognition Algorithms (venturebeat.com) 46

In a report published today by the National Institute of Standards and Technology (NIST), a physical sciences laboratory and non-regulatory agency of the U.S. Department of Commerce, researchers evaluated the performance of facial recognition algorithms on faces partially covered by protective masks. They report that even the best of the 89 commercial facial recognition algorithms they tested had error rates between 5% and 50% when matching photos of people wearing digitally applied masks against photos of the same person unmasked. From a report: "With the arrival of the pandemic, we need to understand how face recognition technology deals with masked faces," Mei Ngan, a NIST computer scientist and a coauthor of the report, said in a statement. "We have begun by focusing on how an algorithm developed before the pandemic might be affected by subjects wearing face masks. Later this summer, we plan to test the accuracy of algorithms that were intentionally developed with masked faces in mind."

The study -- part of a series from NIST's Face Recognition Vendor Test (FRVT) program conducted in collaboration with the Department of Homeland Security's Science and Technology Directorate, the Office of Biometric Identity Management, and Customs and Border Protection -- explored how well each of the algorithms was able to perform "one-to-one" matching, where a photo is compared with a different photo of the same person. (NIST notes this sort of technique is often used in smartphone unlocking and passport identity verification systems.) The team applied the algorithms to a set of about 6 million photos used in previous FRVT studies, but they didn't test "one-to-many" matching, which is used to determine whether a person in a photo matches any in a database of known images. Because real-world masks differ, the researchers came up with nine mask variants to test, which included differences in shape, color, and nose coverage.
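One-to-one matching, as described above, reduces to comparing two face embeddings against an acceptance threshold. A toy sketch with synthetic vectors; real systems use learned embeddings and carefully calibrated thresholds, and a mask shifts the embedding enough to push genuine pairs below the threshold:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def same_person(emb_a, emb_b, threshold=0.8):
    """One-to-one verification: accept if similarity clears the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

unmasked = [0.9, 0.1, 0.4]
same_face_masked = [0.8, 0.3, 0.5]   # occlusion perturbs the embedding
other_face = [0.1, 0.9, 0.2]

print(same_person(unmasked, same_face_masked))  # True
print(same_person(unmasked, other_face))        # False
```

One-to-many identification runs the same comparison against every entry in a gallery, which is why its error behavior differs and why NIST tested it separately.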

Databases

'Meow' Attack Has Now Wiped Nearly 4,000 Databases (arstechnica.com) 54

On Thursday long-time Slashdot reader PuceBaboon wrote: Ars Technica is reporting a new attack on unprotected databases which, to date, has deleted all content from over 1,000 Elasticsearch and MongoDB databases across the 'net, leaving the calling-card "meow" in its place.

Most people are likely to find this a lot less amusing than a kitty video, so if you have a database instance on a cloud machine, now would be a good time to verify that it is password protected by something other than the default install password...
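A first, crude check is simply whether your database port answers from the outside at all. A sketch (the hostname below is a placeholder); this is only a starting point, to be followed by actually enabling authentication and binding the service to localhost or a private network:

```python
import socket

def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. from a machine OUTSIDE your network:
# port_is_reachable("db.example.internal", 27017)  # MongoDB default port
# port_is_reachable("db.example.internal", 9200)   # Elasticsearch default port
```

If either default port is reachable from the public internet and the service accepts unauthenticated requests, automated sweeps like Meow will find it.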

From the article: The attack first came to the attention of researcher Bob Diachenko on Tuesday, when he discovered a database that stored user details of the UFO VPN had been destroyed. UFO VPN had already been in the news that day because the world-readable database exposed a wealth of sensitive user information... Besides amounting to a serious privacy breach, the database was at odds with the Hong Kong-based UFO's promise to keep no logs. The VPN provider responded by moving the database to a different location but once again failed to secure it properly. Shortly after, the Meow attack wiped it out.
"Attacks have continued and are getting closer to 4,000," reports Bleeping Computer. "A new search on Saturday using Shodan shows that more than 3,800 databases have entry names matching a 'meow' attack. More than 97% of them are Elastic and MongoDB."

Privacy

New York Bans Use of Facial Recognition In Schools Statewide (venturebeat.com) 29

The New York legislature today passed a moratorium banning the use of facial recognition and other forms of biometric identification in schools until 2022. VentureBeat reports: The bill, which has yet to be signed by Governor Andrew Cuomo, appears to be the first in the nation to explicitly regulate the use of the technologies in schools and comes in response to the planned launch of facial recognition by the Lockport City School District. In January, Lockport Schools became one of the only U.S. school districts to adopt facial recognition in all of its K-12 buildings, which serve about 5,000 students. Proponents argued the $1.4 million system could keep students safe by enforcing watchlists and sending alerts when it detected someone dangerous (or otherwise unwanted). But critics said it could be used to surveil students and build a database of sensitive information about people's faces, which the school district then might struggle to keep secure.

While Lockport Schools' privacy policy states the watchlist wouldn't include students and the database would only cover non-students deemed a threat, including sex offenders or those banned by court order, the district's superintendent ultimately oversaw which individuals were added to the system. And it was reported earlier this month that the school board's president, John Linderman, couldn't guarantee that student photos would never be included in the system for disciplinary reasons.
"This is especially important as schools across the state begin to acknowledge the experiences of Black and Brown students being policed in schools and funneled into the school-to-prison pipeline," said Stefanie Coyle, Deputy Director of the Education Policy Center at the New York Civil Liberties Union. "Facial recognition is notoriously inaccurate especially when it comes to identifying women and people of color. For children, whose appearances change rapidly as they grow, biometric technologies' accuracy is even more questionable. False positives, where the wrong student is identified, can result in traumatic interactions with law enforcement, loss of class time, disciplinary action, and potentially a criminal record."

Privacy

Security Breach Exposes More Than One Million DNA Profiles On Major Genealogy Database (buzzfeednews.com) 28

An anonymous reader quotes a report from BuzzFeed News: On July 19, genealogy enthusiasts who use the website GEDmatch to upload their DNA information and find relatives to fill in their family trees got an unpleasant surprise. Suddenly, more than a million DNA profiles that had been hidden from cops using the site to find partial matches to crime scene DNA were available for police to search. The news has undermined efforts by Verogen, the forensic genetics company that purchased GEDmatch last December, to convince users that it would protect their privacy while pursuing a business based on using genetic genealogy to help solve violent crimes.

A second alarm came on July 21, when MyHeritage, a genealogy website based in Israel, announced that some of its users had been subjected to a phishing attack to obtain their log-in details for the site -- apparently targeting email addresses obtained in the attack on GEDmatch just two days before. In a statement emailed to BuzzFeed News and posted on Facebook, Verogen explained that the sudden unmasking of GEDmatch profiles that were supposed to be hidden from law enforcement was "orchestrated through a sophisticated attack on one of our servers via an existing user account." "As a result of this breach, all user permissions were reset, making all profiles visible to all users. This was the case for approximately 3 hours," the statement said. "During this time, users who did not opt in for law enforcement matching were available for law enforcement matching and, conversely, all law enforcement profiles were made visible to GEDmatch users." It's unclear whether any unauthorized profiles were searched by law enforcement.
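The reported failure mode -- a permissions "reset" that made every profile visible -- is an instance of a classic bug: restoring state to a single default instead of preserving per-user choices. A schematic sketch, with hypothetical field names, of why a default-visible reset is dangerous:

```python
from dataclasses import dataclass

@dataclass
class Profile:
    user: str
    visible_to_law_enforcement: bool  # the user's opt-in choice

def reset_permissions_badly(profiles):
    # Dangerous: discards user choices and applies one default to everyone.
    # A safe reset would restore each profile's stored preference instead.
    for p in profiles:
        p.visible_to_law_enforcement = True

profiles = [Profile("a", False), Profile("b", True)]
reset_permissions_badly(profiles)
print(all(p.visible_to_law_enforcement for p in profiles))  # True -- opt-outs lost
```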

Crime

Surveillance Software Scanning File-Sharing Networks Led To 12,000 Arrests (nbcnews.com) 106

Mr. Cooper was a retired high school history teacher using what NBC News calls peer-to-peer networks where "the lack of corporate oversight creates the illusion of safety for people sharing illegal images."
Police were led to Cooper's door by a forensic tool called Child Protection System, which scans file-sharing networks and chatrooms to find computers that are downloading photos and videos depicting the sexual abuse of prepubescent children. The software, developed by the Child Rescue Coalition, a Florida-based nonprofit, can help establish the probable cause needed to get a search warrant... Cooper is one of more than 12,000 people arrested in cases flagged by the Child Protection System software over the past 10 years, according to the Child Rescue Coalition... The Child Protection System, which lets officers search by country, state, city or county, displays a ranked list of the internet addresses downloading the most problematic files...

The Child Protection System "has had a bigger effect for us than any tool anyone has ever created. It's been huge," said Dennis Nicewander, assistant state attorney in Broward County, Florida, who has used the software to prosecute about 200 cases over the last decade. "They have made it so automated and simple that the guys are just sitting there waiting to be arrested." The Child Rescue Coalition gives its technology for free to law enforcement agencies, and it is used by about 8,500 investigators in all 50 states. It's used in 95 other countries, including Canada, the U.K. and Brazil. Since 2010, the nonprofit has trained about 12,000 law enforcement investigators globally. Now, the Child Rescue Coalition is seeking partnerships with consumer-focused online platforms, including Facebook, school districts and a babysitter booking site, to determine whether people who are downloading illegal images are also trying to make contact with or work with minors...

The tool has a growing database of more than a million hashed images and videos, which it uses to find computers that have downloaded them. The software is able to track IP addresses — which are shared by people connected to the same Wi-Fi network — as well as individual devices. The system can follow devices even if the owners move or use virtual private networks, or VPNs, to mask the IP addresses, according to the Child Rescue Coalition.... Before getting a warrant, police typically subpoena the internet service provider to find out who holds the account and whether anyone at the address has a criminal history, has children or has access to children through work.
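Hash-database matching of the kind described works by digesting a file and checking the digest against a set of digests of known files. A generic sketch using a cryptographic hash; real systems of this sort also rely on perceptual hashes, which survive re-encoding and resizing in a way a cryptographic hash does not:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex digest of the file contents."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for a database of digests of known files.
known_hashes = {sha256_of(b"known-file-contents")}

def matches_database(data: bytes) -> bool:
    """True if the file's digest appears in the known-hash database."""
    return sha256_of(data) in known_hashes

print(matches_database(b"known-file-contents"))  # True
print(matches_database(b"anything else"))        # False
```

Because only digests are stored and compared, the database itself never needs to contain the underlying files.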

A lawyer who specializes in digital rights tells NBC that these tools need more oversight and testing. "There's a danger that the visceral awfulness of the child abuse blinds us to the civil liberties concerns. Tools like this hand a great deal of power and discretion to the government. There need to be really strong checks and safeguards."

Security

VPN With 'Strict No-Logs Policy' Exposed Millions of User Log Files (betanews.com) 86

New submitter kimmmos shares a report from BetaNews: An unprotected database belonging to the VPN service UFO VPN was exposed online for more than two weeks. Contained within the database were more than 20 million logs including user passwords stored in plain text. Users of both UFO VPN's free and paid services are affected by the data breach, which was discovered by the security research team at Comparitech. Despite the Hong Kong-based VPN provider claiming to have a "strict no-logs policy" and that any data collected is anonymized, Comparitech says that "based on the contents of the database, users' information does not appear to be anonymous at all." A total of 894GB of data was exposed, and the API access records and user logs included: Account passwords in plain text; VPN session secrets and tokens; IP addresses of both user devices and the VPN servers they connected to; Connection timestamps; Geo-tags; Device and OS characteristics; and URLs that appear to be domains from which advertisements are injected into free users' web browsers. Comparitech notes that this runs counter to UFO VPN's privacy policy.
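Storing passwords in plain text, as reported here, has a standard remedy: a salted, deliberately slow key-derivation function, so a leaked database yields digests rather than credentials. A minimal sketch using Python's standard library; the iteration count is illustrative, and production deployments should follow current parameter guidance:

```python
import hashlib
import hmac
import os

def hash_password(password: str):
    """Derive a salted digest; store (salt, digest), never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))    # False
```

None of this helps with the rest of what UFO VPN logged -- session tokens, IP addresses, and timestamps have to be not collected in the first place, which is what a "no-logs policy" is supposed to mean.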

Earth

The Entire World's Carbon Emissions Will Finally Be Trackable In Real Time (vox.com) 46

An anonymous reader quotes a report from Vox: There's an old truism in the business world: what gets measured gets managed. One of the challenges in managing the greenhouse gas emissions warming the atmosphere is that they aren't measured very well. The ultimate solution to this problem -- the killer app, as it were -- would be real-time tracking of all global greenhouse gases, verified by objective third parties, and available for free to the public. Now, a new alliance of climate research groups called the Climate TRACE (Tracking Real-Time Atmospheric Carbon Emissions) Coalition has launched an effort to make the vision a reality, and they're aiming to have it ready for COP26, the climate meetings in Glasgow, Scotland, in November 2021 (postponed from November 2020). If they pull it off, it could completely change the tenor and direction of international climate talks. It could also make it easier for the hundreds of companies, cities, counties, and states that have made ambitious climate commitments to reliably track their progress.

In addition to [Al Gore, who had been looking for more reliable ways to track emissions] and WattTime, [which intends to create a public database that will track carbon emissions from all the world's large power plants using AI], the coalition now contains:

-Carbon Tracker uses machine learning and satellite data to predict the utilization of every power plant in the world;
-Earthrise Alliance aggregates and organizes publicly available environmental data into a format meaningful to journalists and researchers;
-CarbonPlan uses satellite data to track changes in aboveground biomass (especially forests) and the associated carbon emissions, down to a spatial resolution of 300 meters;
-Hudson Carbon uses satellite data to track changes in agricultural cover, cropping, and tilling, down to the level of the individual field, and compares that data against ground-level sensors;
-OceanMind uses onboard sensors to track the global movement of ships in real time and combines that with engine specs to extrapolate carbon emissions;
-Rocky Mountain Institute combines multiple sources of data to quantify methane emissions from oil and gas infrastructure;
-Hypervine uses spectroscopic imagery to track vehicle usage and blasting at quarries;
-Blue Sky Analytics uses near-infrared and shortwave infrared imagery from satellites to track fires.
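The OceanMind entry above -- combining tracked ship movement with engine specs to extrapolate CO2 -- can be illustrated with a rough bottom-up estimate. Every number below is an illustrative placeholder, not the coalition's model:

```python
# A common emission factor: tonnes of CO2 released per tonne of heavy
# fuel oil burned (order-of-magnitude figure used in marine inventories).
CO2_PER_TONNE_HFO = 3.114

def voyage_emissions(engine_power_kw: float, load_factor: float,
                     hours: float, sfoc_g_per_kwh: float = 190.0) -> float:
    """Tonnes of CO2 from fuel burned over one voyage segment.

    sfoc = specific fuel oil consumption: grams of fuel per kWh produced.
    """
    fuel_tonnes = engine_power_kw * load_factor * hours * sfoc_g_per_kwh / 1e6
    return fuel_tonnes * CO2_PER_TONNE_HFO

# A hypothetical container ship's 30 MW main engine at 70% load for 24 hours:
print(round(voyage_emissions(30_000, 0.70, 24), 1))  # 298.2 tonnes of CO2
```

Summing such segment estimates over every tracked vessel is what turns position data into a fleet-wide emissions figure.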

The coalition will also be gathering data from a variety of other sources, from power grid data to fuel sales, sensor networks, and drones. Gore acknowledges that "this is a work in progress," but says the coalition is aiming big: "everything that can be known about where greenhouse gas emissions are coming from will be known, in near-real time."

Government

White House Reportedly Orders Hospitals To Bypass CDC During COVID-19 Data Collection 189

The Trump administration is now ordering hospitals to send coronavirus patient data to a database in Washington, DC as part of a new initiative that may bypass the Centers for Disease Control and Prevention (CDC), according to a report from The New York Times published on Tuesday. The Verge reports: As outlined in a document (PDF) posted to the website of the Department of Health and Human Services (HHS), hospitals are being ordered to send data directly to the administration, effective tomorrow, a move that has alarmed some within the CDC, according to The Times. The database that will collect and store the information is referred to in the document as HHS Protect, which was built in part by data mining and predictive analytics firm Palantir. The Silicon Valley company is known most for its controversial contract work with the US military and other clandestine government agencies as well as for being co-founded and initially funded by Trump ally Peter Thiel.

"A unique link will be sent to the hospital points of contact. This will direct the [point of care] to a hospital-specific secure form that can then be used to enter the necessary information. After completing the fields, click submit and confirm that the form has been successfully captured," reads the HHS instructions. "A confirmation email will be sent to you from the HHS Protect System. This method replaces the emailing of individual spreadsheets previously requested." While the White House's official reasoning is that this plan will help make data collection on the spread of COVID-19 more centralized and efficient, some current and former public health officials fear the bypassing of the CDC may be an effort to politicize the findings and cut experts out of the loop with regard to federal messaging and guidelines, The Times reports.

The Internet

MIT Removes Huge Dataset That Teaches AI Systems To Use Racist, Misogynistic Slurs (theregister.com) 62

An anonymous reader quotes a report from The Register: MIT has taken offline its highly cited dataset that trained AI systems to potentially describe people using racist, misogynistic, and other problematic terms. The database was removed this week after The Register alerted the American super-college. MIT also urged researchers and developers to stop using the training library, and to delete any copies. "We sincerely apologize," a professor told us. The training set, built by the university, has been used to teach machine-learning models to automatically identify and list the people and objects depicted in still images. For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap. Thanks to MIT's cavalier approach when assembling its training set, though, these systems may also label women as whores or bitches, and Black and Asian people with derogatory language. The database also contained close-up pictures of female genitalia labeled with the C-word. Applications, websites, and other products relying on neural networks trained using MIT's dataset may therefore end up using these terms when analyzing photographs and camera footage.

The problematic training library in question is 80 Million Tiny Images, which was created in 2008 to help produce advanced object-detection techniques. It is, essentially, a huge collection of photos with labels describing what's in the pics, all of which can be fed into neural networks to teach them to associate patterns in photos with the descriptive labels. So when a trained neural network is shown a bike, it can accurately predict a bike is present in the snap. It's called Tiny Images because the pictures in the library are small enough for computer-vision algorithms of the late-2000s and early-2010s to digest. Today, the Tiny Images dataset is used to benchmark computer-vision algorithms along with the better-known ImageNet training collection. Unlike ImageNet, though, no one, until now, has scrutinized Tiny Images for problematic content.
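The kind of scrutiny Tiny Images never received can be sketched as a simple audit pass over the dataset's (image, label) pairs, flagging any entry whose label matches a blocklist of unacceptable terms. This is an illustrative sketch, not MIT's actual remediation process; the blocklist entries and sample data below are placeholders.

```python
# Audit a labeled image dataset against a blocklist of unacceptable terms.
# Dataset entries are (image_id, label) pairs; labels may be multi-word.

BLOCKLIST = {"slur_a", "slur_b"}  # placeholder stand-ins for offensive terms


def audit_labels(dataset, blocklist=BLOCKLIST):
    """Split a labeled dataset into clean and flagged entries."""
    clean, flagged = [], []
    for image_id, label in dataset:
        # Compare case-insensitively, token by token, so a multi-word
        # label containing a blocked term is still caught.
        tokens = {t.lower() for t in label.split()}
        (flagged if tokens & blocklist else clean).append((image_id, label))
    return clean, flagged


sample = [("img_001", "mountain bike"), ("img_002", "slur_a person")]
clean, flagged = audit_labels(sample)
print(len(clean), len(flagged))  # 1 clean entry, 1 flagged entry
```

A real audit of 80 million entries would also need human review of borderline labels, since a fixed wordlist misses context-dependent slurs; the point here is only that the per-entry check is cheap.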


Businesses

AWS Launches 'Amazon Honeycode', a No-Code App Building Service (zdnet.com) 43

"Amazon Web Services on Wednesday launched Amazon Honeycode, a fully-managed service that enables companies to build mobile and web applications without any programming," reports ZDNet: Customers can use the service to build apps that leverage an AWS-built database, such as a simple task-tracking application or a more complex project management app to manage multiple workflows. "Customers have told us that the need for custom applications far outstrips the capacity of developers to create them," AWS VP Larry Augustin said in a statement.

Low-code and no-code tools have been growing in popularity in recent years, enabling people with little or no coding experience to be able to build the applications they need. Other major cloud companies like Salesforce offer low-code app builders. With IT teams stretched thin during the COVID-19 pandemic, low-code tools can prove particularly useful.

Customers "can get started by selecting a pre-built template, where the data model, business logic, and applications are pre-defined and ready-to-use..." Amazon explains in a press release. "Or, they can import data into a blank workbook, use the familiar spreadsheet interface to define the data model, and design the application screens with objects like lists, buttons, and input fields.

"Builders can also add automations to their applications to drive notifications, reminders, approvals, and other actions based on conditions. Once the application is built, customers simply click a button to share it with team members."

Databases

Appeals Court Says California's IMDb-Targeting 'Ageism' Law Is Unconstitutional (techdirt.com) 140

The state of California has lost again in its attempt to punish IMDb for ageism perpetrated by movie studios who seem to refuse to cast actresses above a certain age in choice roles. Techdirt reports: The law passed by the California legislature does one thing: prevents IMDb (and other sites, theoretically) from publishing facts about actors: namely, their ages. This stupid law was ushered into existence by none other than the Screen Actors Guild, capitalizing on a (failed) lawsuit brought against the website by an actress who claimed the publication of her real age cost her millions in Hollywood paychecks. These beneficiaries of the First Amendment decided there was just too much First Amendment in California. To protect actors from studio execs, SAG decided to go after a third-party site respected for its collection of factual information about movies, actors, and everything else film-related.

The federal court handling IMDb's lawsuit against the state made quick work of the state's arguments in favor of very selective censorship. In only six pages, the court destroyed the rationale offered by the government's finest legal minds. [...] Even if the law had somehow survived a First Amendment challenge, it still wouldn't have prevented studios from engaging in discriminatory hiring practices. If this were really the state's concern, it would have stepped up its regulation of the entertainment industry, rather than targeting a single site that was unsuccessfully sued by an actress, who speculated that IMDb's publication of her age was the reason she wasn't landing the roles she wanted.
