Privacy

Man Behind LinkedIn Scraping Said He Grabbed 700 Million Profiles 'For Fun' (9to5mac.com) 27

The man behind last month's scraping of LinkedIn data, which exposed the locations, phone numbers, and inferred salaries of 700 million users, says that he did it "for fun" -- though he is also selling the data. 9to5Mac reports: BBC News spoke with the man who took the data, operating under the name Tom Liner: "How would you feel if all your information was catalogued by a hacker and put into a monster spreadsheet with millions of entries, to be sold online to the highest paying cyber-criminal? That's what a hacker calling himself Tom Liner did last month 'for fun' when he compiled a database of 700 million LinkedIn users from all over the world, which he is selling for around $5,000 [...]. In the case of Mr Liner, his latest exploit was announced at 08:57 BST in a post on a notorious hacking forum [...] 'Hi, I have 700 million 2021 LinkedIn records,' he wrote. Included in the post was a link to a sample of a million records and an invite for other hackers to contact him privately and make him offers for his database."

Liner says he was also behind the scraping of 533 million Facebook profiles back in April (you can check whether your data was grabbed): "Tom told me he created the 700 million LinkedIn database using 'almost the exact same technique' that he used to create the Facebook list. He said: 'It took me several months to do. It was very complex. I had to hack the API of LinkedIn. If you do too many requests for user data in one time then the system will permanently ban you.'"

Databases

The Case Against SQL (scattered-thoughts.net) 297

Long-time Slashdot reader RoccamOccam shares "an interesting take on SQL and its issues" from Jamie Brandon (who describes himself as an independent researcher who's built database engines, query planners, compilers, developer tools and interfaces).

Its title? "Against SQL." The relational model is great... But SQL is the only widely-used implementation of the relational model, and it is: Inexpressive, Incompressible, Non-porous. This isn't just a matter of some constant programmer overhead, like SQL queries taking 20% longer to write. The fact that these issues exist in our dominant model for accessing data has dramatic downstream effects for the entire industry:

- Complexity is a massive drag on quality and innovation in runtime and tooling
- The need for an application layer with hand-written coordination between database and client renders useless most of the best features of relational databases

The core message that I want people to take away is that there is potentially a huge amount of value to be unlocked by replacing SQL, and more generally in rethinking where and how we draw the lines between databases, query languages and programming languages...

I'd like to finish with this quote from Michael Stonebraker, one of the most prominent figures in the history of relational databases:

"My biggest complaint about System R is that the team never stopped to clean up SQL... All the annoying features of the language have endured to this day. SQL will be the COBOL of 2020..."

It's been interesting to follow the discussion on Twitter, where the post's author tweeted screenshots of actual SQL code to illustrate various shortcomings. But he also notes that "The SQL spec (part 2 = 1732 pages) is more than twice the length of the Javascript 2021 spec (879 pages), almost matches the C++ 2020 spec (1853 pages) and contains 411 occurrences of 'implementation-defined', occurrences which include type inference and error propagation."

His Twitter feed also includes a supportive retweet from Rust creator Graydon Hoare, and from a Tetrane developer who says "The Rust of SQL remains to be invented. I would like to see it come."
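
The second bullet point above, the hand-written coordination between database and client, is the easiest complaint to make concrete. Because SQL crosses into the host language only as opaque strings and untyped rows, applications end up writing mapping glue like the following minimal sketch, which uses Python's built-in sqlite3 module with a schema and dataclass invented purely for illustration:

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical schema and record type, invented for this sketch.
@dataclass
class Employee:
    id: int
    name: str
    salary: float

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Ada", 120000.0), (2, "Grace", 95000.0)])

# The query is an opaque string: the database cannot share types, names, or
# reusable expressions with the host language, so the column list and the
# row-to-object mapping are both repeated by hand on the client side.
rows = conn.execute(
    "SELECT id, name, salary FROM employees WHERE salary > ?", (100000,))
employees = [Employee(id=r[0], name=r[1], salary=r[2]) for r in rows]
print(employees)  # [Employee(id=1, name='Ada', salary=120000.0)]
```

Object-relational mappers exist largely to generate this glue automatically, which is arguably the essay's point: the coordination has to live somewhere outside the query language.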

Government

EPA Approved Toxic Chemicals For Fracking a Decade Ago, New Files Show (nytimes.com) 137

An anonymous reader quotes a report from The New York Times: For much of the past decade, oil companies engaged in drilling and fracking have been allowed to pump into the ground chemicals that, over time, can break down into toxic substances known as PFAS -- a class of long-lasting compounds known to pose a threat to people and wildlife -- according to internal documents from the Environmental Protection Agency. The E.P.A. in 2011 approved the use of these chemicals, used to ease the flow of oil from the ground, despite the agency's own grave concerns about their toxicity, according to the documents, which were reviewed by The New York Times. The E.P.A.'s approval of the three chemicals wasn't previously publicly known. The records, obtained under the Freedom of Information Act by a nonprofit group, Physicians for Social Responsibility, are among the first public indications that PFAS, long-lasting compounds also known as "forever chemicals," may be present in the fluids used during drilling and hydraulic fracturing, or fracking.

In a consent order issued for the three chemicals on Oct. 26, 2011, E.P.A. scientists pointed to preliminary evidence that, under some conditions, the chemicals could "degrade in the environment" into substances akin to PFOA, a kind of PFAS chemical, and could "persist in the environment" and "be toxic to people, wild mammals, and birds." The E.P.A. scientists recommended additional testing. Those tests were not mandatory and there is no indication that they were carried out. "The E.P.A. identified serious health risks associated with chemicals proposed for use in oil and gas extraction, and yet allowed those chemicals to be used commercially with very lax regulation," said Dusty Horwitt, researcher at Physicians for Social Responsibility. [...] There is no public data that details where the E.P.A.-approved chemicals have been used. But the FracFocus database, which tracks chemicals used in fracking, shows that about 120 companies used PFAS -- or chemicals that can break down into PFAS; the most common of which was "nonionic fluorosurfactant" and various misspellings -- in more than 1,000 wells between 2012 and 2020 in Texas, Arkansas, Louisiana, Oklahoma, New Mexico, and Wyoming. Because not all states require companies to report chemicals to the database, the number of wells could be higher. Nine of those wells were in Carter County, Okla., within the boundaries of Chickasaw Nation. "This isn't something I was aware of," said Tony Choate, a Chickasaw Nation spokesman. [...] The findings underscore how, for decades, the nation's laws governing various chemicals have allowed thousands of substances to go into commercial use with relatively little testing. The E.P.A.'s assessment was carried out under the 1976 Toxic Substances Control Act, which authorizes the agency to review and regulate new chemicals before they are manufactured or distributed.
"[T]he Toxic Substances Control Act grandfathered in thousands of chemicals already in commercial use, including many PFAS chemicals," the report says. "In 2016, Congress strengthened the law, bolstering the E.P.A.'s authority to order health testing, among other measures. The Government Accountability Office, the watchdog arm of Congress, still identifies the Toxic Substances Control Act as a program with one of the highest risks of abuse and mismanagement." According to a recent report from the Intercept, "the E.P.A. office in charge of reviewing toxic chemicals tampered with the assessments of dozens of chemicals to make them appear safer."

Republicans

Hackers Scrape 90,000 GETTR User Emails, Surprising No One (vice.com) 75

Just days after its launch, hackers have already found a way to take advantage of GETTR's buggy API to get the username, email address, and location of thousands of users. Motherboard reports: Hackers were able to scrape the email addresses and other data of more than 90,000 GETTR users. On Tuesday, a user of a notorious hacking forum posted a database that they claimed was a scrape of all users of GETTR, the new social media platform launched last week by Trump's former spokesman Jason Miller, who pitched it as an alternative to "cancel culture." The data seen by Motherboard includes email addresses, usernames, status, and location. One of the people whose email is in the database confirmed to Motherboard that they are indeed registered to GETTR. Motherboard also verified the database by attempting to create an account with three email addresses that appear in the database. When doing that, the site displayed the message: "The email is taken," suggesting it's already registered. It's unclear if the database contains the usernames and email addresses of all users on the site. Alon Gal, the co-founder and CTO of cybersecurity firm Hudson Rock, found the forum post with the database. "When threat actors are able to extract sensitive information due to neglectful API implementations, the consequence is equivalent to a data breach and should be handled accordingly by the firm and to be examined by regulators," he told Motherboard in an online chat.

Crime

A Threat to Privacy in the Expanded Use of License Plate-Scanning Cameras? (yahoo.com) 149

Long-time Slashdot reader BigVig209 shares a Chicago Tribune report "on how suburban police departments in the Chicago area use license plate cameras as a crime-fighting tool." Critics of the cameras note that only a tiny percentage of the billions of plates photographed lead to an arrest, and that the cameras generally haven't been shown to prevent crime. More importantly they say the devices are unregulated, track innocent people and can be misused to invade drivers' privacy. The controversy comes as suburban police departments continue to expand the use of the cameras to combat rising crime. Law enforcement officials say they are taking steps to safeguard the data. But privacy advocates say the state should pass a law to ensure against improper use of a nationwide surveillance system operated by private companies.

Across the Chicago area, one survey by the nonprofit watchdog group Muckrock found 88 cameras used by more than two dozen police agencies. In response to a surge in shootings, after much delay, state police are taking steps to add the cameras to area expressways. In the northwest suburbs, Vernon Hills and Niles are among several departments that have added license plate cameras recently. The city of Chicago has ordered more than 200 cameras for its squad cars. In Indiana, the city of Hammond has taken steps to record nearly every vehicle that comes into town.

Not all police like the devices. In the southwest suburbs, Darien and La Grange had issues in years past with the cameras making false readings, and some officers stopped using them...

Homeowner associations may also tie their cameras into the systems, which is what led to the arrest in Vernon Hills. One of the leading sellers of such cameras, Vigilant Solutions, a part of Chicago-based Motorola Solutions, has collected billions of license plate numbers in its National Vehicle Location Service. The database shares information from thousands of police agencies, and can be used to find cars across the country... Then there is the potential for abuse by police. One investigation found that officers nationwide misused agency databases hundreds of times, to check on ex-girlfriends, romantic rivals, or perceived enemies. To address those concerns, 16 states have passed laws restricting the use of the cameras.

The article cites an EFF survey which found 99.5% of scanned plates weren't under suspicion — "and that police shared their data with an average of 160 other agencies."

"Two big concerns the American Civil Liberties Union has always had about the cameras are that the information can be used to track the movements of the general population, and often is sold by operators to third parties like credit and insurance companies."

EU

OpenStreetMap Looks To Relocate To EU Due To Brexit Limitations (theguardian.com) 99

OpenStreetMap, the Wikipedia-for-maps organisation that seeks to create a free and open-source map of the globe, is considering relocating to the EU, almost 20 years after it was founded in the UK by the British entrepreneur Steve Coast. From a report: OpenStreetMap Foundation, which was formally registered in 2006, two years after the project began, is a limited company registered in England and Wales. Following Brexit, the organisation says the lack of agreement between the UK and EU could render its continued operation in Britain untenable. "There is not one reason for moving, but a multitude of paper cuts, most of which have been triggered or amplified by Brexit," Guillaume Rischard, the organisation's treasurer, told members of the foundation in an email sent earlier this month.

One "important reason," Rischard said, was the failure of the UK and EU to agree on mutual recognition of database rights. While both have an agreement to recognise copyright protections, that only covers work which is creative in nature. Maps, as a simple factual representation of the world, are not covered by copyright in the same way, but until Brexit were covered by an EU-wide agreement that protected databases where there had been "a substantial investment in obtaining, verifying or presenting the data." But since Brexit, any database made on or after 1 January 2021 in the UK will not be protected in the EU, and vice versa.

Security

LinkedIn Breach Reportedly Exposes Data of 92% of Users, Including Inferred Salaries (9to5mac.com) 47

A second massive LinkedIn breach reportedly exposes the data of 700M users, which is more than 92% of the total 756M users. The database is for sale on the dark web, with records including phone numbers, physical addresses, geolocation data, and inferred salaries. 9to5Mac reports: RestorePrivacy reports that the hacker appears to have misused the official LinkedIn API to download the data, the same method used in a similar breach back in April: "On June 22nd, a user of a popular hacker forum advertised data from 700 Million LinkedIn users for sale. The user of the forum posted up a sample of the data that includes 1 million LinkedIn users. We examined the sample and found it to contain the following information: Email Addresses; Full names; Phone numbers; Physical addresses; Geolocation records; LinkedIn username and profile URL; Personal and professional experience/background; Genders; and Other social media accounts and usernames."

With the previous breach, LinkedIn did confirm that the 500M records included data obtained from its servers, but claimed that more than one source was used. PrivacyShark notes that the company has issued a similar statement this time: "While we're still investigating this issue, our initial analysis indicates that the dataset includes information scraped from LinkedIn as well as information obtained from other sources. This was not a LinkedIn data breach and our investigation has determined that no private LinkedIn member data was exposed. Scraping data from LinkedIn is a violation of our Terms of Service and we are constantly working to ensure our members' privacy is protected."

Intel

Intel To Disable TSX By Default On More CPUs With New Microcode (phoronix.com) 46

Intel is going to be disabling Transactional Synchronization Extensions (TSX) by default for various Skylake through Coffee Lake processors with forthcoming microcode updates. Phoronix reports: Transactional Synchronization Extensions (TSX) have been around since Haswell for hardware transactional memory support and going off Intel's own past numbers can be around 40% faster in specific workloads or as much as 4~5 times faster in database transaction benchmarks. TSX issues have been found in the past such as a possible side channel timing attack that could lead to KASLR being defeated and CVE-2019-11135 (TSX Async Abort) for an MDS-style flaw. Now in 2021 Intel is disabling TSX by default across multiple families of Intel CPUs from Skylake through Coffee Lake. [...] The Linux kernel is preparing for this microcode change as seen in the flow of new patches this morning for the 5.14 merge window.

A memory ordering issue is what is reportedly leading Intel to now deprecate TSX on various processors. There is this Intel whitepaper (PDF) updated this month that outlines the problem at length. As noted in the revision history, the memory ordering issue has been known to Intel since at least before October 2018, but only now in June 2021 are they pushing out microcode updates to disable TSX by default. The forthcoming microcode updates will effectively deprecate TSX for all Skylake Xeon CPUs prior to Stepping 5 (including Xeon D and 1st Gen Xeon Scalable), all 6th Gen Xeon E3-1500m v5 / E3-1200 v5 Skylake processors, all 7th/8th Gen Core and Pentium Kaby/Coffee/Whiskey Lake CPUs prior to 0x8 stepping, and all 8th/9th Gen Core/Pentium Coffee Lake CPUs prior to 0xC stepping. That ultimately spans from various Skylake steppings through Coffee Lake; it was with 10th Gen Comet Lake and Ice Lake that TSX/TSX-NI was subsequently removed.

In addition to disabling TSX by default and force-aborting all RTM transactions by default, a new CPUID bit is being enumerated with the new microcode to indicate the force-aborting of RTM transactions. It's due to that new CPUID bit that the Linux kernel is seeing patches. Previously, Linux and other operating systems applied a workaround for the TSX memory ordering issue, but now that the feature is disabled, the kernel can drop said workaround. These patches are coming with the Linux 5.14 cycle and will likely be back-ported to stable too.
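
For readers wondering whether a particular Linux machine still advertises TSX after the microcode and kernel changes, the kernel exposes the relevant feature flags (hle and rtm) in /proc/cpuinfo. A minimal sketch of that check; what appears there depends on the installed microcode, the kernel version, and any tsx= boot parameter:

```python
# Report whether the kernel still sees TSX (HLE/RTM) on this CPU.
# Linux-only: reads the feature flags the kernel exposes in /proc/cpuinfo,
# which reflect the installed microcode and any tsx= boot parameter.
def tsx_flags() -> dict[str, bool]:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return {"hle": "hle" in flags, "rtm": "rtm" in flags}
    return {"hle": False, "rtm": False}

if __name__ == "__main__":
    print(tsx_flags())  # e.g. {'hle': False, 'rtm': False} once TSX is disabled
```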

China

Scientist Finds Early Virus Sequences That Had Been Mysteriously Deleted (seattletimes.com) 336

UPDATE (7/30): All the missing virus sequences have now been published, with their deletion being explained as just "an editorial oversight by a scientific journal," according to the New York Times.

In Slashdot's original report, an anonymous reader quoted another report from The New York Times: About a year ago, genetic sequences from more than 200 virus samples from early cases of Covid-19 in Wuhan disappeared from an online scientific database. Now, by rooting through files stored on Google Cloud, a researcher in Seattle reports that he has recovered 13 of those original sequences -- intriguing new information for discerning when and how the virus may have spilled over from a bat or another animal into humans. The new analysis, released on Tuesday, bolsters earlier suggestions that a variety of coronaviruses may have been circulating in Wuhan before the initial outbreaks linked to animal and seafood markets in December 2019. As the Biden administration investigates the contested origins of the virus, known as SARS-CoV-2, the study neither strengthens nor discounts the hypothesis that the pathogen leaked out of a famous Wuhan lab. But it does raise questions about why original sequences were deleted, and suggests that there may be more revelations to recover from the far corners of the internet.
UPDATE (6/25): The Washington Post notes the data wasn't exactly suppressed. "Processed forms of the same data were included in a preprint paper from Chinese scientists posted in March 2020 and, after peer review, published that June in the journal Small." And in addition: The NIH released a statement Wednesday saying that a researcher who originally published the genetic sequences asked for them to be removed from the NIH database so that they could be included in a different database. The agency said it is standard practice to remove data if requested to do so...

Bloom's paper acknowledges that there are benign reasons why researchers might want to delete data from a public database. The data cited by Bloom are not alone in being removed by the NIH during the pandemic. The agency, in response to an inquiry from The Post, said the National Library of Medicine has so far identified eight instances since the start of the pandemic when researchers had withdrawn submissions to the library.

"This one from China and the rest from submitters predominantly in the U.S.," the NIH said in its response. "All of those followed standard operating procedures."

The New York Times writes: The genetic sequences of viral samples hold crucial clues about how SARS-CoV-2 shifted to our species from another animal, most likely a bat. Most precious of all are sequences from early in the pandemic, because they take scientists closer to the original spillover event. As [Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center who wrote the new report] was reviewing what genetic data had been published by various research groups, he came across a March 2020 study with a spreadsheet that included information on 241 genetic sequences collected by scientists at Wuhan University. The spreadsheet indicated that the scientists had uploaded the sequences to an online database called the Sequence Read Archive, managed by the U.S. government's National Library of Medicine. But when Dr. Bloom looked for the Wuhan sequences in the database earlier this month, his only result was "no item found." Puzzled, he went back to the spreadsheet for any further clues. It indicated that the 241 sequences had been collected by a scientist named Aisi Fu at Renmin Hospital in Wuhan. Searching medical literature, Dr. Bloom eventually found another study posted online in March 2020 by Dr. Fu and colleagues, describing a new experimental test for SARS-CoV-2. The Chinese scientists published it in a scientific journal three months later. In that study, the scientists wrote that they had looked at 45 samples from nasal swabs taken "from outpatients with suspected Covid-19 early in the epidemic." They then searched for a portion of SARS-CoV-2's genetic material in the swabs. The researchers did not publish the actual sequences of the genes they fished out of the samples. Instead, they only published some mutations in the viruses.

But a number of clues indicated to Dr. Bloom that the samples were the source of the 241 missing sequences. The papers included no explanation as to why the sequences had been uploaded to the Sequence Read Archive, only to disappear later. Perusing the archive, Dr. Bloom figured out that many of the sequences were stored as files on Google Cloud. Each sequence was contained in a file in the cloud, and the names of the files all shared the same basic format, he reported. Dr. Bloom swapped in the code for a missing sequence from Wuhan. Suddenly, he had the sequence. All told, he managed to recover 13 sequences from the cloud this way. With this new data, Dr. Bloom looked back once more at the early stages of the pandemic. He combined the 13 sequences with other published sequences of early coronaviruses, hoping to make progress on building the family tree of SARS-CoV-2. Working out all the steps by which SARS-CoV-2 evolved from a bat virus has been a challenge because scientists still have a limited number of samples to study. Some of the earliest samples come from the Huanan Seafood Wholesale Market in Wuhan, where an outbreak occurred in December 2019. But those market viruses actually have three extra mutations that are missing from SARS-CoV-2 samples collected weeks later. In other words, those later viruses look more like coronaviruses found in bats, supporting the idea that there was some early lineage of the virus that did not pass through the seafood market. Dr. Bloom found that the deleted sequences he recovered from the cloud also lack those extra mutations. "They're three steps more similar to the bat coronaviruses than the viruses from the Huanan fish market," Dr. Bloom said. This suggests, he said, that by the time SARS-CoV-2 reached the market, it had been circulating for awhile in Wuhan or beyond. The market viruses, he argued, aren't representative of full diversity of coronaviruses already loose in late 2019.
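
The recovery technique described above amounts to spotting a predictable object-storage naming scheme and substituting sequence identifiers into it. The sketch below illustrates only the general idea; the URL template and accession codes are invented placeholders, since the article does not publish the real file paths:

```python
# Sketch of the recovery idea: if deleted records once lived in cloud object
# storage under a predictable naming scheme, substituting accession codes into
# that scheme can surface files that no longer appear in the index.
# The URL template and accession codes below are hypothetical placeholders.
import urllib.request
import urllib.error

URL_TEMPLATE = "https://storage.example.com/sra-archive/{accession}.fastq.gz"
ACCESSIONS = ["SRR0000001", "SRR0000002"]  # placeholders, not the real codes

for acc in ACCESSIONS:
    url = URL_TEMPLATE.format(accession=acc)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = resp.read()
        print(f"{acc}: recovered {len(data)} bytes")
    except urllib.error.URLError as err:
        print(f"{acc}: fetch failed ({err})")
```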

UPDATE (7/30): When republishing their sequences, the researchers indicated they actually came from January 30, 2020 (and not "late 2019").

Social Networks

A Real Estate Mogul Will Spend $100 Million to Fix Social Media Using Blockchain (msn.com) 93

"Frank McCourt, the billionaire real estate mogul and former owner of the Los Angeles Dodgers, is pouring $100 million into an attempt to rebuild the foundations of social media," reports Bloomberg: The effort, which he has loftily named Project Liberty, centers on the construction of a publicly accessible database of people's social connections, allowing users to move records of their relationships between social media services instead of being locked into a few dominant apps.

The undercurrent to Project Liberty is a fear of the power that a few huge companies — and specifically Facebook Inc. — have amassed over the last decade... Project Liberty would use blockchain to construct a new internet infrastructure called the Decentralized Social Networking Protocol. With cryptocurrencies, blockchain stores information about the tokens in everyone's digital wallets; the DSNP would do the same for social connections. Facebook owns the data about the social connections between its users, giving it an enormous advantage over competitors. If all social media companies drew from a common social graph, the theory goes, they'd have to compete by offering better services, and the chance of any single company becoming so dominant would plummet.
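
Bloomberg's description stays high-level, and the article gives no technical detail of DSNP itself, but the core idea of a user-owned, portable social graph can be sketched with a toy record format: a user exports a signed list of connections that any competing service could import. The field names, JSON layout, and the hash standing in for a real signature below are all invented for illustration:

```python
# Toy illustration of a user-owned, portable social-graph record.
# The format, field names, and the SHA-256 "signature" stand-in are invented
# for this sketch; they are not the actual DSNP specification.
import hashlib
import json

def export_graph(user_id: str, follows: list[str]) -> str:
    record = {"user": user_id, "follows": sorted(follows)}
    payload = json.dumps(record, sort_keys=True)
    # Stand-in for a real cryptographic signature tying the record to its owner.
    record["digest"] = hashlib.sha256(payload.encode()).hexdigest()
    return json.dumps(record)

def import_graph(blob: str) -> dict:
    record = json.loads(blob)
    payload = json.dumps({"user": record["user"], "follows": record["follows"]},
                         sort_keys=True)
    if record["digest"] != hashlib.sha256(payload.encode()).hexdigest():
        raise ValueError("record has been tampered with")
    return record

# A user exports their connections from one service and imports them elsewhere.
blob = export_graph("alice", ["carol", "bob"])
print(import_graph(blob))  # {'user': 'alice', 'follows': ['bob', 'carol'], 'digest': ...}
```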

Building DSNP falls to Braxton Woodham, the co-founder of the meal delivery service Sun Basket and former chief technology officer of Fandango, the movie ticket website... McCourt hired Woodham to build the protocol, and pledged to put $75 million into an institute at Georgetown University in Washington, D.C., and Sciences Po in Paris to research technology that serves the common good. The rest of his $100 million will go toward pushing entrepreneurs to build services that utilize the DSNP...

A decentralized approach to social media could actually undermine the power of content moderation, by making it easier for users who are kicked off one platform to simply migrate their audiences to more permissive ones. McCourt and Woodham say blockchain could discourage bad behavior because people would be tied to their posts forever...

Eventually, the group plans to create its own consumer product on top of the DSNP infrastructure, and wrote in a press release that the eventual result will be an "open, inclusive data economy where individuals own, control and derive greater social and economic value from their personal information."

Science

The First 'Google Translate' For Elephants Debuts (scientificamerican.com) 50

An anonymous reader quotes a report from Scientific American: Elephants possess an incredibly rich repertoire of communication techniques, including hundreds of calls and gestures that convey specific meanings and can change depending on the context. Different elephant populations also exhibit culturally learned behaviors unique to their specific group. Elephant behaviors are so complex, in fact, that even scientists may struggle to keep up with them all. Now, to get the animals and researchers on the same page, a renowned biologist who has been studying endangered savanna elephants for nearly 50 years has co-developed a digital elephant ethogram, a repository of everything known about their behavior and communication.

[Joyce Poole, co-founder and scientific director of ElephantVoices, a nonprofit science and conservation organization, and co-creator of the new ethogram] built the easily searchable public database with her husband and research partner Petter Granli after they came to realize that scientific papers alone would no longer cut it for cataloging the discoveries they and others were making. The Elephant Ethogram currently includes more than 500 behaviors depicted through nearly 3,000 annotated videos, photographs and audio files. The entries encompass the majority, if not all, of typical elephant behaviors, which Poole and Granli gleaned from more than 100 references spanning more than 100 years, with the oldest records dating back to 1907. About half of the described behaviors came from the two investigators' own studies and observations, while the rest came from around seven other leading savanna elephant research teams.

While the ethogram is primarily driven by Poole and Granli's observations, "there are very few, if any, examples of behaviors described in the literature that we have not seen ourselves," Poole points out. The project is also just beginning, she adds, because it is meant to be a living catalog that scientists actively contribute to as new findings come in. Poole and Granli believe the exhaustive, digitized Elephant Ethogram is the first of its kind for any nonhuman wild animal. The multimedia-based nature of the project is important, Poole adds, because with descriptions based only on the written word, audio files or photographs, "it is hard to show the often subtle differences in movement that differentiate one behavior from another." Now that the project is online, Poole hopes other researchers will begin contributing their own observations and discoveries, broadening the database to include cultural findings from additional savanna elephant populations and unusual behaviors Poole and Granli might have missed.

Twitter

Twitter Restricts Accounts In India To Comply With Government Legal Request (techcrunch.com) 48

An anonymous reader quotes a report from TechCrunch: Twitter disclosed on Monday that it blocked four accounts in India to comply with a new legal request from the Indian government. The American social network disclosed on Lumen Database, a Harvard University project, that it took action on four accounts -- including those of hip-hop artist L-Fresh the Lion and singer and song-writer Jazzy B -- to comply with a legal request from the Indian government it received over the weekend. The accounts are geo-restricted within India but accessible from outside of the South Asian nation. (As part of their transparency efforts, some companies including Twitter and Google make requests and orders they receive from governments and other entities public on Lumen Database.)

All four accounts, like several others that the Indian government ordered to be blocked in the country earlier this year, had protested New Delhi's agriculture reforms and some had posted other tweets that criticized Prime Minister Narendra Modi's seven years of governance in India, an analysis by TechCrunch found. The new legal request, which hasn't been previously reported, comes at a time when Twitter is making efforts to comply with the Indian government's new IT rules, new guidelines that several of its peers including Facebook and Google have already complied with. On Saturday, India's Ministry of Electronics and Information Technology had given a "final notice" to Twitter to comply with its new rules, which it unveiled in February this year. The new rules require significant social media firms to appoint and share contact details of representatives tasked with compliance, nodal point of reference and grievance redressals to address on-ground concerns.
Last month, police in Delhi visited Twitter offices to "serve a notice" to Twitter's India head. Twitter responded by calling the visit a form of intimidation, and requested the government respect citizens' rights to free speech.

United States

Supreme Court Narrows Scope of CFAA Computer Hacking Law (therecord.media) 79

The United States Supreme Court has ruled today in a 6-3 vote to overturn a hacking-related conviction for a Georgia police officer, and by doing so, it also narrowed down the scope of the US' primary hacking law, the Computer Fraud and Abuse Act. From a report: The ruling, No. 19-783, comes in the Van Buren v. United States case of Nathan Van Buren, a former police sergeant in Cumming, Georgia, who was sentenced to 18 months in prison in May 2018 for taking a bribe of $5,000 to look up a license plate for a woman one of his informants met at a local strip club. Prosecutors charged Van Buren under the CFAA and argued that even if the police officer had been authorized to access the police database as part of his work duties, he "exceeded authorized access" when he performed a search against department internal policies. In subsequent appeals, Van Buren argued that the "exceeds authorized access" language in the CFAA was too broad and requested that the US Supreme Court rule on the matter; the court agreed to take up the case and heard arguments last year.

Technology

Rescuers Question What3words' Use in Emergencies (bbc.com) 122

AmiMoJo writes: Mountain rescuers have questioned the accuracy of using a location app, citing dozens of examples where the wrong address was given to their teams. What3Words (W3W) divides the world into three-by-three metre squares, each with a three-word address. It is free and used by 85% of UK emergency services. Reasons for the errors were not given, but were likely to be things such as mispronunciation or spelling errors. W3W said human error was "a possibility with any type of tool." The mapping system was created by an algorithm which assigned three words to each square in the world. Mark Lewis, the head of ICT at Mountain Rescue England and Wales (MREW), said that the use of the W3W app had been "testing" for rescue teams. He gave the BBC a database from the last 12 months which listed 45 locations across England and Wales that rescuers received from lost or injured walkers and climbers, which turned out to be incorrect.
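
The mechanism behind such a system is straightforward to illustrate: lay a fixed grid over the map and spell each cell's index with words instead of digits. The toy encoder below is not W3W's proprietary algorithm, and its eight-word placeholder list is far too small to name every square uniquely (which is why a real system needs a vocabulary orders of magnitude larger), but it shows the basic idea:

```python
# Toy word-grid encoder showing the general idea behind three-word addressing.
# This is NOT What3Words' proprietary algorithm. With only the eight
# placeholder words below, three words can distinguish just 8**3 = 512 squares,
# so triples repeat; a real system needs a far larger vocabulary for three
# words to cover every ~3 m square on Earth uniquely.
WORDS = ["apple", "brick", "cloud", "delta", "ember", "flint", "grape", "harbor"]
BASE = len(WORDS)

CELL_METRES = 3.0
METRES_PER_DEGREE = 111_320          # crude equirectangular approximation
CELL_DEG = CELL_METRES / METRES_PER_DEGREE
COLS = int(360 / CELL_DEG)           # grid columns spanning all longitudes

def encode(lat: float, lon: float) -> list[str]:
    """Locate the grid cell for a coordinate, then spell its index in base BASE."""
    row = int((lat + 90) / CELL_DEG)
    col = int((lon + 180) / CELL_DEG)
    index = row * COLS + col
    words = []
    for _ in range(3):               # three "digits", one word each
        index, digit = divmod(index, BASE)
        words.append(WORDS[digit])
    return words

print(encode(51.5007, -0.1246))      # word triple for a ~3 m square near Big Ben
print(encode(51.5008, -0.1246))      # a square roughly 11 m away gets a different triple
```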

United States

Two New Laws Restrict Police Use of DNA Search Method (nytimes.com) 80

New laws in Maryland and Montana are the first in the nation to restrict law enforcement's use of genetic genealogy, the DNA matching technique that in 2018 identified the Golden State Killer, in an effort to ensure the genetic privacy of the accused and their relatives. From a report: Beginning on Oct. 1, investigators working on Maryland cases will need a judge's signoff before using the method, in which a "profile" of thousands of DNA markers from a crime scene is uploaded to genealogy websites to find relatives of the culprit. The new law, sponsored by Democratic lawmakers, also dictates that the technique be used only for serious crimes, such as murder and sexual assault. And it states that investigators may only use websites with strict policies around user consent. Montana's new law, sponsored by a Republican, is narrower, requiring that government investigators obtain a search warrant before using a consumer DNA database, unless the consumer has waived the right to privacy.

The laws "demonstrate that people across the political spectrum find law enforcement use of consumer genetic data chilling, concerning and privacy-invasive," said Natalie Ram, a law professor at the University of Maryland who championed the Maryland law. "I hope to see more states embrace robust regulation of this law enforcement technique in the future." Privacy advocates like Ms. Ram have been worried about genetic genealogy since 2018, when it was used to great fanfare to reveal the identity of the Golden State Killer, who murdered 13 people and raped dozens of women in the 1970s and '80s. After matching the killer's DNA to entries in two large genealogy databases, GEDmatch and FamilyTreeDNA, investigators in California identified some of the culprit's cousins, and then spent months building his family tree to deduce his name -- Joseph James DeAngelo Jr. -- and arrest him.

Privacy

Clearview AI Hit With Sweeping Legal Complaints Over Controversial Face Scraping in Europe (theverge.com) 10

Privacy International (PI) and several other European privacy and digital rights organizations announced today that they've filed legal complaints against the controversial facial recognition company Clearview AI. From a report: The complaints filed in France, Austria, Greece, Italy, and the United Kingdom say that the company's method of documenting and collecting data -- including images of faces it automatically extracts from public websites -- violates European privacy laws. New York-based Clearview claims to have built "the largest known database of 3+ billion facial images."

PI, NYOB, Hermes Center for Transparency and Digital Human Rights, and Homo Digitalis all claim that Clearview's data collection goes beyond what the average user would expect when using services like Instagram, LinkedIn, or YouTube. "Extracting our unique facial features or even sharing them with the police and other companies goes far beyond what we could ever expect as online users," said PI legal officer Ioannis Kouvakas in a joint statement.

Open Source

Redditors Aim to 'Free Science' From For-Profit Publishers (interestingengineering.com) 63

A group of Redditors came together in a bid to archive over 85 million scientific papers from the website Sci-Hub and make an open-source library that cannot be taken down. Interesting Engineering reports: Over the last decade or so, Sci-Hub, often referred to as "The Pirate Bay of Science," has been giving free access to a huge database of scientific papers that would otherwise be locked behind a paywall. Unsurprisingly, the website has been the target of multiple lawsuits, as well as an investigation from the United States Department of Justice. The site's Twitter account was also recently suspended under Twitter's counterfeit policy, and its founder, Alexandra Elbakyan, reported that the FBI gained access to her Apple accounts.

Now, Redditors from a subreddit called DataHoarder, which is aimed at archiving knowledge in the digital space, have come together to try to save the numerous papers available on the website. In a post on May 13, the moderators of r/DataHoarder, stated that "it's time we sent Elsevier and the USDOJ a clearer message about the fate of Sci-Hub and open science. We are the library, we do not get silenced, we do not shut down our computers, and we are many." This will be no easy task. Sci-Hub is home to over 85 million papers, totaling a staggering 77TB of data. The group of Redditors is currently recruiting for its archiving efforts and its stated goal is to have approximately 8,500 individuals torrenting the papers in order to download the entire library. Once that task is complete, the Redditors aim to release all of the downloaded data via a new "uncensorable" open-source website.

Businesses

FTC is Prodding the Tech Giant To Punish Fake-Review Schemers (vox.com) 29

An anonymous reader shares a report: Amazon recently banned some sellers of large Chinese electronics brands like Aukey and Mpow that reportedly do hundreds of millions in sales on the shopping site each year. The bans followed a database leak that appeared to tie some of the brands to paid-review schemes, which Amazon prohibits and says it strictly polices. But while some press coverage implied that Amazon took these actions in response to the database leak, internal employee messages viewed by Recode show that pressure from the Federal Trade Commission (FTC) led to at least one of the notable bans.

Communications between Amazon employees viewed by Recode also appear to expose an inconsistent punishment system in which employees need special approval for suspending certain sellers because of their sales numbers, while some merchants are able to keep selling products to Amazon customers despite multiple policy violations and warnings. The leaked internal messages also revealed several other instances in recent months of FTC inquiries pressuring Amazon to take action against merchants engaging in fake-review schemes. Amazon has long said that it aggressively polices fake reviews, but the frequency with which the FTC has pressured the company to police merchants that run paid-review programs has not been previously known.

Medicine

99.992% of Fully Vaccinated People Have Dodged COVID, CDC Data Shows (arstechnica.com) 143

An anonymous reader quotes a report from Ars Technica: Cases of COVID-19 are extremely rare among people who are fully vaccinated, according to a new data analysis by the Centers for Disease Control and Prevention. Among more than 75 million fully vaccinated people in the US, just around 5,800 people reported a "breakthrough" infection, in which they became infected with the pandemic coronavirus despite being fully vaccinated. The numbers suggest that breakthroughs occur at the teeny rate of less than 0.008 percent of fully vaccinated people -- and that over 99.992 percent of those vaccinated have not contracted a SARS-CoV-2 infection.
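
Those headline percentages follow directly from the two figures quoted; a quick check of the arithmetic:

```python
# Reproduce the CDC-derived figures quoted above.
fully_vaccinated = 75_000_000          # "more than 75 million"
breakthroughs = 5_800                  # "just around 5,800"
rate = breakthroughs / fully_vaccinated
print(f"breakthrough rate: {rate:.5%}")          # about 0.00773%, under 0.008%
print(f"no reported infection: {1 - rate:.3%}")  # about 99.992%
```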

The figures come from a nationwide database that the CDC set up to keep track of breakthrough infections and monitor for any concerning signs that the breakthroughs may be clustering by patient demographics, geographic location, time since vaccination, vaccine type, or vaccine lot number. The agency will also be keeping a close eye on any breakthrough infections that are caused by SARS-CoV-2 variants, some of which have been shown to knock back vaccine efficacy. [...] The extraordinary calculation that 99.992 percent of vaccinated people have not contracted the virus may reflect that they all simply have not been exposed to the virus since being vaccinated. Also, there are likely cases missed in reporting. Still, the data is a heartening sign.
As for the "breakthroughs," the agency says many of them occurred in older people, who are more vulnerable to COVID-19. There are some scattered through every age group, but more than 40 percent were in people ages 60 and above.

"We see [breakthroughs] with all vaccines," top infectious disease expert Anthony Fauci said in a press briefing earlier this week. "No vaccine is 100 percent efficacious or effective, which means that you will always see breakthrough infections regardless of the efficacy of your vaccine."

Facebook

There's Another Facebook Phone Number Database Online (vice.com) 7

An online tool lets customers pay to unmask the phone numbers of Facebook users that liked a specific Page, and the underlying dataset appears to be separate from the 500 million account database that made headlines last week, signifying another data breach or large scale scraping of Facebook users' data, Motherboard reports. From the report: Motherboard verified the tool, which comes in the form of a bot on the social network and messaging platform Telegram, outputs accurate phone numbers of Facebook users that aren't included in the dataset of 500 million users. The data also appears to be different to another Telegram bot outputting Facebook phone numbers that Motherboard first reported on in January. "Hello, can you tell me how you got my number?" one person included in the dataset asked Motherboard when reached for comment. "Omg, this is insane," they added. Another person returned Motherboard's call and, after confirming their name, said "If you have my number then yes it seems the data is accurate."

A description for the bot reads "The bot give [sic] out the phone numbers of users who have liked the Facebook page." To use the bot, customers need to first identify the unique identification code of the Facebook Page they want to get phone numbers from, be that a band, restaurant, or any other sort of Page. This is possible with at least one free to use website. From there, customers enter that code into the bot, which provides a cost of the data in U.S. dollars and the option to proceed with the purchase, according to Motherboard's tests. A Page with tens of thousands of likes from Facebook users can cost a few hundred dollars, the bot shows. The data for Motherboard's own Page would return 134,803 results and cost $539, for example.
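
For a sense of scale, the pricing example quoted above works out to well under a cent per record:

```python
# Implied unit price from the figures quoted for Motherboard's own Page.
results = 134_803
price_usd = 539
print(f"${price_usd / results:.4f} per phone number")  # roughly $0.0040
```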
