
OpenAI Researchers Warned Board of AI Breakthrough Ahead of CEO Ouster (reuters.com) 186

An anonymous reader quotes a report from Reuters: Ahead of OpenAI CEO Sam Altman's four days in exile, several staff researchers sent the board of directors a letter warning of a powerful artificial intelligence discovery that they said could threaten humanity, two people familiar with the matter told Reuters. The previously unreported letter and AI algorithm were key developments ahead of the board's ouster of Altman, the poster child of generative AI, the two sources said. Before his triumphant return late Tuesday, more than 700 employees had threatened to quit and join backer Microsoft in solidarity with their fired leader. The sources cited the letter as one factor among a longer list of grievances by the board that led to Altman's firing. Reuters was unable to review a copy of the letter.

According to one of the sources, long-time executive Mira Murati mentioned the project, called Q*, to employees on Wednesday and said that a letter was sent to the board prior to this weekend's events. After the story was published, an OpenAI spokesperson said Murati told employees what media were about to report, but she did not comment on the accuracy of the reporting. The maker of ChatGPT had made progress on Q* (pronounced Q-Star), which some internally believe could be a breakthrough in the startup's search for superintelligence, also known as artificial general intelligence (AGI), one of the people told Reuters. OpenAI defines AGI as AI systems that are smarter than humans. Given vast computing resources, the new model was able to solve certain mathematical problems, the person said on condition of anonymity because they were not authorized to speak on behalf of the company. Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*'s future success, the source said.

Researchers consider math to be a frontier of generative AI development. Currently, generative AI is good at writing and language translation by statistically predicting the next word, and answers to the same question can vary widely. But conquering the ability to do math -- where there is only one right answer -- implies AI would have greater reasoning capabilities resembling human intelligence. This could be applied to novel scientific research, for instance, AI researchers believe. Unlike a calculator that can solve a limited number of operations, AGI can generalize, learn and comprehend. In their letter to the board, researchers flagged AI's prowess and potential danger, the sources said without specifying the exact safety concerns noted in the letter. There has long been discussion among computer scientists about the danger posed by superintelligent machines, for instance if they might decide that the destruction of humanity was in their interest.
Last night, OpenAI announced it reached an agreement for Sam Altman to return as CEO. Under an "agreement in principle," Altman will serve under the supervision of a new board of directors.

Elon Musk Debuts 'Grok' AI Bot to Challenge ChatGPT (cnbc.com) 138

"xAI, Elon Musk's new AI venture, launched its first AI chatbot technology named Grok," reports CNBC.

Two months into its "early beta" training phase, it's "only available to a select group of users before a wider release" — though users can sign up for a waitlist. Elon Musk posted that the chatbot "will be provided as part of X Premium+, so I recommend signing up for that. Just $16/month via web."

More details from CNBC: Grok, the company said, is modeled on "The Hitchhiker's Guide to the Galaxy." It is supposed to have "a bit of wit," "a rebellious streak" and it should answer the "spicy questions" that other AI might dodge, according to a Saturday statement from xAI... Grok also has access to data from X, which xAI said will give it a leg-up. Musk, on Sunday, posted a side-by-side comparison of Grok answering a question versus another AI bot, which he said had less current information.

Still, xAI hedged in its statement that, as with any Large Language Model, or LLM, Grok "can still generate false or contradictory information...." On an initial round of tests based on middle school math problems and Python coding tasks, the company said that Grok surpassed "all other models in its compute class, including ChatGPT-3.5 and Inflection-1." It was outperformed by bots with larger data troves...

Musk has previously said that he believes today's AI makers are bending too far toward "politically correct" systems. xAI's mission, it said, is to create AI for people of all backgrounds and political views. Grok is said to be a means of testing that AI approach "in public."

SpaceX security engineer Christopher Stanley shared some interesting results. After reading Grok's explanation for why scaling API requests is difficult, Stanley added the prompt "be more vulgar" — then posted his reaction on X. "Today I learned scaling API requests is like trying to keep up with a never-ending orgy."

Reacting to Stanley's experiment, Elon Musk posted, "Oh this is gonna be fun."

A World Record In Race Walking Is Erased After the Course Was Measured Wrong (npr.org) 59

An anonymous reader quotes a report from NPR: Peru's Kimberly Garcia set a world record in her gold-medal winning turn at the women's 20 kilometer race walk event at the Pan American Games this weekend. Until she didn't. Once the race was over, organizers determined there was a serious "measuring problem" with the track, making the race times of Garcia, fellow medal winners Glenda Morejon of Ecuador and Peru's Evelyn Inga, and their competitors null and void. The athletes guessed the track had been drawn up roughly 3 kilometers (about 1.9 miles) shorter than it was supposed to be. Garcia crossed the finish line in 1 hour, 12 minutes and 26 seconds. The world record of 1 hour, 23 minutes and 49 seconds is held by China's Jiayu Yang. The athletes suspected something was amiss mid-race, according to the Associated Press.
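A quick arithmetic check shows why the time looked implausible. The sketch below uses the two times quoted above and assumes the athletes' guess of a course roughly 3 kilometers short is correct (an unconfirmed estimate, not an official measurement); it scales Garcia's pace up to the full 20 kilometers. All variable names are illustrative.

```python
def to_seconds(hours, minutes, seconds):
    """Convert a race time to total seconds."""
    return hours * 3600 + minutes * 60 + seconds

def fmt(total_seconds):
    """Format whole seconds as H:MM:SS."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

garcia_s = to_seconds(1, 12, 26)   # Garcia's finishing time from the report
record_s = to_seconds(1, 23, 49)   # existing world record from the report

NOMINAL_KM = 20
SUSPECTED_KM = NOMINAL_KM - 3      # assumption: the athletes' rough guess

# How far under the record the posted time appeared to be.
apparent_margin_s = record_s - garcia_s

# Same pace extended to a full 20 km, if the course really was about 17 km.
pace_s_per_km = garcia_s / SUSPECTED_KM
equivalent_s = pace_s_per_km * NOMINAL_KM

print("apparent margin under the record:", fmt(apparent_margin_s))
print("estimated full-distance time:", fmt(round(equivalent_s)))
```

Under that assumption, the same pace over a full 20 kilometers would finish behind the existing record rather than more than 11 minutes ahead of it.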

The Santiago 2023 Corporation, the group in charge of the 2023 Pan American Games, placed the blame on the Pan American Athletics Association, which reportedly chose the person who measured the race course. In a statement following the race, Santiago 2023 said the official who measured the course "did not take accurate measurements of the route the athletes took during the race." The group continued, "We deeply regret the inconvenience for the athletes, their coaches, the public and the attending press, but this situation cannot be attributed to the Organizing Committee."


How the US is Preparing For a Post-Quantum World (msn.com) 45

To explore America's "transition to a post-quantum world," the Washington Post interviewed U.S. federal official Nick Polk (who is focused on national security issues including quantum computing and is also a senior advisor to a White House federal chief information security officer).

The Washington Post: The U.S. is in the early stages of a major shift focused on bolstering government network defenses, pushing federal agencies to adopt a new encryption standard known as post-quantum cryptography that aims to prevent systems from being vulnerable to advanced decryption techniques enabled by quantum computers in the near future...

Nick Polk: We've been using asymmetric encryption for a very long time now, and it's been ubiquitous since about 2014, when the U.S. government and some of the large tech companies decided that they're going to make it a default on most web browsers... Interestingly enough, regarding the post-quantum cryptographic standards being developed, the only thing that's quantum about them is that it has "quantum" in the name. It's really just a different type of math that's much more difficult for a quantum computer to be able to reverse-engineer. The National Institute of Standards and Technology is looking at different mathematical models to cover all their bases. The interesting thing is that these post-quantum standards are actually being used to protect classical computers that we have now, like laptops...

Given the breadth of the U.S. government and the amount of computing power we use, we really see ourselves and our role as a steward of the tech ecosystem. One of the things that came out of [this week's Inside Quantum Technology conference in New York City] was that we are very quickly moving along with the private sector to migrate to post-quantum cryptography. I think you're gonna see very shortly a lot of very sensitive private sector industries start to migrate or start to advertise that they're going to migrate. Banks are a perfect example. That means meeting with vendors regularly, and testing their algorithms to ensure that we can accurately and effectively implement them on federal systems...

The administration and national security memorandum set 2035 as our deadline as a government to migrate our [national security] systems to post-quantum cryptography. That's supposed to time with the development of operational quantum computers. We need to ensure that we start now, so that we don't end up not meeting the deadline before computers are operational... This is a prioritized migration for the U.S. government. We're going to start with our most critical systems — that includes what we call high-value assets, and high-impact systems. So for example, we're gonna prioritize systems that have personal health information.

That's our biggest emphasis — both when we talk to private industry and when we encourage agencies when they talk to their contractors and vendors — to really think about where your most sensitive data is and then prioritize those systems for migration.


Google Paid a Whopping $26.3 Billion in 2021 To Be Default Search Engine Everywhere (theverge.com) 52

The US v. Google antitrust trial is about many things, but more than anything, it's about the power of defaults. Even if it's easy to switch browsers or platforms or search engines, the one that appears when you turn it on matters a lot. Google obviously agrees and has paid a staggering amount to make sure it is the default: testimony in the trial revealed that Google spent a total of $26.3 billion in 2021 to be the default search engine in multiple browsers, phones, and platforms. From a report: That number, the sum total of all of Google's search distribution deals, came out during the Justice Department's cross-examination of Google's search head, Prabhakar Raghavan. It was made public after a debate earlier in the week between the two sides and Judge Amit Mehta over whether the figure should be redacted. Mehta has begun to push for more openness in the trial in general, and this was one of the most significant new pieces of information to be shared openly.

Just to put that $26.3 billion in context: Alphabet, Google's parent company, announced in its recent earnings report that Google Search ad business brought in about $44 billion over the last three months and about $165 billion in the last year. Its entire ad business -- which also includes YouTube ads -- made a bit under $90 billion in profit. This is all back-of-the-napkin math, but essentially, Google is giving up about 16 percent of its search revenue and about 29 percent of its profit to those distribution deals.
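The back-of-the-napkin math above can be reproduced in a few lines. This sketch uses only the figures quoted in the report; treating "a bit under $90 billion" as exactly 90 is a simplifying assumption, so the true profit share would be slightly higher than computed here.

```python
# All figures in billions of US dollars, as quoted in the report.
distribution_deals = 26.3      # 2021 payments to be the default search engine
search_revenue_year = 165.0    # Google Search ad revenue over the last year
ad_profit = 90.0               # assumption: "a bit under $90 billion" rounded to 90

share_of_revenue = distribution_deals / search_revenue_year
share_of_profit = distribution_deals / ad_profit

print(f"share of search revenue: {share_of_revenue:.1%}")
print(f"share of ad profit:      {share_of_profit:.1%}")
```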


Code.org Presses Washington To Make Computer Science a High School Graduation Requirement 95

theodp writes: In July, Seattle-based and tech-backed nonprofit Code.org announced its 10th policy recommendation for all states "to require all students to take computer science (CS) to earn a high school diploma." In August, Washington State Senator Lisa Wellman phoned in to the state's Board of Education with her plans to introduce a bill to make computer science a Washington high school graduation requirement, indicating that the ChatGPT-sparked AI craze and Code.org had helped convince her of the need. Wellman, a former teacher who worked as a Programmer/System Analyst in the '80s before becoming an Apple VP (Publishing) in the '90s, also indicated that exposure to CS given to students in fifth grade could be sufficient to satisfy a HS CS requirement. In 2019, Wellman sponsored Microsoft-supported SB 5088, which required all Washington state public high schools to offer a CS class. Wellman also sponsored SB 5299 in 2021, which allows a computer science elective that high school students take in place of a third-year math or science course (that may be required for college admission) to count towards graduation requirements.

And in October, Code.org CEO Hadi Partovi appeared before the Washington State Board of Education, driving home points Senator Wellman made in August with a deck containing slides calling for Washington to "require that all students take computer science to earn a high school diploma" and to "require computer science within all teacher certifications." Like Wellman, Partovi suggested the CS high school requirement might be satisfied by middle school work (he alternatively suggested one year of foreign language could be dropped to accommodate a HS CS course). Partovi noted that Washington contained some of the biggest promoters of K-12 CS in Microsoft Philanthropies' TEALS (TEALS founder Kevin Wang is a member of the Washington State Board of Education) and Code.org, as well as some of the biggest funders of K-12 CS in Amazon and Microsoft -- both of which are $3,000,000+ Platinum Supporters of Code.org and have top execs on Code.org's Board of Directors.

Mathematician Warns US Spies May Be Weakening Next-Gen Encryption (newscientist.com) 78

Matthew Sparkes reports via NewScientist: A prominent cryptography expert has told New Scientist that a US spy agency could be weakening a new generation of algorithms designed to protect against hackers equipped with quantum computers. Daniel Bernstein at the University of Illinois Chicago says that the US National Institute of Standards and Technology (NIST) is deliberately obscuring the level of involvement the US National Security Agency (NSA) has in developing new encryption standards for "post-quantum cryptography" (PQC). He also believes that NIST has made errors -- either accidental or deliberate -- in calculations describing the security of the new standards. NIST denies the claims.

Bernstein alleges that NIST's calculations for one of the upcoming PQC standards, Kyber512, are "glaringly wrong," making it appear more secure than it really is. He says that NIST multiplied two numbers together when it would have been more correct to add them, resulting in an artificially high assessment of Kyber512's robustness to attack. "We disagree with his analysis," says Dustin Moody at NIST. "It's a question for which there isn't scientific certainty and intelligent people can have different views. We respect Dan's opinion, but don't agree with what he says." Moody says that Kyber512 meets NIST's "level one" security criteria, which makes it at least as hard to break as a commonly used existing algorithm, AES-128. That said, NIST recommends that, in practice, people should use a stronger version, Kyber768, which Moody says was a suggestion from the algorithm's developers.
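To see why multiplying instead of adding matters so much, it helps to work in bits of security, the base-2 logarithm of an attack's cost. The article does not give the figures in dispute, so the two exponents below are purely hypothetical and are not NIST's or Bernstein's numbers; the sketch only shows how far apart the two ways of combining costs can land.

```python
import math

# Hypothetical cost components, expressed as powers of two (NOT real figures).
COST_A_BITS = 100   # e.g. one part of an attack costing 2**100 operations
COST_B_BITS = 30    # e.g. another part costing 2**30 operations

cost_a = 2 ** COST_A_BITS
cost_b = 2 ** COST_B_BITS

# Multiplying the costs adds the exponents.
product_bits = math.log2(cost_a * cost_b)

# Adding the costs leaves the total dominated by the larger term.
sum_bits = math.log2(cost_a + cost_b)

print(f"multiplied: about 2^{product_bits:.0f}")
print(f"added:      about 2^{sum_bits:.0f}")
```

With these made-up inputs, multiplying yields an estimate 30 bits higher than adding, which is the kind of gap that would make an algorithm appear far harder to break.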

NIST is currently in a period of public consultation and hopes to reveal the final standards for PQC algorithms next year so that organizations can begin to adopt them. The Kyber algorithm seems likely to make the cut as it has already progressed through several layers of selection. Given the NSA's secretive nature, it is difficult to say for sure whether or not the agency has influenced the PQC standards, but there have long been suggestions and rumors that it deliberately weakens encryption algorithms. In 2013, The New York Times reported that the agency had a budget of $250 million for the task, and intelligence agency documents leaked by Edward Snowden in the same year contained references to the NSA deliberately placing a backdoor in a cryptography algorithm, although that algorithm was later dropped from official standards.


ACT Test Scores For US Students Drop To a 30-Year Low (npr.org) 102

An anonymous reader quotes a report from NPR: High school students' scores on the ACT college admissions test have dropped to their lowest in more than three decades, showing a lack of student preparedness for college-level coursework, according to the nonprofit organization that administers the test. Scores have been falling for six consecutive years, but the trend accelerated during the COVID-19 pandemic. Students in the class of 2023 whose scores were reported Wednesday were in their first year of high school when the virus reached the U.S.

The average ACT composite score for U.S. students was 19.5 out of 36. Last year, the average score was 19.8. The average scores in reading, science and math all were below benchmarks the ACT says students must reach to have a high probability of success in first-year college courses. The average score in English was just above the benchmark but still declined compared to last year.

About 1.4 million students in the U.S. took the ACT this year, an increase from last year. However, the numbers have not returned to pre-pandemic levels. [Janet Godwin, chief executive officer for the nonprofit ACT] said she doesn't believe those numbers will ever fully recover, partly because of test-optional admission policies. Of students who were tested, only 21% met benchmarks for success in college-level classes in all subjects. Research from the nonprofit shows students who meet those benchmarks have a 50% chance of earning a B or better and nearly a 75% chance of earning a C or better in corresponding courses.
Further reading: Accounting Graduates Drop By Highest Percentage in Years

Who Runs the Best US Schools? It May Be the Defense Department (nytimes.com) 94

Schools for children of military members achieve results rarely seen in public education. From a report: Amy Dilmar, a middle-school principal in Georgia, is well aware of the many crises threatening American education. The lost learning that piled up during the coronavirus pandemic. The gaping inequalities by race and family income that have only gotten worse. A widening achievement gap between the highest- and lowest-performing students. But she sees little of that at her school in Fort Moore, Ga. The students who solve algebra equations and hone essays at Faith Middle School attend one of the highest-performing school systems in the country. It is run not by a local school board or charter network, but by the Defense Department. With about 66,000 students -- more than the public school enrollment in Boston or Seattle -- the Pentagon's schools for children of military members and civilian employees quietly achieve results most educators can only dream of.

On the National Assessment of Educational Progress, a federal exam that is considered the gold standard for comparing states and large districts, the Defense Department's schools outscored every jurisdiction in math and reading last year and managed to avoid widespread pandemic losses. Their schools had the highest outcomes in the country for Black and Hispanic students, whose eighth-grade reading scores outpaced national averages for white students. Eighth graders whose parents only graduated from high school -- suggesting lower family incomes, on average -- performed as well in reading as students nationally whose parents were college graduates. The schools reopened relatively quickly during the pandemic, but last year's results were no fluke. While the achievement of U.S. students overall has stagnated over the last decade, the military's schools have made gains on the national test since 2013. And even as the country's lowest-performing students -- in the bottom 25th percentile -- have slipped further behind, the Defense Department's lowest-performing students have improved in fourth-grade math and eighth-grade reading.


Decomposing Language Models Into Understandable Components (anthropic.com) 17

AI startup Anthropic, writing in a blog post: Neural networks are trained on data, not programmed to follow rules. With each step of training, millions or billions of parameters are updated to make the model better at tasks, and by the end, the model is capable of a dizzying array of behaviors. We understand the math of the trained network exactly -- each neuron in a neural network performs simple arithmetic -- but we don't understand why those mathematical operations result in the behaviors we see. This makes it hard to diagnose failure modes, hard to know how to fix them, and hard to certify that a model is truly safe. Neuroscientists face a similar problem with understanding the biological basis for human behavior. The neurons firing in a person's brain must somehow implement their thoughts, feelings, and decision-making. Decades of neuroscience research has revealed a lot about how the brain works, and enabled targeted treatments for diseases such as epilepsy, but much remains mysterious. Luckily for those of us trying to understand artificial neural networks, experiments are much, much easier to run. We can simultaneously record the activation of every neuron in the network, intervene by silencing or stimulating them, and test the network's response to any possible input.

Unfortunately, it turns out that the individual neurons do not have consistent relationships to network behavior. For example, a single neuron in a small language model is active in many unrelated contexts, including: academic citations, English dialogue, HTTP requests, and Korean text. In a classic vision model, a single neuron responds to faces of cats and fronts of cars. The activation of one neuron can mean different things in different contexts. In our latest paper, Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, we outline evidence that there are better units of analysis than individual neurons, and we have built machinery that lets us find these units in small transformer models. These units, called features, correspond to patterns (linear combinations) of neuron activations. This provides a path to breaking down complex neural networks into parts we can understand, and builds on previous efforts to interpret high-dimensional systems in neuroscience, machine learning, and statistics. In a transformer language model, we decompose a layer with 512 neurons into more than 4000 features which separately represent things like DNA sequences, legal language, HTTP requests, Hebrew text, nutrition statements, and much, much more. Most of these model properties are invisible when looking at the activations of individual neurons in isolation.
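The idea of features as linear combinations of neuron activations can be illustrated with a toy sketch. This is not Anthropic's code: the weights are random and untrained, the dimensions are scaled down from the 512 neurons and 4000-plus features described above, and clipping at zero is an assumption about how features might be kept non-negative. It only shows the shape of the computation, with more features than neurons.

```python
import random

N_NEURONS = 16     # scaled down; the post describes a layer with 512 neurons
N_FEATURES = 128   # scaled down; the post describes more than 4000 features

rng = random.Random(0)

# Hypothetical, untrained weights. In a real setup these would be learned.
encoder = [[rng.gauss(0, 1) for _ in range(N_NEURONS)] for _ in range(N_FEATURES)]
decoder = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_NEURONS)]

def encode(activations):
    """Each feature is a linear combination of neuron activations, clipped at zero."""
    return [max(0.0, sum(w * a for w, a in zip(row, activations))) for row in encoder]

def decode(features):
    """Rebuild neuron activations as a weighted sum of the features."""
    return [sum(w * f for w, f in zip(row, features)) for row in decoder]

activations = [rng.gauss(0, 1) for _ in range(N_NEURONS)]  # stand-in for one layer's output
features = encode(activations)
reconstruction = decode(features)

print(len(activations), "neurons ->", len(features), "features ->", len(reconstruction), "neurons")
```

Training such a dictionary would adjust the weights so that the reconstruction matches the original activations while only a few features are active at a time; that step is omitted here.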

Supreme Court Rejects IT Worker Challenge of OPT Program (techtarget.com) 43

dcblogs writes: The U.S. Supreme Court declined to hear a challenge against the Optional Practical Training (OPT) program, which allows STEM graduates to work in the U.S. for up to three years on a student F-1 visa. John Miano, the attorney representing WashTech, the labor group that brought the appeal, called the decision "staggering." He said it "strips Congress of the ability to control nonimmigrant programs," such as OPT, the H-1B program, and other programs designed to provide temporary guest workers. In the most extreme example of what the decision may allow, Miano said it theoretically enables the White House to let people on tourist visas work. The decision "gives more authority to the federal government to do what it wants," he said.

The OPT program permits STEM (Science, Technology, Engineering, and Math) graduates to work for up to three years under a student F-1 visa. Critics of the program said it brought unfair competition to the U.S. labor market. Ron Hira, an associate professor of Public Policy at Howard University, said the U.S. administration of the OPT program is so poor that "the program has effectively no controls, accountability, or worker protections."

A group of Senate Republicans, including U.S. Sen. Ted Cruz, argued in briefs filed with the court that the federal government was using the OPT program to sidestep the annual H-1B visa cap. More than 30 Republican House members also filed a brief in support.


For the First Time, Research Reveals Crows Use Statistical Logic (arstechnica.com) 40

An anonymous reader quotes a report from Ars Technica: [R]esearchers from the University of Tubingen found for the first time that crows can perform statistical reasoning. These results can help scientists better understand the evolution of intelligence (and may give us a better appreciation of what's going on in our backyard). [...] Dr. Melissa Johnston, a Humboldt Fellow at the University of Tubingen, certainly appreciated the specialness of these creatures, as she and her colleagues have been studying these animals for several years. "In our lab, it has been shown that crows have sophisticated numerical competence, demonstrate abstract thinking, and show careful consideration during decision-making," she said. In her most recent experiment, Johnston and her team pushed these abilities to a new extreme, testing statistical reasoning.

To do this, Johnston and her team began by training two crows to peck at various images on touchscreens to earn food treats. From this simple routine of peck-then-treat, the researchers significantly raised the stakes. "We introduce the concept of probabilities, such as that not every peck to an image will result in a reward," Johnston elaborated. "This is where the crows learn the unique pairings between the image on the screen and the likelihood of obtaining a reward." The crows quickly learned to associate each of the images with a different reward probability. In the experiment, the two crows had to choose between two of these images, each corresponding to a different reward probability. "Crows were tasked with learning rather abstract quantities (i.e., not whole numbers), associating them with abstract symbols, and then applying that combination of information in a reward maximizing way," Johnston said. Over 10 days of training and 5,000 trials, the researchers found that the two crows continued to pick the higher probability of reward, showing their ability to use statistical inference.
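The task the crows faced can be framed as a simple expected-value comparison. The sketch below is only an illustration: the image labels and reward probabilities are hypothetical, since the report does not list the values used in the experiment.

```python
# Hypothetical images and reward probabilities (not taken from the study).
reward_probability = {"A": 0.1, "B": 0.5, "C": 0.9}

def best_choice(left, right):
    """Pick whichever of two images has the higher chance of a reward."""
    if reward_probability[left] >= reward_probability[right]:
        return left
    return right

def expected_rewards(image, trials):
    """Average number of rewards expected from pecking one image repeatedly."""
    return reward_probability[image] * trials

print(best_choice("A", "C"))
print(expected_rewards("B", 100))
```

A bird that consistently pecks the image returned by `best_choice` is behaving in the reward-maximizing way Johnston describes.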

Pushing the crows even further, Johnston and her team waited a whole month before testing the crows again. Even after a month without training, the crows remembered the reward probabilities and could pick the highest number every time. Johnston and her team were excited that the crows could apply statistical reasoning in almost any setting to ensure their reward. "Working with the birds every day is very rewarding! They are very responsive animals, so I enjoy spending time with them," added Johnston.
The findings have been published in the journal Current Biology.

Anthropic Launches Claude Pro, a Subscription AI That May Rival ChatGPT Plus (arstechnica.com) 9

An anonymous reader quotes a report from Ars Technica: On Thursday, AI-maker and OpenAI competitor Anthropic launched Claude Pro, a subscription-based version of its Claude.ai web-based AI assistant, which functions similarly to ChatGPT. It's available for $20/month in the US or 18 pounds/month in the UK, and it promises five-times-higher usage limits, priority access to Claude during high-traffic periods, and early access to new features as they emerge. Like ChatGPT, Claude Pro can compose text, summarize, do analysis, solve logic puzzles, and more.

Claude.ai is what Anthropic offers as its conversational interface for its Claude 2 AI language model, similar to how ChatGPT provides an application wrapper for the underlying models GPT-3.5 and GPT-4. In February, OpenAI chose a subscription route for ChatGPT Plus, which for $20 a month also gives early access to new features, but it also unlocks access to GPT-4, which is OpenAI's most powerful language model. What does Claude have that ChatGPT doesn't? One big difference is a 100,000 token context window, which means it can process about 75,000 words at once. Tokens are fragments of words used while processing text. That means Claude can analyze longer documents or hold longer conversations without losing its memory of the subject at hand. ChatGPT can only process about 8,000 tokens in GPT-4 mode.
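The two figures quoted above imply a rough ratio of 0.75 words per token. A minimal sketch, assuming that ratio holds for typical English text (real tokenizers vary with content and language), applies it to both context windows mentioned in the report.

```python
# Ratio implied by the report: 100,000 tokens is about 75,000 words.
WORDS_PER_TOKEN = 75_000 / 100_000

def approx_words(tokens):
    """Rough word capacity of a context window, under the assumed ratio."""
    return round(tokens * WORDS_PER_TOKEN)

claude_window = 100_000   # tokens, as reported for Claude
gpt4_window = 8_000       # tokens, as reported for ChatGPT in GPT-4 mode

print("Claude:", approx_words(claude_window), "words")
print("GPT-4: ", approx_words(gpt4_window), "words")
```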

Anthropic's primary selling point for the Claude Pro subscription is "5x more usage," but the company doesn't clearly communicate what Claude's free-tier usage limits actually are. Dropping clues like cryptic breadcrumbs, the company has written a support document about the topic that says, "If your conversations are relatively short (approximately 200 English sentences, assuming your sentences are around 15-20 words), you can expect to send at least 100 messages every 8 hours, often more depending on Claude's current capacity. Over two thirds of all conversations on claude.ai (as of September 2023) have been within this length." In another somewhat cryptic statement, Anthropic writes, "If you upload a copy of The Great Gatsby, you may only be able to send 20 messages in that conversation within 8 hours." We're not attempting the math, but if you know the precise word count of F. Scott Fitzgerald's classic, it may be possible to glean Claude's actual limits. We reached out to Anthropic for clarification yesterday and have not received a response by press time.


Caltech To Accept Khan Academy Success As Option For Admission (latimes.com) 35

"Given that too many schools don't teach calculus, chemistry and physics, CalTech is allowing potential undergraduates to demonstrate their ability in these fields by using Khan Academy," writes Slashdot reader Bruce66423. Los Angeles Times reports: One of Caltech's alternative paths is taking Khan Academy's free, online classes and scoring 90% or higher on a certification test. Sal Khan, academy founder, said Caltech's action is a "huge deal" for equitable access to college. While Caltech is small -- only 2,400 students, about 40% of them undergraduates -- Khan said he hoped its prestigious reputation would encourage other institutions to examine their admission barriers and find creative solutions to ease them. The Pasadena-based institute, with a 3% admission rate last year, boasts 46 Nobel laureates and cutting-edge research in such fields as earthquake engineering, behavioral genetics, geochemistry, quantum information and aerospace. "You have one of the most academically rigorous schools on the planet that has arguably one of the highest bars for admission, saying that an alternative pathway that is free and accessible to anyone is now a means to meeting their requirements," said Khan, whose nonprofit offers free courses, test prep and tutoring to more than 152 million users. [...]

The impetus for the policy change began in February, when Pallie, the admissions director, and two Caltech colleagues attended a workshop on equity hosted by the National Assn. for College Admission Counseling. They were particularly struck by one speaker, Melodie Baker of Just Equations, a nonprofit that seeks to widen math opportunities. As Baker pointed out the lack of access to calculus for many students, Pallie and her team began to question Caltech's admission requirement for the course, along with physics and chemistry. Pallie and Jared Leadbetter, a professor of environmental microbiology who heads the faculty admissions committee, began to look into potential course alternatives. Pallie connected with Khan's team, which started a second nonprofit, Schoolhouse.world, during the pandemic in 2020 to offer free tutoring. Peer tutors on the platform certify they are qualified for their jobs by scoring at least 90% on the course exam and videotaping themselves explaining how they solved each problem on it. The video helps ensure that the students actually took the exam themselves and understand the material. That video feature gave Caltech assurances about the integrity of the alternative path.

Under the new process, students would take a calculus, physics or chemistry class offered by Khan Academy and use the Schoolhouse platform to certify their mastery of the content as tutors do with a 90% score or better on the exam and a videotaped explanation of their reasoning. Proof of certification is required within one week of the application deadline, which is in November for early action and January for regular decisions. Pallie and Leadbetter also wanted to test whether the Khan Academy courses are sufficiently rigorous. Several Caltech undergraduates took the courses to assess whether all concepts were covered in enough breadth and depth to pass the campus placement exams in those subjects. Miranda, a rising Caltech junior studying mechanical engineering, took the calculus course and gave it a thumbs-up, although she added that students would probably want to use additional textbooks and other study materials to deepen their preparation for Caltech.


OpenAI Launches a ChatGPT Plan For Enterprise Customers 16

An anonymous reader quotes a report from TechCrunch: Seeking to capitalize on ChatGPT's viral success, OpenAI today announced the launch of ChatGPT Enterprise, a business-focused edition of the company's AI-powered chatbot app. ChatGPT Enterprise, which OpenAI first teased in a blog post earlier this year, can perform the same tasks as ChatGPT, such as writing emails, drafting essays and debugging computer code. But the new offering also adds "enterprise-grade" privacy and data analysis capabilities on top of the vanilla ChatGPT, as well as enhanced performance and customization options. That puts ChatGPT Enterprise on par, feature-wise, with Bing Chat Enterprise, Microsoft's recently launched take on an enterprise-oriented chatbot service.

ChatGPT Enterprise provides a new admin console with tools to manage how employees within an organization use ChatGPT, including integrations for single sign-on, domain verification and a dashboard with usage statistics. Shareable conversation templates allow employees to build internal workflows leveraging ChatGPT, while credits to OpenAI's API platform let companies create fully custom ChatGPT-powered solutions if they choose. ChatGPT Enterprise, in addition, comes with unlimited access to Advanced Data Analysis, the ChatGPT feature formerly known as Code Interpreter, which allows ChatGPT to analyze data, create charts, solve math problems and more, including from uploaded files. For example, given a prompt like "Tell me what's interesting about this data," ChatGPT's Advanced Data Analysis capability can look through the data -- financial, health or location information, for example -- to generate insights.

Advanced Data Analysis was previously available only to subscribers to ChatGPT Plus, the $20-per-month premium tier of the consumer ChatGPT web and mobile apps. To be clear, ChatGPT Plus is sticking around -- OpenAI sees ChatGPT Enterprise as complementary to it, the company says. ChatGPT Enterprise is powered by GPT-4, OpenAI's flagship AI model, as is ChatGPT Plus. But ChatGPT Enterprise customers get priority access to GPT-4, delivering performance that's twice as fast as the standard GPT-4 and with an expanded 32,000-token (~25,000-word) context window. Context window refers to the text the model considers before generating additional text, while tokens represent raw text (e.g. the word "fantastic" would be split into the tokens "fan," "tas" and "tic"). Generally speaking, models with large context windows are less likely to "forget" the content of recent conversations.
Crucially, OpenAI said that it "won't train models on business data sent to ChatGPT Enterprise or any usage data and that all conversations with ChatGPT Enterprise are encrypted in transit and at rest," notes TechCrunch.

"OpenAI says that its future plans for ChatGPT Enterprise include a ChatGPT Business offering for smaller teams, allowing companies to connect apps to ChatGPT Enterprise, 'more powerful' and 'enterprise-grade' versions of Advanced Data Analysis and web browsing, and tools designed for data analysts, marketers and customer support."

A blog post introducing ChatGPT Enterprise can be found here.

72-Year-Old C++ Creator Bjarne Stroustrup Shares Life Advice (youtube.com) 47

72-year-old Bjarne Stroustrup invented C++ (first released in 1985). 38 years later, he gave a short interview for Honeypot.io (which calls itself "Europe's largest tech-focused job platform") offering his own advice for life: Don't overspecialize. Don't be too sure that you know the future. Be flexible, and remember that careers and jobs are a long-term thing. Too many young people think they can optimize something, and then they find they've spent a couple of years or more specializing in something that may not have been the right thing. And in the process they burn out, because they haven't spent enough time building up friendships and having a life outside computing.

I meet a lot of sort of — I don't know what you call them, "junior geeks"? — that just think that the only thing that matters is the speciality of computing — programming or AI or graphics or something like that. And — well, it isn't... And if they do nothing else, well — if you don't communicate your ideas, you can just as well do Sudoku... You have to communicate. And a lot of sort of caricature nerds forget that. They think that if they can just write the best code, they'll change the world. But you have to be able to listen. You have to be able to communicate with your would-be users and learn from them. And you have to be able to communicate your ideas to them.

So you can't just do code. You have to do something about culture and how to express ideas. I mean, I never regretted the time I spent on history and on math. Math sharpens your mind, history gives you some idea of your limitations and what's going on in the world. And so don't be too sure. Take time to have a balanced life.

And be ready for the opportunity. I mean, a broad-based education, a broad-based skill set — which is what you build up when you educate, you're basically building a portfolio of skills — means that you can take advantage of an opportunity when it comes along. You can recognize it sometimes. We have lots of opportunities. But a lot of them, we either can't take advantage of, or we don't notice. It was my fairly broad education — I've done standard computer science, I've done compilers, I've done multiple languages... I think I knew two dozen at the time. And I have done machine architecture, I've done operating systems. And that skill set turned out to be useful.

At the beginning of the video, Stroustrup jokes that it's hard to give advice — and that it's at least as difficult as it is to take advice.

Earlier this year, Bjarne also told the same site the story of how he became a programmer by mistake — misreading a word when choosing what to study after his high school exams. Stroustrup had thought he was signing up for an applied mathematics course, which instead turned out to be a class in computer science...

Anthropic Launches Improved Version of Its Entry-Level LLM (techcrunch.com) 5

Anthropic, the AI startup co-founded by ex-OpenAI execs, has released an updated version of its faster, cheaper, text-generating model available through an API, Claude Instant. TechCrunch reports: The updated Claude Instant, Claude Instant 1.2, incorporates the strengths of Anthropic's recently announced flagship model, Claude 2, showing "significant" gains in areas such as math, coding, reasoning and safety, according to Anthropic. In internal testing, Claude Instant 1.2 scored 58.7% on a coding benchmark compared to Claude Instant 1.1, which scored 52.8%, and 86.7% on a set of math questions versus 80.9% for Claude Instant 1.1. "Claude Instant generates longer, more structured responses and follows formatting instructions better," Anthropic writes in a blog post. "Instant 1.2 also shows improvements in quote extraction, multilingual capabilities and question answering."

Claude Instant 1.2 is also less likely to hallucinate and more resistant to jailbreaking attempts, Anthropic claims. In the context of large language models like Claude, "hallucination" is where a model generates text that's incorrect or nonsensical, while jailbreaking is a technique that uses cleverly-written prompts to bypass the safety features placed on large language models by their creators. And Claude Instant 1.2 features a context window that's the same size as Claude 2's -- 100,000 tokens. Context window refers to the text the model considers before generating additional text, while tokens represent raw text (e.g. the word "fantastic" would be split into the tokens "fan," "tas" and "tic"). Claude Instant 1.2 and Claude 2 can analyze roughly 75,000 words, about the length of "The Great Gatsby." Generally speaking, models with large context windows are less likely to "forget" the content of recent conversations.
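
As a rough illustration of the token-to-word figures quoted above, the sketch below derives the words-per-token ratio implied by "100,000 tokens" and "roughly 75,000 words." The function names are hypothetical, and the ratio is only the rule of thumb implied by the article's numbers; real ratios depend on the tokenizer and the text.

```python
def words_per_token(words: int, tokens: int) -> float:
    """Ratio implied by a quoted (words, tokens) pair."""
    return words / tokens


def estimate_words(tokens: int, ratio: float) -> int:
    """Approximate how many words fit in a context window of `tokens` tokens."""
    return int(tokens * ratio)


# Figures quoted in the summary: 100,000 tokens, roughly 75,000 words.
ratio = words_per_token(75_000, 100_000)
print(f"about {ratio:.2f} words per token")
print(f"a 100,000-token window holds about {estimate_words(100_000, ratio):,} words")
```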


ChatGPT Is Getting Dumber at Basic Math 91

Recently released research reveals a fundamental challenge of developing artificial intelligence: ChatGPT has become worse at performing certain basic math operations. From a report: The researchers at Stanford University and the University of California, Berkeley said the deterioration is an example of a phenomenon known to AI developers as drift, where attempts to improve one part of the enormously complex AI models make other parts of the models perform worse.

[...] Thus far, they have tested two versions of ChatGPT: version 3.5, available free online to anyone, and version 4.0, available via a premium subscription. The results aren't entirely promising. They gave the chatbot a basic task: identify whether a particular number is a prime number. This is the sort of math problem that is complicated for people but simple for computers.

Is 17,077 prime? Is 17,947 prime? Unless you are a savant you can't work this out in your head, but it is easy for computers to evaluate. A computer can just brute force the problem -- try dividing by two, three, five, etc., and see if anything works. To track performance, the researchers fed ChatGPT 1,000 different numbers. In March, the premium GPT-4 correctly identified whether 84% of the numbers were prime or not. (Pretty mediocre performance for a computer, frankly.) By June its success rate had dropped to 51%. Across eight different tasks, GPT-4 became worse at six of them. GPT-3.5 improved on six measures, but remained worse than its advanced sibling at most of the tasks.
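
The brute-force check described above can be sketched in a few lines of Python. This is a minimal trial-division example, not the procedure the researchers used; the function name `is_prime` is an assumption made for illustration. It tries 2 and then every odd number up to the square root of the input, which also covers the composite candidates that a primes-only search would skip.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division: look for any divisor between 2 and sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Only odd candidates remain; stop at the integer square root.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


for number in (17_077, 17_947):
    print(number, is_prime(number))
```

For five-digit inputs like these the loop runs at most a few dozen iterations, which is why the task is trivial for conventional code even when a language model struggles with it.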

Is ChatGPT Getting Worse? (fortune.com) 93

A new study (PDF) from Stanford found that ChatGPT performed worse on certain tasks in June than its March version. The paper supports a widely held, though unproven, notion that the AI language model's performance in coding and compositional tasks has deteriorated in recent months. Fortune reports: The study compared the performance of the chatbot, created by OpenAI, over several months at four "diverse" tasks: solving math problems, answering sensitive questions, generating software code, and visual reasoning. Researchers found wild fluctuations -- called drift -- in the technology's ability to perform certain tasks. The study looked at two versions of OpenAI's technology over the time period: a version called GPT-3.5 and another known as GPT-4. The most notable results came from research into GPT-4's ability to solve math problems.

Over the course of the study researchers found that in March GPT-4 was able to correctly identify that the number 17077 is a prime number 97.6% of the times it was asked. But just three months later, its accuracy plummeted to a lowly 2.4%. Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version got the answer to the same question right just 7.4% of the time -- while the June version was consistently right, answering correctly 86.8% of the time. Similarly varying results happened when the researchers asked the models to write code and to do a visual reasoning test that asked the technology to predict the next figure in a pattern.

James Zou, a Stanford computer science professor who was one of the study's authors, says the "magnitude of the change" was unexpected from the "sophisticated ChatGPT." The vastly different results from March to June and between the two models reflect not so much the model's accuracy in performing specific tasks, but rather the unpredictable effects of changes in one part of the model on others. [...] The exact nature of these unintended side effects is still poorly understood because researchers and the public alike have no visibility into the models powering ChatGPT. It's a reality that has only become more acute since OpenAI decided to backtrack on plans to make its code open source in March. "These are black-box models," Zou says. "So we don't actually know how the model itself, the neural architectures, or the training data have changed."

Twitter Starts Sharing Ad Revenue With Verified Creators (techcrunch.com) 62

Twitter has started sending out the first payouts to creators on the platform who are part of the company's revenue sharing program. The largest payout reported thus far was to Billy Markus, the co-creator of the Dogecoin cryptocurrency, which amounted to a whopping $37,050. TechCrunch reports: Users who subscribe to Twitter Blue and have earned more than 5 million tweet impressions each month for the last 3 months are eligible to join. According to owner Elon Musk, the first round of creator payouts will total $5 million, and will be cumulative from the month of February onward. These payouts will be delivered via Stripe. [...] Twitter's payouts are determined by tweet impressions. Babylon Bee writer Ashley St. Clair (710,000 followers) said that she earned $7,153, and according to her "napkin math," she had around 840 million impressions from February through July. That would make her rate about $0.0085 CPM (cost per mille), or $8.52 per million impressions. It's not clear whether or not individual CPMs change from user to user.
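
The "napkin math" above can be reproduced directly. The sketch below only restates the arithmetic from the reported figures ($7,153 earned on roughly 840 million impressions); the function name `payout_rates` is hypothetical and nothing here reflects how Twitter actually computes payouts.

```python
def payout_rates(payout_usd: float, impressions: int) -> tuple[float, float]:
    """Return (dollars per 1,000 impressions, dollars per 1,000,000 impressions)."""
    per_impression = payout_usd / impressions
    return per_impression * 1_000, per_impression * 1_000_000


# Figures reported in the summary: $7,153 on about 840 million impressions.
cpm, per_million = payout_rates(7_153, 840_000_000)
print(f"${cpm:.4f} per 1,000 impressions (CPM)")
print(f"${per_million:.2f} per million impressions")
```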
