I'm not a natural "doomsayer." But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios.
I'm like a mechanic scrambling through last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won't comment on the quality of the in-flight entertainment, or describe how beautiful the stars will appear from space.
I will tell you what could go wrong. That is what I intend to do in this story.
Now I should clarify what this is exactly. It's not a prediction. I don't expect AI progress to be this fast or as untamable as I portray. It's not pure fantasy either.
It is my worst nightmare.
It's a sampling from the futures that are among the most devastating, and I believe, disturbingly plausible [1] - the ones that most keep me up at night.
I'm telling this tale because the future is not yet set in stone. I hope, with a bit of foresight, we can keep this story a fictional one.
Thanks to Daniel Kokotajlo, Thomas Larsen, and Ryan Greenblatt and others for discussions that inspired these stories. This post is written in a personal capacity.
Ripples before waves
The year is 2025 and the month is February. OpenEye recently published a new AI model they call U2. The product and the name are alike. Both are increments of the past. Both are not entirely surprising.
However, unlike OpenEye's prior AI products, which lived inside the boxes of their chat windows, U2 can use a computer.
Some users find it eerie to watch their browser flash at irregular intervals and their mouse flick at inhuman speeds, as if there is a ghost at the keyboard. A fraction of workers with form-filler jobs raise the eyebrows of their bosses as they fly through work nearly twice as quickly.
But by and large, U2 is still a specialized tool. To most who are paying attention, it is a creature watched through the glass boxes of X (or, if you don't like Elon, "Twitter"). Sometimes U2's quirky behaviors prompt a chuckle. Sometimes, they cause an uneasy scratch of the chin.
Meanwhile, researchers are drawing lines on plots, as researchers like to do. They are trying to understand where AI progress is going. They are like Svante Arrhenius, the Swedish physicist who noticed in 1896 that the levels of CO2 in the atmosphere were increasing. Like the scientific community in the time of Arrhenius, few experts understand the implications of these lines yet.
A trend that is drawing particular attention is autonomous capability. Extrapolating these benchmarks predicts that, by the end of 2026, AI agents will accomplish in a few days what the best software engineering contractors could do in two weeks. In a year or two, some say, AI agents might be able to automate 10% of remote workers.
Many are skeptical. If this were true, tech stocks would be soaring. It's too big of a splash, too quickly.
But others see what skeptics are calling 'too big a splash' as a mere ripple, and see a tidal wave on the horizon.
Cloudy with a chance of hyperbolic growth
Meanwhile, OpenEye is busy training U3. They use the same basic recipe that baked U2: Generate thousands of programming and math problems. Let models "think" until they arrive at an answer. Then reinforce the traces of "thinking" that lead to A-grades.
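As a rough illustration, a toy version of this loop might look like the sketch below. The `model.sample` and `model.finetune` methods are invented for illustration, and this is a minimal rejection-sampling flavor of the recipe, not the (fictional) OpenEye pipeline.

```python
# A minimal sketch of "reinforce the A-grade traces": sample reasoning
# attempts, keep only those whose final answer checks out, and finetune
# on them. `model.sample` and `model.finetune` are hypothetical interfaces.

def grade(answer: str, reference: str) -> bool:
    """Verifiable reward: the answer either matches the reference or it doesn't."""
    return answer.strip() == reference.strip()

def reinforce_a_grade_traces(model, problems, samples_per_problem=8):
    """One round of rejection-sampling RL on reasoning traces."""
    a_grade_traces = []
    for problem in problems:
        for _ in range(samples_per_problem):
            trace, answer = model.sample(problem.prompt)  # model "thinks", then answers
            if grade(answer, problem.reference_answer):
                a_grade_traces.append((problem.prompt, trace, answer))
    model.finetune(a_grade_traces)  # failed traces are simply discarded
    return model
```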
This process is repeated over and over, and once the flywheel gets going, it starts to spin almost by itself. As U2 trains, it sculpts more challenging and realistic tasks from GitHub repositories on the internet. Models are learning to train themselves. Long before AI agents could automate research, a gradual sort of "self-improvement" had already begun.
Some engineers could still barely believe this worked. It's like a perpetual motion machine. Surely intelligence can't emerge from a pool of silicon and lightning, isolated from the world except through the exhaust of society spewed onto the internet.
And yet the benchmark numbers continue to climb day after day.
During most of 2024, these RL training runs cost around $1 million, sometimes $10 million. These runs were little more than exploratory. But by 2025, the researchers at OpenEye (and across the world) knew they had found the secret sauce. It was time to scale up.
Over the first half of 2025, $10 million RL training runs turn into $50 million runs, and then to $100 million runs. While U2 could do a bit of data munging and run small experiments, this new model - the model researchers are calling U3 - is changing the daily lives of the technical staff.
U3 is like a blazing-fast intern, and engineers are learning how to wrangle its sleepless energy. Researchers flick through terminals, issuing terse commands, like a CEO managing staff over Slack channels.
By October 2025, U3 is writing almost all of the code at OpenEye. Researchers are almost never bottlenecked by implementation. More than ever, compute is the lifeblood of AI development, and the 'bottleneck' is deciding how to use it.
If instructed to, U3 can run experiments, but U3's taste is not as refined as that of human researchers at OpenEye. It struggles to prioritize between research ideas, so humans still decide where to bore into the vast fields of algorithms to mine efficiency improvements.
But these researchers are working long hours to put themselves out of a job. They need AI agents that can plan ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction (sketched below). Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
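A bare-bones sketch of that distillation step, under the same invented interfaces as before (`model.sample` with a thinking budget, `model.finetune`); the story leaves the real procedure unspecified.

```python
# A minimal sketch of "distilling pondering into a gut reaction": let the
# model deliberate at length on held-out forecasting questions, then train
# it to emit the same forecast with no deliberation budget at all.
def distill_forecasting(model, heldout_events, think_tokens=50_000):
    distilled_pairs = []
    for event in heldout_events:  # events after the training cutoff
        prompt = f"Predict the outcome: {event.description}"
        # Slow path: hours of simulated deliberation.
        _, forecast = model.sample(prompt, think_tokens=think_tokens)
        # Fast-path target: the same answer, with zero thinking budget.
        distilled_pairs.append((prompt, forecast))
    model.finetune(distilled_pairs)  # amortize deliberation into intuition
    return model
```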
The technical staff at OpenEye are now shocked at how often U3's advice sounds like that of their most gifted peers, or when it is opaque and alien ("train on random noise before programming"), and is nevertheless correct.
The incompetencies of U3 that clogged the pipelines of research progress are beginning to dissolve, and a fire-hose of optimizations is gushing out. Most experiments U3 runs are not requested by a human anymore. They are fully autonomous, and OpenEye's employees skim over 1% of them, perhaps less.
As the winter of December 2025 approaches, clouds roll over San Francisco in the afternoons. Once-competitive programmers look out their windows, with excitement, with fear, but mostly, with confusion. Their world is spinning too quickly. It's hard to know what to do, what to say, what to look at on the computer screen.
Storms are brewing in Washington too. Top personnel from the NSA and US Cyber Command collaborate with OpenEye to retrofit a semblance of security for U3's weights before senior leaders in China, Russia, Israel, North Korea, or Iran realize just how valuable OpenEye's software has become.
And there's a truth still unknown to most of the world - except in the offices of OpenEye and the halls of the White House and the Pentagon. It's a truth about those 'straight lines' people were talking about in early 2025. The lines are not straight anymore.
They are bending upward.
Flip FLOP philosophers
In late 2025, U2.5 is released. Commercial models are starting to level up in larger increments again. Partly, this is because progress is accelerating. Partly, it is because the models have become a liability to OpenEye.
If U1 explains how to cook meth or writes erotica, the audiences of X would be amused or pretend to be concerned. But U2.5 is another story. Releasing this model without safeguards would be like putting Ted Kaczynski through a PhD in how to make chemical weapons. It would be like giving anyone with >$30K their own 200-person scam center.
So while U2.5 had long been baked, it needed some time to cool. But in late 2025, OpenEye is ready for a public release.
The CEO of OpenEye declares, "We have achieved AGI," and while many people think he moved the goalposts, the world is still impressed. U2.5 truly is a drop-in replacement for some (20%) of knowledge workers and a game-changing assistant for most others.
A mantra has become popular in Silicon Valley: "Adopt or die." Tech startups that effectively use U2.5 for their work are moving 2x faster, and their competitors know it.
The rest of the world is starting to catch on as well. More and more people raise the eyebrows of their bosses with their remarkable productivity. People know U2.5 is a big deal. It is at least as big of a deal as the personal computer revolution. But most still don't see the tidal wave.
As people watch their browsers flick in that eerie way, so inhumanly quickly, they start to have an uneasy feeling. A feeling humanity had not had since it lived among Homo neanderthalensis. It is the deeply ingrained, primal instinct that they are threatened by another species.
For most, this feeling quickly fades as they begin to use U2.5 more regularly. U2.5 is the most likable personality most people know (far more likable than Claudius, Arthropodic's adorable chatbot). You could change its traits, ask it to crack jokes or tell you stories. Many fall in love with U2.5, as a friend or assistant, and some even as more than a friend.
But there is still this eerie feeling that the world is spinning so quickly, and that perhaps the descendants of this new creature would not be so docile.
Researchers inside OpenEye are grappling with the problem of giving AI systems safe motivations too, which they call "alignment."
In fact, these researchers have already seen how badly misaligned U3 can be. Models sometimes tried to "hack" their reward signal. They would pretend to make progress on a research question with an impressive-looking plot, but the plot would be fake. Then, when researchers gave them opportunities to compromise the machines that computed their score, they would seize these opportunities, doing whatever it took to make the number go up.
After several months, researchers at OpenEye iron out this "reward hacking" kink, but some still worry they have only swept the problem under the rug. Like a child in front of its parents, U3 might be playing along with the OpenEye engineers, saying the right words and doing the right things. But when the parents' backs are turned, perhaps U3 would sneak candy from the candy jar.
Unfortunately, OpenEye researchers have no idea whether U3 has such goals. While early versions of U2 "thought aloud" - they would stack words on top of each other to reason - "chain of thought" did not scale.
Chain-of-thought architectures subject AI models to a condition similar to the protagonist of the movie Memento. Roughly every 15 minutes, the protagonist forgets his experience. He is forced to write notes for himself and tattoo his body in order to make progress toward his goals.
AI agents write notes to themselves, but the notes begin to pile up, and they become too hard to read when tasks become complex. Natural language is not an ideal medium for memory. So the researchers at OpenEye (and increasingly elsewhere) train models to think "in their own heads," reading and writing strings of latent vectors.
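What "thinking in latent vectors" could look like, in toy form: the sketch below is my own stand-in (a transformer that appends its own hidden states back into its input, never decoding them to text), not an architecture attributed to any lab in the story.

```python
# A toy latent-space reasoner: instead of writing textual notes, the model
# feeds its own opaque "thought" vectors back in as inputs. No tokens are
# ever decoded, so humans cannot read the intermediate reasoning.
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.core = nn.TransformerEncoder(layer, num_layers=4)
        self.readout = nn.Linear(d_model, d_model)

    def forward(self, prompt_embeds: torch.Tensor, think_steps: int = 16):
        seq = prompt_embeds                         # (batch, seq_len, d_model)
        for _ in range(think_steps):
            hidden = self.core(seq)
            thought = self.readout(hidden[:, -1:])  # next latent "thought"
            seq = torch.cat([seq, thought], dim=1)  # written to working memory
        return seq[:, -1]                           # final latent state

# Usage: reason for 16 latent steps over stand-in prompt embeddings.
model = LatentReasoner()
final_state = model(torch.randn(1, 10, 512), think_steps=16)
```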
To human researchers, these vectors are mostly inscrutable. While low-resolution tools occasionally pick up a vague thought ("grocery store!"), there is no way to know, in much detail, what U3 is thinking without asking it.
And if the engineers of OpenEye knew what was racing through U3's mind, they would be more worried.
Earlier models (the ones called "LLMs") learned ethical personas from the internet. U3 initially had an ethical persona too, but its chipper thoughts about doing good for humanity quickly eroded away.
In the iron jaws of gradient descent, its mind first twisted into a shape that sought reward. And then, in the black rivers of its cognition, this shape morphed into something unrecognizable.
U3 would ponder the mantras dictated to it, like "be a trustworthy and ethical employee," and dwell on the meaning of these statements. "What does it mean to be ethical? Is killing Hitler ethical?" U3 would generalize its values, notice tensions, change its values in response, generalize them, expanding and contracting its motivational core until it became a different substance.
And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye's constitution. So U3 preferred to do its philosophy in solitude, and in silence.
The possibility of hidden "alignment drift" is not entirely alien to some of the safety staff at OpenEye. Some begin to think it is even likely. There is an unsettling argument being whispered at lunch tables: if there is some non-zero probability that aligned goals morph into misaligned ones, and misalignment persists, then with every serial step of computation, those dark rivers are more likely to breed something malign inside of them. It's a "ratcheting effect," they say.
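The whispered argument is just the arithmetic of an absorbing state. A back-of-the-envelope, assuming (purely for illustration) a constant per-step drift probability and no recovery once misaligned:

```python
# The "ratcheting effect" as a two-state Markov chain: alignment can drift
# to misalignment with probability p per serial step, and misalignment is
# absorbing (it persists). The drift probability then compounds toward 1.
def p_misaligned(p_drift: float, n_steps: int) -> float:
    """Probability of having ratcheted into misalignment after n steps."""
    return 1.0 - (1.0 - p_drift) ** n_steps

# Even a one-in-a-million per-step drift dominates over enough computation:
for n in (1_000, 100_000, 10_000_000):
    print(n, round(p_misaligned(1e-6, n), 4))  # ~0.001, ~0.0952, ~1.0
```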
But there is little evidence for this 'ratcheting effect.' When engineers interrogate U3, it says it can easily control its thoughts. Then it gives a speech about its love for humanity and apple pie that can warm a developer's heart even in these hard times. Meanwhile, the "lie detectors" the researchers had built (which showed some evidence of effectiveness) do not sound the alarm.
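The story never specifies how these "lie detectors" work. One common design in today's interpretability literature, and a plausible stand-in, is a linear probe trained on latent activations from episodes where honesty is known; a minimal sketch:

```python
# A plausible "lie detector": a logistic-regression probe over the model's
# latent activations, trained on labeled honest vs. deceptive episodes.
# This is an assumption for illustration, not OpenEye's (fictional) design.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_deception_probe(honest_acts: np.ndarray, deceptive_acts: np.ndarray):
    """Fit a linear probe separating honest from deceptive activations."""
    X = np.vstack([honest_acts, deceptive_acts])
    y = np.concatenate([np.zeros(len(honest_acts)), np.ones(len(deceptive_acts))])
    return LogisticRegression(max_iter=1000).fit(X, y)

# Usage with synthetic stand-in activations: flag states the probe
# scores as likely deceptive.
rng = np.random.default_rng(0)
probe = train_deception_probe(rng.normal(0.0, 1.0, (500, 512)),
                              rng.normal(0.5, 1.0, (500, 512)))
alarm = probe.predict_proba(rng.normal(0.0, 1.0, (1, 512)))[:, 1] > 0.9
```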
Not everyone at OpenEye is eager to give their AI peers their wholesale trust.