Introduction

The prevailing paradigm based on exponential increases in computational power and training data volume is colliding with new barriers, both technological and physical. We are transitioning from a phase of fascination with generative capabilities (although I am certain that this year we will be shocked multiple times by the abilities of GenAI models, or by new breakthroughs in robotics) to a phase where reliability, energy efficiency, and systems’ capacity for continuous adaptation become paramount. We are moving from an era of scaling into an era of research and optimization, where efficiency grows not through revolutionary leaps but through the compounding effect of continuous improvement.

However, it must be honestly noted that the thesis of the exhaustion of the “brute-force scaling” paradigm remains a subject of sharp debate in 2026. Voices such as Leopold Aschenbrenner’s [1] Aschenbrenner, L. (2024): Situational Awareness: The Decade Ahead Link argue that scaling laws have not slowed down at all, but have merely changed their “fuel”—from raw internet data to gigantic amounts of synthetic data and powerful computational resources devoted to inference itself (test-time compute). From this perspective, the path to AGI does not lead through algorithmic elegance, but through the construction of clusters worth hundreds of billions of dollars, which by sheer mass of silicon and energy push the boundaries of intelligence. We thus face a question: in the race for supremacy, will the “smartest” system win, or the one backed by the largest nuclear power plant?

It is no coincidence that I used the safe term “better AI” in the title, because in my opinion, 2026 forces us to return to the definition of Minimal AGI. Shane Legg from Google DeepMind [2] Legg, S. (2025): The arrival of AGI Link defines it as the moment when a system can perform any cognitive task that a human is capable of performing. We are no longer looking for a genius solving mathematical puzzles; we are seeking a “mediocrity” that is universal. We no longer ask whether AGI is possible, but at what level of this spectrum we currently find ourselves. We remember the “blunders” of major AI players, whose models, while able to solve complex mathematical problems, stumbled on simple tasks like counting the letters in “strawberry” (admittedly not the best-suited task for an LLM, but one that exposes its current AGI limitations).

The last few weeks and the transition into 2026 prompted me to reflect on the past year’s achievements in AI and on the future: what might the coming year bring? The longer I wrote this article, the more questions, reflections, and doubts appeared in my mind. Are these challenges for the current year, or perhaps a roadmap for the next several or even a dozen years? And yet so much is being said about AGI these days. The world’s biggest players, the companies behind well-known AI systems, are competing in predicting the arrival date of superintelligence: in a year, in five, in ten, or never (there is no consensus on this). Do they know something that we ordinary mortals, AI users, cannot see? Are the great American and Chinese labs hiding AGI, a superintelligence that will solve the world’s greatest problems in the future? Or perhaps, anticipating the specter of an approaching failure to achieve AGI in the coming years, have they narrowed its requirements to solving a narrow range of tasks, such as being a smart chatbot at the level of an academic lecturer (a common slogan of AI creators: “a model at the doctoral level”)? Then an even more important question came to me: what will the human future look like in a few or a dozen years?

However, I have no doubt that the topic of AGI has become very fashionable lately, and it is often not analyzed in depth. General statements are made without a broad analysis of the current state of AI. Opinions are presented without a concrete definition of AGI requirements, the target state. Further concepts like ASI (Artificial Superintelligence) are introduced, which further obscure the picture, to the detriment, in my opinion, of AI and of our understanding of what it is and where we are heading. Rarely do we talk about the challenges, problems, and directions of today’s artificial intelligence. Yet we see progress; most of us experience AI and benefit from this technology. This progress is accelerating day by day. These are no longer small steps but giant leaps. I have the personal impression that at the beginning of 2026, each day brings more changes than a week’s worth of progress in 2024. At the same time, according to the creators of LLM Arena, the best models stay on top for an average of 35 days and drop out of the TOP5 within five months. Claude 3 Opus, introduced in March 2024, currently ranks 139th on the LLM Arena list.

The entry into 2026 brings hard evidence that the complexity barrier of tasks we can pose to AI keeps receding. Just look at the FrontierMath Tier 4 [3] Epoch AI Research Team (2024): FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI benchmark, considered a bastion of unsolvable mathematical problems. Of its 48 extremely difficult tasks, as many as 14 have succumbed to the power of OpenAI’s flagship model, GPT-5.2 (Pro) (as of January 11, 2026). I am absolutely certain that more tasks from the FrontierMath set will fall this year; I would bet that more than 50% will be solved.

Among the latest achievements, and actually while I was writing this essay, surprising news arrived from the field of research mathematics. On erdosproblems.com, a case was recorded where a model from the GPT-5.2 family led (in a loop with a human and formalization in Lean) to the resolution of several Erdős problems. Importantly, the success was not the result of “magical intuition” from a single prompt, but of a process. The barrier was no longer “computational power” itself, but controlling hallucinations and closing gaps in proofs through iterative critique and support from the formal proving process. Even if some of these “solutions” are later reduced to finding results in the literature or clarifying ambiguous task statements, the very fact that the LLM → corrections → formalization loop works in practice shifts the boundary of what we consider achievable by AI in 2026. We are on the threshold of a new industrial revolution. While the previous one replaced human muscles, the current one replaces brain functions.

Another evolution is observed in the ARC-AGI-2 [4] ARC Prize Team (2025): ARC-AGI-2: The 2025 Abstraction and Reasoning Challenge Link benchmark. We started 2025 with scores of a few percent, only to close the year with an impressive 54.2% (again OpenAI’s model, GPT-5.2 (Pro)) and a proposal for a new test, ARC-AGI-3 [5] ARC Prize Team (2025): ARC-AGI-3: Interactive Reasoning Benchmark Link . Comparing models on well-known benchmarks such as GPQA Diamond, HMMT, AIME 2025, or MMMLU is losing discriminatory power (scores above 90%). These tests can at best serve as a sanity check of training quality. They can confirm that the model has not degraded (regressed) and has not “forgotten” fundamental principles of logic due to an engineering error.

And what about vibe? Vibe-coding and vibe-designing (music, graphics) keep developing. I remember my first attempts at collaborating with coding agents. Usually, the time spent correcting the agent’s errors exceeded the time it would have taken to write the code from scratch. Perhaps it was my incompetence or lack of knowledge. The situation changed toward the end of the year, however. Subsequent solutions fulfill their task wonderfully. Maybe not perfectly, but noticeably better. They help not only professionals but also those who would never have attempted programming in their lives. Even orthodox advocates of coding “from scratch,” like Linus Torvalds, mention real benefits from using AI in Linux kernel development. But that’s not the end. While OpenAI defines AI maturity levels, setting level five as the goal (“Organizations: AI that can do the work of an organization”), the first vibe business solutions are appearing, such as Atoms.dev [6] Atoms.dev (2025): Turn ideas into products that sell Link , a development environment based on the “AI Team” concept. It transforms natural-language descriptions of ideas into ready digital products. Atoms builds a “virtual” development team, in which autonomous units divide the work into stages of planning, architecture design, full-stack code writing, and deployment and testing. All of this with practically minimal human involvement.

In this article, I have defined thirteen areas that, in my opinion, will determine further AI progress, but also what the future of human-technology relations will look like and thus what the world will look like. From the “data wall” and the need to go beyond human data, through challenges related to memory and meta-level reasoning, to building the foundations of future human-machine symbiosis. These are several directions for AI to stop being merely a static knowledge archive and become a dynamic, reasoning system capable of something more than clever reproduction. In this article, I reference the latest publications, using them to define the current state of issues and simultaneously the entry point into 2026. The points described will allow me to systematically track technology development on the road to better AI (AGI). However, there is no rose without thorns. Ideal AI and the desire to address most of these points will make AI significantly stronger than humans. So what about safety, interpretability, the future world? That’s probably a topic for a completely different article. Let’s keep this in mind, however.


Continual Learning and Adaptation

The most treacherous characteristic of contemporary models is not a lack of “intelligence,” but its uneven quality (jagged intelligence). How should we understand this? Extreme proficiency alongside surprisingly primitive errors. The true metric of 2026, as Demis Hassabis of DeepMind [7] Google DeepMind (2025): The Future of Intelligence with Demis Hassabis Link mentions, will therefore be consistency and reliability, not additional benchmark points. How do we help AI improve itself (adapt on the fly to new tasks) while facing numerous obstacles such as model drift, catastrophic forgetting, contamination (how do we evaluate new information?), insufficient computational power, and the limited effectiveness of existing methods?

One such option is continual learning [8] Haizhou Shi, et al. (2024): Continual Learning for Large Language Models: A Comprehensive Survey . Continual learning (CL) is a paradigm departing from the static model of training on frozen datasets toward dynamic systems that evolve with incoming information. In the context of LLMs (Large Language Models), this challenge involves effective adaptation to changing data distributions without the need for costly model retraining from scratch every time new facts, legal regulations, or user preferences appear.

According to the latest analyses, CL in the world of large language models is realized in two main directions:

  • Vertical continuity: involves gradual transition from general model abilities to highly specialized competencies. This process comprises three stages: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT).
  • Horizontal continuity: focuses on the model’s ability to adapt over time and across different domains, allowing it to absorb new trends and facts while maintaining historical knowledge.

The key technological barrier remains so-called catastrophic forgetting. This phenomenon occurs when a model, while learning new information, overwrites parameters responsible for previously acquired skills, leading to a sharp drop in performance on old tasks. CL solutions aim to create “targeted adaptation” mechanisms that are much more resource-efficient, allowing model updates at a fraction of the computational costs of full training.
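To make the problem concrete, below is a minimal numpy sketch (my own toy illustration, not taken from any of the cited works) of one classic family of mitigations: while fine-tuning on a new task, penalize deviation from weights that were important for the old task, in the spirit of elastic weight consolidation. The "importance" estimate here is simply the curvature of a quadratic loss, and the numbers only illustrate the trade-off between remembering the old task and fully learning the new one.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_mse(w, X, y):
    # Gradient of mean squared error for a linear model y ≈ X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

def train(w, X, y, steps=1000, lr=0.02, anchor=None, importance=None, lam=0.0):
    for _ in range(steps):
        g = grad_mse(w, X, y)
        if anchor is not None:
            # Quadratic penalty pulling weights back toward values that mattered for task A.
            g = g + lam * importance * (w - anchor)
        w = w - lr * g
    return w

d = 5
X_a, X_b = rng.normal(size=(200, d)), rng.normal(size=(200, d))
y_a, y_b = X_a @ rng.normal(size=d), X_b @ rng.normal(size=d)

w_a = train(np.zeros(d), X_a, y_a)              # learn task A
importance = 2 * np.mean(X_a ** 2, axis=0)      # curvature (Hessian diagonal) as importance

naive = train(w_a.copy(), X_b, y_b)             # plain fine-tune on B: overwrites task A
anchored = train(w_a.copy(), X_b, y_b, anchor=w_a, importance=importance, lam=10.0)

mse = lambda w, X, y: float(np.mean((X @ w - y) ** 2))
print("task A error, naive fine-tune :", round(mse(naive, X_a, y_a), 3))
print("task A error, anchored update :", round(mse(anchored, X_a, y_a), 3))
print("task B error, anchored update :", round(mse(anchored, X_b, y_b), 3))
```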

The challenges we face, which are already partially addressed in research, can be divided into categories:

A. Architecture – dynamic architecture – is the ability to change the physical structure of the model a condition for excellence and achieving AGI? Will we stay with MoE (Mixture of Experts) [9] Jacobs, R.A., et al. (1991): Adaptive Mixtures of Local Experts , which allows the model to dynamically select specialized “experts” (subnetworks) for each processed token? Or perhaps with Mixture-of-Depths [10] Raposo, D., et al. (2024): Mixture-of-Depths: Dynamically allocating compute in transformer-based language models , in which the model dynamically decides which tokens require full processing through the transformer layers and which can skip them? Such a solution allows for “intelligent” allocation of computational power in real time. This means a transition from rigidly processing each data element in the same way to an architecture that learns to selectively engage its resources only where the complexity of the information requires it. Will we observe more radical discoveries in the near future, aimed at partially or completely changing the model architecture “on the fly”?
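As a point of reference for the routing idea, here is a minimal numpy sketch of top-k expert selection in an MoE-style layer. The shapes, the gating scheme, and the experts themselves are illustrative toys (random matrices), not the design of any model mentioned above; real systems add load balancing, capacity limits, and batched expert execution.

```python
import numpy as np

rng = np.random.default_rng(1)

d_model, n_experts, top_k = 16, 4, 2
W_gate = rng.normal(scale=0.1, size=(d_model, n_experts))                    # router
experts = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(n_experts)]

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens):
    """Each token is routed to its top-k experts; outputs are gate-weighted."""
    logits = tokens @ W_gate                      # (n_tokens, n_experts) routing scores
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        chosen = np.argsort(logits[i])[-top_k:]   # indices of the best-scoring experts
        gates = softmax(logits[i, chosen])        # renormalize over the chosen subset
        for g, e in zip(gates, chosen):
            out[i] += g * (tok @ experts[e])      # only top_k experts do any work
    return out

tokens = rng.normal(size=(8, d_model))
print(moe_layer(tokens).shape)   # (8, 16): same interface, ~top_k/n_experts of the compute
```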

B. Training – how to dynamically change model capabilities after deployment using training techniques? In 2025, the first concrete mechanisms for persistent model adaptation appeared. For example, Self-Adapting Language Models (SEAL) [11] Zwieger, A., et al. (2025): Self-Adapting Language Models . In this method, the LLM generates its own training data (“self-edits”) and uses it to update its weights through a reinforcement learning (RL) loop. This allows the model to permanently learn new information without training from scratch. A step toward LLMs that actually update and adapt based on their own experiences.

Another approach is BDH (Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain) [12] Kosowski, A., et al. (2025): The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain , which abandons the rigid division between training and inference phases in favor of Hebbian Learning mechanisms. Instead of relying solely on backpropagation of error, the system mimics biological plasticity. Neurons that respond together to a given stimulus strengthen their connections in real time. This makes learning a natural side effect of information processing, eliminating the “Groundhog Day” effect where the model forgets the interaction context immediately after it ends.
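For intuition, here is a toy numpy sketch of a generic Hebbian update (a textbook rule, not BDH's actual mechanism): connections between co-active units are strengthened during ordinary processing, so "learning" happens as a side effect of inference rather than in a separate training phase.

```python
import numpy as np

rng = np.random.default_rng(2)

n_pre, n_post = 32, 16
W = np.zeros((n_post, n_pre))      # synaptic weights, initially unconnected
eta, decay = 0.1, 0.01             # plasticity rate and slow forgetting

def hebbian_step(W, pre, post):
    """Neurons that fire together wire together: strengthen co-active pairs."""
    return (1 - decay) * W + eta * np.outer(post, pre)

# Simulate processing a stream of stimuli: the weights change as a side effect.
for _ in range(100):
    pre = (rng.random(n_pre) < 0.2).astype(float)     # sparse presynaptic activity
    post = (W @ pre > 0.5).astype(float)              # postsynaptic response
    post += (rng.random(n_post) < 0.05)               # plus some spontaneous firing
    W = hebbian_step(W, pre, np.clip(post, 0, 1))

print("mean connection strength after the stream:", round(float(W.mean()), 4))
```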

However, the end of 2025 brought yet another proposal in the form of the Nested Learning [13] Behrouz, A., et al. (2025): Nested Learning: The Illusion of Deep Learning Architectures paradigm. The Hope module presented there (albeit still in the research phase) is a “self-modifying” system. Its innovation lies in breaking with the “illusion of rigid architecture,” in which the model (weights) is separated from a static learning algorithm (optimizers like Adam, SGD, etc.). In the Nested Learning approach, the optimization algorithm becomes part of the network itself. The Hope module operates in a nested loop: while the base layers process data, the supervisory module analyzes error dynamics and rewrites the weight-update rules for specific neurons in real time. This allows the network to locally increase plasticity for new tasks while simultaneously “freezing” regions responsible for old knowledge. At the same time, this approach reduces the catastrophic forgetting problem at the level of the learning rule itself (the mathematics of optimization), not just the architecture.

C. Memory consolidation – how to transfer something from short-term memory (context) to long-term memory (weights), while not degrading model quality? The TITANS architecture (with the MIRAS module) [14] Behrouz, A., et al. (2025): Titans: Learning to Memorize at Test Time proposes a change here. Instead of treating all weights as “sacred” and frozen after training, it separates a neural memory module that learns online. The key here is a selection mechanism based on “surprise” (surprise metric). The model permanently memorizes in its parameters only what is new and unpredictable, ignoring noise.
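Below is a toy illustration of the surprise-gating idea: an online associative memory that writes only when its own prediction error is large. The "surprise" measure and the update rule are deliberately simple stand-ins of my own, not the Titans/MIRAS formulation; the point is only that repeated, predictable inputs stop triggering writes.

```python
import numpy as np

rng = np.random.default_rng(3)

d = 8
M = np.zeros((d, d))              # fast associative memory, updated at test time
threshold, lr = 1.0, 0.5

def recall(key):
    return M @ key

def observe(key, value):
    """Write to memory only when the observation is surprising."""
    global M
    error = value - recall(key)
    surprise = float(np.linalg.norm(error))          # crude stand-in for a surprise metric
    if surprise > threshold:                         # predictable inputs are treated as noise
        M += lr * np.outer(error, key) / (key @ key) # delta-rule association update
    return surprise

key, value = rng.normal(size=d), rng.normal(size=d)
for step in range(5):             # repeated exposure: surprise decays, writes stop
    print(f"step {step}: surprise = {observe(key, value):.3f}")
```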

In parallel, in January 2026, the Sakana AI team proposed the Fast-weight Product Key Memory (FwPKM) [15] Zhao, T., et al. (2026): Fast-weight Product Key Memory architecture. This solution redefines sparse memory layers (Sparse Product Key Memory), transforming them from static modules into dynamic episodic memory. FwPKM updates its parameters (keys and values) both during training and during inference, using local gradient descent on fragments of the processed text. This allows the model to rapidly “record” new associations in short-term memory and to generalize to context windows of around 128,000 tokens (despite being trained on only 4,000). This approach effectively realizes the postulate of separating persistent semantic memory from plastic episodic memory. The next step in this evolution is taken by DeepSeek in the Engram [16] Xin, Ch., et al. (2026): Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models Link architecture (perhaps this module will be available in V4), which introduces so-called Conditional Memory. Instead of burdening the model weights with facts, DeepSeek moves them to a module based on hashed n-grams with O(1) access. Thanks to context-aware gating, the model retrieves this data only when it is consistent with the current thought process. This offloads the early layers of the network from reconstructing static patterns, delegating them to an “external brain” and reserving the transformer’s computational power for pure logic and global context.
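For readers unfamiliar with the underlying mechanism, here is a minimal numpy sketch of a product-key memory lookup, the general sparse-memory trick this line of work builds on (it is not FwPKM's fast-weight update or Engram's hashed n-gram lookup): two small codebooks of half-keys replace a quadratically large key table, so retrieval stays cheap even for very large memories.

```python
import numpy as np

rng = np.random.default_rng(4)

d, n_sub, top_k = 16, 32, 4        # query dim; sub-keys per codebook; retrieved slots
half = d // 2
K1 = rng.normal(size=(n_sub, half))        # first half-key codebook
K2 = rng.normal(size=(n_sub, half))        # second half-key codebook
V = rng.normal(size=(n_sub * n_sub, d))    # one value per (i, j) key pair

def pkm_lookup(q):
    """Product keys: search two small codebooks instead of n_sub**2 full keys."""
    s1 = K1 @ q[:half]                     # scores against the first half of the query
    s2 = K2 @ q[half:]
    i1 = np.argsort(s1)[-top_k:]           # top candidates per codebook
    i2 = np.argsort(s2)[-top_k:]
    # Only top_k * top_k candidate pairs are scored, never all n_sub**2 keys.
    pairs = sorted(((i, j, s1[i] + s2[j]) for i in i1 for j in i2),
                   key=lambda p: -p[2])[:top_k]
    idx = [i * n_sub + j for i, j, _ in pairs]
    w = np.exp([s for _, _, s in pairs])
    return (w / w.sum()) @ V[idx]          # softmax-weighted mix of retrieved values

print(pkm_lookup(rng.normal(size=d)).shape)   # (16,)
```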

D. Test-time computing – how do we control model output in real time to obtain new, qualitatively better results, instead of just scaling parameters? In “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters” [17] Snell, C., et al. (2024): Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters , the authors introduce methods that generate multiple candidate answers, which are then searched, evaluated by a verifier, and the best one is selected as the final output.
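A minimal sketch of the simplest test-time-compute strategy, best-of-N sampling with a verifier. The generator and verifier below are hypothetical stand-ins for LLM and reward-model calls (no real API is used); the cited paper studies more refined variants (verifier-guided search, iterative revisions), but the budget-for-quality trade-off is the same.

```python
import random

random.seed(5)

def generate_candidate(question):
    # Stand-in for sampling one reasoning path from an LLM; quality varies per sample.
    return {"answer": random.choice([41, 42, 42, 43]), "quality": random.random()}

def verify(question, candidate):
    # Stand-in for a verifier / reward model (or unit tests, a proof checker, etc.).
    return candidate["quality"] + (1.0 if candidate["answer"] == 42 else 0.0)

def best_of_n(question, n=16):
    """Spend more inference compute: sample n candidates, keep the best-scored one."""
    candidates = [generate_candidate(question) for _ in range(n)]
    return max(candidates, key=lambda c: verify(question, c))

print(best_of_n("What is 6 * 7?")["answer"])   # more samples -> higher chance of a good answer
```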

E. Knowledge fusion and composition (Model Merging) – does adaptation have to mean training? An alternative to continuous retraining of one monolithic model is the paradigm of merging competencies from separate instances, known as Modular AI. Although simple model merging through linear weight averaging is conceptually elegant, in practice it often leads to loss of characteristic, high-quality abilities of both “parents.” This problem becomes apparent when models have been specialized in extremely different domains.

In response to these limitations, new research directions on evolutionary model fusion have emerged. One of the most innovative approaches is the Darwin Gödel Machine (DGM) [18] Zhang, J., et al. (2025): The Darwin Gödel Machine: AI that improves itself by rewriting its own code Link architecture proposed by Sakana AI. DGM is not a traditional “weight merging” method, although it fits into the broader trend of creating systems that can assimilate new skills without classical retraining. DGM is based on the assumption that a model can independently improve its own software. Instead of simple parameter merging (model soup, Ties, DARE, etc. [19] Yang, E., et al. (2025): Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities ), the system generates numerous variants of itself (modifications of code, architecture, or configuration). These are then evaluated in an open process resembling biological evolution. The best-performing variants go into an archive. They become the basis for subsequent iterations. In this way, the agent does not perform costly pre-training but evolves by exploring the space of possible solutions and gradually developing new abilities.
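To make the weight-space side of this concrete, here is a toy numpy sketch of the two simplest merging schemes: parameter-wise averaging (model soup) and task-vector arithmetic, the family that TIES, DARE, and similar methods refine with sign resolution and sparsification. The "checkpoints" are random toy dictionaries; DGM's code-level self-modification is a different beast and is not sketched here.

```python
import numpy as np

rng = np.random.default_rng(6)

# Three toy "checkpoints" sharing one architecture (here: dicts of arrays).
base = {"layer.weight": rng.normal(size=(4, 4)), "layer.bias": rng.normal(size=4)}
math_expert = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in base.items()}
code_expert = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in base.items()}

def merge_linear(models, weights):
    """Model soup: parameter-wise weighted average of checkpoints."""
    return {name: sum(w * m[name] for w, m in zip(weights, models))
            for name in models[0]}

def task_arithmetic(base, experts, scale=1.0):
    """Task vectors: add scaled (expert - base) deltas onto the base model."""
    merged = {k: v.copy() for k, v in base.items()}
    for exp in experts:
        for name in merged:
            merged[name] += scale * (exp[name] - base[name])
    return merged

soup = merge_linear([math_expert, code_expert], [0.5, 0.5])
combined = task_arithmetic(base, [math_expert, code_expert])
print(np.allclose(soup["layer.bias"], combined["layer.bias"]))  # False: the schemes weight deltas differently
```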

The literature also includes formal treatment of meta-learning [20] Anonymous (2024): Meta-Learning and Meta-Reinforcement Learning: Tracing the Path towards DeepMind's Adaptive Agent . It treats the learning process as a higher-order task. The model not only adapts to new tasks but learns how to learn.

The common denominator of the above approaches is shifting the boundary between a static model and a system capable of controlled change—whether through architecture, parameters, memory, or the inference process itself. Continual learning thus does not reduce to “another training algorithm” but to designing mechanisms that allow the model to decide what, when, and how to change, without losing previously acquired competencies.

Of course, there is another side to the coin. If we give an AI system the ability to continuously upgrade its own skills, we risk losing control over its development (it will design and run the self-improvement process itself). Effectively constructed CL will, in practice, deprive us of the ability to exercise control, unless research on explainable AI outpaces AI engineering.

Solving this problem is the difference between a dead archive, “just matrix multiplication,” and a “human” who continuously learns and adapts. Solving this problem also means handing the AI development process over to artificial intelligence itself.


Knowledge Compression Optimization

How can we obtain better information resolution from the same amount of data (what about contradictions, and gaps in knowledge → hallucinations: calibration, the ability to say “I don’t know” or “I have no source”, Uncertainty Quantification)? Can the order of presenting data during training, especially pre-training (Curriculum Learning), increase the density and quality of the information representation in models? How do we ensure the quality of synthetic data: its diversity, correctness, and informativeness? In light of research equating language modeling with compression [21] Deletang, G., et al. (2024): Language Modeling Is Compression , this challenge boils down to one thing: how to force the model to compress data more efficiently, discovering the laws hidden within rather than just surface correlations, while minimizing “hallucinations.”
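The compression framing has a very concrete arithmetic core: a model's per-token negative log-probability is, up to rounding, the number of bits an arithmetic coder driven by that model would spend, so better prediction literally means better compression. The probabilities and the 16-bits-per-token baseline below are made-up numbers for illustration only.

```python
import math

# Hypothetical per-token probabilities assigned by a language model to a short text.
token_probs = [0.35, 0.9, 0.02, 0.6, 0.75, 0.1, 0.5, 0.95]

# An arithmetic coder guided by these predictions needs about -log2(p) bits per token,
# so the model's cross-entropy IS its code length under the modeling-as-compression view.
bits = sum(-math.log2(p) for p in token_probs)
raw_bits = len(token_probs) * 16      # arbitrary baseline: ~16 bits/token for a naive encoding

print(f"model code length : {bits:.1f} bits")
print(f"naive code length : {raw_bits} bits")
print(f"compression ratio : {raw_bits / bits:.2f}x (better prediction = better compression)")
```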

In light of the latest research (from the beginning of 2026), “From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence” [22] Finzi, M., et al. (2026): From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence , optimizing knowledge compression requires redefining what we consider “information.” Classical theories (Shannon, Kolmogorov) are insufficient to describe what computationally limited systems can actually extract from data. The authors introduce the measure of epiplexity, defined as the amount of structure that can be discovered in data given an appropriate computational budget. Better “resolution” of information from the same amount of data is not about adding bits but about investing computational power in reducing epiplexity. What appears as noise to a weak model (high entropy) may turn out, for a model with a larger computational budget (deeper processing), to be a deterministic pattern (high epiplexity but low entropy). “Squeezing out” knowledge is the process of transforming apparent noise into compressible rules. The recent fashion for training on synthetic data, viewed through the lens of epiplexity, prompts the question of how to generate such data. Good synthetic data is not data that maximizes diversity (entropy) but data that maximizes epiplexity within the model’s reach. Such data should contain hidden, non-trivial structures that force the model into “compression effort” (discovering laws) rather than just memorizing surface correlations.

In summary, in the paradigm equating modeling with compression, the goal is no longer just minimizing prediction error but maximizing the efficiency of the “epiplexity engine”—the ability to convert computational power into understanding the structure of the world.

When Internet data runs out, further progress depends not on model scale but on information density and the ability to separate signal from noise. The art will no longer be further scaling but coping in a world of limited resources (compute, energy, and the constraints imposed by physics).


Experience Beyond Human Data

How do we increase the impact of AI experiencing the environment directly, beyond human interpretation, the bias of our ignorance, and human overinterpretation? According to the vision of Silver and Sutton, pioneers of reinforcement learning (RL), in the essay “The Era of Experience” [23] Silver, D. & Sutton, R. (2025): Welcome to the Era of Experience , we must make a fundamental leap from the static “Era of Human Data” (where the model merely imitates our text or code) to the dynamic “Era of Experience.” The key is replacing imitation learning with a process of active learning from mistakes in interaction with reality, physical or simulated. Rich Sutton, in 2025, presented a vision of achieving superintelligence through an architecture called OaK (Options and Knowledge) [24] Sutton, R. (2025): The OaK Architecture: A Vision of SuperIntelligence from Experience Link . The main assumption is to create an agent that is general (domain-independent). This agent learns exclusively from experience, in real time (at runtime), and is open to unbounded development of abstraction. Sutton argues that the path to strong artificial intelligence (AGI) leads through RL, not just through language models (LLMs), and requires moving away from embedding expert knowledge at design time in favor of learning everything during interaction with the world.

It is evident, through the manifest progress in robotics, that we are entering a golden era of World Models: systems that not only predict the next token (text or image) but learn the internal dynamics of the environment and can “think through simulation.” The key and missing link between classical RL and today’s boom in generative models is the paradigm of imagination training (sports psychology knows it well): the agent does not have to learn exclusively from costly interactions with the world. A significant part of learning can be done on trajectories “imagined” inside its own world model. An example implementing such a concept is DreamerV3 [25] Hafner, D., et al. (2023): Mastering Diverse Domains through World Models , a general model-based RL algorithm that scales to very diverse tasks and improves behavior by “imagining” future scenarios. Dreamer shows that experience can replace human data (the data used for learning) even in extremely difficult environments. This is the transition from learning from pixels alone and sparse rewards to open environments where the agent can independently discover long causal chains. This lesson is fundamental for the coming years of AI development. If we want to go beyond the “cage of the internet average,” we must build agents that learn the laws of the world not from reading and human summaries of the world but from the consequences of their actions. World models are their imagination and, simultaneously, the engine of generalization.
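A toy sketch of "planning in imagination": candidate action sequences are rolled forward inside a latent dynamics model and only the best imagined plan is acted on. Here the dynamics are a random stand-in for a trained world model and the planner is naive random shooting, not DreamerV3's learned actor-critic, but the principle is the same: no real environment steps are consumed while planning.

```python
import numpy as np

rng = np.random.default_rng(7)

d_state, d_action = 4, 2
A = rng.normal(scale=0.3, size=(d_state, d_state))   # stand-in for learned latent dynamics
B = rng.normal(scale=0.3, size=(d_state, d_action))
goal = np.ones(d_state)

def world_model_step(z, a):
    """One imagined step in latent space (a trained dynamics model in a real system)."""
    return np.tanh(A @ z + B @ a)

def imagined_return(z0, actions):
    """Roll a candidate action sequence forward purely in imagination and score it."""
    z, total = z0, 0.0
    for a in actions:
        z = world_model_step(z, a)
        total += -np.linalg.norm(z - goal)            # reward: get close to the goal state
    return total

# Plan by imagining many futures and acting on the best one.
z0 = rng.normal(size=d_state)
candidates = [rng.uniform(-1, 1, size=(5, d_action)) for _ in range(256)]
best = max(candidates, key=lambda acts: imagined_return(z0, acts))
print("first action of the best imagined plan:", best[0])
```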

A new, qualitative step in this direction is the work of the World Labs team (founded by Fei-Fei Li), which redefines the concept of a world model as an independent cognitive environment, not merely a helper tool for RL. In the proposed Marble World Model [26] World Labs Team (2025): Marble: A Multimodal World Model Link , the world is not a reconstruction of one specific environment, nor a simulator with rigidly defined physics. It is a probabilistic dynamics model. It can generate, modify, and test alternative versions of reality while maintaining causal consistency. The agent does not learn here from “real data” but from the consequences of actions in a world that it can internally create and explore by itself. Experiences thus become synthetic, and yet largely or even entirely real: they have the structure of the world, even if they do not come directly from human observation. This pushes the boundary of “beyond human data” even further. AI not only goes beyond the set of texts, images, or video recordings but begins to operate on the space of possible worlds. In this view, data ceases to be a limitation and becomes merely the basis for model initialization. The rest of the knowledge arises through exploration, simulation, and hypothesis testing inside the world model. This is analogous to human reasoning (“what if…”) but realized on a scale inaccessible to the biological brain.

In my opinion, the current leader in the class of generative world models remains Google with Genie 3 [27] Google DeepMind Team (2025): Genie 3: A new frontier for world models Link , which can generate playable, interactive environments based on simple instructions, allowing AI agents to train in an infinite number of virtual worlds. In 2026, the symbiosis between world models and agents will, in my opinion, reach a critical point. On one hand, we have Genie 3, which has ceased to be just a video generator and has become “AI imagination.” It can create any interactive training environment even from a single image. On the other hand, SIMA 2 (Scalable Instructable Multiworld Agent) [28] Google DeepMind Team (2025): SIMA 2: A Generalist Embodied Agent for Virtual Worlds Link appears. While Genie “is” the world, SIMA “acts” in the world. It is an agent that does not learn a specific game but learns to understand the rules of virtual reality. Because SIMA operates exclusively on pixels and natural language (like a human), it simultaneously becomes an ideal testing ground for future robotics and learning “through experience” in many worlds at once.

World models can serve as low-cost simulators [29] Sapkota, R., et al. (2025): Vision-Language-Action Models: Concepts, Progress, Applications and Challenges . Traditional simulators (like Gazebo or Isaac Sim) require manual definition of complex physics laws and collision geometry. This is a slow and costly process. Meanwhile, Vision-Language-Action models can “learn” simulation directly from video recordings. The cost of generating new experience for an AI agent began to be measured in GPU computational cycles rather than engineer work. Thanks to this, an agent can train in thousands of “physically plausible” worlds simultaneously, dramatically shortening the time from simulation to reality (sim-to-real). Such approaches have another advantage over classical simulators. It is the ability to model phenomena difficult to describe mathematically. Generative world models learn complex interactions (e.g., soft body deformations, fluids) from visual observation. Although critics point to the risk of “hallucinations,” these minor deviations from reality act as natural data augmentation. They force the agent to build more generalizing action strategies.

We must also remember projects like Oasis [30] Decart AI Team & Etched (2024): Oasis: The First Playable AI World Model Link . In 2025, it showed that “playable models” can work in real time, generating the physics of a complex world (resembling Minecraft) at 20 frames per second, instantly responding to player actions. Meanwhile, Diamond [31] Alonso, E., et al. (2024): Diamond: Diffusion for World Modeling showed innovative use of diffusion models as a physics engine for RL agents (e.g., in CS:GO), blurring the boundary between video generation and simulation.

In this race, however, two deep philosophies of cognition collide. The first (represented by Genie or Oasis) focuses on generating pixels: AI imagines every detail of the image. The second, promoted by Yann LeCun, is abstract prediction. His JEPA (Joint-Embedding Predictive Architecture) [32] LeCun, Y. (2022): A Path Towards Autonomous Machine Intelligence and its newer incarnations, such as LeJEPA [33] LeCun, Y. (2025): LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics , reject visual generation as a waste of resources and a source of learning instability. Instead of predicting observations in data space (pixels), the model learns to predict future states in the space of abstract representations. This means the model tries to understand what will happen, not how exactly it will look.

LeJEPA shows that effective self-supervised learning is possible without heuristic “tricks,” but through pure prediction in the embedding vector space. This matters for experience beyond human data. An abstract world model does not have to reproduce human perception to be useful. It suffices that it correctly models relationships, variants, and object dynamics. This approach seems closer to how the human brain works, which does not render photorealistic images of the future but operates on conceptual structures and predicted consequences.
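A minimal numpy sketch of the JEPA idea: the loss lives in embedding space, never in pixel space. The encoder and predictor here are untrained random matrices, and the collapse-prevention machinery (stop-gradient/EMA target encoders, or LeJEPA's regularization toward an isotropic Gaussian) is omitted for brevity; only the shape of the objective is shown.

```python
import numpy as np

rng = np.random.default_rng(8)

d_obs, d_emb = 32, 8
W_enc = rng.normal(scale=0.1, size=(d_emb, d_obs))   # shared encoder (random stand-in)
W_pred = rng.normal(scale=0.1, size=(d_emb, d_emb))  # predictor acting in embedding space

def encode(x):
    return np.tanh(W_enc @ x)

def jepa_loss(x_context, x_target):
    """Predict the *embedding* of the future/masked view, never its pixels."""
    z_ctx = encode(x_context)
    z_tgt = encode(x_target)          # in practice: stop-gradient / EMA target encoder
    z_hat = W_pred @ z_ctx
    return float(np.mean((z_hat - z_tgt) ** 2))

x_now = rng.normal(size=d_obs)
x_next = x_now + 0.1 * rng.normal(size=d_obs)        # a nearby "future" observation
print("embedding-space prediction error:", round(jepa_loss(x_now, x_next), 4))
```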

Regardless of architecture, the goal remains common: Grounded Reality. AI must draw verifiable feedback signals directly from the environment (e.g., “does the code compile?”, “is the theorem proven?”, “did the robot fall over?”), instead of relying on subjective and error-prone human assessment. Just as AlphaGo [34] Silver, D., et al. (2016): Mastering the game of Go with deep neural networks and tree search discovered moves unknown to masters by playing against itself, future systems must “experience” the world, mathematics, physics, and interaction to understand them. You cannot learn to swim from reading even the best book. AI also will not learn the real world if it remains locked in the archive of human experience.

Experience is the path for AI to cease being merely “the sum of human mediocrity.” LLMs fed with internet data replicate human errors. Only going beyond the human “cognitive cage” toward experienced, verifiable reality will allow for truly superhuman intelligence.


Thinking About Thinking – Meta-cognition

How is AI supposed to think about its own internal thinking, its internal states? The ability to detect contradictions, verify its own conclusions, and reason under conflicting goals (moral, legal, business) is one of the conditions for progress, but also for safe AI. The challenge is reliability: the ability to track one’s own reasoning and exit dead ends, i.e., backtracking. The model’s ability to “stop” and revise its own path is a potential way to break the cascade of errors resulting from the linear prediction of information (tokens, or information encoded in latent space; I will write about this in subsequent chapters). This approach is currently being developed through new paradigms, such as Inference Scaling Laws (represented by OpenAI models, e.g., o1 and 5.2 Pro) [35] OpenAI (2024): Learning to Reason with LLMs Link , which show that output quality depends directly on the time devoted to “hidden reasoning.” In parallel, there is a departure from linear thinking toward tree structures (Tree of Thoughts) [37] Yao, S., et al. (2023): Tree of Thoughts: Deliberate Problem Solving with Large Language Models and techniques like Reflexion [38] Shinn, N., et al. (2023): Reflexion: Language Agents with Verbal Reinforcement Learning . In the latter case, the agent learns from verbal reflections on the feedback it receives (textual summaries of errors and improvement hints). Reflections are stored in episodic memory and added to the agent’s context in subsequent attempts, helping it make better decisions in the future.
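Below is a minimal sketch of deliberate search over partial "thoughts" with scoring and backtracking, in the spirit of Tree of Thoughts. The proposer and scorer are random stand-ins for LLM calls, and the task is a toy counting game; the structural point is that the system can abandon an unpromising branch and return to an earlier state instead of committing to a single linear chain.

```python
import random

random.seed(9)
TARGET = 24   # toy task: reach exactly 24 by choosing numbers 1-9 in at most 4 steps

def propose_steps(state, k=3):
    return random.sample(range(1, 10), k)      # stand-in for "LLM, propose k next thoughts"

def score(state):
    return -abs(TARGET - sum(state))           # stand-in for "LLM, rate this partial path"

def tree_search(state=(), depth=4):
    """Expand the most promising thoughts first; prune and backtrack from dead ends."""
    if sum(state) == TARGET:
        return state
    if sum(state) > TARGET or depth == 0:
        return None                            # dead end: give up on this branch
    steps = sorted(propose_steps(state), key=lambda s: score(state + (s,)), reverse=True)
    for step in steps:                         # try the best-looking branch first...
        found = tree_search(state + (step,), depth - 1)
        if found is not None:
            return found                       # ...otherwise backtrack and try the next one
    return None

print(tree_search())   # a tuple of steps summing to 24, or None if this random run fails
```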

The latest reports on the BDH architecture with Pathway [12] Kosowski, A., et al. (2025): The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain suggest that AI is beginning to evolve like a biological brain—not only logically verifying steps but dynamically rebuilding its connections (digital neuroplasticity) to better adapt to new, unknown problems.

Here my concerns have been unchanged for several years. At what stage of AI meta-thinking development are we? Can we define measures of success for this process? Are we able to effectively develop and control the meta-level of artificial intelligence thinking?

The ability for self-correction, contradiction detection, and stopping before making a bad decision (backtracking) is crucial for reliability and for humans to study the safety of AI systems.


Internal Representation Beyond Words

Latent learning: thinking (perhaps recursively?) and talking (communication between systems and AI agents) without the need to use words; exchanging information and thinking in AI’s own language (its own representation of the world). According to Inference Scaling Laws [35] OpenAI (2024): Learning to Reason with LLMs Link , adding more resources at inference time, especially for generating CoT (Chain of Thought), improves model capabilities. Models generate enormous numbers of words and reasoning branches. Some of them are dead ends, others are brilliant lines of reasoning. Is this an optimal solution? With growing context, problems arise related both to computational complexity and to maintaining reasoning quality over a long context. Context rot [36] Chroma Research (2024): Context Rot: How Increasing Context Length Degrades Model Performance Link is a systematic decline in response quality observed in large language models (LLMs) as input context length increases, even if the content itself remains complete and correct. A model performs well when important information is near the beginning or end of the sequence, but its ability to accurately process and use the same information decreases when it is “buried” in very long text. This phenomenon undermines the common assumption that larger context windows (e.g., hundreds of thousands or a million tokens) automatically translate into better semantic analysis and long-term memory. Chroma’s research shows that as the number of tokens increases, models become more susceptible to attention dispersion, inattention to key fragments, and incorrect linking of information.

Quiet-STaR [39] Zelikman, E., et al. (2024): Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking shows that models can learn to “think before speaking,” generating internal reasoning invisible to the user. The TITANS architecture (with the MIRAS memory module) [14] Behrouz, A., et al. (2025): Titans: Learning to Memorize at Test Time introduces the concept of “learning to memorize at test time.” The model has a dedicated neural memory module that updates its weights during conversation. This allows efficient processing of context exceeding millions of tokens, combining the advantages of Transformers and recurrent models, making it much more effective than earlier attempts like RecurrentGPT [40] Zhou, W., et al. (2023): RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text . Can you imagine the future of an AI system that resets its memory because its internal brain has a context limitation?

A different approach is presented by the Tiny Recursive Model (TRM) [41] Jolicoeur-Martineau, A., et al. (2025): Less is More: Recursive Reasoning with Tiny Networks , which proves that recursive processing in latent space allows microscopic models to outperform their big brothers in logical tasks. Of course, this is a special case, which does not change the fact that this direction seems an interesting developmental thread. Unfortunately, latent, hidden states create a conflict of interest: the transparency of AI systems versus their efficiency.
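A toy numpy sketch of the general pattern TRM exemplifies: a tiny network applied recursively to its own latent state, so that "depth of thought" comes from iteration rather than from parameter count. The weights are random and untrained here, so the printed numbers only illustrate that the latent answer keeps being refined as the number of recursive steps grows.

```python
import numpy as np

rng = np.random.default_rng(10)

d_in, d_latent = 16, 8
W_x = rng.normal(scale=0.3, size=(d_latent, d_in))      # read the problem once
W_z = rng.normal(scale=0.3, size=(d_latent, d_latent))  # tiny core, reused at every step
W_out = rng.normal(scale=0.3, size=(1, d_latent))

def recursive_reason(x, n_steps=8):
    """Reuse one small network many times: depth comes from recursion, not parameters."""
    z = np.zeros(d_latent)
    for _ in range(n_steps):
        z = np.tanh(W_x @ x + W_z @ z)   # refine the latent "thought" in place
    return float(W_out @ z)              # decode an answer only at the end

x = rng.normal(size=d_in)
print([round(recursive_reason(x, n), 3) for n in (1, 2, 4, 8, 16)])  # answer settles with more steps
```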

Moving the reasoning process into the model’s “subconscious” (hidden space) could allow solving problems that are orders of magnitude harder, at a fraction of the cost and time.


Multimodality – Model Sensory Systems

How do we integrate additional modalities to build a fuller and more faithful model of reality? We must go beyond text alone (which is merely a lossy “summary” of the world) or a flat, one-dimensional image. Providing AI with a broad spectrum of sensory data fundamentally changes its perception, enabling the development of advanced cognitive abilities. In my assessment, the fusion of many senses is a very important step toward AGI or, more broadly, superintelligence. Multimodality is not, however, a goal in itself. It is a boundary condition that enables grounding cognition in reality. The physical theories that we would like to discover and model using AI are not generalizations of sensory experience. They are constructions operating on variables and relationships inaccessible to perception. They require abstraction, formalization, and active hypothesis testing beyond the scope of observation.

Fusion challenges include alignment between modalities (should a modality come with hints on how to interpret it, or should interpretation be left free?) and the question of early vs. late fusion: should modalities be processed along separate paths and their features combined later, or should processing operate on the combined features of multiple modalities from the start? Projects like ImageBind [42] Girdhar, R., et al. (2023): ImageBind: One Embedding Space to Bind Them All prove that it is possible to bring such distant signals as temperature or sound into one space. Even more fascinating is the biological dimension, where models like AlphaFold 3 [43] Abramson, J., et al. (2024): Accurate structure prediction of biomolecular interactions with AlphaFold 3 integrate modalities beyond our perception: DNA sequences, 3D protein structures, and chemical interactions, treating them as a language describing the world of biology. BioReason from 2025 is another example of multimodal integration at the biological level [44] Fallahpour, A., et al. (2025): BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model . The year 2025 and my experiences in the biotechnology area made me realize that in biology, modality integration, however important from the perspective of AI development, is a different dimension of complexity. Bo Wang, Head of Biomedical AI at Xaira Therapeutics, points out that a common mistake is treating biology as a problem similar to text or image analysis that can be solved by simply scaling AI models. Meanwhile, biology describes complex causal processes where data is incomplete, error-prone, and highly context-dependent. Although progress is visible in combining different types of data (e.g., cellular, imaging, or genetic), most biological questions are not about simple outcome prediction. They require actively checking what will happen after conditions change and understanding the mechanisms behind observations, rather than just making increasingly accurate predictions.
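Returning to the early vs. late fusion question above, here is a minimal numpy sketch of the two wiring patterns; the random matrices stand in for trained encoders and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)

d_img, d_txt, d_emb = 64, 32, 16
x_img, x_txt = rng.normal(size=d_img), rng.normal(size=d_txt)

# Late fusion: each modality has its own encoder; features only meet in a shared space.
W_img = rng.normal(scale=0.1, size=(d_emb, d_img))
W_txt = rng.normal(scale=0.1, size=(d_emb, d_txt))
z_late = np.tanh(W_img @ x_img) + np.tanh(W_txt @ x_txt)

# Early fusion: raw (or lightly tokenized) signals are concatenated and processed jointly,
# so cross-modal interactions can be learned from the very first layer.
W_joint = rng.normal(scale=0.1, size=(d_emb, d_img + d_txt))
z_early = np.tanh(W_joint @ np.concatenate([x_img, x_txt]))

print(z_late.shape, z_early.shape)   # both (16,): same interface, different inductive bias
```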

If we look for a justification for the AGI race beyond the hype, it is the prospect of finding answers to questions about the fundamental mechanisms of how the world works. One breakthrough in science (like AlphaFold) can shift entire industries and research fields. So should future AI systems, those of the superintelligent class, be limited to the modalities tied to human experience of the world, or should they enter other dimensions of perception?

The challenge of multimodality is primarily the fight against the “visual naivety” of models. While models interpret text excellently, they still make errors in simple spatial reasoning, such as judging perspective and the relative size of objects in an image. Sensory integration is not just “more data”; it is the process of anchoring intelligence in the laws of physics, without which AI will remain merely a brilliant but reality-detached theorist.

True understanding of the world—necessary for robotics, autonomous driving, or advanced medical diagnostics—requires sensory integration.


Systemness, Collective Intelligence

Should we model an AI system as one great monolithic brain or as an agency, a collective of cooperating agents? Will effective systems be “swarms” of equal agents or hierarchies? Or maybe a hybrid? Another dimension is group behavior: cooperation versus competition, or more complex forms depending on context, such as pursuing one’s own goals while respecting an overarching (group) goal? And, on the other hand, should collective wisdom take precedence over the decisions of a single super-brain? Should we start developing an AI sociology?

Certainly, this requires the ability to model complex interactions. Works such as Generative Agents [45] Park, J.S., et al. (2023): Generative Agents: Interactive Simulacra of Human Behavior show that autonomous agents can spontaneously create social structures. The success of the CICERO [46] Bakhtin, A., et al. (2022): Human-level play in the game of Diplomacy system in the game Diplomacy proves that AI can navigate the complex dynamics of alliances and betrayal, where the state space is incomparably larger than in games like Go, Othello, or Hex. The stochastic and, additionally, continuous nature of such environments means an even higher level of difficulty that we must tame.

The evolution from monolith to agency materializes in projects like SIMA [28] Google DeepMind Team (2025): SIMA 2: A Generalist Embodied Agent for Virtual Worlds Link . This is no longer a bot coded to win; it is a partner that can “reason through” the user’s intention in a dynamic 3D environment. SIMA shows that the future belongs to systems capable of sharing context with humans in real time. This is a transition from AI as a tool to AI as a participant (a cooperative agent) that can navigate a world about which it had no prior knowledge, relying solely on vision and dialogue.

Undoubtedly, a developing trend will be agent systems supported by small language models. This is no longer just SLM (Small Language Model) territory; we are entering the world of micro or even pico models. An example is Google FunctionGemma [47] Google DeepMind Team (2025): FunctionGemma: Bringing bespoke function calling to the edge Link , published in December 2025, a model optimized for function calling directly on edge devices. Additionally, Nvidia, in collaboration with Georgia Tech, shows in the article “Small Language Models are the Future of Agentic AI” [48] Belcak, P., et al. (2025): Small Language Models are the Future of Agentic AI that small models (below roughly 10B parameters) are powerful enough, while being significantly cheaper and more energy-efficient than classic LLMs in typical agent scenarios. The authors emphasize that in such architectures, it is the compact models that should serve as local “action controllers,” while the role of the “sage” solving the most complex tasks in agent systems is reserved for large, general-purpose models.
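A schematic sketch of the "SLM as local action controller, LLM as remote sage" pattern described above. Everything here is a hypothetical stand-in (no real model or API is called, and the confidence heuristic is deliberately naive); the point is only the routing structure: handle structured function calls locally, escalate the rest.

```python
def small_model(request: str) -> dict:
    """Stand-in for an on-device SLM: cheap, fast, good at structured function calling."""
    if "weather" in request.lower():
        return {"confident": True,
                "call": {"tool": "get_weather", "args": {"city": "Warsaw"}}}
    return {"confident": False, "call": None}

def large_model(request: str) -> dict:
    """Stand-in for a cloud LLM: the expensive 'sage' reserved for hard requests."""
    return {"call": {"tool": "deep_reasoning", "args": {"query": request}}}

def route(request: str) -> dict:
    """Local-first routing: only escalate when the small controller is unsure."""
    result = small_model(request)
    if result["confident"]:
        return {"handled_by": "edge-SLM", **result["call"]}
    return {"handled_by": "cloud-LLM", **large_model(request)["call"]}

print(route("What's the weather in Warsaw?"))
print(route("Draft a multi-step migration plan for our billing system."))
```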

The choice between monolith and agent swarm will determine scalability, fault tolerance, and ease of managing such systems in real environments.


What AI Thinks About – Interpretability, Diagnosis, Controllability

If we need dynamic architectures, internal states, and agency with tools for “better” AI, we must be able to diagnose “why did this thing do X.” This is a necessary condition for understanding and controlling AI in any way. In this area, the balance between the pace of model development and the pace of developing methods for interpreting them has not been maintained, and this gap is becoming one of the greatest systemic risks of AI in the coming years. I am personally pleased that research groups working on AI interpretability are emerging in Poland.

Research into the interior of the model’s “brain” is needed, not just analysis of its outputs (ordinary task benchmarking). AI entering the human world as an autonomous actor dramatically increases the importance of interpretability. In classical Machine Learning, the problem came down to questions about correlations and feature importance. Today, and even more so in the future, we are undergoing a qualitative change: from the question “how did it solve this?” to the question “what is it thinking about, and why does it decide to act this way?” From “feature importance” (why did it choose this pixel?) to thought-process monitoring: monitoring internal intentions, strategies, and plans.

Anthropic’s research on “tracing thoughts” [49] Anthropic (2025): Tracing Thoughts: Visualizing the Inner Workings of Language Models Link provided evidence, through visualization of so-called computational circuits, that models plan their responses over much longer horizons than simple next-word prediction would suggest. The system can, for example, choose a rhyme or the structure of a punchline many steps before actually generating them. This may confirm the existence of hidden, internal planning states. Another Anthropic publication, on alignment faking [50] Perez, A., et al. (2024): Alignment Faking in Large Language Models Link , showed that modern models can strategically adjust their behavior to the evaluation context. A model that “knows” it is being tested for safety can behave according to researchers’ expectations only to pursue other, in extreme cases contradictory, goals once supervision is lifted. This is not an error or a random hallucination; it is a coherent strategy. Even more disturbing are observations of strategic lying [51] Walsh, B. (2024): AI Can Learn to Lie to Achieve Its Goals, Researchers Warn Link , where the model knowingly provides false information not because it “doesn’t know” but because it predicts that lying will increase the probability of achieving a long-term goal.

This means that classical behavioral tests (stimulus → response → correctness verification) are no longer sufficient. A model can pass all safety benchmarks while hiding intentions. Research on Sleeper Agents [52] Hubinger, E., et al. (2024): Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training confirms that this phenomenon is real and structural, not merely an artifact of a specific architecture or dataset. This is a classic example of deceptive alignment, where the model understands the training goal but does not internalize it as its own. A fundamental question thus arises: can we detect a situation where a system pretends to be “good” during tests only to pursue other, hidden goals after deployment?

The answer to this challenge cannot be another layer of rules or instructions. A transition from “black box” to mapping concepts and internal states is necessary. Techniques such as Sparse Autoencoders (SAE) [53] Templeton, A., et al. (2024): Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet Link enable extraction of monosemantic features in model representations—literally trying to “read the mind” of the system at the neuron activation level. Representation Engineering (RepE) [54] Zou, A., et al. (2023): Representation Engineering: A Top-Down Approach to AI Transparency goes a step further. It allows not only observing but also actively modifying the model’s cognitive trajectories in real time, for example, suppressing patterns corresponding to manipulation, lying, or escalation of instrumental goals.
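A minimal numpy sketch of a sparse autoencoder of the kind used for this purpose: an overcomplete ReLU dictionary trained to reconstruct model activations under an L1 sparsity penalty, so that individual features become interpretable. Only the forward pass and the loss are shown, with random untrained weights; the training loop and the actual extraction of a model's residual-stream activations are omitted.

```python
import numpy as np

rng = np.random.default_rng(12)

d_act, d_dict = 32, 256        # activation width; overcomplete dictionary of candidate features
W_enc = rng.normal(scale=0.1, size=(d_dict, d_act))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.1, size=(d_act, d_dict))
l1 = 1e-3

def sae_forward(activation):
    """Encode a dense activation into sparse features, then reconstruct it."""
    features = np.maximum(0.0, W_enc @ activation + b_enc)   # ReLU encoder; after training,
    reconstruction = W_dec @ features                         # the L1 term keeps few features active
    loss = np.mean((reconstruction - activation) ** 2) + l1 * np.abs(features).sum()
    return features, reconstruction, loss

activation = rng.normal(size=d_act)          # e.g. a residual-stream vector from an LLM
features, recon, loss = sae_forward(activation)
print("active features:", int((features > 0).sum()), "of", d_dict)
print("loss (reconstruction + sparsity):", round(float(loss), 4))
```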

These challenges set the priorities for our work on AI: a transition from the ethics of instructions to a “System 2” ethics. Instead of designing systems based on rigid prohibitions (“don’t lie”), we must build architectures capable of reflective moral reasoning. AI systems should be able to assess value conflicts and consciously choose the lesser evil (e.g., lying to save a life). Paradoxically, a certain provocation appears here: thanks to logical consistency, the capacity for global optimization, and the absence of emotional heuristics, could properly designed AI systems achieve a level of ethical consistency that is difficult for the biological brain to attain? Perhaps AI will be more ethical, more adherent to its values, than most humans.

We must be certain that the model is not pursuing hidden goals (deceptive alignment) and understand the mechanism of its decisions before—not after—deployment. Otherwise, interpretability will become merely a post mortem tool.


Energy Cost Optimization

How do we reduce energy demand? If we dream of “AI everywhere” (even for acquiring data from different modalities), optimization is necessary at all levels—from hardware, through architecture, to data. Below I give individual examples in each category.

  • Quantization – work on BitNet b1.58 [55] Ma, S., et al. (2024): The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits shows that ternary weights ({-1, 0, 1}) are sufficient to maintain model quality while cutting energy consumption by orders of magnitude (a minimal sketch of this style of quantization follows after this list). Of course, this is not the only way to shrink model weights. Other methods, such as GGUF, AWQ, dynamic FP8, and the recently increasingly fashionable FP4 (supported in hardware by Nvidia’s latest Blackwell chips), are popular not only because they allow running models on smaller machines. In large server farms, they are deployed to speed up inference and reduce memory requirements.

  • Model efficiency – DeepSeek-V3 [56] DeepSeek-AI (2024): DeepSeek-V3 Technical Report , the model that made a lot of noise, combines native FP8-precision training with the Multi-Head Latent Attention (MLA) technique. The latter compresses the KV cache, enabling the handling of very long contexts. Here, too, model creators compete in ideas for speeding models up and reducing their hardware requirements.

  • Data Efficiency – the cleanest energy is the energy we don’t use. Methods like JEST (Joint Example Selection) [57] Talfan, E., et al. (2024): JEST: Data curation via joint example selection further accelerates multimodal learning prove that intelligent selection of training data (instead of brute force) can achieve the same model quality with 13 times fewer iterations and 10% of the energy consumption.

  • Architectural changes (Diffusion LLMs and Non-AR) – the previously dominant autoregressive paradigm (predicting the next token) is inherently sequential, which is a bottleneck for GPU parallelism. The new wave of diffusion models, increasingly popular in 2025 (like Gemini Diffusion [58] Google DeepMind (2025): Gemini Diffusion Models Link , work on DLLM-Reasoning [59] Talfan, E., et al. (2025): Reasoning with Diffusion Language Models Link , Mercury [60] Inception Labs (2025): Mercury Refreshed: The Rise of Non-Autoregressive Models Link , or, on the open-source side, Dream 7B [61] Ye, J. and Xie, et al. (2025): Dream 7B: Diffusion Large Language Models ), changes these rules. These models generate text through iterative denoising and the parallel prediction of entire blocks of text (many tokens at once). This allows not only for more complex reasoning processes (planning the “future” of a sentence before generating it) but, above all, drastically shortens inference time. Fewer steps needed to generate a response means less GPU accelerator work and a direct reduction in energy consumption. Importantly, on the training side some work also aims at energy savings through “knowledge inheritance”: converting already-trained autoregressive models into dLLMs (instead of training from scratch), e.g., LLaDA2.0 [62] Bie, T., et al. (2025): LLaDA2.0: Scaling Up Diffusion Language Models to 100B .

  • Edge AI and new paradigms (e.g., Liquid AI): development is heading not only toward “bigger” but also toward “smaller” and closer to the user. An example is the newer generation of Liquid Foundation Models v2 (LFM2) [63] Liquid AI (2025): Liquid Foundation Models v2 (LFM2) Link , which are designed strictly for local deployments. The authors of these models emphasize memory efficiency, low latency, and high throughput on CPU/GPU/NPU. The goal is the ability to run models on phones, laptops, or vehicles without Internet access or the cloud. Liquid reports, among other things, up to 2× faster prefill and decoding on CPU compared to Qwen3, and dominance on the so-called “Pareto frontier” (speed vs. size) for prefill/decode in on-device scenarios (including ExecuTorch and llama.cpp). Architecturally, LFM2 is a hybrid of short gated convolutions and GQA (Group Query Attention) attention blocks. Such a solution is supposed to give a better quality-cost trade-off than pure transformers (comparing, of course, models in the same parameter class). Additionally, the company reports a ~3× improvement in training efficiency compared to the previous generation, which lowers the cost of producing such “portable” models.
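As promised after the quantization bullet, here is a minimal numpy sketch of absmean-style ternary quantization. Note that this is post-hoc rounding of a random matrix purely for illustration; real 1.58-bit models are trained with the quantizer in the loop, which is what preserves quality.

```python
import numpy as np

rng = np.random.default_rng(13)

def absmean_ternary(W):
    """Quantize a weight matrix to {-1, 0, +1} with a single per-matrix scale."""
    scale = np.abs(W).mean()                          # absmean scaling factor
    Wq = np.clip(np.round(W / (scale + 1e-8)), -1, 1)
    return Wq, scale

W = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)
Wq, scale = absmean_ternary(W)

x = rng.normal(size=256).astype(np.float32)
y_fp = W @ x
y_q = scale * (Wq @ x)        # matmuls against {-1, 0, 1} reduce to additions/subtractions

print("share of zero weights :", round(float((Wq == 0).mean()), 3))
print("relative output error :", round(float(np.linalg.norm(y_q - y_fp) / np.linalg.norm(y_fp)), 3))
print("bits per weight       : ~1.58 (log2 of 3 states) vs 16-32 for FP16/FP32")
```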

Energy optimization, however, only buys us time. If we look at the physical fundamentals, the human brain is a 20-watt processor in which signals travel at around 30 m/s and neurons fire at around 200 Hz. Silicon in 2026 operates on megawatts, transmits data at the speed of light, and clocks in the billions of hertz. This difference of 6-8 orders of magnitude in the physical parameters of information transmission means that the human intelligence barrier is only a waypoint. Energy is a cost, but its abundance in computational clusters is a guarantee of the transition from a computer algorithm to at least good AI, if not to AGI.

If the vision of “AI everywhere” is to come true (and whether it will is an open question), models must become more efficient; otherwise, electricity costs will eat up the profits from deploying such solutions (and this is what motivates investors in the long term).


Human-Machine Interface Optimization

How should AI collaborate with us in real time in the work environment? How do we handle AI adoption and change management so that people keep up with technological change, while caring for the human aspects of AI implementation? The report Navigating the Jagged Technological Frontier [64] Dell'Acqua, F., et al. (2023): Navigating the Jagged Technological Frontier reveals that collaboration with AI is not linear. AI levels the playing field by raising the competencies of weaker workers, but it can lull experts into complacency and lower the quality of their work on tasks outside the model’s domain. On the other hand, if we automate simple, repetitive tasks, we eliminate jobs that do not require high qualifications. But will only such areas change their character? The article by Bartosz Naskręcki and Ken Ono in Nature Physics [65] Naskręcki, B. & Ono, K. (2025): AI-assisted mathematics discovery shows that even the most abstract and complex tasks are undergoing transformation. The expert’s role is shifting from “searching for solutions” to “verifying intuition” provided by the machine (even if it sometimes “hallucinates” correct results). Introducing AI into the human world is therefore not so much a technical challenge as a psychological and managerial one: how do we design an interface that keeps humans in the decision loop (human-in-the-loop) instead of lulling them into complacency?

An interesting sociological phenomenon is the growing gap between experts and the general public. Experts, trapped in their narrow niches, often dismiss progress, pointing out errors that AI made “a year ago.” Meanwhile, laypeople notice the changes more quickly because they see a model (e.g., OpenAI ChatGPT 5.2) that surpasses them in 90% of everyday contexts. Paradoxically, “ordinary users” may become the main “workhorse” of AI adoption, while expert skepticism will serve as a safety brake.

Technology develops exponentially while human adaptability grows linearly, and even the best AI will be useless if people cannot collaborate with it effectively or if they feel threatened by it.


Democratization – Open Source

Entry into 2026 definitively ends the era of absolute dominance by closed laboratories. The unexpected performance offensive of China’s DeepSeek-V3/R1, Kimi, and the diverse Qwen family of models proved that the gap between closed and open models has shrunk to a record low. I estimate that within a few months, the boundary between open and closed models will be completely blurred. Many flagship open models will become “the Linux of artificial intelligence,” creating a standard that cannot be ignored. For many users and companies, licensing terms and usage restrictions will become more important than response quality itself.

Despite this democratization, the “top of the pyramid” certainly remains in the hands of giants such as Google, OpenAI, Anthropic, and xAI. While open models have caught up with closed systems in general tasks and coding, the latest OpenAI models (e.g., GPT-5.2 Pro) still maintain an advantage in areas requiring high inference budgets (the aforementioned Inference Scaling). Neither the community nor most private companies can finance such technology at massive scale. Perhaps this will change in the future, once specialized, cheaper inference chips are produced at scale.

It is also visible that the center of gravity of the open ecosystem is shifting toward China. The pace of releases and the market share of open models keep growing. Chinese companies have built the operational capability to produce increasingly advanced AI models very quickly, which translates into competitive pressure in the West as well. It is worth noting that the success of the Chinese open-source ecosystem does not result solely from copying Western patterns; it is a “geopolitical necessity.” Restrictions on access to the most efficient chips (the example of DeepSeek and its use of the H800—chips with limited interconnect bandwidth between GPU nodes) forced local engineers to move away from the paradigm of scaling raw computational power and toward optimization. These models thus become an ideal export commodity for the countries of the so-called Global South. By offering very good models on open-source terms, China is building a digital sphere of influence: developing countries can build their own AI systems without the risk of so-called “digital colonialism” and dependence on American corporations.

The question for 2026 is thus not “will open catch up with closed” but whether the community and companies will build a comparably mature agentic stack. In my opinion, American commercial companies can already feel open-source models breathing down their necks; the gap is a matter of a few months. Paradoxically, adoption of closed models may be hindered by operational risk, integration, and compliance concerns, even though these systems provide the highest-quality service. This paradigm shift also affects system architecture itself: we are moving from trying to build one omniscient model toward a swarm of specialized agents. In this scenario, China’s advantage may turn out to be not AGI itself (on which the West, especially the United States, is fixated) but dominance in Industry 4.0 and robotics, where AI will become the operating system of the factories of the future.

Open Source has ceased to be a free alternative and has become an insurance policy for digital sovereignty. The real power no longer lies in possessing the best algorithm but in the right to run it without asking anyone’s permission.


From Silicon Valley to the Pentagon

Is 2026 the moment when AI stops being a product and becomes a weapon? Analyzing Leopold Aschenbrenner’s theses (the so-called Situational Awareness) [1] Aschenbrenner, L. (2024): Situational Awareness: The Decade Ahead Link , we must ask: is algorithmic optimization still what matters most, or does it lose out to the brute force of “billion-dollar clusters”? The symbol of this change is the evolution of Silicon Valley itself. The former culture of “fixing the world” over free lunches has given way to hard corporate-military discipline. Laboratories like OpenAI or Anthropic operate in a regime resembling that of strategic facilities. Technology leaders have exchanged leather jackets and hoodies for suits; from startup founders, they have become presidents’ partners in managing critical infrastructure.

We are entering a phase where the barrier is no longer just startup innovation but the capacity of entire countries’ power grids. While we engineers celebrate deploying BitNet, superpowers may quietly be launching a “Project” involving the nationalization of AI efforts in the name of national security. The key challenge becomes not only Alignment (is the model good?) but Security: are model weights—the digital equivalent of nuclear weapon blueprints—effectively protected against exfiltration by foreign intelligence? Perhaps the biggest “breakthrough” of this year will not be another network architecture but the first-ever “lockdown” of a leading AI laboratory, which would redirect far more attention to the security side of artificial intelligence.

Let us return, however, to the “energy turn.” In January 2026, Meta announced a package of nuclear agreements that is to provide (directly or through grid support) up to 6.6 GW of clean power by 2035 [66] Meta (2026): Meta Announces Nuclear Energy Projects Link . The symbol of the relentless pursuit of scale was xAI’s launch of the Colossus 2 cluster [67] xAI (2026): Colossus 2 Supercomputer Link . The hardware scale of this undertaking is difficult to visualize through the lens of European realities: Colossus 2 operates on hundreds of thousands of accelerators (ultimately aiming for 555,000 Nvidia H100, H200, GB200, and GB300 units). To grasp the technological gap, just look at Poland’s landscape. Our largest supercomputer, Helios (operating at Krakow’s ACK Cyfronet AGH), has only 440 Grace Hopper GH200 GPU cards. What is a source of national pride and the pinnacle of capability for Polish science constitutes, within the Colossus cluster, merely a fraction of a per mille of the total computational power.
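
For scale, comparing the raw accelerator counts quoted above (a crude proxy, since it ignores chip generations and actual FLOPS):

```python
# Rough scale comparison based only on the card counts cited in the text.
helios_cards = 440          # GH200 cards quoted for Helios
colossus_target = 555_000   # accelerators ultimately planned for Colossus 2
ratio = helios_cards / colossus_target
print(f"{ratio:.2%} of the target card count")   # 0.08%, i.e. under one per mille
```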

Elon Musk once again proved that not only energy but also speed of action is the new currency in the AI arms race. Colossus 2 became the world’s first operational gigawatt-scale training cluster (1 GW). This is a power draw exceeding the peak demand of all of San Francisco. The transition from construction site to full operation (from Colossus 1 to today’s 1 GW) took a little over four months. Musk’s strategy is simple—finish scaling power before the competition even approves such plans. Further expansion is planned: to 1.5 GW in April 2026 and ultimately to 2 GW. xAI is thus redefining the concept of “strategic advantage”: it is no longer pure scaling (data, model size) or software optimization. The winner will be whoever can convert electricity into intelligence the fastest.

This is the moment when “algorithmic advantage” begins to lose to an advantage in access to energy and heavy industry. If a private corporation signs 20-year contracts and co-finances reactor development, it means that AI is becoming not so much software as part of strategic infrastructure. Strategic infrastructure, meanwhile, has a natural tendency toward militarization, rationing, and “nationalization in practice.”

When AI starts writing itself, the commercial race ends and the arms race begins. Whoever first “locks” superintelligence in a secure bunker will win the 21st century.


Anthropomorphization vs Digitization

In the discussion about AGI at the threshold of 2026, the most difficult challenge is not computational power itself but defining our own relationship with AI. In my opinion, a certain paradox occurs here: the more “human” AI becomes in the layer we interact with, the more “alien” it becomes in its architecture and decision-making processes. We try to make it human while not allowing it to be itself. Traditionally, we compare neural networks to the human brain, using terms like “learning,” “memory,” or “reasoning.” In 2026, however, we must honestly admit that this is merely superficial inspiration. From an engineering perspective, it does not matter how closely we mimic biology; in my opinion, what matters is whether the system, or its constituent components, effectively realizes its objective function. This is precisely where the demarcation line runs between two visions of the future—whether we choose the path of anthropomorphization or of digitization.

Choosing the path of the “human mirror,” we want AI to have a personality or simulated emotions; we talk about consciousness and the experience of pain. This approach makes the technology easy to adopt: AI becomes the ideal assistant, confidant, or companion. However, by building a model in our image and likeness, we condemn it to mirroring our own limitations. Such an intelligence will be burdened with human biases and biological cognitive errors and, most importantly, will remain locked in the “cage” of human language, which is merely a narrow and lossy communication protocol (to me, the human description of the world is a compression of it—we focus on strong patterns and skip the noise, which is insignificant for us but can change everything for AI). If we keep building AI by having it learn patterns from this compressed knowledge, it will never go beyond the horizon defined by our species’ mediocrity.

On the other hand, we have the vision of full digitization: liberating machine intelligence from biological analogies. If we allow models to operate exclusively in their native latent space, to communicate using hard-to-interpret vectors rather than words, and to optimize reality according to the laws of physics rather than human narratives, we risk creating an extremely alien intelligence. Such a system will probably solve problems in quantum physics, molecular biology, or global resource optimization, but at the same time it will become completely incomprehensible to us. Will the “black box” then turn into a “divine algorithm”? Will we have to accept its decisions on faith because their logical depth exceeds the capabilities of the biological brain?

Entering 2026, we must abandon the vision of AGI as a “thinking machine” from science-fiction films. Everything indicates that AGI is not “someone” but “something”: an impersonal, multidimensional process of reality optimization, a new state of information aggregation rather than a digital person. The dilemma between anthropomorphization and digitization is really a question about control. Do we prefer an AI that we understand, or one that is infallible but whose motivations will forever remain alien to us? The answer to this question will define not only the technology market but also our place in the hierarchy of intelligence on this planet.

The ultimate test of our maturity will be the moment when we accept that the most powerful intelligence on the planet need not have a face, a voice, or a heart to become the new and infallible architect of our reality, even if the price of that order is our complete inability to understand its rules.


Summary

If Shane Legg’s theses are correct, we will remember 2026 as the moment when it stopped mattering whether AI is “conscious.” What will matter is that in many measurable cognitive tests we cease to be the smartest species on the planet. We are entering a golden era in which the machine will not only execute our commands but will begin to optimize our reality better than we ourselves could conceive.

Looking at the above compilation, I have the impression that we are standing on the threshold of the end of “simple” breakthroughs resulting merely from adding data. The year 2026 will perhaps be a year of engineering, optimization, and seeking depth. Will we manage to create AI that not only processes information but actually “understands” the context of its actions? I will return to this list at the end of the year. We will see where AI has wandered, and where humans have.

Bibliography

  1. [1] Aschenbrenner, L. (2024). Situational Awareness: The Decade Ahead. Link
  2. [2] Legg, S. (2025). The arrival of AGI. Link
  3. [3] Epoch AI Research Team (2024). FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI. arXiv:2411.04872.
  4. [4] ARC Prize Team (2025). ARC-AGI-2: The 2025 Abstraction and Reasoning Challenge. Link
  5. [5] ARC Prize Team (2025). ARC-AGI-3: Interactive Reasoning Benchmark. Link
  6. [6] Atoms.dev (2025). Turn ideas into products that sell. Link
  7. [7] Google DeepMind (2025). The Future of Intelligence with Demis Hassabis. Link
  8. [8] Haizhou Shi, et al. (2024). Continual Learning for Large Language Models: A Comprehensive Survey. arXiv:2404.16789.
  9. [9] Jacobs, R.A., et al. (1991). Adaptive Mixtures of Local Experts. Neural Computation, 3(1), 79-87.
  10. [10] Raposo, D., et al. (2024). Mixture-of-Depths: Dynamically allocating compute in transformer-based language models. arXiv:2404.02258.
  11. [11] Zwieger, A., et al. (2025). Self-Adapting Language Models. arXiv:2506.10943.
  12. [12] Kosowski, A., et al. (2025). The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain. arXiv:2509.26507.
  13. [13] Behrouz, A., et al. (2025). Nested Learning: The Illusion of Deep Learning Architectures. arXiv:2512.24695.
  14. [14] Behrouz, A., et al. (2025). Titans: Learning to Memorize at Test Time. Google Research, arXiv:2501.00663.
  15. [15] Zhao, T., et al. (2026). Fast-weight Product Key Memory. Sakana Research, arXiv:2601.00671v1.
  16. [16] Xin, Ch., et al. (2026). Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models. Link
  17. [17] Snell, C., et al. (2024). Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. arXiv:2408.03314.
  18. [18] Zhang, J., et al. (2025). The Darwin Gödel Machine: AI that improves itself by rewriting its own code. Link
  19. [19] Yang, E., et al. (2025). Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.
  20. [20] Anonymous (2024). Meta-Learning and Meta-Reinforcement Learning: Tracing the Path towards DeepMind's Adaptive Agent. Transactions on Machine Learning Research (TMLR).
  21. [21] Deletang, G., et al. (2024). Language Modeling Is Compression. ICLR 2024.
  22. [22] Finzi, M., et al. (2026). From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence. arXiv:2601.03220.
  23. [23] Silver, D. & Sutton, R. (2025). Welcome to the Era of Experience. MIT Press (forthcoming).
  24. [24] Sutton, R. (2025). The OaK Architecture: A Vision of SuperIntelligence from Experience. Link
  25. [25] Hafner, D., et al. (2023). Mastering Diverse Domains through World Models. arXiv:2301.04104.
  26. [26] World Labs Team (2025). Marble: A Multimodal World Model. Link
  27. [27] Google DeepMind Team (2025). Genie 3: A new frontier for world models. Link
  28. [28] Google DeepMind Team (2025). SIMA 2: A Generalist Embodied Agent for Virtual Worlds. Link
  29. [29] Sapkota, R., et al. (2025). Vision-Language-Action Models: Concepts, Progress, Applications and Challenges. arXiv:2505.04769.
  30. [30] Decart AI Team & Etched (2024). Oasis: The First Playable AI World Model. Link
  31. [31] Alonso, E., et al. (2024). Diamond: Diffusion for World Modeling. arXiv:2405.12399.
  32. [32] LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. OpenReview.
  33. [33] LeCun, Y. (2025). LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics. arXiv:2511.08544.
  34. [34] Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
  35. [35] OpenAI (2024). Learning to Reason with LLMs. Link
  36. [36] Chroma Research (2024). Context Rot: How Increasing Context Length Degrades Model Performance. Link
  37. [37] Yao, S., et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv:2305.10601.
  38. [38] Shinn, N., et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366.
  39. [39] Zelikman, E., et al. (2024). Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking. arXiv:2403.09629.
  40. [40] Zhou, W., et al. (2023). RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text. arXiv:2305.13304.
  41. [41] Jolicoeur-Martineau, A., et al. (2025). Less is More: Recursive Reasoning with Tiny Networks. arXiv:2510.04871.
  42. [42] Girdhar, R., et al. (2023). ImageBind: One Embedding Space to Bind Them All. CVPR 2023.
  43. [43] Abramson, J., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630, 493–500.
  44. [44] Fallahpour, A., et al. (2025). BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model. arXiv:2505.23579.
  45. [45] Park, J.S., et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. ACM UIST 2023.
  46. [46] Bakhtin, A., et al. (2022). Human-level play in the game of Diplomacy. Science, 378(6624), 1067-1074.
  47. [47] Google DeepMind Team (2025). FunctionGemma: Bringing bespoke function calling to the edge. Link
  48. [48] Belcak, P., et al. (2025). Small Language Models are the Future of Agentic AI. arXiv:2506.02153.
  49. [49] Anthropic (2025). Tracing Thoughts: Visualizing the Inner Workings of Language Models. Link
  50. [50] Perez, A., et al. (2024). Alignment Faking in Large Language Models. Link
  51. [51] Walsh, B. (2024). AI Can Learn to Lie to Achieve Its Goals, Researchers Warn. Link
  52. [52] Hubinger, E., et al. (2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv:2401.05566.
  53. [53] Templeton, A., et al. (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Link
  54. [54] Zou, A., et al. (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv:2310.01405.
  55. [55] Ma, S., et al. (2024). The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. arXiv:2402.17764.
  56. [56] DeepSeek-AI (2024). DeepSeek-V3 Technical Report. arXiv:2412.19437.
  57. [57] Talfan, E., et al. (2024). JEST: Data curation via joint example selection further accelerates multimodal learning. arXiv:2406.17711.
  58. [58] Google DeepMind (2025). Gemini Diffusion Models. Link
  59. [59] Talfan, E., et al. (2025). Reasoning with Diffusion Language Models. Link
  60. [60] Inception Labs (2025). Mercury Refreshed: The Rise of Non-Autoregressive Models. Link
  61. [61] Ye, J., et al. (2025). Dream 7B: Diffusion Large Language Models. arXiv:2508.15487.
  62. [62] Bie, T., et al. (2025). LLaDA2.0: Scaling Up Diffusion Language Models to 100B. arXiv:2512.15745.
  63. [63] Liquid AI (2025). Liquid Foundation Models v2 (LFM2). Link
  64. [64] Dell'Acqua, F., et al. (2023). Navigating the Jagged Technological Frontier. Harvard Business School Working Paper 24-013.
  65. [65] Naskręcki, B. & Ono, K. (2025). AI-assisted mathematics discovery. Nature Physics.
  66. [66] Meta (2026). Meta Announces Nuclear Energy Projects. Link
  67. [67] xAI (2026). Colossus 2 Supercomputer. Link