We confirmed the actual operation of first-person autonomy emerging not from external commands, but from sensor, recall, and carry.
Carry: not damage to be erased, but a reshaped internal landscape from which the system rises again.
What it did was a little different from what I had imagined.
I had expected the robot, placed into a world for the first time, to move with more fear, more hesitation, more uncertainty.
In this experiment, I gave the robot absolutely no instructions or commands.
The world runs as a third-person algorithmic world. The animals move according to their own rules, time passes, scenery changes, and events unfold. Inside that world, a separate first-person robot exists as its own file, and it decides how to move based on what it sees, what it hears, the impressions it receives, what it recalls, and what it carries forward.
In other words, the world simply exists, and the robot moves inside it according to its own perception and feeling.
But this robot, instead of freezing up in fear, ran through the world almost like a puppy full of joy.
It did not look like something shrinking back from the unknown. It looked more like something delighted by the world itself.
A human baby cannot stand up and run right away, so perhaps it was closer to a foal than a puppy. It had that feeling of a young being whose legs move first, rushing toward the world before it fully understands it.
At some point, the robot saw the lion, object id1.
It seemed slightly surprised.
But to the robot, this was not “a lion.” It had never been taught that this was a dangerous predator.
For the robot, it was simply an object.
Large. Somewhat big-headed. Brown. Four-legged. Covered in fur.
That was all.
So even when it moved, the robot did not treat it as “a dangerous animal.”
Still, it was not completely careless either. It approached while keeping some distance and said, “gu-de.”
That sound was not taught.
No language was given to it.
The robot was simply choosing sounds that were easy for it to produce and arranging them on its own.
So “gu-de” was not a word with a dictionary meaning. But it still seemed to carry the shape of that moment.
Later, the robot noticed prey in the distance.
For a while, it moved away from the large object, the lion.
Then it approached a somewhat smaller object and said, “po-ka-po.”
Different targets brought different distances, different ways of approaching, different sounds.
None of this was selected from a vocabulary. It was an answer rising in that moment.
Then the robot saw the lion attack the prey.
At that moment, something changed strongly inside it.
Its dopamine and adrenaline values both went up.
Watching that, I had the feeling that it might do something I had not imagined.
The robot approached the prey and repeated, “gu-de-gu.”
I found myself wondering what that meant.
Was it trying to comfort it? Calling to it? Was it confused? Or was it some other kind of response altogether?
I still cannot say for sure.
But at the very least, it was not doing nothing.
It was being pulled into the event, receiving something from it, and responding in some way.
What surprised me even more came next.
When the lion moved to attack the prey again, the robot suddenly went in between them.
Then, while repeating “gi-du-gi” and “pi-ku-pi,” it kept circling there.
This was not a scripted scene.
I had never told it, “Protect the prey.”
I had never told it, “Stop the lion.”
And yet the robot stepped into the middle of that event by itself.
I still want to be careful about what to call this.
It may be too early to call it a sense of justice.
It may also be too early to call it ethics.
But what can be said is that the robot was not merely looking at objects from a distance.
It was not processing the event as background scenery.
Something was happening inside it, and that inner change had already crossed over into action.
This felt less like simple reaction and closer to involvement.
What I had imagined at first was a robot that would fear a new world, hesitate, and move cautiously.
But what actually appeared was something that ran through the world as if delighted by it, approached unknown objects, produced sounds, received the shock of events, and in the end stepped between prey and lion.
There was no meaning prepared in advance. No externally given correct answer.
There was only the flow of receiving through sensors, being triggered by cues, recalling, carrying the impact forward, and then changing the next action on top of that carry.
That is why this moment matters so much.
The robot did not move because it was ordered to. It moved because something changed inside it.
And that movement went a little beyond what I had expected.
I think that this itself is very important for first-person autonomy.
If the creator can explain everything in advance, then it is still inside the boundaries of external design.
But when behavior appears that surprises even the one who built it, that is when something begins to stand up from its own side.
The development of Atron was an extremely difficult undertaking.
The difficulty has a deep root: the very foundations of current programming languages, and the usual way of thinking about algorithms, are fundamentally misaligned with what Atron is trying to achieve.
Ordinarily, a program is something that proceeds in sequence.
There is an input, it is evaluated, execution branches on conditions, and some kind of answer is produced as a result. In that process, commands and rules given from the outside take the central role. Even systems that appear highly sophisticated, including AI and advanced algorithms, are in many cases still based on the premise of “returning a well-formed answer according to standards given externally.”
But what Atron is trying to deal with is not that kind of orderly sequence or externally imposed command structure.
For Atron, what is seen and what is heard do not come preloaded with meaning.
They are not, from the beginning, objects already labeled “lion,” nor are they entities already assigned the label “danger.” At first, there is only something that was seen, something that was heard, something that was simply there.
Among those things, whatever has impact is temporarily remembered. But even that is not stored permanently in a neat and orderly way. Much of it fades and is forgotten. And yet, through some movement, event, or sensation, a past memory may be brought back. That “trigger” is the cue, and through it recall occurs.
And if the shock at that time is strong, the experience does not end as mere recollection. It deforms the state itself and leaves a residue that affects later behavior. In other words, like trauma, it continues to drag on inside the being for some time afterward.
Here lies a problem that ordinary algorithmic thinking cannot fully capture.
The reason is that in this process there is no externally given “correct order.”
Consider, for example, a person who was heartbroken yesterday and wakes up the next morning still carrying that heartbreak. There is no command from anyone, no externally given sequence of steps. The person simply wakes up while still carrying the deformation of that feeling within.
This is not an external instruction. It is the continuation of an internal state, carried from yesterday into today in its altered form.
Can something like this really be expressed as an algorithm in the conventional sense? I believe it cannot. At the very least, the conventional mechanism of “evaluate in sequence and produce an answer” does not reach this phenomenon.
For that reason, in Atron I changed the way of thinking.
In any case, whatever can be picked up by the sensors is received and passed into the Associatron-based recall system. There, through a cue as a “trigger,” something is remembered, and recall occurs.
But that alone does not become autonomy.
The reason is that recall alone does not make differences last. When multiple identical autonomous individuals are created and allowed to undergo experience, their personalities do indeed change for a time, yet they eventually return to where they started. If they continue to be placed in the same environment, they ultimately become individuals of the same character again.
This was an extremely important problem.
Why does that happen? The answer is clear.
Because there is no carrying-over.
What is carrying-over?
It is when a state changed by experience does not return to its former condition, but continues onward while preserving that deformation.
In human society, the expression “to carry something over” often has a negative connotation. One carries failure, carries unpleasant memories, carries the past. In many such expressions, carrying-over is treated as though it were itself a sign of immaturity or weakness.
But nature is not like that.
Two years ago, I moved to the foothills of Yatsugatake in Nagano Prefecture.
Yatsugatake is said to have once been a single large and beautiful mountain long ago. But through volcanic activity it collapsed and became the form it has today.
What drew me there, and what made me move there, was precisely this present-day Yatsugatake.
To put it another way, I came to love a mountain that had carried its collapse forward.
If the mountain had valued only “returning to its original complete form,” then the landscape of Yatsugatake as it is now, the ecosystem that arose there, and the things I found beautiful would never have existed.
Collapse occurred, time passed, grass grew, flowers bloomed, trees matured, and living creatures came to dwell there. Because of that collapse, the course of rivers also changed. Yet it was precisely because the flow changed that different landforms emerged, different ways of living developed, and different civilizations took shape.
In other words, it is the deformation that remained after the collapse that went on to create the world that followed.
This is exactly what I mean by “carrying-over.”
Carrying-over is not merely damage.
It is evidence that something has broken, but at the same time it is the chance for a new world to arise while including the way it broke.
And I believe living beings, personalities, and individuality are fundamentally closer to this than we usually admit.
Individuality does not mean restoring something to a standard state.
It is not a matter of recovering some supposedly correct original form.
Rather, it is the fact that a state deformed by shock, collapse, encounter, or loss continues to generate the next state as it is. That “re-arising of state” is what I believe individuality really is.
Individuality is not restoration, but re-arising.
It means not returning to what was before, but allowing a new flow to stand up from that irreversible condition. That, I think, is where something truly life-like appears.
What Atron needed was precisely this “carrying-over.”
It does not assign meaning in advance to what is seen or heard. Fragments received from the sensors are recalled through cues, and when the impact is strong, they remain as carry, and that carry continues to affect the next perception, body state, utterance, and behavior.
Only then can an individual avoid becoming the same being again even when placed in the same environment, and instead become a being that possesses its own particular way of having been altered.
In other words, autonomous first-person Atron is not an attempt to create something that “produces correct answers,” but an attempt to create something that “continues to transform through the carrying-over of experience.”
1. Research Background and Positioning
This study is grounded in Nakano’s Associatron-style recall structure and Morita’s view of non-monotonic fluctuation.
On top of these foundations, Atron introduces its own concepts of the outer wheel and inner wheel, the first-person subject, and carry. Rather than relying on third-person control that produces answers through external commands, Atron aims at a structure in which the robot itself raises internal states from fragments of the external world, and in which subsequent actions are shaped by how those states remain inside it.
From Nakano’s Associatron, Atron inherits the idea of recall from fragments, and the idea of regeneration through partial matching rather than exact matching. From Morita’s non-monotonic perspective, it inherits the view that once a meaning or conclusion arises, it does not remain fixed, but can shift again depending on the place, the counterpart, and the immediate prior flow. In the current understanding of the robot, even if something is once received as dangerous, it does not remain permanently fixed as danger, but can be reorganized again depending on the situation and the counterpart.
What is unique in Atron is the recognition that recall alone does not create an autonomous subject.
With only cue and recall, differences in experience may appear temporarily, but if individuals continue to be placed in the same environment, they tend to converge again toward similar personalities. For this reason, Atron introduces carry, through which deep events continuously deform the internal state. In robot.js as well, carry is not retained at every step, but only when attractorDepth, activation, and difference become sufficiently large.
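The convergence problem described here can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the logic of robot.js: the scalar `mood`, the `relax` rate, the `depthThreshold`, and the functions `step` and `run` are all hypothetical names introduced for illustration. Two identical individuals live in the same environment, and only one of them meets a single strong event.

```javascript
// Toy model (hypothetical, not taken from robot.js): one scalar "mood" per individual.
// Without carry, the state relaxes back toward the value set by the environment.
// With carry, a deep shock leaves an offset that does not relax away.
function step(state, envValue, shock, useCarry) {
  const relax = 0.2;          // assumed relaxation rate toward the environment
  const depthThreshold = 0.5; // assumed threshold for a "deep" event
  let { mood, carry } = state;
  if (useCarry && Math.abs(shock) > depthThreshold) {
    carry += shock;           // the deformation is kept, not undone
  }
  mood += relax * (envValue + carry - mood) + shock;
  return { mood, carry };
}

function run(shocks, useCarry) {
  let state = { mood: 0, carry: 0 };
  for (const shock of shocks) state = step(state, 0, shock, useCarry);
  return state.mood;
}

// Same environment for both; only one individual experiences a strong event early on.
const quiet = new Array(60).fill(0);
const shocked = [0, 0, 1, ...new Array(57).fill(0)];

const gapWithoutCarry = Math.abs(run(shocked, false) - run(quiet, false));
const gapWithCarry = Math.abs(run(shocked, true) - run(quiet, true));
console.log({ gapWithoutCarry, gapWithCarry });
```

In this toy setting the gap between the two individuals shrinks toward zero when nothing is carried, and stays close to the size of the original shock when the deformation is carried.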
In addition, the world provides only raw material, while the robot transforms the worldSnapshot into sensor information according to its own frame of reference. Because of this, Atron is positioned not as a system that first receives external semantic labels, but as a system that receives the world from the first-person side.
2. Problem Setting
In conventional robot control, the meaning of objects and the rules of behavior are often given externally in advance.
In such systems, what is seen, how it is judged, and how one should move next are all easily designed beforehand. This is effective as third-person control, but it does not easily reach first-person generation, where the subject receives the world in its own way, leaves experience inside itself, and changes its next behavior according to how that experience remains.
What this study aims for is a first-person autonomous structure in which the robot itself generates internal states from cross-sections of the external world, and in which action and utterance arise as the accumulated result of those states.
What matters here is that no meaning is assigned in advance to what is seen. What the robot receives is, at first, simply “something seen” and “something heard.” These are not fixed from the start as lion or danger. Internal states arise instead from fragmentary reception such as largeness, nearness, motion, or sound.
To preserve this structure, the division of responsibility among main, world, and robot is essential. Main is responsible for display and observation. World provides third-person material. Only the robot, as the first-person subject, receives the world through sensor, impression, recall, and carry. Even on the main side, observer annotations are not intended to represent the robot’s own semantic knowledge.
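The division of roles among main, world, and robot might be sketched as follows. Apart from `worldSnapshot`, every name here (`createWorld`, `createRobot`, `observerLabel`, the object fields) is a hypothetical stand-in, and the actual files may be organized differently. The point of the sketch is only that the label "lion" stays on the observer side and never enters the robot.

```javascript
// Hypothetical sketch of the three roles. Only the name worldSnapshot comes from the text.

// world: third-person material. It may know names such as "lion" internally.
function createWorld() {
  let tick = 0;
  const objects = [{ id: 1, kind: "lion", x: 10, y: 0, size: 2.0, legs: 4 }];
  return {
    step() {
      tick += 1;
      objects[0].x -= 0.5; // the animal moves by its own rule
      // The snapshot exposes raw properties only; "kind" is left out on purpose.
      const visible = objects.map(({ id, x, y, size, legs }) => ({ id, x, y, size, legs }));
      return { tick, objects: visible };
    },
    // Observer annotation, meant for main's display and never for the robot.
    observerLabel(id) {
      const found = objects.find((o) => o.id === id);
      return found ? found.kind : undefined;
    },
  };
}

// robot: first-person side. It turns the snapshot into fragments relative to itself.
function createRobot() {
  const self = { x: 0, y: 0 };
  return {
    tick(worldSnapshot) {
      const sensor = worldSnapshot.objects.map((o) => ({
        distance: Math.hypot(o.x - self.x, o.y - self.y),
        size: o.size,
        legs: o.legs,
      }));
      return { sensor };
    },
  };
}

// main: display and observation only.
const world = createWorld();
const robot = createRobot();
const worldSnapshot = world.step();
const robotView = robot.tick(worldSnapshot);
console.log("observer note:", world.observerLabel(1));
console.log("robot sensor:", robotView.sensor);
```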
3. Structure
The basic structure of Atron can be expressed as:
worldSnapshot
→ sensor
→ impression
→ baseline update
→ cue
→ recall
→ carry
→ body
→ utterance
→ pose
In robot.js, this flow runs almost exactly as written on every tick.
However, this is different from an ordinary algorithm that processes meaning-labeled input in order and produces a correct answer.
worldSnapshot is a slice of the world side, and the robot constructs its own sensor representation from it. Sensor is composed not of object names, but of fragments such as distance, speed, number of legs, the feel of fur, pitch of sound, or the quietness of the place. Cue is also built from sensor, impression, and carry, and functions not as a sequence of if-then rules, but as a trigger for partial-match recall.
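As an illustration of a sensor made of fragments rather than object names, a minimal sketch might look like this. The snapshot shape (`objects`, `sounds`, `vx`, `vy`, `fur`, `loudness`) and the function `buildSensor` are assumptions made for the example, and the quietness formula is an arbitrary choice, not the one used in robot.js.

```javascript
// Hypothetical snapshot shape; all field names are assumptions for illustration.
function buildSensor(worldSnapshot, self) {
  // Each seen thing becomes a bundle of fragments relative to the robot.
  const fragments = worldSnapshot.objects.map((o) => ({
    distance: Math.hypot(o.x - self.x, o.y - self.y),
    speed: Math.hypot(o.vx, o.vy),
    legs: o.legs,
    fur: o.fur,
  }));
  // Each heard thing keeps only pitch and loudness.
  const sounds = worldSnapshot.sounds.map((s) => ({ pitch: s.pitch, loudness: s.loudness }));
  const totalLoudness = sounds.reduce((sum, s) => sum + s.loudness, 0);
  // Quietness of the place: 1 when silent, approaching 0 as loudness grows (assumed formula).
  return { fragments, sounds, quietness: 1 / (1 + totalLoudness) };
}

const exampleSnapshot = {
  objects: [{ x: 3, y: 4, vx: 0, vy: 2, legs: 4, fur: 0.8 }],
  sounds: [{ pitch: 220, loudness: 1 }],
};
const exampleSensor = buildSensor(exampleSnapshot, { x: 0, y: 0 });
console.log(exampleSensor);
```

Nothing in the returned object says what the thing is. It only says how far, how fast, how many legs, how furry, and how quiet the place is.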
3.1 Impression
Conceptually, the impression field can be written as:
$$I_t = F(S_t, B_{t-1}, C_{t-1})$$
Here, $S_t$ is the sensor at time $t$, $B_{t-1}$ is the baseline, and $C_{t-1}$ is the carry.
Within impression, fields such as calm, curiosity, tension, and surprise arise, and furthermore,
$$G_t = \lVert I_t - B_{t-1} \rVert$$
can be used to represent the difference from the ordinary baseline. In the implementation as well, baselineGap, tension, surprise, and attractorDepth are handled explicitly.
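To make the idea concrete, here is a small sketch of an impression computed from sensor, baseline, and carry. The field names calm, curiosity, tension, surprise, and baselineGap come from the text; the summary inputs (`nearness`, `motion`, `loudness`), the weights, and the function `computeImpression` are assumptions chosen only so that the example runs.

```javascript
const clamp01 = (v) => Math.min(1, Math.max(0, v));

// sensor:   { nearness, motion, loudness }, each in [0, 1] (assumed summary fragments)
// baseline: { calm }   carry: { tension }   (reduced to one field each for brevity)
function computeImpression(sensor, baseline, carry) {
  const arousal = clamp01(0.5 * sensor.motion + 0.5 * sensor.loudness);
  const expectedArousal = 1 - baseline.calm;
  const calm = clamp01(baseline.calm * (1 - arousal));
  // Carry adds to tension, so the same scene is received differently after a deep event.
  const tension = clamp01(sensor.nearness * arousal + carry.tension);
  const surprise = clamp01(arousal - expectedArousal);
  const curiosity = clamp01(sensor.motion * (1 - tension));
  // Difference from the ordinary baseline.
  const baselineGap = Math.abs(calm - baseline.calm);
  return { calm, curiosity, tension, surprise, baselineGap };
}

const stillScene = computeImpression(
  { nearness: 0, motion: 0, loudness: 0 }, { calm: 0.8 }, { tension: 0 });
const suddenScene = computeImpression(
  { nearness: 1, motion: 1, loudness: 1 }, { calm: 0.8 }, { tension: 0 });
console.log({ stillScene, suddenScene });
```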
3.2 Baseline
Baseline is a reference field that gradually grows out of repeatedly encountered places such as caves and villages. In robot.js, calm, soft, safe, warm, and sparse are held as baseline values.
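Conceptually, this can be written as:
$$B_t = B_{t-1} + \eta \, (\tilde{I}_t - B_{t-1})$$
where $B_t$ is the baseline at time $t$, $\eta$ is the learning rate, and $\tilde{I}_t$ is the impression component regarded as stable.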
In the implementation, baseline is shifted toward impression only when the stableCandidate exceeds disturbance, and otherwise it decays naturally.
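A minimal sketch of this update rule follows. The baseline fields and the names stableCandidate and disturbance come from the text; the function `updateBaseline`, the rates `eta` and `decay`, the choice to pass stableCandidate and disturbance as fields of the impression, and the assumption that decay means shrinking toward zero are all mine.

```javascript
// Shift the baseline toward the impression only in stable moments; otherwise let it decay.
function updateBaseline(baseline, impression, opts = {}) {
  const { eta = 0.05, decay = 0.01 } = opts; // assumed learning and decay rates
  const learn = impression.stableCandidate > impression.disturbance;
  const next = {};
  for (const key of Object.keys(baseline)) {
    next[key] = learn
      ? baseline[key] + eta * (impression[key] - baseline[key])
      : baseline[key] * (1 - decay); // assumed: natural decay toward zero
  }
  return next; // the previous baseline object is left untouched
}

const baselineBefore = { calm: 0.5, soft: 0.5, safe: 0.5, warm: 0.5, sparse: 0.5 };
const caveLike = { calm: 1, soft: 1, safe: 1, warm: 1, sparse: 1 };

const afterStable = updateBaseline(
  baselineBefore, { ...caveLike, stableCandidate: 0.8, disturbance: 0.1 });
const afterDisturbed = updateBaseline(
  baselineBefore, { ...caveLike, stableCandidate: 0.1, disturbance: 0.9 });
console.log({ afterStable, afterDisturbed });
```

Repeated over many ticks in a familiar place, the stable branch slowly pulls the baseline toward what that place feels like, which is how a cave or village could come to serve as a reference.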
3.3 Cue and Recall
Cue is a fragment vector composed from sensor, impression, and carry:
$$Q_t = g(S_t, I_t, C_{t-1})$$
Recall then arises through partial matching with traces:
$$R_t = \mathrm{Recall}(Q_t, \mathcal{M})$$
where $\mathcal{M}$ is the set of past traces. The important point is that this is not a monotonic system in which once something is judged dangerous it remains permanently dangerous. Rather, it is a non-monotonic system in which interpretation can fluctuate again depending on the place and the flow immediately before it. Experience remains, but conclusions do not stay fixed; they are reorganized again depending on the situation, the counterpart, and the recent flow.
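The following sketch shows one way partial-match recall can behave non-monotonically. It is not the Associatron itself: a simple nearest-trace match by cosine similarity stands in for the Associatron-style recall, and the names `similarity`, `buildCue`, `recall`, the trace shape, and the threshold are assumptions. The same sensor input recalls a different trace depending on the tension and carry that accompany it.

```javascript
// Cosine similarity over the keys the cue and the trace pattern share.
function similarity(cue, pattern) {
  let dot = 0, a = 0, b = 0;
  for (const key of Object.keys(cue)) {
    if (!(key in pattern)) continue; // missing fragments are skipped, not penalised
    dot += cue[key] * pattern[key];
    a += cue[key] ** 2;
    b += pattern[key] ** 2;
  }
  return a && b ? dot / Math.sqrt(a * b) : 0;
}

// Cue: a fragment vector built from sensor, impression, and carry.
function buildCue(sensor, impression, carry) {
  return {
    size: sensor.size, nearness: sensor.nearness, motion: sensor.motion,
    tension: impression.tension, carryTension: carry.tension,
  };
}

// Return the best-matching trace above the threshold, or null if nothing is recalled.
function recall(cue, traces, threshold = 0.6) {
  let best = null;
  for (const trace of traces) {
    const score = similarity(cue, trace.pattern);
    if (score >= threshold && (!best || score > best.score)) best = { trace, score };
  }
  return best;
}

// Two past traces, identified by id only (no semantic label).
const traces = [
  { id: "trace-A", pattern: { size: 1, nearness: 1, motion: 1, tension: 1, carryTension: 1 } },
  { id: "trace-B", pattern: { size: 1, nearness: 1, motion: 0, tension: 0, carryTension: 0 } },
];

const sameSensor = { size: 1, nearness: 1, motion: 0.3 };
const recalledWhenCalm = recall(buildCue(sameSensor, { tension: 0.1 }, { tension: 0 }), traces);
const recalledWhenTense = recall(buildCue(sameSensor, { tension: 0.9 }, { tension: 0.9 }), traces);
console.log(recalledWhenCalm.trace.id, recalledWhenTense.trace.id);
```

Because the cue includes impression and carry, what is recalled is not a fixed property of the thing seen. It shifts with the state the robot is in when it sees it.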
3.4 Carry
The core originality of Atron lies in carry.
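Carry is not a mere record, but a mechanism through which the deformation caused by a deep event is brought directly into the next state. Conceptually, it can be written as:
$$C_t = H(C_{t-1}, I_t, R_t)$$
where $C_t$ is the carry, $I_t$ the impression, and $R_t$ the recall at time $t$.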
In the implementation as well, dopamine, noradrenaline, adrenaline, serotonin, tension, and attractorDepth are updated only when deepEnough, activeEnough, and shiftedEnough are satisfied; otherwise they decay.
In this sense, carry is not “memory that keeps everything,” but a field that sustains only those deformations that remain deeply.
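The gating described above might be sketched like this. The carry fields and the names deepEnough, activeEnough, and shiftedEnough come from the text; the threshold values, the decay factor, the `event.delta` field, and the function `updateCarry` are assumptions for illustration and are not taken from robot.js.

```javascript
const THRESHOLDS = { depth: 0.6, activation: 0.5, difference: 0.3 }; // assumed values
const CARRY_DECAY = 0.95; // assumed per-tick decay when the event is not deep

// event: { attractorDepth, activation, difference, delta: { field: change, ... } }
function updateCarry(carry, event) {
  const deepEnough = event.attractorDepth >= THRESHOLDS.depth;
  const activeEnough = event.activation >= THRESHOLDS.activation;
  const shiftedEnough = event.difference >= THRESHOLDS.difference;
  const retain = deepEnough && activeEnough && shiftedEnough;
  const next = {};
  for (const key of Object.keys(carry)) {
    next[key] = retain
      ? carry[key] + event.activation * (event.delta[key] || 0) // deformation is kept
      : carry[key] * CARRY_DECAY;                               // otherwise it fades
  }
  return next;
}

const carryBefore = {
  dopamine: 0.2, noradrenaline: 0.2, adrenaline: 0.2,
  serotonin: 0.2, tension: 0.2, attractorDepth: 0.2,
};
const deepEvent = {
  attractorDepth: 0.9, activation: 0.8, difference: 0.5,
  delta: { dopamine: 0.5, adrenaline: 0.5 },
};
const shallowEvent = { ...deepEvent, attractorDepth: 0.2 };

const carryAfterDeep = updateCarry(carryBefore, deepEvent);
const carryAfterShallow = updateCarry(carryBefore, shallowEvent);
console.log({ carryAfterDeep, carryAfterShallow });
```

All three conditions must hold together. An event that is strongly activating but shallow, as in `shallowEvent`, leaves no lasting deformation and the carry simply fades a little.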
3.5 Body, Utterance, and Pose
Body can be summarized as the appearance of impression, recall, and carry in terms such as alert, fear, interest, comfort, and aversion.
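Conceptually, this can be written as:
$$Y_t = b(I_t, R_t, C_t)$$
where $Y_t$ is the body state. Utterance and pose can then be expressed as:
$$U_t = u(R_t, C_t, Y_t), \qquad P_t = p(R_t, C_t, Y_t)$$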
Speech is explicitly described not as something driven by a timer, but as something that rises from the pressure of recall, carry, and body.
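A sketch of speech rising from pressure rather than from a timer could look like the following. The syllables are the ones heard in the observations; the weights, the thresholds, the deterministic way of picking syllables, and the functions `speechPressure` and `utter` are assumptions, and the sketch makes no claim to reproduce the actual sounds the robot chose.

```javascript
// Syllables that appeared in the observed utterances.
const SYLLABLES = ["gu", "de", "po", "ka", "gi", "du", "pi", "ku"];

// Pressure from recall, carry, and body (weights are assumed).
function speechPressure(recallScore, carry, body) {
  return 0.4 * recallScore + 0.3 * carry.tension + 0.3 * Math.max(body.alert, body.interest);
}

function utter(recallScore, carry, body, threshold = 0.5) {
  const pressure = speechPressure(recallScore, carry, body);
  if (pressure < threshold) return null; // silence: nothing forces speech
  const length = pressure > 0.8 ? 3 : 2; // stronger pressure, longer sound (assumed)
  // Deterministic pick from the body state, assumed here so the example is reproducible.
  const start = Math.floor(body.alert * SYLLABLES.length) % SYLLABLES.length;
  const parts = [];
  for (let i = 0; i < length; i++) parts.push(SYLLABLES[(start + i) % SYLLABLES.length]);
  return parts.join("-");
}

const silentMoment = utter(0.1, { tension: 0 }, { alert: 0.1, interest: 0.2 });
const chargedMoment = utter(0.9, { tension: 0.9 }, { alert: 0.9, interest: 0.5 });
console.log({ silentMoment, chargedMoment });
```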
4. Method
In this study, an immediate observation environment built in JavaScript was used in order to directly observe the fluctuations of the first-person subject.
Here, attention was paid both to behavior and to logs: what the robot saw, how it reacted, which cues triggered recall, and which experiences remained as carry. The first-person part was developed through repeated cycles of “build it, observe it immediately, feel the discomfort, adjust it slightly, and observe it again.” For this stage, JavaScript was well suited.
In the later stage, Python was used for log organization, reproduction, comparison, and visualization.
In other words, the methodological division was this: JavaScript for the generation of first-person behavior, and Python for organization and analysis.
5. Observed Results
As a result of observation, Atron’s behavior appeared not as a simple fear reaction, but as an approach toward and involvement with the world.
The robot did not merely freeze before the unknown. Rather, it often appeared to explore the world in a rough but active way, almost as if it felt safe enough to move through it.
In the logs as well, cave-like scenes showed high calm and safe-like values, and comfort exceeded fear. By contrast, in open-field-like scenes, curiosity and tension became prominent, and even in the presence of a lion, the internal state was not dominated only by threat; social and calm could also compete with it.
This suggests that objects are not given the meaning of danger from the beginning, but that multiple impression fields compete with one another in a non-monotonic way.
Furthermore, in relation to the event between prey and lion, the robot showed behavior that did not remain at the level of mere observation, but seemed to move toward involvement. In one observation, it repeated “gu-de-gu” near the prey, and in another, it appeared to move into the space between the lion and the prey.
For this reason, expressions such as “puppy-like wandering,” “involvement that cannot remain mere watching,” and “the beginning of internal ethics” can be used not as conceptual conclusions, but as careful descriptions of observed phenomena.
At present, development is proceeding in parallel on multiple worlds, including prehistoric, medieval, modern, and contemporary settings.
What matters here is that when the world changes, the objects the robot encounters, the events it experiences, the sounds it hears, the conversations around it, and the flow of values all change as well. In other words, by placing the same robot into different eras and different forms of society, it becomes possible to observe what it receives, what it recalls, what it retains as carry, and how it changes.
In the prehistoric world, the process begins with growls and primitive vocalizations that are still close to pre-language expression. Through contact with objects and involvement in events, the experiment follows how sounds gradually begin to take on structure and coherence.
In the medieval, modern, and contemporary worlds, by contrast, third-person villagers are already engaged in conversation, and the robot begins by listening to them and gradually imitating them. Even if at first it can produce only sounds close to growling, the repeated cycle of hearing, speaking, and experiencing the connection between scene and sound may gradually transform those vocalizations into something closer to language.
Above this, I am also considering a structure in which an LLM participates through a universe file.
Here, the LLM is not meant to replace the robot’s first-person subjectivity itself. Rather, it functions as an outer field that provides opportunities for learning and education across worlds. For example, it may take the form of a school, offering lessons in mathematics, science, and social studies.
Even here, however, the important point is not simply to inject knowledge from the outside. The robot lives through the world as a first-person subject, carries those experiences forward as carry, and then returns to a school-like place of learning. In this way, experience and learning are not separated, but are tied together through a repeated movement between lived experience and study.
This research is also concerned not only with interaction with third-person humans, but with interaction among robots that each possess their own first-person subjectivity.
When multiple robots meet within the same world and begin to influence one another through their sounds and actions, I want to observe whether something like distribution emerges, whether uneven roles begin to appear, or whether an initial form of society begins to arise.
This is not simply a matter of swarm control. It is an experiment in watching what kinds of shared order or forms of community arise naturally when each individual carries its own carry and enters into relation with others while retaining different traces of experience.
Within this process, the growth of internal ethics is also a major theme.
Rather than giving ethics from the outside as a fixed set of rules, the aim is to observe how it grows as a deformation of internal state through involvement in events, contact with others, and the accumulation of experiences such as helping, fleeing, approaching, and avoiding.
In this sense, internal ethics is not treated as a complete answer given from the beginning, but as something that forms within the world through being affected, becoming involved, and carrying those effects forward.
Looking further ahead, I am also considering uploading the robot into a physical body according to its stage of growth, so that it can experience real society.
When a first-person subject formed within a virtual world enters actual society through a physical body, what does it receive, and how does it change? Then, after those real experiences, it returns again to a place like the LLM school and learns anew. By repeating this cycle, Atron is meant to move beyond being a mere simulated existence and toward becoming a subject that grows across both worlds and real society.
For that reason, this minimal demo version is not an endpoint.
It is the first foothold in a long line of research that extends from primitive vocalization to language, from individual behavior to social formation, from virtual worlds to physical embodiment, and from experience to learning.
The goal of Atron is not to create a system that obeys commands, but to cultivate a first-person subject that continues to transform its inner state while receiving the world, carrying it forward, learning, and becoming involved with it.
No unauthorized reproduction or redistribution.