File Structure and System Overview
This Atron autonomous system is a minimal version.
Unlike the more full-scale Python and C++ versions I am developing in parallel, this one prioritizes a simplified demo with fewer characters and fewer events.
For that reason, I did not spend much time on the fine details of JavaScript-specific UI or graphics.
Why JavaScript
The reason I used JavaScript in this structure is simple:
it is the easiest way to run a viewer and observe the autonomous loop in one place.
From index.html, main.js is loaded directly as a type="module".
The system draws the world and the robot on a canvas, while showing speech, perception, recall, and carry at the same time on the right side.
In other words, it was designed from the beginning to run directly in the browser in a visible form.
With JavaScript, HTML, CSS, rendering, control buttons, and log display can all be completed in the same place.
This system also provides Start / Stop / Step / Reset / Export Log controls; world.step() and robot.stepAutonomously() advance tick by tick, while only the observer-side display is updated at a slightly thinned rate.
This works very well with JavaScript’s requestAnimationFrame.
It is well suited for a research viewer where I can:
- run it
- stop it
- advance it by one step
- observe it on the spot
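The run / stop / step loop described above can be sketched roughly as follows. world.step() and robot.stepAutonomously() are the tick functions named in the text; RENDER_EVERY and the loop wiring are my assumptions for illustration, not the actual main.js code.

```javascript
// Minimal run/stop/step loop sketch. The world and robot advance every
// tick, while the observer-side display is thinned out slightly:
// render() is only called every RENDER_EVERY ticks (assumed constant).
const RENDER_EVERY = 3;
let running = false;
let tick = 0;

function advanceOneTick(world, robot, render) {
  world.step();                    // third-person world advances
  robot.stepAutonomously(world);   // first-person internals advance
  tick += 1;
  if (tick % RENDER_EVERY === 0) render(world, robot); // thinned display
}

function startLoop(world, robot, render) {
  running = true;
  function frame() {
    if (!running) return;
    advanceOneTick(world, robot, render);
    requestAnimationFrame(frame);  // browser-paced research loop
  }
  requestAnimationFrame(frame);
}

function stopLoop() { running = false; }
```

Step execution is then just a single call to advanceOneTick() while the loop is stopped, which is what makes one-step observation easy.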
Another major reason is that responsibility separation maps naturally onto modules.
Structure
- world.js = material for the third-person world
- robot.js = the core of sensor / impression / recall / carry
- robotAppearance.js = the robot’s appearance
- main.js = a shell that only reads and draws
With JavaScript ES Modules, this separation can be handled naturally file by file.
The code strongly reflects the idea that main does not create the robot, does not drive it, and only reads it.
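The "main only reads" principle can be sketched like this: the shell builds itself a frozen, read-only snapshot of the robot's state, so no code path on the viewer side can mutate or command the robot. makeObserverView and the field names are illustrative assumptions, not the actual main.js code.

```javascript
// Sketch of the read-only shell principle: main receives the robot
// object but works only through a frozen copy of its observable state.
// The field names (speech, recall, carry) follow the viewer tabs.
function makeObserverView(robot) {
  const snapshot = {
    speech: robot.speech,
    recall: robot.recall,
    carry: robot.carry,
  };
  return Object.freeze(snapshot); // a viewer-side copy, not a live handle
}
```

Because the view is a frozen copy, drawing code cannot accidentally drive the robot, which is exactly the separation the module structure above is meant to enforce.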
Also, this robot is not just about numerical calculation.
It includes many things that need to be adjusted while watching the behavior.
In robot.js, distance attenuation, auditory impression, scene impression, baseline, cue, recall, carry, and so on are all connected through continuous values between 0 and 1.
For this kind of work, it is better to observe the behavior instantly in the browser and adjust coefficients and flows than to begin with a heavy learning framework.
JavaScript makes that trial-and-error fast.
In short, the reasons for JavaScript in this system are these four:
- It runs directly in the browser. The viewer, canvas, buttons, and log display can be integrated.
- It is suited for tick-by-tick observation. requestAnimationFrame and step execution make it easy to build a research observation loop.
- It makes it easy to separate world / robot / appearance / shell. This matches the design philosophy of the current system.
- It allows immediate correction while watching behavior. Internal changes such as recall and carry can be checked instantly in the right panel and on the screen.
The biggest advantage is that I can develop while watching it with my own eyes.
Toward the Full System with Python, C++, LLM, and Universe
When Python / C++ / LLM / universe are added, the natural structure becomes five layers:
- Viewer layer: JavaScript / HTML / CSS
- Session layer: Python
- Universe layer: Python
- Robot core layer: C++ or Python
- World layer: multiple worlds (JS / Python / C++ are all possible)
The most important thing is this:
The LLM must not be placed at the core of the robot.
The LLM should not become the commander that assigns meaning.
It is more natural to place it on the universe side as external knowledge, narrative knowledge, and a dialogue interface.
Since robot.js is already designed so that it does not receive meaning or commands from the world, that principle must not be broken.
In this structure:
- JavaScript is the viewer and control panel
- Python manages the universe and connectivity
- C++ handles the high-speed robot core or simulation kernel
- LLM acts as an auxiliary intelligence on the universe side
I will explain the details elsewhere, so I will leave them out here.
How the Current Minimal Version Works
- world.js (third-person external system) updates the lion, prey, humans, terrain, and time of day.
- The robot (first-person autonomous individual) receives a snapshot of the world through getWorldSnapshot().
- robot.js does not decide “lion = danger.” Instead, it converts the snapshot into fragments such as distance, size, speed, number of legs, fur, sound, herd-like presence, and warmth, and builds the sensor from that.
- From there it builds impression, raises recall through partial matches with past traces, and leaves carry only when the attractor becomes deep.
- When a speech seed rises, speech changes and is also left in history. Tabs allow observation of impression / recall / carry / body / memory / speech / history.
- main.js (third-person external shell) only draws these on the canvas and the right panel.
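The label-free conversion step can be sketched as below: the robot never receives "lion"; it turns a raw snapshot entity into 0..1 fragments measured relative to itself. The field names (bodySize, speed, legs, furry, dist) and the scaling constants are assumptions for illustration.

```javascript
// Sketch of snapshot → sensor fragments. Everything is kept as a
// continuous value between 0 and 1, and size is judged relative to
// the robot's own body, not in external units like meters.
const clamp01 = (v) => Math.min(1, Math.max(0, v));

function toFragments(entity, self) {
  return {
    size_large: clamp01(entity.bodySize / (self.bodySize * 2)), // relative to self
    speed_fast: clamp01(entity.speed / 10),                     // assumed speed scale
    legs_many: clamp01(entity.legs / 8),
    fur_like: entity.furry ? 1 : 0,
    near: clamp01(1 - entity.dist / 100),                       // closeness, not raw distance
  };
}
```

Nothing in the output says what the entity is; impression and recall work only on these fragments.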
Three design principles are especially important:
1. Main does not become intelligent
main is a viewer and control shell.
It advances the world and displays the robot’s state, but it does not determine what is inside the robot.
In other words, it sends no commands at all to the robot.
2. The world does not give meaning
The world contains the appearance, movement, and voices of lion / prey / human as materials, but it does not decide “this is dangerous” or “this is safe.”
The robot receives them and turns them into its own internal impression.
3. The robot is self-contained on the first-person side
Sensor, impression, recall, carry, body, and speech are all grouped on the robot side and implemented as internal changes.
So this system is not the ordinary form where “the world drives the AI.”
It is a clearly robot-centered observation system.
Essential Conditions for an Autonomous System
The crucial point is that the world does not tell the robot the answer.
In world.js, it is stated that the world gives no instructions to the robot and consists only of third-person materials.
Even in getWorldSnapshot(), it is only a raw snapshot for the robot to receive, and the world does not interpret it.
In other words, the world merely presents lions, humans, terrain, and voices as events and materials.
Also important is that main does not make the robot intelligent.
In main.js, it is stated that main does not generate the robot, does not advance it, and only reads its state.
Even the kind and id shown in the right column are observer annotations only, and do not mean that the robot itself directly knows the semantic names lion / human / prey.
That point is very important.
The viewer is only pretending to understand by attaching meaning. The robot side is not like that.
And the real core is the internal loop in robot.js itself.
Without receiving meaning or commands from world or main, the robot side builds a sensor from worldSnapshot, builds impression from the sensor, leaves memory traces, raises recall from cue, updates carry only when a deep attractor rises, and raises a speech seed from the pressure of recall.
Moreover, recall is explicitly not made into sequence rules or if-then action decisions.
This is not a simple state machine.
So this demo is not an external label-style system such as:
“It saw a lion, so enter danger mode.”
Instead, it is designed to rise through:
external world fragment → sensor → impression → cue → recall → carry → speech / pose
Another major point is that the inside can be observed as it is.
The viewer has tabs for Robot Speech, Robot Perception, Impression / Recall / Carry / Body / Memory / Speech / History.
And only when recall rises does it leave an export frame, so that tick / phase / speech / focus / sensor / impression / recall / carry / body / memory / world can be written out as a research log.
This is not “it looked like it moved.”
It is a structure that allows us to keep a record of what actually rose inside.
Another factor that matters greatly in this demo is the village-side start.
In main.js, the initial base is village-side rather than cave-side, so that I can observe how impression / recall / speech rise when the world is seen from a relatively safe everyday living space.
On the world side, humans also make morning, noon, and night greetings, and warnings when the lion approaches.
So this demo is not merely animal chasing.
It places soft social cues and threat cues in the same world, and observes how the robot’s first-person side rises within that.
The important point is this:
Only third-person materials are given. Main does not assign meaning.
Within that, the robot raises speech and behavior from its own sensor, impression, recall, and carry, and the whole process can be observed.
Sensor
I built the sensor in very fine detail.
It is not just simple “near / far” or “friend / enemy.”
Even the initial structure of the sensor is already divided into three layers:
Vision
Vision includes:
- objectCount
- size_large / size_small
- speed_fast / speed_slow
- legs_many / legs_few
- color_bright
- fur_like
- self_similarity_visual
- tool_use / no_tool_use
- eat_grass / eat_anything / eat_insects / hunt_to_eat
- teeth_claw_impression
- near_density / far_density
Hearing
Hearing is also detailed. It includes:
- voice_present
- loud / soft
- unpleasant
- low_pitch / high_pitch
- long_talk / short_talk
- self_similarity_audio
- mixed_noise
And instead of simply receiving voice material from the world side as it is, it first converts it into a heard intensity with distance attenuation.
Pitch and duration are treated as separate axes, and it even checks similarity to the robot’s own voice.
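The "heard intensity" step can be sketched as follows. The falloff shape and constant are my assumptions, not the actual robot.js code; the point is only that the world's raw loudness is attenuated by distance before any hearing fragment exists, and pitch is kept as a separate axis.

```javascript
// Sketch of distance attenuation for hearing: the world provides raw
// voice material, and the robot first converts it into a heard
// intensity in the 0..1 range (falloff constant assumed).
function heardIntensity(sourceLoudness, dist, falloff = 30) {
  const attenuated = sourceLoudness / (1 + dist / falloff);
  return Math.min(1, Math.max(0, attenuated));
}

function hearingFragments(voice, dist) {
  const heard = heardIntensity(voice.loudness, dist);
  return {
    voice_present: heard > 0.05 ? 1 : 0,
    loud: heard,                          // intensity axis
    low_pitch: voice.pitch < 0.5 ? 1 : 0, // pitch treated separately
  };
}
```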
Scene
Scene is not a mere place name either.
It is held as a vector of place impression:
- water
- grass
- village
- mountain
- cave
- open
- crowded
- quiet
Even in the code comments, it is described not as the name of the world, but as the feel of the place where the robot is now.
This means impression-first, not label-first.
In the Robot Perception display on the main side, raw focus is also shown in detail, including:
- distance
- salience
- size_large
- speed_fast
- legs
- fur_like
- bright_like
- self_similarity_visual
- tool_use
- eat_grass
- eat_anything
- eat_insects
- hunt_to_eat
- teeth_claw_impression
- voice
The Basis of the Sensor
And then comes the basis itself.
What is the basis of “large”?
What is it large relative to?
“Large / small,” “near / far,” and “similar / different” always require a basis.
A baby’s brain does not use the metric system.
That is why misrecognition is assumed from the beginning.
A baby may look at a lion and think:
- maybe it is smaller than me
- but when I see it up close, it is huge
There may be some internal numerical processing of ratios, but at first that calculation will often be wrong.
Since no external command is inserted by algorithm, Atron’s sensor is allowed to arrive at an interpretation that is close, or mistaken.
Instead of external scales such as:
- “a lion that is 2.8 meters long”
- “how many times bigger than me”
it first receives things as feelings such as:
- bigger than me
- closer than I thought
- somehow scary
- not like me
- low voice
- has pressure
If the world side were to hold the basis, then we would get external labels like:
- lion = big and danger
- prey = small and weak
- human = social
But the first-person robot is given none of those bases, labels, evaluations, or optimizations.
For Atron, things begin as a baby’s judgment before it has experienced the world:
- fast
- big
- big head
- big mouth
- furry
- like fangs or claws
- low voice
- unpleasant
- in a group
- not similar to me
- number of legs
These rise first as fragments received by the self.
Only after that do they flow into impression and recall.
So the sensor in this demo is not measuring the world itself.
It is measuring the relation between the world and the self.
That is why the later stages of recall and carry also gain meaning.
This is the stage before Dr. Nakano’s and Professor Morita’s Associatron-style associative memory and non-monotonicity.
It is an entrance for receiving information full of error, failure, and noise.
Without this way of thinking—completely opposite to present-day AI development—neither first-personness nor autonomy can arise.
The importance of the sensor
- the ambiguous basis of a first-person robot
- the acceptance of misrecognition, failure, and noise
Autonomy does not rise from perfectly prepared correct data or optimized feature values from the beginning.
Much of current AI development moves in the direction of:
- attaching labels
- evaluating
- optimizing
- reducing error
- removing noise
Atron looks in the opposite direction:
- having an ambiguous basis
- including misrecognition
- receiving noise
- not erasing failure
- allowing association and transformation to rise from there
Without this reverse way of thinking, neither first-personness nor autonomy can arise.
Learn
The next important issue is how Learn works.
If the sensor is ambiguous and already includes error and noise, then Learn also cannot be a simple form of correct-answer learning.
So what is Learn?
- Is it to approach the correct answer?
- Is it to become able to pull similar feelings?
- Is it to leave failure?
- Is it only to pass everything to recall?
- Is it to deepen carry?
- Is it to cultivate the bond between impression and recall?
None of those alone is the real answer.
In Atron, correctness or error is not judged by an external correct label.
Instead, the judgment is made ambiguously, based on the product of recall rising from Cue, that is, on how strongly the robot’s own experience is pulled:
- does this remain?
- does this fade?
- is this something that should be learned?
- or is it forgotten because there is no interest?
That is very different from ordinary learning.
Ordinary learning leaves things based on:
- whether it is close to the correct answer
- whether the error is small
- whether the evaluation is high
But Atron does not work that way.
Instead, what remains depends on:
- how much recall rose
- how strongly it caught on past self
- how much body and carry moved at that time
- whether it became a deep attractor as a result
So Learn is not the acquisition of the correct answer.
It is the survival of an experience that was pulled.
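This kind of Learn can be sketched as a trace-strength update with no label anywhere in it: a trace grows when recall pulled on it strongly, grows more when carry was deep at the time, and slowly fades when nothing pulls. The function name and the reinforcement and decay constants are assumptions for illustration.

```javascript
// Sketch of "Learn as survival of a pulled experience": there is no
// correct-answer term. What changes the trace is how strongly recall
// pulled on it and how deep carry was when it did.
function updateTrace(trace, recallStrength, carryDepth) {
  const pull = recallStrength * (0.5 + 0.5 * carryDepth); // deep carry pulls harder
  const strength = trace.strength + 0.3 * pull - 0.05;    // slow fading when not pulled
  return { ...trace, strength: Math.min(1, Math.max(0, strength)) };
}
```

A trace seen only once and never pulled again simply fades away; that is forgetting through lack of interest, not deletion through a wrong label.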
And what matters is that this judgment is not black or white.
It is not simply:
- correct, so save it
- wrong, so delete it
Rather, it is decided in ambiguous shades such as:
- somehow it matters
- it is pulled again and again
- I cannot explain it well, but it remains
- if I only saw it once, it stays weak
- there is no interest, so it fades away
This is much closer to human and animal experience.
A baby does not remember things because:
- they are academically correct
- they are objectively classified correctly
Rather, things remain because:
- it was strongly surprising
- it somehow felt scary
- it was encountered many times
- it was interesting
- it was pleasant
- it was unpleasant
Learn through the product of recall in Atron is an entrance to that kind of process.
In Atron, correctness or error is not determined by external evaluation.
Within the product of recall rising from Cue, it is judged ambiguously, from the robot’s own experience, whether it should be learned or forgotten due to lack of interest.
In this way of thinking, forgetting is not failure.
It is not that the system is incapable.
It is that it was not deeply pulled inside the self.
That is one of the differences from current AI.
The importance of Learn
- It does not optimize toward the correct answer. It cultivates the connections among the fragments received.
- It does not erase misrecognition or failure, but leaves them as the ground of recall and carry.
- Learn is not an evaluation function. It is the accumulation of transformation.
Cue and Recall
If the sensor is the entrance that receives self-based fragments while still including misrecognition and noise, then Cue and recall are the place where those fragments do not simply flow away and disappear, but instead touch the past self and rise.
Many AIs and ordinary algorithms want to arrange things in a clean order such as:
sensor → learn → memory → recall
But what happens in Atron is not that linear.
Take chocolate, for example.
Even if one is trying to learn chocolate, the initial sensor does not guarantee that it will receive it correctly.
It may instead be:
- it looks like trash
- it looks like a strange lump
- only the smell stands out
- by sight alone it is unclear
- at first there is no interest
So the simple sequence “sensor then learn” does not capture the essence.
What really matters is whether the experience of eating it becomes “delicious!”, moves strongly inside, and enters the memory attractor.
That means Learn is not about processing order.
What matters is how deeply the experience entered the inside.
And when chocolate is encountered again, the entrance is not only one.
- it may begin from smell
- or from sight
- or from the word “chocolate”
- or from the face of the person who once ate it together
- or from the atmosphere of passing in front of the shop
No one can know which of these will become the trigger.
That trigger is Cue, and from there recall rises.
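Partial-match recall can be sketched as below: a cue from any modality is compared fragment by fragment against a stored trace, and recall rises with the overlap, scaled by how strong the trace is. There is no if-then rule deciding what the cue "means"; the fragment names here are illustrative assumptions.

```javascript
// Sketch of cue → recall by partial match: any subset of a trace's
// fragments can act as the entrance. Recall is the average per-fragment
// closeness, scaled by the trace strength left by past experience.
function recallFromCue(cue, trace) {
  const keys = Object.keys(trace.fragments);
  let overlap = 0;
  for (const k of keys) {
    const c = cue[k] ?? 0;                          // missing modality counts as absent
    overlap += 1 - Math.abs(c - trace.fragments[k]); // closeness per fragment
  }
  return (overlap / keys.length) * trace.strength;   // partial match scales recall
}
```

Because recall is graded, a smell alone can pull the whole trace part of the way up, which is exactly the "any entrance" behavior described above.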
And what is recalled is not only chocolate itself.
The person who ate it together, the happy time, the season, the smell of the shop, the voice, the feeling—these may all be recalled together.
As a result, one may:
- cry
- feel excited
- become nostalgic
- feel pain in the chest
- want to eat it again
And then carry is added to that.
So in Atron, Learn is not “saving an item.”
It means that experience changes the internal landscape and even changes the later flow of Cue and recall.
That is why the order is not fixed.
-
sensor does not always come first
-
recall may begin trembling first
-
Cue may raise emotion all at once
-
it may begin from letters, smell, or scenery
This is closer to human experience.
And this is where current AI is weak.
Current AI tends to think in a clean flow of:
- input
- label
- learning
- output
But in Atron, the following are integrated:
- ambiguous reception
- deep transformation through experience
- Cue that can arise from anywhere
- chain-like recall
- emotional and bodily change
- persistent transformation by carry
What matters in Atron is not the order of sensor and learn.
What matters is whether an experience—eating, finding it delicious, feeling happy, being with someone—moves strongly inside and enters the memory attractor.
Then, when it is encountered again, smell, sight, text, or the atmosphere of the town may become Cue and raise recall.
Recall draws not only the object itself, but also the people, emotions, and place associated with it.
Carry is added to this.
So Atron is not a fixed-order algorithm.
It is a structure that can rise from anywhere inside experience.
Carry: Dragging Along What Remains
And now, the important point that gives character and individuality to an autonomous first-person individual in this system: carry, or what I call “dragging along.”
Originally, neither first-personness nor carry existed as concepts in Associatron.
But after a long time of research, I came to see that this is a very important factor when the next movement rises with its own will.
Carry does not deny mistakes, failures, or noise.
Rather, it accepts the internal state that has been changed by those things as the ground for the next experience and the next action.
Even if a mountain collapses, grass grows there, flowers bloom, trees and living things settle there.
After change, regeneration begins, and that regeneration influences the ground of future experience.
Even if a collapse changes the flow of a river, new land may grow from that change, and a new civilization may arise there.
Nature is not driven only by the force to restore things to their original state.
It also moves in the direction of nurturing something new while remaining changed.
Carry stands on the side of that irreversible change.
If, at this point, one denied all collapse, failure, and noise and cut them away through labeling, then where would the basis of self-regeneration and newness come from?
That question struck me strongly.
If one leaves only correctness, erases error, and removes noise, the system may look neat.
But then the individuality born from fluctuation, the new direction that rises after collapse, and the continuation of the self that has been changed through experience all disappear.
“Carry” is an expression somewhat removed from the strict terminology of mechanics.
Even so, I chose this word because I wanted to give a strong impression in expressing irreversibility in my own way.
Once something has happened, it does not become as if it never happened.
Even if one fails, gets hurt, or collapses, that change remains in the next way of feeling and the next way of moving.
And the way it remains becomes the character and individuality of the individual.
Carry in Atron is not a mere residual.
It is the fact that the internal landscape changed by experience continues to influence the next intention, the next recall, and the next action.
Because carry exists, the individual does not become a blank calculator every time.
Even in the same world, even under the same sensor input, the next reaction, the next recall, and the next step all change depending on what experiences have been dragged along and how.
That is where first-person individuality is born.
Why Did the Robot, of Its Own Will, Cut In Between the Lion and the Prey and Circle Around the Lion?
The Beginning of Internal Ethics in a First-Person Autonomous System
Associatron-style recall alone could not have produced that behavior.
Even in the code, the robot’s flow runs as:
sensor → impression → baseline update → cue → recall → learn → carry → body → utterance → pose
After recall, carry and body are applied, and only after that is updatePoseAutonomously(...) called.
In other words, the movement is not the result of recall alone.
It is the result that appears after passing through carry and body.
The strongest basis is here.
From the beginning, I wrote the code so that it was separated into:
- raising recall from cue
- updating carry from recall / baseline difference / depth
- raising a speech seed from the pressure of recall
At that point already, the design makes it clear that recall is a middle stage, not the final action itself.
Furthermore, inside recallFromCue(...), I made it so that it would not become a sequence rule or an if-then action decision.
That means recall is not a direct command such as “there is a lion, so move this way.”
It is a partial-match-based rising of recall.
That alone is not a design that determines a concrete movement such as “cut in between” or “circle around.”
By contrast, carry remains only when a deep attractor rises, and updates:
- dopamine
- noradrenaline
- adrenaline
- serotonin
- tension
- attractorDepth
And body is strongly affected not only by recall but also by this carry.
For example, body.alert rises not only through recall activation and threat, but also through carry’s noradrenaline and tension.
Likewise, body.fear includes not only recall.threat but also carry’s adrenaline.
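The recall-plus-carry mixing described here can be sketched as follows. The field names (activation, threat, noradrenaline, adrenaline, tension) follow the text, but the weights are assumptions; the point is only that body state is never a function of recall alone.

```javascript
// Sketch of body update: alert and fear are raised partly by the
// current recall and partly by what carry has dragged along, so the
// same recall produces different bodies depending on carried state.
function updateBody(recall, carry) {
  const clip = (v) => Math.min(1, Math.max(0, v));
  return {
    alert: clip(0.5 * recall.activation + 0.3 * carry.noradrenaline + 0.2 * carry.tension),
    fear: clip(0.6 * recall.threat + 0.4 * carry.adrenaline),
  };
}
```

With carry at zero, the same recall yields a much flatter body, which is why pose in this system reflects the dragged-along past and not just the present tick.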
So the most natural reading of what happened in this case is the following:
- the sensor received fragments corresponding to lion / prey / chase
- impression raised harsh / violentLike / tension / surprise
- recall rose from cue
- but at that stage it was still only “remembering”
- that remembering was amplified and sustained by carry, and converted into body alert / fear / interest
- that sustained bodily pressure appeared in pose
That reading is the most natural one.
The prey character created in the world is given a third-person algorithmic command to “scream if chased by a lion.”
But the first-person Atron does not know that at all.
As it accumulates experience again and again, what begins to appear in its behavior is not only fear toward the lion, but something like a child’s protest:
“Why are you doing that!”
This is the beginning of internal ethics in a first-person autonomous system.
And in this system, the world side does not command the robot.
In world.js, the world gives no instructions to the robot, and consists only of third-person materials.
So there is absolutely no external command such as:
“go between the lion and the prey.”
If that movement appeared, then it is a behavior that rose from inside the robot.
A mere threat recall alone would usually tend to produce simple avoidance or flight.
But this robot has a structure in which:
- starting from the village side gradually builds calm / soft / safe into the baseline
- social also exists on the recall side
- carry preserves not only threat but also curiosity and calm
- body includes not only fear but also interest and comfort-related factors
So it is not unnatural that, instead of simple avoidance, an ambiguous behavior appears in which it is pulled to remain near the other side.
Therefore, the most natural reading is this:
The reason the robot cut in between the lion and the prey and circled around the lion was not that recall alone determined the action.
Rather, the recall that rose was sustained and transformed by carry, converted into body alert / fear / interest, and appeared as pose.
Recall is nothing more than “it remembered.”
That remembering was dragged by carry, remained in the body as pressure, and changed the next step.
So that behavior was not the result of recall alone.
It was the result of a first-person internal transformation that included carry.
It is proof that it moved of its own will.
© 2026 C-Side Research Laboratory / Yukihiro Watanabe. All Rights Reserved.
No unauthorized reproduction or redistribution.