Associatron Research: Atra's Hearing

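Next up is
a "reference gauge for sound-difference parts", the step before anything gets wired into Atra's auditory sensor.

This is not the auditory_delta of the official Atra spec.
It emits a prototype JSON of compressed sound differences that could be handed to Atra later on.

It is a standalone HTML page with no recall, no cue, no speaker identification, no emotion recognition, and no speech-to-text; it looks only at the waveform, the spectrogram, and auditory_delta.
Once it is linked with Atra, it will connect to sensor, recall, cue, and carry.

As before, select the text and save it as HTML. It runs with a click.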

<!doctype html>
<html lang="ja">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Atra Auditory Difference Viewer</title>
<style>
  :root { --bg: #0b1220; --panel: #111827; --line: #334155; --text: #e5e7eb; --muted: #94a3b8; --accent: #60a5fa; --radius: 16px; }
  * { box-sizing: border-box; }
  body { margin: 0; padding: 18px; background: var(--bg); color: var(--text); font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", "Meiryo", sans-serif; }
  header, main { max-width: 1180px; margin: 0 auto; }
  h1 { margin: 0 0 8px 0; font-size: 24px; }
  .notice { margin: 0 0 14px 0; padding: 12px 14px; border: 1px solid var(--line); border-radius: var(--radius); background: var(--panel); color: var(--muted); line-height: 1.7; font-size: 14px; }
  .grid { display: grid; grid-template-columns: minmax(0, 1.35fr) minmax(320px, 0.9fr); gap: 14px; }
  section { border: 1px solid var(--line); border-radius: var(--radius); background: var(--panel); padding: 14px; }
  h2 { margin: 0 0 10px 0; font-size: 17px; }
  button { border: 1px solid var(--line); background: #0f172a; color: var(--text); border-radius: 999px; padding: 8px 12px; cursor: pointer; font-size: 14px; margin: 0 8px 10px 0; }
  button:hover { border-color: var(--accent); }
  canvas { display: block; width: 100%; height: 210px; background: #020617; border: 1px solid var(--line); border-radius: 14px; margin-bottom: 12px; }
  pre { margin: 0; padding: 12px; border-radius: 12px; background: #0f172a; color: #dbeafe; white-space: pre-wrap; word-break: break-word; max-height: 650px; overflow: auto; font-size: 13px; line-height: 1.55; }
  .small { color: var(--muted); line-height: 1.7; font-size: 13px; margin: 8px 0 0 0; }
  .meter { height: 12px; background: #020617; border: 1px solid var(--line); border-radius: 999px; overflow: hidden; margin: 6px 0 12px 0; }
  .meter > div { height: 100%; width: 0%; background: linear-gradient(90deg, #60a5fa, #facc15); }
  @media (max-width: 880px) { .grid { grid-template-columns: 1fr; } }
</style>
</head>
<body>
<header>
  <h1>Atra Auditory Difference Viewer</h1>
  <div class="notice">
    This page observes waveform and spectrogram changes from the microphone. It does not identify speakers, recognize emotions, convert speech to text, perform recall, or use cue retrieval.<br>
    このページはマイク入力から波形とスペクトログラムの変化を観察する。話者識別、感情認識、文字起こし、recall、cue による想起は行わない。
  </div>
</header>
<main class="grid">
  <section>
    <h2>Microphone / マイク</h2>
    <button id="startButton">Start microphone</button>
    <button id="stopButton">Stop microphone</button>
    <p class="small">
      Waveform: time-domain strength, attack, pauses, rhythm.<br>
      波形：時間領域の強弱、立ち上がり、間、リズム。
    </p>
    <canvas id="waveCanvas"></canvas>
    <p class="small">
      Spectrogram: low pressure, high sharpness, thickness, overlap.<br>
      スペクトログラム：低音の圧、高音の鋭さ、厚み、重なり。
    </p>
    <canvas id="spectrogramCanvas"></canvas>
    <p class="small">Live pressure / 現在の音圧</p>
    <div class="meter"><div id="pressureMeter"></div></div>
  </section>
  <section>
    <h2>auditory_delta</h2>
    <pre id="debugView">not started</pre>
  </section>
</main>
<script>
const startButton = document.getElementById("startButton");
const stopButton = document.getElementById("stopButton");
const waveCanvas = document.getElementById("waveCanvas");
const waveCtx = waveCanvas.getContext("2d");
const spectrogramCanvas = document.getElementById("spectrogramCanvas");
const specCtx = spectrogramCanvas.getContext("2d");
const debugView = document.getElementById("debugView");
const pressureMeter = document.getElementById("pressureMeter");

let audioContext = null;
let analyser = null;
let stream = null;
let source = null;
let animationId = null;
let timeData = null;
let freqData = null;
let previousRms = 0;
let previousCentroid = 0;
let rhythmHistory = [];
let loudnessHistory = [];
let silenceHistory = [];

const FFT_SIZE = 2048;
const HISTORY_MAX = 90;

function clamp01(v) {
  return Math.max(0, Math.min(1, Number(v) || 0));
}

function resizeCanvas(canvas) {
  const dpr = window.devicePixelRatio || 1;
  const rect = canvas.getBoundingClientRect();
  const w = Math.max(1, Math.floor(rect.width * dpr));
  const h = Math.max(1, Math.floor(rect.height * dpr));
  if (canvas.width !== w || canvas.height !== h) {
    canvas.width = w;
    canvas.height = h;
  }
}

function rmsFromTimeData(data) {
  let sum = 0;
  for (let i = 0; i < data.length; i++) {
    const v = (data[i] - 128) / 128;
    sum += v * v;
  }
  return Math.sqrt(sum / data.length);
}

function zeroCrossingRate(data) {
  let crossings = 0;
  let previous = data[0] - 128;
  for (let i = 1; i < data.length; i++) {
    const current = data[i] - 128;
    if ((previous >= 0 && current < 0) || (previous < 0 && current >= 0)) {
      crossings++;
    }
    previous = current;
  }
  return crossings / data.length;
}

function spectralCentroid(freq, sampleRate) {
  let weighted = 0;
  let total = 0;
  const nyquist = sampleRate / 2;
  for (let i = 0; i < freq.length; i++) {
    const magnitude = freq[i];
    const hz = (i / freq.length) * nyquist;
    weighted += hz * magnitude;
    total += magnitude;
  }
  if (total <= 0) return 0;
  return weighted / total;
}

function bandEnergy(freq, startRatio, endRatio) {
  const start = Math.floor(freq.length * startRatio);
  const end = Math.max(start + 1, Math.floor(freq.length * endRatio));
  let sum = 0;
  for (let i = start; i < end; i++) {
    sum += freq[i] / 255;
  }
  return sum / (end - start);
}

function average(values) {
  if (!values.length) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function pushHistory(list, value) {
  list.push(value);
  while (list.length > HISTORY_MAX) {
    list.shift();
  }
}

function drawWaveform(data) {
  resizeCanvas(waveCanvas);
  const w = waveCanvas.width;
  const h = waveCanvas.height;
  waveCtx.clearRect(0, 0, w, h);

  // The center line is only a display reference.
  // 中心線は表示上の基準にすぎない。
  waveCtx.lineWidth = 1;
  waveCtx.strokeStyle = "#334155";
  waveCtx.beginPath();
  waveCtx.moveTo(0, h / 2);
  waveCtx.lineTo(w, h / 2);
  waveCtx.stroke();

  // The waveform is not speech recognition.
  // It only shows time-domain pressure changes.
  // 波形は音声認識ではない。
  // 時間領域の圧の変化だけを表示している。
  waveCtx.lineWidth = 2;
  waveCtx.strokeStyle = "#60a5fa";
  waveCtx.beginPath();
  for (let i = 0; i < data.length; i++) {
    const x = (i / (data.length - 1)) * w;
    const y = (data[i] / 255) * h;
    if (i === 0) {
      waveCtx.moveTo(x, y);
    } else {
      waveCtx.lineTo(x, y);
    }
  }
  waveCtx.stroke();
}

function drawSpectrogram(freq) {
  resizeCanvas(spectrogramCanvas);
  const w = spectrogramCanvas.width;
  const h = spectrogramCanvas.height;

  // Shift the previous spectrogram image to the left.
  // 過去のスペクトログラムを左へ流す。
  const image = specCtx.getImageData(1, 0, Math.max(1, w - 1), h);
  specCtx.putImageData(image, 0, 0);
  specCtx.clearRect(w - 1, 0, 1, h);

  // Draw the newest frequency column at the right edge.
  // 右端に最新の周波数列を描く。
  for (let y = 0; y < h; y++) {
    const ratio = 1 - y / h;
    const index = Math.min(freq.length - 1, Math.floor(ratio * freq.length));
    const v = freq[index] / 255;

    // This color is only a visual aid for intensity.
    // It is not an emotion label and not a speaker label.
    // この色は強度を見るための表示補助にすぎない。
    // 感情ラベルでも話者ラベルでもない。
    const r = Math.floor(255 * clamp01(v * 1.6));
    const g = Math.floor(180 * clamp01(v * 0.9));
    const b = Math.floor(80 * clamp01(v * 0.35));
    specCtx.fillStyle = `rgb(${r}, ${g}, ${b})`;
    specCtx.fillRect(w - 2, y, 2, 1);
  }
}

function summarizeAuditoryDelta(time, freq) {
  const rms = rmsFromTimeData(time);
  const zcr = zeroCrossingRate(time);
  const centroid = spectralCentroid(freq, audioContext.sampleRate);
  const low = bandEnergy(freq, 0.00, 0.12);
  const mid = bandEnergy(freq, 0.12, 0.45);
  const high = bandEnergy(freq, 0.45, 1.00);
  const loudnessShift = Math.abs(rms - previousRms);
  const attackStrength = Math.max(0, rms - previousRms);
  const centroidShift = Math.abs(centroid - previousCentroid) / (audioContext.sampleRate / 2);

  pushHistory(rhythmHistory, attackStrength);
  pushHistory(loudnessHistory, rms);
  pushHistory(silenceHistory, rms < 0.018 ? 1 : 0);

  const rhythmDensity = clamp01(average(rhythmHistory) * 18);
  const pauseSpace = clamp01(average(silenceHistory));
  const tempoPressure = clamp01(rhythmDensity * (1 - pauseSpace));
  const loudnessAverage = average(loudnessHistory);

  // These are compressed auditory-difference values.
  // They are not labels and not recognition results.
  // これは圧縮された音の差分値である。
  // ラベルでも認識結果でもない。
  const auditory_delta = {
    loudness: clamp01(rms * 8),
    loudness_shift: clamp01(loudnessShift * 18),
    attack_strength: clamp01(attackStrength * 28),
    rhythm_density: rhythmDensity,
    tempo_pressure: tempoPressure,
    pause_space: pauseSpace,
    low_pressure: clamp01(low * 2.4),
    mid_thickness: clamp01(mid * 2.0),
    high_sharpness: clamp01(high * 2.6 + zcr * 0.35 + centroidShift),
    spectral_overlap: clamp01((low + mid + high) / 1.4),
    waveform_instability: clamp01(Math.abs(rms - loudnessAverage) * 16)
  };

  previousRms = rms;
  previousCentroid = centroid;

  return {
    mode: "browser microphone auditory difference viewer",
    note: "no recall; no cue retrieval; no speaker identification; no emotion recognition; no speech-to-text",
    auditory_delta
  };
}

function drawFrame() {
  if (!analyser) return;
  analyser.getByteTimeDomainData(timeData);
  analyser.getByteFrequencyData(freqData);
  drawWaveform(timeData);
  drawSpectrogram(freqData);
  const summary = summarizeAuditoryDelta(timeData, freqData);
  const loudness = summary.auditory_delta.loudness;
  pressureMeter.style.width = `${Math.round(loudness * 100)}%`;
  debugView.textContent = JSON.stringify(summary, null, 2);
  animationId = requestAnimationFrame(drawFrame);
}

async function startMicrophone() {
  if (audioContext) {
    stopMicrophone();
  }
  audioContext = new (window.AudioContext || window.webkitAudioContext)();
  stream = await navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: false,
      noiseSuppression: false,
      autoGainControl: false
    },
    video: false
  });
  source = audioContext.createMediaStreamSource(stream);
  analyser = audioContext.createAnalyser();
  analyser.fftSize = FFT_SIZE;
  analyser.smoothingTimeConstant = 0.72;
  source.connect(analyser);
  timeData = new Uint8Array(analyser.fftSize);
  freqData = new Uint8Array(analyser.frequencyBinCount);
  previousRms = 0;
  previousCentroid = 0;
  rhythmHistory = [];
  loudnessHistory = [];
  silenceHistory = [];
  specCtx.clearRect(0, 0, spectrogramCanvas.width, spectrogramCanvas.height);
  drawFrame();
}

function stopMicrophone() {
  if (animationId) {
    cancelAnimationFrame(animationId);
    animationId = null;
  }
  if (stream) {
    for (const track of stream.getTracks()) {
      track.stop();
    }
    stream = null;
  }
  if (source) {
    source.disconnect();
    source = null;
  }
  if (audioContext) {
    audioContext.close();
    audioContext = null;
  }
  analyser = null;
  timeData = null;
  freqData = null;
  pressureMeter.style.width = "0%";
  debugView.textContent = "stopped";
}

startButton.addEventListener("click", () => {
  startMicrophone().catch(error => {
    debugView.textContent = String(error);
  });
});
stopButton.addEventListener("click", stopMicrophone);
window.addEventListener("resize", () => {
  resizeCanvas(waveCanvas);
  resizeCanvas(spectrogramCanvas);
});
</script>
</body>
</html>

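Pasted code tends to get squashed horizontally on this blog and becomes hard to read, but I have no intention of using GitHub, so I'm pasting it here as is.
If I feel like it, I'll put it up on Netlify Create or somewhere.

What the JSON picks up is not the microphone audio itself but compressed values computed from the microphone input. In other words, no recordings or voiceprint data are being stored.
The JSON shown on the right side of the HTML page mainly has this shape.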

{
  "mode": "browser microphone auditory difference viewer",
  "note": "no recall; no cue retrieval; no speaker identification; no emotion recognition; no speech-to-text",
  "auditory_delta": {
    "loudness": 0.0,
    "loudness_shift": 0.0,
    "attack_strength": 0.0,
    "rhythm_density": 0.0,
    "tempo_pressure": 0.0,
    "pause_space": 0.0,
    "low_pressure": 0.0,
    "mid_thickness": 0.0,
    "high_sharpness": 0.0,
    "spectral_overlap": 0.0,
    "waveform_instability": 0.0
  }
}
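Anything that receives this JSON later could check its shape before using it. The following is only a minimal sketch, assuming the eleven field names shown above and that every value stays within 0 to 1, as the viewer's clamp01 guarantees. The helper name isAuditoryDeltaCandidate and the constant AUDITORY_DELTA_FIELDS are hypothetical and not part of Atra.

```javascript
// Field names copied from the JSON shape above.
const AUDITORY_DELTA_FIELDS = [
  "loudness", "loudness_shift", "attack_strength", "rhythm_density",
  "tempo_pressure", "pause_space", "low_pressure", "mid_thickness",
  "high_sharpness", "spectral_overlap", "waveform_instability"
];

// Hypothetical helper: true only when every expected field
// is a finite number in the range [0, 1].
function isAuditoryDeltaCandidate(summary) {
  const delta = summary && summary.auditory_delta;
  if (!delta || typeof delta !== "object") return false;
  return AUDITORY_DELTA_FIELDS.every((name) => {
    const v = delta[name];
    return typeof v === "number" && Number.isFinite(v) && v >= 0 && v <= 1;
  });
}

// A sample with every field at 0, like the shape shown above.
const sample = {
  mode: "browser microphone auditory difference viewer",
  auditory_delta: Object.fromEntries(AUDITORY_DELTA_FIELDS.map((n) => [n, 0]))
};

console.log(isAuditoryDeltaCandidate(sample)); // true
console.log(isAuditoryDeltaCandidate({ auditory_delta: { loudness: 2 } })); // false
```

The check only looks at shape and range. It deliberately says nothing about what the values mean.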

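loudness
How loud the sound is right now. It rises with a louder voice, stronger music, or louder ambient noise.

loudness_shift
The change in volume compared with the previous moment. It rises when the sound suddenly gets louder or suddenly gets quieter.

attack_strength
How hard the sound comes in. It rises for a voice that bursts in abruptly, strong plosives, hard hits, and the like.

rhythm_density
How tightly attacks are packed within a short time. It rises with fast speech, continuous sounds, and fine-grained rhythms.

tempo_pressure
It rises when rhythm density is high and there are few pauses. In Atra terms, it is an entry point close to "feeling pressed" or "having little room to escape".

pause_space
The amount of silence and pauses. It rises with slow speech, with pauses, and when there is a lot of quiet time.

low_pressure
The strength of the low frequency band. It rises with low-end pressure, a thick voice, bass-heavy music, and so on.

mid_thickness
The thickness of the midrange. It is a value for looking at the core components of a voice and the sense of density in a sound.

high_sharpness
Sharpness as seen from high frequencies, zero crossings, and changes in the spectral centroid. It tends to rise with piercing voices, metallic sounds, and sharp sounds.

spectral_overlap
The degree to which lows, mids, and highs are all present together. It tends to rise with multiple sound sources, music, and noisy environments.

waveform_instability
How far the sound is swinging away from the recent average volume. It rises for sounds whose strength changes abruptly rather than for stable sounds.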
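To get a feel for the scaling of these values, the sketch below runs a synthetic sine wave through the same RMS calculation the viewer uses. The 48000 Hz sample rate, the 440 Hz tone, and the helper makeSineBytes are assumptions for illustration only. Since the viewer computes loudness as clamp01(rms * 8), the value reaches 1.0 once the RMS is 0.125 or more, so a moderately strong input already saturates it.

```javascript
// Same clamp and RMS logic as the viewer, applied to byte samples centered on 128.
function clamp01(v) {
  return Math.max(0, Math.min(1, Number(v) || 0));
}

function rmsFromTimeData(data) {
  let sum = 0;
  for (let i = 0; i < data.length; i++) {
    const v = (data[i] - 128) / 128;
    sum += v * v;
  }
  return Math.sqrt(sum / data.length);
}

// Hypothetical helper: builds a byte waveform shaped like the data
// that AnalyserNode.getByteTimeDomainData() writes into a Uint8Array.
function makeSineBytes(amplitude, hz, sampleRate, length) {
  const out = new Uint8Array(length);
  for (let i = 0; i < length; i++) {
    const s = amplitude * Math.sin((2 * Math.PI * hz * i) / sampleRate);
    out[i] = Math.max(0, Math.min(255, Math.round(128 + s * 128)));
  }
  return out;
}

const quiet = rmsFromTimeData(makeSineBytes(0.1, 440, 48000, 2048));
const loud = rmsFromTimeData(makeSineBytes(0.5, 440, 48000, 2048));

console.log(clamp01(quiet * 8)); // roughly 0.57, still inside the usable range
console.log(clamp01(loud * 8));  // 1, saturated
```

The multipliers in the viewer (8, 18, 28, and so on) act as gains chosen for a typical microphone level, so values pinned at 1.0 usually mean the input is hot rather than that something special happened.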

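What the JSON picks up is not
whose voice it is
what was said
whether the speaker is angry
whether the speaker is happy
but rather

whether it is loud
whether it came in suddenly
whether it is fast
whether it is packed tight
whether there are pauses
whether there is low-end pressure
whether the highs are piercing
whether sounds overlap
whether the fluctuation is large
and nothing more.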

-------------Debug notes------------------

This HTML does not:
recall
retrieve anything by cue
identify speakers
recognize emotions
transcribe speech
connect to the Atra core
update memory / carry / field_log.

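Show this to an LLM and it will, entirely on its own, drag things toward some incomprehensible third-person imperative style, so let me nail this down in advance.

If it tells you that
there is no speaker detection
there is no voiceprint matching
there is no emotion classification
there is no recognition accuracy
there is no recall
then that is not a flaw. It is the spec.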

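The goal this time is to
look at the waveform
look at the spectrogram
provisionally compress loudness, attack, tempo, pauses, low-end pressure, and high-end sharpness
and that is all.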

auditory_delta is not an official spec. The current JSON is not the official auditory sensor format of the Atra core.
Strictly speaking, it is a
provisional auditory_delta candidate.

The spectrogram colors show intensity only.
Red, so anger
Yellow, so danger
Blue, so safety
No third-person meaning of that kind is being attached.

tempo_pressure is not "anger".

tempo_pressure is a provisional value for seeing whether
the tempo is packed tight
there are few pauses
attacks are clustered together.
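How those three observations combine can be sketched in isolation. The arithmetic below mirrors the viewer's summarizeAuditoryDelta: average attack strength times 18, multiplied by one minus the share of silent frames. The sample histories are made-up numbers, and the function name tempoPressureFrom is hypothetical.

```javascript
function clamp01(v) {
  return Math.max(0, Math.min(1, Number(v) || 0));
}

function average(values) {
  if (!values.length) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical helper name; the formula itself is taken from the viewer.
function tempoPressureFrom(attackHistory, silenceHistory) {
  const rhythmDensity = clamp01(average(attackHistory) * 18);
  const pauseSpace = clamp01(average(silenceHistory));
  return clamp01(rhythmDensity * (1 - pauseSpace));
}

// Made-up histories: the same attack density, with and without silent frames.
const attacks = [0.05, 0.05, 0.05, 0.05];
const packed = tempoPressureFrom(attacks, [0, 0, 0, 0]); // about 0.9
const spaced = tempoPressureFrom(attacks, [1, 0, 1, 0]); // about 0.45

console.log(packed, spaced);
```

Nothing in this calculation refers to who is speaking or how they feel. With the same attacks, adding pauses simply halves the value.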

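Show it to certain LLMs and they ask "why doesn't it identify anything?", which is irritating, so just ignore that completely.
At this stage, not identifying is the whole point!
(You should still run it past an LLM, though. I'll skip the reasons, because explaining them would never end.)

The JSON values are provisional auditory_delta candidates. They compress only the sound-difference parts: loudness, attack, rhythm density, tempo pressure, pauses, low-end pressure, high-end sharpness, frequency overlap, waveform instability, and so on. They must not be interpreted as speaker labels, emotion labels, command labels, or recall results.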

https://crimson-cake-2832.nabedada3.workers.dev/

The purpose here is different from items 8 and 9 on that page.

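This kind of thing doesn't need to be Python.
All I'm doing right now is opening the microphone, looking at the waveform, looking at the spectrogram, and putting numbers on loudness, attack, tempo, pauses, and sharpness. The results are only checked as JSON, so browser JavaScript alone is enough.

If anything, moving to Python at this stage adds hassle: microphone device selection, input trouble on the Windows side, library dependencies, recorded-file handling, and real-time display. It gets irritating fast. Experiments are about speed and number of runs, so it is better to cut out unnecessary setup.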

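Once we get to the nerves of the body, though, the story changes completely, and I expect a JavaScript observation viewer will no longer be enough. For a body, it is not just input. Things like

ground contact
load
tilt
impact
friction
joint angle
motor load
slip
pain-like signals
loss of balance
posture recovery
fleeing
supporting
grasping
steadying with a light touch
are all connected, and each one of them is a difference, so...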

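In other words, for the body, sensor alone is not enough: action and feedback form a closed loop.
With vision and hearing, you can start by observing the outside world and turning it into differences, but with the body:

it moves
↓
the floor pushes back with a reaction force
↓
the center of gravity goes off balance
↓
load appears on the motors
↓
joints shift out of position
↓
differences like pain, impact, and fear remain
↓
the next movement changes

And if we are simply talking about differences, rust, cracks, dents, wear, and deterioration all have an effect too, right?
That's why it's hard.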

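Vision and sound are still easy to observe as the difference from the previous frame or the attack of the current sound. For the body, though, the time span is long.

the impact from bumping into something today
joint stiffness that started yesterday
rust that advanced over several weeks
rattle that increased little by little
a dent that has been there for a while
how well the body gets along with the floor
material that has stiffened with temperature
All of this gets mixed into the same body_delta.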

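So for the body, I think a simple difference is not enough, and it needs to be split into at least three layers.
1. instant_delta
Differences that happened at this very moment
impact, slip, near-fall, sudden load, sudden pain

2. condition_delta
Changes in the state of the body
rust, wear, cracks, dents, looseness, deterioration, joint stiffness

3. carry_delta
What lingers as experience
becoming more cautious than before, moving more sluggishly, avoiding, staying on guard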
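One possible way to picture the three layers is to split a single scalar reading into components that live on different time scales. The sketch below is purely illustrative and is not Atra's design: it assumes one hypothetical reading in the range 0 to 1 (for example joint resistance), uses a slow exponential moving average as a stand-in for the body's condition, and uses a slowly decaying trace as a stand-in for carry. The function name createBodyDeltaTracker and the parameters slowRate, carryDecay, and shockThreshold are all invented for this example.

```javascript
// Hypothetical sketch: separate one body reading into three time scales.
function createBodyDeltaTracker({ slowRate = 0.01, carryDecay = 0.98, shockThreshold = 0.3 } = {}) {
  let baseline = null; // slow-moving estimate of the body's condition
  let carry = 0;       // lingering caution left behind by past shocks

  return function update(reading) {
    if (baseline === null) baseline = reading;

    // instant_delta: how far this moment departs from the slow baseline.
    const instant = reading - baseline;

    // condition_delta: the baseline itself drifts slowly (rust, wear, looseness).
    const previousBaseline = baseline;
    baseline += slowRate * (reading - baseline);
    const condition = baseline - previousBaseline;

    // carry_delta: decays every step, and is bumped when a shock exceeds the threshold.
    carry *= carryDecay;
    if (Math.abs(instant) > shockThreshold) {
      carry = Math.min(1, carry + Math.abs(instant));
    }

    return { instant_delta: instant, condition_delta: condition, carry_delta: carry };
  };
}

const update = createBodyDeltaTracker();
for (let i = 0; i < 50; i++) update(0.2); // steady state
const atShock = update(0.9);              // sudden impact
const after = update(0.2);                // the reading is back to normal

console.log(atShock, after);
```

In this toy run the instant component returns to nearly zero one step after the impact, while the carry component is still large. That is the point of keeping the layers apart: the same reading can look normal now and still leave something behind.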

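For example, even when a leg is hard to move, the cause can differ.
the ground is bad
the motor is sluggish
the joint is rusty
a spot that was hit before is catching
the load being carried is heavy
it is being cautious because it remembers losing its balance

If all of these are lumped together as "hard to move", Atra will get it wrong.
Mistakes that come from noise do converge properly with experience, but mistakes about the body are a bit dangerous. Something is broken, yet it gets treated like a scrape, and that becomes a cause of malfunction.
Noise-derived mistakes and body-derived mistakes must not be handled the same way.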

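This is getting long again.

I haven't built a body yet.
It's simply too expensive.

I'll keep experimenting separately in Blender.
Eventually I'll ask a small or mid-sized company to build one.
There isn't a single reason to rush.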

---------------------Research Note and Attribution Notice-----------------------

The design descriptions in this blog concerning Atra’s first-person autonomy, differences, carry, field, trace, dream slack, the translation layer of external LLMs, nonmonotonic leakage, and the relational structure among these elements are ongoing research notes by c-side Research Institute. If you quote, refer to, summarize, or adapt them, please clearly indicate the source.


Saturday, June 6, 2026
