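Technical Investigation for Building a Web App That Processes Meeting Audio
Note: This article is a copy of the article below, which I wrote at the organization I belong to. Our rule is that articles we post there may be copied to our own blogs as personal works.
Original article: https://tech-blog.mitsucari.com/entry/2026/02/23/193315
Hello, I'm Tsukamoto, CTO of Mitsucari, also known as Tsukaby (@tsukaby0).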
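In recent years, web apps that process audio have multiplied rapidly. Otter.ai and Fireflies.ai transcribe and summarize meetings, Notta and Rimo Voice specialize in Japanese, and amptalk and MiiTel analyze sales calls. The audio × AI space is a crowded field of competing players.
Looking at these services makes me want to build a web app of my own that transcribes what is said in a meeting and analyzes how much each speaker talks and what they say. The first problem you run into is: how do you handle audio in the browser in the first place?
In this article I start by organizing the browser's audio-related APIs, then look into the technology and architecture for a meeting-analysis app that supports both online and in-person meetings.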
Conclusion
- The browser-standard SpeechRecognition is microphone-only and cannot pick up speaker audio (the other party's voice in a meeting). You also need to be aware that the audio is sent to the browser vendor's servers
- Combining the MediaStream Recording API with getDisplayMedia makes it possible to record microphone and tab audio at the same time, but there are UX issues such as having to operate the screen-sharing dialog
- Leaving speech recognition to server-side STT (Speech-to-Text) is the realistic choice. Speaker separation (diarization) can also be done on the server
- For capturing audio from online meetings, a meeting-bot PaaS such as Recall.ai absorbs the differences between Google Meet, Zoom, Teams, and so on
- In-person meetings can be handled by capturing audio with the browser's getUserMedia plus a conference microphone and feeding it to the same server-side STT
Web APIs for handling audio in the browser
Web Speech API
First, as background, let me go over the Web Speech API, the browser API for working with speech.
It consists of two parts: SpeechSynthesis for speech synthesis and SpeechRecognition for speech recognition. Speech synthesis means turning text into spoken audio.
If you are building something like a service where an AI plays the other side in a sales role-play, SpeechSynthesis makes it easy to have the browser speak text aloud.
For cases such as transcribing a meeting, SpeechRecognition is the one to use.
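There are various other interfaces, such as SpeechGrammar and SpeechSynthesisUtterance, but they are auxiliary (configuration and the like); the two mentioned above are the main ones.
I will skip the usage details. Google provides a SpeechRecognition demo, so it is easy to try.
In that demo, select Japanese, press the microphone button, and speak; your speech is transcribed. The accuracy is quite good.
It is convenient that the browser provides this out of the box, but there are drawbacks.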
Note: On some browsers, such as Chrome, using speech recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline. Source: SpeechRecognition - https://developer.mozilla.org/ja/docs/Web/API/SpeechRecognition
Transcription is not necessarily completed inside the browser; the audio may be sent to a server. This can be unacceptable both for the service provider and for the user.
Browser support also varies. You can check the details on caniuse and similar sites; for example, SpeechRecognition is only partially supported in Edge and Firefox.
An even more important point is that SpeechRecognition accepts microphone input only.
If sound from the speakers happens to reach the microphone it will be transcribed, but you basically cannot rely on that. There is no way to feed in system audio, so audio playing in Zoom or Meet cannot be picked up.
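Since I skipped the usage above, here is a minimal sketch of how a page might set up SpeechRecognition with feature detection. The helper names `createRecognizer` and `finalTranscript` are my own, and passing the global object in as `scope` is just an assumption to keep the functions easy to try outside a browser.

```javascript
// Minimal sketch: create a configured speech recognizer if the browser has one.
// `scope` is the global object (pass `window` in a browser); taking it as a
// parameter is an assumption of this sketch, not part of the Web Speech API.
function createRecognizer(scope, lang = 'ja-JP') {
  // Chrome exposes the constructor under a webkit prefix.
  const Ctor = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Ctor) return null; // Not supported in this browser.

  const recognition = new Ctor();
  recognition.lang = lang;            // Recognition language
  recognition.continuous = true;      // Keep listening across pauses
  recognition.interimResults = true;  // Deliver partial results as well
  return recognition;
}

// Collect only the finalized text from a `result` event.
function finalTranscript(event) {
  let text = '';
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) text += result[0].transcript;
  }
  return text;
}

// Browser usage:
//   const recognition = createRecognizer(window);
//   if (recognition) {
//     recognition.onresult = (e) => console.log(finalTranscript(e));
//     recognition.start();
//   }
```

Even with this in place, the input is still the microphone only, so it does not solve the speaker-audio problem described above.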
MediaStream Recording API (also called the Media Recording API or MediaRecorder API)
Use this when you simply want to record audio. It does not transcribe, so the recorded Blob has to be sent to a server for STT. STT stands for Speech-To-Text, that is, transcription. I cover STT later.
I explained that the Web Speech API can capture the microphone but not the speakers (system audio); with this approach, that becomes possible. Below is some simple code I prepared to experiment with it.
<!DOCTYPE html>
<html lang="ja">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Microphone + Speaker Audio Capture Test</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
background: #f5f5f5;
color: #333;
padding: 2rem;
max-width: 720px;
margin: 0 auto;
}
h1 { font-size: 1.4rem; margin-bottom: 0.5rem; }
.desc { color: #666; font-size: 0.9rem; margin-bottom: 1.5rem; line-height: 1.6; }
.controls { display: flex; gap: 0.75rem; margin-bottom: 1.5rem; flex-wrap: wrap; }
button {
padding: 0.6rem 1.2rem;
border: none;
border-radius: 6px;
font-size: 0.95rem;
cursor: pointer;
transition: opacity 0.2s;
}
button:disabled { opacity: 0.4; cursor: not-allowed; }
button:hover:not(:disabled) { opacity: 0.85; }
#btnStart { background: #2563eb; color: #fff; }
#btnStop { background: #dc2626; color: #fff; }
.status {
padding: 0.75rem 1rem;
border-radius: 6px;
margin-bottom: 1rem;
font-size: 0.9rem;
line-height: 1.5;
}
.status.idle { background: #e5e7eb; }
.status.recording { background: #fef3c7; }
.status.done { background: #d1fae5; }
.status.error { background: #fee2e2; }
.recording-indicator {
display: inline-block;
width: 10px;
height: 10px;
background: #dc2626;
border-radius: 50%;
margin-right: 6px;
animation: blink 1s infinite;
}
@keyframes blink { 0%, 100% { opacity: 1; } 50% { opacity: 0.3; } }
#result { margin-top: 1rem; }
#result audio { width: 100%; margin-top: 0.5rem; }
.note {
margin-top: 1.5rem;
padding: 1rem;
background: #eff6ff;
border-radius: 6px;
font-size: 0.85rem;
line-height: 1.6;
color: #1e40af;
}
</style>
</head>
<body>
<h1>Microphone + Speaker Audio Capture Test</h1>
<p class="desc">
Demo that captures and records microphone input (your own voice) and<br>
tab/system audio (the other party's voice from the speakers) at the same time.
</p>
<div class="controls">
<button id="btnStart">Start recording</button>
<button id="btnStop" disabled>Stop recording</button>
</div>
<div id="status" class="status idle">Idle</div>
<div id="result"></div>
<div class="note">
<strong>How to use:</strong><br>
1. Press "Start recording"; you will first be asked for microphone permission<br>
2. Next, the screen-sharing dialog appears. Choose a <strong>Chrome tab</strong> and check "Also share tab audio"<br>
3. While recording, play music or a video in another tab to confirm that speaker audio is being recorded too<br>
4. Press "Stop recording" to show the audio player
</div>
<script>
const btnStart = document.getElementById('btnStart');
const btnStop = document.getElementById('btnStop');
const statusEl = document.getElementById('status');
const resultEl = document.getElementById('result');
let recorder = null;
let chunks = [];
let micStream = null;
let displayStream = null;
let audioCtx = null;
function setStatus(text, type) {
statusEl.className = 'status ' + type;
statusEl.innerHTML = text;
}
btnStart.addEventListener('click', async () => {
try {
setStatus('Checking microphone permission...', 'idle');
// 1. Get microphone audio
micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
setStatus('Waiting for the tab audio sharing dialog...', 'idle');
// 2. Get tab/system audio
displayStream = await navigator.mediaDevices.getDisplayMedia({
video: true, // video is required (Chrome behavior)
audio: true
});
// Check whether displayStream has an audio track
const displayAudioTracks = displayStream.getAudioTracks();
if (displayAudioTracks.length === 0) {
setStatus('Could not get tab audio. Check "Also share tab audio" when sharing.', 'error');
cleanup();
return;
}
// 3. Mix both with the Web Audio API
audioCtx = new AudioContext();
const dest = audioCtx.createMediaStreamDestination();
const micSource = audioCtx.createMediaStreamSource(micStream);
micSource.connect(dest);
// Build a MediaStream containing only the audio tracks from displayStream
const displayAudioStream = new MediaStream(displayAudioTracks);
const displaySource = audioCtx.createMediaStreamSource(displayAudioStream);
displaySource.connect(dest);
// 4. Record the mixed stream
chunks = [];
recorder = new MediaRecorder(dest.stream);
recorder.ondataavailable = (e) => {
if (e.data.size > 0) chunks.push(e.data);
};
recorder.onstop = () => {
const blob = new Blob(chunks, { type: 'audio/webm' });
const url = URL.createObjectURL(blob);
resultEl.innerHTML = `
<p style="font-size: 0.9rem; color: #666;">Recording complete (${(blob.size / 1024).toFixed(1)} KB)</p>
<audio controls src="${url}"></audio>
<br>
<a href="${url}" download="recording.webm"
style="display: inline-block; margin-top: 0.5rem; font-size: 0.85rem; color: #2563eb;">
Download
</a>
`;
setStatus('Recording complete', 'done');
cleanup();
};
recorder.start(1000); // Emit data every second
setStatus('<span class="recording-indicator"></span>Recording... capturing microphone + tab audio', 'recording');
btnStart.disabled = true;
btnStop.disabled = false;
// Stop recording when screen sharing ends
displayStream.getVideoTracks()[0].addEventListener('ended', () => {
if (recorder && recorder.state === 'recording') {
recorder.stop();
}
});
} catch (err) {
console.error(err);
if (err.name === 'NotAllowedError') {
setStatus('Microphone or screen-sharing permission was denied', 'error');
} else {
setStatus(`Error: ${err.message}`, 'error');
}
cleanup();
}
});
btnStop.addEventListener('click', () => {
if (recorder && recorder.state === 'recording') {
recorder.stop();
}
});
function cleanup() {
if (micStream) {
micStream.getTracks().forEach(t => t.stop());
micStream = null;
}
if (displayStream) {
displayStream.getTracks().forEach(t => t.stop());
displayStream = null;
}
if (audioCtx) {
audioCtx.close();
audioCtx = null;
}
btnStart.disabled = false;
btnStop.disabled = true;
}
</script>
</body>
</html>
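Save this code locally and run it.
The browser asks for permission, so grant it and specify the tab whose audio you want to capture. This time I selected a YouTube tab prepared for the experiment. In real use you would select a Meet or Zoom (browser version) tab.

I do not know about recent versions of Windows, but on a Mac a system-level permission prompt also appears, so turn that on as well.

I played a YouTube video while talking myself, and both were recorded without problems. The Blob still has to be sent to a server and processed separately (STT), but I was able to do what I set out to do.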
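As a rough sketch of that next step, the recorded Blob could be posted to a backend like this. The `/api/transcribe` endpoint, the `audio` form field, and the `{ text }` response shape are all hypothetical; they stand in for whatever your own server exposes in front of an STT service.

```javascript
// Minimal sketch: send a recorded audio Blob to a server for transcription.
// ASSUMPTIONS: the '/api/transcribe' endpoint, the 'audio' field name and the
// JSON response are hypothetical; adapt them to your own backend.
async function uploadRecording(blob, endpoint = '/api/transcribe', fetchImpl = fetch) {
  const form = new FormData();
  form.append('audio', blob, 'recording.webm');

  const response = await fetchImpl(endpoint, { method: 'POST', body: form });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.json();
}
```

In the demo above, this would be called from `recorder.onstop` with the Blob that is created there.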


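The drawbacks are that the user has to choose to share another tab, and that a system-level permission is required. The system-level permission had already been granted on my machine (probably from using Meet earlier), so I did not have to do that step this time, but making users perform these two operations is not ideal.
If the audio is in the same tab, the sharing step can be skipped. In other words, if you are willing to pay the cost of building web-meeting or role-play functionality into your own web app, you can remove one of these operations.
Also, a browser Web API cannot directly access another browser or another app (for example, the Zoom desktop client). Whether that audio can be picked up therefore depends on OS behavior. In my test on a Mac, audio from outside the shared tab, that is, from another app, was recorded as well. The audio was clear, so I think it is fine, but it is possible that it was simply being picked up through the microphone.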
Choosing server-side STT
So far I have explained that SpeechRecognition cannot turn speaker audio (anyone other than yourself) into text, and that the MediaStream Recording API can capture it but has no transcription capability.
With the latter, the audio file has to be converted to text on the server. This is called STT (Speech-to-Text). From here on I look into STT.
Main STT services
| Service | Features |
|---|---|
| Google Cloud Speech-to-Text | Streaming support, speaker diarization |
| Azure Speech Services | Real-time support, speaker diarization |
| Amazon Transcribe | AWS managed, streaming support, speaker diarization (up to 10 speakers), Japanese support |
| OpenAI Whisper API | High accuracy and inexpensive |
| OpenAI Whisper (OSS) | Open-source version, can be self-hosted |
| Deepgram | Low latency, specialized for real time |
| AssemblyAI | High accuracy, feature-rich |
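I am not going to pick one or investigate them in depth here, but accuracy differs depending on which one you use.
Whisper has an open-source, self-hostable version, so it is easy to try. A colleague at our company previously wrote an article about it, with findings like these:

whisper-small-mlx: The run time was extremely fast, even faster than turbo, but the accuracy was terrible.
whisper-large-v3-mlx: Slow, far too slow. But the accuracy was the best. It really is a trade-off.
I also tried self-hosted Whisper, and the accuracy was mediocre. It depends on the model you use, but it is a trade-off between GPU performance and speed. The Whisper API is pay-as-you-go, but it is considerably cheaper than other cloud STT services, and not having to provide a high-performance GPU machine is welcome.
When adopting one, I think it is best to first decide whether to self-host for cost and data confidentiality reasons, or to accept the risk and use an API for speed of delivery, and then compare the cost, performance, and features of each service.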
Speaker separation (diarization)
Audio with a single speaker, as in a sales role-play, is not a problem, but in a scene like a meeting it is important to determine who spoke. This technique is called diarization.
Google Cloud STT and Azure Speech Services have diarization built in. Whisper does not have this feature. When adopting a service, you need to check that it has the features your requirements call for. With Whisper, though, diarization can be achieved by combining it with a speaker-separation library such as pyannote-audio.
For example, yousan's article on this topic should be a useful reference.
One caveat: what diarization outputs are anonymous labels such as "Speaker 1" and "Speaker 2". It does not automatically attach names like "Tanaka" or "Sato", so linking labels to real names has to be implemented separately on the application side.
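The application-side mapping can be quite small. Here is a minimal sketch, assuming segments shaped like `{ speaker, text }`; that shape and the `applySpeakerNames` helper are hypothetical, since each STT service returns its own response format.

```javascript
// Replace anonymous diarization labels with display names.
// ASSUMPTION: the segment shape `{ speaker, text }` is hypothetical; real STT
// services each return their own response format.
function applySpeakerNames(segments, nameByLabel) {
  return segments.map((segment) => ({
    ...segment,
    // Fall back to the anonymous label when no name has been assigned yet.
    speaker: nameByLabel[segment.speaker] ?? segment.speaker,
  }));
}
```

How the `nameByLabel` mapping gets filled in (for example, by letting a user assign names in the UI after the meeting) is a separate design decision.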
Architecture of a meeting-analysis web app
So far I have summarized the basic knowledge and building blocks; now for the main topic. I think there are broadly four approaches, or architectures, for capturing and analyzing audio from online meetings.
1. Chrome extension approach
With the chrome.tabCapture API for Chrome extensions, you can capture tab audio from Google Meet and similar. The user only has to install the extension; no additional software is needed.
For details on chrome.tabCapture, see its documentation.
This approach is used by a number of overseas services. Among Japanese services, there is the following:
- Notta: transcription in 58 languages including Japanese, plus AI summaries. Has a Chrome extension
While adoption is easy for users, the drawbacks are that it is limited to Chrome and that the extension has to go through review and distribution.
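To give a feel for the extension side, here is a minimal sketch that wraps `chrome.tabCapture.capture` in a Promise. It assumes the code runs in an extension page with the `tabCapture` permission, after the user has invoked the extension; the `captureTabAudio` helper and passing `chrome` in as a parameter are my own choices for illustration. Manifest V3 service workers may need a different flow, so treat this as a starting point only.

```javascript
// Minimal sketch: capture the current tab's audio from a Chrome extension page.
// `chromeApi` is the extension `chrome` object, passed in as a parameter (an
// assumption of this sketch) so the function can be exercised with a stub.
function captureTabAudio(chromeApi) {
  return new Promise((resolve, reject) => {
    chromeApi.tabCapture.capture({ audio: true, video: false }, (stream) => {
      if (!stream) {
        // On failure the callback receives no stream; the reason is in lastError.
        const reason = chromeApi.runtime.lastError;
        reject(new Error(reason ? reason.message : 'Tab capture failed'));
        return;
      }
      resolve(stream);
    });
  });
}

// Extension usage:
//   const stream = await captureTabAudio(chrome);
//   const recorder = new MediaRecorder(stream);
```

The resulting MediaStream can be recorded with MediaRecorder in the same way as in the demo page earlier.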
2. Bot participant approach
In this approach, a bot is sent into the meeting as a participant, and the bot receives and processes the audio.
However, how a bot joins differs for each platform (Meet, Zoom, Teams, and so on).
Supporting every platform yourself carries a high maintenance cost. In particular, because you sometimes have to drive a headless browser, there is the risk that the UI changes one day without warning and the bot can no longer log in.
Recall.ai has recently been gaining popularity as a PaaS that solves this problem, and it may be becoming the de facto standard. With it you can easily send a bot into a meeting and operate it through a unified interface.
For how to use Recall.ai, see its documentation.
# Double quotes and an unquoted heredoc let the shell expand the variables.
curl -X POST "https://$RECALLAI_REGION.recall.ai/api/v1/bot" \
  -H "Authorization: Token $RECALLAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d @- <<EOF
{
  "meeting_url": "$MEETING_URL",
  "bot_name": "My Bot",
  "recording_config": {"transcript": {"provider": {"meeting_captions": {}}}}
}
EOF
Just by passing the meeting URL, it handles platform detection, bot entry, and audio capture behind the scenes. Having the per-platform differences absorbed for you is extremely helpful.
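From application code, the same request as the curl command above might be built like this. The endpoint, header, and body fields are taken from that command; `buildCreateBotRequest` and its parameter names are hypothetical helpers for this sketch, and I have not covered error handling or the response format.

```javascript
// Build the same "create bot" request as the curl command, for use with fetch().
// The endpoint, header and body mirror the curl example; `buildCreateBotRequest`
// and its parameter names are hypothetical helpers for this sketch.
function buildCreateBotRequest({ region, apiKey, meetingUrl, botName = 'My Bot' }) {
  return {
    url: `https://${region}.recall.ai/api/v1/bot`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Token ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        meeting_url: meetingUrl,
        bot_name: botName,
        recording_config: { transcript: { provider: { meeting_captions: {} } } },
      }),
    },
  };
}

// Server-side usage (keep the API key out of browser code):
//   const { url, init } = buildCreateBotRequest({ region, apiKey, meetingUrl });
//   const response = await fetch(url, init);
```

Separating request construction from the actual `fetch` call keeps the part that touches the API key easy to test and easy to keep on the server.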
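The advantages are multi-platform support and that users do not have to install anything. Not requiring installation is especially nice. The disadvantages are that the bot's presence is visible to meeting participants, that you depend on an external service, and that it cannot be used for in-person (face-to-face) meetings.
Note: even for an in-person meeting, someone could open Meet or similar and bring the bot in themselves.
Recall.ai is naturally aware of these drawbacks and also offers something called the Desktop Recording SDK. Depending on the situation, that may fit the use case better.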
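3. Building your own WebRTC meeting
In this approach you build your own WebRTC-based meeting functionality. Each participant's audio is an independent stream, so no diarization is needed and you know from the start whose utterance it is.
There are well-known libraries and services to choose from for this.
The advantage is that this is the most flexible way to work with the audio data. The disadvantages are that, even with a library, the cost of developing the meeting functionality itself is high, and that you are asking users to use a separate tool.
From the user's point of view, when the company's standard meeting app is Meet, having to use a different meeting app just for certain meetings is inconvenient.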
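The point about independent per-participant streams can be sketched as follows: if each stream is transcribed separately, building the conversation timeline is just a merge. The input shape (participant name mapped to `{ start, text }` segments, with `start` in seconds) and the `mergeTranscripts` helper are hypothetical.

```javascript
// Merge per-participant transcripts into one timeline.
// ASSUMPTION: the input shape (participant name -> array of `{ start, text }`
// with `start` in seconds) is hypothetical and only illustrates the idea.
function mergeTranscripts(transcriptsByParticipant) {
  const timeline = [];
  for (const [participant, segments] of Object.entries(transcriptsByParticipant)) {
    for (const segment of segments) {
      // The speaker is known from the stream itself, so no diarization is needed.
      timeline.push({ participant, start: segment.start, text: segment.text });
    }
  }
  // Order utterances by when they started.
  return timeline.sort((a, b) => a.start - b.start);
}
```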
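4. Platform API integration
In this approach, you retrieve after the fact the recordings and transcripts that each platform provides (Zoom API, Google Workspace API, Microsoft Graph API) and analyze them.
The advantage is that the implementation is simple. The disadvantages are that there is no real-time capability, that the data you can obtain depends on the platform, and that in some cases you cannot get the data at all.
For example, with Meet, if you forget to press the record button during the meeting, there is no recording to retrieve afterwards. Meet can be configured to start recording automatically before the meeting begins, but recording cannot be started automatically through an API, so the problem of missed recordings is always there.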
Supporting in-person meetings
I expect there are cases where you want to support meetings in a physical meeting room as well as online meetings. The bot approach is strong for online meetings but cannot be used for in-person ones (although with some workarounds it probably can be).
For in-person meetings, a good option may be to open the web app in a browser, capture audio from the device's microphone with getUserMedia, stream it to the server over WebSocket, and run STT on it. You could implement this in the web app itself, or use the Chrome extension approach.
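The browser side of that streaming path could look roughly like this. `pipeRecorderToSocket` is a hypothetical helper, the `wss://` URL in the usage comment is a placeholder, and the server that receives the chunks and feeds them to STT is not shown.

```javascript
// Forward each recorded audio chunk to the server over a WebSocket.
// `recorder` is a MediaRecorder and `socket` a WebSocket; both are passed in so
// the wiring can be tested with stubs. The server side is not shown.
function pipeRecorderToSocket(recorder, socket, timesliceMs = 250) {
  const OPEN = 1; // Same value as WebSocket.OPEN
  recorder.ondataavailable = (event) => {
    // Skip empty chunks and avoid sending on a closed connection.
    if (event.data.size > 0 && socket.readyState === OPEN) {
      socket.send(event.data);
    }
  };
  recorder.start(timesliceMs); // Emit a chunk every `timesliceMs` milliseconds
}

// Browser usage (the wss:// URL is a placeholder):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const socket = new WebSocket('wss://example.com/stt');
//   socket.onopen = () => pipeRecorderToSocket(new MediaRecorder(stream), socket);
```

One thing to keep in mind is that MediaRecorder chunks form one continuous container stream, so the server generally needs to process them in order rather than as standalone files. Some STT services also expect a specific audio format, in which case a conversion step would be needed.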
Whichever approach you adopt, choosing the microphone matters for in-person meetings. It is a good idea to keep an omnidirectional microphone with echo cancellation and noise reduction in the meeting room.

That said, which microphone gets used is not something a service provider like us can control, so supporting in-person meetings is somewhat harder.
Also, in an in-person meeting everyone's voice goes into a single microphone, so server-side diarization becomes essential. In an online meeting each participant's audio is separate, but that is not the case in person, so here you have to rely on the STT side's speaker separation.
Architecture proposal
It depends on the requirements, but here is a simple sketch of an architecture.
- Input layer: bot participation and audio collection via Recall.ai, or speaker and microphone audio collection via a browser app or Chrome extension
- Processing layer: transcription and diarization with STT (Google Cloud Speech-to-Text)
- DB layer: conversation text is stored in an RDBMS, vector DB, full-text search engine, or similar, depending on requirements. Audio files go to object storage
- UI layer: Next.js or any framework you like. Enable audio collection through a Chrome extension, or let users add the bot to meetings through settings and the like
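As one illustration of what the processing layer could hand to the UI layer, here is a minimal sketch that computes how long each speaker talked from diarized segments. The `{ speaker, start, end }` segment shape (times in seconds) and the `talkTimeBySpeaker` helper are hypothetical.

```javascript
// Sum speaking time per speaker from diarized segments.
// ASSUMPTION: segments look like `{ speaker, start, end }` with times in seconds.
function talkTimeBySpeaker(segments) {
  const totals = {};
  for (const { speaker, start, end } of segments) {
    totals[speaker] = (totals[speaker] ?? 0) + (end - start);
  }
  return totals;
}
```

This kind of aggregate is what a "talk volume per speaker" view would be built on.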
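Summary
This time I did not get as far as actually building or running anything, but on another occasion I would like to put together a simple implementation using Recall.ai or similar.
Here are the key points for building an audio-processing web app that came out of this investigation.
- With the browser-standard SpeechRecognition alone, transcribing both sides of a meeting is difficult
- Capturing speaker audio is heavily constrained by the OS and browser, and browser JavaScript alone has its limits
- Server-side STT plus diarization is the foundation for practical audio analysis
- For capturing audio from online meetings, a meeting-bot PaaS such as Recall.ai is a good fit
- In-person meetings can be handled with the browser's getUserMedia plus a conference microphone
Mitsucari is currently hiring IT engineers. If you are interested, please feel free to get in touch!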