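feat(tts): add a generic streaming TTS engine and wire it into the AI chat
- Add the standalone @wenwumap/tts package: synthesizes while the text streams in, queues by sentence and plays in order, falls back to browser speech synthesis automatically when the dedicated TTS backend fails; includes README usage notes
- AI backend: add the /ai/tts endpoint, switched to DashScope CosyVoice (cosyvoice-v3-flash) with mp3 output; serial requests plus backoff retries to avoid 429 rate limiting
- web: the chat panel uses SpeechQueue, assigns a voice per character, and adds a voice toggle and a read-aloud button
- admin: support deployment under the /admin/ base path
- Map page: remove large-area backdrop-blur to reduce GPU usage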
@@ -0,0 +1,258 @@
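# @wenwumap/tts

A generic, framework-agnostic **streaming text-to-speech (TTS) playback engine**.

It packages "AI answer → spoken playback" as a standalone module: it synthesizes while the text streams in, queues by sentence, and plays the segments back seamlessly in order. When the dedicated TTS backend fails, it falls back to browser speech synthesis instead of going silent.

- ✅ No business coupling and no runtime dependencies (only the browser's `fetch` / `Audio` / `speechSynthesis`)
- ✅ Not tied to React or Vue: a plain TypeScript class that any frontend can use
- ✅ Synthesizes while streaming: audio can start about 1 second after the first sentence appears
- ✅ Plays in enqueue order, even when synthesis requests complete out of order
- ✅ Two-track fallback: backend TTS rate-limited or failing → automatic browser speech synthesis
- ✅ Handles the browser autoplay policy (`unlock()`), blob URL cleanup, and audio bleeding across sessions

---
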
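## 1. What does it depend on?

The module itself **sends no vendor-specific requests**. It only requires you to provide a **backend TTS endpoint** that satisfies the following contract:

### Endpoint contract

```
POST <endpoint>
Content-Type: application/json

Request body:      { "text": "plain text to synthesize", "voice": "voice ID (optional)" }

Success (2xx):     returns binary audio (mp3 / wav, etc.; Content-Type is audio/*)
Failure (non-2xx): the engine falls back to browser speech synthesis for this segment
```

> The engine sends a `fetch` POST to the endpoint, reads the response via `.blob()`, and plays it through `URL.createObjectURL`, so the endpoint only needs to be **same-origin** or serve correct CORS headers.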
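
As a rough illustration of the client side of this contract, the sketch below builds the request and classifies a response status. `buildTtsRequest` and `classifyTtsResponse` are hypothetical helper names used only for this example; they are not exported by the package, which performs the equivalent steps internally.

```typescript
// Hypothetical helpers illustrating the contract; not part of @wenwumap/tts.
type TtsOutcome = "audio" | "fallback";

interface TtsRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

/** Build the fetch init for `POST <endpoint>` with a JSON body. */
function buildTtsRequest(text: string, voice = ""): TtsRequestInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, voice }),
  };
}

/** 2xx: the body is audio. Anything else: the segment falls back to browser speech. */
function classifyTtsResponse(status: number): TtsOutcome {
  return status >= 200 && status < 300 ? "audio" : "fallback";
}
```

With these, a call would look like `fetch("/api/tts", buildTtsRequest("hello", "longxiaochun_v3"))`, and a 429 response would be classified as `"fallback"`.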

### Backend implementation reference (pick any TTS service)

Using Alibaba Cloud Bailian CosyVoice as an example (Node/NestJS pseudocode):

```ts
// POST /api/tts -> responds with audio/mpeg
app.post("/api/tts", async (req, res) => {
  const { text, voice } = req.body;
  const r = await fetch(
    "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/SpeechSynthesizer",
    {
      method: "POST",
      headers: { Authorization: `Bearer ${process.env.DASHSCOPE_API_KEY}`, "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "cosyvoice-v3-flash",
        input: { text: text.slice(0, 800), voice: voice || "longxiaochun_v3", format: "mp3", sample_rate: 22050 },
      }),
    }
  );
  // Surface upstream failures (e.g. 429) as non-2xx so the engine falls back to browser speech.
  if (!r.ok) return res.status(502).end();
  const { output } = await r.json();
  // Download the temporary audio URL server-side and relay the bytes, avoiding CORS and URL-expiry issues.
  const audio = await fetch(output.audio.url);
  res.setHeader("Content-Type", "audio/mpeg");
  res.send(Buffer.from(await audio.arrayBuffer()));
});
```

> Tip: CosyVoice/qwen-tts enforce **concurrency rate limits (429)**. Have the backend cap upstream concurrency at 1 and retry 429 responses with backoff. Even if a request still fails occasionally, the engine falls back to browser speech synthesis for that segment instead of going silent.
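
One possible shape for "concurrency of 1 plus backoff on 429" is sketched below. The names `backoffDelayMs`, `SerialQueue`, and `retryOn429`, as well as the default delays and retry count, are assumptions made for this example and do not come from the commit's backend code.

```typescript
// Hypothetical server-side helpers; names and defaults are illustrative assumptions.

/** Exponential backoff delay in ms for a 0-based attempt, capped at capMs. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

/** Run async tasks strictly one at a time by chaining each onto the previous one. */
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const next = this.tail.then(task);
    // Keep the chain alive even if this task rejects.
    this.tail = next.catch(() => undefined);
    return next;
  }
}

/** Re-issue `call` while it reports HTTP 429, sleeping with exponential backoff in between. */
async function retryOn429<T extends { status: number }>(
  call: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let res = await call();
  for (let attempt = 0; res.status === 429 && attempt < maxRetries; attempt++) {
    await sleep(backoffDelayMs(attempt));
    res = await call();
  }
  return res;
}
```

A handler could then wrap its upstream request as `queue.run(() => retryOn429(() => fetch(upstreamUrl, init)))`, where `queue` is a single shared `SerialQueue` and `upstreamUrl` / `init` stand for whatever the handler already sends.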

---

## 2. Installation / import

### Option A: same monorepo (pnpm workspace, recommended)

1. Copy the entire `packages/tts` directory into the target repository's `packages/` directory (or keep it in this repository and share it).
2. Add the dependency to the app that uses it:

   ```jsonc
   // apps/your-app/package.json
   { "dependencies": { "@wenwumap/tts": "workspace:*" } }
   ```

3. The package ships its source directly (`main` points to `src/index.ts`), so your build tool has to transpile it:

   - **Next.js**: `next.config.js`
     ```js
     module.exports = { transpilePackages: ["@wenwumap/tts"] };
     ```
   - **tsconfig path alias** (for editor type resolution):
     ```jsonc
     { "compilerOptions": { "paths": { "@wenwumap/tts": ["../../packages/tts/src/index.ts"] } } }
     ```
   - **Vite**: no special configuration needed; if the package is not being transpiled, include it via `optimizeDeps`/`build`.

4. Run `pnpm install` to link the workspace dependency.

### Option B: standalone project, copy the source directly

The package has no runtime dependencies, so you can copy the three files under `src/` straight into your project:

```
src/text.ts          # stripMarkdown / splitSpeakable
src/speech-queue.ts  # SpeechQueue
src/index.ts         # exports
```

Then `import { SpeechQueue } from "./tts"` (assuming the files were copied into a `tts/` directory).

### Option C: rename and reuse

`@wenwumap/tts` is only a package name and has nothing to do with business logic. In another project you can rename it to `@yourorg/tts`; the logic is fully generic.

---

## 3. Quick start

```ts
import { SpeechQueue } from "@wenwumap/tts";

const tts = new SpeechQueue({
  endpoint: "/api/tts",      // your backend TTS endpoint
  voice: "longxiaochun_v3",  // default voice (optional)
  lang: "zh-CN",             // language for the browser speech fallback (optional)
});

// Must be called once inside a user click to unlock browser autoplay
button.addEventListener("click", () => {
  tts.unlock();
  // "Hello, I am this artifact, and I am already three thousand years old."
  tts.speakWhole("你好,我是这件文物,已经三千岁啦。");
});
```

---

## 4. Two typical usages

### 4.1 Read a whole passage at once (full text already available)

```ts
tts.unlock();              // inside a user gesture
tts.speakWhole(fullText);  // splits by sentence, queues, and plays in order internally
```

### 4.2 Synthesize while streaming (paired with streaming LLM output; fastest first audio)

```ts
tts.unlock();  // when the user clicks "Send"
tts.begin();   // start a new read-aloud session

for await (const delta of llmStream) {  // the LLM emits output token by token
  appendToUI(delta);
  tts.feed(delta);                      // synthesis and playback start once a full sentence has accumulated
}

tts.flush();   // stream finished; read the remaining tail
```
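
To make the `feed()` / `flush()` behaviour concrete, here is a simplified stand-in for the buffering step. `SentenceBuffer` is a hypothetical class written for this example: it splits on sentence-ending punctuation only and ignores `minSentenceLen` and Markdown stripping, both of which the real engine applies.

```typescript
// Simplified, hypothetical stand-in for the engine's text buffering.
class SentenceBuffer {
  private pending = "";

  /** Append a delta and return every sentence that is now complete. */
  feed(delta: string): string[] {
    this.pending += delta;
    const done: string[] = [];
    const re = /^[\s\S]*?[。!?.!?\n]+/;
    let m: RegExpExecArray | null;
    while ((m = re.exec(this.pending))) {
      done.push(m[0]);
      this.pending = this.pending.slice(m[0].length);
    }
    return done;
  }

  /** End of stream: return the unfinished tail (if any) and reset the buffer. */
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}

const buf = new SentenceBuffer();
const spoken: string[] = [];
for (const delta of ["Hel", "lo there. How", " are you? Fine"]) {
  spoken.push(...buf.feed(delta)); // only complete sentences come out here
}
spoken.push(...buf.flush());       // the tail "Fine" comes out at the end
// spoken is now ["Hello there.", " How are you?", "Fine"]
```

The point of the sketch is the timing: the first sentence becomes available as soon as its closing punctuation arrives, long before the stream ends, which is what lets the engine start synthesis early.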

---

## 5. Usage with React

Hold the instance in a `useRef` (the engine keeps its own state; the component only needs to sync the UI):

```tsx
import { useCallback, useEffect, useRef, useState } from "react";
import { SpeechQueue } from "@wenwumap/tts";

function useTts(endpoint: string) {
  const ref = useRef<SpeechQueue | null>(null);
  const [speakingTag, setSpeakingTag] = useState<unknown>(null);

  const get = useCallback(() => {
    if (!ref.current) {
      ref.current = new SpeechQueue({
        endpoint,
        onSpeakingChange: (tag) => setSpeakingTag(tag), // tag === null means stopped
      });
    }
    return ref.current;
  }, [endpoint]);

  useEffect(() => () => ref.current?.destroy(), []);
  return { get, speakingTag };
}

// Inside the component:
const { get, speakingTag } = useTts("/api/tts");

// Send a question (user gesture)
function onSend(idx: number) {
  const q = get();
  q.unlock();
  q.setVoice("longsanshu_v3");
  q.begin(idx); // use idx as the tag; speakingTag === idx means "this message is being read"
}
// While streaming: get().feed(delta); when finished: get().flush()
// Replay a message: get().speakWhole(text, idx)
// Stop: get().stop()
```

> `onSpeakingChange(tag)`: called with the `tag` you passed in (`begin(tag)` / `speakWhole(text, tag)`) when playback of a session starts, and with `null` when playback stops or finishes. Use it to drive a "now reading" highlight.

---

## 6. API reference

### `new SpeechQueue(options)`

| Option | Type | Default | Description |
|---|---|---|---|
| `endpoint` | `string` | (required) | Backend TTS endpoint: `POST {text, voice}` → binary audio |
| `voice` | `string` | `""` | Default voice ID (defined by your backend / TTS service) |
| `lang` | `string` | `"zh-CN"` | Language for the browser speech fallback |
| `minSentenceLen` | `number` | `14` | Minimum sentence length (characters after stripping Markdown); smaller values start audio sooner but fragment requests more |
| `maxInFlight` | `number` | `3` | Maximum number of concurrent synthesis requests |
| `fetchImpl` | `typeof fetch` | global `fetch` | Custom fetch (e.g. to add auth headers) |
| `onSpeakingChange` | `(tag) => void` | none | Called with the tag when playback starts, and with `null` when it stops or finishes |
| `onError` | `(err) => void` | none | Synthesis or playback error (non-fatal; the engine falls back or skips automatically) |
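
For the `fetchImpl` option, a wrapper along the following lines could attach an auth header to every synthesis request. `withAuth` is a hypothetical helper written for this example, and the Bearer scheme is an assumption about your backend, not something the package prescribes.

```typescript
// Hypothetical wrapper; `withAuth` is not part of the package.
function withAuth(token: string, base: typeof fetch = fetch): typeof fetch {
  return (input, init) => {
    // Copy any headers the caller set (the engine sends Content-Type) and add Authorization.
    const headers = new Headers(init?.headers);
    headers.set("Authorization", `Bearer ${token}`);
    return base(input, { ...init, headers });
  };
}
```

It would be passed as `new SpeechQueue({ endpoint: "/api/tts", fetchImpl: withAuth(token) })`, where `token` is whatever credential your app already holds.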
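
### Methods

| Method | Description |
|---|---|
| `unlock()` | **Must be called once, synchronously, inside a user gesture** to unlock browser autoplay |
| `setVoice(v)` | Set the voice used for subsequent synthesis |
| `begin(tag?)` | Start a new streaming read-aloud session (invalidates the previous one) |
| `feed(delta)` | Feed incremental text; synthesis and playback start once a full sentence has accumulated |
| `flush()` | Stream finished; read the remaining tail text |
| `speakWhole(text, tag?)` | Read a whole passage at once (internally: begin + split and enqueue + flush) |
| `stop()` | Stop playback, clear the queue, and cancel browser speech |
| `destroy()` | Release resources (call when the component unmounts) |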
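
"Invalidates the previous one" relies on a session counter: each asynchronous result remembers the session it was started in and is discarded if a newer session has begun. The source in this commit uses a `session` field for this; the sketch below shows the idea in isolation, with `SessionGuard` as a hypothetical name.

```typescript
// Hypothetical illustration of session-token invalidation; not the package's code.
class SessionGuard {
  private session = 0;

  /** Start a new session; every token handed out earlier becomes stale. */
  begin(): number {
    this.session += 1;
    return this.session;
  }

  /** True only if `token` belongs to the session that is still current. */
  isCurrent(token: number): boolean {
    return token === this.session;
  }
}

const guard = new SessionGuard();
const first = guard.begin();   // request A captures this token
const second = guard.begin();  // the user asks a new question before A resolves
const applied: string[] = [];
if (guard.isCurrent(first)) applied.push("A");   // stale: result is dropped
if (guard.isCurrent(second)) applied.push("B");  // current: result is applied
// applied is now ["B"]
```

This is what keeps audio from an earlier answer from playing over a newer one.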
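
### Utility functions

```ts
import { stripMarkdown, splitSpeakable } from "@wenwumap/tts";

stripMarkdown("**加粗** `代码`");            // -> "加粗 代码"
// The second argument is minLen (default 14); pass 1 so a short sentence becomes its own chunk.
splitSpeakable("第一句。第二句还没完", 1);   // -> { chunks: ["第一句。"], rest: "第二句还没完" }
```

---

## 7. Important notes

- **Autoplay**: browsers require audio playback to originate from a user gesture. Be sure to call `unlock()` once, **synchronously**, inside a click handler (it plays a short silent clip to obtain permission). Otherwise the first automatic read-aloud may be blocked; in that case, fall back to browser speech or wait for the user to click manually.
- **Who defines voice IDs**: `voice` is passed through to your backend as-is; which voices are supported depends on the TTS service you connect (e.g. CosyVoice's `longxiaochun_v3`, `longsanshu_v3`).
- **Rate limiting**: under high concurrency the backend TTS may return 429. Serialize requests and back off on the backend; the engine already falls back to browser speech for failed segments, so playback is not interrupted.
- **Browser only**: the engine uses `Audio`/`speechSynthesis`, so use it on the client (e.g. in a Next.js `"use client"` component).

---

## 8. How it works (in brief)

```
feed(delta) ──► accumulate text, split on period / question mark / newline
        │  (once minSentenceLen is reached)
        ▼
enqueue + bounded-concurrency synthesis (maxInFlight)
        │  on failure → mark as "browser speech"
        ▼
play segments in enqueue order (audio or speechSynthesis)
        │
onSpeakingChange(tag / null) notifies the UI
```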
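
The "finish out of order, play in order" step in the diagram can be sketched in miniature as follows. `OrderedPlayer` is a hypothetical class for illustration only; it records "played" items immediately, whereas the real engine waits for each audio segment to end before advancing.

```typescript
// Hypothetical miniature of the ordered queue; names are illustrative only.
type MiniSlot = string | null; // null = still synthesizing, string = ready audio

class OrderedPlayer {
  private slots: MiniSlot[] = [];
  private playIdx = 0;
  readonly played: string[] = [];

  /** Reserve the next position in the queue and return its index. */
  enqueue(): number {
    this.slots.push(null);
    return this.slots.length - 1;
  }

  /** A synthesis result arrived (possibly out of order); play whatever is now unblocked. */
  resolve(index: number, audio: string): void {
    this.slots[index] = audio;
    this.pump();
  }

  private pump(): void {
    // Advance only while the slot at the head of the queue is ready.
    for (;;) {
      const slot = this.slots[this.playIdx];
      if (slot === null || slot === undefined) return;
      this.played.push(slot);
      this.playIdx += 1;
    }
  }
}

const player = new OrderedPlayer();
const a = player.enqueue();
const b = player.enqueue();
const c = player.enqueue();
player.resolve(c, "third");   // nothing plays yet: the head (a) is not ready
player.resolve(a, "first");   // plays "first", then stops at b
player.resolve(b, "second");  // plays "second", then the already-ready "third"
// player.played is now ["first", "second", "third"]
```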
@@ -0,0 +1,14 @@
{
  "name": "@wenwumap/tts",
  "version": "0.1.0",
  "private": true,
  "description": "Generic, framework-agnostic streaming text-to-speech (TTS) playback engine: synthesizes while streaming, queues by sentence, falls back to browser speech.",
  "main": "./src/index.ts",
  "types": "./src/index.ts",
  "scripts": {
    "type-check": "tsc --noEmit"
  },
  "devDependencies": {
    "typescript": "^5.5.3"
  }
}
@@ -0,0 +1,3 @@
export { SpeechQueue } from "./speech-queue";
export type { SpeechQueueOptions } from "./speech-queue";
export { stripMarkdown, splitSpeakable } from "./text";
@@ -0,0 +1,313 @@
import { splitSpeakable, stripMarkdown } from "./text";

export interface SpeechQueueOptions {
  /** Backend TTS endpoint: POST { text, voice } -> binary audio (mp3/wav, etc.) */
  endpoint: string;
  /** Default voice (can be overridden at any time with setVoice) */
  voice?: string;
  /** Language for the browser speech fallback; defaults to zh-CN */
  lang?: string;
  /** Minimum sentence length (after stripping Markdown); defaults to 14 */
  minSentenceLen?: number;
  /** Maximum number of in-flight synthesis requests; defaults to 3 */
  maxInFlight?: number;
  /** Custom fetch (defaults to the global fetch) */
  fetchImpl?: typeof fetch;
  /** Playback state change: called with the tag when a session starts playing, and with null when it stops or finishes */
  onSpeakingChange?: (tag: unknown | null) => void;
  /** Synthesis or playback error (non-fatal; the engine falls back or skips automatically) */
  onError?: (err: unknown) => void;
}

type Slot = string | null | "error" | "speech";
// null = synthesizing; string = ready objectURL; "error" = skip; "speech" = browser speech fallback

/**
 * Generic streaming TTS playback engine (framework-agnostic; depends only on browser APIs).
 *
 * Design notes:
 * - Synthesize while streaming: feed() keeps receiving incremental text and synthesizes each sentence as soon as it is complete, reducing time to first audio.
 * - In-order playback: synthesis may run concurrently and finish out of order, but playback strictly follows enqueue order.
 * - Two-track fallback: when the dedicated TTS fails (rate limit or error), fall back to the browser Web Speech API instead of going silent.
 * - Autoplay permission: unlock() must be called once inside a user gesture.
 */
export class SpeechQueue {
  private readonly endpoint: string;
  private readonly lang: string;
  private readonly minLen: number;
  private readonly maxInFlight: number;
  private readonly fetchImpl: typeof fetch;
  private readonly onSpeakingChange?: (tag: unknown | null) => void;
  private readonly onError?: (err: unknown) => void;

  private voice: string;
  private audio: HTMLAudioElement | null = null;
  private unlocked = false;

  private texts: string[] = [];
  private slots: Slot[] = [];
  private playIdx = 0;
  private nextFetch = 0;
  private inFlight = 0;
  private playing = false;
  private streamDone = false;
  private pending = "";
  private tag: unknown = null;
  private session = 0;

  private static readonly SILENT_WAV =
    "data:audio/wav;base64,UklGRjIAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAAAAAA==";

  constructor(opts: SpeechQueueOptions) {
    this.endpoint = opts.endpoint;
    this.voice = opts.voice ?? "";
    this.lang = opts.lang ?? "zh-CN";
    this.minLen = opts.minSentenceLen ?? 14;
    this.maxInFlight = opts.maxInFlight ?? 3;
    this.fetchImpl = opts.fetchImpl ?? globalThis.fetch?.bind(globalThis);
    this.onSpeakingChange = opts.onSpeakingChange;
    this.onError = opts.onError;
  }

  /** Set the current voice (empty values are ignored) */
  setVoice(v: string): void {
    if (v) this.voice = v;
  }

  private getAudio(): HTMLAudioElement {
    if (!this.audio) {
      this.audio = new Audio();
      this.audio.preload = "auto";
    }
    return this.audio;
  }

  /** Must be called once, synchronously, inside a user gesture (click): unlocks audio autoplay */
  unlock(): void {
    if (this.unlocked || typeof window === "undefined") return;
    const el = this.getAudio();
    try {
      el.src = SpeechQueue.SILENT_WAV;
      el.muted = true;
      const p = el.play();
      if (p && typeof p.then === "function") {
        p.then(() => {
          el.pause();
          el.currentTime = 0;
          el.muted = false;
          this.unlocked = true;
        }).catch(() => {});
      } else {
        this.unlocked = true;
      }
    } catch {
      /* ignore */
    }
  }

  /** Start a new read-aloud session (invalidates the previous one). tag identifies what is being read (e.g. a message index) */
  begin(tag: unknown = null): void {
    this.session += 1;
    this.revokeAll();
    this.texts = [];
    this.slots = [];
    this.playIdx = 0;
    this.nextFetch = 0;
    this.inFlight = 0;
    this.playing = false;
    this.streamDone = false;
    this.pending = "";
    this.tag = tag;
    if (typeof window !== "undefined") window.speechSynthesis?.cancel();
    if (this.audio) {
      this.audio.pause();
      this.audio.onended = null;
    }
  }

  /** Streaming input: feed incremental text; each completed sentence is enqueued for synthesis */
  feed(delta: string): void {
    this.pending += delta;
    const { chunks, rest } = splitSpeakable(this.pending, this.minLen);
    this.pending = rest;
    for (const c of chunks) this.enqueue(c);
  }

  /** End of stream: enqueue the remaining text as the last segment and try to play */
  flush(): void {
    const rest = this.pending.trim();
    this.pending = "";
    if (rest) this.enqueue(rest);
    this.streamDone = true;
    this.pumpPlay();
  }

  /** Read a whole passage at once (e.g. to replay a message) */
  speakWhole(text: string, tag: unknown = null): void {
    this.begin(tag);
    const { chunks, rest } = splitSpeakable(text, this.minLen);
    for (const c of chunks) this.enqueue(c);
    if (rest.trim()) this.enqueue(rest);
    this.streamDone = true;
    this.pumpPlay();
  }

  /** Stop playback and clear the queue */
  stop(): void {
    this.session += 1;
    this.streamDone = true;
    this.playing = false;
    this.revokeAll();
    this.texts = [];
    this.slots = [];
    this.playIdx = 0;
    this.nextFetch = 0;
    this.inFlight = 0;
    this.pending = "";
    this.tag = null;
    if (typeof window !== "undefined") window.speechSynthesis?.cancel();
    if (this.audio) {
      this.audio.pause();
      this.audio.onended = null;
    }
    this.onSpeakingChange?.(null);
  }

  /** Release resources (call when the component unmounts) */
  destroy(): void {
    this.session += 1;
    this.revokeAll();
    if (typeof window !== "undefined") window.speechSynthesis?.cancel();
    if (this.audio) this.audio.pause();
  }

  // ===== Internals =====

  private revokeAll(): void {
    for (const u of this.slots) {
      if (typeof u === "string" && u.startsWith("blob:")) URL.revokeObjectURL(u);
    }
  }

  private enqueue(text: string): void {
    this.texts.push(text);
    this.slots.push(null);
    this.pumpFetch();
  }

  private pumpFetch(): void {
    while (this.inFlight < this.maxInFlight && this.nextFetch < this.texts.length) {
      const i = this.nextFetch++;
      this.inFlight += 1;
      void this.fetchChunk(i, this.session);
    }
  }

  private async fetchChunk(i: number, session: number): Promise<void> {
    const clean = stripMarkdown(this.texts[i] ?? "").slice(0, 600);
    if (!clean) {
      if (session === this.session) {
        this.slots[i] = "error";
        // Release the in-flight slot reserved by pumpFetch; otherwise empty chunks leak capacity.
        this.inFlight = Math.max(0, this.inFlight - 1);
        this.pumpPlay();
      }
      return;
    }
    try {
      const res = await this.fetchImpl(this.endpoint, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text: clean, voice: this.voice }),
      });
      if (session !== this.session) return;
      if (!res.ok) {
        this.slots[i] = "speech"; // dedicated TTS failed → browser speech fallback
      } else {
        const blob = await res.blob();
        if (session !== this.session) return;
        this.slots[i] = URL.createObjectURL(blob);
      }
    } catch (err) {
      if (session === this.session) {
        this.slots[i] = "speech";
        this.onError?.(err);
      }
    }
    if (session === this.session) {
      this.inFlight = Math.max(0, this.inFlight - 1);
      this.pumpFetch();
      this.pumpPlay();
    }
  }

  private pumpPlay(): void {
    if (this.playing) return;
    const i = this.playIdx;
    if (i >= this.texts.length) {
      if (this.streamDone) {
        this.playing = false;
        this.onSpeakingChange?.(null);
      }
      return;
    }
    const slot = this.slots[i];
    if (slot === null || slot === undefined) return; // still synthesizing; wait for the callback
    if (slot === "error") {
      this.playIdx = i + 1;
      this.pumpPlay();
      return;
    }
    if (slot === "speech") {
      this.playViaBrowser(i);
      return;
    }
    this.playViaAudio(i, slot);
  }

  private playViaAudio(i: number, url: string): void {
    const el = this.getAudio();
    el.src = url;
    el.muted = false;
    this.playing = true;
    this.onSpeakingChange?.(this.tag);
    el.onended = () => {
      this.playing = false;
      if (url.startsWith("blob:")) URL.revokeObjectURL(url);
      this.slots[i] = "error";
      this.playIdx = i + 1;
      this.pumpPlay();
    };
    el.play().catch((err) => {
      this.playing = false;
      this.onError?.(err);
      this.onSpeakingChange?.(null);
    });
  }

  private playViaBrowser(i: number): void {
    const text = stripMarkdown(this.texts[i] ?? "");
    const synth = typeof window !== "undefined" ? window.speechSynthesis : undefined;
    if (!text || !synth) {
      this.playIdx = i + 1;
      this.pumpPlay();
      return;
    }
    this.playing = true;
    this.onSpeakingChange?.(this.tag);
    const done = () => {
      this.playing = false;
      this.slots[i] = "error";
      this.playIdx = i + 1;
      this.pumpPlay();
    };
    try {
      const u = new SpeechSynthesisUtterance(text);
      u.lang = this.lang;
      u.rate = 1;
      u.onend = done;
      u.onerror = done;
      synth.speak(u);
    } catch {
      done();
    }
  }
}
@@ -0,0 +1,47 @@
/**
 * Text utilities: convert a Markdown answer into plain text suitable for reading aloud,
 * and split streaming output by sentence so it can be synthesized while it streams.
 */

/** Strip Markdown markup, leaving plain text suitable for reading aloud */
export function stripMarkdown(md: string): string {
  return md
    .replace(/```[\s\S]*?```/g, "")
    .replace(/`([^`]+)`/g, "$1")
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "")
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")
    .replace(/^#{1,6}\s+/gm, "")
    .replace(/^\s*>\s?/gm, "")
    .replace(/^\s*[-*+]\s+/gm, "")
    .replace(/\*\*([^*]+)\*\*/g, "$1")
    .replace(/\*([^*]+)\*/g, "$1")
    .replace(/_{1,2}([^_]+)_{1,2}/g, "$1")
    .replace(/~~([^~]+)~~/g, "$1")
    .replace(/\s{2,}/g, " ")
    .trim();
}

/**
 * Split text by sentence: returns the complete, speakable sentence blocks as chunks
 * (each block is at least minLen characters long after stripping Markdown) and the
 * trailing text that is not yet a full sentence as rest. Used to synthesize while
 * streaming and reduce time to first audio.
 */
export function splitSpeakable(
  text: string,
  minLen = 14
): { chunks: string[]; rest: string } {
  const chunks: string[] = [];
  let rest = text;
  let buf = "";
  const re = /^[\s\S]*?[。!?!?\n;;…]+/;
  let m: RegExpExecArray | null;
  while ((m = re.exec(rest))) {
    buf += m[0];
    rest = rest.slice(m[0].length);
    if (stripMarkdown(buf).length >= minLen) {
      chunks.push(buf);
      buf = "";
    }
  }
  rest = buf + rest;
  return { chunks, rest };
}
@@ -0,0 +1,10 @@
{
  "extends": "../../tsconfig.base.json",
  "compilerOptions": {
    "rootDir": "./src",
    "outDir": "./dist",
    "lib": ["dom", "dom.iterable", "ES2022"],
    "types": []
  },
  "include": ["src"]
}