Quickstart

Prerequisites

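Make sure you've followed the base quickstart instructions for the Agents SDK and set up a virtual environment. Then install the optional voice dependencies from the SDK: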

pip install 'openai-agents[voice]'
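
After installing, it can help to confirm that the modules imported later on this page are actually available. The sketch below is only an illustration: `missing_modules` is a hypothetical helper, not part of the SDK, and the module names are taken from the imports used in the examples that follow. This page does not say whether every one of them ships with the voice extra, so a missing entry may simply need its own `pip install`.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from the imports used later in this quickstart.
    missing = missing_modules(["agents", "numpy", "sounddevice"])
    print("missing:", missing or "none")
```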

Concepts

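The main concept to know about is a VoicePipeline, which is a 3-step process:

1. Run a speech-to-text model to turn audio into text.
2. Run your code, which is usually an agentic workflow, to produce a result.
3. Run a text-to-speech model to turn the result text back into audio.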

graph LR
    %% Input
    A["🎤 Audio Input"]

    %% Voice Pipeline
    subgraph Voice_Pipeline [Voice Pipeline]
        direction TB
        B["Transcribe (speech-to-text)"]
        C["Your Code"]:::highlight
        D["Text-to-speech"]
        B --> C --> D
    end

    %% Output
    E["🎧 Audio Output"]

    %% Flow
    A --> Voice_Pipeline
    Voice_Pipeline --> E

    %% Custom styling
    classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
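
The three steps above can be sketched in plain Python to show how data flows between the stages. This is a toy illustration only: `fake_speech_to_text`, `fake_workflow`, and `fake_text_to_speech` are hypothetical stand-ins that shuffle bytes and strings around, not SDK functions, and no real models are involved.

```python
# Toy sketch of the three-stage flow. Every function here is a
# hypothetical stand-in, not part of the Agents SDK.


def fake_speech_to_text(audio: bytes) -> str:
    """Step 1 stand-in: pretend to transcribe by decoding the bytes as UTF-8."""
    return audio.decode("utf-8")


def fake_workflow(text: str) -> str:
    """Step 2 stand-in: 'your code', which would normally run an agent."""
    return f"You said: {text}"


def fake_text_to_speech(text: str) -> bytes:
    """Step 3 stand-in: pretend to synthesize by encoding the text as UTF-8."""
    return text.encode("utf-8")


def run_three_steps(audio: bytes) -> bytes:
    """Chain the stages in order: audio -> text -> result text -> audio."""
    text = fake_speech_to_text(audio)
    reply = fake_workflow(text)
    return fake_text_to_speech(reply)


if __name__ == "__main__":
    print(run_three_steps(b"hello"))
```

The point of the sketch is the shape of the pipeline: only the middle stage is code you write, while the first and last stages are model calls that the pipeline handles around it.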

Agents

First, let's set up some agents. This should feel familiar if you've built any agents with this SDK. We'll have a couple of agents, a handoff, and a tool.

import asyncio
import random

from agents import (
    Agent,
    function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions



@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A Spanish-speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, hand off to the Spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)

Voice pipeline

We'll set up a simple voice pipeline, using SingleAgentVoiceWorkflow as the workflow.

from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline
pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))

Run the pipeline

import numpy as np
import sounddevice as sd
from agents.voice import AudioInput


# For simplicity, we'll just create 3 seconds of silence.
# In reality, you'd get microphone data.
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)

result = await pipeline.run(audio_input)


# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Play the audio stream as it comes in

async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)
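
The loop above filters the stream by event type and forwards only audio events to the player. The sketch below shows the same filtering pattern with the audio collected into one buffer instead of played, which is handy when no sound device is available. It runs without the SDK: `FakeEvent` and `fake_stream` are hypothetical stand-ins for the objects yielded by `result.stream()`, the `"other_event"` type string is a made-up placeholder, and the chunks are plain bytes here purely for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a streamed event; not an SDK class."""

    type: str
    data: bytes = b""


async def fake_stream() -> AsyncIterator[FakeEvent]:
    """Yield a mix of audio and non-audio events, like a result stream might."""
    yield FakeEvent("other_event")  # placeholder for any non-audio event
    yield FakeEvent("voice_stream_event_audio", b"\x01\x02")
    yield FakeEvent("voice_stream_event_audio", b"\x03\x04")
    yield FakeEvent("other_event")


async def collect_audio(events: AsyncIterator[FakeEvent]) -> bytes:
    """Keep only audio events and join their chunks in arrival order."""
    chunks = []
    async for event in events:
        if event.type == "voice_stream_event_audio":
            chunks.append(event.data)
    return b"".join(chunks)


if __name__ == "__main__":
    print(asyncio.run(collect_audio(fake_stream())))
```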

Put it all together

import asyncio
import random

import numpy as np
import sounddevice as sd

from agents import (
    Agent,
    function_tool,
    set_tracing_disabled,
)
from agents.voice import (
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A Spanish-speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, hand off to the Spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)


async def main():
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    audio_input = AudioInput(buffer=buffer)

    result = await pipeline.run(audio_input)

    # Create an audio player using `sounddevice`
    player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
    player.start()

    # Play the audio stream as it comes in
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            player.write(event.data)


if __name__ == "__main__":
    asyncio.run(main())

If you run this example, the agent will speak to you! Check out the example in examples/voice/static to see a demo where you can speak to the agent yourself.