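I am trying to capture streaming audio and convert it to text using Google speech-to-text, then pass that text as input to a conversation on Watson, which returns an answer. The latter half works well.
The problem is that I can't get the script to pass the text from the recorded speech to the Watson service I created.
I get no errors; I get nothing at all. The microphone is working (I tested it with another script). The program actually reports that it can't understand my response (I assume because no text is being sent). Here is my code: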
import os
import time

import speech_recognition as sr
import watson_developer_cloud
from gtts import gTTS

# Set up Assistant service.
service = watson_developer_cloud.AssistantV1(
    #username = 'USERNAME', # replace with service username
    #password = 'PASSWORD', # replace with service password
    iam_api_key = 'xxxxxxxxxx', # replace with service IAM API key
    url = 'xxxxxxxxxx', # replace with service URL
    version = 'xxxxxxxxxx' # replace with API version
)
workspace_id = 'xxxxxxxxxxxxxx' # replace with workspace ID
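
def getaudiodevices():
    # List ALSA capture devices and print each one as "hw:<card>,<device>".
    devices = os.popen("arecord -l")
    device_string = devices.read()
    device_string = device_string.split("\n")
    for line in device_string:
        if line.find("card") != -1:
            # Takes one character after "card " and "device ", so this only
            # handles single-digit card and device numbers.
            print("hw:" + line[line.find("card") + 5] + "," + line[line.find("device") + 7])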

def speak(audiostring):
    # Print the text, synthesize it with gTTS, and play it back with mpg321.
    print(audiostring)
    tts = gTTS(text=audiostring, lang='en')
    tts.save('audio.mp3')
    os.system('mpg321 audio.mp3')

def recordaudio():
    # Record Audio
    r = sr.Recognizer()
    with sr.Microphone(0) as source:
        print("Say something!")
        audio = r.listen(source, phrase_time_limit=10)
    # Speech recognition ******
    # Defaults to a single space, which is what gets returned on failure.
    data = " "
    try:
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return data

# Initialize with empty value to start the conversation.
user_input = ''
context = {}
current_action = ''
# Main input/output loop
while current_action != 'end_conversation':
    # Send message to Assistant service.
    response = service.message(
        workspace_id = workspace_id,
        input = {
            'text': user_input
        },
        context = context
    )
    # Print the output from dialog, if any.
    if response['output']['text']:
        print(response['output']['text'][0])
        speak(response['output']['text'][0])
    # Update the stored context with the latest received from the dialog.
    context = response['context']
    # Check for action flags sent by the dialog.
    if 'action' in response['output']:
        current_action = response['output']['action']
    # User asked what time it is, so we output the local system time.
    if current_action == 'display_time':
        print('The current time is ' + time.strftime('%I:%M:%S %p') + '.')
        speak('The current time is ' + time.strftime('%I:%M:%S %p') + '.')
    # If we're not done, prompt for next round of input.
    # Keyboard input for now; recordaudio() is never called from this loop.
    if current_action != 'end_conversation':
        user_input = input('>> ')

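Right now I can type what I want to say on the keyboard, and that works. What I want is for the user input to come from the text produced by transcribing the audio with Google speech-to-text. I need to get the data from the recorded audio into the main part of the Python script, where it communicates with the Watson service.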
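
To make the goal concrete, here is a minimal sketch of the wiring I have in mind, not a tested fix. The names `get_spoken_input`, `converse`, `send_message`, `transcribe`, `max_attempts` and `max_turns` are hypothetical and not part of any library; the Watson call and the microphone are passed in as plain functions so the loop can be tried without either. It assumes the response is a dict shaped like the one used in the script above, and it leaves out the `display_time` handling for brevity.

```python
def get_spoken_input(transcribe, max_attempts=3):
    """Call transcribe() until it returns non-blank text, or give up.

    transcribe is any zero-argument function returning a string,
    for example the recordaudio() function from the script above
    (which returns " " when recognition fails).
    """
    for _ in range(max_attempts):
        text = (transcribe() or "").strip()
        if text:
            return text
    return ""


def converse(send_message, transcribe, speak, max_turns=20):
    """Run the dialog loop, taking each turn's input from speech.

    send_message(text, context) is assumed to return a dict with
    'output' and 'context' keys, like the response in the script above.
    Returns the list of replies that were spoken.
    """
    user_input = ""  # empty input starts the conversation
    context = {}
    spoken = []
    for _ in range(max_turns):
        response = send_message(user_input, context)
        context = response.get("context", context)
        output = response.get("output", {})
        text = output.get("text")
        if text:
            speak(text[0])
            spoken.append(text[0])
        if output.get("action") == "end_conversation":
            break
        # This is the step that currently uses input('>> ').
        user_input = get_spoken_input(transcribe)
    return spoken
```

In my script that would presumably mean passing `recordaudio` as `transcribe`, `speak` as `speak`, and a small wrapper around `service.message(...)` as `send_message`. The smallest change to the existing loop would be replacing `user_input = input('>> ')` with `user_input = recordaudio()`, with the caveat that a failed recognition would then send a single space to Watson.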