I think voice messages in instant messengers are a nice feature, but a 6-minute voice message is a pain. When do you listen to it? Where are the key parts of the message?
For that reason I've been working on a Telegram chat bot to transcribe these audio messages.
You just need to forward the message to the bot, and it automatically sends the audio to OpenAI's servers for transcription and returns the text to the chat.
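The transcription step can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function name `transcribe_voice` is hypothetical, the `whisper-1` model name is an assumption, and the `client` argument is assumed to behave like the client object from the official `openai` Python package.

```python
def transcribe_voice(audio_path, client, model="whisper-1"):
    """Send an audio file to a transcription endpoint and return the text.

    `client` is assumed to expose `audio.transcriptions.create(...)`,
    as the official `openai` package's client does. The model name is
    an assumption, not something the bot is known to use.
    """
    # Open in binary mode: the API expects the raw audio bytes.
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    # The response object carries the transcript in its `text` attribute.
    return result.text
```

With the official package, the client would be created with something like `OpenAI(api_key=OPENAI_API_KEY)` and the returned string sent back to the chat as a regular text message.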
1. Clone the repository: git clone https://github.com/lardissone/telegram-ai-bot.git
2. Enter the directory: cd telegram-ai-bot
3. Copy settings.example.py to settings.py: cp settings.example.py settings.py
4. Edit the settings.py file. You'll need to enter OPENAI_API_KEY, TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_IDS.
5. Start the bot: docker compose up -d
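The role of TELEGRAM_CHAT_IDS is not spelled out here; one plausible reading is that it is an allow-list of chats the bot will answer. The sketch below shows that idea under two assumptions: that the setting is a comma-separated string (the real settings.py may use a different format), and that the helper names `parse_chat_ids` and `is_allowed` are purely illustrative.

```python
from typing import Set

# Hypothetical value; the real settings.py format may differ.
TELEGRAM_CHAT_IDS = "12345678, -1009876543210"


def parse_chat_ids(raw: str) -> Set[int]:
    """Turn a comma-separated string of chat IDs into a set of integers."""
    # Skip empty fragments so a trailing or doubled comma is harmless.
    return {int(part) for part in raw.split(",") if part.strip()}


def is_allowed(chat_id: int, allowed: Set[int]) -> bool:
    """Return True when the chat is on the allow-list."""
    return chat_id in allowed
```

A check like this keeps strangers who find the bot from spending your OpenAI credits.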
You just need to send the bot a voice message and it will automatically transcribe it for you.
Note: I tried with English and Spanish, and both worked perfectly.
I've also used it to send a text message when I'm in the car, by sending a voice message directly to the bot and then forwarding the resulting text to my contact.
It also includes a quick command called /image to generate an image using DALL-E. You just need to send /image prompt, where the prompt is what you want the AI to create for you.
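Extracting the prompt from an /image message could look something like this. It is only a sketch: `parse_image_command` is a hypothetical helper, not a function from the bot, and it also accepts the `/image@botname` form that Telegram uses for commands in group chats.

```python
from typing import Optional


def parse_image_command(text: str) -> Optional[str]:
    """Return the prompt from '/image <prompt>', or None if it is not that command."""
    parts = text.strip().split(maxsplit=1)
    if not parts:
        return None
    # In group chats Telegram may append the bot name: /image@my_bot
    command = parts[0].split("@", 1)[0]
    if command != "/image":
        return None
    # A bare "/image" with no prompt is treated as invalid.
    if len(parts) < 2:
        return None
    return parts[1]
```

For example, `parse_image_command("/image a red fox in the snow")` yields the prompt text, which would then be passed to the image generation request.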