OpenAI Whisper Audio Transcriber

The code defines a `SpeechToTextAction` class inside the `Sublayer::Actions` module, which converts audio data to text. It accepts an IO-like audio object (anything that responds to `read`) on initialization. The `call` method first writes that audio to a temporary file with a `.webm` extension, opened with the binary (ASCII-8BIT) encoding so the raw bytes are preserved verbatim.
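
A quick sketch of what the binary encoding buys (the four bytes below are the standard EBML magic number that opens every WebM file):

require "tempfile"

# Round-trip raw bytes through a binary (ASCII-8BIT) tempfile: opening
# the file in binary mode guarantees the bytes are written and read
# back verbatim, with no transcoding in between.
Tempfile.create(["audio", ".webm"], encoding: "ascii-8bit") do |f|
  f.write("\x1A\x45\xDF\xA3".b) # EBML magic bytes that start a WebM file
  f.rewind
  puts f.read.unpack1("H*") # => 1a45dfa3
end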

The temp file is then sent to the OpenAI audio transcriptions endpoint (`https://api.openai.com/v1/audio/transcriptions`) via a POST request. The request carries an `Authorization` header built from the `OPENAI_API_KEY` environment variable, and because the body contains a file, HTTParty encodes it as `multipart/form-data` and generates the boundary itself (setting the `Content-Type` header by hand would omit the boundary and break the upload). The body also sets `model` to `whisper-1`, selecting OpenAI's Whisper transcription model.

Once the request completes, the method closes and deletes the temporary file in an `ensure` block, so cleanup happens even if the request raises. It returns the transcribed text from the `text` field of the API's JSON response.
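
For reference, a successful response from the endpoint (with the API's default `json` response format) is a flat JSON object, which is why a single field lookup is all the extraction needed:

require "json"

# Shape of a successful Whisper transcription response; the entire
# transcript comes back in one "text" field.
sample = JSON.parse('{"text": "Hello from Whisper."}')
puts sample["text"] # => Hello from Whisper.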

require "httparty"
require "tempfile"

module Sublayer
  module Actions
    class SpeechToTextAction < Base
      # audio_data is any IO-like object that responds to #read
      # (e.g. an uploaded file or an open File handle).
      def initialize(audio_data)
        @audio_data = audio_data
      end

      def call
        # Binary (ASCII-8BIT) encoding keeps Ruby from transcoding
        # the raw WebM bytes on write.
        tempfile = Tempfile.new(["audio", ".webm"], encoding: "ascii-8bit")
        tempfile.write(@audio_data.read)
        tempfile.rewind

        # HTTParty spots the Tempfile in the body and sends the request
        # as multipart/form-data, generating the boundary itself; setting
        # the Content-Type header manually would omit the boundary and
        # break the upload.
        response = HTTParty.post(
          "https://api.openai.com/v1/audio/transcriptions",
          headers: {
            "Authorization" => "Bearer #{ENV["OPENAI_API_KEY"]}"
          },
          body: {
            file: tempfile,
            model: "whisper-1"
          }
        )

        response["text"]
      ensure
        # Remove the temp file even if the request raises.
        tempfile&.close
        tempfile&.unlink
      end
    end
  end
end
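
A minimal usage sketch, assuming the class above is loaded, the `httparty` gem is available, and `OPENAI_API_KEY` is set in the environment (the `recording.webm` path is illustrative):

# Hypothetical caller: open the recording in binary mode so the raw
# WebM bytes reach the action untouched.
File.open("recording.webm", "rb") do |audio|
  transcript = Sublayer::Actions::SpeechToTextAction.new(audio).call
  puts transcript
end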