B4A Class OpenAI Realtime Speech2Speech

🔊 Realtime Speech-to-Speech with OpenAI GPT-4o


This small example project demonstrates how to build a realtime speech-to-speech assistant in B4A using OpenAI’s GPT-4o realtime API and a direct WebSocket connection, all in about 100 lines of code!

Example with function calls in my current and biggest B4A project (German audio).
Code with function calls will be added soon!



You need to enter a valid OpenAI API key with access to GPT-4o realtime in the class globals (apiKey)!

🎯 Features:

  • Realtime speech input & AI-generated speech output via a multimodal model
  • Uses the AudioStreamer object for input and playback
  • Fully event-driven WebSocket communication
  • Minimal layout with a ToggleButton and Label

🧠 How It Works

  1. Audio Input (Microphone → AI):
    • When you press the ToggleButton, the app starts recording audio via AudioStreamer.
    • Each audio buffer is base64-encoded and sent in realtime using a WebSocket message of type input_audio_buffer.append.
  2. Session Initialization:
    • On connection, the app sends a session.update JSON payload defining:
      • Input/output audio format (pcm16)
      • Voice ("sage")
      • Transcription model (whisper-1)
      • Turn detection and silence thresholds
      • Modalities (["text", "audio"])
  3. AI Response (Audio):
    • The API sends audio chunks as response.audio.delta messages, base64-encoded.
    • Audio response chunks are decoded and written directly to AudioStreamer.
  4. Looping Conversation:
    • On response.audio.done, the app marks the end of the audio stream; once playback has completed, it switches back to recording, enabling a continuous conversation.
To ensure a clean audio interaction and avoid false detections or feedback issues, the microphone is deactivated during the playback of the AI's response. This design decision was made because in testing, the recorder consistently picked up the spoken output from the device’s speakers and interpreted it as new input.
As a result:
  • The response cannot be interrupted once playback starts (except by switching the toggle off, but switching it on again starts a new session).
  • The system resumes listening only after the full audio response has finished.
  • This avoids self-triggering loops or premature cut-offs of speech.
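
As a minimal sketch of step 1, the append event could also be built from a Map and serialized with JSONGenerator (JSON library), so that quoting and escaping are handled by the generator rather than by a string literal. BuildAppendEvent is a hypothetical helper name, su is the StringUtils global from the class below, and the event_id field is left out on the assumption that it is optional for client events.

B4X:
' Hypothetical helper, not part of the attached project.
' Builds an input_audio_buffer.append event for one recorded buffer.
Private Sub BuildAppendEvent (Buffer() As Byte) As String
    Dim evt As Map = CreateMap( _
        "type": "input_audio_buffer.append", _
        "audio": su.EncodeBase64(Buffer))
    Dim gen As JSONGenerator
    gen.Initialize(evt)
    Return gen.ToString
End Sub

In streamer_RecordBuffer this would be used as socket.SendText(BuildAppendEvent(Buffer)).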

B4X:
Sub Class_Globals
    Private Root As B4XView
    Private xui As XUI
    Private streamer As AudioStreamer
    Private rp As RuntimePermissions
    Private su As StringUtils
    Private socket As WebSocket
    Private sessionId As String
    Private apiKey As String = "sk-proj-..."
    Private lblStatus As Label
    Private btnToggle As ToggleButton
End Sub

' Initializes the audio streamer and WebSocket connection
Public Sub Initialize
    streamer.Initialize("streamer", 24000, True, 16, streamer.VOLUME_MUSIC)
    socket.Initialize("socket")
End Sub

' Page setup and permission request
Private Sub B4XPage_Created (Root1 As B4XView)
    Root = Root1
    B4XPages.SetTitle(Me, "Realtime Speech2Speech")
    Root.LoadLayout("MainPage")

    rp.CheckAndRequest(rp.PERMISSION_RECORD_AUDIO)
    Wait For B4XPage_PermissionResult (Permission As String, Result As Boolean)
    If Result = False Then
        lblStatus.Text = "Microphone permission is required"
        btnToggle.Enabled = False
    End If
End Sub

' Establishes the WebSocket connection to OpenAI's Realtime API
Private Sub Connect
    socket.Headers = CreateMap( _
        "Authorization": "Bearer " & apiKey, _
        "OpenAI-Beta": "realtime=v1")
    socket.Connect("wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview")
End Sub

' Handles incoming messages from the WebSocket
Private Sub socket_TextMessage (Message As String)
    Dim response As Map = Message.As(JSON).ToMap
    Dim eventType As String = response.Get("type")
    Select eventType
        Case "response.created"
            streamer.StopRecording
            streamer.StartPlaying
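        Case "session.created"
            ' Note: this stores the event_id of session.created; it is reused as the event_id of outgoing events
            sessionId = response.Get("event_id")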
        Case "session.updated"
            lblStatus.Text = "Listening..."
        Case "response.audio.delta"
            Dim audioData As String = response.Get("delta")
            streamer.Write(su.DecodeBase64(audioData))
            lblStatus.Text = "Playing response..."
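        Case "response.audio.done"
            ' Writing Null marks the end of the stream; streamer_PlaybackComplete follows once the queue has been played
            streamer.Write(Null)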
    End Select
End Sub

' Sends the recorded audio buffer to the WebSocket
Private Sub streamer_RecordBuffer (Buffer() As Byte)
    Dim encoded As String = su.EncodeBase64(Buffer)
    socket.SendText($"{
        "event_id": "${sessionId}",
        "type": "input_audio_buffer.append",
        "audio": "${encoded}"
    }"$)
End Sub

' Restarts recording after playback finishes
Private Sub streamer_PlaybackComplete
    streamer.StartRecording
    lblStatus.Text = "Listening..."
End Sub

' Handles toggle button to start or stop the streaming session
Private Sub btnToggle_CheckedChange(Checked As Boolean)
    If Checked Then
        lblStatus.Text = "Connecting..."
        Connect
        Wait For socket_Connected
        socket.SendText(SessionUpdateMessage)
        streamer.StartRecording
    Else
        socket.Close
        streamer.StopRecording
        streamer.StopPlaying
        lblStatus.Text = "Session stopped"
    End If
End Sub

' Returns the JSON string for session setup
Private Sub SessionUpdateMessage As String
    Return $"{
        "event_id": "${sessionId}",
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": "sage",
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {
                "model": "whisper-1"
            },
            "input_audio_noise_reduction": {
                "type": "far_field"
            },
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": 500,
                "create_response": true
            },
            "tools": [],
            "tool_choice": "auto",
            "temperature": 0.8,
            "max_response_output_tokens": "inf"
        }
    }"$
End Sub
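
If the example connects but stays silent, logging a few more events may help to narrow the problem down. The sketch below assumes that the API reports problems as events of type error with an error.message field, and that the whisper-1 transcript arrives in conversation.item.input_audio_transcription.completed. LogDiagnostics is a hypothetical helper that would be called from socket_TextMessage with the parsed Map; socket_Closed is assumed to be the close event of the WebSocket client library.

B4X:
' Hypothetical helper, not part of the attached project.
' Logs API errors and the transcript of the user's speech.
Private Sub LogDiagnostics (response As Map)
    Dim eventType As String = response.Get("type")
    Select eventType
        Case "error"
            ' Assumption: error events carry an "error" object with a "message" field
            Dim errorInfo As Map = response.Get("error")
            Log("API error: " & errorInfo.Get("message"))
            lblStatus.Text = "API error, see log"
        Case "conversation.item.input_audio_transcription.completed"
            ' Assumption: the transcript of the recorded speech is in "transcript"
            Log("You said: " & response.Get("transcript"))
    End Select
End Sub

' Logs why the connection was closed (for example a rejected API key)
Private Sub socket_Closed (Reason As String)
    Log("WebSocket closed: " & Reason)
End Sub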
 

Attachments

  • Speech2Speech.zip
    11.2 KB

Blueforcer

Playing more with function calls and UI.
The App UI is NOT hardcoded, each page and card item is generic and loaded from a project json. And GPT is still able to find everything!

Use YouTube's translated subtitles to understand what's going on :)
 

JohnC

Blueforcer said:
The App UI is NOT hardcoded, each page and card item is generic and loaded from a project json. And GPT is still able to find everything!
Can you elaborate on this comment (i.e., a sample JSON, and what you mean by the AI finding everything)?

Also, if you created that YouTube video, would it be possible to modify your example project to show how the AI can interact with the app's UI?
 

hanyelmehy

Hi, when I try it I get an error on this line:
B4X:
socket.Headers = CreateMap("Authorization": "Bearer " & apiKey,"OpenAI-Beta": "realtime=v1")

Error :
socket.Headers = CreateMap(\
src\de\dinotec\bot\b4xmainpage.java:244: error: cannot find symbol
__ref._socket /*anywheresoftware.b4a.objects.WebSocketWrapper*/ .Headers = __c.createMap(new Object[] {(Object)("Authorization"),(Object)("Bearer "+__ref._apikey /*String*/ ),(Object)("OpenAI-Beta"),(Object)("realtime=v1")});
                                                                ^
  symbol:   variable Headers
  location: variable _socket of type WebSocketWrapper
1 error
Solved: I had a different WebSocket library version installed.
 

Jerryk

Error occurred on line: 91 (B4XMainPage)
java.lang.IllegalStateException: startRecording() called on an uninitialized AudioRecord.
at android.media.AudioRecord.startRecording(AudioRecord.java:1403)
at anywheresoftware.b4a.audio.AudioStreamer.StartRecording(AudioStreamer.java:98)
at de.dinotec.bot.b4xmainpage$ResumableSub_btnToggle_CheckedChange.resume(b4xmainpage.java:203)
at anywheresoftware.b4a.shell.DebugResumableSub$DelegatableResumableSub.resumeAsUserSub(DebugResumableSub.java:48)
at java.lang.reflect.Method.invoke(Native Method)
at anywheresoftware.b4a.shell.Shell.runMethod(Shell.java:732)
at anywheresoftware.b4a.shell.Shell.raiseEventImpl(Shell.java:348)
at anywheresoftware.b4a.shell.Shell.raiseEvent(Shell.java:255)
at java.lang.reflect.Method.invoke(Native Method)
at anywheresoftware.b4a.ShellBA.raiseEvent2(ShellBA.java:157)
at anywheresoftware.b4a.BA.raiseEvent(BA.java:201)
at anywheresoftware.b4a.shell.DebugResumableSub$DelegatableResumableSub.resume(DebugResumableSub.java:43)
at anywheresoftware.b4a.BA.checkAndRunWaitForEvent(BA.java:275)
at anywheresoftware.b4a.ShellBA.raiseEvent2(ShellBA.java:150)
at anywheresoftware.b4a.BA.raiseEvent(BA.java:201)
at anywheresoftware.b4a.objects.WebSocketWrapper$1.onOpen(WebSocketWrapper.java:94)
at io.crossbar.autobahn.websocket.WebSocketConnection$3.handleMessage(WebSocketConnection.java:535)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loopOnce(Looper.java:240)
at android.os.Looper.loop(Looper.java:351)
at android.app.ActivityThread.main(ActivityThread.java:8423)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:584)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1013)
 

asales

Jerryk said:
Error occurred on line: 91 (B4XMainPage)
java.lang.IllegalStateException: startRecording() called on an uninitialized AudioRecord.
at android.media.AudioRecord.startRecording(AudioRecord.java:1403)
You can fix it by moving these lines:
B4X:
streamer.Initialize("streamer", 24000, True, 16, streamer.VOLUME_MUSIC)
socket.Initialize("socket")
from the Initialize sub to the B4XPage_Created sub (after the permission has been requested).
But I still couldn't get the example to work.
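
The change described above could look roughly like this. It is a sketch of the suggested move, not a confirmed fix for the whole example; the Initialize sub is assumed to be left empty afterwards.

B4X:
' Sketch: create the streamer only after the RECORD_AUDIO permission was granted
Private Sub B4XPage_Created (Root1 As B4XView)
    Root = Root1
    B4XPages.SetTitle(Me, "Realtime Speech2Speech")
    Root.LoadLayout("MainPage")

    rp.CheckAndRequest(rp.PERMISSION_RECORD_AUDIO)
    Wait For B4XPage_PermissionResult (Permission As String, Result As Boolean)
    If Result Then
        streamer.Initialize("streamer", 24000, True, 16, streamer.VOLUME_MUSIC)
        socket.Initialize("socket")
    Else
        lblStatus.Text = "Microphone permission is required"
        btnToggle.Enabled = False
    End If
End Sub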
 

Blueforcer

hanyelmehy said:
Hi, when I try it I get an error on this line:
B4X:
socket.Headers = CreateMap("Authorization": "Bearer " & apiKey,"OpenAI-Beta": "realtime=v1")

Error:
src\de\dinotec\bot\b4xmainpage.java:244: error: cannot find symbol
  symbol:   variable Headers
  location: variable _socket of type WebSocketWrapper
Make sure to use the latest WebSocket client library, which supports headers.
 