The idea is to use streaming. The goal is for the sound to start playing after 200–400 ms, instead of waiting for the entire sentence to be generated.
Pocket-TTS offers a function called "tts_model.generate_audio_stream(voice_state, text)".
It returns small audio chunks as they are generated.
So you can receive a chunk, play it immediately, and continue receiving the next ones.
from pocket_tts import TTSModel
import numpy as np

# Load the model and voice only once
tts_model = TTSModel.load_model()
voice_state = tts_model.get_state_for_audio_prompt("alba")

def stream_tts(text):
    # Returns a list of audio chunks (numpy arrays)
    chunks = []
    for chunk in tts_model.generate_audio_stream(voice_state, text):
        chunks.append(chunk)
    return chunks
With this, each chunk can be played as soon as it arrives, which greatly reduces latency. Note, however, that stream_tts as written collects every chunk into a list before returning, so the caller only gets audio once the whole sentence is generated; for true streaming, each chunk has to be handed over the moment it is produced.
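A minimal sketch of that incremental pattern, with a hypothetical fake_audio_stream generator standing in for tts_model.generate_audio_stream (raw bytes stand in for the numpy chunks):

```python
def fake_audio_stream(text):
    # Stand-in for tts_model.generate_audio_stream(voice_state, text):
    # yields small audio blocks one at a time instead of all at once.
    for _ in range(4):
        yield b"\x00\x00" * 1024  # one block of silent 16-bit samples

def stream_tts_incremental(text, on_chunk):
    # Hand each chunk to the caller the moment it exists,
    # instead of collecting the whole sentence into a list first.
    count = 0
    for chunk in fake_audio_stream(text):
        on_chunk(chunk)  # e.g. queue it for immediate playback
        count += 1
    return count

received = []
n = stream_tts_incremental("hello", received.append)
```

With a callback (or a generator consumed chunk by chunk on the host side), playback of chunk 1 can start while chunk 2 is still being synthesized.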
Private Sub Button1_Click
  StreamSpeak(TextArea1.Text)
End Sub

Private Sub StreamSpeak(Text As String)
  Dim PyStream As PyWrapper = Py.ImportModule("python_streaming")
  Dim Result As PyWrapper = PyStream.Run("stream_tts").Arg(Text)
  ' Result = list of numpy chunks
  Dim chunks As List = Result.ToList
  For Each chunk As PyWrapper In chunks
    PlayChunk(chunk)
  Next
End Sub

Private Sub PlayChunk(chunk As PyWrapper)
  ' Convert numpy chunk → WAV bytes.
  ' IO, ScipyWavfile, Winsound and TTSModel are assumed to have been set up
  ' once elsewhere, e.g. Py.ImportModule("io"), Py.ImportModule("scipy.io.wavfile"),
  ' Py.ImportModule("winsound"), and the wrapper holding the loaded tts_model.
  Dim buffer As PyWrapper = IO.Run("BytesIO")
  ScipyWavfile.Run("write").Arg(buffer).Arg(TTSModel.GetField("sample_rate")).Arg(chunk)
  Winsound.Run("PlaySound").Arg(buffer.Run("getvalue")).Arg(Winsound.GetField("SND_MEMORY"))
End Sub
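The conversion done in PlayChunk can also be checked on the Python side without scipy. A sketch using only the standard library's wave module, assuming 16-bit mono PCM chunks and a hypothetical 24000 Hz sample rate (real pocket_tts chunks may be float arrays that first need scaling to int16, and the actual rate comes from tts_model.sample_rate):

```python
import io
import wave

def chunk_to_wav_bytes(samples_int16: bytes, sample_rate: int = 24000) -> bytes:
    # Wrap raw 16-bit mono PCM in a WAV header so winsound
    # (or any other player) can play it straight from memory.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(samples_int16)
    return buffer.getvalue()

wav_bytes = chunk_to_wav_bytes(b"\x00\x00" * 480)
```

The returned bytes start with a standard 44-byte RIFF/WAVE header followed by the raw samples, which is exactly what winsound.PlaySound expects with the SND_MEMORY flag.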
With this approach, the first chunk arrives after roughly 200–400 ms: the sound starts almost immediately, and the rest of the phrase keeps generating during playback. The perceived latency is very low.
Untested code.