This is a wrapper for Alpha Cephei VOSK. With it, you can add continuous offline speech recognition to your application.

NOTE:
  1. As it works offline, the app must be compiled with the voice model. This increases the app size by 30-40 MB.
  2. The accuracy depends on the voice model. You can train your own voice model. For more details check the models download link below.
  3. Remember to add RECORD_AUDIO permission.
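
For note 3, a minimal sketch of requesting the permission at runtime, assuming an Activity-based B4A project with the RuntimePermissions library referenced. The sub name RequestMicrophone is hypothetical and not part of this library; the manifest line in the comment is the usual B4A manifest editor syntax.

```b4x
' Manifest Editor (assumed standard B4A syntax):
' AddPermission(android.permission.RECORD_AUDIO)

' rp is assumed to be declared as "Private rp As RuntimePermissions" in Process_Globals.
' RequestMicrophone is a hypothetical helper name.
Private Sub RequestMicrophone As ResumableSub
    rp.CheckAndRequest(rp.PERMISSION_RECORD_AUDIO)
    ' In a B4XPages project the event would be B4XPage_PermissionResult instead.
    Wait For Activity_PermissionResult (Permission As String, Result As Boolean)
    Return Result
End Sub
```

It could be called as `Wait For (RequestMicrophone) Complete (Granted As Boolean)` before preparing the microphone, skipping recognition when Granted is False.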
How to use:
  1. Download the required voice model from here.
  2. Rename the file to something simple, such as "model.zip".
  3. Copy it to the Files folder of your project.
  4. To use the model, see the attached example.

SpeechToText

Author: @Biswajit
Version: 1.5
  • SpeechToText
    • Events:
      • Error (message As String)
      • FinalResult (text As String)
      • MicrophoneBuffer (buffer() As Byte)
      • PartialResult (text As String)
      • Paused (paused As Boolean)
      • ReadyToListen
      • ReadyToListenEx new
      • ReadyToRead
      • Restarted
      • Result (text As String)
    • Fields:
      • sampleRate As Int
        Default 16000
    • Functions:
      • cancel As Boolean
        Cancel microphone recognition. Do not post any new events, simply cancel processing.
        Does nothing if recognition is not active.
        Returns: True if recognition was actually stopped.
      • FeedExternalBuffer (ExBuffer As Byte()) new
        For recognizing the external audio buffer, feed the buffer here.
        ExBuffer: The external audio byte buffer.
      • Initialize (eventName As String, modelPath As String)
        Initialize the object.
        eventName: The event name prefix.
        modelPath: The model folder path.
      • pause (pause As Boolean)
        Pause microphone recognition.
        pause: Pass true to pause and false to continue.
      • prepareAudioFile (audioPath As String, predefinedWords As String)
        Prepare the audio file for recognition. On success, the Eventname_ReadyToRead event is raised.
        Call startReading to start reading the file.
        audioPath: Audio file path.
        predefinedWords: Predefined words/phrases as a JSON string. Can be blank.
      • prepareListenerEx (predefinedWords As String) new
        Prepare the listener for an external audio buffer. On success, the Eventname_ReadyToListenEx event is raised.
        Call startListeningEx to start listening.
        predefinedWords: Predefined words/phrases as a JSON string. Can be blank.
      • prepareMicrophone (predefinedWords As String)
        Prepare the microphone for listening. On success, the Eventname_ReadyToListen event is raised.
        Call startListening to start listening.
        predefinedWords: Predefined words/phrases as a JSON string. Can be blank.
      • reset
        Resets the microphone recognizer in a thread and starts microphone recognition over again.
      • shutdown
        Shutdown the microphone recognizer and release the recorder.
        Call this on activity or service closing event.
      • startListening (timeout As Int) As Boolean
        Starts microphone recognition. After the specified timeout, listening stops and
        endOfSpeech signals this. Does nothing if recognition is already active.
        timeout: Listening timeout in milliseconds. Pass -1 for no timeout.
        Returns: True if recognition was actually started.
      • startListeningEx As Boolean new
        Starts external audio buffer recognition.
        Returns: True if recognition was actually started.
      • startReading (timeout As Int) As Boolean
        Starts file recognition. After the specified timeout, recognition stops and
        endOfSpeech signals this. Does nothing if recognition is already active.
        timeout: Recognition timeout in milliseconds. Pass -1 for no timeout.
        Returns: True if recognition was actually started.
      • stop As Boolean
        Stops microphone/file recognition. The listener should receive the final result, if there is
        one. Does nothing if recognition is not active.
        Call this on the activity or service closing event.
        Returns: True if recognition was actually stopped.
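
The microphone flow described by the functions and events above can be sketched as follows. This is a minimal outline built only from the signatures listed in this post, assuming the event prefix "STT" and a model already unzipped to a folder named "model" under File.DirInternal. The sub names StartRecognition and StopRecognition are hypothetical.

```b4x
Sub Process_Globals
    Private STT As SpeechToText
End Sub

' Hypothetical entry point; call it after RECORD_AUDIO has been granted.
Sub StartRecognition
    ' Assumes the model folder is <DirInternal>/model.
    STT.Initialize("STT", File.Combine(File.DirInternal, "model"))
    STT.prepareMicrophone("")   ' blank = no predefined words
End Sub

' Raised after prepareMicrophone succeeds.
Sub STT_ReadyToListen
    ' -1 = listen without a timeout
    If STT.startListening(-1) = False Then Log("Recognition was not started")
End Sub

Sub STT_PartialResult (text As String)
    Log("Partial: " & text)
End Sub

Sub STT_FinalResult (text As String)
    Log("Final: " & text)
End Sub

Sub STT_Error (message As String)
    Log("STT error: " & message)
End Sub

' Hypothetical cleanup; call it when the activity or service closes.
Sub StopRecognition
    STT.stop
    STT.shutdown
End Sub
```

File recognition would presumably follow the same pattern with prepareAudioFile, the ReadyToRead event, and startReading.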
Downloads:
  1. Library
  2. Example
  3. Voice Model
  4. Test app
Update:
  • Version 1.1:
    1. Added audio file to text functionality. (For now only WAV format is supported)
    2. Added predefined word/phrase detection functionality.
    3. Merged startListening and startListening2 together. Pass -1 for continuous recognition.
  • Version 1.2:
    1. Added MicrophoneBuffer event where you will receive the microphone audio buffer while using voice recognition.
  • Version 1.3:
    1. Added method to change the sampling rate.
  • Version 1.4:
    1. Fixed the app crashing when shutdown was called without starting the recognizer.
  • Version 1.5:
    1. Added an option to feed an external audio buffer. Instead of using the internal audio recorder, you can feed audio from another source.
      (Check the latest example project.)
    2. Updated the VOSK and JNA libraries. (Please delete the old dependencies before copying the new ones.)
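
A rough sketch of the external buffer flow added in version 1.5, using only the functions and events listed above. STT is assumed to be an initialized SpeechToText object with the event prefix "STT". AudioSource_Chunk is a hypothetical callback standing in for whatever audio source delivers the bytes. It is also an assumption that the audio should match the recognizer's sample rate (default 16000), so check the example project for the expected format.

```b4x
' Hypothetical entry point for external-buffer recognition.
Sub StartExternalRecognition
    STT.prepareListenerEx("")   ' blank = no predefined words
End Sub

' Raised after prepareListenerEx succeeds.
Sub STT_ReadyToListenEx
    If STT.startListeningEx = False Then Log("External recognition was not started")
End Sub

' Hypothetical callback from your own audio source (not part of this library).
Sub AudioSource_Chunk (buffer() As Byte)
    STT.FeedExternalBuffer(buffer)
End Sub
```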

If you like my work, please donate. Your donations will encourage me to add more features in the future.

 

Guenter Becker

Active Member
Licensed User
I'm not sure I have understood your problem, but here is how I got my STT+TTS application to recognise and speak in many languages.

1 - I saved all the language models inside the app.

B4X:
'**************************************
Sub Process_Globals
    Private xui As XUI
    Public model_folder_name As String = "model"
    Private model_zip_name As String = "model.zip"
   
    Private rp As RuntimePermissions
    Private timer As Timer
    Dim awake As PhoneWakeState
    Dim STTList As List
   
    'STT
    Private STT As SpeechToText
   
    'TTS
    Dim TTS1 As TTS = Null
    Dim AppLang As String
    Dim Lang As String
   
    Dim FlgUsaTTS As Boolean = True
   
    Dim CfgData As Map
End Sub

Then I call this routine, which unzips the model file for the required language.

B4X:
'**************************************
Sub CreateSTT(Lingua As String)
    Log("CREATESTT")
   
    Dim STT As SpeechToText
   
    File.Copy(File.DirAssets, "model_" & Lingua & ".zip", File.DirInternal, model_zip_name)
    Dim ar As Archiver
    ar.AsyncUnZip(File.DirInternal,model_zip_name,File.DirInternal,"unzip")
   
    Log(Lingua)
   
    'unzip_UnZipDone
    Wait For unzip_UnZipDone(Completed As Boolean, Files As Int)
   
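    If Completed=True Then
        STT.Initialize("STT", File.DirInternal & "/" & model_folder_name)
        STTList.Add(STT)
        'Prepare the microphone only after the model has been unzipped and loaded
        STT.prepareMicrophone("")
    Else
        Log("Unzip failed, STT not initialized")
    End If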
   
    If FlgUsaTTS Then
        If TTS1.IsInitialized = False Then
            Log("init TTS")
            TTS1.Initialize("TTS1")
            TTS1.SetLanguage(Lingua, "")
            TTS1.Speak("START", True)
        Else
            Log("TTS already")
            TTS1.SetLanguage(Lingua, "")
            TTS1.Speak("READY", True)
        End If
    End If
End Sub

B4X:
'**************************************
Sub unzip_UnZipDone(CompletedWithoutError As Boolean, NbOfFiles As Int)
    If CompletedWithoutError Then
        ProgressDialogHide
        Log("Unzip completed")
    Else
        Log("Unzip Failed")
        MsgboxAsync("Failed","")
        Wait For MsgBox_Result (Result As Int)
    End If
    File.Delete(File.DirInternal,model_zip_name)
End Sub

'**************************************
Sub unzip_UnZipProgression(Count As Int, FileName As String)
    Log("Unzip File: "&Count)
End Sub

I hope this can be helpful.
Could you do me a favour and send me your B4A project (without the model files)? Thank you.
 

drgottjr

Expert
Licensed User
Longtime User
do me a favor and don't start with somebody else before we're finished. by the way, we're finished.
you have the wrong language model. the small german model is 45MB. the file you have
is only 360KB. go back to post #1 and find the link to "language model". scroll down to
"vosk-model-small-de-0.15". download. change name to "model.zip", then open
.zip archive and change name of root folder to "model". if you compare it to the file you have
in your project, you'll see the difference.
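
A quick sanity check along these lines can catch a wrong or incomplete model before the recognizer is initialized. This is only a sketch: the helper name ModelLooksValid is hypothetical, the 10 MB threshold is an arbitrary assumption, and it assumes model.zip has already been copied to File.DirInternal and unzipped there.

```b4x
' Hypothetical helper; not part of the SpeechToText library.
Sub ModelLooksValid As Boolean
    Dim zipName As String = "model.zip"
    If File.Exists(File.DirInternal, zipName) Then
        Dim size As Long = File.Size(File.DirInternal, zipName)
        Log("model.zip size: " & size & " bytes")
        ' Arbitrary threshold: a real small model is tens of MB, not a few hundred KB.
        If size < 10 * 1024 * 1024 Then
            Log("Archive is suspiciously small; is it really the full model?")
            Return False
        End If
    End If
    ' After unzipping, the root folder is expected to be named "model".
    Return File.IsDirectory(File.DirInternal, "model")
End Sub
```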
 


Paolo Pini

Member
Licensed User
Longtime User
Could you do me a favour and send me your B4A project (without the model files)? Thank you.
I apologise, but the application was developed for a client of mine and I cannot send it to you. However, I agree with drgottjr that you should complete your reasoning before trying anything else.
 

Guenter Becker

Active Member
Licensed User
do me a favor and don't start with somebody else before we're finished. by the way, we're finished.
you have the wrong language model. the small german model is 45MB. the file you have
is only 360KB. go back to post #1 and find the link to "language model". scroll down to
"vosk-model-small-de-0.15". download. change name to "model.zip", then open
.zip archive and change name of root folder to "model". if you compare it to the file you have
in your project, you'll see the difference.
Thank you very much,
I replaced the VOSK model and changed the two global values in Main to vosk-model-small-de-0.15, and it works. Seeing how easy it was, I feel that someone, in this case me, was being stupid.
The only thing left for me is to translate number words into actual numbers. I tried it, but spoken numbers often come back as unexpected words: sometimes the result is fine, but many times it is not, even when I speak slowly and pronounce clearly.
So we may close the case.
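
For the number-word problem, one possible approach is to restrict recognition to digit words through predefinedWords and then map each word to a digit. The sketch below rests on several assumptions: that the wrapper passes predefinedWords to VOSK as a grammar (a JSON array of phrases, with "[unk]" for anything else), that the German model returns the digit words spelled as shown, and that the result text is a plain space-separated string. STT is assumed to be an initialized SpeechToText object. The helper names are hypothetical, and compound numbers such as "einundzwanzig" are not handled.

```b4x
' Hypothetical helper: limit recognition to German digit words.
Sub PrepareDigitsOnly
    STT.prepareMicrophone($"["null eins zwei drei vier fünf sechs sieben acht neun", "[unk]"]"$)
End Sub

' Hypothetical helper: converts space-separated digit words to a digit string.
' Unknown words are skipped.
Sub NumberWordsToDigits (text As String) As String
    Dim words As Map = CreateMap("null": "0", "eins": "1", "zwei": "2", "drei": "3", _
        "vier": "4", "fünf": "5", "sechs": "6", "sieben": "7", "acht": "8", "neun": "9")
    Dim sb As StringBuilder
    sb.Initialize
    For Each w As String In Regex.Split("\s+", text.Trim.ToLowerCase)
        If words.ContainsKey(w) Then sb.Append(words.Get(w))
    Next
    Return sb.ToString
End Sub
```

With this mapping, NumberWordsToDigits("eins zwei drei") returns "123".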
 