B4A Library SpeechToText - Continuous Offline Voice Recognition

Biswajit · Oct 5, 2021

This is a wrapper for Alpha Cephei VOSK. With it, you can add continuous offline speech recognition to your application.

NOTE:

Because it works offline, the app must be compiled with the voice model. This increases the app size by 30-40 MB.
The accuracy depends on the voice model. You can train your own voice model. For more details check the models download link below.
Remember to add RECORD_AUDIO permission.

How to use:

Download the required voice model from here.
Rename the file to something simple, such as "model.zip".
Copy it to the Files folder of your project.
To use the model, see the attached example.

SpeechToText

Author: @Biswajit
Version: 1.5

SpeechToText
- Events:
  - Error (message As String)
  - FinalResult (text As String)
  - MicrophoneBuffer (buffer() As Byte)
  - PartialResult (text As String)
  - Paused (paused As Boolean)
  - ReadyToListen
  - ReadyToListenEx new
  - ReadyToRead
  - Restarted
  - Result (text As String)
- Fields:
  - sampleRate As Int
    Default 16000
- Functions:
  - cancel As Boolean
    Cancel microphone recognition. Do not post any new events, simply cancel processing.
    Does nothing if recognition is not active.
    Returns: true if recognition was actually stopped
  - FeedExternalBuffer (ExBuffer As Byte()) new
    For recognizing the external audio buffer, feed the buffer here.
    ExBuffer: The external audio byte buffer.
  - Initialize (eventName As String, modelPath As String)
    Initialize the object.
    eventName: The event name prefix.
    modelPath: The model folder path.
  - pause (pause As Boolean)
    Pause microphone recognition.
    pause: Pass true to pause and false to continue.
  - prepareAudioFile (audioPath As String, predefinedWords As String)
    Prepare the audio file for recognition. On success Eventname_ReadyToRead event will be raised.
    Call startReading to start reading the file.
    audioPath: Audio file path.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - prepareListenerEx (predefinedWords As String) new
    Prepare the listener for external audio buffer. On success Eventname_ReadyToListenEx event will be raised.
    Call startListeningEx to start listening.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - prepareMicrophone (predefinedWords As String)
    Prepare the microphone for listening. On success Eventname_ReadyToListen event will be raised.
    Call startListening to start listening.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - reset
    Resets microphone recognizer in a thread, starts microphone recognition over again
  - shutdown
    Shutdown the microphone recognizer and release the recorder.
    Call this on activity or service closing event.
  - startListening (timeout As Int) As Boolean
    Starts microphone recognition. After the specified timeout, listening stops and
    endOfSpeech is signalled. Does nothing if recognition is already active.
    timeout: Timeout in milliseconds to listen. -1 = infinite.
    Returns: true if recognition was actually started
  - startListeningEx As Boolean new
    Starts external audio buffer recognition.
    Returns: true if recognition was actually started
  - startReading (timeout As Int) As Boolean
    Starts file recognition. After the specified timeout, listening stops and
    endOfSpeech is signalled. Does nothing if recognition is already active.
    timeout: Timeout in milliseconds to listen. -1 = infinite.
    Returns: true if recognition was actually started
  - stop As Boolean
    Stops microphone/file recognition. Listener should receive final result if there is
    any. Does nothing if recognition is not active.
    Call this on activity or service closing event.
    Returns: true if recognition was actually stopped
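
A minimal sketch of the microphone flow described in the listing above, not the attached example itself. It assumes the object type is named SpeechToText (as in the listing), that the model has already been unzipped to a folder called "model" under File.DirInternal (a hypothetical location), and that the RECORD_AUDIO permission has already been granted.

```b4x
Sub Globals
    Private STT As SpeechToText   'type name taken from the listing above
End Sub

Sub Activity_Create(FirstTime As Boolean)
    'Assumption: the model was unzipped to this folder beforehand.
    Dim modelPath As String = File.Combine(File.DirInternal, "model")
    STT.Initialize("STT", modelPath)
    STT.prepareMicrophone("")     'blank = no predefined words
End Sub

'Raised once the microphone has been prepared.
Sub STT_ReadyToListen
    If STT.startListening(-1) = False Then   '-1 = no timeout
        Log("Failed to start")
    End If
End Sub

Sub STT_PartialResult (text As String)
    Log("Partial: " & text)
End Sub

Sub STT_FinalResult (text As String)
    Log("Final: " & text)
End Sub

Sub STT_Error (message As String)
    Log("Error: " & message)
End Sub

Sub Activity_Pause (UserClosed As Boolean)
    'The listing says to call shutdown when the activity closes.
    If UserClosed Then STT.shutdown
End Sub
```

The exact content of the text passed to the result events is not specified in the listing, so the sketch only logs it.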

Downloads:

Update:

Version 1.1:
1. Added audio file to text functionality. (For now only WAV format is supported)
2. Added predefined word/phrase detection functionality.
3. Merged startListening and startListening2 together. Pass -1 for continuous recognition.
Version 1.2:
1. Added MicrophoneBuffer event where you will receive the microphone audio buffer while using voice recognition.
Version 1.3:
1. Added method to change the sampling rate.
Version 1.4:
1. Fixed the app crash when calling shutdown without starting the recognizer.
Version 1.5:
1. Added option to feed external audio buffer. Instead of using the internal audio recorder you can feed external audio buffer from another audio source.
  (Check the latest example project)
2. Updated VOSK and JNA library. (Please delete old dependencies before copying the new ones.)
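
A rough sketch of how the external buffer option from version 1.5 might be wired up, using only the functions and events listed above. The audio source here is AudioStreamer from the B4A Audio library, chosen purely for illustration. That the recognizer expects 16-bit mono PCM at the default sampleRate of 16000 is an assumption, and the sub name StartExternalRecognition is hypothetical. STT is assumed to be initialized already.

```b4x
Sub Process_Globals
    Private Streamer As AudioStreamer   'from the B4A Audio library
End Sub

Sub Globals
    Private STT As SpeechToText         'assumed to be initialized elsewhere
End Sub

'Hypothetical entry point, e.g. called from a button click.
Sub StartExternalRecognition
    STT.prepareListenerEx("")           'blank = no predefined words
End Sub

'Raised once the external listener has been prepared.
Sub STT_ReadyToListenEx
    If STT.startListeningEx Then
        'Assumption: 16000 Hz, mono, 16 bit matches what the recognizer expects.
        Streamer.Initialize("Streamer", 16000, True, 16, Streamer.VOLUME_MUSIC)
        Streamer.StartRecording
    Else
        Log("Failed to start external recognition")
    End If
End Sub

'Each recorded chunk is handed to the recognizer.
Sub Streamer_RecordBuffer (Buffer() As Byte)
    STT.FeedExternalBuffer(Buffer)
End Sub
```

The latest example project remains the authoritative reference for this feature.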

If you like my work, please donate. Your donations will encourage me to add more features in the future.

Derek Johnson · Oct 21, 2021

Biswajit said:
It should be starting and ending with [ ] instead of { }

Yes thanks! This list then becomes the only words that are accepted as far as I can tell.
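
For illustration, a predefined word list written as a JSON array (square brackets) might look like the sketch below. The words are arbitrary, the sub name is hypothetical, and STT is assumed to be an initialized SpeechToText object. The "[unk]" entry is the VOSK convention for words outside the list; whether this wrapper passes it through unchanged is an assumption.

```b4x
'Hypothetical helper: prepares the microphone with a small fixed vocabulary.
Sub PrepareWithWordList
    'In B4X a double quote inside a string literal is written as "".
    Dim words As String = "[""yes"", ""no"", ""stop"", ""[unk]""]"
    STT.prepareMicrophone(words)
End Sub
```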

Janusz Chmiel · Oct 23, 2021

Dear specialists,
I have a serious issue. I did my best to copy the files from the folders in the GitHub tree of the Czech language model, but unfortunately I cannot find the required model.conf file. I cannot simply take this file from the English model and mix it with the Czech model. I have contacted the author of the Czech model on GitHub, but what if the author does not respond? How can I generate the file with the correct values? Or is it impossible?

Or is it not a rule that every language model must contain this file?

Here is the link to the corresponding GitHub tree again.

https://github.com/rhasspy/cs_kaldi-rhasspy
I am very sad, because if the author does not provide this file and it cannot be generated, the model will be unusable, and there is no other trained model on the Internet.

Biswajit · Oct 24, 2021

I don't think the Kaldi voice model will work with VOSK. I will check.

Janusz Chmiel · Oct 24, 2021

The author of the recognition engine has provided the link for me. It may be that some files must be renamed and that model.conf has a different name. The other files are here. It will be a surprise. Thank you for your time and for your analysis.

gezueb · Oct 26, 2021

Dialog: I would like to program a dialog with questions from the device and answers from the user. The device uses the Text to Speech (TTS) library for the questions. While the device speaks, VOSK speech recognition must be paused, because otherwise the device's voice output is wrongly recognized as the user's answer. Several functions are available to disable recognition (pause with True or False, stop and cancel), but I am a bit confused about how to use them to stop and restart recognition in a timely sequence. A further problem is that the TTS library raises no event when the text has been completely spoken (queue empty). So far I have had to use Sleep. Thanks for any advice!

Biswajit · Oct 26, 2021

It's simple: just check whether TTS is still speaking. See the example below.

B4X:

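Sub Process_Globals
    Private t As Timer   'timers are declared in Process_Globals
End Sub

Sub Globals
    Private tts As TTS   'requires the TTS library
End Sub

Sub Activity_Create(FirstTime As Boolean)
    Activity.LoadLayout("Layout")
    tts.Initialize("tts")
    t.Initialize("timer",100) 'poll every 100 ms
End Sub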

Sub Button1_Click
    tts.Speak("your text",True)
    t.Enabled = True
End Sub

Sub timer_Tick
    If Not(tts.As(JavaObject).RunMethod("isSpeaking",Null)) Then
        t.Enabled = False
        'tts done now you can run speech recognition
    End If
End Sub
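
Combined with the library's pause function, the same polling idea could be sketched as follows. This is an untested variation on the example above, not code from the library's example project. It reuses tts and t from the example, assumes STT is an initialized SpeechToText object that is already listening, and the sub name Ask is hypothetical.

```b4x
'Hypothetical helper: pauses recognition, then speaks the question.
Sub Ask (question As String)
    STT.pause(True)            'pause microphone recognition while the device talks
    tts.Speak(question, True)
    t.Enabled = True           'start polling isSpeaking
End Sub

Sub timer_Tick
    If Not(tts.As(JavaObject).RunMethod("isSpeaking", Null)) Then
        t.Enabled = False
        STT.pause(False)       'continue recognition for the user's answer
    End If
End Sub

'Raised by the library when the paused state changes.
Sub STT_Paused (paused As Boolean)
    Log("Recognition paused: " & paused)
End Sub
```

Using pause rather than stop keeps the recognizer prepared, so no new prepareMicrophone call should be needed between questions; that follows from the function descriptions and has not been verified.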

gezueb · Oct 27, 2021

Thank you for the example, Biswajit. I will try to integrate this.

Biswajit · Oct 27, 2021

Thank you for the donation.

gezueb · Nov 1, 2021

I would just like to add: the example uses async methods to copy and unzip files. While this is certainly OK, it requires handling the completion of all async functions properly in a resumable sub with Wait For. I find it much safer to use the blocking versions. The loading of the model can then be coded in a normal sub, not a resumable one, which makes the code flow independent of the device's performance.

mobilemedved · Mar 16, 2022

Extracting the voice model? This is not good. How can I do without it? Can I use the unzipped "model" folder?

Janusz Chmiel · Mar 16, 2022

Dear elite programmers,
Does anybody think it would be possible to prepare a usable voice model package that supports the Czech language? The Czech voice model made by someone on GitHub did not work when I tried it. Unfortunately nobody on GitHub has helped me.
Who would have some time and the goodwill to look at this problem?
The library created by the author of this thread, which uses the voice model, is outstanding and excellent. I cannot say a bad word about this project, because if the model is compatible, it works like a charm. But I am sad that the Czech model is not compatible.
Any good advice or package would be very, very welcome.

Biswajit · Mar 30, 2022

mobilemedved said:
Extracting the voice model? This is not good. How can I do without it? Can I use the unzipped "model" folder?

If you embed the unzipped folder into your app it will increase the app size.

Biswajit · Mar 30, 2022

Janusz Chmiel said:
Dear elite programmers,
Does anybody think it would be possible to prepare a usable voice model package that supports the Czech language? The Czech voice model made by someone on GitHub did not work when I tried it. Unfortunately nobody on GitHub has helped me.
Who would have some time and the goodwill to look at this problem?
The library created by the author of this thread, which uses the voice model, is outstanding and excellent. I cannot say a bad word about this project, because if the model is compatible, it works like a charm. But I am sad that the Czech model is not compatible.
Any good advice or package would be very, very welcome.

If you need a voice model that is unavailable on the internet you have to train your own model.

Check this page, https://alphacephei.com/vosk/models
Scroll down and check "Training your own model" section.

mobilemedved · Mar 30, 2022

Sorry, all. I have been using this example for some time and I have a remark. When I press the START button:

Private Sub StartBtn_Click
    partialResultBox.Text = ""
    resultBox.Text = ""
    STT.prepareMicrophone("")
End Sub

The program gets here. OK!

Sub STT_ReadyToListen
    Log("ready")
    STT.stop
    If STT.startListening(-1) Then
        StartBtn.Enabled = False
        StopBtn.Enabled = True
        Log("Started...")
        partialResultBox.Text = "Talk to me!" 'Message I added for the user
    Else
        Log("Failed to Start...")
        MsgboxAsync("Failed to Start","")
    End If
End Sub

I see "Talk to me!" and start to speak, but the program does not pick up my speech. A few seconds pass, and only then does speech begin to be recognized. How do I know when I can talk?
Also, improperly configured microphone sensitivity can lead to unwanted background noise or a buzzing sound when using the microphone. Can somebody implement automatic adjustment of microphone sensitivity?

Biswajit · Mar 31, 2022

When the program is ready for voice recognition it will raise the ReadyToListen event.

netsistemas · May 3, 2022

How do I change the recognition language to Spanish?

Janusz Chmiel · May 4, 2022

Biswajit said:
If you need a voice model that is unavailable on the internet you have to train your own model.

Check this page, https://alphacephei.com/vosk/models
Scroll down and check "Training your own model" section.

This web page contains a small Czech language model. I will test again whether it works with your sample app.
Thank you for your help and for your time.

netsistemas · May 4, 2022

Thanks. Done.

- Uninstall the APK (to delete the model folder on the device)
- Download the Spanish voice model from https://alphacephei.com/vosk/models
- Rename the downloaded file to modelesp.zip AND rename the folder inside the zip to model
- Include the downloaded file (modelesp.zip) in the APK
- Change the global variable to: model_zip_name As String = "modelesp.zip"

petr4ppc · May 6, 2022

Dear friends

When I try this library on Android 11, everything is OK. When I try the same library and the same example on Android 4.4.2, I get:

** Activity (main) Pause, UserClosed = false **
** Activity (speechtotext) Create, isFirst = true **
speechtotext_v7 (java line: 399)
java.lang.UnsatisfiedLinkError: Unable to load library 'vosk':
dlopen failed: cannot locate symbol "rand" referenced by "libvosk.so"...
dlopen failed: cannot locate symbol "rand" referenced by "libvosk.so"...
dlopen failed: cannot locate symbol "rand" referenced by "libvosk.so"...
Native library (android-arm/libvosk.so) not found in resource path (.)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:301)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:461)
at com.sun.jna.Native.register(Native.java:1746)
at org.vosk.LibVosk.<clinit>(LibVosk.java:16)
at org.vosk.Model.<init>(Model.java:10)
at com.biswajit.vosk.SpeechToText.Initialize(SpeechToText.java:75)
at b4a.example.speechtotext._v7(speechtotext.java:399)
at b4a.example.speechtotext._activity_create(speechtotext.java:364)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:515)
at anywheresoftware.b4a.BA.raiseEvent2(BA.java:213)
at b4a.example.speechtotext.afterFirstLayout(speechtotext.java:105)
at b4a.example.speechtotext.access$000(speechtotext.java:17)
at b4a.example.speechtotext$WaitForLayout.run(speechtotext.java:83)
at android.os.Handler.handleCallback(Handler.java:733)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:136)
at android.app.ActivityThread.main(ActivityThread.java:5291)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:515)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:849)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:665)
at dalvik.system.NativeStart.main(Native Method)
Suppressed: java.lang.UnsatisfiedLinkError: dlopen failed: cannot locate symbol "rand" referenced by "libvosk.so"...
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:191)
... 22 more
Suppressed: java.lang.UnsatisfiedLinkError: dlopen failed: cannot locate symbol "rand" referenced by "libvosk.so"...
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:204)
... 22 more
Suppressed: java.lang.UnsatisfiedLinkError: dlopen failed: cannot locate symbol "rand" referenced by "libvosk.so"...
at java.lang.Runtime.loadLibrary(Runtime.java:364)
at java.lang.System.loadLibrary(System.java:555)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:218)
... 22 more
Suppressed: java.io.IOException: Native library (android-arm/libvosk.so) not found in resource path (.)
at com.sun.jna.Native.extractFromResourcePath(Native.java:1119)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:275)
... 22 more

What can I do, please? Is it possible to fix this error?

Best regards
p4ppc

Adamdam · May 28, 2022

Greetings,
Can I feed a YouTube URL as the input instead of a WAV file, to do offline speech-to-text? If yes, how?
Best regards
