B4A Library SpeechToText - Continuous Offline Voice Recognition

Biswajit · Oct 5, 2021

This is a wrapper for Acephei VOSK. With it, you can add continuous offline speech recognition to your application.

NOTE:

As it works offline, the app must be compiled with the voice model. This will increase the app size by 30-40 MB.
The accuracy depends on the voice model. You can train your own voice model. For more details check the models download link below.
Remember to add the RECORD_AUDIO permission.

How to use:

Download the required voice model from here.
Change the file name to a simple one like "model.zip".
Copy it to the Files folder of your project.
Now to use that model check the attached example.
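
As a rough illustration of the flow (the attached example remains the reference), here is a minimal sketch that uses only the methods and events listed in the API reference in this post. It assumes that the model has already been extracted to a folder named "model" under File.DirInternal and that the RECORD_AUDIO permission has been granted; the variable name STT and the folder name are placeholders.

```b4x
Sub Globals
    Private STT As SpeechToText
End Sub

Sub Activity_Create(FirstTime As Boolean)
    ' Assumption: the model was already extracted to DirInternal/model
    STT.Initialize("STT", File.DirInternal & "/model")
    ' A blank string means no predefined words/phrases
    STT.prepareMicrophone("")
End Sub

' Raised when prepareMicrophone succeeds
Sub STT_ReadyToListen
    ' -1 = listen without a timeout
    If STT.startListening(-1) = False Then Log("Start failed")
End Sub

Sub STT_PartialResult (text As String)
    Log("Partial: " & text)
End Sub

Sub STT_FinalResult (text As String)
    Log("Final: " & text)
End Sub

Sub STT_Error (message As String)
    Log("Error: " & message)
End Sub

Sub Activity_Pause (UserClosed As Boolean)
    ' Release the recognizer and recorder when the activity closes
    If UserClosed Then STT.shutdown
End Sub
```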

SpeechToText

Author: @Biswajit
Version: 1.5

SpeechToText
- Events:
  - Error (message As String)
  - FinalResult (text As String)
  - MicrophoneBuffer (buffer() As Byte)
  - PartialResult (text As String)
  - Paused (paused As Boolean)
  - ReadyToListen
  - ReadyToListenEx new
  - ReadyToRead
  - Restarted
  - Result (text As String)
- Fields:
  - sampleRate As Int
    Default 16000
- Functions:
  - cancel As Boolean
    Cancel microphone recognition. Do not post any new events, simply cancel processing.
    Does nothing if recognition is not active.
    Return type: @return:true if recognition was actually stopped
  - FeedExternalBuffer (ExBuffer As Byte()) new
    For recognizing the external audio buffer, feed the buffer here.
    ExBuffer: The external audio byte buffer.
  - Initialize (eventName As String, modelPath As String)
    Initialize the object.
    eventName: The event name prefix.
    modelPath: The model folder path.
  - pause (pause As Boolean)
    Pause microphone recognition.
    pause: Pass true to pause and false to continue.
  - prepareAudioFile (audioPath As String, predefinedWords As String)
    Prepare the audio file for recognition. On success Eventname_ReadyToRead event will be raised.
    Call startReading to start reading the file.
    audioPath: Audio file path.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - prepareListenerEx (predefinedWords As String) new
    Prepare the listener for external audio buffer. On success Eventname_ReadyToListenEx event will be raised.
    Call startListeningEx to start listening.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - prepareMicrophone (predefinedWords As String)
    Prepare the microphone for listening. On success Eventname_ReadyToListen event will be raised.
    Call startListening to start listening.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - reset
    Resets microphone recognizer in a thread, starts microphone recognition over again
  - shutdown
    Shutdown the microphone recognizer and release the recorder.
    Call this on activity or service closing event.
  - startListening (timeout As Int) As Boolean
    Starts microphone recognition. After specified timeout listening stops and the
    endOfSpeech signals about that. Does nothing if recognition is active.
    timeout: timeout in milliseconds to listen. -1 = infinite;
    Return type: @return:true if recognition was actually started
  - startListeningEx As Boolean new
    Starts external audio buffer recognition.
    Return type: @return:true if recognition was actually started
  - startReading (timeout As Int) As Boolean
    Starts file recognition. After specified timeout listening stops and the
    endOfSpeech signals about that. Does nothing if recognition is active.
    timeout: timeout in milliseconds to listen. -1 = infinite;
    Return type: @return:true if recognition was actually started
  - stop As Boolean
    Stops microphone/file recognition. Listener should receive final result if there is
    any. Does nothing if recognition is not active.
    Call this on activity or service closing event.
    Return type: @return:true if recognition was actually stopped

Downloads:

Update:

Version 1.1:
1. Added audio file to text functionality. (For now only WAV format is supported)
2. Added predefined word/phrase detection functionality.
3. Merged startListening and startListening2 together. Pass -1 for continuous recognition.
Version 1.2:
1. Added MicrophoneBuffer event where you will receive the microphone audio buffer while using voice recognition.
Version 1.3:
1. Added method to change the sampling rate.
Version 1.4:
1. Fixed the app crashing when shutdown is called without starting the recognizer.
Version 1.5:
1. Added option to feed external audio buffer. Instead of using the internal audio recorder you can feed external audio buffer from another audio source.
  (Check the latest example project)
2. Updated VOSK and JNA library. (Please delete the old dependencies before copying the new ones.)

If you like my work, please donate. Your donations will encourage me to add more features in the future.

Biswajit · Sep 27, 2023

Jmu5667 said:
So, an update. I have a class that extracts the model. The async methods don't function correctly, but the non-async methods do. I believe I have solved the issue.

Ok. So that was the issue with the archiver library.

Biswajit · Sep 27, 2023

Ricks Film Restoration said:
If you have a "new example project" kindly make it available to the B4A community (especially me)

@Ricks Film Restoration please check the new update.

Ricks Film Restoration · Sep 27, 2023

Great, I will! Thanks! If it works you can expect another donation.

DonManfred · Sep 28, 2023

Jmu5667 said:
The async methods don't function correctly

Are you talking about the Archiver library? The async methods DO work fine.
You need to use

https://www.b4x.com/android/help/archiver.html#archiver_asyncunzip

instead of

B4A - Archiver

unzip is a synchronous method and returns the number of files unzipped. No event is raised when it finishes.

asyncunzip will raise an event...
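
A minimal sketch of waiting for that event before touching the model. It assumes the Archiver signature AsyncUnZip(ZipDir, ZipFilename, DestDir, EventName) and an UnZipDone(CompletedWithoutError As Boolean, NbOfFiles As Int) event, as described on the linked help page; the file name "model.zip" and the event prefix "ar" are placeholders.

```b4x
Sub ExtractModel
    Dim ar As Archiver
    ' Copy the zip out of the assets into a regular folder first
    File.Copy(File.DirAssets, "model.zip", File.DirInternal, "model.zip")
    ' Returns immediately; the work continues in the background
    ar.AsyncUnZip(File.DirInternal, "model.zip", File.DirInternal, "ar")
    ' Execution of this sub resumes here only when the event is raised
    Wait For ar_UnZipDone (CompletedWithoutError As Boolean, NbOfFiles As Int)
    If CompletedWithoutError Then
        Log("Unzipped " & NbOfFiles & " files")
        ' Only at this point is the model folder complete
    Else
        Log("Unzip failed")
    End If
End Sub
```

Anything that needs the extracted model should run after the Wait For line, not directly after the AsyncUnZip call.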

Ricks Film Restoration · Sep 29, 2023

DonManfred said:
Are you talking about the Archiver library? The async methods DO work fine.
You need to use

https://www.b4x.com/android/help/archiver.html#archiver_asyncunzip
instead of

B4A - Archiver

unzip is a synchronous method and returns the number of files unzipped. No event is raised when it finishes.

asyncunzip will raise an event...

Why not just use the regular archiver library without the resumable sub stuff? With archiver the program will simply not continue until the unzipping is done, i.e. exactly what is needed in this speech recognition project. Or am I wrong?

Ricks Film Restoration · Oct 26, 2023

Biswajit said:
@Ricks Film Restoration please check the new update.

Hi Biswajit. Sorry for the delayed reply. Your new version of the STT speech recognizer works fine! Thank you very much. I will now make my second donation, as promised.

drgottjr · Oct 27, 2023

there is an updated library? i donated to you. don't the rest of us get to see it?

Ricks Film Restoration · Oct 27, 2023

Just use the new version that Biswajit uploaded to this thread! See #102. Nothing has been withheld from the B4A community.

drgottjr · Oct 27, 2023

i don't know what you see when you look at post #102 (or any around it), but there is nothing there but a message to you.

Ricks Film Restoration · Oct 27, 2023

drgottjr said:
i don't know what you see when you look at post #102 (or any around it), but there is nothing there but a message to you.

No it is a message stating that the latest version (1.5) is included in this thread. Just go to #1 and click on 1 to 4 under "Downloads".

Biswajit · Oct 27, 2023

drgottjr said:
i don't know what you see when you look at post #102 (or any around it), but there is nothing there but a message to you.

Check the post #102 date and the #1 last edited date. Posting an update to the first post is the correct way. Else new users have to search for the update here and there. You can subscribe to B4A library update thread so that you can get a notification when someone posts an update.

Ricks Film Restoration · Oct 27, 2023

I am doing more tests with this STT. Sadly it crashes with the Dutch vosk-model-small-nl-0.22 when pressing the Text-to-speech button and B4A refuses to assemble the app with the large (1.8 GB) vosk-model-en-us-0.22 model. I've tried it many times and deleted the old version each time before re-assembling the app.

Ricks Film Restoration · Oct 27, 2023

PS: It also doesn't work with vosk-model-en-us-daanzu-20200905. In all cases it does unzip, but then I get an error log message: "java.io.IOException failed to create a model". (Note: The app now does install the large (1.8 GB) vosk-model-en-us-0.22 model when I don't use the B4A bridge (which keeps connecting and disconnecting), but the error still occurs.)

Ricks Film Restoration · Oct 28, 2023

Problem is solved: Immediately under Activity_Create I added:
model_zip_name = "vosk-model-en-us-0.22-lgraph.zip" (or any another model name)
model_file_name = model_zip_name.Replace(".zip","")
and changed ar.asyncunzip(file....) to ar.unzip(file....) because asyncunzip does not wait for the unzipping to complete!
The Dutch model version vosk-model-nl-spraakherkenning-0.6 is truly excellent!

Ricks Film Restoration · Oct 30, 2023

There are three more issues with this STT I'd like to report:
1. If I don't use the speech recognizer but leave it on in a noisy room (e.g. with my TV on) for about half an hour, the STT does not respond immediately when I start using it again. It takes dozens of spoken words before it catches up and works properly and at full speed again. The noise in the room seems to fill a buffer continuously, which slows the STT's responsiveness. Any idea how this can be fixed?
2. The largest and most elaborate US English model, vosk-model-en-us-0.22 (1.8 GB), can be neither installed nor unzipped. I guess it is too big (?)
3. All models occupy three times as much memory, probably because these models reside in File.dirAssets, are copied to File.dirInternal and then are unzipped.
The only thing I can do is delete the model from File.dirInternal after it has been successfully unzipped and installed.
I think the best way to minimize memory usage is to unzip a model beforehand on a computer, upload the entire folder structure to a website and then download all the files and folders directly to File.dirInternal, thus avoiding the unzip routine inside the app.
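
The cleanup step mentioned above (deleting the copied zip once it has been extracted) could look roughly like this sketch. It assumes Archiver's synchronous UnZip(ZipDir, ZipFilename, DestDir, EventName) returns the number of files unzipped, as described earlier in this thread; the sub name InstallModel and the file name "model.zip" are placeholders.

```b4x
' Returns True if the model zip was extracted and the copied zip removed
Sub InstallModel As Boolean
    Dim ar As Archiver
    Dim zipName As String = "model.zip"
    ' Copy from the read-only assets to internal storage if not already there
    If File.Exists(File.DirInternal, zipName) = False Then
        File.Copy(File.DirAssets, zipName, File.DirInternal, zipName)
    End If
    ' Synchronous: the next line runs only after unzipping has finished
    Dim count As Int = ar.UnZip(File.DirInternal, zipName, File.DirInternal, "")
    If count > 0 Then
        ' The copied zip is no longer needed; the copy in the assets remains
        File.Delete(File.DirInternal, zipName)
        Return True
    End If
    Return False
End Sub
```

This removes one of the three copies; the zip inside the APK assets cannot be deleted at runtime.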

Biswajit · Oct 31, 2023

Ricks Film Restoration said:
1. If I don't use the speech recognizer, but leave it on in a noisy room

I think you are not supposed to use this library by leaving it on in a noisy room. It is for converting voice to text continuously. If you use it to listen for a wake-up word like "hey google", it might show unexpected behaviour.

Ricks Film Restoration said:
I think the best way to minimize memory usage is to unzip a model beforehand on a computer, upload the entire folder structure to a website

It's up to the developer how to optimise the app to use minimal storage.

Ricks Film Restoration · Oct 31, 2023

Biswajit said:
I think you are not supposed to use this library by leaving it on in a noisy room. It is for converting voice to text continuously. If you use it to listen for a wake-up word like "hey google", it might show unexpected behaviour.

It's up to the developer how to optimise the app to use minimal storage.

You don't seem to know what causes this problem?
I am indeed using it for an elaborate personal assistant which just listens all the time for questions and commands. Also I might add a hotword to make it sleep and wake up again. For that purpose I have inserted the STT in a service, such that my assistant responds regardless what app the user is using. It works great, apart from the problem I mentioned. Probably I will make a function that destroys the service every ca. 10 minutes (after the user has not spoken for about a minute, assuming that he/she won't need it for the next couple of seconds) and then automatically restarts it again in order to erase caches etc. What do you think?

Biswajit · Oct 31, 2023

Ricks Film Restoration said:
You don't seem to know what causes this problem?
I am indeed using it for an elaborate personal assistant which just listens all the time for questions and commands. Also I might add a hotword to make it sleep and wake up again. For that purpose I have inserted the STT in a service, such that my assistant responds regardless what app the user is using. It works great, apart from the problem I mentioned. Probably I will make a function that destroys the service every ca. 10 minutes (after the user has not spoken for about a minute, assuming that he/she won't need it for the next couple of seconds) and then automatically restarts it again in order to erase caches etc. What do you think?

That's what I mentioned in my previous comment. This library is not optimized for voice assistants. If you keep it running, it will consume your battery and system resources. It also doesn't support any sleep or wake-up-on-hotword functionality. You can add this functionality to your app so that you can process something when a hotword is detected while the app is running.
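
A minimal sketch of such app-side hotword handling, using only the FinalResult event from the API list in the first post. The hotword text, the awake flag and the "STT" event prefix are assumptions for illustration; the check uses a substring match, so it does not depend on the exact format of the result text.

```b4x
Sub Process_Globals
    ' Hypothetical state: True once the hotword has been heard
    Private awake As Boolean = False
    Private Const HOTWORD As String = "hello assistant"
End Sub

Sub STT_FinalResult (text As String)
    Dim heard As String = text.ToLowerCase
    If awake = False Then
        ' Ignore everything until the hotword appears in a result
        If heard.Contains(HOTWORD) Then
            awake = True
            Log("Hotword detected, waiting for a command")
        End If
        Return
    End If
    ' The next result after the hotword is treated as the command
    Log("Command: " & heard)
    awake = False
End Sub
```

The recognizer still runs the whole time in this approach, so the battery and resource cost described above still applies.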

Ricks Film Restoration · Nov 29, 2023

Under Process Globals you should define:
Public model_folder_name As String = "model"
Private model_zip_name As String = "model.zip"
and make sure that you have renamed the chosen language model to "model.zip" (in lowercase)

PS: It might take some time before you read my reply: for some strange reason all my replies are "under moderation, awaiting approval". No idea why that is.
Is that usual behaviour in this forum as regards how new accounts are treated?

Paolo Pini · Dec 5, 2023

Hi,
this is a great library.

I have developed an app that recognises speech and responds to certain messages.
I am trying to change the speech recognition module at runtime to change the language to recognize. I succeed, but the STT engine does not restart until I restart the app.
After renaming and reloading the speech modules at runtime, I tried using the following commands to restart the engine after the speech model change:

B4X:

'test combination of:

        'STT.shutdown
        'STT.stop
        'STT.Initialize("STT", File.DirInternal & "/" & model_folder_name)
        'STT.startListening(-1)

'but  this sub is only called if I restart the application:

Sub STT_ReadyToListen
    Log("READY")
    STT.stop
    If STT.startListening(-1) Then
        Log("STT ready...")
    Else
        Log("Start failed...")
        MsgboxAsync("Start failed","")
    End If
End Sub

How can I restart the STT engine at runtime after reloading the voice module?

Thanks in advance

Paolo
