This is a wrapper for Alpha Cephei VOSK. With it, you can add continuous offline speech recognition to your application.

NOTE:
  1. Because it works offline, the app must be compiled with the voice model bundled. This increases the app size by 30-40 MB.
  2. The accuracy depends on the voice model. You can train your own voice model. For more details, check the models download link below.
  3. Remember to add the RECORD_AUDIO permission.
How to use:
  1. Download the required voice model (see Voice Model under Downloads).
  2. Rename the file to something simple, such as "model.zip".
  3. Copy it to the Files folder of your project.
  4. To use the model, see the attached example.
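
The microphone flow can be sketched as follows, using only the functions and events listed in the reference below. This is a rough sketch, not the attached example: it assumes the model has already been unzipped into a folder named "model" under File.DirInternal and that the RECORD_AUDIO permission has been granted. The Sub name StartRecognition is hypothetical.

```b4x
Sub Process_Globals
    Private STT As SpeechToText
    Private model_folder_name As String = "model" ' assumed folder name
End Sub

' Hypothetical helper: call it once the model folder exists and
' RECORD_AUDIO has been granted.
Sub StartRecognition
    STT.Initialize("STT", File.DirInternal & "/" & model_folder_name)
    ' Empty string = no predefined words.
    STT.prepareMicrophone("")
End Sub

' Raised after prepareMicrophone succeeds.
Sub STT_ReadyToListen
    ' -1 = listen without a timeout.
    If STT.startListening(-1) = False Then
        Log("Recognition was not started")
    End If
End Sub

Sub STT_PartialResult (text As String)
    Log("Partial: " & text)
End Sub

Sub STT_FinalResult (text As String)
    Log("Final: " & text)
End Sub

Sub STT_Error (message As String)
    Log("Error: " & message)
End Sub
```

Call STT.shutdown when the activity or service closes, as the function reference recommends.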

SpeechToText

Author:
@Biswajit
Version: 1.5
  • SpeechToText
    • Events:
      • Error (message As String)
      • FinalResult (text As String)
      • MicrophoneBuffer (buffer() As Byte)
      • PartialResult (text As String)
      • Paused (paused As Boolean)
      • ReadyToListen
      • ReadyToListenEx (new)
      • ReadyToRead
      • Restarted
      • Result (text As String)
    • Fields:
      • sampleRate As Int
        Default 16000
    • Functions:
      • cancel As Boolean
        Cancels microphone recognition. No new events are posted; processing is simply cancelled.
        Does nothing if recognition is not active.
        Returns: true if recognition was actually stopped.
      • FeedExternalBuffer (ExBuffer As Byte()) (new)
        For recognizing the external audio buffer, feed the buffer here.
        ExBuffer: The external audio byte buffer.
      • Initialize (eventName As String, modelPath As String)
        Initialize the object.
        eventName: The event name prefix.
        modelPath: The model folder path.
      • pause (pause As Boolean)
        Pause microphone recognition.
        pause: Pass true to pause and false to continue.
      • prepareAudioFile (audioPath As String, predefinedWords As String)
        Prepare the audio file for recognition. On success Eventname_ReadyToRead event will be raised.
        Call startReading to start reading the file.
        audioPath: Audio file path.
        predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
      • prepareListenerEx (predefinedWords As String) (new)
        Prepare the listener for external audio buffer. On success Eventname_ReadyToListenEx event will be raised.
        Call startListeningEx to start listening.
        predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
      • prepareMicrophone (predefinedWords As String)
        Prepare the microphone for listening. On success Eventname_ReadyToListen event will be raised.
        Call startListening to start listening.
        predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
      • reset
        Resets the microphone recognizer in a thread and starts microphone recognition over again.
      • shutdown
        Shutdown the microphone recognizer and release the recorder.
        Call this on activity or service closing event.
      • startListening (timeout As Int) As Boolean
        Starts microphone recognition. After the specified timeout, listening stops and
        endOfSpeech is signalled. Does nothing if recognition is already active.
        timeout: Time to listen, in milliseconds. Pass -1 for no timeout.
        Returns: true if recognition was actually started.
      • startListeningEx As Boolean (new)
        Starts external audio buffer recognition.
        Returns: true if recognition was actually started.
      • startReading (timeout As Int) As Boolean
        Starts file recognition. After the specified timeout, reading stops and
        endOfSpeech is signalled. Does nothing if recognition is already active.
        timeout: Time to read, in milliseconds. Pass -1 for no timeout.
        Returns: true if recognition was actually started.
      • stop As Boolean
        Stops microphone/file recognition. The listener should receive the final result if there is
        any. Does nothing if recognition is not active.
        Call this on the activity or service closing event.
        Returns: true if recognition was actually stopped.
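
As a rough illustration of the file functions above, the sketch below prepares a WAV file and starts reading once Eventname_ReadyToRead fires. It assumes an already initialized SpeechToText object named STT with the event prefix "STT". The file name "test.wav" is hypothetical, and the predefinedWords value assumes the JSON string is an array of phrases; the reference does not spell out the exact format or which result events are raised for files, so treat both as assumptions.

```b4x
' Hypothetical helper; STT must already be initialized.
Sub RecognizeFile
    ' Hypothetical file; the update notes say only WAV is supported.
    Dim audioPath As String = File.Combine(File.DirInternal, "test.wav")
    ' Assumed format: a JSON array of phrases. Pass "" to leave it blank.
    Dim words As String = "[""yes"", ""no""]"
    STT.prepareAudioFile(audioPath, words)
End Sub

' Raised after prepareAudioFile succeeds.
Sub STT_ReadyToRead
    ' -1 = no timeout.
    If STT.startReading(-1) = False Then
        Log("File recognition was not started")
    End If
End Sub

' Assumption: Result is raised for recognized text; FinalResult may fire as well.
Sub STT_Result (text As String)
    Log("Result: " & text)
End Sub
```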
Downloads:
  1. Library
  2. Example
  3. Voice Model
  4. Test app
Update:
  • Version 1.1:
    1. Added audio file to text functionality. (For now only WAV format is supported)
    2. Added predefined word/phrase detection functionality.
    3. Merged startListening and startListening2 together. Pass -1 for continuous recognition.
  • Version 1.2:
    1. Added MicrophoneBuffer event where you will receive the microphone audio buffer while using voice recognition.
  • Version 1.3:
    1. Added method to change the sampling rate.
  • Version 1.4:
    1. Fixed the app crashing when shutdown was called without starting the recognizer.
  • Version 1.5:
    1. Added option to feed external audio buffer. Instead of using the internal audio recorder you can feed external audio buffer from another audio source.
      (Check the latest example project)
    2. Updated VOSK and JNA library. (Please delete the old dependencies before copying the new ones.)
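
The external buffer flow added in version 1.5 might look like the sketch below. It is not taken from the example project: it assumes an already initialized SpeechToText object named STT with the event prefix "STT", and OnAudioChunk is a hypothetical callback standing in for whatever audio source delivers the bytes. The buffer format presumably has to match the sampleRate field (default 16000), but that is an assumption.

```b4x
' Hypothetical helper; STT must already be initialized.
Sub StartExternalRecognition
    ' Empty string = no predefined words.
    STT.prepareListenerEx("")
End Sub

' Raised after prepareListenerEx succeeds.
Sub STT_ReadyToListenEx
    If STT.startListeningEx = False Then
        Log("External buffer recognition was not started")
    End If
End Sub

' Hypothetical callback: call it from your own audio source
' each time a new chunk of audio bytes is available.
Sub OnAudioChunk (chunk() As Byte)
    STT.FeedExternalBuffer(chunk)
End Sub
```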

If you like my work, please donate. Your donations will encourage me to add more features in the future.

 

Biswajit

Active Member
Licensed User
Longtime User
So, an update. I have a class that extracts the model. The async methods don't function correctly, but the non-async method does. I believe I have solved the issue.
Ok. So that was the issue with the archiver library.
 

DonManfred

Expert
Licensed User
Longtime User
Are you talking about the Archiver library? The async methods DO work fine.
You need to use
instead of

unzip is a synchronous method and returns the number of files unzipped. No event is raised when it finishes.

asyncunzip will raise an event...
Why not just use the regular Archiver library without the resumable sub stuff? With Archiver the program will simply not continue until the unzipping is done, i.e. exactly what is needed in this speech recognition project. Or am I wrong?
 

drgottjr

Expert
Licensed User
Longtime User
there is an updated library? i donated to you. don't the rest of us get to see it?
 

drgottjr

Expert
Licensed User
Longtime User
i don't know what you see when you look at post #102 (or any around it), but there is nothing there but a message to you.
 

Attachments

  • 102.png

Biswajit

Active Member
Licensed User
Longtime User
i don't know what you see when you look at post #102 (or any around it), but there is nothing there but a message to you.
Check the date of post #102 and the last edited date of post #1. Posting an update to the first post is the correct way. Otherwise new users would have to search here and there for the update. You can subscribe to the B4A library updates thread so that you get a notification when someone posts an update.
 
There are three more issues with this STT I'd like to report:
1. If I don't use the speech recognizer but leave it on in a noisy room (e.g. with my TV on) for about half an hour, the STT does not respond immediately when I start using it again. It takes dozens of spoken words before it catches up and works properly at full speed again. The noise in the room seems to keep filling a buffer, which slows the STT's responsiveness. Any idea how this can be fixed?
2. The largest and most elaborate US English model, vosk-model-en-us-0.22 (1.8 GB), can be neither installed nor unzipped. I guess it is too big (?)
3. All models occupy triple the amount of memory, probably because they reside in File.dirAssets, are copied to File.dirInternal, and are then unzipped.
The only thing I can do is delete the zipped model from File.dirInternal after it has been successfully unzipped and installed.
I think the best way to minimize memory usage is to unzip a model beforehand on a computer, upload the entire folder structure to a website, and then download all the files and folders directly to File.dirInternal, thus avoiding the unzip routine inside the app.
 

Biswajit

Active Member
Licensed User
Longtime User
1. If I don't use the speech recognizer, but leave it on in a noisy room
I think you are not supposed to use this library by leaving it on in a noisy room. It is meant for converting voice to text continuously. If you use it to listen for a wake-up word like "hey google", it might show unexpected behaviour.


I think the best way to minimize memory usage is to unzip a model beforehand on a computer, upload the entire folder structure to a website
It's up to the developer how to optimise the app to use minimal storage.
 
I think you are not supposed to use this library by leaving it on in a noisy room. It is meant for converting voice to text continuously. If you use it to listen for a wake-up word like "hey google", it might show unexpected behaviour.

It's up to the developer how to optimise the app to use minimal storage.
You don't seem to know what causes this problem?
I am indeed using it for an elaborate personal assistant that listens all the time for questions and commands. I might also add a hotword to make it sleep and wake up again. For that purpose I have put the STT in a service, so that my assistant responds regardless of which app the user is using. It works great, apart from the problem I mentioned. I will probably write a function that destroys the service roughly every 10 minutes (once the user has not spoken for about a minute, assuming that he/she won't need it for the next couple of seconds) and then automatically restarts it in order to clear caches etc. What do you think?
 

Biswajit

Active Member
Licensed User
Longtime User
You don't seem to know what causes this problem?
I am indeed using it for an elaborate personal assistant that listens all the time for questions and commands. I might also add a hotword to make it sleep and wake up again. For that purpose I have put the STT in a service, so that my assistant responds regardless of which app the user is using. It works great, apart from the problem I mentioned. I will probably write a function that destroys the service roughly every 10 minutes (once the user has not spoken for about a minute, assuming that he/she won't need it for the next couple of seconds) and then automatically restarts it in order to clear caches etc. What do you think?
That's what I mentioned in my previous comment. This library is not optimized for voice assistants. If you keep it running, it will consume your battery and system resources. It also doesn't support any sleep or wake-up-on-hotword functionality. You can add that functionality to your own app so that it processes something when a hotword is detected while the app is running.
 
Under Process Globals you should define:
Public model_folder_name As String = "model"
Private model_zip_name As String = "model.zip"
and make sure that you have renamed the chosen language model to "model.zip" (in lowercase)

PS: It might take some time before you read my reply: for some strange reason all my replies are "under moderation, awaiting approval". No idea why that is.
Is that usual behaviour in this forum as regards how new accounts are treated?
 

Paolo Pini

Member
Licensed User
Longtime User
Hi,
this is a great library.

I have developed an app that recognises speech and responds to certain messages.
I am trying to change the speech recognition model at runtime in order to change the language being recognized. The change itself succeeds, but the STT engine does not restart until I restart the app.
After renaming and reloading the speech models at runtime, I tried combinations of the following commands to restart the engine after the model change:
B4X:
'test combination of:

        'STT.shutdown
        'STT.stop
        'STT.Initialize("STT", File.DirInternal & "/" & model_folder_name)
        'STT.startListening(-1)

'but  this sub is only called if I restart the application:

Sub STT_ReadyToListen
    Log("READY")
    STT.stop
    If STT.startListening(-1) Then
        Log("STT ready...")
    Else
        Log("Start failed...")
        MsgboxAsync("Start failed","")
    End If
End Sub

How can I restart the STT engine at runtime after reloading the voice model?

Thanks in advance

Paolo
 