B4A Library SpeechToText - Continuous Offline Voice Recognition

Biswajit · Oct 5, 2021

This is a wrapper for Alpha Cephei VOSK. With it, you can add a continuous offline speech recognition feature to your application.

NOTE:

As it works offline, the app must be compiled with the voice model. This will increase the app size by 30-40 MB.
The accuracy depends on the voice model. You can train your own voice model. For more details check the models download link below.
Remember to add RECORD_AUDIO permission.
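
For the RECORD_AUDIO note, here is a minimal sketch of the usual B4A approach. It assumes the RuntimePermissions library is referenced and that the code lives in an Activity module; it is not taken from the attached example.

```b4x
'Manifest Editor (Project > Manifest Editor):
'AddPermission(android.permission.RECORD_AUDIO)

Sub Process_Globals
    Private rp As RuntimePermissions
End Sub

Sub Activity_Create(FirstTime As Boolean)
    'Ask for the dangerous permission at runtime (Android 6+).
    rp.CheckAndRequest(rp.PERMISSION_RECORD_AUDIO)
    Wait For Activity_PermissionResult (Permission As String, Result As Boolean)
    If Result = False Then
        ToastMessageShow("Microphone permission denied", True)
        Return
    End If
    'From here on it should be safe to prepare the microphone.
End Sub
```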

How to use:

Download the required voice model from here.
Change the file name to a simple one like "model.zip".
Copy it to the Files folder of your project.
To use that model, check the attached example.
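
As a rough sketch of how the calls documented below might fit together for microphone recognition: the event prefix "STT" and the model folder location are assumptions, the model is assumed to be already unpacked (the attached example remains the reference for handling model.zip), and RECORD_AUDIO is assumed to be granted already.

```b4x
Sub Globals
    Private stt As SpeechToText
End Sub

Sub Activity_Create(FirstTime As Boolean)
    'ASSUMPTION: the model was already unpacked into this folder.
    Dim modelPath As String = File.Combine(File.DirInternal, "model")
    stt.Initialize("STT", modelPath)
    stt.prepareMicrophone("")   'blank = no predefined words
End Sub

'Raised when prepareMicrophone succeeds.
Sub STT_ReadyToListen
    'Pass -1 for continuous recognition.
    If stt.startListening(-1) = False Then Log("Recognition did not start")
End Sub

Sub STT_PartialResult (text As String)
    Log("Partial: " & text)
End Sub

Sub STT_FinalResult (text As String)
    Log("Final: " & text)
End Sub

Sub STT_Error (message As String)
    Log("Error: " & message)
End Sub
```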

SpeechToText

Author: @Biswajit
Version: 1.5

SpeechToText
- Events:
  - Error (message As String)
  - FinalResult (text As String)
  - MicrophoneBuffer (buffer() As Byte)
  - PartialResult (text As String)
  - Paused (paused As Boolean)
  - ReadyToListen
  - ReadyToListenEx (new)
  - ReadyToRead
  - Restarted
  - Result (text As String)
- Fields:
  - sampleRate As Int
    Default 16000
- Functions:
  - cancel As Boolean
    Cancel microphone recognition. Do not post any new events, simply cancel processing.
    Does nothing if recognition is not active.
    Return type: @return:true if recognition was actually stopped
  - FeedExternalBuffer (ExBuffer As Byte()) (new)
    For recognizing the external audio buffer, feed the buffer here.
    ExBuffer: The external audio byte buffer.
  - Initialize (eventName As String, modelPath As String)
    Initialize the object.
    eventName: The event name prefix.
    modelPath: The model folder path.
  - pause (pause As Boolean)
    Pause microphone recognition.
    pause: Pass true to pause and false to continue.
  - prepareAudioFile (audioPath As String, predefinedWords As String)
    Prepare the audio file for recognition. On success Eventname_ReadyToRead event will be raised.
    Call startReading to start reading the file.
    audioPath: Audio file path.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - prepareListenerEx (predefinedWords As String) (new)
    Prepare the listener for external audio buffer. On success Eventname_ReadyToListenEx event will be raised.
    Call startListeningEx to start listening.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - prepareMicrophone (predefinedWords As String)
    Prepare the microphone for listening. On success Eventname_ReadyToListen event will be raised.
    Call startListening to start listening.
    predefinedWords: Add some predefined words/phrase as JSON string. Can be blank.
  - reset
    Resets microphone recognizer in a thread, starts microphone recognition over again
  - shutdown
    Shutdown the microphone recognizer and release the recorder.
    Call this on activity or service closing event.
  - startListening (timeout As Int) As Boolean
    Starts microphone recognition. After specified timeout listening stops and the
    endOfSpeech signals about that. Does nothing if recognition is active.
    timeout: timeout in milliseconds to listen. -1 = infinite;
    Return type: @return:true if recognition was actually started
  - startListeningEx As Boolean (new)
    Starts external audio buffer recognition.
    Return type: @return:true if recognition was actually started
  - startReading (timeout As Int) As Boolean
    Starts file recognition. After specified timeout listening stops and the
    endOfSpeech signals about that. Does nothing if recognition is active.
    timeout: timeout in milliseconds to listen. -1 = infinite;
    Return type: @return:true if recognition was actually started
  - stop As Boolean
    Stops microphone/file recognition. Listener should receive final result if there is
    any. Does nothing if recognition is not active.
    Call this on activity or service closing event.
    Return type: @return:true if recognition was actually stopped
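
The external-buffer functions listed above could be combined roughly as follows. This is only a sketch: `stt` is assumed to be an initialized SpeechToText object with the event prefix "STT", `OnAudioChunk` is a hypothetical callback standing in for whatever audio source you use, and the expected buffer format (presumably PCM matching sampleRate) is an assumption.

```b4x
Sub StartExternalRecognition
    stt.prepareListenerEx("")   'blank = no predefined words
End Sub

'Raised when prepareListenerEx succeeds.
Sub STT_ReadyToListenEx
    If stt.startListeningEx = False Then Log("External recognition did not start")
End Sub

'Hypothetical callback: call this whenever your own audio source
'delivers a new chunk of audio data.
Sub OnAudioChunk (Chunk() As Byte)
    stt.FeedExternalBuffer(Chunk)
End Sub
```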

Downloads:

Update:

Version 1.1:
1. Added audio file to text functionality. (For now only WAV format is supported)
2. Added predefined word/phrase detection functionality.
3. Merged startListening and startListening2 together. Pass -1 for continuous recognition.
Version 1.2:
1. Added MicrophoneBuffer event where you will receive the microphone audio buffer while using voice recognition.
Version 1.3:
1. Added method to change the sampling rate.
Version 1.4:
1. Fixed the app crashing issue while calling shutdown without starting the recognizer.
Version 1.5:
1. Added option to feed external audio buffer. Instead of using the internal audio recorder you can feed external audio buffer from another audio source.
  (Check the latest example project)
2. Updated VOSK and JNA library. (Please delete old dependencies before copying the new ones.)

If you like my work, please donate. Your donations will encourage me to add more features in the future.

Biswajit · May 28, 2022

Adamdam said:
Can I feed Youtube URL as input file instead of wav file ??

No. The SDK only supports WAV files. You can download the YouTube audio, convert it to WAV format, and pass that in.

Adamdam said:
to make off-line speech-to-text.

This is an offline speech-to-text SDK; you do not need an internet connection.

JohnC · Oct 6, 2022

Just tried the sound+record feature of the demo - works great!

Sending you a well-earned donation!

JohnC · Oct 7, 2022

I increased the audiostreamer bitrate from 16000 to 22050, but when I play back the audio it sounds like mickey mouse (plays too fast):

B4X:

    'player.Initialize("player",16000,True,16,player.VOLUME_MUSIC)
    player.Initialize("player",22050,True,16,player.VOLUME_MUSIC)

Is the STT_MicrophoneBuffer event hard-coded to only work with a 16000 sample rate?

Biswajit · Oct 7, 2022

JohnC said:
I increased the audiostreamer bitrate from 16000 to 22050, but when I play back the audio it sounds like mickey mouse (plays too fast):

B4X:

    'player.Initialize("player",16000,True,16,player.VOLUME_MUSIC)
    player.Initialize("player",22050,True,16,player.VOLUME_MUSIC)

Is the STT_MicrophoneBuffer event hard-coded to only work with a 16000 sample rate?

Yes. It's hard-coded. I will add an option to change that.

JohnC · Oct 9, 2022

Biswajit said:
Yes. It's hard-coded. I will add an option to change that.

Would it be possible for the new change to also allow "reading" WAV files that also may not have been encoded in 16000 sample rate?

Hamied Abou Hulaikah · Oct 9, 2022

Great, and it works.

Biswajit · Oct 9, 2022

JohnC said:
Would it be possible for the new change to also allow "reading" WAV files that also may not have been encoded in 16000 sample rate?

Yes, it will work for both types of recognition (speech and WAV file).

Khalid. · Oct 10, 2022

Biswajit said:
Yes, it will work for both types of recognition (speech and WAV file).

I have a problem with a WAV file not converting to text:

https://www.b4x.com/android/forum/threads/convert-audio-file-mp3-to-text.143144/post-908607

Biswajit · Oct 10, 2022

JohnC said:
Is the STT_MicrophoneBuffer event hard-coded to only work with a 16000 sample rate?

JohnC said:
Would it be possible for the new change to also allow "reading" WAV files that also may not have been encoded in 16000 sample rate?

Please check the new update. After initializing the library, you can change the sampleRate value at any time, as long as you do it before starting recognition.

JohnC · Oct 11, 2022

Biswajit said:
Please check the new update. After initializing the library, you can change the sampleRate value at any time, as long as you do it before starting recognition.

The SampleRate setting works great!

Biswajit · Oct 11, 2022

Khalid. said:
I have a problem with a WAV file not converting to text:

https://www.b4x.com/android/forum/threads/convert-audio-file-mp3-to-text.143144/post-908607

The library supports only the WAV file format. Also, please specify the sample rate before reading the file. Check the new update.

If it doesn't work, send me that audio file and I will check.
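
A minimal sketch of file recognition with the sample rate set first, using only the members documented in the first post. `RecognizeWavFile` is a hypothetical helper name, and `stt` is assumed to be an initialized SpeechToText object with the event prefix "STT".

```b4x
'WavSampleRate must match the rate the WAV file was actually recorded at.
Sub RecognizeWavFile (WavPath As String, WavSampleRate As Int)
    stt.sampleRate = WavSampleRate   'set before starting recognition
    stt.prepareAudioFile(WavPath, "")   'blank = no predefined words
End Sub

'Raised when prepareAudioFile succeeds.
Sub STT_ReadyToRead
    If stt.startReading(-1) = False Then Log("File recognition did not start")
End Sub

Sub STT_FinalResult (text As String)
    Log("File text: " & text)
End Sub
```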

Khalid. · Oct 11, 2022

Biswajit said:
The library supports only the WAV file format.

I tried it on some audio files where the speaker stops talking and then continues. I would like a gap proportional to the length of the pause, like this:

i think we all have passions>>>_______________<<<and you don't get to choose

But your conversion to text gives:

i think we all have passions and you don't get to choose

Does it support multiple languages?

JohnC · Oct 11, 2022

Khalid. said:
Does it support multiple languages?

I think you just need to use a different "Model" for other languages.

JohnC · Oct 11, 2022

Khalid. said:
I tried it on some audio files where the speaker stops talking and then continues

Just make sure that the sample rate you set for this library matches the sample rate of the WAV audio file, and make sure the audio file is in true WAV format, not simply renamed with a WAV file extension.

Khalid. · Oct 11, 2022

JohnC said:
Just make sure that the sample rate

Yes, the conversion works. My question is this: the audio may contain pauses where the speaker stops and then continues speaking. How can each pause be represented by putting a (((space))) (proportional to the time elapsed) between the stretches of speech?

Biswajit · Oct 11, 2022

Khalid. said:
I tried it on some audio files where the speaker stops talking and then continues.
I would like a gap proportional to the length of the pause

I think there is no such thing as a SPACE in voice recognition. When you pause/stop talking the recognizer waits for the next word instead of adding a blank space. I guess this holds for all types of voice recognition. Try Google Assistant, Siri, Alexa, or any other voice-to-text app; I guess in all cases it will just wait for your voice when you temporarily stop talking.

The library supports multiple languages but one at a time.

Khalid. · Oct 12, 2022

Biswajit said:
When you pause/stop talking the recognizer waits for the next word instead of adding a blank space

To synchronize text and audio together

https://www.b4x.com/basic4android/images/F0DYcnZwgV.gif
Is it possible to sample the sound wave while it is in sleep mode (the speaker has stopped talking) and put a space or "_" or "*" or "+" there for the purpose of sync?

drgottjr · Oct 13, 2022

a very tiny issue re an otherwise good job:
shutdown releases the audio recorder. if user exits app without having tapped the start button (maybe they changed their mind), a null object exception is thrown. i would suggest an "if (recorder != null) release recorder" in shutdown. or recommend users add a try/catch on activity_pause
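
the try/catch workaround mentioned above might look like the sketch below. it assumes `stt` is the SpeechToText object and that the exception is thrown on the calling thread. per the changelog in the first post, version 1.4 fixes the crash when shutdown is called without starting the recognizer, so this should only matter on older versions.

```b4x
Sub Activity_Pause (UserClosed As Boolean)
    Try
        'Releases the recorder; on old versions this could throw
        'if recognition was never started.
        stt.shutdown
    Catch
        Log(LastException)
    End Try
End Sub
```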

Biswajit · Oct 13, 2022

Khalid. said:
To synchronize text and audio together

https://www.b4x.com/basic4android/images/F0DYcnZwgV.gif
Is it possible to sample the sound wave while it is in sleep mode (the speaker has stopped talking) and put a space or "_" or "*" or "+" there for the purpose of sync?

I don't think so. You have to add your own logic to add a space when the speaker is silent.
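
One possible shape for such logic during microphone recognition is sketched below, using the MicrophoneBuffer event. It assumes the buffer holds 16-bit little-endian mono PCM (consistent with the playback settings used earlier in this thread, but not confirmed), and the threshold is a hypothetical value that needs tuning. Whether the event is also raised while reading a file is not documented, so this may not carry over to WAV file recognition.

```b4x
'Declare in Globals: Private silenceStart As Long

'Returns the peak absolute amplitude of a 16-bit little-endian PCM buffer.
Sub PeakAmplitude (buffer() As Byte) As Int
    Dim peak As Int = 0
    For i = 0 To buffer.Length - 2 Step 2
        'High byte keeps its sign, low byte is masked to 0..255.
        Dim sample As Int = Bit.Or(Bit.ShiftLeft(buffer(i + 1), 8), Bit.And(buffer(i), 0xFF))
        If Abs(sample) > peak Then peak = Abs(sample)
    Next
    Return peak
End Sub

Sub STT_MicrophoneBuffer (buffer() As Byte)
    Dim threshold As Int = 500   'hypothetical value, tune for your microphone
    If PeakAmplitude(buffer) < threshold Then
        'Remember when the quiet stretch began.
        If silenceStart = 0 Then silenceStart = DateTime.Now
    Else
        If silenceStart > 0 Then
            'Speech resumed: report how long the pause lasted.
            Log("Pause of " & (DateTime.Now - silenceStart) & " ms")
            silenceStart = 0
        End If
    End If
End Sub
```

The logged pause length could then be turned into a run of "_" characters inserted between consecutive results.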

Biswajit · Oct 13, 2022

drgottjr said:
a very tiny issue re an otherwise good job:
shutdown releases the audio recorder. if user exits app without having tapped the start button (maybe they changed their mind), a null object exception is thrown. i would suggest an "if (recorder != null) release recorder" in shutdown. or recommend users add a try/catch on activity_pause

OK, I will check.
