So, here is Mi Dica III
To review:
Mi Dica was a half-duplex voice translation app using Google's Speech Service, ML Kit's on-device translation and Android's TTS engine. I speak in my language, and what I say is translated into your language and spoken aloud. Flip a switch, and you speak in your language while I hear the translation in my language. My only regret was that you had to be online to use Google's voice recognizers.
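The turn-taking described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Mi Dica's code: `Translator`, `Speaker` and `HalfDuplexSession` are hypothetical names standing in for the ML Kit translator and the TTS engine, and the recognizer is assumed to hand its final text to `onRecognized`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces standing in for the real components.
// These are NOT ML Kit or Android TTS APIs.
interface Translator {
    String translate(String text, String fromLang, String toLang);
}

interface Speaker {
    void speak(String text, String lang);
}

/** Routes each recognized utterance in the direction set by the "switch". */
final class HalfDuplexSession {
    private final String myLang;
    private final String yourLang;
    private final Translator translator;
    private final Speaker speaker;
    private boolean iAmSpeaking = true; // position of the switch

    HalfDuplexSession(String myLang, String yourLang, Translator translator, Speaker speaker) {
        this.myLang = myLang;
        this.yourLang = yourLang;
        this.translator = translator;
        this.speaker = speaker;
    }

    /** Flip the switch: the other person now speaks. */
    void flip() {
        iAmSpeaking = !iAmSpeaking;
    }

    String sourceLang() { return iAmSpeaking ? myLang : yourLang; }

    String targetLang() { return iAmSpeaking ? yourLang : myLang; }

    /** Called with the recognizer's final text: translate it, then speak it in the listener's language. */
    void onRecognized(String text) {
        String translated = translator.translate(text, sourceLang(), targetLang());
        speaker.speak(translated, targetLang());
    }
}

class HalfDuplexDemo {
    public static void main(String[] args) {
        List<String> spoken = new ArrayList<>();
        // Fake translator and speaker so the routing can be run without a device.
        HalfDuplexSession session = new HalfDuplexSession("en", "es",
                (text, from, to) -> "[" + from + "->" + to + "] " + text,
                (text, lang) -> spoken.add(lang + ": " + text));
        session.onRecognized("Where is the station?");
        session.flip();
        session.onRecognized("Gracias");
        spoken.forEach(System.out::println);
    }
}
```

Keeping the switch as a single boolean means the translator and speaker never need to know who is talking; they only receive the current source and target languages.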
Mi Dica II swapped out Google's Speech Service for Vosk's on-device recognizer (using forum member @Biswajit's wrapper). It was mainly a test of how Vosk handled voice recognition with a small offline model compared with Google's cloud resources. Although spoken translation into a second language was no problem, the offline recognizer requires a little careful handling, and at the time it was unclear how to get the Vosk library to recognize two languages in the same app.
Mi Dica III brings the full half-duplex, two-language setup to Vosk, with decent results. I wrote a Vosk class to handle everything voice-related and created two instances of it, one for each language to be recognized. Simply stopping one instance's listener was enough to let the other instance start its own listener without any conflict over the microphone. Using a single instance would have meant shutting down the engine and re-initializing it every time the language changed. There is a slight delay between the moment Vosk reports that it is ready and the moment it is actually listening, so if you start speaking right away, the first few words or syllables are often lost. I added a 1-second delay and a "Listening" sign, and after that the app understood me with the phone held at a comfortable distance.
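A minimal Java sketch of the two-instance approach follows, assuming a hypothetical `VoiceChannel` class in place of the real Vosk wrapper (its methods only flip a flag where real code would open or release the microphone). The hypothetical `TwoLanguageListener` stops the active instance before starting the other one and shows the "Listening" sign only after a warm-up delay, 1000 ms in the demo to match the 1-second delay above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for one recognizer instance bound to one language model. */
final class VoiceChannel {
    final String lang;
    private volatile boolean listening;

    VoiceChannel(String lang) { this.lang = lang; }

    void startListening() { listening = true; }  // real code would open the mic here
    void stopListening() { listening = false; }  // ...and release it here
    boolean isListening() { return listening; }
}

/** Keeps exactly one channel listening and shows the sign only after a warm-up delay. */
final class TwoLanguageListener {
    private final VoiceChannel first;
    private final VoiceChannel second;
    private final long warmUpMillis;
    private final Consumer<Boolean> showListeningSign;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "listening-sign");
        t.setDaemon(true); // don't keep the JVM alive just for the timer
        return t;
    });
    private VoiceChannel active;
    private int turn; // bumped on every switch so a stale timer can't show the sign

    TwoLanguageListener(VoiceChannel first, VoiceChannel second, long warmUpMillis,
                        Consumer<Boolean> showListeningSign) {
        this.first = first;
        this.second = second;
        this.warmUpMillis = warmUpMillis;
        this.showListeningSign = showListeningSign;
    }

    synchronized void listenIn(String lang) {
        VoiceChannel next = first.lang.equals(lang) ? first
                : second.lang.equals(lang) ? second : null;
        if (next == null) throw new IllegalArgumentException("No recognizer for " + lang);
        if (active != null) active.stopListening(); // release the mic before the other instance starts
        showListeningSign.accept(false);
        active = next;
        active.startListening();
        final int myTurn = ++turn;
        // The recognizer reports "ready" slightly before it really hears audio, so wait before prompting.
        timer.schedule(() -> {
            synchronized (this) {
                if (myTurn == turn) showListeningSign.accept(true);
            }
        }, warmUpMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void shutdown() {
        if (active != null) active.stopListening();
        active = null;
        turn++; // invalidate any pending timer
        showListeningSign.accept(false);
        timer.shutdownNow();
    }
}

class ListenerDemo {
    public static void main(String[] args) throws InterruptedException {
        TwoLanguageListener listener = new TwoLanguageListener(
                new VoiceChannel("en"), new VoiceChannel("es"), 1000,
                on -> System.out.println(on ? "Listening" : "(sign hidden)"));
        listener.listenIn("en");
        Thread.sleep(1500);
        listener.listenIn("es");
        Thread.sleep(1500);
        listener.shutdown();
    }
}
```

The `turn` counter guards against a quick double switch: if the languages are flipped before the delay expires, the earlier timer still fires but does not show the sign for a recognizer that is no longer active.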
In truth, Google's recognizers are much better, and combining languages on the fly is much easier with them, but if you happen to be somewhere without fast Internet, you're out of luck. Vosk's offline recognizers are a workable plan B. Each language model adds 40-50MB to your assets folder if you bundle the models with the app, but a simple 2-language app should still be manageable. Alternatively, you could require a one-time online connection to download the models after the app has been installed. Vosk has a model repository, or you could host your own. The app would still support only 2 languages at a time, but you could choose any combination on the fly from the models already downloaded.
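Choosing the language pair from models already on the device might look like the Java sketch below. It assumes a hypothetical layout with one folder per downloaded model, named by language code (`models/en`, `models/es`, ...); `ModelStore` is an illustrative name, and downloaded Vosk models may be named differently, so a real app would need its own mapping from folder to language.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical layout: one folder per downloaded model, named by language code. */
final class ModelStore {
    private final Path root;

    ModelStore(Path root) { this.root = root; }

    /** Language codes whose model folders are present, sorted alphabetically. */
    List<String> availableLanguages() throws IOException {
        if (!Files.isDirectory(root)) return List.of();
        try (Stream<Path> entries = Files.list(root)) {
            return entries.filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Picks the two models for this session; the app only ever uses two at a time. */
    Path[] pair(String firstLang, String secondLang) throws IOException {
        if (firstLang.equals(secondLang)) throw new IllegalArgumentException("Pick two different languages");
        List<String> have = availableLanguages();
        for (String lang : new String[] {firstLang, secondLang}) {
            if (!have.contains(lang)) throw new IllegalStateException("Model not downloaded: " + lang);
        }
        return new Path[] {root.resolve(firstLang), root.resolve(secondLang)};
    }
}

class ModelStoreDemo {
    public static void main(String[] args) throws IOException {
        ModelStore store = new ModelStore(Path.of(args.length > 0 ? args[0] : "models"));
        System.out.println("Downloaded: " + store.availableLanguages());
    }
}
```

Only the two selected models would then be loaded into recognizer instances, matching the two-languages-at-a-time limit while still letting the user pick any pair that has been downloaded.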
A video can be found here:
Once again, Android's video capture app could not record the microphone and the audio output at the same time, even though I selected the appropriate setting. As a result, only the TTS output can be heard in the video.