Hi everyone,
I am trying to develop an app that extracts text from speech in a video, either from an online stream or from a file, for scientific purposes. What is the best way to do that? I know that speech can be converted to text; my problem is how to get the voice audio out of the video. Any suggestions will be highly appreciated.
Have a good day
A better alternative for offline recognition is available here: SpeechRecognitionNoUI (author: @Biswajit, version 1.6). With this, you can add a speech recognition feature to your application without the Google speech recognition popup (check the attached example).
As to your first question about how to extract audio from a video file, you can use FFmpeg, as long as it does not have to be done in real time.
You could use my FFmpeg-encoder library.
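For anyone who wants to see what the FFmpeg step looks like, here is a minimal Python sketch. It calls the plain ffmpeg command-line tool, not the FFmpeg-encoder library mentioned above. It assumes ffmpeg is installed and on the PATH. The function names, the file names, and the choice of 16 kHz mono 16-bit WAV are my own assumptions, so check what input your speech-to-text engine actually expects.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(video_path, wav_path, sample_rate=16000):
    """Build the ffmpeg argument list that saves the audio track as a mono WAV file."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(video_path),     # input video
        "-vn",                     # ignore the video stream, keep audio only
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the requested rate in Hz
        "-acodec", "pcm_s16le",    # uncompressed 16-bit PCM
        str(wav_path),             # output file
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg and return the path of the WAV file it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg executable not found on PATH")
    command = build_extract_command(video_path, wav_path, sample_rate)
    subprocess.run(command, check=True)  # raises CalledProcessError if ffmpeg fails
    return Path(wav_path)


if __name__ == "__main__":
    # Hypothetical file names, only to show the command that would be run.
    print(" ".join(build_extract_command("lecture.mp4", "lecture.wav")))
```

The resulting WAV file can then be handed to whichever speech recognition engine you choose. This only covers files that are already complete, which matches the "not in real time" limitation above.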
It probably doesn't help much, but Microsoft Teams has a real-time transcription feature for video calls.
So it's not just Google apps that have the feature.
Of course, I have no idea how they do it, but in the past I have used Python libraries to transcribe video files from online workshops that I ran.