Re: [libreoffice-accessibility] Re: [libreoffice-users] Re: Sphinx - voice translation - Linux

Eric <esj -AT- esjworks.com>
Tue, 02 Dec 2014 22:37:03 -0500


On 12/2/2014 6:50 PM, Tom Davies wrote:

Hi :)
I am wondering if anyone here uses any speech recognition packages or has
any idea about which ones work well.  I think the preference is for
something working well on Linux but it might be interesting to hear about
others.

I've been using speech recognition since 1994. I currently use NaturallySpeaking 13 and, if you want to do anything beyond simple dictation, it's the tool of choice. Sadly, the company has its head up its hind end with regard to how it treats disabled people: they don't hesitate to tell you about their Section 508 compliance, but they don't mention that it applies only if you're working in Microsoft Word.

If you want to work on the Macintosh, they do have a product for straight dictation, but it has very little support for creating your own speech user interfaces. People keep trying to use the Microsoft Windows speech recognition environment, but they keep coming back to NaturallySpeaking because of its recognition accuracy and greater ease of use.

A few folks try to do things with Google speech recognition, but it's really aimed at a limited set of dictation tasks that you correct by hand on a mobile device. I put this in the maybe-someday category, especially since Google, like Microsoft and Nuance, doesn't talk to disabled users who really understand how speech changes the system.

The biggest problem with using speech recognition is not the accuracy. It's not even finding a good microphone, which is an expensive process in terms of both time and money. It's the social aspect. Think about how annoying it is to listen to someone talking on their phone, hearing only half the conversation and all of the disjointed, chaotic forms of speech that humans use when communicating with other humans. Now imagine something 3 to 4 times worse: the sound of a person talking to a computer, correcting mistakes, and giving commands in a way that makes no sense to someone who knows the native language.

I've been wanting speech recognition on Linux for a very long time, and I'm not the only one. Back around 2000, I outlined the idea that, to put speech recognition on other platforms, you should have a dedicated box that does nothing but run the recognition engine, language and acoustic processing, etc., with a client-side wedge that communicates the local context back to the recognition environment so that the appropriate actions can be activated by a user-supplied grammar.

The primary advantage of this technique is that you would control all of the hardware, audio channel, and performance issues in a box dedicated to speech recognition (which makes it damned inconvenient for everything else), and then you could control anything else at very low cost. You wouldn't have to worry about porting between different operating systems or hardware architectures. You would have just one platform to control. In theory, today, we could be talking about a mobile phone. A running NaturallySpeaking takes up roughly 1 to 1.5 GB, well within the range of a high-end smartphone.

At the time, it was possible but not practical, as it would have required a dedicated laptop or machine to run the recognition engine. Today, the only technological barriers to implementing this model are the ability to deliver a clean audio stream to a virtual machine and having enough hooks in the target OS UI to let you do user interface automation and state detection.
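
To make the split between the recognition box and the client-side wedge a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the implementation mentioned in this message: the class names (RecognitionBox, ClientWedge), the JSON message shapes, and the example grammar are all hypothetical, and the recognizer itself is faked by passing in already-recognized text. Both halves run in one process here; in the model described above they would sit on different machines (or host and virtual machine) and exchange the same messages over a network link.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RecognitionBox:
    """Stand-in for the dedicated recognition machine (hypothetical)."""
    # User-supplied grammars: application name -> {spoken phrase: action}.
    grammars: dict = field(default_factory=dict)
    # Last context reported by the client-side wedge.
    context: dict = field(default_factory=dict)

    def handle_context(self, message: str) -> None:
        # The wedge reports local state (focused app, window title) as JSON.
        self.context = json.loads(message)

    def handle_utterance(self, text: str) -> str:
        # Choose the grammar that matches the currently focused application.
        grammar = self.grammars.get(self.context.get("app", ""), {})
        action = grammar.get(text.lower().strip())
        if action is None:
            # No command matched in this context: pass it through as dictation.
            return json.dumps({"type": "dictation", "text": text})
        return json.dumps({"type": "action", "keys": action})


class ClientWedge:
    """Stand-in for the thin piece that runs on the target OS (hypothetical)."""

    def __init__(self, box: RecognitionBox) -> None:
        self.box = box

    def report_focus(self, app: str, title: str) -> None:
        # Send the local context back to the recognition environment.
        self.box.handle_context(json.dumps({"app": app, "title": title}))

    def speak(self, text: str) -> dict:
        # Simulates speech already recognized on the box; returns what the
        # wedge would have to carry out locally (keystrokes or typed text).
        return json.loads(self.box.handle_utterance(text))
```

The point of the design is that the same phrase can mean different things depending on what the wedge last reported: with a grammar registered for one application, "make that bold" becomes an action there and falls back to plain dictation everywhere else.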

This is not a terribly difficult project if you have hands that work well enough to write code freely. However, as is usual for most accessibility projects, 99% of the barriers to implementation are political/social/resource rather than technical. So I pick away at it in my spare time. I make painfully slow progress and, after 15 years of thinking about it, I have the start of an implementation for the KVM virtual machine environment. I'm currently hung up on a problem with using the host USB audio device inside the virtual machine.

So tread lightly in your desire for speech recognition. If I weren't disabled, I probably wouldn't use it. 10 years ago I would've said otherwise. 10 to 15 years ago, more applications were accessible by speech recognition. Today, with all the different toolkits that pay no heed to the needs of speech recognition users, it's even less usable.

And I think that's enough for now. I will end with this: LO's accessibility for speech recognition users is on a par with most other Windows applications, i.e. almost nil. For some reason I don't understand, the Select-and-Say capability does work sometimes, so there is that small blessing, but for the most part the features of LO are not accessible and cannot be made accessible with the current system.


--- eric

