Look Mom, No Hands: Implementing Speech-Enabled Applications

Recent advances in voice recognition systems have shattered the traditional association of speech technology with ineffective and often frustrating automated prompts and customer relationship management technologies. Thanks to advances in voice recognition engines and middleware solutions, speech-based technology has evolved to the point where it is robust enough to be used by emergency responders in even the most extreme crisis scenarios. Whereas converting applications for speech-based input was once an extremely difficult and imprecise undertaking, today's advances have resulted in speech-based end-to-end solutions that are reliable, ruggedized, user-friendly and affordable. Current voice recognition technologies offer a number of advantages for highly mobile, rapidly deployed and compartmentalized emergency response forces.

In an e-form-based government environment, emergency workers who protect and serve the public must also be expert record keepers and IT specialists, maintaining and submitting detailed preliminary and post-incident reports to an organizational database. This paperwork, however necessary, demands time and attention that detract from emergency workers' ability to carry out their primary job function. Additionally, emergency workers must have rapid access to information critical to the immediate task. Whether running the plates on a suspicious vehicle or pulling medical records for a trauma patient, emergency personnel must be able to access this information while remaining hyper-aware of their surrounding environment. While installing PCs in emergency vehicles represents an improvement in the effort to make information more readily accessible to emergency workers, these systems still require personnel to shift focus from the task at hand and give their eyes and hands to the data query and retrieval process.

With recent advances in voice recognition technologies, a speech-enabled data retrieval and submission system for emergency rescue workers is both feasible and advantageous. In practice, speech-enabling existing data systems would allow emergency workers to direct all of their attention to their immediate task, providing a "hands free, eyes free" means of accessing information and filing reports. Additionally, reports could be filed and records established in real time. This results in cost savings through reduced labor and cycle time. In many instances (especially with first responders and health care workers), improved cycle time also means a faster response time, and a faster response time can mean the difference between life and death.

Ideal scenarios for speech-based applications
Scenarios in which speech-based applications would be of enormous benefit to emergency workers include:

A SWAT officer, carrying his automatic rifle and directing his team with eyes on target, reports into a ruggedized PDA while carrying out a raid on a suspected drug dealer's house. During the raid the officer reports the details of the crime scene, and once suspects have been apprehended and evidence secured, records can be compiled from the officer's spoken account of the situation. His voice report is wirelessly transmitted to a local police database, where it is stored as a detailed report that can later be used as evidence in court.
An EMT arrives at an incident scene to find several patients unconscious. While providing critical care with both hands, he dictates their vital signs and physical symptoms via a lapel microphone into a PDA in his jumpsuit pocket. The data the EMT transmits automatically becomes part of the patients' electronic medical records. Based on this data, transmitted wirelessly to the nearest hospital, the emergency room staff is able to pre-position materials and notify public health authorities of a potential bio-chemical attack, resulting in rapid mobilization of other first responders.
A nurse records vital signs for patients in a hospital ward by speaking into a wireless-equipped PDA. The data is uploaded to the hospital information system when she places her PDA in its cradle to connect and recharge.

Challenges and solutions
Considering the enormous advantages speech-based applications offer emergency workers, it is a wonder they have not been employed in the past. Emergency workers have not relied on speech-based systems for data retrieval and report filing because of the technological hurdles that have plagued efforts to implement such systems. These hurdles include application development, user interface development and environmental challenges.

In the past, due to the enormous breadth of the English language, application development for speech-based platforms was extremely difficult and limited. Efforts to convert existing data into a format compatible with speech-based applications required extensive hand-coding and a great deal of oversight and management. Even relatively simple applications could require hundreds of pages of code to implement basic data entry.

Today, software tools exist that streamline the coding process and provide automated development and troubleshooting for interactions between information systems and speech recognition (SR) engines. These developments speed up the conversion process for speech-enabling information systems and make it possible to continually update and modify SR platforms as the system is deployed.

Because reliable speech-based systems are still in the introductory market phase, voice user interfaces (VUIs) have yet to be perfected for field use. A VUI differs substantially from a graphical user interface (GUI), and until recently, best practices for VUI design had not been codified. Just as early attempts to build GUIs were often laughable, early VUI designs have too often been confusing, awkward and frustrating for users. (Users have very high expectations for machine-based SR because they have previously spoken only to people, who have remarkable flexibility and capability for disambiguation compared to computers.)

Many years of experience with SR applications in a variety of settings, including millions of telephone-based interactions with a diverse user base, have made it possible to codify best practices for building, testing and refining VUIs. Moreover, many of these best practices are built into reusable speech grammars and application architectures. While VUIs still require feedback from emergency professionals in the field to maximize performance, existing systems are sturdy enough to be applied in real-life scenarios without jeopardizing the safety of the user.
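To make the idea of a reusable speech grammar concrete, the following is a minimal sketch in Python, not the grammar format of any particular SR engine or vendor product. It assumes a hypothetical engine that returns a plain-text transcript with numbers already rendered as digits; the slot names, spoken phrases and the helper functions parse_vitals and missing_slots are all illustrative assumptions. The sketch shows how a small, constrained grammar can map an EMT's dictated vital signs onto form fields and identify which fields the VUI still needs to prompt for.

```python
import re

# Assumption: the SR engine delivers numbers as digits, e.g. "pulse 88".
NUMBER = r"(\d{1,3})"

# Hypothetical grammar: each slot pairs a form field with the phrases
# a user may speak to fill it.
GRAMMAR = {
    "pulse": re.compile(rf"\b(?:pulse|heart rate)(?: is)? {NUMBER}\b"),
    "blood_pressure": re.compile(
        rf"\b(?:blood pressure|bp)(?: is)? {NUMBER} over {NUMBER}\b"
    ),
    "respiration": re.compile(
        rf"\b(?:respiration|respiratory rate)(?: is)? {NUMBER}\b"
    ),
}


def parse_vitals(transcript):
    """Return the form fields recognized in one dictated transcript."""
    text = transcript.lower()
    fields = {}
    for slot, pattern in GRAMMAR.items():
        match = pattern.search(text)
        if match:
            values = [int(group) for group in match.groups()]
            # Single-number slots store an int; blood pressure stores a pair.
            fields[slot] = values[0] if len(values) == 1 else tuple(values)
    return fields


def missing_slots(fields):
    """List the slots the VUI should still prompt the user for."""
    return [slot for slot in GRAMMAR if slot not in fields]
```

Given the transcript "Pulse 88 blood pressure is 120 over 80", parse_vitals returns a pulse of 88 and a blood pressure of (120, 80), and missing_slots reports that respiration is still outstanding. Restricting recognition to a small set of expected phrases, rather than open dictation, is one common way to keep a VUI predictable in the field.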

Because emergency professionals often work in inhospitable environments, past speech-based technology platforms were inappropriate for their use due to the frequency of system failure. Today's generation of voice recognition technology, however, is robust and rugged enough to work effectively in the harshest of environments. For example, emergency workers often operate in high-noise environments that make it difficult for SR engines to function at their best. To address this, today's SR systems incorporate microphone array and noise suppression technologies that can overcome many of the challenges posed by high-noise environments.

According to Max Patterson, a graduate of the U.S. Secret Service Protective Operations Briefing, and a former police chief for both the Albion, Michigan, and Windsor, Connecticut, police departments, "Integrated voice-enabled capability in mobile environments is particularly important in law enforcement. Officers can be alerted if the car is stolen, or if there is any other potential danger before pulling the vehicle over. The ability to use voice rather than a keyboard to enter and access information makes the VideoWitness Patrol Car System — using Vangard Voice AccuSPEECH technology — the ideal solution for smaller law enforcement agencies. It has a much smaller footprint, costs much less than the larger systems currently available and delivers all the functionality of those larger systems."

Ready to go
After a long gestation period, including more than 20 years of laboratory research and development, automated SR is finally ready for widespread deployment. While the technology has yet to be fully adopted by government agencies, speech-based record-keeping and data retrieval systems offer a number of advantages for emergency personnel.

Transitioning from traditional information systems to speech-based applications
Getting speech-based data retrieval and input systems up and running is simply a matter of systems implementation and personnel training. Admittedly, the switch from a PC-based environment to a speech-based one is a large and seemingly risky step. However, a number of measures can be taken to ease the transition, and the advantages of such a system outweigh the costs of the transition and the learning curve. A simple set of measures to ensure a smooth transition to a speech-based environment includes:

Selecting a good pilot environment — Initial candidate agencies will have a well-defined forms-based process using a program like FormStream from Patron Systems (www.patronsystems.com), a willing management team with a track record of trying innovative approaches and a good economic or mission-oriented case for streamlining its data collection process.

Addressing application development productivity — The specific technology should provide an integrated workbench for designing, building and testing the voice-enabled application components prior to downloading to the target environment. It also is important to include expertise in voice user interface (VUI) development on the team.

Involving real users — During the design, build and test cycle, it is important to involve "model users" whose capabilities and accomplishments are appreciated in the organization. These users can make the design team aware of best practices among field users, and can help the team be creative in streamlining processes.

Rolling out the application systematically — Once the application has been lab-tested, the agency should deploy it following a strategy that includes initial user training, rapid collection of user feedback and quick dissemination of software enhancements.

Learning from feedback and planning wider deployments — Following a successful pilot, the agency can adopt suggested enhancements to the training and rollout process and deploy speech recognition in a wider set of mobile applications agency-wide.