IBM advances in conversational speech recognition technology

IBM Corporation

Using deep learning, IBM has developed speech recognition technology that comes ever closer to human parity in recognizing spoken words.

Last year, IBM announced a major improvement in conversational speech recognition with a system achieving a 6.9 percent word error rate.
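For readers unfamiliar with the metric, word error rate (WER) is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not IBM's evaluation code; the function name and the sample sentences are illustrative assumptions, and real benchmarks also apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes whitespace-separated
    words and performs no text normalization.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous_row[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row[j] = min(
                previous_row[j] + 1,                      # reference word missing from hypothesis
                current_row[j - 1] + 1,                   # extra word in hypothesis
                previous_row[j - 1] + substitution_cost,  # match or substitution
            )
        previous_row = current_row

    return previous_row[-1] / len(ref_words)


# Made-up example: one of six reference words is dropped by the recognizer.
sample_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{sample_wer:.1%}")
```

On this scale, a 6.9 percent word error rate corresponds to roughly seven word errors for every hundred words in the reference transcript.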

Since then, IBM researchers have continued to push the boundaries of accuracy. The company believes its latest result is a historic milestone: an industry-record word error rate of 5.5 percent, a relative improvement of about 20 percent over the rate that was reported six months earlier.
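The 20 percent figure is a relative reduction in error rate, not a drop of 20 percentage points. A short check of the arithmetic, using the two rates quoted in this release (the function name is an illustrative assumption):

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which new_rate is lower than old_rate."""
    return (old_rate - new_rate) / old_rate


previous_wer = 6.9  # percent, the earlier result quoted in this release
latest_wer = 5.5    # percent, the latest result quoted in this release

improvement = relative_reduction(previous_wer, latest_wer)
print(f"{improvement:.1%}")  # prints 20.3%
```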

The success of speech recognition technology is measured against human parity, an error rate on par with that of two humans speaking.

Previously, human parity was considered to be a 5.9 percent word error rate. IBM partnered with Appen, a speech and technology service provider, to reassess the industry benchmark and determined that human parity is in fact 5.1 percent, lower than anyone has yet achieved.

“These speech developments build on decades of research, and achieving speech recognition comparable to that of humans is a complex task. At IBM, we are dedicated to creating the technology that will one day match the complexity of how the human ear, voice and brain interact,” said Michael Karasick, IBM Vice President, Cognitive Computing.

“This progress will have important implications for how man and machine collaborate in the future, making the interactions more natural and productive. We believe it is only a matter of time before we achieve parity on speech recognition with humans.”

As IBM continues to develop and improve this technology, the company says its researchers will hold themselves to the highest standards when measuring accuracy, so that the findings are truly valuable.