Technology Frequently Asked Questions

1. What is phonetic search? How does it differ from speech-to-text?

A. Phonetic search represents a big advance over traditional Large Vocabulary Continuous Speech Recognition (LVCSR) systems, which are based on speech-to-text technology making use of finite dictionaries. It is inherently more efficient and has a substantially lower implementation cost.

Unlike LVCSR systems, phonetic search transforms audio recordings into phonemes, the basic units of human speech, rather than written words. The audio file needs to be "ingested" only once; the resulting index file can then be used for as many subsequent searches as you like. The phonetic approach also allows a much larger volume of calls to be ingested for less computing power than speech-to-text systems, which need to carry out multiple tests in order to make the right decisions about words at the "ingestion", or indexing, stage.

Accuracy is also improved by the way that phonetic search indexes audio files. Speech-to-text systems make definitive decisions when transcribing words, but there are drawbacks. LVCSR dictionaries are limited by the number of words they contain. Proper nouns are particularly problematic.

By contrast, the Aurix phonetic speech search engine has an open vocabulary, which frees it from dictionary constraints. Aurix indexes speech sounds into possible phonemes. An audio search is conducted using phoneme strings derived from the search words and phrases that the user enters as text. Multiple results are returned and ranked by confidence level.
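
The search step described above can be illustrated with a toy sketch. It is not the Aurix algorithm: the pronunciation table, the phoneme symbols, the single-sequence index and the similarity score are all assumptions made for illustration. A real engine would derive phonemes for any typed word rather than look them up in a small table, and would index several possible phonemes per speech sound.

```python
from difflib import SequenceMatcher

# Hypothetical text-to-phoneme table standing in for a real converter.
# The symbols are illustrative and are not taken from any Aurix product.
PRONUNCIATIONS = {
    "account": ["AH", "K", "AW", "N", "T"],
    "balance": ["B", "AE", "L", "AH", "N", "S"],
}


def to_phonemes(query):
    """Turn a typed query into one flat phoneme string."""
    phonemes = []
    for word in query.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def search(index, query, threshold=0.8):
    """Slide the query phonemes over the index and score each window.

    Returns (start_position, confidence) pairs at or above the threshold,
    ranked with the highest confidence first.
    """
    target = to_phonemes(query)
    n = len(target)
    hits = []
    for start in range(len(index) - n + 1):
        window = index[start:start + n]
        # Similarity in [0, 1]; used here as a stand-in confidence score.
        score = SequenceMatcher(None, window, target).ratio()
        if score >= threshold:
            hits.append((start, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# A made-up index for one short recording, with one phoneme "misheard"
# (IH where the query expects AH).
index = ["HH", "AH", "L", "OW", "AH", "K", "AW", "N", "T",
         "B", "AE", "L", "IH", "N", "S"]
results = search(index, "account balance")
print(results)
```

Because matching is approximate, a window that differs from the query by one phoneme can still be returned, and overlapping windows appear as lower-ranked results. The index is built once and can be searched with any number of queries.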


2. How fast can searches be conducted?

A. A phonetic search through Aurix technology will run up to 80,000x faster than real time on a modern PC such as an Intel Core 2 Duo E6600. So, one PC could search eight hours of audio data in under a second.
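
The arithmetic behind that claim can be checked with a short sketch. The function name is hypothetical, and the 80,000x figure is the upper bound quoted above, so the result is a best case.

```python
def search_time_seconds(audio_hours, speedup=80_000):
    """Best-case time to search the audio at the given speed-up over real time."""
    audio_seconds = audio_hours * 3600
    return audio_seconds / speedup


# Eight hours of audio is 28,800 seconds; at 80,000x real time that is 0.36 s.
print(search_time_seconds(8))
```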

3. How is accuracy measured?

A. As audio mining systems are probabilistic in nature, human verification is needed in order to determine absolutely whether or not a given result is correct. Two types of error are possible: false positives (hits returned that are incorrect) and false negatives (genuine occurrences that are missed).

Common metrics used in measuring accuracy include recall (the proportion of genuine occurrences that are found), precision (the proportion of reported hits that are correct) and the false positive rate (the number of false positives per unit of audio). All systems have a trade-off between recall and precision or false positive rate: in general, a higher recall will be associated with a higher false positive rate and hence also with a lower precision. The Aurix audio mining feature has a confidence threshold mechanism to provide control over this trade-off.
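
As a minimal sketch of these metrics, the code below scores a set of human-verified hits at two confidence thresholds. The function name, the data layout and the sample figures are invented for illustration and do not describe the Aurix interface or any measured results.

```python
def accuracy_metrics(hits, true_occurrences, audio_hours, threshold):
    """Compute recall, precision and false positives per hour.

    hits: list of (confidence, is_correct) pairs, where is_correct comes
          from human verification of each reported result.
    true_occurrences: number of genuine occurrences in the audio.
    """
    kept = [is_correct for confidence, is_correct in hits
            if confidence >= threshold]
    true_positives = sum(kept)
    false_positives = len(kept) - true_positives
    recall = true_positives / true_occurrences
    # Precision is undefined with no reported hits; 0.0 is used here.
    precision = true_positives / len(kept) if kept else 0.0
    false_positive_rate = false_positives / audio_hours
    return recall, precision, false_positive_rate


# Invented example: 6 reported hits, 5 genuine occurrences, 2 hours of audio.
hits = [(0.95, True), (0.90, True), (0.80, False),
        (0.70, True), (0.50, False), (0.40, True)]

strict = accuracy_metrics(hits, true_occurrences=5, audio_hours=2, threshold=0.85)
loose = accuracy_metrics(hits, true_occurrences=5, audio_hours=2, threshold=0.30)
print(strict)
print(loose)
```

In this invented data, lowering the threshold from 0.85 to 0.30 raises recall from 0.4 to 0.8, while precision falls from 1.0 to about 0.67 and the false positive rate rises from 0 to 1 per hour.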

4. What factors affect accuracy?

A. Audio mining accuracy is affected by a number of variables, including the audio bandwidth and the nature of the transmission channel, codec quality, and the level and type of background noise. It is important to select the Aurix audio mining configuration that best matches the characteristics of the audio channel and speakers expected in the application. Accuracy is also affected by the search term: in general, accuracy is higher with longer search terms. For any given total term length, words that are longer and more distinct tend to be better than sequences of shorter words.

5. Does the Aurix phonetic search engine support accents, dialects and non-native speakers?

A. Yes! The system is trained on a variety of accents, enabling recognition, and hence audio mining, across a wide range of speech variation.