Aurix: speech analytics and phonetic audio mining software

Aurix is the world's leading company in phonetic search technology.

FAQs (Frequently Asked Questions)

Below are answers to some of the most commonly asked questions regarding Aurix Limited, our patented speech technology and our current product line-up. Please contact us directly should you have any additional questions.

Aurix Limited Background

Aurix Audio Mining Technology: General

Aurix Audio Mining Technology: Accuracy

Aurix Audio Mining Technology: Real-time Capability

Utilizing Aurix Audio Mining Technology in Partner Applications

Aurix asr

Aurix aligner

Aurix speech detector

Aurix Ltd. Background

Q. How long has Aurix been developing speech technology?

A. Aurix's heritage is based on a legacy of speech technology research started by the U.K. Ministry of Defence in the 1950s. In 1986 this research was consolidated in the ‘Speech Research Unit’ (SRU) at Malvern, England. The SRU was eventually privatized and formed the core of Aurix Ltd., giving Aurix a heritage of more than 50 years.

Q. Who owns Aurix technology?

A. The speech software developed by Aurix is proprietary to Aurix Ltd.

Q. What range of speech technologies does Aurix have experience in?

A. Aurix has significant experience across a very wide range of speech technologies due to its legacy as a key U.K. Government research group. This includes speech recognition, speech synthesis, low bit rate communications, and speech signal analysis among others.  

Q. What speech technologies are currently available from Aurix as products?

A. Aurix currently supplies four products:

Aurix audio miner - to search audio signals for words or phrases.

Aurix asr (automatic speech recognition) - to identify spoken words for voice control in demanding applications.

Aurix speech detector - to identify when somebody is talking.

Aurix aligner - to automatically align speech and text.

These are provided as software development kits (SDKs). Each SDK contains a suite of tools and guides for the creation of custom applications with Aurix’s speech engines.

In addition to the four products currently available, Aurix undertakes custom speech technology projects for clients when appropriate.

Q. Who might typically purchase Aurix products?

A. The SDK products can be integrated into a wide variety of applications by Aurix partners for supply to their end-users, thereby adding value.

Some examples of typical partners are:

  • Suppliers of QA software to call centers who need to employ Aurix audio miner to analyze their call-handling effectiveness;
  • Suppliers to the broadcasting media who want to offer their customers the ability to search through many program feeds for specific topics utilizing Aurix audio miner;
  • Security-related organizations who need to set alerts for inappropriate or illegal telephone traffic utilizing Aurix audio miner;
  • Defense industry suppliers who want to integrate the ability to control a radio by voice into a command unit;
  • Closed captioning software suppliers who want to supply systems that can automatically align captions (subtitles) with speech utilizing Aurix aligner;
  • Video editing software suppliers who want to allow their customers to skip rapidly between speech passages utilizing Aurix speech detector.


Aurix Audio Mining Technology: General

Q. What type of speech technology does Aurix audio miner use?

A. Aurix uses phonetic technology for its proprietary audio mining engine. This technique works by generating a copy of each audio file that contains an analysis of the audio in terms of the basic speech sounds or “phonemes”. This format allows very rapid searches to be conducted for any spoken sound.

Q. How does this technology work?

A. Each audio file is “ingested” and an Aurix proprietary index file is created, which lists all the phoneme probabilities identified in the audio. Searches can then be conducted using phoneme strings derived from search words and phrases. The search process uses hidden Markov model (HMM) and dynamic programming algorithms to perform keyword searches on the audio stream. The phonetic design of the system ensures that it has an open vocabulary and is not limited to words contained within a dictionary. It also provides exceptional audio ingest speed.
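To illustrate the idea, here is a minimal sketch of phonetic keyword spotting by dynamic programming. It assumes a hypothetical index layout (one table of phoneme probabilities per audio frame) and scores each frame against a uniform background model; Aurix's actual index format and HMM-based search are proprietary and will differ. The names `spot` and `frame_score` are illustrative only.

```python
import math

# Hypothetical index layout: one dict per frame mapping phoneme -> probability.
# The real Aurix index format is proprietary; this is only an illustration.
Frame = dict[str, float]


def frame_score(frame: Frame, phone: str, floor: float = 1e-6) -> float:
    """Log-likelihood ratio of `phone` against a uniform background model."""
    p = max(frame.get(phone, 0.0), floor)
    background = 1.0 / len(frame)
    return math.log(p / background)


def spot(index: list[Frame], query: list[str]) -> tuple[float, int, int]:
    """Find the best-scoring occurrence of `query` anywhere in `index`.

    Dynamic programming over a left-to-right phoneme sequence: each query
    phoneme occupies one or more consecutive frames. Returns
    (score, start_frame, end_frame), with end_frame inclusive.
    """
    neg_inf = float("-inf")
    # prev[j] = (best score with phoneme j ending at the previous frame, start)
    prev = [(neg_inf, -1)] * len(query)
    best = (neg_inf, -1, -1)
    for t, frame in enumerate(index):
        cur = []
        for j, phone in enumerate(query):
            stay = prev[j]  # phoneme j continues into this frame
            # Phoneme 0 may start a new match at any frame; later phonemes
            # are entered from the phoneme before them.
            enter = (0.0, t) if j == 0 else prev[j - 1]
            score, start = max(stay, enter)
            cur.append((score + frame_score(frame, phone), start))
        prev = cur
        if cur[-1][0] > best[0]:
            best = (cur[-1][0], cur[-1][1], t)
    return best
```

Because the index is built once at ingest, the same `index` list can be passed to `spot` for any number of different queries without touching the original audio again.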

Q. What is the key benefit of this approach to audio mining?

A. Crucially, this approach retains all the phonetic intelligence in the audio signal until search time. No information is discarded. This is a key difference between phonetic mining and large vocabulary continuous speech recognition (LVCSR) mining, in which the phonetic intelligence is discarded when the text-based transcription is generated.

Q. Does the audio file need to be ingested again for each new search?

A. Each audio file only needs to be ingested once and then the index file can be used for as many searches as you like.

Q. Does the technology rely on the use of a dictionary?

A. No! Large vocabulary continuous speech recognition (LVCSR) mining systems may be limited to the set of words in their original dictionary, but the phonetic approach used by Aurix audio miner is not limited to any particular dictionary.

Q. How easy is it to enter search words and phrases?

A. Search terms can simply be entered as words and phrases in plain language. No special training is required.

Q. How does it cope with names and made-up words?

A. As there are no constraints on word spelling or length, proper names, place names and made-up words can be entered by typing them as they sound.

Q. Is any grammar required to assist the mining process?

A. No grammar is required.

Q. What languages are supported now and which are planned for the future?

A. U.K. and U.S. English are currently supported. U.S. Spanish is about to be added, with other European languages to follow. To add a new language, Aurix merely needs to construct new phonetic models for that language. This process is significantly faster, and more cost effective, than for an LVCSR system, which would additionally require the development of grammars and large dictionaries for the new language.

Q. Does your application support accents, dialects and non-native speakers?

A. Yes! The acoustic models of the phonemes used by Aurix audio miner are trained with a variety of accents.

Q. What effect does audio quality have on search accuracy?

A. In general, the higher the audio quality, the higher the accuracy.

Q. What audio formats (and codecs) can Aurix audio miner work with?

A. Files in any format supported by Windows Media Player 9 can be managed off-line. Special codecs can also be supported by partner applications.

Q. Do the index files need to be compressed further to save on storage?

A. The index files are already compressed when they are generated, so no further compression is necessary.

Q. How fast can audio files be ingested?

A. Ingestion runs at up to 80 times faster than real time on a modern PC such as an Intel Core 2 Duo E6600. This means that, theoretically, one PC could ingest around 80 separate audio feeds simultaneously.

Q. How fast can searches be conducted?

A. A search will run up to 30,000 times faster than real time on a modern PC such as an Intel Core 2 Duo E6600. This means that, theoretically, one PC could search 8 hours of audio data in as little as 1 second.
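The theoretical figures above follow from simple arithmetic: at N times real time, one second of processing handles N seconds of audio, so 8 hours (28,800 seconds) at 30,000 times real time takes 0.96 seconds. The sketch below makes this explicit; it assumes speed scales linearly and ignores disk I/O, decoding and scheduling overhead, so real capacity will be lower. The function names are illustrative.

```python
import math

SECONDS_PER_HOUR = 3600


def wall_clock_seconds(audio_hours: float, speed_factor: float) -> float:
    """Processing time for `audio_hours` of audio at `speed_factor` x real time."""
    return audio_hours * SECONDS_PER_HOUR / speed_factor


def max_live_feeds(speed_factor: float) -> int:
    """Theoretical number of real-time feeds one PC can keep up with."""
    # Each live feed needs 1/speed_factor of the machine's capacity.
    return math.floor(speed_factor)


print(max_live_feeds(80))             # ingestion at 80x real time
print(wall_clock_seconds(8, 30_000))  # searching 8 hours at 30,000x real time
```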

Q. How is the search speed affected by the length of the search string?

A. Search speed drops slightly with search phrase length.

Q. How is the retrieval accuracy affected by the length of the search string?

A. Accuracy increases with search string length. This is a key benefit of Aurix phonetic search technology.


Aurix Audio Mining Technology: Accuracy

Q. What level of accuracy can Aurix audio miner engine achieve?

A. Accuracy depends on a number of factors, including audio quality, search term length and partner application. Average accuracy levels of between 80% and 99% can be achieved.

Q. How do you measure the engine accuracy?

A. At the engine level, given a representative set of calls and a random selection of search terms of different lengths, the percentage of occurrences that are correctly identified (% recall) is measured as a function of false alarm rate.
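The measurement described above can be sketched as a threshold sweep. This assumes each putative hit has already been labeled correct or incorrect against a hand-marked reference, that no two hits match the same occurrence, and that false alarm rate is expressed per hour of audio; the `Hit` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    score: float   # engine confidence for a putative hit
    correct: bool  # True if it matches a labeled occurrence


def recall_vs_false_alarms(hits, n_occurrences, audio_hours, thresholds):
    """Return (threshold, % recall, false alarms per hour) for each threshold."""
    curve = []
    for th in thresholds:
        kept = [h for h in hits if h.score >= th]
        true_hits = sum(h.correct for h in kept)
        false_alarms = len(kept) - true_hits
        curve.append((th, 100.0 * true_hits / n_occurrences,
                      false_alarms / audio_hours))
    return curve


# Toy data: 4 labeled occurrences in 15 minutes (0.25 h) of speech.
hits = [Hit(0.9, True), Hit(0.8, True), Hit(0.7, False),
        Hit(0.6, True), Hit(0.4, False), Hit(0.3, False)]
for th, recall, fa in recall_vs_false_alarms(hits, 4, 0.25, [0.85, 0.5, 0.2]):
    print(f"threshold {th}: recall {recall:.0f}%, {fa:.0f} false alarms/hour")
```

Lowering the threshold raises recall at the cost of more false alarms, which is the trade-off an application chooses when it picks its operating point.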

Q. How many calls have to be analyzed to guarantee this level of accuracy?

A. The larger the quantity of representative material that is available, the more reliable the figures will be. A reasonable estimate of accuracy can be obtained based on a total of around 15 minutes of speech.

Q. How does search word length affect the accuracy of the output?

A. Longer search terms typically give better accuracy.

Q. What is your average false positive rate?

A. Aurix audio miner allows a trade-off between false positives and recall rate. This flexibility is passed to the application so that it can choose the optimal setting.

Q. How do you reduce your rate of false positives?

A. There are several techniques that can be used within the application layer to filter out false positives, including choosing long search terms, combining the results for multiple search terms, and incorporating other sources of information, e.g. metadata from CTI, IVR and CRM systems in call center applications.
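As an example of combining the results for multiple search terms, an application might accept a hit on one term only when a related term is found nearby in the same call. The sketch below assumes hits arrive as (term, time, score) records; the `Hit` type, the `confirm` function and the 10-second window are hypothetical choices, not part of the Aurix API.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    term: str
    time: float   # seconds from the start of the call
    score: float  # engine confidence


def confirm(hits, primary, supporting, window=10.0):
    """Keep hits on `primary` that have a hit on any `supporting` term
    within `window` seconds, either before or after."""
    support_times = [h.time for h in hits if h.term in supporting]
    return [h for h in hits
            if h.term == primary
            and any(abs(h.time - t) <= window for t in support_times)]


hits = [Hit("cancel", 12.0, 0.70),
        Hit("account", 15.5, 0.60),
        Hit("cancel", 80.0, 0.65)]  # no supporting term nearby
print(confirm(hits, "cancel", {"account", "subscription"}))
```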


Aurix Audio Mining Technology: Real-time Capability

Q. Can audio feeds be analyzed in real time?

A. Yes! This is a standard feature of Aurix audio miner.

Q. How quickly can the hits be processed and returned to the Partner application?

A. The core Aurix audio miner engine search lag is only about 2 seconds.

Utilizing Aurix Audio Mining Technology in Partner Applications

Q. How would a Partner Application communicate with Aurix audio miner?

A. Aurix audio miner has an application programming interface (API) that passes the words and phrases to be searched from the Partner application to Aurix audio miner. On completion of the search, the hits are returned to the Partner application.

Q. Can a Partner application search for topics and concepts?

A. Typically the Partner application would conduct a search with a selection of words and phrases that it might combine with Boolean logical expressions and other factors such as call time. The logic to combine multiple searches would be performed by the Partner application.
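A minimal sketch of such application-side logic follows. It assumes the Partner application already holds, for each call, the set of terms whose hits passed its chosen threshold, plus the call start time from its own metadata. All names here (`Call`, `term`, `all_of`, `any_of`, `not_`, `during_hours`) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class Call:
    call_id: str
    start: datetime
    found: set[str] = field(default_factory=set)  # terms with accepted hits


Query = Callable[[Call], bool]


def term(t: str) -> Query:
    return lambda c: t in c.found


def all_of(*qs: Query) -> Query:
    return lambda c: all(q(c) for q in qs)


def any_of(*qs: Query) -> Query:
    return lambda c: any(q(c) for q in qs)


def not_(q: Query) -> Query:
    return lambda c: not q(c)


def during_hours(first: int, last: int) -> Query:
    """Call started in an hour from `first` to `last`, inclusive."""
    return lambda c: first <= c.start.hour <= last


# "cancel" AND ("refund" OR "money back") AND NOT "thank you", office hours
complaint = all_of(term("cancel"),
                   any_of(term("refund"), term("money back")),
                   not_(term("thank you")),
                   during_hours(9, 17))

calls = [
    Call("A", datetime(2008, 3, 3, 10, 15), {"cancel", "refund"}),
    Call("B", datetime(2008, 3, 3, 11, 0), {"cancel", "refund", "thank you"}),
    Call("C", datetime(2008, 3, 3, 20, 30), {"cancel", "money back"}),
]
print([c.call_id for c in calls if complaint(c)])
```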

Q. How many words/terms can an application simultaneously search?

A. There is no theoretical limit. In practice, the maximum number of terms is dictated by the amount of time and CPU power available.

Q. How lengthy a phrase can be searched?

A. There is no hard limit on the length of a word or phrase.


Aurix asr

Q. What type of core technology does Aurix asr use?

A. Aurix asr is a continuous speech recognizer that uses continuous density sub-word Hidden Markov Models. It is primarily designed and optimized for small- to medium-vocabulary equipment control applications.

Q. How does this technology work?

A. The speech signal is sampled digitally (if not already in digital form) and then processed to form a stream of speech vectors that are acoustically representative of the speech sounds. The speech vectors are then compared, using a pattern matching technique, with a set of speech models in conjunction with a word grammar (or “syntax”) to determine the words spoken. The pattern matching technique is statistically based, utilizing Hidden Markov Models (HMMs).
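A toy version of this pattern matching is sketched below: each word is a left-to-right HMM whose states have Gaussian output densities, and the Viterbi algorithm finds each model's best-scoring path through the speech vectors. To keep it short, it assumes one-dimensional speech vectors, fixed transition probabilities, and a plain list of words standing in for what the grammar allows; real recognizers use multi-dimensional feature vectors and sub-word models, and none of this is Aurix asr's implementation.

```python
import math


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_score(obs, states, p_stay=0.6):
    """Best log-likelihood of `obs` through a left-to-right HMM.

    `states` is a list of (mean, variance) pairs. At each frame a state
    either repeats (probability p_stay) or advances to the next state; the
    path must start in the first state and finish in the last.
    """
    log_stay, log_next = math.log(p_stay), math.log(1 - p_stay)
    n = len(states)
    score = [float("-inf")] * n
    score[0] = log_gauss(obs[0], *states[0])
    for x in obs[1:]:
        new = []
        for j in range(n):
            best = score[j] + log_stay
            if j > 0:
                best = max(best, score[j - 1] + log_next)
            new.append(best + log_gauss(x, *states[j]))
        score = new
    return score[-1]


def recognize(obs, models, allowed):
    """Return the grammar-permitted word whose model best matches `obs`."""
    return max(allowed, key=lambda w: viterbi_score(obs, models[w]))


# Hypothetical two-state word models over a 1-D feature.
models = {"go": [(1.0, 0.1), (3.0, 0.1)],
          "stop": [(3.0, 0.1), (1.0, 0.1)]}
obs = [1.1, 0.9, 2.9, 3.1]
print(recognize(obs, models, ["go", "stop"]))
```

Restricting `allowed` is where the grammar helps: the fewer words permitted at a given point, the fewer models can be confused with one another.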

Q. How are the speech models prepared?

A. Normally a sample of speech from several hundred speakers speaking phrases from the target application is recorded, segmented and then used to generate models of the individual phonemes in the language (there are typically 44 phonemes in U.K. English for example). For general use and initial evaluations, Aurix asr is supplied with a generic set of speech models.

Q. How is a grammar or "syntax" generated?

A. Aurix asr uses a finite state grammar based on the Augmented BNF (ABNF) form of the W3C Speech Recognition Grammar Specification. More details on this can be found at: http://www.w3.org/TR/2002/CR-speech-grammar-20020626. The grammar captures the words (or “vocabulary”) and word order found in the phrases used in the application. Grammars are easily entered using a text editor and are then compiled using a supplied BNF grammar compiler. The recognizer loads the grammar (as well as the speech models) at run time.
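The role of the compiled grammar can be illustrated with a small finite state network in which each state lists the words allowed next. The network and the ABNF rule in the comment are hypothetical examples, not output from the Aurix grammar compiler; `allowed_words` shows the set of words the recognizer would have to distinguish at a given point in the syntax.

```python
# Hypothetical compiled grammar: state -> {word: next_state}. It corresponds
# roughly to an ABNF rule such as
#   $command = radio (channel (one | two | three) | volume (up | down));
GRAMMAR = {
    0: {"radio": 1},
    1: {"channel": 2, "volume": 3},
    2: {"one": 4, "two": 4, "three": 4},
    3: {"up": 4, "down": 4},
}
FINAL = {4}


def allowed_words(state: int) -> set[str]:
    """Vocabulary the recognizer must consider at this point in the syntax."""
    return set(GRAMMAR.get(state, {}))


def accepts(words: list[str]) -> bool:
    """True if the word sequence is a complete phrase of the grammar."""
    state = 0
    for w in words:
        nxt = GRAMMAR.get(state, {}).get(w)
        if nxt is None:
            return False
        state = nxt
    return state in FINAL


print(accepts(["radio", "channel", "two"]))  # a valid command
print(accepts(["radio", "volume", "two"]))   # wrong word order for the syntax
```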

Q. Can the speech recognizer operate in noisy environments?

A. Aurix asr was specifically designed to operate in demanding military environments and employs proprietary noise tracking technology that delivers robust performance in the presence of high levels of interfering noise. It also employs a patented unsupervised adaptation technique (Spectral Shape Adjustment) that further improves its resilience to performance degradation in the presence of noise.

Q. How big a vocabulary can Aurix asr work with?

A. Theoretically, there is no upper vocabulary limit, but in practice, for a command and control application in a demanding environment, it would not be wise to allow more than a few dozen words at any single point in the syntax, depending on how confusable they are.

Q. How does Aurix asr cope with different speakers' voices?

A. The speech models are speaker-independent; that is, they are made from the speech of many different people and incorporate the variation of the spoken sounds in their statistics. Additionally, a patented unsupervised adaptation technique, Spectral Shape Adjustment (SSA), continually compensates for long-term spectral differences between individual speakers and the speech model set.

Q. What type of microphone can be used?

A. The speech models supplied are for a microphone with a bandwidth of 100 Hz to 6 kHz. Model sets for reduced bandwidths are also available. Aurix asr is also generally robust to changes of microphone and microphone position, because the SSA unsupervised adaptation process can compensate for changes in channel characteristics.

Q. What platform can Aurix asr run on?

A. Aurix asr is a software speech recognizer coded in C++. It normally runs on a PC under Windows 2000/XP. It can be ported to any environment that has a C++ compiler.

Q. What sort of API (applications programming interface) is provided?

A. The API is a proprietary C++ interface and includes a press-to-recognize (PTR) signal.


Aurix aligner

Q. How does Aurix aligner work?

A. Aurix aligner is designed to automatically align the speech within an audio or video file with a text transcript of the speech, so that each item of text can be tagged with the time at which it is spoken in the speech file.

Q. What type of core technology does Aurix aligner use?

A. Aurix aligner uses the same technology as Aurix asr (continuous density sub-word Hidden Markov Models) to identify the start and end times of each word in the text file relative to the audio file (or audio track in a video file).

Q. How does this technology work?

A. A specialized recognition grammar is generated to represent the sequence of words in the transcript. Aurix asr then uses this grammar for recognition, and the timings at the word boundaries are noted. Within certain limits, errors in the transcript can be accommodated using special features of this grammar.
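Stripped of the recognizer itself, this reduces to a forced-alignment dynamic program: the audio frames are split into consecutive segments, one per transcript word, so as to maximize the total score. The sketch assumes per-frame log-likelihoods for each transcript word are already available and a 10 ms frame shift; it does not model the transcript-error handling mentioned above, and the function name is hypothetical.

```python
def force_align(frame_logp, n_words, frame_shift=0.01):
    """Split the frames into n_words consecutive segments (each >= 1 frame).

    frame_logp[t][k] is an assumed log-likelihood that frame t belongs to
    word k of the transcript. Returns (start_s, end_s) for each word.
    """
    T = len(frame_logp)
    if T < n_words:
        raise ValueError("need at least one frame per word")
    NEG = float("-inf")
    # score[t][k]: best total for frames 0..t with frame t inside word k
    score = [[NEG] * n_words for _ in range(T)]
    back = [[0] * n_words for _ in range(T)]  # word at frame t-1 on best path
    score[0][0] = frame_logp[0][0]
    for t in range(1, T):
        for k in range(n_words):
            stay = score[t - 1][k]
            enter = score[t - 1][k - 1] if k > 0 else NEG
            back[t][k] = k if stay >= enter else k - 1
            score[t][k] = max(stay, enter) + frame_logp[t][k]
    # Trace back from the last frame, which must lie in the last word.
    labels = [0] * T
    k = n_words - 1
    for t in range(T - 1, -1, -1):
        labels[t] = k
        k = back[t][k]
    spans = []
    for w in range(n_words):
        frames = [t for t, lab in enumerate(labels) if lab == w]
        spans.append((round(frames[0] * frame_shift, 3),
                      round((frames[-1] + 1) * frame_shift, 3)))
    return spans


# Two transcript words; the first fits frames 0-1, the second frames 2-4.
print(force_align([[-1, -5], [-1, -5], [-5, -1], [-5, -1], [-5, -1]], 2))
```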

Q. Why does Aurix aligner work well with an audio file that exhibits poor sound quality?

A. Aurix asr was originally developed to work with poor quality speech in military applications and has several features that deliver robust performance in the presence of noise. These include a noise tracking feature and a patented unsupervised adaptation technique (Spectral Shape Adjustment). Aurix aligner includes additional mechanisms specific to the alignment task to boost the performance even further.

Q. How quickly can files be processed?

A. Aurix aligner can process files very quickly: a typical recording will be processed more than 6 times faster than real time on a modern PC.

Q. What file formats does Aurix aligner support?

A. A range of different input media formats are supported, including MPG, AVI and Microsoft® Windows Media Format 9 files. Scripts are provided to the system in either plain text or XML format.

The output timings associated with each word are presented in an easily interpretable XML format.
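For illustration, word timings like these could be produced with Python's standard `xml.etree.ElementTree` module. The `alignment` and `word` element names and the `start`/`end` attributes (in seconds) are assumptions made for this sketch; the actual Aurix aligner output schema is defined by the SDK.

```python
import xml.etree.ElementTree as ET


def timings_to_xml(words, spans):
    """Serialize (word, (start, end)) pairs as XML.

    The element and attribute names are hypothetical, not the Aurix schema.
    """
    root = ET.Element("alignment")
    for word, (start, end) in zip(words, spans):
        el = ET.SubElement(root, "word", start=f"{start:.2f}", end=f"{end:.2f}")
        el.text = word
    return ET.tostring(root, encoding="unicode")


print(timings_to_xml(["hello", "world"], [(0.0, 0.42), (0.42, 0.95)]))
```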

Q. What resources does this require?

A. The Aurix aligner SDK will run on a PC running Windows 2000/XP. 


Aurix speech detector

Q. What type of core technology does Aurix speech detector use?

A. Aurix speech detector is designed to identify any speech-type sound in an audio recording. It does this by utilizing the same technology as Aurix asr (continuous density sub-word Hidden Markov Models) to identify regions of speech and noise (non-speech).

Q. How does this technology work?

A. The speech signal is compared with models of both speech and noise and the best match is used to segment regions of speech and noise. 
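The segmentation step can be sketched as a two-state Viterbi search over per-frame scores from a speech model and a noise model. The sketch assumes those per-frame log-likelihoods are already computed and adds a fixed penalty for switching between speech and noise, so that brief dips do not split a speech passage; the penalty value and the function name are hypothetical, not Aurix speech detector parameters.

```python
def segment(ll_speech, ll_noise, switch_penalty=5.0):
    """Label each frame 'speech' or 'noise' with a two-state Viterbi search.

    ll_speech[t] and ll_noise[t] are per-frame log-likelihoods under speech
    and noise models (assumed given). Changing label costs `switch_penalty`.
    Returns a list of (label, first_frame, last_frame).
    """
    labels = ("speech", "noise")
    ll = list(zip(ll_speech, ll_noise))
    score = list(ll[0])
    back = []  # back[i][s]: best state at frame i given state s at frame i+1
    for frame in ll[1:]:
        ptr, new = [], []
        for s in range(2):
            stay, switch = score[s], score[1 - s] - switch_penalty
            ptr.append(s if stay >= switch else 1 - s)
            new.append(max(stay, switch) + frame[s])
        back.append(ptr)
        score = new
    # Trace back the best state sequence from the last frame.
    s = 0 if score[0] >= score[1] else 1
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    # Merge runs of identical labels into segments.
    segments = []
    for t, s in enumerate(path):
        if segments and segments[-1][0] == labels[s]:
            segments[-1] = (labels[s], segments[-1][1], t)
        else:
            segments.append((labels[s], t, t))
    return segments


ll_speech = [-5, -5, -1, -1, -6, -1, -1, -5, -5, -5]
ll_noise = [-1, -1, -5, -5, -1, -5, -5, -1, -1, -1]
print(segment(ll_speech, ll_noise))
```

In the example, frame 4 briefly favors the noise model but stays inside the speech segment, because switching out and back in would cost more than it gains; with `switch_penalty=0` the same frame would be labeled as noise.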

Q. Why does this work well in the presence of different types of noise?

A. Using a statistically based speech recognition algorithm to discern regions of speech and noise is much more powerful than simpler, traditional filter- and energy-based routines. The algorithm is tuned to detect some of the fundamental and inherent characteristics of speech, which makes it highly resistant to false alarms from structured sounds such as music as well as from unstructured noise. This also makes it ideal for robust performance in military applications.

Q. Is Aurix speech detector language dependent?

A. Aurix speech detector uses models that have been trained from speech in many languages and is thus language-independent.

Q. What resources does this require?

A. The Aurix speech detector SDK will run on a PC running Windows 2000/XP. On a 1.5 GHz Pentium 3, it will typically run more than 30 times faster than real time.

© 2008 Aurix Limited. All rights reserved.