Monday, January 16, 2012

Speaking of Speech Recognition... Check out Julius, the OSS high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder

"Julius" is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. Based on word N-grams and context-dependent HMMs, it can perform almost real-time decoding on most current PCs in a 60k-word dictation task. Major search techniques are fully incorporated, such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, and Gaussian selection. Besides search efficiency, it is also carefully modularized to be independent of model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted so that it can work with other free modeling toolkits such as HTK and the CMU-Cam SLM toolkit.
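To give a feel for what beam search means in a decoder like this, here is a minimal Python sketch of generic score-based beam pruning. It is not Julius's enveloped beam search and does not use any Julius API; the function names (`beam_prune`, `beam_search`) and the toy per-step word scores are assumptions made up for illustration.

```python
def beam_prune(hypotheses, beam_width):
    """Keep hypotheses whose log score is within beam_width of the best one."""
    if not hypotheses:
        return []
    best = max(score for _, score in hypotheses)
    return [(words, score) for words, score in hypotheses
            if score >= best - beam_width]


def beam_search(frames, beam_width):
    """Toy decoder. frames is a list of dicts mapping word -> log-probability.

    Hypothetical input format, chosen only to keep the example small.
    """
    hypotheses = [((), 0.0)]  # start with one empty hypothesis
    for candidates in frames:
        # Extend every surviving hypothesis with every candidate word.
        extended = [
            (words + (word,), score + logp)
            for words, score in hypotheses
            for word, logp in candidates.items()
        ]
        # Drop anything that has fallen too far behind the current best.
        hypotheses = beam_prune(extended, beam_width)
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    toy_frames = [{"a": -0.1, "b": -2.0}, {"c": -0.2, "d": -3.0}]
    for words, score in beam_search(toy_frames, beam_width=1.0):
        print(" ".join(words), round(score, 3))
```

A narrower beam discards more hypotheses at each step, which is faster but risks pruning the path that would have won in the end; that trade-off is what the search parameters of a real decoder tune.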

The main platforms are Linux and other Unix workstations, and it also works on Windows. The most recent version is developed on Linux and Windows (cygwin / mingw), and a Microsoft SAPI version is also available. Julius is distributed under an open license together with its source code.

...

Features

    Open-source software (see terms and conditions of license)
    Real-time, high-speed, accurate recognition based on a 2-pass strategy.
    Low memory requirement: less than 32MBytes required for work area (<64MBytes for 20k-word dictation with on-memory 3-gram LM).
    Supports N-gram, grammar, and isolated-word LMs.
    Language and unit-dependent: Any LM in ARPA standard format and AM in HTK ascii hmmdefs format can be used.
    Highly configurable: various search parameters can be set. Alternate decoding algorithms (1-best/word-pair approx., word trellis/word graph intermediates, etc.) can also be chosen.
    Full source code documentation and manual in English / Japanese.
    List of major supported features:
        On-the-fly recognition for microphone and network input
        GMM-based input rejection
        Successive decoding, delimiting input by short pauses
        N-best output
        Word graph output
        Forced alignment on word, phoneme, and state level
        Confidence scoring
        Server mode and control API
        Many search parameters for tuning its performance
        Character code conversion for result output.
        (Rev. 4) Engine becomes a library and offers a simple API
        (Rev. 4) Long N-gram support
        (Rev. 4) Run with forward / backward N-gram only
        (Rev. 4) Confusion network output
        (Rev. 4) Arbitrary multi-model decoding in a single thread.
        (Rev. 4) Rapid isolated word recognition
        (Rev. 4) User-defined LM function embedding
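
Since the feature list says any LM in ARPA standard format can be used, here is a rough Python sketch of what reading such a file looks like. It relies only on the commonly documented ARPA text layout (a `\data\` header, `\N-grams:` sections with base-10 log probabilities and optional backoff weights, and a closing `\end\`). The function name `parse_arpa` and the tiny sample model are invented for illustration and have nothing to do with Julius's own loader.

```python
def parse_arpa(text):
    """Parse ARPA-format text into {order: {ngram_tuple: (logprob, backoff)}}."""
    models = {}
    order = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, the \data\ marker, and the "ngram N=count" header lines.
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:-len("-grams:")])  # "\2-grams:" -> 2
            models[order] = {}
            continue
        if order is None:
            continue  # free-form text before the first section
        fields = line.split()
        logprob = float(fields[0])            # log10 probability
        words = tuple(fields[1:1 + order])    # the N words of the entry
        # The backoff weight is optional and comes after the words.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        models[order][words] = (logprob, backoff)
    return models


# Made-up sample model, for illustration only.
SAMPLE = """\\data\\
ngram 1=3
ngram 2=1

\\1-grams:
-1.0 <s> -0.5
-0.7 hello -0.3
-0.9 </s>

\\2-grams:
-0.2 <s> hello

\\end\\
"""

if __name__ == "__main__":
    lm = parse_arpa(SAMPLE)
    print(lm[1][("hello",)])
    print(lm[2][("<s>", "hello")])
```

This keeps everything in plain dictionaries, which is fine for poking at a small model but would not scale to the 20k or 60k-word vocabularies mentioned above.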


Read more: Greg's Cool [Insert Clever Name] of the Day
Read more: Open-Source Large Vocabulary CSR Engine Julius

Posted via email from Jasper-Net