KINDAI UNIVERSITY



YAMAMOTO Hirofumi

Profile

Faculty: Department of Informatics / Graduate School of Science and Engineering Research
Position: Professor
Commentator Guide: https://www.kindai.ac.jp/meikan/449-yamamoto-hirofumi.html
Last Updated: 2020/09/30

Education and Career

Academic & Professional Experience

  •   2009 - 2010: National Institute of Information and Communications Technology

Research Activities

Research Areas

  • Informatics, Intelligent informatics

Misc

  • Translation quality analysis of Japanese-English translation sentence based on statistical translation evaluation criterion, TSUBAKI Hajime, YASUDA Keiji, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, IPSJ SIG Notes, 2008, 33, 99, 103,   2008 03 28 , http://ci.nii.ac.jp/naid/110006825050
    Summary:The possibility of objectively rating Japanese learners' Japanese-to-English translations was examined using evaluation criteria for statistical machine translation. Two statistics were designed for the evaluation. One index was an N-gram ratio between a learner's translation and a correct sentence, measuring how English-like each sentence is. The other was a ratio of word translation probabilities between the two sentences, measuring their characteristics as translations. The correspondence between subjective evaluation scores for the Japanese-to-English translations and these statistics was analyzed. For the translation-characteristics index, a clear correlation was observed. The result suggests that machine translation evaluation criteria can be applied to objective translation evaluation.
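The N-gram-ratio index described above can be sketched as a small comparison of candidate and reference N-grams. This is a minimal illustration, not the authors' exact formulation; the function name, tokenization, and example sentences are invented for this sketch.

```python
from collections import Counter

def ngram_overlap(candidate, reference, n=2):
    """Fraction of the candidate's n-grams that also occur in the reference
    (clipped counts, so repeated n-grams are not over-credited)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    if not cand:
        return 0.0
    matched = sum(min(c, ref[g]) for g, c in cand.items())
    return matched / sum(cand.values())

# 3 of the candidate's 5 bigrams appear in the reference -> 0.6
print(ngram_overlap("the hotel is near the station".split(),
                    "the hotel is close to the station".split()))
```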
  • Speech recognition of unregistered expressions, TOMITA Tatsuhiko, OKIMOTO Yoshiyuki, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, IPSJ SIG Notes, 2005, 69, 117, 122,   2005 07 16 , http://ci.nii.ac.jp/naid/110002952523
    Summary:Aiming at speech recognition with arbitrary OOV expressions, speech recognition experiments were carried out using speech containing OOV expressions that consist of multiple words. For a TV program retrieval task, a hierarchical language model was newly composed using conventional word-class N-grams as an upper-layer model and two lower-layer models: word N-grams for one class of OOV expressions (TV program names) and a statistical phonotactic word-structure model for OOV words of another class (personal names). Speech recognition results showed reasonable performance of the word N-grams for OOV expressions and no serious interference between the two OOV models, confirming the viability of a hierarchical OOV model with word-level statistics.
  • Out-of-Vocabulary Word Recognition with a Hierarchical Language Model Using Multiple Markov Model, YAMAMOTO Hirofumi, KOKUBO Hiroaki, KIKUI Genichiro, OGAWA Yoshihiko, SAGISAKA Yoshinori, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 87, 12, 2104, 2111,   2004 12 01 , http://ci.nii.ac.jp/naid/110003203161
    Summary:This paper proposes a language model that addresses the problem of task-dependent unregistered (out-of-vocabulary) words. Language model adaptation is the usual way to fit a model to a new task, but it cannot handle OOV words such as proper nouns that appear depending on the task. This paper solves the problem with a hierarchical language model, which uses two independent Markov models as constraints: one giving word-to-word transition probabilities and one giving the phone-sequence generation probabilities of OOV words; the occurrence probability of an OOV word is expressed as a double Markov model combining the two. Speech recognition experiments on Japanese appointment-scheduling dialogue data gave a word accuracy of 86.7% on sentences containing one or more OOV words, versus 78.2% when no OOV handling was applied, recovering 34.4% of the recognition errors and confirming the method's effectiveness.
  • Factoring Table Approximation and Garbage Collection of Useless Word Hypotheses for Continuous Speech Recognition, KOKUBO Hiroaki, HAYASHI Teruaki, YAMAMOTO Hirofumi, KIKUI Genichiro, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 86, 6, 787, 795,   2003 06 01 , http://ci.nii.ac.jp/naid/110003170940
    Summary:To reduce the memory footprint of a continuous speech recognition system, we examined reducing the number of word hypotheses registered in the word graph and the memory of the factoring table of the tree-structured lexicon. By attaching to each word hypothesis in the word graph an attribute counting its successor hypotheses, word hypotheses made useless by pruning are identified efficiently and garbage-collected; this reduced the memory needed for word-hypothesis generation from 127 MB to 6.9 MB. In addition, approximating the bigram values stored in the factoring table with POS bigrams reduced the factoring table from 56 MB to 19 MB with almost no degradation in recognition performance. Together these reductions cut the decoder's memory consumption from 246 MB to 113 MB.
  • Statistical Language Model Adaptation with Additional Text Generated by Machine Translation, NAKAJIMA Hideharu, YAMAMOTO Hirofumi, WATANABE Taro, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 86, 4, 460, 467,   2003 04 01 , http://ci.nii.ac.jp/naid/110003170908
    Summary:Task adaptation of a statistical language model requires a small corpus of the target task written in the same language as the model. However, even such a small corpus, especially of spoken language, can be costly to collect. To solve this problem, this paper proposes machine-translating a target-task corpus written in another language into the language of the model to be adapted, and using the translation output as the target-task corpus for adaptation. The word-adjacency information needed by the statistical language model, which is assumed to be preserved in the translation knowledge, is thereby extracted through translation and exploited for adaptation. In experiments on travel conversation sentences evaluated by word perplexity, the model adapted by the proposed method achieved roughly half the improvement obtained when adapting with a hand-written corpus, confirming the effectiveness of this new approach to generating adaptation corpora.
  • Integration of speech recognition results and rejection of error utterance based on ROVER method, KOKUBO Hiroaki, YAMAMOTO Hirofumi, KIKUI Genichiro, 日本音響学会研究発表会講演論文集, 2003, 1, 91, 92,   2003 03 18 , http://ci.nii.ac.jp/naid/10018034938
  • Combining FSA's and N-grams for improving sentence recognition rate in LVCSR, ONISHI Shigehiko, KIKUI Genichiro, YAMAMOTO Hirofumi, 日本音響学会研究発表会講演論文集, 2003, 1, 203, 204,   2003 03 18 , http://ci.nii.ac.jp/naid/10018035227
  • Combining Outputs of Multiple LVCSR Models : Evaluation on Travel Conversational Speech, WATANABE Tomohiro, YAMAMOTO Hirofumi, KOKUBO Hiroaki, KIKUI Genichiro, NISHIZAKI Hiromitsu, KODAMA Yasuhiro, UTSURO Takehito, NAKAGAWA Seiichi, 日本音響学会研究発表会講演論文集, 2003, 1, 209, 210,   2003 03 18 , http://ci.nii.ac.jp/naid/10018035250
  • Mis - recognized Utterance Detection Using Multiple Language Models Generated by Clustered Sentences, FUJINAGA Katsuhisa, KOKUBO Hiroaki, YAMAMOTO Hirofumi, KIKUI Genichiro, SHIMODAIRA Hiroshi, IPSJ SIG Notes, 2002, 121, 171, 176,   2002 12 16 , http://ci.nii.ac.jp/naid/110002913696
    Summary:In this paper, we propose a new method that detects mis-recognized utterances based on a voting scheme like ROVER. ROVER has two serious problems: 1) it is difficult to construct multiple speech recognition systems (SRSs), and 2) the calculation cost increases with the number of SRSs. In contrast to conventional ROVER, the proposed method uses multiple language models (LMs), a general LM and sub LMs generated from clustered sentences, instead of different SRSs. Speech recognition with the sub LMs is performed by rescoring instead of parallel decoding. In experiments, the proposed method achieved 18-point higher precision with a 10% loss of recall from the baseline, and 22-point higher precision with a 20% loss of recall.
  • Mis-recognized Utterance Detection Using Multiple Language Models Generated by Clustered Sentences, FUJINAGA Katsuhisa, KOKUBO Hiroaki, YAMAMOTO Hirofumi, KIKUI Genichiro, SHIMODAIRA Hiroshi, IEICE technical report. Natural language understanding and models of communication, 102, 528, 7, 12,   2002 12 13 , http://ci.nii.ac.jp/naid/110003277982
    Summary:In this paper, we propose a new method that detects mis-recognized utterances based on a voting scheme like ROVER. ROVER has two serious problems: 1) it is difficult to construct multiple speech recognition systems (SRSs), and 2) the calculation cost increases with the number of SRSs. In contrast to conventional ROVER, the proposed method uses multiple language models (LMs), a general LM and sub LMs generated from clustered sentences, instead of different SRSs. Speech recognition with the sub LMs is performed by rescoring instead of parallel decoding. In experiments, the proposed method achieved 18-point higher precision with a 10% loss of recall from the baseline, and 22-point higher precision with a 20% loss of recall.
  • Mis-recognized Utterance Detection Using Multiple Language Models Generated by Clustered Sentences, FUJINAGA Katsuhisa, KOKUBO Hiroaki, YAMAMOTO Hirofumi, KIKUI Genichiro, SHIMODAIRA Hiroshi, IEICE technical report. Speech, 102, 530, 7, 12,   2002 12 13 , http://ci.nii.ac.jp/naid/110003295598
    Summary:In this paper, we propose a new method that detects mis-recognized utterances based on a voting scheme like ROVER. ROVER has two serious problems: 1) it is difficult to construct multiple speech recognition systems (SRSs), and 2) the calculation cost increases with the number of SRSs. In contrast to conventional ROVER, the proposed method uses multiple language models (LMs), a general LM and sub LMs generated from clustered sentences, instead of different SRSs. Speech recognition with the sub LMs is performed by rescoring instead of parallel decoding. In experiments, the proposed method achieved 18-point higher precision with a 10% loss of recall from the baseline, and 22-point higher precision with a 20% loss of recall.
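The ROVER-style voting these three reports build on can be sketched as word-level majority voting over recognizer outputs. This is a toy illustration under the simplifying assumption that the hypotheses are already aligned to equal length (real ROVER first aligns them with dynamic programming); the function name and example words are invented.

```python
from collections import Counter

def rover_vote(aligned_hypotheses):
    """Choose the most frequent word at each aligned position."""
    return [Counter(words).most_common(1)[0][0]
            for words in zip(*aligned_hypotheses)]

hyps = [["book", "a", "room", "please"],
        ["book", "the", "room", "please"],
        ["look", "a", "room", "please"]]
print(rover_vote(hyps))  # ['book', 'a', 'room', 'please']
```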
  • A Language Model Adaptation Considering Both of Topic and Sentence Style Variances, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 85, 8, 1284, 1290,   2002 08 01 , http://ci.nii.ac.jp/naid/110003184247
    Summary:This paper proposes a language model adaptation method that treats as adaptation targets not only differences of topic in dialogue but also differences of sentence style caused by the speaker's role. The adapted model is built from two different source datasets: one matching the target only in topic and one matching only in sentence style. Applying a newly proposed word clustering method to these two sources extracts topic-dependent and style-dependent words separately. In this clustering, a word's connectivity in each source corpus is regarded as a distinct attribute, and clustering on both attributes simultaneously separates topic-dependent, style-dependent, and topic/style-independent words into different word classes. A class N-gram based on these classes is then adapted with a small amount of target data matched in both topic and style. Because the adaptation is class-N-gram based, it is also effective for events unobserved in the target data, where conventional word-N-gram-based adaptation fails, so effective adaptation is possible from little target data. In experiments, the proposed method gave 14.8% lower perplexity and an 8.7% lower error rate in continuous word recognition than a conventional word-N-gram-based method handling only one adaptation factor, and 4.5% lower perplexity and a 3.5% lower error rate than the class-N-gram-based counterpart, confirming its effectiveness.
  • Efficient Decoding Method for OOV Words Recognition with Subword Models, KOKUBO Hiroaki, ONISHI Shigehiko, YAMAMOTO Hirofumi, KIKUI Genichiro, Transactions of Information Processing Society of Japan, 43, 7, 2082, 2090,   2002 07 15 , http://ci.nii.ac.jp/naid/110002771189
    Summary:Class-dependent subword models were found to be effective for recognizing OOV (out-of-vocabulary) words. This paper proposes a novel decoder that efficiently handles these models. Compared with the previous decoder, the proposed method reduces the language model size to 1/40 and CPU time by 46% without any deterioration in performance. Then, using the structure of the subword networks, we examine feature parameters of the subword models as applied to Japanese family and personal names. Speech recognition results for OOV words indicate that by using additional characteristics (e.g., duration or word-final occurrence probability), the number of correctly recognized OOV words was improved by about 15%.
  • Structured Language Modeling and Its Implementation, YAMAMOTO Hirofumi, ONISHI Shigehiko, KOKUBO Hiroaki, SAGISAKA Yoshinori, IEICE technical report. Speech, 102, 108, 49, 54,   2002 05 24 , http://ci.nii.ac.jp/naid/110003298235
    Summary:A structured language model and its implementation are proposed to resolve the OOV problem in continuous speech recognition. The model consists of two sub-models. The first is an inter-word model, which gives constraints on the OOV context. The second is an intra-word model, which gives constraints on the internal phone structure of an OOV. Together they provide not only the position of an OOV but also its category, and both forms of information are very important for back-end processing, i.e., translation. In our speech recognition experiment on Japanese family and given name OOVs, the proposed method was comparable in performance to the conventional vocabulary method.
  • Reduction of number of word hypotheses using the management function of successive hypotheses, KOKUBO Hiroaki, HAYASHI Teruaki, YAMAMOTO Hirofumi, KIKUI Genichiro, 日本音響学会研究発表会講演論文集, 2002, 1, 163, 164,   2002 03 18 , http://ci.nii.ac.jp/naid/10018033328
  • A Statistical Language Model for Conversational Speech Reflecting the Previous Utterance of the Other Participant, YAMAMOTO Hirofumi, TANIGAKI Koichi, SAGISAKA Yoshinori, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 84, 12, 2507, 2514,   2001 12 01 , http://ci.nii.ac.jp/naid/110003184208
    Summary:We propose a method of constructing language models for speech recognition in speech translation. The proposed model improves recognition performance by reflecting the content of the other participant's immediately preceding utterance. That content is represented in the interlingua commonly used by C-star (The Spoken Language Translation Research Group), and a language model conditioned on this interlingua representation is used. Experiments on hotel reservation dialogues reduced the word error rate by about 5.4% (from 14.7% to 13.9%), confirming the usefulness of the proposed model.
  • The Statistical Language Model for Utterance Splitting in Speech Recognition, NAKAJIMA Hideharu, YAMAMOTO Hirofumi, Transactions of Information Processing Society of Japan, 42, 11, 2681, 2688,   2001 11 15 , http://ci.nii.ac.jp/naid/110002726049
    Summary:In spontaneous dialogs, some utterances contain several sentences. Although speech recognizers process utterances one by one, language processing such as understanding, translation, or summarization needs utterances split into sentences. This paper presents utterance splitting by recognizing periods, i.e., sentence boundaries, as ordinary words. We evaluate the performance of the model in terms of splitting accuracy and word accuracy (excluding periods). Experimental results show high recall/precision rates of splitting (the highest scores are 94%/100%) and no reduction in word accuracy, proving the applicability of the proposed method.
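The splitting scheme above, which recognizes sentence boundaries as ordinary vocabulary items, can be sketched as one pass over the recognizer's token stream. The boundary token name and example tokens are invented for this illustration.

```python
def split_utterance(tokens, boundary="<period>"):
    """Split a recognized token stream into sentences at boundary tokens."""
    sentences, current = [], []
    for tok in tokens:
        if tok == boundary:
            if current:
                sentences.append(current)
            current = []
        else:
            current.append(tok)
    if current:  # keep a trailing sentence with no final boundary
        sentences.append(current)
    return sentences

print(split_utterance(["hello", "<period>",
                       "i", "would", "like", "a", "room", "<period>"]))
```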
  • Hierarchical Language Model Incorporating Probabilistic Description of Vocabulary in Classes, TANIGAKI Koichi, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 84, 11, 2371, 2378,   2001 11 01 , http://ci.nii.ac.jp/naid/110003184194
  • Efficient decoding method for OOV words recognition with sub-word models., KOKUBO Hiroaki, ONISHI Shigehiko, YAMAMOTO Hirofumi, KIKUI Genichiro, 日本音響学会研究発表会講演論文集, 2001, 2, 175, 176,   2001 10 01 , http://ci.nii.ac.jp/naid/10007458441
  • Sub-word modeling of OOVs those arise from two word categories in continuous speech recognition, ONISHI Shigehiko, KOKUBO Hiroaki, YAMAMOTO Hirofumi, KIKUI Genichiro, SAGISAKA Yoshinori, 日本音響学会研究発表会講演論文集, 2001, 2, 183, 184,   2001 10 01 , http://ci.nii.ac.jp/naid/10007458461
  • DECODING WITH SUB-WORD NETWORK MODELS FOR OUT-OF-VOCABULARY WORDS RECOGNITION, Kokubo Hiroaki, Onishi Shigehiko, Yamamoto Hirofumi, Sagisaka Yoshinori, IEICE technical report. Speech, 101, 156, 7, 12,   2001 06 22 , http://ci.nii.ac.jp/naid/110003298006
    Summary:This paper proposes a novel decoder that copes with sub-word models effectively. Sub-word models are devised for the recognition of out-of-vocabulary words. The former implementation, which handles sub-word models as registered words, enlarges the language model and consumes a lot of computational resources. We propose a structured decoding method that applies sub-word network models and makes efficient search possible. Compared with the former implementation, the proposed method achieves a 90% reduction in language model size and a 50% reduction in CPU time without any deterioration in performance.
  • Multi-Class Composite N-gram Language Model Using Multiple Word Clusters and Word Successions, YAMAMOTO Hirofumi, ISOGAI Shuntarou, SAGISAKA Yoshinori, IEICE technical report. Speech, 101, 156, 13, 18,   2001 06 22 , http://ci.nii.ac.jp/naid/110003298008
    Summary:In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid the data-sparseness problem with small amounts of training data. The Multi-Class Composite N-gram maintains accurate word prediction and reliability on sparse data with a compact model size, based on multiple word clusters called Multi-Classes. In a Multi-Class, the statistical connectivity at each position of the N-gram is regarded as a word attribute, and one word cluster is created to represent each positional attribute. Furthermore, by introducing higher-order word N-grams through the grouping of frequent word successions, Multi-Class N-grams are extended to Multi-Class Composite N-grams. In experiments, the Multi-Class Composite N-grams gave 9.5% lower perplexity and a 16% lower word error rate in speech recognition with a 40% smaller parameter size than conventional word 3-grams.
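The class N-gram factorization underlying models of this family splits a word's probability into a class-transition term and a within-class emission term: P(w | w_prev) ≈ P(C(w) | C(w_prev)) · P(w | C(w)). A minimal sketch with invented toy tables (the numbers, words, and class labels are illustrative, not from the paper):

```python
def class_bigram_prob(w_prev, w, word2class, class_trans, emission):
    """P(w | w_prev) ~= P(C(w) | C(w_prev)) * P(w | C(w))."""
    c_prev, c = word2class[w_prev], word2class[w]
    return class_trans.get((c_prev, c), 0.0) * emission.get(w, 0.0)

word2class = {"reserve": "VERB", "book": "VERB", "room": "NOUN", "flight": "NOUN"}
class_trans = {("VERB", "NOUN"): 0.8}          # P(NOUN | VERB)
emission = {"room": 0.5, "flight": 0.5,        # P(word | its class)
            "reserve": 0.6, "book": 0.4}

print(class_bigram_prob("reserve", "room", word2class, class_trans, emission))
```

Sharing class-level transition statistics across all words of a class is what lets such models stay reliable on sparse data while remaining far smaller than word N-grams.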
  • Out-of-vocabulary word modeling by using sub-word units in large-vocabulary continuous speech recognition, Onishi Shigehiko, Kokubo Hiroaki, Yamamoto Hirofumi, Sagisaka Yoshinori, Technical report of IEICE. EA, 101, 31, 33, 39,   2001 04 20 , http://ci.nii.ac.jp/naid/110003284739
    Summary:A structured language model (STLM) is proposed to cope with out-of-vocabulary (OOV) words coming from multiple word classes. The STLM aims at modeling the classes independently, without interference, and at identifying the class of words arising from multiple word classes. It consists of a conventional word-class N-gram and sets of independently trained class-specific sub-word N-grams. We built an experimental language model using the STLM for two similar proper-noun classes and performed speech recognition experiments. The results show that no OOV word of one class is ever misrecognized as one of the other class, demonstrating that the STLM can integrate multiple different statistical language models without interference.
  • Out-of-vocabulary word modeling by using sub-word units in large-vocabulary continuous speech recognition, Onishi Shigehiko, Kokubo Hiroaki, Yamamoto Hirofumi, Sagisaka Yoshinori, IEICE technical report. Speech, 101, 32, 33, 39,   2001 04 20 , http://ci.nii.ac.jp/naid/110003298022
    Summary:A structured language model (STLM) is proposed to cope with out-of-vocabulary (OOV) words coming from multiple word classes. The STLM aims at modeling the classes independently, without interference, and at identifying the class of words arising from multiple word classes. It consists of a conventional word-class N-gram and sets of independently trained class-specific sub-word N-grams. We built an experimental language model using the STLM for two similar proper-noun classes and performed speech recognition experiments. The results show that no OOV word of one class is ever misrecognized as one of the other class, demonstrating that the STLM can integrate multiple different statistical language models without interference.
  • A language model reflected by the relation between phrase structures, JITSUHIRO Takatoshi, YAMAMOTO Hirofumi, YAMADA Setsuo, SAGISAKA Yoshinori, 日本音響学会研究発表会講演論文集, 2001, 1, 195, 196,   2001 03 01 , http://ci.nii.ac.jp/naid/10007456073
  • A Language Model Adaptation Considering Multiple Dimensionality of Domain, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, 日本音響学会研究発表会講演論文集, 2001, 1, 199, 200,   2001 03 01 , http://ci.nii.ac.jp/naid/10007456088
  • A Continuous Speech Recognition System for Conversational Speech, NAITO Masaki, YAMAMOTO Hirofumi, SINGER Harald, NAKAJIMA Hideharu, NAKAMURA Atsushi, SAGISAKA Yoshinori, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 84, 1, 31, 40,   2001 01 01 , http://ci.nii.ac.jp/naid/110003183940
    Summary:We built and evaluated ATRSPREC, a speech recognition system for conversational speech. To make recognition robust against changes in the speaker's speaking style, a major cause of performance degradation in conversational speech recognition, the system uses speaking-style-dependent acoustic models and dynamically selects the optimal acoustic model for each utterance during recognition, realizing online adaptation to speaking-style changes. Recognition performance was evaluated on conversational speech collected through a Japanese-English speech translation system. Analysis of the dialogue data showed that users' speaking style changed as they became accustomed to the system; dynamic selection of speaking-style-dependent acoustic models reduced recognition errors by about 13% compared with using either the spontaneous-speech or the read-speech acoustic model alone, mitigating the degradation caused by speaking-style changes.
  • Word Categorization by MAP Interpolating Part-of-Speech and Word Connectivity Statistics, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 84, 1, 41, 47,   2001 01 01 , http://ci.nii.ac.jp/naid/110003183941
    Summary:This paper proposes a language model that combines a compact model size with accurate next-word prediction, reliability on sparse data, and robustness to task mismatch. Words are clustered according to a new word feature obtained by continuously interpolating word and part-of-speech information via MAP estimation. While reducing the model size by 50% relative to a word N-gram, the model achieves 3% lower perplexity on a task matching the training set and 15% lower on a different task, and reduces the word error rate in continuous word recognition by 16% and 28%, respectively.
  • The efficient method of automatic clustering for Multi-Class Trigrams, ISOGAI Shuntaro, SHIRAI Katsuhiko, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, IPSJ SIG Notes, 34, 221, 225,   2000 12 21 , http://ci.nii.ac.jp/naid/110002917209
    Summary:In this paper, an efficient automatic word clustering method is proposed for Multi-Class Trigrams. The third-position words in the trigrams are directly clustered using 'word trigram approximation by DUAME Language Modeling', so conventional word-history clustering is not required. The Multi-Class Trigrams based on these classes showed better performance in both perplexity and recognition rate than conventional word trigrams. Additionally, the parameter size can be reduced to one percent.
  • The efficient method of automatic clustering for Multi-Class Trigrams, ISOGAI Shuntaro, SHIRAI Katsuhiko, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, IEICE technical report. Natural language understanding and models of communication, 100, 521, 103, 107,   2000 12 15 , http://ci.nii.ac.jp/naid/110003278641
    Summary:In this paper, an efficient automatic word clustering method is proposed for Multi-Class Trigrams. The third-position words in the trigrams are directly clustered using 'word trigram approximation by DUAME Language Modeling', so conventional word-history clustering is not required. The Multi-Class Trigrams based on these classes showed better performance in both perplexity and recognition rate than conventional word trigrams. Additionally, the parameter size can be reduced to one percent.
  • The efficient method of automatic clustering for Multi-Class Trigrams, ISOGAI Shuntaro, SHIRAI Katsuhiko, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, IEICE technical report. Speech, 100, 523, 103, 107,   2000 12 15 , http://ci.nii.ac.jp/naid/110003297846
    Summary:In this paper, an efficient automatic word clustering method is proposed for Multi-Class Trigrams. The third-position words in the trigrams are directly clustered using 'word trigram approximation by DUAME Language Modeling', so conventional word-history clustering is not required. The Multi-Class Trigrams based on these classes showed better performance in both perplexity and recognition rate than conventional word trigrams. Additionally, the parameter size can be reduced to one percent.
  • Multi-Class Composite N-gram Language Model Based on Connection Direction, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, The transactions of the Institute of Electronics, Information and Communication Engineers. D-II, 83, 11, 2146, 2151,   2000 11 25 , http://ci.nii.ac.jp/naid/110003183824
    Summary:We propose a method of generating compact, reliable language models for speech recognition based on efficient word clustering. A word's connectivity to the immediately preceding word and to the immediately following word are treated as separate attributes, and each word is clustered separately for each attribute. The resulting multiple word classes are created independently from the distributions of preceding and following words, yielding efficient and statistically reliable clusters. Applying the method to a variable-length N-gram with introduced word successions, the resulting Multi-Class Composite N-gram achieved higher word recognition rates than a conventional composite N-gram of parts of speech and variable-length word strings, at one tenth the entry size.
  • Evaluation of Confidence Measure Using Number of Hypothesis, OKIMOTO Yoshiyuki, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, 日本音響学会研究発表会講演論文集, 2000, 2, 75, 76,   2000 09 01 , http://ci.nii.ac.jp/naid/10005116109
  • Speech Start-Point Detection using Vowel and Non-Speech HMMs, YAMAMOTO Hirofumi, SINGER Harald, 日本音響学会研究発表会講演論文集, 2000, 1, 137, 138,   2000 03 01 , http://ci.nii.ac.jp/naid/10004961357
  • TO EXPLOIT LONG HISTORY UNIT DEPENDENCIES BY LINKGRAM LANGUAGE MODELING, ZHANG Shuwu, YAMAMOTO Hirofumi, SAGISAKA Yosinori, 日本音響学会研究発表会講演論文集, 2000, 1, 171, 172,   2000 03 01 , http://ci.nii.ac.jp/naid/10004961445
  • A Language Model for Conversational Speech Using an Interlingal Representation of the Previous Utterance of the Other Participant, YAMAMOTO Hirofumi, TANIGAKI Koichi, SAGISAKA Yoshinori, 日本音響学会研究発表会講演論文集, 2000, 1, 175, 176,   2000 03 01 , http://ci.nii.ac.jp/naid/10004961456
  • A statistical language model integrating class-dependent OOV models, TANIGAKI Koichi, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, 日本音響学会研究発表会講演論文集, 2000, 1, 177, 178,   2000 03 01 , http://ci.nii.ac.jp/naid/10004961461
  • Linkgram Language Modeling, Zhang Shuwu, Yamamoto Hirofumi, Sagisaka Yoshinori, IEICE technical report. Speech, 99, 577, 49, 54,   2000 01 21 , http://ci.nii.ac.jp/naid/110003297557
    Summary:This paper describes a linkgram language modeling approach to exploit longer-distance linguistic correlations between units. Based on an annotated language corpus, we can extract syntactic link dependencies between distant history units. To integrate these linguistic constraints effectively, we simplify the link features into syntactic trigger pairs by pruning unnecessary attributes. The exponential parameters of the link features are then estimated, in combination with the previously proposed DUAME model, by a maximum entropy approach. Experimental results showed that linkgram modeling is very useful in complementing the underlying language model.
  • CLASS DEPENDENT SUBWORD-MODELS FOR OUT-OF-VOCABULARY WORDS RECOGNITION, TANIGAKI Koichi, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, IEICE technical report. Natural language understanding and models of communication, 99, 524, 49, 54,   1999 12 21 , http://ci.nii.ac.jp/naid/110003278539
    Summary:This paper proposes a language model that enables accurate recognition of speech containing out-of-vocabulary (OOV) words. Based on a word-class N-gram, the model applies, to OOV segments, subword models reflecting the statistical characteristics of the readings of each lexical class. Since the recognition result for an OOV segment is a reading with a class label, subsequent language processing becomes easy. The method was examined on the Japanese family-name and given-name classes. Based on analysis of Japanese name data, the subword model was built as an integrated model of a gamma distribution over word length (in morae) and an automatically acquired subword-unit N-gram. Speech recognition experiments showed that the segment, reading, and class of OOV words could be identified with almost the same accuracy as when the words were recognized as registered vocabulary.
  • CLASS DEPENDENT SUBWORD-MODELS FOR OUT-OF-VOCABULARY WORDS RECOGNITION, Tanigaki Koichi, Yamamoto Hirofumi, Sagisaka Yoshinori, IEICE technical report. Speech, 99, 526, 49, 54,   1999 12 21 , http://ci.nii.ac.jp/naid/110003297485
  • CLASS DEPENDENT SUBWORD-MODELS FOR OUT-OF-VOCABULARY WORDS RECOGNITION, TANIGAKI Koichi, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, IPSJ SIG Notes, 29, 181, 186,   1999 12 20 , http://ci.nii.ac.jp/naid/110002945490
    Summary:A new language model is proposed for Out-Of-Vocabulary (OOV) words, to cope with inevitable demands for the recognition of proper nouns not registered in the lexicon. Multiple subword models are created for each lexical class where OOVs are predicted, and the models are embedded in a class N-gram language model. The efficiency of this modeling is evaluated on Japanese family names and personal names. Speech recognition experiments show that the proposed method achieves 70% recall accuracy for OOV Japanese names, where recall is defined as simultaneous correct identification of readings, classes, and locations. This is nearly equal to the plausible upper bound of 73% achieved under the in-vocabulary condition.
  • Part-of-Speech Class N-gram and Word N-gram Fused Language Model, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, 日本音響学会研究発表会講演論文集, 1999, 2, 79, 80,   1999 09 01 , http://ci.nii.ac.jp/naid/10004963825
  • Evaluation of TARSPREC for travel arrangement task, NAITO Masaki, SINGER Harald, YAMAMOTO Hirofumi, NAKAJIMA Hideharu, MATSUI Tomoko, TSUKADA Hajime, NAKAMURA Atsushi, SAGISAKA Yoshinori, 日本音響学会研究発表会講演論文集, 1999, 2, 113, 114,   1999 09 01 , http://ci.nii.ac.jp/naid/10004963923
  • Utterance Splitting in Realtime Spontaneous Speech Recognition, NAKAJIMA Hideharu, YAMAMOTO Hirofumi, 日本音響学会研究発表会講演論文集, 1999, 2, 147, 148,   1999 09 01 , http://ci.nii.ac.jp/naid/10004964020
  • PART-OF-SPEECH N-GRAM AND WORD N-GRAM FUSED LANGUAGE MODEL, Yamamoto Hirofumi, Sagisaka Yoshinori, IEICE technical report. Speech, 99, 121, 73, 78,   1999 06 18 , http://ci.nii.ac.jp/naid/110003297318
    Summary:In this paper, an accurate and compact language model is proposed to cope robustly with data sparseness and task dependencies. This language model adopts new categories which are generated by continuously interpolating POS word-class categories and word categories using MAP estimation. This modeling reduces the model size to 50% of the conventional models. The bi-directional word-cluster N-grams generated by this modeling have 3% lower perplexity measured on a matched domain and 15% lower on a mismatched domain compared to a conventional word N-gram. More importantly, the word error rate for continuous word recognition was reduced by 16% for matched and 28% for mismatched domain.
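Perplexity, the evaluation measure quoted throughout these reports, is the exponentiated average negative log-probability a model assigns to a held-out word sequence. A minimal sketch (the per-word probabilities are supplied directly here; a real evaluation would obtain them from the N-gram model):

```python
import math

def perplexity(word_probs):
    """PP = exp(-(1/N) * sum_i log p_i) over per-word probabilities p_i."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# A model that is uniform over 4 choices at every step has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model spreads less probability mass over wrong continuations, which is why the reductions reported above track word-error-rate improvements.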
  • PARALLEL JAPANESE/ENGLISH SPEECH RECOGNITION IN ATRSPREC, SINGER Harald, YAMAMOTO Hirofumi, 日本音響学会研究発表会講演論文集, 1999, 1, 167, 168,   1999 03 01 , http://ci.nii.ac.jp/naid/10002750002
  • MULTI CLASS COMPOSITE N-GRAM LANGUAGE MODEL BASED ON CONNECTION DIRECTION, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, IPSJ SIG Notes, 24, 49, 54,   1998 12 10 , http://ci.nii.ac.jp/naid/110002948716
    Summary:A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word-neighboring characteristics into the word-preceding and word-following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the distributions of preceding or following words. This word clustering provides more efficient and statistically reliable word clusters. Further, we extend it to the Multi-Class Composite N-gram, whose units are Multi-Class 2-grams and joined words. The Multi-Class Composite N-gram showed better performance in both perplexity and recognition rate, with a logical parameter size one thousandth that of conventional word 2-grams.
  • MULTI CLASS COMPOSITE N-GRAM LANGUAGE MODEL BASED ON CONNECTION DIRECTION, Yamamoto Hirofumi, Sagisaka Yoshinori, IEICE technical report. Natural language understanding and models of communication, 98, 460, 49, 54,   1998 12 10 , http://ci.nii.ac.jp/naid/110003278445
    Summary:A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word-neighboring characteristics into the word-preceding and word-following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the distributions of preceding or following words. This word clustering provides more efficient and statistically reliable word clusters. Further, we extend it to the Multi-Class Composite N-gram, whose units are Multi-Class 2-grams and joined words. The Multi-Class Composite N-gram showed better performance in both perplexity and recognition rate, with a logical parameter size one thousandth that of conventional word 2-grams.
  • MULTI CLASS COMPOSITE N-GRAM LANGUAGE MODEL BASED ON CONNECTION DIRECTION, Yamamoto Hirofumi, Sagisaka Yoshinori, IEICE technical report. Speech, 98, 462, 49, 54,   1998 12 10 , http://ci.nii.ac.jp/naid/110003296833
    Summary:A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word-neighboring characteristics into the word-preceding and word-following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the distributions of preceding or following words. This word clustering provides more efficient and statistically reliable word clusters. Further, we extend it to the Multi-Class Composite N-gram, whose units are Multi-Class 2-grams and joined words. The Multi-Class Composite N-gram showed better performance in both perplexity and recognition rate, with a logical parameter size one thousandth that of conventional word 2-grams.
  • Multi-Class N-gram Model Based on Connection Direction, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, Proceedings of the Acoustical Society of Japan Meeting, 1998, 2, 75, 76,   1998 09 01 , http://ci.nii.ac.jp/naid/10004298096
  • Effective Expression of Language Model Using Multi-Class N-gram, YAMAMOTO Hirofumi, Proceedings of the Acoustical Society of Japan Meeting, 1998, 2, 77, 78,   1998 09 01 , http://ci.nii.ac.jp/naid/10004298101
  • Grammatical word graph generation by integrating grammar and statistical language model, TSUKADA H, YAMAMOTO H, TAKEZAWA T, SAGISAKA Y, Proceedings of the Acoustical Society of Japan Meeting, 1998, 1, 35, 36,   1998 03 01 , http://ci.nii.ac.jp/naid/10002745868
  • Control and Structure of Recognition Subsystem in the ATR-MATRIX Japanese-English Speech Translation System, YAMAMOTO Hirofumi, SINGER Harald, REAVES Ben, SAGISAKA Yoshinori, Proceedings of the Acoustical Society of Japan Meeting, 1998, 1, 161, 162,   1998 03 01 , http://ci.nii.ac.jp/naid/10002746208
  • Reliable utterance segment recognition by integrating grammar and statistical language constraints, Tsukada Hajime, Yamamoto Hirofumi, Takezawa Toshiyuki, Sagisaka Yoshinori, IEICE technical report. Natural language understanding and models of communication, 97, 440, 49, 54,   1997 12 12 , http://ci.nii.ac.jp/naid/110003278254
    Summary:This paper proposes a novel approach to recognizing partial segments of an utterance with high confidence, instead of the complete utterance. The proposed method is based on the cooperative use of conventional n-gram constraints and additional grammatical constraints, applied to utterances while allowing for insertions, deletions, and substitutions. To apply the grammatical constraints efficiently, constraints described by a context-free grammar are approximated in the form of a finite-state automaton. Through an experiment, it has been confirmed that the proposed method can recognize partial segments of an utterance with higher reliability than conventional continuous speech recognition methods using only n-grams.
  • Reliable utterance segment recognition by integrating grammar and statistical language constraints, Tsukada Hajime, Yamamoto Hirofumi, Takezawa Toshiyuki, Sagisaka Yoshinori, IEICE technical report. Speech, 97, 442, 49, 54,   1997 12 12 , http://ci.nii.ac.jp/naid/110003296053
    Summary:This paper proposes a novel approach to recognizing partial segments of an utterance with high confidence, instead of the complete utterance. The proposed method is based on the cooperative use of conventional n-gram constraints and additional grammatical constraints, applied to utterances while allowing for insertions, deletions, and substitutions. To apply the grammatical constraints efficiently, constraints described by a context-free grammar are approximated in the form of a finite-state automaton. Through an experiment, it has been confirmed that the proposed method can recognize partial segments of an utterance with higher reliability than conventional continuous speech recognition methods using only n-grams.
  • Reliable utterance segment recognition by integrating grammar and statistical language constraints, TSUKADA Hajime, YAMAMOTO Hirofumi, TAKEZAWA Toshiyuki, SAGISAKA Yoshinori, IPSJ SIG Notes, 19, 101, 106,   1997 12 11 , http://ci.nii.ac.jp/naid/110002954480
    Summary:This paper proposes a novel approach to recognizing partial segments of an utterance with high confidence, instead of the complete utterance. The proposed method is based on the cooperative use of conventional n-gram constraints and additional grammatical constraints, applied to utterances while allowing for insertions, deletions, and substitutions. To apply the grammatical constraints efficiently, constraints described by a context-free grammar are approximated in the form of a finite-state automaton. Through an experiment, it has been confirmed that the proposed method can recognize partial segments of an utterance with higher reliability than conventional continuous speech recognition methods using only n-grams.
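The core idea in the three entries above, approximating grammatical constraints with a finite-state automaton and accepting only the utterance segments it licenses, can be sketched as follows. The toy grammar, the automaton, and the function name are illustrative assumptions, not taken from the papers, and the real system combines this with n-gram scores rather than using the automaton alone.

```python
# Hand-built finite-state approximation of a tiny grammar:
# state -> {word: next_state}; FINAL marks accepting states.
fsa = {
    0: {"i": 1, "please": 3},
    1: {"want": 2},
    2: {"coffee": 3, "tea": 3},
    3: {},
}
FINAL = {3}

def accepted_segments(words):
    """Return maximal contiguous segments the automaton accepts."""
    segments = []
    for start in range(len(words)):
        state, end = 0, None
        for i, w in enumerate(words[start:], start):
            if w not in fsa[state]:
                break
            state = fsa[state][w]
            if state in FINAL:
                end = i + 1  # longest accepted prefix so far
        if end is not None:
            segments.append(words[start:end])
    return segments

# A noisy hypothesis: only the grammatical part is recognized reliably.
print(accepted_segments("uh i want coffee um".split()))
# -> [['i', 'want', 'coffee']]
```

The point of the FSA approximation is exactly this kind of cheap left-to-right scan: unlike full context-free parsing, it can be interleaved with the recognizer's search at little cost.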
  • Unsupervised Quasi-Bayes Online Speaker Adaptation Using Language Information, YAMAMOTO Hirofumi, SINGER Harald, NAKAMURA Atsushi, HUO Qiang, Proceedings of the Acoustical Society of Japan Meeting, 1997, 2, 21, 22,   1997 09 01 , http://ci.nii.ac.jp/naid/10002880371
  • Reliable utterance segment recognition by integrating grammar and statistical language constraints, TSUKADA H, YAMAMOTO H, SAGISAKA Y, Proceedings of the Acoustical Society of Japan Meeting, 1997, 2, 55, 56,   1997 09 01 , http://ci.nii.ac.jp/naid/10002880462
  • Reduction of Number of Word Hypotheses for Large Vocabulary Continuous Speech Recognition, SHIMIZU Tohru, YAMAMOTO Hirofumi, MASATAKI Hirokazu, MATSUNAGA Shoichi, SAGISAKA Yoshinori, The transactions of the Institute of Electronics, Information and Communication Engineers, 79, 12, 2117, 2124,   1996 12 25 , http://ci.nii.ac.jp/naid/110003227681
    Summary:This paper proposes a continuous speech recognition method using word graphs, aimed at large-vocabulary continuous speech recognition. To reduce the amount of computation, three techniques are introduced: (1) sharing of language likelihoods among homophones, (2) sharing of preceding words when representing word start times by a single representative time, and (3) interpolation of language likelihoods between tree-structured lexicon nodes. The effectiveness of the method was evaluated on the ATR Travel Arrangement Corpus. The results showed that sharing language likelihoods among homophones greatly reduces computation, that restricting each word to a single start time causes no problem as long as the word-initial allophone is the same, and that distributing the changes in language likelihood allows a smaller beam width in the search. These techniques cut processing time by 99%, enabling near-real-time recognition of a 6,635-word vocabulary on a general-purpose workstation (135 SPECint92).
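Technique (3) in the summary above, spreading language-likelihood information over a tree-structured lexicon so the search can prune early, can be illustrated with a minimal sketch. The toy lexicon, the log-scores, and the data layout are illustrative assumptions, not the paper's actual implementation: each prefix-tree node stores the best score of any word below it.

```python
# Hypothetical word log-scores (e.g. from a class bigram model).
lexicon_scores = {"sat": -1.2, "sun": -2.5, "sand": -0.8, "dog": -1.0}

def build_tree(words):
    """Prefix tree where each node carries the best score below it."""
    root = {"children": {}, "score": float("-inf"), "word": None}
    for w, s in words.items():
        node = root
        node["score"] = max(node["score"], s)
        for ch in w:
            node = node["children"].setdefault(
                ch, {"children": {}, "score": float("-inf"), "word": None})
            node["score"] = max(node["score"], s)  # push best score upward
        node["word"] = w  # leaf completes a word
    return root

tree = build_tree(lexicon_scores)
# At the node for prefix "sa", the expected score is the best of
# "sat" (-1.2) and "sand" (-0.8):
print(tree["children"]["s"]["children"]["a"]["score"])  # -> -0.8
```

With such per-node expected scores, a hypothesis inside the tree already carries an optimistic language score, so the beam can be narrowed without discarding the eventual best path, which is consistent with the beam-width reduction the abstract reports.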
  • Rescoring method for topic identification using pattern matching, BEPPU Tomohiko, TAKAHASHI Kazuhiro, NAKAMURA Atsushi, YAMAMOTO Hirofumi, Proceedings of the Acoustical Society of Japan Meeting, 1996, 2, 11, 12,   1996 09 01 , http://ci.nii.ac.jp/naid/10002739493
  • Delayed decision beam search for continuous speech recognition, SHIMIZU Tohru, YAMAMOTO Hirofumi, SAGISAKA Yoshinori, Proceedings of the Acoustical Society of Japan Meeting, 1996, 2, 97, 98,   1996 09 01 , http://ci.nii.ac.jp/naid/10002739713
  • Large vocabulary spontaneous dialogue speech recognition using word graph and variable-order statistical language model, SHIMIZU Tohru, YAMAMOTO Hirofumi, MASATAKI Hirokazu, MATSUNAGA Shoichi, SAGISAKA Yoshinori, Proceedings of the Acoustical Society of Japan Meeting, 1996, 1, 197, 198,   1996 03 01 , http://ci.nii.ac.jp/naid/10002737410
  • Spontaneous Dialogue Speech Recognition using Cross-Word Context Constrained Word Graph, SHIMIZU Tohru, YAMAMOTO Hirofumi, MATSUNAGA Shoichi, SAGISAKA Yoshinori, IPSJ SIG Notes, 9, 49, 54,   1995 12 14 , http://ci.nii.ac.jp/naid/110002917012
    Summary:This paper proposes a large-vocabulary spontaneous dialogue speech recognizer using cross-word context constrained word graphs. In this method, two approximation methods, "cross-word context approximation" and "lenient language score smearing", are introduced to reduce the computational cost of word graph generation. Experimental results on a "travel planning corpus" show that this recognition method achieves a word-hypothesis reduction of 25-40% and a CPU-time reduction of 30-60% compared to no approximation, and that using class-bigram scores as the expected language score for each lexicon-tree node reduces word error rate by 25-30% compared to no approximation.
  • Spontaneous Dialogue Speech Recognition using Cross-Word Context Constrained Word Graph, Shimizu Tohru, Yamamoto Hirofumi, Matsunaga Shoichi, Sagisaka Yoshinori, IEICE technical report. Natural language understanding and models of communication, 95, 428, 49, 54,   1995 12 14 , http://ci.nii.ac.jp/naid/110003278295
    Summary:This paper proposes a large-vocabulary spontaneous dialogue speech recognizer using cross-word context constrained word graphs. In this method, two approximation methods, "cross-word context approximation" and "lenient language score smearing", are introduced to reduce the computational cost of word graph generation. Experimental results on a "travel planning corpus" show that this recognition method achieves a word-hypothesis reduction of 25-40% and a CPU-time reduction of 30-60% compared to no approximation, and that using class-bigram scores as the expected language score for each lexicon-tree node reduces word error rate by 25-30% compared to no approximation.
  • Continuous speech recognition method using word graphs, SHIMIZU Tohru, YAMAMOTO Hirofumi, MATSUNAGA Shoichi, SAGISAKA Yoshinori, Proceedings of the Acoustical Society of Japan Meeting, 1995, 2, 61, 62,   1995 09 01 , http://ci.nii.ac.jp/naid/10002734390