University of Yamanashi Electronic Syllabus - Course Data

Course Title

Instructors

言語・音声メディア処理特論 (Advanced Natural Language and Speech Media Processing)

小澤　賢司／福本　文代

Timetable Number

Credits

Course

Year of Study

Term

Day

Period

PTW716

(Not registered)

First semester

Wednesday

[Overview and Objectives]

[Advanced Natural Language and Speech Media Processing]
The study of information science that treats information as computation began in the mid-20th century and forms one of the major foundations of computer science. This computational approach covers a wide range of information sources, such as textual and visual information. The purpose of this course is to understand information from the viewpoint of intelligent computational processing.
The first part addresses the semantics of natural languages and introduces computational models of semantic interpretation.
The second half focuses on speech and covers the fundamental theories and techniques of speech recognition, which corresponds to semantic processing.
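The view of information as computation began in the mid-20th century and, as a mathematical theory underpinning symbolic computation, forms one of the foundations of computer science. This computational approach has since extended to techniques for extracting needed information from volumes of text far beyond human processing capacity, and to techniques for identifying, generating, and converting speech signals. This course aims to understand how these various kinds of information are processed, from the standpoint of computational methods.
The first half focuses on the semantic processing of language and surveys methods ranging from classical approaches to recent deep learning.
The second half focuses on speech and covers the fundamental theory and techniques of speech recognition, the counterpart of semantic processing.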

[Learning Goals]

- For the first half:
Understand the basics and the state of the art of statistical natural language semantic analysis.
- For the second half:
Understand the algorithms of classical speech recognition models, including acoustic models, pronunciation dictionaries, and language models, and then implement modern End-to-End models.
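The goal of the first half is to understand methods for handling meaning from the standpoint of machine processing by computer, with a focus on the semantic processing of language.
The goal of the second half is to understand the structure of classical speech recognition models (acoustic model, pronunciation dictionary, and language model) and then to implement modern End-to-End models.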

[Prerequisites and Preparation]

The required mathematical foundations include linear algebra, integral and differential calculus, and introductory statistics. Basic knowledge of and some experience with machine learning, such as clustering algorithms, classifiers such as support vector machines and random forests, and deep neural networks, is expected. Programming skills in Python and/or C++ will be required for some assignments. Familiarity with one of the deep learning frameworks, such as TensorFlow, Keras, or PyTorch, would be helpful. Additionally, a basic understanding of representations of acoustic signals and fundamental filtering techniques is desirable.
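Mathematics such as linear algebra and information theory, programming skills, and knowledge of algorithms and data structures are required. A basic knowledge of acoustic signal representations and their basic filtering is also desirable. Basic knowledge of machine learning, such as clustering, support vector machines, and neural networks, is a further advantage.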

[Grading Criteria]

No	Item	Weight	Evaluation Criteria
1	Quizzes / Reports	100 %	Evaluated on the reports for the assignments given in the first and second halves.

[Textbook]

高島遼一, Pythonで学ぶ音声認識, インプレス, ISBN:9784295011385
(published 2021; 機械学習実践シリーズ)

[Reference Books]

(Not registered)

[Lecture Topics]

1. Theories in semantics: formal semantics, lexical semantics, and conceptual semantics
2. Acquisition techniques: rule-based, example-based, and corpus-based techniques
3. Acquisition of semantics: synonyms, antonyms, polysemy, and bilingual word expressions
4. Metaphor: metaphor and conceptual metaphor
5. Application: information retrieval
6. Application: sentiment analysis
7. Application: question answering and summarization
8. Summary of the first half; the mechanism of speech recognition
9. Fundamental equations of speech recognition
10. Basics of speech processing and feature extraction
11. Solving the alignment problem in speech recognition
12. Speech recognition with GMM-HMM
13. Speech recognition with DNN-HMM
14. Continuous speech recognition with End-to-End models
15. Implementation of End-to-End models
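1. Semantics: American structuralist semantics, generative semantics, formal semantics, and conceptual semantics
2. Techniques for extracting meaning: manual description, example-based extraction, and extraction from corpora
3. Examples of extracting word-level knowledge with statistical methods: synonyms, near-synonyms, antonyms, polysemous words, and translation equivalents
4. Deep-learning representations of word and sentence meaning: Word2Vec, BERT, and LLMs
5. Application: information retrieval
6. Application: sentiment analysis
7. Application: question answering and summarization
8. Summary of the first half; how speech recognition works
9. Basic knowledge of speech recognition
10. Basics of speech processing and feature extraction
11. Solving the alignment problem in speech recognition
12. Speech recognition with GMM-HMM
13. Speech recognition with DNN-HMM
14. Continuous speech recognition with End-to-End models
15. Implementation of End-to-End models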

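[Response to Improvement Requests for the Previous Year's Course]

The course evaluation questionnaire was not conducted in the previous academic year because enrollment was low.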