# SWR2-ASR
Automatic speech recognition model for the seminar "Spoken Word Recognition 2 (SWR2)" by Konstantin Sering in the summer term 2023.
Authors:
Silja Kasper, Marvin Borner, Philipp Merkel, Valentin Schmidt
# Dataset
We use the German [Multilingual LibriSpeech dataset](http://www.openslr.org/94/) (mls_german_opus). If the dataset is not found under the specified path, it is downloaded automatically.
If you want to train the model on custom data, the code expects the following folder structure:
```
<dataset_path>
├── <language>
│ ├── train
│ │ ├── transcripts.txt
│ │ └── audio
│ │ └── <speakerid>
│ │ └── <bookid>
│ │ └── <speakerid>_<bookid>_<chapterid>.opus/.flac
│ ├── dev
│ │ ├── transcripts.txt
│ │ └── audio
│ │ └── <speakerid>
│ │ └── <bookid>
│ │ └── <speakerid>_<bookid>_<chapterid>.opus/.flac
│ └── test
│ ├── transcripts.txt
│ └── audio
│ └── <speakerid>
│ └── <bookid>
│ └── <speakerid>_<bookid>_<chapterid>.opus/.flac
```
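Each split's `transcripts.txt` maps file IDs to transcriptions. Following the MLS convention (an assumption worth verifying against your copy of the dataset), every line is tab-separated:
```
<speakerid>_<bookid>_<chapterid>	<transcription>
```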
# Installation
The preferred method of installation is using [`poetry`](https://python-poetry.org/docs/#installation). After installing poetry, run
```
poetry install
```
to install all dependencies. `poetry` also enables you to run our scripts using
```
poetry run SCRIPT_NAME
```
Alternatively, you can use the provided `requirements.txt` file to install the dependencies using `pip` or `conda`.
# Usage
## Tokenizer
We include a pre-trained character-level tokenizer for the German language in the `data/tokenizers` directory.
If the tokenizer path specified in the `config.yaml` file does not exist or is `None` (`~`), a new tokenizer is trained on the training data.
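For intuition, a character-level tokenizer simply maps each character to an integer ID and back. The sketch below is a minimal illustration; the class and method names are hypothetical, not this repo's actual API:
```
# Minimal sketch of character-level tokenization.
# CharTokenizer and its methods are hypothetical names, not this repo's API.
class CharTokenizer:
    def __init__(self, corpus: str):
        chars = sorted(set(corpus))
        self.char_to_id = {c: i for i, c in enumerate(chars)}
        self.id_to_char = {i: c for c, i in self.char_to_id.items()}

    def encode(self, text: str) -> list[int]:
        return [self.char_to_id[c] for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.id_to_char[i] for i in ids)

tok = CharTokenizer("hallo welt")
assert tok.decode(tok.encode("hallo")) == "hallo"
```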
## Decoder
There are two options for the decoder:
- greedy
- beam search with language model
The language model is a KenLM model supplied with the Multilingual LibriSpeech dataset. If you want to use a different KenLM language model, you can specify its path in the `config.yaml` file.
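For intuition, the greedy option takes the most probable token at every frame, then collapses repeated tokens and removes CTC blanks. A minimal sketch, assuming the model emits per-frame log-probabilities and that index 0 is the blank (both are assumptions, not this repo's actual conventions):
```
import torch

def greedy_decode(emissions: torch.Tensor, blank: int = 0) -> list[int]:
    """Greedy CTC decoding for one utterance.

    emissions: (time, num_tokens) log-probabilities.
    """
    best = emissions.argmax(dim=-1).tolist()  # best token per frame
    decoded, prev = [], blank
    for idx in best:
        if idx != prev and idx != blank:  # collapse repeats, drop blanks
            decoded.append(idx)
        prev = idx
    return decoded  # token IDs; map back to text with the tokenizer
```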
## Training the model
All hyperparameters can be configured in the `config.yaml` file. Its main sections are listed below; a sketch of how such a file might be read follows the list.
- model
- training
- dataset
- tokenizer
- checkpoints
- inference
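As a rough sketch of how the configuration could be consumed in Python (the top-level section names are the real ones from the list above; everything else about the layout is an assumption):
```
import yaml

# Load config.yaml; the top-level sections mirror the list above.
with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

model_cfg = config["model"]              # architecture hyperparameters
training_cfg = config["training"]        # e.g. epochs, batch size, ...
dataset_cfg = config["dataset"]          # e.g. dataset path, language, ...
tokenizer_cfg = config["tokenizer"]      # tokenizer path (or ~ to retrain)
checkpoints_cfg = config["checkpoints"]
inference_cfg = config["inference"]
```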
Train using the provided train script:
```
poetry run train \
    --config_path="PATH_TO_CONFIG_FILE"
```
You can also find our model, trained for 67 epochs on mls_german_opus, [here](https://drive.google.com/file/d/1gcgCjlCH6DjT6f7EWTx0LcYP3CCuXYP-/view?usp=sharing).
## Inference
The `config.yaml` also includes a section for inference.
To run inference on a single audio file, run:
```
poetry run recognize \
    --config_path="PATH_TO_CONFIG_FILE" \
    --file_path="PATH_TO_AUDIO_FILE" \
    --target_path="PATH_TO_TARGET_FILE"
```
The target path is optional: if it is omitted, the recognized text is printed to the console; if it is given, the word error rate (WER) against the target transcript is computed (see the sketch below).
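For reference, WER is the word-level Levenshtein (edit) distance between hypothesis and reference, divided by the number of reference words. A minimal illustrative implementation, not necessarily the one the `recognize` script uses:
```
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("das ist ein test", "das ist test"))  # 0.25 (one deleted word)
```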