Diffstat (limited to 'readme.md')
-rw-r--r--  readme.md  86
1 file changed, 64 insertions(+), 22 deletions(-)
diff --git a/readme.md b/readme.md
index 8d5fd4d..5b23a12 100644
--- a/readme.md
+++ b/readme.md
@@ -1,42 +1,84 @@
# SWR2-ASR
-Automatic speech recognition model for the seminar spoken word
-recogniton 2 (SWR2) in the summer term 2023.
+Automatic speech recognition model for the seminar "Spoken Word
+Recognition 2 (SWR2)" by Konstantin Sering in the summer term 2023.
+
+Authors:
+Silja Kasper, Marvin Borner, Philipp Merkel, Valentin Schmidt
+
+# Dataset
+We use the German [Multilingual LibriSpeech dataset](http://www.openslr.org/94/) (mls_german_opus). If the dataset is not found under the specified path, it is downloaded automatically.
+
+If you want to train this model on custom data, this code expects a folder structure like this:
+```
+<dataset_path>
+ ├── <language>
+ │ ├── train
+ │ │ ├── transcripts.txt
+ │ │ └── audio
+ │ │ └── <speakerid>
+ │ │ └── <bookid>
+ │ │ └── <speakerid>_<bookid>_<chapterid>.opus/.flac
+ │ ├── dev
+ │ │ ├── transcripts.txt
+ │ │ └── audio
+ │ │ └── <speakerid>
+ │ │ └── <bookid>
+ │ │ └── <speakerid>_<bookid>_<chapterid>.opus/.flac
+ │ └── test
+ │ ├── transcripts.txt
+ │ └── audio
+ │ └── <speakerid>
+ │ └── <bookid>
+ │ └── <speakerid>_<bookid>_<chapterid>.opus/.flac
+```
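+
+Each `transcripts.txt` maps a segment id to its transcript. As a rough
+illustration (assuming the tab-separated MLS layout, i.e.
+`<speakerid>_<bookid>_<chapterid>\t<transcript>` per line, which this README
+does not spell out), such a file could be read like this:
+
+```python
+# Hedged sketch: parse a transcripts.txt, assuming the tab-separated
+# MLS layout "<segment_id>\t<transcript>" per line.
+from pathlib import Path
+
+def read_transcripts(path: Path) -> dict[str, str]:
+    transcripts = {}
+    for line in path.read_text(encoding="utf-8").splitlines():
+        segment_id, text = line.split("\t", maxsplit=1)
+        transcripts[segment_id] = text
+    return transcripts
+
+transcripts = read_transcripts(Path("<dataset_path>/<language>/train/transcripts.txt"))
+```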
+
# Installation
+The preferred way to install this project is [`poetry`](https://python-poetry.org/docs/#installation). After installing `poetry`, run
```
poetry install
```
+to install all dependencies. `poetry` also enables you to run our scripts using
+```
+poetry run SCRIPT_NAME
+```
+
+Alternatively, you can install the dependencies from the provided `requirements.txt` file, e.g. with `pip install -r requirements.txt` (this also works inside a `conda` environment).
# Usage
-## Training the tokenizer
-We use a byte pair encoding tokenizer. To train the tokenizer, run
-```
-poetry run train-bpe-tokenizer --dataset_path="DATA_PATH" --language=mls_german_opus --split=all --out_path="data/tokenizers/bpe_tokenizer_german_3000.json" --vocab_size=3000
-```
-with the desired values for `DATA_PATH` and `vocab_size`.
+## Tokenizer
-You can also use a character level tokenizer, which can be trained with
-```
-poetry run train-char-tokenizer --dataset_path="DATA_PATH" --language=mls_german_opus --split=all --out_path="data/tokenizers/char_tokenizer_german.txt"
-```
-## Training
+We include a pre-trained character-level tokenizer for the German language in the `data/tokenizers` directory.
-Train using the provided train script:
+If the tokenizer path specified in the `config.yaml` file does not exist or is `None` (`~` in YAML), a new tokenizer is trained on the training data.
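+
+For illustration, the fallback check behaves roughly like the sketch below;
+the example path is the pre-trained tokenizer shipped in `data/tokenizers`,
+and a YAML `~` arrives in Python as `None`:
+
+```python
+# Hedged sketch of the fallback check; tokenizer_path stands in for the
+# value read from config.yaml (a YAML ~ parses to None).
+from pathlib import Path
+
+tokenizer_path: str | None = "data/tokenizers/char_tokenizer_german.txt"
+if tokenizer_path is None or not Path(tokenizer_path).exists():
+    print("no usable tokenizer found; training a new one on the train split")
+else:
+    print(f"using existing tokenizer at {tokenizer_path}")
+```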
- poetry run train
+## Training the model
-## Evaluation
+All hyperparameters can be configured in the `config.yaml` file. Its main sections (sketched after this list) are:
+- model
+- training
+- dataset
+- tokenizer
+- checkpoints
+- inference
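+
+As a hedged sketch of how these sections could be read, only the section
+names above come from this README; any keys inside them are project-specific:
+
+```python
+# Load config.yaml and print each top-level section named in this README.
+import yaml
+
+with open("config.yaml", encoding="utf-8") as f:
+    config = yaml.safe_load(f)
+
+for section in ("model", "training", "dataset", "tokenizer",
+                "checkpoints", "inference"):
+    print(section, "->", config.get(section))
+```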
-## Inference
+Train using the provided train script:
- poetry run recognize
+ poetry run train \
+ --config_path="PATH_TO_CONFIG_FILE"
-## CI
+## Evaluation
+Evaluation metrics are computed during training and are serialized with the checkpoints.
-You can use the Makefile to run these commands manually
+TODO: manual evaluation script / access to the evaluation metrics?
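+
+Assuming the checkpoints are ordinary PyTorch checkpoint files (this README
+does not confirm the format, and the `"metrics"` key below is a hypothetical
+name), the stored metrics could be inspected like this:
+
+```python
+# Hedged sketch: open a checkpoint and look for serialized metrics.
+# The file path and key names are illustrative assumptions.
+import torch
+
+checkpoint = torch.load("checkpoints/model.ckpt", map_location="cpu")
+print(sorted(checkpoint.keys()))   # discover what the checkpoint stores
+print(checkpoint.get("metrics"))   # e.g. loss/WER history, if present
+```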
- make format
+## Inference
+The `config.yaml` also includes a section for inference.
+To run inference on a single audio file, run:
- make lint
+ poetry run recognize \
+ --config_path="PATH_TO_CONFIG_FILE" \
+ --file_path="PATH_TO_AUDIO_FILE"
+