CTC LSTM
spoken word recognition using CTC LSTMs
Instructions
- Create a virtual environment:
python -m venv venv
- Install the required packages:
./venv/bin/pip install -r requirements.txt
- Train the model:
./venv/bin/python main.py train
(takes a few hours and needs around 20GB disk and 5GB memory) - or download my pre-trained model (25 epochs, not good) from
here and move it to
target/model-final.ckpt
- Test the final model:
./venv/bin/python main.py test
- Infer text from flac:
./venv/bin/python main.py infer audio.flac
Note
- This is a proof-of-concept
- Does not use CUDA but should be easy to implement