The primary goal of this assignment is to implement a simplified Transformer-based language model for next-word prediction.
This assignment is designed to help you understand the following concepts:
Tokenization
Word embeddings
Positional encoding (a brief sketch follows this list)
Multi-head self-attention
Residual connections
Feed-forward subnetworks
Model training and inference
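For example, one common way to implement positional encoding is the sinusoidal formulation. The sketch below is only illustrative; the function name and the choice of sinusoidal rather than learned encodings are assumptions, not requirements of the assignment.

import math
import torch

def sinusoidal_positional_encoding(seq_len, embedding_dim):
    # Illustrative sketch: returns a (seq_len, embedding_dim) tensor of
    # sinusoidal position encodings to be added to the word embeddings.
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)          # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, embedding_dim, 2, dtype=torch.float)
                         * (-math.log(10000.0) / embedding_dim))              # (embedding_dim/2,)
    pe = torch.zeros(seq_len, embedding_dim)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe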
To begin, download the Kamangar_05.zip and unzip it on your computer.
Allowed libraries
You may use
numpy
torch
torch.nn
torch.optim
re
math
random
You may not use
torch.nn.Transformer
torch.nn.MultiheadAttention
pretrained Transformer blocks
Hugging Face model classes for the Transformer itself
Notes:
embedding_dim must be divisible by num_heads
self-attention must split the embedding into heads (see the sketch after these notes)
The model must support more than one Transformer block
Tensor shapes must be handled correctly
The training target is the next word after the input sequence
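As a rough illustration of these notes, multi-head self-attention that splits the embedding into heads, followed by a Transformer block with residual connections and a feed-forward subnetwork, could be sketched as shown below. The class names, constructor signatures, and use of LayerNorm here are assumptions, not the required interface in the starter code.

import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    # Illustrative sketch only; the assignment's required names may differ.
    def __init__(self, embedding_dim, num_heads):
        super().__init__()
        assert embedding_dim % num_heads == 0, "embedding_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embedding_dim // num_heads
        self.q_proj = nn.Linear(embedding_dim, embedding_dim)
        self.k_proj = nn.Linear(embedding_dim, embedding_dim)
        self.v_proj = nn.Linear(embedding_dim, embedding_dim)
        self.out_proj = nn.Linear(embedding_dim, embedding_dim)

    def forward(self, x):
        # x: (batch, seq_len, embedding_dim)
        batch, seq_len, embedding_dim = x.shape

        # Project, then split the embedding into num_heads heads of size head_dim.
        def split_heads(t):
            return t.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.q_proj(x))   # (batch, num_heads, seq_len, head_dim)
        k = split_heads(self.k_proj(x))
        v = split_heads(self.v_proj(x))

        # Scaled dot-product attention per head: (batch, num_heads, seq_len, seq_len)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        weights = torch.softmax(scores, dim=-1)
        context = weights @ v                                    # (batch, num_heads, seq_len, head_dim)

        # Merge the heads back into a single embedding dimension.
        context = context.transpose(1, 2).reshape(batch, seq_len, embedding_dim)
        return self.out_proj(context)

class TransformerBlock(nn.Module):
    # One block: attention and a feed-forward subnetwork, each wrapped in a residual connection.
    def __init__(self, embedding_dim, num_heads, hidden_dim):
        super().__init__()
        self.attn = MultiHeadSelfAttention(embedding_dim, num_heads)
        self.norm1 = nn.LayerNorm(embedding_dim)
        self.ff = nn.Sequential(nn.Linear(embedding_dim, hidden_dim), nn.ReLU(),
                                nn.Linear(hidden_dim, embedding_dim))
        self.norm2 = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        x = self.norm1(x + self.attn(x))   # residual connection around attention
        x = self.norm2(x + self.ff(x))     # residual connection around feed-forward
        return x

Stacking two or more such blocks (for example, in an nn.ModuleList) is one way to satisfy the requirement that the model support more than one Transformer block.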
The corpus contains approximately 1000 words and must be used to create next-word prediction examples.
Use word-level tokenization
Use next-word prediction only
Use only the last output position to predict the next word (see the sketch below)
Causal masking is not required.
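A minimal sketch of turning the corpus into next-word prediction examples under these rules might look as follows. The context length, the regular expression, and the function name are assumptions; follow whatever the starter code specifies.

import re
import torch

def build_examples(corpus_text, context_length=5):
    # Word-level tokenization: lowercase the corpus and split it into words.
    words = re.findall(r"\w+", corpus_text.lower())
    vocab = sorted(set(words))
    word_to_id = {w: i for i, w in enumerate(vocab)}
    ids = [word_to_id[w] for w in words]

    # Each example: context_length input words -> the single next word as the target.
    inputs, targets = [], []
    for i in range(len(ids) - context_length):
        inputs.append(ids[i:i + context_length])
        targets.append(ids[i + context_length])
    return torch.tensor(inputs), torch.tensor(targets), word_to_id

# During training and inference, only the last output position is used to
# predict the next word; with logits of shape (batch, seq_len, vocab_size):
#     last_logits = logits[:, -1, :]                                   # (batch, vocab_size)
#     loss = torch.nn.functional.cross_entropy(last_logits, targets)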
The goal of this assignment is to understand how tensors flow through the model and how the Transformer mechanics work. The model does not need to achieve state-of-the-art performance. Keep your implementation modular and readable.
DO NOT alter or change the name or the parameters of the function.
You may introduce additional functions (helper functions) as needed.
The "test_assignment_05.py" file includes a minimal set of unit tests. The assignment grade will be based on your code passing these tests (and possibly additional tests).
DO NOT submit the "test_assignment_05.py" file when submitting your Assignment_05.
DO NOT submit your environment when submitting your assignment.
You may run these tests using the command:
python -m pytest --verbose test_assignment_05.py
The following is roughly what your output should look like if all tests pass: