#concept

(B, T, C) tensor ? Batch (number of data samples processed in one forward/backward pass) by Time (sequence length) by Channel (embedding size)

B in (B, T, C) tensor ? B: batch size, i.e. the number of data samples processed in one forward/backward pass

T in (B, T, C) tensor ? T: sequence length, measured in time steps for a time series or in token positions for a sentence

C in (B, T, C) tensor ? C: channels, i.e. the dimensionality of the embedding vector at each position
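
To make the three axes concrete, here is a minimal NumPy sketch of how a (B, T, C) tensor can arise from an embedding lookup. The sizes, `vocab_size`, `token_ids`, and `embedding_table` are all made-up illustration values, not taken from the referenced video.

```python
import numpy as np

# Arbitrary illustration sizes (assumptions, not from the source)
B, T, C = 4, 8, 32   # batch size, sequence length, embedding size
vocab_size = 65      # hypothetical vocabulary size

rng = np.random.default_rng(0)

# (B, T): each row is one sequence of T integer token ids
token_ids = rng.integers(0, vocab_size, size=(B, T))

# (vocab_size, C): one C-dimensional vector per token id
embedding_table = rng.standard_normal((vocab_size, C))

# Indexing the table with a (B, T) array of ids yields a (B, T, C) tensor
x = embedding_table[token_ids]

print(x.shape)        # (4, 8, 32) -> (B, T, C)
print(x[0].shape)     # (8, 32)    -> one sample: T positions, C channels each
print(x[0, 3].shape)  # (32,)      -> the embedding at position 3 of sample 0
```

Reading the shape left to right: pick a sample (B), then a position in its sequence (T), and what remains is that position's embedding vector (C).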

References

  1. Karpathy video

Notes