#concept
(B, T, C) tensor ? Batch (number of data samples processed in one forward/backward pass) by Time (sequence length) by Channel (embedding size)
B in (B, T, C) tensor ? B: batch size, i.e. the number of data samples processed in one forward/backward pass
T in (B, T, C) tensor ? T: sequence length, i.e. the number of time steps in a time series or token positions in a sentence
C in (B, T, C) tensor ? C: channels, i.e. the dimensionality of the embedding vector at each position
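
A minimal sketch of how the three dimensions fit together, using plain nested Python lists in place of a tensor library. The sizes, `vocab_size`, `embedding_table`, `token_ids`, and the `shape` helper are all illustrative assumptions, not taken from the referenced video.

```python
import random

# Illustrative sizes (assumptions): batch, sequence length, embedding size
B, T, C = 2, 3, 4
vocab_size = 5  # hypothetical vocabulary size

random.seed(0)

# Embedding table: one C-dimensional vector per token id, shape (vocab_size, C)
embedding_table = [[random.random() for _ in range(C)] for _ in range(vocab_size)]

# Token ids, shape (B, T): B sequences, each with T positions
token_ids = [[random.randrange(vocab_size) for _ in range(T)] for _ in range(B)]

# Replace every token id with its embedding vector, giving shape (B, T, C)
x = [[embedding_table[tok] for tok in seq] for seq in token_ids]


def shape(nested):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


print(shape(token_ids))  # (2, 3)
print(shape(x))          # (2, 3, 4)
```

Indexing follows the same order: `x[b]` is one sample of shape (T, C), `x[b][t]` is the C-dimensional vector at position `t` of that sample, and `x[b][t][c]` is a single number.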
References
- Karpathy video