Language model

where $w_t$ is the $t$-th word in the sequence and $w_{i}^{j}$ is the subsequence $\left(w_{i}, w_{i+1}, \cdots, w_{j-1}, w_{j}\right)$.
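
To make the notation concrete, here is a minimal Python sketch. It assumes the notation is used in the usual chain-rule factorization of a sequence probability, $P(w_1^T) = \prod_{t=1}^{T} P(w_t \mid w_1^{t-1})$, which is an assumption about the surrounding context rather than something stated here. The helper names `subsequence`, `sequence_log_prob`, and `uniform_cond_prob` are hypothetical, and the uniform model over a four-word vocabulary is a toy stand-in for a real conditional distribution.

```python
import math
from typing import Callable, Sequence, Tuple


def subsequence(words: Sequence[str], i: int, j: int) -> Tuple[str, ...]:
    """Return w_i^j = (w_i, ..., w_j) using 1-based, inclusive indices.

    When j < i (for example w_1^0), the result is the empty tuple.
    """
    return tuple(words[i - 1 : j])


def sequence_log_prob(
    words: Sequence[str],
    cond_prob: Callable[[str, Tuple[str, ...]], float],
) -> float:
    """Sum log P(w_t | w_1^{t-1}) over t = 1..T (assumed chain-rule form)."""
    total = 0.0
    for t in range(1, len(words) + 1):
        history = subsequence(words, 1, t - 1)  # w_1^{t-1}
        total += math.log(cond_prob(words[t - 1], history))
    return total


def uniform_cond_prob(word: str, history: Tuple[str, ...]) -> float:
    """Toy model (assumption): every word has probability 1/4, ignoring history."""
    return 0.25


sentence = ("the", "cat", "sat")
middle = subsequence(sentence, 2, 3)  # w_2^3
log_p = sequence_log_prob(sentence, uniform_cond_prob)
```

Here `subsequence(sentence, 2, 3)` corresponds to $w_2^3$, and the history passed to the model at step $t$ is $w_1^{t-1}$, which is empty for the first word. Working in log space avoids numerical underflow when multiplying many small probabilities.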