
Commit 33ab022

Merge pull request #265 from thanhtcptit/master
Fix RoPE inner product equation & add note on the difference in implementation
2 parents 25e1698 + 8c84d6e commit 33ab022

1 file changed

Lines changed: 3 additions & 2 deletions


labml_nn/transformers/rope/__init__.py

```diff
@@ -81,7 +81,7 @@ class RotaryPositionalEmbeddings(nn.Module):
 x^{(2)}_m x^{(2)}_n \cos (m - n) \theta &= \\
 
 \big(x^{(1)}_m \cos (m - n)\theta - x^{(2)}_m \sin (m - n) \theta\big) x^{(1)}_n &+ \\
-\big(x^{(2)}_m \cos (m - n)m\theta + x^{(1)}_m \sin (m - n) \theta\big) x^{(2)}_n &= \\
+\big(x^{(2)}_m \cos (m - n)\theta + x^{(1)}_m \sin (m - n) \theta\big) x^{(2)}_n &= \\
 
 \Big \langle RoPE\big(x^{(1)}_m, x^{(2)}_m, m - n\big), RoPE\big(x^{(1)}_n, x^{(2)}_n, 0\big) \Big \rangle
 \end{align}
```
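The corrected line can be checked numerically. The sketch below is plain Python, and the helpers `rope_2d`, `check_relative` and `removed_line_value` are hypothetical (they are not part of labml_nn). It rotates one feature pair by $m\theta$ and $n\theta$ and compares the resulting inner product with the corrected expansion and with rotating only $x_m$ by $(m - n)\theta$. It also evaluates the removed line, whose stray $m$ inside the cosine makes it disagree whenever $x^{(2)}_m x^{(2)}_n \neq 0$ and $\cos((m - n)m\theta) \neq \cos((m - n)\theta)$.

```python
import math


def rope_2d(x1: float, x2: float, angle: float) -> tuple[float, float]:
    """Rotate one feature pair (x1, x2) by `angle` radians, as RoPE does per pair."""
    c, s = math.cos(angle), math.sin(angle)
    return (x1 * c - x2 * s, x2 * c + x1 * s)


def dot2(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Inner product of two 2-vectors."""
    return a[0] * b[0] + a[1] * b[1]


def check_relative(m, n, theta, xm, xn):
    """Return (lhs, expanded, rhs) for the derivation in the first hunk."""
    # Left-hand side: rotate x_m by m*theta and x_n by n*theta, then take the inner product.
    lhs = dot2(rope_2d(*xm, m * theta), rope_2d(*xn, n * theta))
    # Corrected expansion (the "+" line): only (m - n) * theta appears.
    a = (m - n) * theta
    expanded = ((xm[0] * math.cos(a) - xm[1] * math.sin(a)) * xn[0]
                + (xm[1] * math.cos(a) + xm[0] * math.sin(a)) * xn[1])
    # Final line of the derivation: rotate x_m by (m - n)*theta and x_n by 0.
    rhs = dot2(rope_2d(*xm, a), rope_2d(*xn, 0.0))
    return lhs, expanded, rhs


def removed_line_value(m, n, theta, xm, xn):
    """Same expansion, but with the removed line's cos((m - n) * m * theta) term."""
    a = (m - n) * theta
    return ((xm[0] * math.cos(a) - xm[1] * math.sin(a)) * xn[0]
            + (xm[1] * math.cos(a * m) + xm[0] * math.sin(a)) * xn[1])


if __name__ == "__main__":
    xm, xn = (0.7, -1.2), (0.4, 0.9)
    print(check_relative(5, 2, 0.3, xm, xn))      # three values that agree up to rounding
    print(check_relative(15, 12, 0.3, xm, xn))    # same values up to rounding: only m - n matters
    print(removed_line_value(5, 2, 0.3, xm, xn))  # differs from the values above
```

Shifting both positions by the same amount (5, 2 versus 15, 12) leaves the inner product unchanged, which is the relative-position property the derivation is meant to show.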
```diff
@@ -95,7 +95,8 @@ class RotaryPositionalEmbeddings(nn.Module):
 The paper suggests using $\Theta = {\theta_i = 10000^{\frac{2(i-1)}{d}}, i \in [1, 2, ..., \frac{d}{2}]}$
 for the $\frac{d}{2}$ pairs of features.
 
-We pair feature $i$ with feature $i + \frac{d}{2}$. So for position $m$ we transform
+The original implementation of RoPE divide the $d$-dimension features into $\frac{d}{2}$ pairs of features ($i$, $i + 1$).
+In this implementation we pair feature $i$ with feature $i + \frac{d}{2}$. So for position $m$ we transform
 
 \begin{align}
 \begin{pmatrix}
```
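The added note contrasts two ways of forming the $\frac{d}{2}$ feature pairs. As a minimal sketch, assuming 0-indexed features, an even $d$, and $\theta_i = 10000^{-2i/d}$ for $i = 0, \dots, \frac{d}{2} - 1$ (the RoFormer paper writes this as $10000^{-2(i-1)/d}$ with 1-indexed $i$; note the negative exponent, which the unchanged context line omits), the hypothetical functions `rope_half_split` and `rope_interleaved` below apply the rotation with the pairing $(i, i + \frac{d}{2})$ and with adjacent pairs $(2i, 2i + 1)$ respectively. `half_to_interleaved` is a hypothetical helper that reorders the features so the two conventions can be compared directly.

```python
import math


def thetas(d: int, base: float = 10000.0) -> list[float]:
    """theta_i = base^(-2i/d) for i = 0 .. d/2 - 1 (assumes an even d)."""
    return [base ** (-2.0 * i / d) for i in range(d // 2)]


def rope_half_split(x: list[float], m: int, base: float = 10000.0) -> list[float]:
    """Rotate pairs (i, i + d/2) by m * theta_i: the pairing described in this file."""
    d, h = len(x), len(x) // 2
    out = list(x)
    for i, th in enumerate(thetas(d, base)):
        c, s = math.cos(m * th), math.sin(m * th)
        a, b = x[i], x[i + h]
        out[i], out[i + h] = a * c - b * s, b * c + a * s
    return out


def rope_interleaved(x: list[float], m: int, base: float = 10000.0) -> list[float]:
    """Rotate adjacent pairs (2i, 2i + 1) by m * theta_i: the pairing the note attributes to the original."""
    d = len(x)
    out = list(x)
    for i, th in enumerate(thetas(d, base)):
        c, s = math.cos(m * th), math.sin(m * th)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, b * c + a * s
    return out


def half_to_interleaved(x: list[float]) -> list[float]:
    """Reorder [x_0 .. x_{h-1}, x_h .. x_{d-1}] into [x_0, x_h, x_1, x_{h+1}, ...]."""
    h = len(x) // 2
    out: list[float] = []
    for i in range(h):
        out.extend((x[i], x[i + h]))
    return out


def dot(u: list[float], v: list[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    x = [0.3, -0.5, 0.8, 0.1, -0.2, 0.9, 0.4, -0.7]
    k = [0.6, 0.2, -0.4, 0.5, 0.7, -0.1, 0.3, 0.2]
    # The two pairings give the same numbers once the features are reordered.
    print(half_to_interleaved(rope_half_split(x, 5)))
    print(rope_interleaved(half_to_interleaved(x), 5))
    # The relative-position property also holds for the half-split pairing.
    print(dot(rope_half_split(x, 7), rope_half_split(k, 3)),
          dot(rope_half_split(x, 14), rope_half_split(k, 10)))
```

Because the two conventions agree up to a fixed permutation of the features, both encode relative position equally well; what differs is the memory layout. Query and key weights trained with one pairing would therefore need their features permuted before being used with the other.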
