Why do embedding dimensions come in neat sizes like 768 or 1024, but never 739?

GPUs execute threads in groups of 32 (NVIDIA warps) or 64 (AMD wavefronts), and memory reads coalesce best when accesses line up with fixed-size aligned segments. Matrix-multiply units likewise work on small fixed tiles (e.g. multiples of 8 or 16 for tensor cores). A dimension like 768 (= 12 × 64) or 1024 (= 2¹⁰) divides evenly into all of these, so every matrix multiply fills the hardware completely. An odd size like 739 divides into none of them: the last partial tile leaves threads idle and forces padded or fragmented memory accesses, making an otherwise identical model measurably slower with no accuracy benefit.
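A quick sketch of the divisibility argument: check a candidate dimension against the common tile and warp sizes mentioned above (8, 16, 32, 64 are typical values; the exact set depends on the hardware).

```python
# Does a candidate embedding dimension divide evenly into common
# hardware granularities (tensor-core tiles, warp/wavefront widths)?
def alignment_report(dim, tiles=(8, 16, 32, 64)):
    return {t: dim % t == 0 for t in tiles}

print(alignment_report(768))  # {8: True, 16: True, 32: True, 64: True}
print(alignment_report(739))  # {8: False, 16: False, 32: False, 64: False}
```

768 clears every granularity, so kernels tile it with no remainder; 739 clears none, so the final partial tile must be padded or run with idle lanes.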

Tags: #machine-learning #embeddings #hardware