Make sure we work with Unicode

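In particular, in many places we use byte positions where we should use char positions. Furthermore, even where we do use char positions, we don't check the 'physical' (display) width of the character, so a wide character such as a CJK ideograph is counted as one column even though it typically occupies two.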
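
To make the three measurements concrete, here is a minimal sketch in Python (chosen only because its standard library exposes the needed Unicode data; it is not this project's code). The helper names `byte_len`, `char_len`, and `display_width` are hypothetical, and the width rule is a rough assumption: East Asian Wide/Fullwidth characters count as two columns, combining marks as zero, everything else as one. It ignores control characters, zero-width joiners, and emoji sequences, so treat it as an illustration of the problem rather than a fix.

```python
import unicodedata


def byte_len(s: str) -> int:
    """Length in UTF-8 bytes (what a byte position measures)."""
    return len(s.encode("utf-8"))


def char_len(s: str) -> int:
    """Length in Unicode code points (what a char position measures)."""
    return len(s)


def display_width(s: str) -> int:
    """Approximate number of terminal columns the string occupies.

    Assumption: Wide ("W") and Fullwidth ("F") characters take two
    columns, combining marks take none, everything else takes one.
    """
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width
```

For the string `"日本語"` the three helpers return 9, 3, and 6 respectively, and for `"e\u0301"` (an `e` followed by a combining acute accent) they return 3, 2, and 1. Any column calculation that uses the first or second number where the third is needed will be off for such input.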