Selenoprotein evolution: GPX
The alignment shows 103 of the 170 available GPX sequences, distributed more or less evenly over both the eight members of the gene tree and the chordate species tree. Reddish shading marks residues conserved at the 90% level; bluish shading marks less-conserved residues at the 70% level. Selenocysteine is represented by Z because U is not retained by the alignment tool used here.
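The two shading thresholds can be read as per-column conservation fractions. The following is a minimal sketch of that idea, not the alignment tool's actual method: it assumes conservation means the share of non-gap sequences carrying the most common residue in a column, and the function names and the toy alignment are hypothetical. It also shows the U-to-Z substitution for selenocysteine.

```python
from collections import Counter


def mask_selenocysteine(seq):
    """Replace selenocysteine (U or u) with Z, for tools that drop U."""
    return seq.replace("U", "Z").replace("u", "Z")


def column_conservation(alignment):
    """Per column, the fraction of non-gap residues equal to the most common one.

    Assumption: gaps ('-') are ignored when computing the fraction.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    fractions = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            fractions.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        fractions.append(top_count / len(residues))
    return fractions


def shading(fraction, high=0.9, low=0.7):
    """Map a conservation fraction to the shading classes used in the text."""
    if fraction >= high:
        return "reddish"
    if fraction >= low:
        return "bluish"
    return "none"


# Hypothetical three-column toy alignment of ten sequences (not real GPX data).
toy_alignment = ["ZGK"] * 5 + ["ZGR"] * 3 + ["ZAR"] * 2
toy_shading = [shading(f) for f in column_conservation(toy_alignment)]
```

In the toy alignment the first column is invariant, the second is shared by 8 of 10 sequences, and the third is split evenly, so the three shading classes each appear once.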
Note the three large deletions in GPX4, GPX7, and GPX8 relative to the other GPX paralogs. The other paralogs have the ancestral length, judging by the GPX4-classifying sequence in Monosiga brevicollis, an outgroup species to metazoans. Parsimoniously, the deletions occurred once, in the common ancestral sequence of GPX4 and GPX7/8. They are already present in the metazoan tubeworm Ridgeia.
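The parsimony argument can be made concrete by counting the minimum number of state changes a tree requires. The sketch below uses Fitch's small-parsimony count on a single binary character (full length versus deleted). Both tree topologies are hypothetical illustrations restricted to the paralogs named in this section, and the three deletions are treated as one character for simplicity.

```python
def fitch_changes(tree, states):
    """Return (possible_states, min_changes) for a nested-tuple binary tree.

    Leaves are strings looked up in `states`; internal nodes are 2-tuples.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: one change must occur on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Character states as described in the text; Monosiga is the full-length outgroup.
length_state = {
    "Monosiga": "full",
    "GPX1": "full",
    "GPX2": "full",
    "GPX4": "deleted",
    "GPX7": "deleted",
    "GPX8": "deleted",
}

# Hypothetical topology grouping GPX4 with GPX7/8.
grouped = ("Monosiga", (("GPX1", "GPX2"), ("GPX4", ("GPX7", "GPX8"))))
# Hypothetical alternative that separates GPX4 from GPX7/8.
split = ("Monosiga", (("GPX1", "GPX4"), ("GPX2", ("GPX7", "GPX8"))))

grouped_cost = fitch_changes(grouped, length_state)[1]
split_cost = fitch_changes(split, length_state)[1]
```

Under these assumptions the grouped topology needs a single deletion event while the split topology needs two, which is the sense in which a single ancestral deletion is the more parsimonious reading.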
Most cysteines occur sporadically or only within a particular paralog group. The exceptions are the universal selenocysteine site (occupied by cysteine in some paralog families) and a following cysteine about 30 residues distal (e.g. VASQuGKTEVNYTQLVDLHARYAECGLRILAFPCNQF in GPX4_homSap). These two residues very likely form a mixed selenenylsulfide (resp. disulfide) bond with an essential role in the redox reactions carried out by glutathione peroxidases. Other cysteines might participate in structural disulfides, yet in some GPX an odd number occurs, meaning they could not all pair off. This, in conjunction with their non-homologous positions, argues against structural disulfides in most cases.
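The spacing and the pairing argument are easy to check on the quoted fragment. The sketch below is only an illustration: the helper names are hypothetical, and the fragment is the GPX4_homSap excerpt quoted above, with lowercase u marking selenocysteine.

```python
# Fragment quoted in the text; lowercase 'u' marks the selenocysteine.
GPX4_FRAGMENT = "VASQuGKTEVNYTQLVDLHARYAECGLRILAFPCNQF"


def cysteine_offsets_from_sec(fragment):
    """Distances (in residues) from the selenocysteine to each downstream cysteine."""
    sec_index = fragment.upper().index("U")  # raises ValueError if no U is present
    return [
        i - sec_index
        for i, residue in enumerate(fragment)
        if residue == "C" and i > sec_index
    ]


def has_unpaired_cysteine(sequence):
    """True if the cysteine count is odd, so not every cysteine can be in a disulfide."""
    return sequence.upper().count("C") % 2 == 1


offsets = cysteine_offsets_from_sec(GPX4_FRAGMENT)
```

For this fragment the downstream cysteines lie 20 and 29 residues after the selenocysteine; the second, in the FPCNQF stretch, corresponds to the cysteine described as roughly 30 residues distal. The parity check only rules out complete pairing and says nothing about which cysteines, if any, are actually bonded.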
Note that GPX7 and GPX8 have classical KDEL endoplasmic reticulum retention signals. This implies an interaction with the protein systems responsible for retrograde translocation and retention. This subcellular localization apparently arose in GPX7 after the amphioxus divergence, since the motif is missing there, in sea urchin, and in early eukaryotes. GPX8 arose by a subsequent duplication of this ancestral GPX7 and so inherited the motif.
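Presence or absence of the signal across species can be screened with a simple C-terminal check. This is a minimal sketch under stated assumptions: the function name and the toy sequences are hypothetical, and only the literal KDEL tetrapeptide is tested by default. Any variant motifs would have to be supplied explicitly by the caller.

```python
def has_c_terminal_motif(protein, motifs=("KDEL",)):
    """True if the protein ends with one of the given C-terminal motifs.

    A trailing '*' (stop-codon marker in some FASTA exports) is ignored.
    """
    cleaned = protein.strip().rstrip("*").upper()
    return cleaned.endswith(tuple(motifs))


# Hypothetical toy sequences, not real GPX proteins.
toy_proteins = {
    "toy_with_signal": "MAAPLLKDEL",
    "toy_internal_only": "MAAPLLKDELGS",
}
signal_present = {
    name: has_c_terminal_motif(seq) for name, seq in toy_proteins.items()
}
```

The second toy sequence contains KDEL internally but not at the C-terminus, so it is reported as lacking the signal; the position of the motif matters, not merely its presence.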
Thus the pattern of indels suggests grouping GPX4 with GPX7 and GPX8 to the exclusion of GPX1 and GPX2.