Fig. 3

From: Channel-shuffled transformers for cross-modality person re-identification in video

The Channel-Shuffled Transformer Attention Block (CSTB) and the Shuffled Multi-Head Attention (SMHA) module used in HCSTNET as the Channel-Shuffled Temporal Transformer (CSTT). Both modules use a Shuffled Fully Connected (SFC) module, which consists of a channel shuffle layer sandwiched between two grouped convolutions. (a) The Channel-Shuffled Transformer Attention Block (CSTB), the Shuffled Multi-Head Attention (SMHA) module, and the Shuffled Fully Connected (SFC) module. (b) A detailed view of the Shuffled Fully Connected (SFC) module. Grouped convolution (GCONV) layers contain groups of fully connected layers (or the functionally equivalent pointwise convolutions) operating on chunks of the input. To ensure interaction between chunks, a channel shuffle layer reorders the chunks before they are forwarded to the second GCONV. The number of groups in the second GCONV is the reciprocal of the number of groups in the first GCONV. Note that some connections are omitted for brevity.
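
The GCONV → channel shuffle → GCONV structure of the SFC module can be sketched in a few lines of PyTorch. The sketch below is illustrative, not the authors' implementation: the names ShuffledFC and channel_shuffle are hypothetical, and it assumes that the "reciprocal" relationship means the second GCONV uses C/g groups when the first uses g groups over C channels, so that every output channel can mix information from every chunk.

```python
# Minimal sketch of a Shuffled Fully Connected (SFC) module, assuming PyTorch.
# Names and the exact group arithmetic are assumptions, not the authors' code.
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Reorder channels so chunks produced by a grouped conv interleave."""
    n, c, *rest = x.shape
    # (N, C, ...) -> (N, groups, C // groups, ...) -> swap group/channel axes -> flatten
    x = x.view(n, groups, c // groups, *rest)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, *rest)


class ShuffledFC(nn.Module):
    """Two grouped pointwise convolutions with a channel shuffle in between.

    Assumes the second GCONV uses channels // groups groups, the "reciprocal"
    of the first GCONV's group count.
    """

    def __init__(self, channels: int, groups: int):
        super().__init__()
        assert channels % groups == 0
        assert channels % (channels // groups) == 0
        self.groups = groups
        # A 1x1 (pointwise) grouped convolution is functionally equivalent to
        # a group of fully connected layers applied to each chunk of channels.
        self.gconv1 = nn.Conv2d(channels, channels, kernel_size=1, groups=groups)
        self.gconv2 = nn.Conv2d(channels, channels, kernel_size=1,
                                groups=channels // groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.gconv1(x)
        x = channel_shuffle(x, self.groups)  # mix chunks before the second GCONV
        return self.gconv2(x)


# Usage: a (batch, channels, H, W) feature map passes through unchanged in shape.
x = torch.randn(2, 256, 8, 4)
y = ShuffledFC(channels=256, groups=4)(x)
print(y.shape)  # torch.Size([2, 256, 8, 4])
```

Without the shuffle, the two grouped convolutions would keep each chunk of channels isolated; the reordering step is what lets the second GCONV combine features across all chunks at a fraction of the cost of a dense fully connected layer.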