Contrastive learning (CL) pre-trains general-purpose encoders using an
unlabeled pre-training dataset, which consists of images or image-text pairs.
CL is vulnerable to data-poisoning-based backdoor attacks (DPBAs), in which an
attacker injects poisoned inputs into the pre-training dataset so that the
pre-trained encoder is backdoored. However, existing DPBAs achieve limited
effectiveness. In this work, we propose a new DPBA against CL, called
CorruptEncoder. CorruptEncoder uses a
theory-guided method to create optimal poisoned inputs to maximize attack
effectiveness. Our experiments show that CorruptEncoder substantially
outperforms existing DPBAs. In particular, CorruptEncoder is the first DPBA
that achieves more than 90% attack success rates with only a few (3) reference
images and a small poisoning ratio (0.5%). Moreover, we propose a defense,
called localized cropping, against DPBAs. Our results show that it reduces the
effectiveness of DPBAs, though at a small cost to the utility of the encoder.
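The intuition behind localized cropping is that DPBAs against CL rely on global random cropping splitting a poisoned image into one view containing the reference object and another containing the trigger, which contrastive learning then aligns. A defense of this kind can instead draw both augmented views from a single, randomly placed local region of the image. The sketch below illustrates this idea; the function name, region/crop fractions, and geometry-only interface are illustrative assumptions, not the exact hyperparameters or implementation from this work.

```python
import random

def localized_two_crops(img_w, img_h, region_frac=0.5, crop_frac=0.6, rng=None):
    """Sketch of localized cropping (assumed parameters, not the paper's exact
    settings): sample ONE local region per image, then take both augmented
    views from inside it. Because the two views come from the same small
    region, they are unlikely to separately cover a pasted reference object
    and a trigger placed far apart in the image.

    Returns two crop boxes as (left, top, right, bottom)."""
    rng = rng or random.Random()

    # Randomly place a local region covering region_frac of each dimension.
    rw, rh = int(img_w * region_frac), int(img_h * region_frac)
    rx = rng.randint(0, img_w - rw)
    ry = rng.randint(0, img_h - rh)

    def crop():
        # Sample a crop entirely inside the shared local region.
        cw, ch = int(rw * crop_frac), int(rh * crop_frac)
        x = rng.randint(rx, rx + rw - cw)
        y = rng.randint(ry, ry + rh - ch)
        return (x, y, x + cw, y + ch)

    return crop(), crop()
```

In a real CL pipeline, these boxes would drive the crop step of the augmentation for both views, with the other augmentations (flips, color jitter, etc.) left unchanged.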