See/Sources: @radfordCLIPConnectingText2021, @radfordLearningTransferableVisual2021, GitHub
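CLIP (“Contrastive Language–Image Pre-training”) is a neural network that can classify images into any set of categories supplied as natural-language names, without being trained on those specific categories. This ability is similar to the “zero-shot” capabilities of GPT-2 and GPT-3.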
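The zero-shot step can be sketched in plain Python. The sketch assumes CLIP's image and text encoders have already produced embedding vectors. Each candidate label is written as a text prompt (e.g. “a photo of a dog”) and embedded. The image embedding is then compared with every label embedding by cosine similarity, and the label with the highest scaled similarity wins. The function name `zero_shot_classify`, the toy 3-dimensional vectors, and `logit_scale=100.0` are illustrative assumptions. They are not the API of the CLIP GitHub repository.

```python
import math


def normalize(v):
    # Scale a vector to unit L2 norm so that dot products equal cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_classify(image_emb, label_embs, logit_scale=100.0):
    """Hypothetical helper: pick the label whose text embedding best matches the image.

    image_emb:   embedding of one image (assumed output of an image encoder).
    label_embs:  dict mapping label name -> embedding of a prompt such as
                 "a photo of a {label}" (assumed output of a text encoder).
    logit_scale: temperature multiplier; 100.0 is an illustrative value.
    """
    img = normalize(image_emb)
    labels = list(label_embs)
    # Scaled cosine similarity between the image and each label's text embedding.
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, normalize(label_embs[lab])))
        for lab in labels
    ]
    # Softmax over the candidate labels (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = {lab: e / total for lab, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy vectors standing in for real encoder outputs (made up for illustration).
image = [0.9, 0.1, 0.0]
labels = {"dog": [1.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
best, probs = zero_shot_classify(image, labels)
print(best)  # dog
```

Because the categories enter only as text, changing the task means editing the `labels` dictionary. No classifier is retrained, and that is what makes the classification “zero-shot.”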