In this talk we explore a domain-specific adaptation of CLIP: FashionCLIP. We describe a contrastive model trained on 800K fashion caption-image pairs and demonstrate its usefulness in applied tasks.
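For readers unfamiliar with how CLIP-style models are trained, the following is a minimal sketch of the symmetric contrastive (InfoNCE) objective that such caption-image training optimizes. The function name and the NumPy implementation are illustrative, not FashionCLIP's actual training code:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Normalize embeddings so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    n = logits.shape[0]

    def xent(l):
        # Softmax cross-entropy where the matched caption-image pair
        # for each row sits on the diagonal.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls each image embedding toward its own caption embedding and away from the other captions in the batch, which is what makes the resulting model useful for zero-shot retrieval and classification in a specific domain such as fashion.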