Vision and Language Models

Teaching CLIP Some Fashion

This is a short blog post describing FashionCLIP. If you are a data scientist you probably have to deal with both images and text. However, your data will be very specific to your domain, and standard models might not work well. This post explains how domain-specific vision and language models can be used in a domain-specific setting and why using them can be a promising way to create a search engine or a (zero-shot) classifier.