Large Language Models

When and Why Vision-Language Models Behave like Bags-Of-Words, and What to Do About It?

Despite the success of large vision and language models (VLMs) in many downstream applications, it is unclear how well they encode the compositional relationships between objects and attributes. Here, we create the Attribution, Relation, and Order …
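The "bag-of-words" failure mode in the title can be shown with a toy sketch. This is not the benchmark described above and not any real VLM. It uses a hypothetical encoder, `bag_of_words`, that only counts lowercase whitespace-separated tokens. Because it discards word order, it gives identical representations to captions that differ only in how relations or attributes are bound to objects. Both helper names are illustrative assumptions.

```python
from collections import Counter


def bag_of_words(caption: str) -> Counter:
    """Hypothetical toy encoder: lowercase, split on whitespace, count tokens.

    Word order is discarded, so only the multiset of words survives.
    """
    return Counter(caption.lower().split())


def same_representation(a: str, b: str) -> bool:
    """Return True when the toy encoder cannot tell the two captions apart."""
    return bag_of_words(a) == bag_of_words(b)


# Relation swap: same words, different meaning. The encoder cannot distinguish them.
print(same_representation("the horse is eating the grass",
                          "the grass is eating the horse"))  # True

# Attribute swap: the colors are bound to different objects, yet the counts match.
print(same_representation("a red cube and a blue sphere",
                          "a blue cube and a red sphere"))  # True

# A caption with different words is distinguished.
print(same_representation("the horse is eating the grass",
                          "the horse is sleeping"))  # False
```

A model that encodes compositional structure should separate the first two pairs. An encoder that behaves like a bag of words cannot, because each pair contains exactly the same words.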