Explainable-Vision-Language-Model 🥶: Generate a video visualizing how a model attends to an image while generating text.