My image:
And the text input:
This is a picture from tweet, and the corresponding text is:
CONGRATS ON HITTING YOIR GOAL GUYS, I'm sure the victims of Harvey will appreciate it greatly https://t.co/daPhXZvhuY
Please judge the humanitarian type in the image, you can only choose one answer exactly from the following types:
'not_humanitarian', 'injured_or_dead_people', 'other_relevant_information', 'affected_individuals', 'infrastructure_and_utility_damage', 'rescue_volunteering_or_donation_effort', 'vehicle_damage', 'missing_or_found_people'
The tweet is from the CrisisMMD dataset.
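For reference, the question string above is assembled roughly like this before being passed to the model (the variable names are illustrative; only the textwrap.dedent call is visible in my traceback below):

import textwrap

humanitarian_types = [
    "not_humanitarian", "injured_or_dead_people", "other_relevant_information",
    "affected_individuals", "infrastructure_and_utility_damage",
    "rescue_volunteering_or_donation_effort", "vehicle_damage",
    "missing_or_found_people",
]
tweet_text = ("CONGRATS ON HITTING YOIR GOAL GUYS, I'm sure the victims of Harvey "
              "will appreciate it greatly https://t.co/daPhXZvhuY")
types_str = ", ".join(repr(t) for t in humanitarian_types)
question = textwrap.dedent(f"""\
    This is a picture from tweet, and the corresponding text is:
    {tweet_text}
    Please judge the humanitarian type in the image, you can only choose one answer exactly from the following types:
    {types_str}""")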
My code:

from PIL import Image

# model, vis_processors, txt_processors are loaded earlier with lavis.models.load_model_and_preprocess
def ask_blip(image_path: str, question: str):
    image = Image.open(image_path).convert("RGB")
    image = vis_processors["eval"](image).unsqueeze(0).to("cuda")
    question = txt_processors["eval"](question)
    return model.predict_answers(
        samples={"image": image, "text_input": question}, inference_method="generate"
    )[0]
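It is then called on the tweet image, e.g. (the path below is just a placeholder):

answer = ask_blip("harvey_tweet_image.jpg", question)  # placeholder path
print(answer)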
And the error that is raised:
Traceback (most recent call last):
File "/root/shared-nvme/baselines/blip.py", line 121, in <module>
main()
File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/shared-nvme/baselines/blip.py", line 95, in main
responses["humanitarian"] = ask_blip(image_path, textwrap.dedent(f"""\
File "/root/shared-nvme/baselines/blip.py", line 24, in ask_blip
return model.predict_answers(samples={"image": image, "text_input": question}, inference_method="generate")[0]
File "/root/.local/lib/python3.10/site-packages/lavis/models/blip_models/blip_vqa.py", line 225, in predict_answers
return self._generate_answers(
File "/root/.local/lib/python3.10/site-packages/lavis/models/blip_models/blip_vqa.py", line 259, in _generate_answers
outputs = self.text_decoder.generate(
File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2345, in generate
result = self._beam_search(
File "/root/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 3760, in _beam_search
model_outputs = self(**model_inputs, return_dict=True)
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/lavis/models/med.py", line 1210, in forward
outputs = self.bert(
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/lavis/models/med.py", line 974, in forward
encoder_outputs = self.encoder(
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/lavis/models/med.py", line 592, in forward
layer_outputs = layer_module(
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/lavis/models/med.py", line 475, in forward
cross_attention_outputs = self.crossattention(
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/lavis/models/med.py", line 346, in forward
self_outputs = self.self(
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.local/lib/python3.10/site-packages/lavis/models/med.py", line 219, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
I also tried BLIP from HuggingFace, and it raised the same exception. How can I fix it? :(
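Environment details probably matter here, since the failure happens inside transformers' generation code; a quick way to print the relevant versions (assuming LAVIS was installed from PyPI as salesforce-lavis):

import importlib.metadata as md

for pkg in ("torch", "transformers", "salesforce-lavis"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")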