Tactile Captioning

Kelvin Lin edited this page Jul 7, 2025 · 1 revision

Fine-tuned Capabilities

Octopi-1.5 is fine-tuned for two main capabilities involving GelSight tactile inputs:

  • Description
  • Ranking

The two tasks can be requested separately or combined in a single prompt, as shown in the templates below.

Question Answering Templates

Description

USER

Describe the objects in the following tactile videos.

Object 1

Part 1.1: [tactile frame embeddings]

Part 1.2: [tactile frame embeddings]

ASSISTANT

Object 1

Part 1.1: [descriptions]

Part 1.2: [descriptions]
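
As an illustration of how the USER turn of the description template could be assembled for an arbitrary number of objects and parts, here is a minimal Python sketch. The function name `build_description_prompt` and the `TACTILE_PLACEHOLDER` string are hypothetical; this page does not specify how the tactile frame embeddings are actually spliced into the prompt, so the placeholder only marks their position.

```python
from typing import Dict, List

# Hypothetical stand-in marking where tactile frame embeddings go.
# The real insertion mechanism is not described on this page.
TACTILE_PLACEHOLDER = "[tactile frame embeddings]"


def build_description_prompt(objects: Dict[int, int]) -> str:
    """Build the USER turn of the description template.

    `objects` maps an object number to the number of parts it has.
    """
    lines: List[str] = ["Describe the objects in the following tactile videos."]
    for obj_id in sorted(objects):
        lines.append(f"Object {obj_id}")
        for part in range(1, objects[obj_id] + 1):
            # Parts are numbered <object>.<part>, as in the template.
            lines.append(f"Part {obj_id}.{part}: {TACTILE_PLACEHOLDER}")
    # Blank lines between entries mirror the layout shown above.
    return "\n\n".join(lines)


prompt = build_description_prompt({1: 2})
print(prompt)
```

Calling the function with `{1: 2}` reproduces the single-object, two-part layout of the template; passing more entries extends it to further objects under the same numbering assumption.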

Ranking

USER

Rank the objects in the following tactile videos in decreasing hardness and roughness.

Object 1

Part 1.1: [tactile frame embeddings]

Part 1.2: [tactile frame embeddings]

ASSISTANT

Object parts ranked in decreasing hardness: 1.1, 1.2

Object parts ranked in decreasing roughness: 1.2, 1.1
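
Because the ranking response follows a fixed line format, it can be converted into a structured form with a small parser. The sketch below is one possible approach, assuming the model's output matches the template exactly; `parse_rankings` and `RANK_LINE` are hypothetical names and not part of the Octopi-1.5 code.

```python
import re
from typing import Dict, List

# Matches lines such as "Object parts ranked in decreasing hardness: 1.1, 1.2".
RANK_LINE = re.compile(r"Object parts ranked in decreasing (\w+):[ \t]*(.+)")


def parse_rankings(response: str) -> Dict[str, List[str]]:
    """Map each ranked property to its ordered list of part labels."""
    rankings: Dict[str, List[str]] = {}
    for match in RANK_LINE.finditer(response):
        prop, parts = match.groups()
        # Part labels are comma-separated, highest value first.
        rankings[prop] = [p.strip() for p in parts.split(",") if p.strip()]
    return rankings


example = (
    "Object parts ranked in decreasing hardness: 1.1, 1.2\n\n"
    "Object parts ranked in decreasing roughness: 1.2, 1.1"
)
rankings = parse_rankings(example)
print(rankings)
```

Each list preserves the order given in the response, so the first element is the hardest (or roughest) part. Output that deviates from the template is silently skipped by this sketch.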

Description and Ranking

USER

Describe the objects in the following tactile videos and rank them in decreasing hardness and roughness.

Object 1

Part 1.1: [tactile frame embeddings]

Part 1.2: [tactile frame embeddings]

ASSISTANT

Object 1

Part 1.1: [descriptions]

Part 1.2: [descriptions]

Object parts ranked in decreasing hardness: 1.1, 1.2

Object parts ranked in decreasing roughness: 1.2, 1.1
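
In the combined response, the per-part descriptions precede the ranking lines. The following sketch shows one way the description portion might be extracted, assuming each description sits on a single line after its `Part X.Y:` label as in the template. `parse_descriptions` and `PART_LINE` are hypothetical names; ranking lines are simply ignored here.

```python
import re
from typing import Dict

# Matches lines such as "Part 1.1: <description text>".
PART_LINE = re.compile(r"^Part (\d+\.\d+):[ \t]*(.*)$", re.MULTILINE)


def parse_descriptions(response: str) -> Dict[str, str]:
    """Map each part label (e.g. "1.1") to its description text."""
    return {part: text.strip() for part, text in PART_LINE.findall(response)}


example = (
    "Object 1\n\n"
    "Part 1.1: [descriptions]\n\n"
    "Part 1.2: [descriptions]\n\n"
    "Object parts ranked in decreasing hardness: 1.1, 1.2\n\n"
    "Object parts ranked in decreasing roughness: 1.2, 1.1"
)
descriptions = parse_descriptions(example)
print(descriptions)
```

Keying the result by part label keeps the descriptions aligned with the labels used in the ranking lines, so the two outputs can be joined downstream.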
