ARC-AGI: Completely separate train/test examples at puzzle level #22

dywsy21 · 2025-08-04T08:41:29Z

What

An attempt to remove Test-Time Training (TTT) and fully resolve issue #18, "Data Leakage Bug: Model Trained on Full Dataset (Including Val/Test Splits)".

Description

I saw #18 and was interested in how the model would behave on ARC-AGI if it only used puzzle inputs/outputs from train, instead of also incorporating the inputs from test.

While I know that TTT is allowed in ARC-AGI, training on test examples beforehand gives the model an unfair preview of the implied rules behind them. It would be interesting to see whether the H&L arch can figure out implied rules it has not seen before, just as humans do.

Removing TTT would make the model's evaluation results on ARC-AGI more convincing and more indicative of its actual generalization ability. Let me know if this approach helps, happy to chat~
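To make the idea concrete, here is a minimal sketch of separating train and test examples inside each puzzle. It assumes the public ARC task JSON layout, where every task has a `train` list and a `test` list of `input`/`output` grid pairs. The helper names `split_puzzle` and `build_training_set` and the toy puzzle are hypothetical, made up for illustration; they are not the code in this PR or in the repository.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]


def split_puzzle(task: dict) -> Tuple[List[Pair], List[Pair]]:
    """Return (demonstration pairs, held-out test pairs) for one ARC task."""
    demos = [(p["input"], p["output"]) for p in task["train"]]
    held_out = [(p["input"], p["output"]) for p in task["test"]]
    return demos, held_out


def build_training_set(tasks: Dict[str, dict]) -> List[Tuple[str, Grid, Grid]]:
    """Collect training examples from the 'train' pairs only.

    Nothing from a puzzle's 'test' list (neither inputs nor outputs)
    is added, so test grids never reach the optimizer.
    """
    examples = []
    for puzzle_id, task in tasks.items():
        demos, _ = split_puzzle(task)
        for inp, out in demos:
            examples.append((puzzle_id, inp, out))
    return examples


# Hypothetical toy puzzle: the rule is "reverse the row".
toy_tasks = {
    "toy_puzzle": {
        "train": [
            {"input": [[0, 1]], "output": [[1, 0]]},
            {"input": [[2, 0]], "output": [[0, 2]]},
        ],
        "test": [{"input": [[3, 0]], "output": [[0, 3]]}],
    }
}

training_set = build_training_set(toy_tasks)
training_inputs = [inp for _, inp, _ in training_set]
```

With this split, the held-out grid `[[3, 0]]` is only ever seen at evaluation time, which is the property the description above is after.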


helma436 · 2025-08-04T11:01:01Z

.

shawntan · 2025-08-13T03:31:59Z

Does the TTT setting for ARC-AGI allow for parameter updates across evaluation examples?

If it doesn't, then doing Training + TTT together represents a very different setting than Training -> TTT per evaluation instance, right? Each evaluation instance would be i.i.d. in that case, and the model could not use generalised information from the rest of the evaluation set.
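A rough sketch of the difference between the two settings, under loud assumptions: `ToyModel` is a hypothetical stand-in whose "parameters" are just the set of puzzle ids it has been updated on, and both function names are invented for illustration. This is not the repository's training loop.

```python
import copy


class ToyModel:
    """Stand-in model: its state records which puzzles it was updated on."""

    def __init__(self):
        self.seen = set()

    def update(self, puzzle_id):
        self.seen.add(puzzle_id)


def per_instance_ttt(base, eval_ids):
    """Training -> TTT per instance: adapt a fresh copy for each puzzle."""
    info = {}
    for pid in eval_ids:
        model = copy.deepcopy(base)  # parameters reset before every instance
        model.update(pid)
        info[pid] = set(model.seen)
    return info


def joint_training(base, eval_ids):
    """Training + TTT together: one set of parameters updated on all puzzles."""
    model = copy.deepcopy(base)
    for pid in eval_ids:
        model.update(pid)
    return {pid: set(model.seen) for pid in eval_ids}


base = ToyModel()
base.update("train_puzzle")  # knowledge from the ordinary training split

isolated = per_instance_ttt(base, ["eval_a", "eval_b"])
shared = joint_training(base, ["eval_a", "eval_b"])
```

In the per-instance case, the model answering `eval_a` carries nothing from `eval_b`; in the joint case, it does. That shared information is what the question above is pointing at.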

feat: attempt to completely separate train/test examples at puzzle le…

9fd1aba

…vel for ARC-AGI
