Hello, could you please provide a Colab example that takes a sequence of video frames with depth information as input, for inference only?