1. There seems to be redundant code here.
2. The tokenize method does not produce a correct instruction_mask column.
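
To make the second point concrete, here is a minimal sketch of the mask I would expect. It rests on assumptions, not on the code under review: `toy_tokenize` and `build_example` are hypothetical names, the whitespace split stands in for the real tokenizer, and the convention that 1 marks instruction tokens and 0 marks response tokens is a guess.

```python
def toy_tokenize(text):
    # Hypothetical stand-in for the real tokenizer: split on whitespace.
    return text.split()


def build_example(instruction, response):
    # Tokenize the two parts separately so the boundary between them is known exactly.
    instruction_tokens = toy_tokenize(instruction)
    response_tokens = toy_tokenize(response)
    tokens = instruction_tokens + response_tokens

    # Assumed convention: 1 marks instruction tokens, 0 marks response tokens.
    instruction_mask = [1] * len(instruction_tokens) + [0] * len(response_tokens)

    # The mask must line up one-to-one with the tokens.
    assert len(tokens) == len(instruction_mask)
    return {"tokens": tokens, "instruction_mask": instruction_mask}


example = build_example("Translate to French:", "Bonjour le monde")
print(example["instruction_mask"])
```

Tokenizing the instruction and response separately keeps the boundary explicit. With a real subword tokenizer, tokenizing the concatenated string can split tokens differently at the boundary, so the two approaches may not agree.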