ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality
This repository contains the code for the three runs reported in the paper above, for replication purposes. The data is not provided due to license restrictions, but the steps to acquire it are explained on the ArchEHR-QA 2025 webpage.
We introduce an end-to-end prompt-based baseline and two two-step methods that divide the task, without using any external knowledge. Both two-step approaches first extract the essential sentences from the clinical text, either by prompting or by similarity ranking, and then generate the final answer from those sentences.
For the two-step approaches we first explain how to extract the essential sentences from the EHRs and then the steps to obtain the final argumentations from them.
The code for this run is located in the end2end folder. The prompts and configurations are in its conf folder; some of these parameters are overridden when running the bash file.
Executing the end2end.slurm file applies all the needed configuration by setting the models and the split to use (test, dev or both). The fixing step is also executed automatically.
To add new models, declare them in the following code with a name of your choice and the model path (from HuggingFace or local). All the added models will be evaluated in the same run:
declare -a pairs=(
"Aloe8B HPAI-BSC/Llama3.1-Aloe-Beta-8B"
"Mistral mistralai/Mistral-7B-Instruct-v0.3"
"Gemma google/gemma-2-9b-it"
"Llama70B meta-llama/Llama-3.3-70B-Instruct"
"Aloe70B HPAI-BSC/Llama3.1-Aloe-Beta-70B"
)
For this first step, either the available slurm files or the following commands can be executed. The file paths should be adjusted to match your setup before running.
To extract the essential sentences (or use 2steps_exp.slurm):
python two_step_prompt.py /path/to/prompts_baseline_best.json <description_of_output>
The description is used to name the output file, which is stored in prompt_first_step as output_<description_of_output>.json.
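As a reference, the logic of this first prompting step is roughly the following sketch (the prompt-file schema, model, and output fields are assumptions for illustration, not the exact ones used by two_step_prompt.py):

import json
from transformers import pipeline

# Hypothetical schema: one prompt per case asking which note sentences are essential.
with open("/path/to/prompts_baseline_best.json") as f:
    prompts = json.load(f)
generator = pipeline("text-generation", model="HPAI-BSC/Llama3.1-Aloe-Beta-8B")

outputs = {}
for case_id, prompt in prompts.items():
    generation = generator(prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
    # The model is expected to answer with the ids of the essential sentences, e.g. "1, 4, 8".
    outputs[case_id] = [tok.strip() for tok in generation.split(",") if tok.strip().isdigit()]

description = "aloe8b_dev"  # <description_of_output>
with open(f"prompt_first_step/output_{description}.json", "w") as f:
    json.dump(outputs, f, indent=2)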
For the test set, the same procedure must be followed but with 2steps_exp_test.slurm.
To calculate the optimal threshold, use the calculate_optimal_threshold.py script, passing the dev file and the reranker model to use (or use calculate_threshold.slurm). Example usage:
python calculate_optimal_threshold.py --file_path="/path/to/parsed_data_dev.json" --reranker_model="jinaai/jina-reranker-v2-base-multilingual"
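Conceptually, the threshold search scores every (question, sentence) pair on the dev set with the reranker and sweeps candidate cut-offs, keeping the one that maximizes F1 against the gold essential sentences. A minimal sketch of that idea (not the exact implementation of calculate_optimal_threshold.py):

import numpy as np

def best_threshold(scores, labels):
    # scores: reranker score per (question, sentence) pair; labels: 1 if the sentence is gold-essential
    scores, labels = np.asarray(scores), np.asarray(labels)
    best_t, best_f1 = None, -1.0
    for t in np.unique(scores):                  # every observed score is a candidate cut-off
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        precision = tp / max(pred.sum(), 1)
        recall = tp / max((labels == 1).sum(), 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-9)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1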
Once the threshold is calculated, the predictions can be obtained with that threshold using the get_reranker_outputs.py script (or reranker_outputs.slurm). This script outputs the predictions for the given threshold as well as the ranks.
python get_reranker_outputs.py --file_path="/path/to/parsed_data_dev.json" --reranker_model="jinaai/jina-reranker-v2-base-multilingual" --threshold=-2.4375
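Conceptually, this step scores each note sentence against the question with the reranker and keeps the sentences whose score reaches the threshold. A minimal sketch using the sentence-transformers CrossEncoder wrapper (the data layout is an assumption):

from sentence_transformers import CrossEncoder

# jina-reranker-v2 requires trust_remote_code; torch_dtype="auto" follows the model card.
reranker = CrossEncoder(
    "jinaai/jina-reranker-v2-base-multilingual",
    trust_remote_code=True,
    automodel_args={"torch_dtype": "auto"},
)

def select_essential(question, sentences, threshold=-2.4375):
    # sentences: list of (sentence_id, text) pairs from one clinical note
    pairs = [(question, text) for _, text in sentences]
    scores = reranker.predict(pairs)
    ranked = sorted(zip(sentences, scores), key=lambda item: item[1], reverse=True)
    return [(sid, float(score)) for (sid, _), score in ranked if score >= threshold]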
To create the argumentations, the 2step_argumentation.slurm file is used; it generates the outputs of both the prompting and the reranking pipelines in a single run. To do it from the command line, the steps to follow are:
For the prompting process:
mkdir ./second_step/outputs
python argument_sentences.py prompt_generate_arguments.json <file_of_prompting_output> <description_of_output>
The description is used to name the output file, which is stored in /path/to/second_step/outputs/ as argumentation_<description_of_output>.json.
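For reference, this second step conceptually builds one generation prompt per case from the selected sentences and asks the model for an argumentative answer. A hedged sketch (the prompt template, field names, and file names are placeholders, not the contents of prompt_generate_arguments.json):

import json
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-3.3-70B-Instruct")
with open("prompt_first_step/output_aloe8b_dev.json") as f:
    selected = json.load(f)          # assumed: {case_id: [essential sentence ids]}
with open("data/parsed_data_dev.json") as f:
    notes = json.load(f)             # assumed parsed format: question + numbered sentences

argumentations = {}
for case_id, sentence_ids in selected.items():
    case = notes[case_id]
    evidence = "\n".join(f"{sid}: {case['sentences'][sid]}" for sid in sentence_ids)
    prompt = (f"Question: {case['question']}\n"
              f"Essential note sentences:\n{evidence}\n"
              "Answer the question, citing the ids of the sentences you use.")
    argumentations[case_id] = generator(prompt, max_new_tokens=512,
                                        return_full_text=False)[0]["generated_text"]

description = "prompt_preds"  # <description_of_output>
with open(f"second_step/outputs/argumentation_{description}.json", "w") as f:
    json.dump(argumentations, f, indent=2)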
For the reranker, first we convert the outputs:
python parse_reranker.py "$MAINPATH/reranker_first_step/reranker_preds_jinaai_jina-reranker-v2-base-multilingual.json" "$SRC_PATH/reranker_preds_converted.json"
And then execute the argumentation script:
python argument_sentences.py "$SRC_PATH/prompt_generate_arguments.json" "$SRC_PATH/reranker_preds_converted.json" "reranker_preds"
For the final post-processing and citation insertion, the postprocess.slurm file can be executed or the following command:
python postprocess.py <path_to_argumentation_file> <out_path_json_filename_included>
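The citation format targeted by the post-processing is the one shown in the submission example below: each answer sentence is followed by the id of the supporting note sentence between vertical bars. A minimal sketch of that final formatting (assuming the argumentation file already pairs each answer sentence with the id it cites):

import json
import sys

def add_citations(answer_sentences):
    # answer_sentences: list of (text, cited_sentence_id) pairs
    return "\n".join(f"{text.strip()} |{cited_id}|" for text, cited_id in answer_sentences)

if __name__ == "__main__":
    argumentation_path, out_path = sys.argv[1], sys.argv[2]
    with open(argumentation_path) as f:
        cases = json.load(f)         # assumed: {case_id: [(sentence, cited_id), ...]}
    submission = [{"case_id": cid, "answer": add_citations(sents)} for cid, sents in cases.items()]
    with open(out_path, "w") as f:
        json.dump(submission, f, indent=2)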
There is an additional script for preprocessing the XML and the key-containing JSON to obtain the parsed versions of the test and development data. It is essential to follow this format to run the scripts. We recommend storing the files in the data folder where the script is located.
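A hedged sketch of what that preprocessing amounts to, using Python's ElementTree (the element, attribute, and file names below are assumptions about the ArchEHR-QA release, so adapt them to the actual data):

import json
import xml.etree.ElementTree as ET

def parse_cases(xml_path, key_path=None):
    # Flatten the XML cases (plus the optional key JSON with relevance labels) into one dict.
    root = ET.parse(xml_path).getroot()
    keys = {}
    if key_path:
        with open(key_path) as f:
            keys = json.load(f)
    parsed = {}
    for case in root.iter("case"):                                         # assumed element name
        case_id = case.get("id")
        parsed[case_id] = {
            "question": case.findtext("clinician_question", default=""),   # assumed tag
            "sentences": {s.get("id"): (s.text or "").strip()
                          for s in case.iter("sentence")},                  # assumed tag
            "labels": keys.get(case_id, {}),
        }
    return parsed

if __name__ == "__main__":
    parsed = parse_cases("data/archehr-qa.xml", "data/archehr-qa_key.json")  # assumed file names
    with open("data/parsed_data_dev.json", "w") as f:
        json.dump(parsed, f, indent=2)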
Also, all paths in the slurm and Python files must be adjusted before use. A HuggingFace token must be added if the model requires it, as well as a WandB token if desired. The virtual environment must also be set up.
The submission.json output will have a format similar to the following:
[
{
"case_id": "1",
"answer": "His aortic aneurysm was caused by the rupture of a thoracoabdominal aortic aneurysm, which required emergent surgical intervention. |1|\n He underwent a complex salvage repair using a 34-mm Dacron tube graft and deep hypothermic circulatory arrest to address the rupture. |2|\n The extended recovery time and hospital stay were necessary due to the severity of the rupture and the complexity of the surgery, though his wound is now healing well with only a small open area noted. |8|"
}, ...
]