Commit 2f33466

feat(evaluation): redid badge evaluation
1 parent 5469343 commit 2f33466

5 files changed: 1371 additions & 87 deletions

evaluation/badges.qmd

Lines changed: 39 additions & 40 deletions
@@ -8,7 +8,7 @@ bibliography: ../quarto_site/references.bib
 
 This page evaluates the extent to which the author-published research artefacts meet the criteria of badges related to reproducibility from various organisations and journals.
 
-*Caveat: Please note that these criteria are based on available information about each badge online, and that we have likely differences in our procedure (e.g. allowed troubleshooting for execution and reproduction, not under tight time pressure to complete). Moreover, we focus only on reproduction of the discrete-event simulation, and not on other aspects of the article. We cannot guarantee that the badges below would have been awarded in practice by these journals.*
+*Caveat: Please note that these criteria are based on available information about each badge online. Moreover, we focus only on reproduction of the discrete-event simulation, and not on other aspects of the article. We cannot guarantee that the badges below would have been awarded in practice by these journals.*
 
 ## Criteria
 
@@ -19,35 +19,34 @@ import pandas as pd
 
 # Criteria and their definitions
 criteria = {
-'archive': 'Stored in a permanent archive that is publicly and openly accessible',
-'id': 'Has a persistent identifier',
-'license': 'Includes an open license',
-'relevant': '''Artefacts are relevant to and contribute to the article's results''',
-'complete': 'Complete set of materials shared (as would be needed to fully reproduce article)',
-'structure': 'Artefacts are well structured/organised (e.g. to the extent that reuse and repurposing is facilitated, adhering to norms and standards of research community)',
-'documentation_sufficient': 'Artefacts are sufficiently documented (i.e. to understand how it works, to enable it to be run, including package versions)',
-'documentation_careful': 'Artefacts are carefully documented (more than sufficient - i.e. to the extent that reuse and repurposing is facilitated - e.g. changing parameters, reusing for own purpose)',
-# This criteria is kept seperate to documentation_careful, as it specifically requires a README file
-'documentation_readme': 'Artefacts are clearly documented and accompanied by a README file with step-by-step instructions on how to reproduce results in the manuscript',
+'archive': 'Artefacts are archived in a repository that is: (a) public (b) guarantees persistence (c) gives a unique identifier (e.g. DOI)',
+'licence': 'Open licence',
+'complete': 'Complete (all relevant artefacts available)',
+'docs1': 'Documents (a) how code is used (b) how it relates to article (c) software, systems, packages and versions',
+'docs2': 'Documents (a) inventory of artefacts (b) sufficient description for artefacts to be exercised',
+'relevant': 'Artefacts relevant to paper',
 'execute': 'Scripts can be successfully executed',
-'regenerated': 'Independent party regenerated results using the authors research artefacts',
-'hour': 'Reproduced within approximately one hour (excluding compute time)',
+'careful': 'Artefacts are carefully documented and well-structured to the extent that reuse and repurposing is facilitated, adhering to norms and standards',
+'reproduce': 'Reproduced results (assuming (a) acceptably similar (b) reasonable time frame (c) only minor troubleshooting)',
+'readme': 'README file with step-by-step instructions to run analysis',
+'dependencies': 'Dependencies (e.g. package versions) stated',
+'correspond': 'Clear how output of analysis corresponds to article'
 }
 
 # Evaluation for this study
 eval = pd.Series({
 'archive': 0,
-'id': 0,
-'license': 0, # At the point of publication
-'relevant': 1,
+'licence': 0,
 'complete': 0,
-'structure': 0,
-'documentation_sufficient': 0,
-'documentation_careful': 0,
-'documentation_readme': 0,
+'docs1': 0,
+'docs2': 0,
+'relevant': 1,
 'execute': 1,
-'regenerated': 1,
-'hour': 0,
+'careful': 0,
+'reproduce': 0,
+'readme': 0,
+'dependencies': 0,
+'correspond': 0
 })
 
 # Get list of criteria met (True/False) overall
@@ -82,10 +81,10 @@ def create_criteria_list(criteria_dict):
 return(formatted_list)
 
 # Define groups of criteria
-criteria_share_how = ['archive', 'id', 'license']
-criteria_share_what = ['relevant', 'complete']
-criteria_doc_struc = ['structure', 'documentation_sufficient', 'documentation_careful', 'documentation_readme']
-criteria_run = ['execute', 'regenerated', 'hour']
+criteria_share_how = ['archive', 'licence']
+criteria_share_what = ['complete', 'relevant']
+criteria_doc_struc = ['docs1', 'docs2', 'careful', 'readme', 'dependencies', 'correspond']
+criteria_run = ['execute', 'reproduce']
 
 # Create text section
 display(Markdown(f'''
@@ -118,39 +117,39 @@ Criteria related to running and reproducing results -
 # Full badge names
 badge_names = {
 # Open objects
+'open_acm': 'ACM "Artifacts Available"',
 'open_niso': 'NISO "Open Research Objects (ORO)"',
 'open_niso_all': 'NISO "Open Research Objects - All (ORO-A)"',
-'open_acm': 'ACM "Artifacts Available"',
 'open_cos': 'COS "Open Code"',
 'open_ieee': 'IEEE "Code Available"',
 # Object review
 'review_acm_functional': 'ACM "Artifacts Evaluated - Functional"',
 'review_acm_reusable': 'ACM "Artifacts Evaluated - Reusable"',
 'review_ieee': 'IEEE "Code Reviewed"',
 # Results reproduced
-'reproduce_niso': 'NISO "Results Reproduced (ROR-R)"',
 'reproduce_acm': 'ACM "Results Reproduced"',
+'reproduce_niso': 'NISO "Results Reproduced (ROR-R)"',
 'reproduce_ieee': 'IEEE "Code Reproducible"',
 'reproduce_psy': 'Psychological Science "Computational Reproducibility"'
 }
 
 # Criteria required by each badge
 badges = {
 # Open objects
-'open_niso': ['archive', 'id', 'license'],
-'open_niso_all': ['archive', 'id', 'license', 'complete'],
-'open_acm': ['archive', 'id'],
-'open_cos': ['archive', 'id', 'license', 'complete', 'documentation_sufficient'],
+'open_acm': ['archive'],
+'open_niso': ['archive', 'licence'],
+'open_niso_all': ['archive', 'licence', 'complete'],
+'open_cos': ['archive', 'licence', 'docs1'],
 'open_ieee': ['complete'],
 # Object review
-'review_acm_functional': ['documentation_sufficient', 'relevant', 'complete', 'execute'],
-'review_acm_reusable': ['documentation_sufficient', 'documentation_careful', 'relevant', 'complete', 'execute', 'structure'],
+'review_acm_functional': ['docs2', 'relevant', 'complete', 'execute'],
+'review_acm_reusable': ['docs2', 'relevant', 'complete', 'execute', 'careful'],
 'review_ieee': ['complete', 'execute'],
 # Results reproduced
-'reproduce_niso': ['regenerated'],
-'reproduce_acm': ['regenerated'],
-'reproduce_ieee': ['regenerated'],
-'reproduce_psy': ['regenerated', 'hour', 'structure', 'documentation_readme'],
+'reproduce_acm': ['reproduce'],
+'reproduce_niso': ['reproduce'],
+'reproduce_ieee': ['reproduce'],
+'reproduce_psy': ['reproduce', 'readme', 'dependencies', 'correspond']
 }
 
 # Identify which badges would be awarded based on criteria
@@ -256,12 +255,12 @@ create_badge_callout({k: v for (k, v) in award.items() if k.startswith('reproduc
 
 * "Open Code"
 
-**Institute of Electrical and Electronics Engineers (IEEE)** (@institute_of_electrical_and_electronics_engineers_ieee_about_nodate)
+**Institute of Electrical and Electronics Engineers (IEEE)** (@institute_of_electrical_and_electronics_engineers_ieee_about_2024)
 
 * "Code Available"
 * "Code Reviewed"
 * "Code Reproducible"
 
-**Psychological Science** (@hardwicke_transparency_2023 and @association_for_psychological_science_aps_psychological_2023)
+**Psychological Science** (@hardwicke_transparency_2024 and @association_for_psychological_science_aps_psychological_2024)
 
 * "Computational Reproducibility"

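The badge hunk stops at the comment `# Identify which badges would be awarded based on criteria`, and the code beneath it is not part of this diff. A minimal Python sketch of that step follows, assuming a badge is awarded only when every criterion it lists scores 1. The criteria, display groups and badge requirements are copied from the new side of the diff; a plain dict stands in for the pandas Series named `eval`, and the name `award` is borrowed from the `award.items()` call in the later hunk header. The two assertions check that the four display groups cover the twelve criteria exactly once and that no badge refers to an undefined criterion, which is easy to break during a rename such as `license` to `licence`.

```python
# Criteria scores after this commit (1 = met, 0 = not met), copied from the diff.
# A plain dict stands in for the pandas Series called `eval` in badges.qmd.
evaluation = {
    'archive': 0, 'licence': 0, 'complete': 0, 'docs1': 0,
    'docs2': 0, 'relevant': 1, 'execute': 1, 'careful': 0,
    'reproduce': 0, 'readme': 0, 'dependencies': 0, 'correspond': 0,
}

# Display groups, copied from the criteria_* lists in the diff.
groups = {
    'share_how': ['archive', 'licence'],
    'share_what': ['complete', 'relevant'],
    'doc_struc': ['docs1', 'docs2', 'careful', 'readme', 'dependencies', 'correspond'],
    'run': ['execute', 'reproduce'],
}

# Criteria required by each badge, copied from the diff.
badges = {
    'open_acm': ['archive'],
    'open_niso': ['archive', 'licence'],
    'open_niso_all': ['archive', 'licence', 'complete'],
    'open_cos': ['archive', 'licence', 'docs1'],
    'open_ieee': ['complete'],
    'review_acm_functional': ['docs2', 'relevant', 'complete', 'execute'],
    'review_acm_reusable': ['docs2', 'relevant', 'complete', 'execute', 'careful'],
    'review_ieee': ['complete', 'execute'],
    'reproduce_acm': ['reproduce'],
    'reproduce_niso': ['reproduce'],
    'reproduce_ieee': ['reproduce'],
    'reproduce_psy': ['reproduce', 'readme', 'dependencies', 'correspond'],
}

# Consistency checks: each criterion appears in exactly one display group,
# and every criterion a badge requires has a score.
grouped = [c for members in groups.values() for c in members]
assert sorted(grouped) == sorted(evaluation), 'groups and criteria differ'
required_all = {c for required in badges.values() for c in required}
assert required_all <= set(evaluation), required_all - set(evaluation)

# Assumption: a badge is awarded only if every required criterion is met.
award = {
    badge: all(evaluation[c] == 1 for c in required)
    for badge, required in badges.items()
}

print(sum(award.values()), 'of', len(award), 'badges awarded')
```

Run as-is, it should print `0 of 12 badges awarded`, which matches the new `'Badges (badges)'` row in reproduction_report.qmd.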
evaluation/reproduction_report.qmd

Lines changed: 2 additions & 2 deletions
@@ -100,8 +100,8 @@ col = ['fully', 'partially', 'not', 'na']
 eval_dict = {
 'STARS (essential)': [2, 0, 6, 0],
 'STARS (optional)': [0, 0, 5, 0],
-'Badges (criteria)': [3, 0, 9, 0],
-'Badges (badges)': [3, 0, 9, 0],
+'Badges (criteria)': [2, 0, 10, 0],
+'Badges (badges)': [0, 0, 12, 0],
 'STRESS-DES': [15, 3, 3, 3],
 'ISPOR-SDM': [12, 0, 4, 2]
 }

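The updated `'Badges'` rows in `eval_dict` follow from the re-scored badges.qmd: two of the twelve criteria (`relevant`, `execute`) are met, and none of the twelve badges is fully met. The sketch below recomputes both rows. It assumes the four columns are the `col` list from the hunk header (`'fully'`, `'partially'`, `'not'`, `'na'`); the helper `tally` and its score encoding are hypothetical and are not code from the repository.

```python
def tally(scores):
    """Count scores into the report's ['fully', 'partially', 'not', 'na'] columns.

    Hypothetical encoding: 1 = fully, 0.5 = partially, 0 = not, None = not applicable.
    The badge evaluation in badges.qmd only uses 0 and 1.
    """
    return [
        sum(1 for s in scores if s == 1),
        sum(1 for s in scores if s == 0.5),
        sum(1 for s in scores if s == 0),
        sum(1 for s in scores if s is None),
    ]


# Scores for the 12 badge criteria after this commit, in the order of `eval`.
criteria_scores = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
# Outcome for each of the 12 badges after this commit (none awarded).
badge_scores = [0] * 12

eval_rows = {
    'Badges (criteria)': tally(criteria_scores),
    'Badges (badges)': tally(badge_scores),
}
print(eval_rows)
```

This should print `{'Badges (criteria)': [2, 0, 10, 0], 'Badges (badges)': [0, 0, 12, 0]}`, the two new rows in the diff.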
logbook/posts/2024_07_24/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 title: "Day 4"
 author: "Amy Heather"
 date: "2024-07-24"
-categories: [reproduce, guidelines]
+categories: [reproduce, evaluation]
 bibliography: ../../../quarto_site/references.bib
 ---
 

logbook/posts/2024_11_21/index.qmd

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+---
+title: "Day 9"
+author: "Amy Heather"
+date: "2024-11-21"
+categories: [evaluation]
+---
+
+::: {.callout-note}
+
+Redid badge evaluation.
+
+:::
+
+## 10.34-X: Revisit evaluation
+
+Revisited and revised the badge criteria to (a) make them up-to-date, and (b) make sure they are *specific* to the descriptions from each badge. Hence, redoing evaluations for all eight studies.
+
+Notes:
+
+* Reproduction - no, as added reasonable assumption that would expect this within a reasonable time (e.g. a few hours) and with only minor troubleshooting - but this reproduction required a large time investment and extensive troubleshooting (e.g. writing code) to reproduce
+
+## Untimed: Update summary report
+
+With new badge evaluation results.
