* update torch due to sentence-transformers changes
* Update torch to 2.3.0
* Add esco data
* set torch version
* push torch back
* pin transformer to stop torch issue in python 3.10, and don't allow spacy >3.8 due to another version issue with blis
* Read data from package location, and save taxonomy embeddings when they are calculated for the first time
* Add other datasets to git
* rename esco taxonomy to v_1_1_1 to make it clearer, add this in the config file
* Try to download taxonomy embeddings from huggingface hub, if not then they get calculated on the fly
* Refresh readmes with new outputs
* Add newly calculated metrics to pipeline summary doc page
* correct for unmatched skills
* Update package version to major change
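
The "download the taxonomy embeddings, otherwise calculate them on the fly and save them" behaviour described in the commit list can be sketched roughly as below. This is only an illustration of the fallback pattern, not the code in this commit: `load_or_compute_embeddings`, `download`, `compute` and the JSON cache file are all hypothetical names and choices made for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Embeddings = Dict[str, List[float]]


def load_or_compute_embeddings(
    cache_path: Path,
    download: Callable[[], Embeddings],
    compute: Callable[[], Embeddings],
) -> Embeddings:
    """Return cached embeddings, else try a download, else compute them."""
    if cache_path.exists():
        # A previous run already saved the embeddings locally.
        return json.loads(cache_path.read_text())
    try:
        # Hypothetical downloader, e.g. a fetch from a model hub.
        embeddings = download()
    except Exception:
        # Broad catch for the sketch only: any download failure
        # falls back to calculating the embeddings on the fly.
        embeddings = compute()
    # Save the result so later runs can skip both steps.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(embeddings))
    return embeddings


def failing_download() -> Embeddings:
    # Stand-in for an unreachable hub (assumption for the demo).
    raise ConnectionError("hub unreachable")


def toy_compute() -> Embeddings:
    # Stand-in for the real embedding calculation.
    return {"communication": [0.1, 0.2]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "embeddings.json"
    # First call: the download fails, so embeddings are computed and saved.
    first = load_or_compute_embeddings(path, failing_download, toy_compute)
    # Second call: the cache file exists, so neither callable is used.
    second = load_or_compute_embeddings(path, failing_download, failing_download)
```

Passing the download and compute steps in as callables keeps the sketch independent of any particular hub client or embedding model.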
docs/index.md
+9 -26 (9 additions & 26 deletions)
@@ -26,44 +26,27 @@ You can use pip to install the library:
 `pip install ojd-daps-skills`

-Note: If you are using a conda environment you may need to do `conda install scipy` before pip installing this library.
+> 🐍 **NOTE:** If you are using a conda environment you may need to do `conda install scipy` before pip installing this library.

-Note that this package was developed on MacOS and tested on Ubuntu. Changes have been made to be compatible on a Windows system but are not tested and cannot be guaranteed.
-
-When the package is first used it will automatically download a folder of neccessary data and models (~1GB).
+> 💻 **NOTE:** This package was developed on MacOS and tested on Ubuntu. Changes have been made to be compatible on a Windows system but are not tested and cannot be guaranteed.

 ## TL;DR: Using Nesta’s Skills Extractor library

+> ⏳ **NOTE:** The first time you import `SkillsExtractor` in python it will take some time (around a minute) to load.
+
 To extract skills from a job advert:

 ```
 from ojd_daps_skills.extract_skills.extract_skills import SkillsExtractor

-sm = SkillsExtractor(taxonomy_name="toy")
-
-✘ nestauk/en_skillner NER model not loaded. Downloading model...
docs/pipeline_summary.md
+37 -3 (37 additions & 3 deletions)
@@ -23,9 +23,43 @@ For further information or feedback please contact Liz Gallagher, India Kerle or
 - Out of scope is extracting and matching skills from job adverts in non-English languages; extracting and matching skills from texts other than job adverts; drawing conclusions on new, unidentified skills.
 - Skills extracted should not be used to determine skill demand without expert steer and input nor should be used for any discriminatory hiring practices.

-## Metrics
+## Metrics - The model trained on data from 8th August 2023 (correct as of 29th May 2025)

-There is no exact way to evaluate how well our pipeline works; however we have several proxies to better understand how our approach compares. The analysis in this section was performed using the results of the `20220825` model. We believe the newer `20230808` model will improve these results, but the analysis hasn't been repeated.
+There is no exact way to evaluate how well our pipeline works; however we have several proxies to better understand how our approach compares.
+
+### Evaluation 2 - Manual judgement of skills extraction and mapping quality
+
+We manually tagged a random sample of skills extracted from job adverts, with whether we thought they were inappropriate, OK or excellent skill entities, and whether we thought they had inappropriate, OK or excellent matches to ESCO skills (or other parts of the taxonomy).
+
+- We felt that out of 202 skill entities 73% were excellent entities, 17% were OK and 10% were inappropriate.
+- 192 of the 202 skill entities were matched to ESCO skills or parts of the taxonomy.
+- Of the 192 matched skills, we felt 45% were excellently matched, 27% were OK and 27% were inappropriate.
+- Of the 96 skills matched to ESCO skills, we felt 71% were excellently matched, 24% were OK and 5% were inappropriate.
+## Metrics - The model trained on data from 25th August 2022
+
+> ⚠️ **NOTE:** The analysis in this section was performed using the results of the `20220825` model. We believe the newer `20230808` model will improve these results, but the analysis hasn't been repeated apart from 'Evaluation 2' discussed above.

 ### Comparison 1 - Top skill groups per occupation comparison to ESCO essential skill groups per occupation

@@ -93,5 +127,5 @@ We manually tagged a random sample of skills extracted from job adverts, with wh