-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Hi! I've reviewing and running the first notebook, 'Demographics of Lebanon', here's some feedback:
- I get an error after running this code:
hrsl_lebanon0 = LEBANON_ADM0.sjoin(hrsl)
hrsl_lebanon0 = hrsl_lebanon0.groupby(["admin0Name"]).sum()
hrsl_lebanon3 = LEBANON_ADM3.sjoin(hrsl)
The issue happens because hrsl_lebanon0
has two datetime columns:
date datetime64[ms]
validOn datetime64[ms]
But it seems that hrsl_lebanon0
isn’t used after that. Is the notebook going to be updated? If not, it might make sense to either remove or comment out that line to avoid errors for other people running the notebook.
- I also get the same error after running this:
hrsl_lebanon3 = (
hrsl_lebanon3.groupby(["admin0Name", "admin1Name", "admin2Name", "admin3Name"])
.sum()
.reset_index()[["admin0Name", "admin1Name", "admin3Name", "population"]]
)
hrsl_lebanon3 = hrsl_lebanon3.merge(
LEBANON_ADM3[["admin0Name", "admin1Name", "admin3Name", "geometry"]],
on=["admin0Name", "admin1Name", "admin3Name"],
)
hrsl_lebanon3 = gpd.GeoDataFrame(hrsl_lebanon3, crs="EPSG:4326", geometry="geometry")
This happens because there are non-numeric columns (like geometry and datetime) that cannot be summed.
I fixed it by excluding those data types:
hrsl_lebanon3 = (
hrsl_lebanon3.select_dtypes(exclude=["datetime", "geometry"]) # Exclude datetime and geometry columns
.groupby(["admin0Name", "admin1Name", "admin2Name", "admin3Name"])
.sum()
.reset_index()[["admin0Name", "admin1Name", "admin3Name", "population"]]
)
- The
lebanon_bbox
variable is declared but not used. We could either changed the code to:
hrsl = hrsl[
(hrsl["longitude"] > lebanon_bbox[0])
& (hrsl["longitude"] < lebanon_bbox[2])
& (hrsl["latitude"] > lebanon_bbox[1])
& (hrsl["latitude"] < lebanon_bbox[3])
]
or just delete the variable if it’s not required.
-
I recommend adding a semicolon ; at the end of the cells that create maps to suppress unnecessary messages from appearing above them:
-
It would be helpful to simplify this part by hiding the urls and adding hyperlinks.
Population data: Population data was used by the [estimates released by UNOCHA in 2002](https://cod.humanitarianresponse.info/fr/search/field_country_region/138, Accessed 05/01/2014). The lowest admin level at which data was available is admin 3.
Sex variables: Sex variables were derived from UN Urban and Rural Population by Age and Sex (URPAS) 2010 estimates, generated using URPAS, http://esa.un.org/unpd/popdev/urpas/urpas2014.aspx, Accessed 03/27/2015 at a national level.
Age variables: Age variables were obtained from UN Urban and Rural Population by Age and Sex (URPAS) 2010 estimates, generated using URPAS, http://esa.un.org/unpd/popdev/urpas/urpas2014.aspx, Accessed 03/27/2015 at a national level.
Boundaries:Boundaries were obtained from UN Office for the Coordination of Humanitarian Affairs (UN OCHA), http://www.humanitarianresponse.info/applications/data, Accessed 01/05/2011.
For example change this: Population data: Population data was used by the estimates released by UNOCHA in 2002 (https://cod.humanitarianresponse.info/fr/search/field_country_region/138, Accessed 05/01/2014).
to this: Population data: Population data was used by the estimates, released by UNOCHA in 2002 and accessed 05/01/2014.
I also noticed here that the links for Population data and Boundaries both lead to the same page (https://data.humdata.org/), even though the links are different.
- the references could be improved by organizing them in the
bibliography.bib
file, which is already stored in the repository. This would change the current citation format, for example from:
The analysis was validated in 18 countries (Tiecke, et.al., 2017).
to: (where the link in [1] would lead to the end of the page where the references are listed).
The analysis was validated in 18 countries [1].
- I noticed a few typos such as missing periods, I could correct them.
If you agree with any of these suggestions, please let me know so I can create a PR with the changes 🙂