Description
Is your feature request related to a problem? Please describe.
During #291 you implemented support for SHA256 Gravatars (thanks!)
I actually wasn't aware of the improved hash support, so started researching what else had changed since I last updated my knowledge on this topic.
Spoiler: a lot 😁
TL;DR: a LOT of sources now implement the `<some-base-url>/<sha256-of-email>?params` convention, so the "gravatar mechanism" can easily be extended for other providers (including local, offline self-hosted ones).
Below is a summary of my research this evening (+ night, I'm quite past my bed-time; self-nerd-snipe yo!)
Perhaps I shouldn't be surprised that the intersection between "user identity" and "privacy in distributed systems" is a complicated topic 👼
Describe the solution you'd like
Gravatar(.com) has set a de-facto standard for a template string: `<some-base-url>/<sha256-of-email>?params`.
Thus, to support other avatar-hosts, all that is needed is making "avatar URL template" a setting, rather than a hardcoded string.
Special-casing `null` (or "disabled", or `NoOpAvatarFetchingStrategy`, or whatever) can then function as #291's (6d8c5dc) `useGravatar` boolean.
For convenience, a couple of "presets" (such as for gravatar.com or libravatar) that set a pre-configured URL template should probably be provided too.
(so ideally: a radio-button listing "disabled(the default) / gravatar.com / libravatar.org / custom (enables second text input)")
Importantly: all alternatives I've found support SHA256, so no configurable hashing strategy is needed, which simplifies technical implementation and user-facing configuration enormously.
MD5 seems to be outdated, though still working for some implementations. After the privacy-busting attacks demonstrated against it already in 2009, it seems newer implementations just skipped over MD5 completely and started directly with SHA256.
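A minimal sketch of the "avatar URL template as a setting" idea, in Python for brevity. The `{hash}` placeholder syntax, the `PRESETS` table and the function names are assumptions made for this illustration, not existing Gitnuro settings. The gravatar.com template is the one documented further down; the libravatar.org template is my assumption of its central endpoint.

```python
import hashlib
from typing import Optional
from urllib.parse import urlencode

# Hypothetical presets. None means "avatar fetching disabled" (the proposed default).
PRESETS = {
    "disabled": None,
    "gravatar.com": "https://gravatar.com/avatar/{hash}",
    "libravatar.org": "https://seccdn.libravatar.org/avatar/{hash}",  # assumed endpoint
}


def email_hash(email: str) -> str:
    """SHA256 hex digest of the trimmed, lower-cased email address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def avatar_url(template: Optional[str], email: str, **params) -> Optional[str]:
    """Fill the template for one committer; return None when disabled."""
    if template is None:
        return None
    # str.replace (not str.format) so stray braces in a custom URL cannot raise.
    url = template.replace("{hash}", email_hash(email))
    if params:
        url += "?" + urlencode(params)
    return url
```

A "custom" setting would simply pass the user's own string as `template`, for example `avatar_url("https://git.example.invalid/avatars/{hash}", email)` for a hypothetical intranet host.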
Re-identification risk remains, regardless of hash-type
I'd like to point out that opt-out continues to be important, even with SHA256 hashing.
The fundamental design of "stable hash of stable identifier" allows cross-identifying users across websites/programs. Indeed that is the entire design goal: get a consistent picture/profile everywhere.
It doesn't matter if one uses MD5, SHA256 or PBKDF or even ARGON2: as long as the input is unsalted-email, the output token will be stable across locations, and thus correlating (for example) a stackoverflow account to that sensitive blogpost from a decade ago remains possible.
The only way around this is to add site-specific (or even user-specific) salts into the gravatar-hash, thus deviating from the gravatar-protocol. This (intentionally) breaks avatar-lookup, and because users cannot sign-up to the gravatar-host with a salted input email, they lose any possibility of customising their (auto-generated) avatar.
StackOverflow does this intentionally, salting the email if they don't find a gravatar-account for the unsalted email. Additionally, they provide a stack-hosted "uploaded image" alternative to avoid gravatar completely. I consider this the "state of the art" for balancing gravatar-support (by default) with privacy (by choice). It is non-trivial to implement though, requiring multiple rounds of lookup, and a completely separate secondary avatar system (static image).
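To make the correlation argument concrete, here is a small sketch. The salting scheme (`salt + ":" + email`) is hypothetical and only illustrates the principle; it is not StackOverflow's actual implementation, which I don't know.

```python
import hashlib


def unsalted_token(email: str) -> str:
    """Gravatar-style token: identical wherever the same email is used."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def salted_token(email: str, site_salt: str) -> str:
    """Hypothetical site-specific token: differs per site, so it cannot be
    correlated across sites, but the avatar host can no longer look it up."""
    data = (site_salt + ":" + email.strip().lower()).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


email = "jane@example.com"

# Two unrelated sites derive the same token, so the accounts can be linked.
linkable = unsalted_token(email) == unsalted_token(email)

# With per-site salts the tokens differ, so that link disappears.
unlinkable = salted_token(email, "site-a") != salted_token(email, "site-b")
```

A stronger hash function does not change the first comparison: any deterministic function of the bare email stays linkable.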
Describe alternatives you've considered
I see (roughly) four levels of complexity in solving the general problem of "pictures for user identity".
I want to list all of them (and some variations I came up with) in order to have a full overview, and to inform thinking about proper software architecture (depending on how many you think are worth covering, perhaps a Strategy Pattern for avatar-sourcing implementations may be worthwhile)
- Do nothing: just use committer name+email in text form.
- no cost/benefit evaluation is complete without the literally-zero-cost solution of doing-nothing-at-all.
- The least hassle, but the post-#291 ("gitnuro is private" promise not consistent with gravatar usage) status quo is already more featureful than this.
- This is the most powerful in terms of privacy preserving: since no third party whatsoever is involved, only what is provided in the git-repo itself
- local, file-based lookup
- have a config directory for avatars on disk (e.g. on Linux `$XDG_CONFIG_HOME/gitnuro/avatars/<committer.email>.png`). Gitnuro user can populate this manually.
- perfectly privacy-preserving, since no network lookups are involved, only local storage
- avatar is configurable separately for each individual gitnuro user (my avatar-for-me doesn't need to be the same as your avatar-for-me)
- lots of manual legwork (fetching images, naming files) for gitnuro user, thus not attractive from an end-user perspective
- any "smart" pre-fetching of images from some centralised upstream quickly becomes identical to option 4: custom gravatar-style url.
- modern languages/libraries make fetching from random internet servers as easy as fetching from disk, so technical implementation is about as hard as network-based alternatives, I think(?).
- hardcoding a single gravatar-style provider
- this was the status quo before #291 ("gitnuro is private" promise not consistent with gravatar usage)
- Privacy can-of-worms instantly fully open: leaking of stable personal identifiers for any/all committers over network, to single, fixed trusted(?) party.
- Regardless of whatever caching is done, at least one lookup must be done, thus "uncloaking" the ID's existence.
- Depending on cache-strategy: also leaks usage patterns of gitnuro itself.
- No way to please everyone with only a single, hardcoded provider.
- configurable gravatar-style provider (multiple providers and/or custom URL):
- This is what I am suggesting in this feature request
- technical implementation only slightly more complex than single hardcoded URL, thus not something I consider a separate level of complexity.
- privacy can-of-worms somewhat open, if the user chooses to enable the feature, and the avatar-host is unfriendly.
- by configuring a trusted URL, privacy can be preserved. For example: a central company git-server already knows all committer-ids, and timestamps of when you commit/push/pull. It learns nothing new from its avatar-api: neither about committer identities, nor about your working times.
- can support non-public avatar sources, such as self-hosted git-servers on company intranet.
- This level automatically unlocks "central" libravatar: https://wiki.libravatar.org/description/
- fully-federated libravatar: gitnuro discovers gravatar-url for each committer-email-domain.
- technically the most challenging, since full federation requires host-discovery (via DNS SRV records), not just string-template replacing.
- The only libravatar java library that I found doesn't implement federation (https://github.com/alessandroleite/libravatar-j)
- While I personally think it's a worthy goal, I consider federation out-of-scope for this feature request.
A fixed configurable URL will serve 80% of use-cases with 20% of the implementation effort compared to federation.
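The Strategy Pattern mentioned above could look roughly like this. It is a sketch in Python for brevity, and all class and function names are hypothetical; none of this is existing Gitnuro code. It covers the "do nothing", "local file" and "URL template" levels and leaves federation out, matching the scope of this request.

```python
import hashlib
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Optional


class AvatarStrategy(ABC):
    @abstractmethod
    def locate(self, email: str) -> Optional[str]:
        """Return a file path or URL for the avatar, or None if there is none."""


class NoOpAvatarStrategy(AvatarStrategy):
    """Level "do nothing": never resolves an avatar."""

    def locate(self, email: str) -> Optional[str]:
        return None


class LocalFileAvatarStrategy(AvatarStrategy):
    """Looks for <directory>/<committer email>.png, with no network involved."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory

    def locate(self, email: str) -> Optional[str]:
        name = f"{email.strip().lower()}.png"
        if Path(name).name != name:  # reject path separators such as "../"
            return None
        candidate = self.directory / name
        return str(candidate) if candidate.is_file() else None


class UrlTemplateAvatarStrategy(AvatarStrategy):
    """Gravatar-style lookup; "{hash}" is an assumed placeholder syntax."""

    def __init__(self, template: str) -> None:
        self.template = template

    def locate(self, email: str) -> Optional[str]:
        digest = hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
        return self.template.replace("{hash}", digest)


def first_avatar(strategies: List[AvatarStrategy], email: str) -> Optional[str]:
    """Try each strategy in order and return the first hit."""
    for strategy in strategies:
        found = strategy.locate(email)
        if found is not None:
            return found
    return None
```

Chaining `LocalFileAvatarStrategy` before `UrlTemplateAvatarStrategy` would give a local override per committer, with the network lookup only as fallback.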
context, documentation and research notes
- gravatar.com the original which started it all
- lookup template: `https://gravatar.com/avatar/HASH` (with optional params)
- Docu: https://docs.gravatar.com/api/avatars/hash/
- SHA256 support quietly added quite "recently" it seems.
- wayback machine shows gravatar.com docu recommended SHA256 at earliest known snapshot: 23 October 2023.
- wordpress.com (owned by automattic, just like gravatar.com) switched only as recently as 18 December 2024, citing concerns over availability of SHA256 extension on common PHP deployments.
- my $DAYJOB uses self-hosted gitea. Gitea supports configurable URLs for avatar-fetching, but uses the gravatar-style SHA256 hash.
- Server Configuration docu cheatsheet > avatar section
- todo: lookup template for intranet instance (from memory: `https://git.dept.intra.company.com/avatars/` or something like that)
- (self-signed intranet HTTPS needs extra handling: user imports the certificate into JVM keystore, or gitnuro provides "trust-on-first-use" ("TOFU") prompt (and handles verification against local gitnuro keystore), or wildly-unsafe-but-easy "disable HTTPS cert validation" boolean setting)
- Libravatar.org offers a FOSS alternative to gravatar.com
- centralised lookup template: `https://seccdn.libravatar.org/gravatarproxy/${the-hash}?s=512&default=identicon`
- ivatar software (and some alternative implementations) available for self-hosting
- public gitea.com seems to allow user-specified URLs, for example:
- multiple entries use `https://seccdn.libravatar.org/gravatarproxy/${the-hash}?s=512&default=identicon`
- a few seem to have some sort of "local CDN" in the form of `https://4d3e0f26919f429c2b0092fb846c818a.r2.cloudflarestorage.com/gitea-com-prod/avatars/${the-hash}?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=<...lots more X-Amz headers`
- Gitnuro alternative sourcegit exists.
- also hardcodes to gravatar (seems I have more issues to open 👼): https://github.com/sourcegit-scm/sourcegit/blob/1138ba304d2e01b64c32226773ae3387369e51a9/src/Models/AvatarManager.cs#L78-L82
- does have special handling for github's identity protection scheme (`avatars.githubusercontent.com`)
- random unsorted concern: self-hosted instances with self-signed intranet HTTPS certificates will be "fun": either the user must import a certificate into the system Java keystore, or gitnuro will have to show a "Trust on First Use (TOFU)" prompt (and store the answer in an extra gitnuro keystore), or offer a (wildly-unsafe) "skip certificate validation" boolean setting.
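The TOFU option boils down to pinning a certificate fingerprint per host. Below is a language-neutral sketch of that decision logic in Python. The pin store is a plain dict standing in for the "extra gitnuro keystore", and the function names and the three result strings are assumptions of this sketch. In Python the DER bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`; a JVM implementation would need its own equivalent.

```python
import hashlib
from typing import Dict


def fingerprint(der_cert: bytes) -> str:
    """SHA256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def check_pin(pins: Dict[str, str], host: str, der_cert: bytes) -> str:
    """Classify a presented certificate against the local pin store.

    Returns "prompt" for a never-seen host (ask the user), "trusted" when the
    fingerprint matches the stored pin, and "mismatch" when it changed.
    """
    known = pins.get(host)
    if known is None:
        return "prompt"
    return "trusted" if known == fingerprint(der_cert) else "mismatch"


def accept(pins: Dict[str, str], host: str, der_cert: bytes) -> None:
    """Record the user's "yes, trust this" answer for later connections."""
    pins[host] = fingerprint(der_cert)
```

A "mismatch" should be treated as an error rather than a fresh prompt; otherwise the scheme degrades to "skip certificate validation" with extra clicks.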