Calculate Git LFS data (storage) usage for each repo #133

@anthonyfok

While GitHub does show Git LFS data usage, it only shows the top 5 repos and hides the rest.

It would be nice to have a daily-updated web page and/or a CLI tool that shows our Git LFS data usage across all repos. This would help with diagnostics, e.g. estimating the bandwidth used when mirroring or cloning repos.

$ gh repo list OpenDRR --limit 5 --public

Showing 5 of 35 repositories in @OpenDRR that match your search

OpenDRR/opendrr-api    REST API for OpenDRR data / API REST pour les données OpenDRR                                                  public  1h
OpenDRR/model-factory  OpenQuake compilation and data manipulation scripts                                                            public  11h
OpenDRR/python-env     Docker image for Linux based python environment                                                                public  4d
OpenDRR/riskprofiler   Web Application to Support Disaster Resilience / Application web pour soutenir la résilience aux catastrophes  public  5d
OpenDRR/boundaries     Boundary geometries for model results in Geopackage format.                                                    public  5d

Mini HOWTOs

To get a list of all our repos (including private and archived ones):

gh repo list OpenDRR --limit 200 | cut -f1

(Alternatively, borrow from @DamonU2's work on #125, where a direct API call is used.)

For each repo (using OpenDRR/boundaries as an example):

To clone a repo without checking out LFS files:

GIT_LFS_SKIP_SMUDGE=1 gh repo clone OpenDRR/boundaries

To sum up LFS data storage usage for all files in the repo:

~/OpenDRR/boundaries$ git lfs ls-files --debug | grep size: | grep -o '[0-9]\+' | paste -sd + - | bc | numfmt --to=iec --round=nearest --format="%.2f"
8.34G
~/OpenDRR/boundaries$ git lfs ls-files --debug --all | grep size: | grep -o '[0-9]\+' | paste -sd + - | bc | numfmt --to=iec --round=nearest --format="%.2f"
20.50G

where the relevant options for git lfs ls-files are:

  • -d --debug:
    Show as much information as possible about a LFS file. This is intended
    for manual inspection; the exact format may change at any time.

  • -a --all:
    Inspects the full history of the repository, not the current HEAD (or other
    provided reference). This will include previous versions of LFS objects that
    are no longer found in the current tree.
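The same total can be computed with `awk` alone. This drops the `bc` dependency and counts only lines whose first field is exactly `size:`, so a file path that happens to contain "size:" is not picked up. Below is a minimal sketch. `sum_lfs_bytes` is a made-up helper name, and it assumes the `--debug` output keeps printing one `size: <bytes>` line per file. The man page warns that this format may change.

```shell
# Hypothetical helper: read `git lfs ls-files --debug` output on stdin and
# print the sum of all "size: <bytes>" lines, in bytes (0 if none are found).
# Assumes the current git-lfs debug format, which is not guaranteed stable.
sum_lfs_bytes() {
  awk '$1 == "size:" { total += $2 } END { printf "%.0f\n", total }'
}

# Usage, from inside a clone such as ~/OpenDRR/boundaries:
#   git lfs ls-files --debug --all | sum_lfs_bytes \
#     | numfmt --to=iec --round=nearest --format="%.2f"
```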

The 20.50G figure matches the one reported by GitLab at https://gitlab.com/groups/OpenDRR/-/usage_quotas#storage-quota-tab. It is actually 20.50 GiB (units of 1024³ bytes). When numfmt --to=si is used instead, it is 22.01 GB (units of 1000³ bytes).
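To see where the two figures come from, `numfmt` (GNU coreutils) can convert the 20.50G figure back to bytes and then into both unit systems. This is a small sketch. It starts from the rounded figure rather than the exact byte total, so the byte count is illustrative.

```shell
# Start from the rounded IEC figure (20.50 GiB) reported above.
bytes=$(numfmt --from=iec 20.5G)   # 20.5 * 1024^3 = 22011707392 bytes
echo "$bytes"
# Powers of 1000 (SI): prints 22.01G
numfmt --to=si --round=nearest --format="%.2f" "$bytes"
# Powers of 1024 (IEC): prints 20.50G
numfmt --to=iec --round=nearest --format="%.2f" "$bytes"
```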

While git lfs ls-files --size also gives size information, it is given in human-readable form (e.g. 2.5 GB) and is therefore less precise.

Credit (for the use of paste, bc and numfmt): linux - Sum up numbers with KB/MB/GB/TB/PB... suffixes - Unix & Linux Stack Exchange
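The steps above can be combined into a rough CLI version of what this issue asks for, looping over every repo. This is only a sketch. `lfs_usage_report` is a hypothetical name, and the function assumes an authenticated `gh` plus `git lfs`. It clones the full Git history of every repo (without LFS content) into a temporary directory, so it still uses some bandwidth.

```shell
# Hypothetical helper (not an existing tool): for every repo of an owner, print
#   OWNER/NAME <TAB> LFS bytes at HEAD <TAB> LFS bytes across all history
# Assumes `gh` is authenticated and `git lfs` is installed. Repos are cloned
# with GIT_LFS_SKIP_SMUDGE=1, so only LFS pointer files are checked out.
lfs_usage_report() {
  local owner=$1 limit=${2:-200} workdir repo dir cur_bytes all_bytes
  workdir=$(mktemp -d)
  # The first tab-separated column of `gh repo list` is OWNER/NAME.
  gh repo list "$owner" --limit "$limit" | cut -f1 | while read -r repo; do
    dir="$workdir/${repo#*/}"
    # Redirect stdin so the clone cannot swallow the remaining repo names.
    GIT_LFS_SKIP_SMUDGE=1 gh repo clone "$repo" "$dir" </dev/null >/dev/null 2>&1 || continue
    # Sum the "size:" lines for HEAD and for the full history.
    cur_bytes=$(git -C "$dir" lfs ls-files --debug </dev/null \
      | awk '$1 == "size:" { t += $2 } END { printf "%.0f\n", t }')
    all_bytes=$(git -C "$dir" lfs ls-files --debug --all </dev/null \
      | awk '$1 == "size:" { t += $2 } END { printf "%.0f\n", t }')
    printf '%s\t%s\t%s\n' "$repo" "$cur_bytes" "$all_bytes"
  done
  rm -rf "$workdir"
}

# Usage, largest full-history usage first:
#   lfs_usage_report OpenDRR | sort -t"$(printf '\t')" -k3,3nr
```

A shallow clone (`--depth`) would save bandwidth, but it would also likely hide the older LFS versions that `--all` is meant to count. For that reason, the sketch uses full clones.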


For the size of the Git repo itself, not counting LFS storage (the GitHub API reports it in kilobytes):

$ curl https://api.github.com/repos/OpenDRR/boundaries 2> /dev/null | grep size | tr -dc '[:digit:]'
7294
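The `grep size` step above matches any line containing "size", and unauthenticated `curl` calls cannot see private repos. The sketch below uses `gh api` with its `--jq` option instead, which reads only the `size` field, and converts the result to a human-readable figure. `repo_git_size` is a hypothetical name, and treating the reported kilobytes as KiB (1024 bytes) is an assumption.

```shell
# Hypothetical helper: print the Git (non-LFS) size of OWNER/NAME as reported
# by the GitHub REST API, converted to IEC units. Assumes `gh` is installed
# and authenticated, and that the API's `size` field (kilobytes) means KiB.
repo_git_size() {
  gh api "repos/$1" --jq '.size' \
    | numfmt --from-unit=1024 --to=iec --round=nearest --format="%.2f"
}

# Usage:
#   repo_git_size OpenDRR/boundaries
```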

Credit: https://stackoverflow.com/questions/8646517/how-can-i-see-the-size-of-a-github-repository-before-cloning-it

There is also https://github.com/github/git-sizer which "[c]ompute[s] various size metrics for a Git repository, flagging those that might cause problems".
