After each release the stats have to be updated. Most figures can be acquired via `omero fs usage` and the `stats.py` script.
Problem 1:
`studies.tsv` wants:
Study | Container | Introduced | Internal ID | Sets | Wells | Experiments (wells for screens, imaging experiments for non-screens) | Targets (genes, small molecules, geographic locations, or a combination of factors (idr0019, 26, 34, 38)) | Acquisitions | 5D Images | Planes | Size (TB) | Size | # of Files | avg. size (MB) | Avg. Image Dim (XYZCT)
From `stats.py` you'll get:
Container | ID | Set | Wells | Images | Planes | Bytes
Example:
idr0052-walther-condensinmap/experimentA | 752 | 44 of 54 | 0 | 282 | 699360 | 85.4 GB
What does `44 of 54` sets mean? What is `Bytes`, and does that have to be used for `Size (TB)` and `Size`?
`omero fs usage` gives you something like `Total disk usage: 115773571855 bytes in 25 files`. What about this size? And is the `25 files` the `# of Files`?
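If so, something like this could pull both figures out of the CLI output (a sketch only; `Screen:1201` is a made-up example argument, and the parsing assumes the output format quoted above):

```python
import re
import subprocess

# Run the CLI for one container; "Screen:1201" is a made-up example argument.
out = subprocess.run(
    ["omero", "fs", "usage", "Screen:1201"],
    capture_output=True, text=True, check=True,
).stdout

# Parse "Total disk usage: 115773571855 bytes in 25 files"
# (format assumed from the example output quoted above).
m = re.search(r"Total disk usage: (\d+) bytes in (\d+) files", out)
if m:
    size_bytes, n_files = int(m.group(1)), int(m.group(2))
    print(f"Size (TB): {size_bytes / 1e12:.2f}, # of Files: {n_files}")
```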
The workflow doc has an HQL query showing how to get the `Avg. Image Dim (XYZCT)`, but only for projects, not for screens.
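For screens, an equivalent query might walk from Pixels through WellSample, Well, and Plate up to the screen. A minimal omero-py sketch, assuming that join path (my reading of the OMERO model, not the workflow doc's query) and placeholder connection details:

```python
from omero.gateway import BlitzGateway
from omero.rtypes import unwrap
from omero.sys import ParametersI

# Connection details are placeholders.
conn = BlitzGateway("user", "password", host="localhost", port=4064)
conn.connect()

# Average XYZCT dimensions over all images in one screen. The join path
# (Pixels -> Image -> WellSample -> Well -> Plate -> ScreenPlateLink)
# is my reading of the OMERO model, so double-check it.
hql = (
    "select avg(p.sizeX), avg(p.sizeY), avg(p.sizeZ), "
    "avg(p.sizeC), avg(p.sizeT) "
    "from Pixels p join p.image i join i.wellSamples ws "
    "join ws.well w join w.plate pl join pl.screenLinks sl "
    "where sl.parent.id = :id"
)
params = ParametersI()
params.addId(1201)  # made-up screen ID

for row in conn.getQueryService().projection(hql, params, conn.SERVICE_OPTS):
    print(unwrap(row))  # [avg X, avg Y, avg Z, avg C, avg T]

conn.close()
```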
And how to get `Targets`? As this can be multiple things, I can't think of an easy/generic script which could go through any annotation.csv and pull the number of unique 'targets'.
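One semi-generic option could be to make the target column an input, since which column holds the 'target' differs per study. A sketch; the column name passed on the command line (e.g. "Gene Identifier") is study-specific and only illustrative:

```python
import csv
import sys

def count_unique_targets(annotation_csv, column):
    """Count distinct non-empty values in one column of an annotation file.

    Which column holds the 'target' has to be supplied per study, e.g. a
    gene-identifier column for gene studies or a compound-name column for
    small-molecule studies (column names vary between studies).
    """
    targets = set()
    with open(annotation_csv, newline="") as f:
        for row in csv.DictReader(f):
            value = row.get(column, "").strip()
            if value:
                targets.add(value)
    return len(targets)

if __name__ == "__main__":
    # Usage: python count_targets.py annotation.csv "Gene Identifier"
    print(count_unique_targets(sys.argv[1], sys.argv[2]))
```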
Problem 2:
`releases.tsv` wants:
Date | Data release | Code version | Sets | Wells | Experiments | Images | Planes | Size (TB) | Files (Million) | DB Size (GB)
From `stats.py` you'll get some of it:
Container | ID | Set | Wells | Images | Planes | Bytes
Total | | 13044 | 1213175 | 9150589 | 65571290 | 334.2 TB
But where to get `Files (Million)` from? And how to get `DB Size (GB)`?
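For `DB Size (GB)`, PostgreSQL can report it directly via `pg_database_size()`; a sketch assuming direct access to the database and placeholder connection settings (`Files (Million)` could perhaps be summed from the per-study `omero fs usage` file counts, if those are confirmed to be the right figure):

```python
import psycopg2

# Connection settings and the database name "idr" are assumptions.
conn = psycopg2.connect(host="localhost", dbname="idr", user="omero")
with conn.cursor() as cur:
    # pg_database_size() returns the on-disk size of the database in bytes.
    cur.execute("SELECT pg_database_size(current_database())")
    size_bytes = cur.fetchone()[0]
    print(f"DB Size (GB): {size_bytes / 1e9:.1f}")
conn.close()
```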
/cc @sbesson. I wasn't really sure where to open the issue: here (stats) or idr-utils (the `stats.py` script).