average_cell_entropy - string conversion causes incorrect outputs #36

Hello! I really enjoy working with this library. It's my first time working with CA and the library makes the learning very easy. 

I noticed a small problem while working with larger numbers of unique values.

In `average_cell_entropy()`, `shannon_entropy()` is called on the concatenated string conversions of each column's elements.
The entropy is therefore computed over the characters of a string rather than over the original values.

This becomes a problem when a single input value is converted to a string of more than one character. For example, the number `10` becomes the string `"10"`, which is made of 2 different characters.

Computing entropy on such strings can lead to incorrect results. 

A minimal example to reproduce:

```python
>>> import numpy as np
>>> from cellpylib import average_cell_entropy
>>> average_cell_entropy(np.array([[9]]))
0.0
>>> average_cell_entropy(np.array([[10]]))
1.0
```

Computing entropy on a 1x1 array containing one unique value should always return `0.0`, since the single element has probability 1.

With the current implementation, the input `10` returns an entropy of `1.0` instead of `0.0`. That is the entropy of two unique values with probability `0.5` each.
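For comparison, here is a minimal sketch of an entropy computed over the values themselves instead of their characters. The names `value_entropy` and `average_cell_entropy_by_value` are hypothetical and not part of cellpylib, and the sketch assumes a 2D array with timesteps as rows and cells as columns. It is only meant to illustrate the expected results, not to prescribe how the library should fix this.

```python
import numpy as np


def value_entropy(values):
    """Shannon entropy (in bits) over the distinct values, not their characters.

    Hypothetical helper for illustration; not part of cellpylib.
    """
    _, counts = np.unique(np.asarray(values), return_counts=True)
    p = counts / counts.sum()
    # sum(p * log2(1/p)) is the same as -sum(p * log2(p))
    return float(np.sum(p * np.log2(1.0 / p)))


def average_cell_entropy_by_value(ca):
    """Average the per-column entropy.

    Assumes rows are timesteps and columns are cells (hypothetical helper).
    """
    ca = np.asarray(ca)
    return float(np.mean([value_entropy(ca[:, i]) for i in range(ca.shape[1])]))


single_nine = average_cell_entropy_by_value(np.array([[9]]))
single_ten = average_cell_entropy_by_value(np.array([[10]]))
two_values = average_cell_entropy_by_value(np.array([[9], [10]]))
```

With this approach both `single_nine` and `single_ten` are `0.0`, and a column holding the two values `9` and `10` gives `1.0`, regardless of how many digits each value has.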

This can lead to faulty results when there are many unique values.
If the column values are `[0, 1, 2, ..., 19]`, the string used for the entropy computation will be `"012345678910111213141516171819"`, with `'1'` making up over one third of all characters.
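The character counts for that example can be checked with the standard library alone. This is only a sketch: the `entropy` helper below is a plain Shannon entropy in bits written for illustration, not cellpylib's `shannon_entropy()`.

```python
from collections import Counter
from math import log2

values = list(range(20))
joined = "".join(str(v) for v in values)  # "012345678910111213141516171819"

char_counts = Counter(joined)
total_chars = len(joined)                          # 30 characters
ones = char_counts["1"]                            # 12 occurrences of '1'
share_of_ones = ones / total_chars                 # 0.4, i.e. over one third


def entropy(counts):
    """Shannon entropy in bits from a collection of occurrence counts."""
    counts = list(counts)
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts)


per_char_entropy = entropy(char_counts.values())       # over 10 distinct characters
per_value_entropy = entropy(Counter(values).values())  # over 20 distinct values
```

Here `per_value_entropy` equals `log2(20)` (about 4.32 bits), while `per_char_entropy` comes out noticeably lower (about 2.87 bits), because the 20 distinct values collapse into 10 digit characters with a skewed distribution.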

