
Commit ad21e80

Merge pull request #1 from IFCA-Advanced-Computing/develop
Update to version 0.0.3
2 parents 3eec7fe + 08e3794 commit ad21e80

23 files changed (+239 / -35 lines)

CITATION.cff

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+cff-version: 0.0.2
+message: "If you use this software, please cite it as below."
+authors:
+- family-names: "Sáinz-Pardo Díaz"
+  given-names: "Judith"
+  orcid: "https://orcid.org/0000-0002-8387-578X"
+- family-names: "López García"
+  given-names: "Álvaro"
+  orcid: "https://orcid.org/0000-0002-0013-4602"
+title: "ANJANA"
+version: 0.0.3
+date-released: 2024-04-18
+url: "https://github.com/IFCA-Advanced-Computing/anjana"

MANIFEST.in

Lines changed: 0 additions & 1 deletion
This file was deleted.

README.md

Lines changed: 36 additions & 8 deletions
@@ -1,7 +1,9 @@
 # ANJANA
 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://gitlab.ifca.es/privacy-security/anjana/-/blob/main/LICENSE)
 [![codecov](https://codecov.io/gh/IFCA-Advanced-Computing/anjana/graph/badge.svg?token=AVI53GZ7YD)](https://codecov.io/gh/IFCA-Advanced-Computing/anjana)
-
+![PyPI](https://img.shields.io/pypi/v/anjana)
+![PyPI - Downloads](https://img.shields.io/pypi/dm/anjana)
+[![Documentation Status](https://readthedocs.org/projects/anjana/badge/?version=latest)](https://anjana.readthedocs.io/en/latest/?badge=latest)
 ![Python version](https://img.shields.io/badge/python-3.9|3.10|3.11|3.12-blue)


@@ -20,7 +22,7 @@ The following anonymity techniques are implemented, based on the Python library
 * _Enhanced β-likeness_.
 * _δ-disclosure privacy_.

-## :bulb: Installation
+## Installation
 First, we strongly recommend the use of a virtual environment. In linux:
 ```bash
 virtualenv .venv -p python3
@@ -42,15 +44,15 @@ Install the most updated version of anjana (linux and windows):
 pip install git+https://github.com/IFCA-Advanced-Computing/anjana.git
 ```

-## :rocket: Getting started
+## Getting started

 For anonymizing your data you need to introduce:
-* The **pandas dataframe** with the data to be anonymized. Each column can contain: indentifiers, quasi-indentifiers or sensitive attributes.
+* The **pandas dataframe** with the data to be anonymized. Each column can contain: identifiers, quasi-indentifiers or sensitive attributes.
 * The **list with the names of the identifiers** in the dataframe, in order to suppress them.
 * The **list with the names of the quasi-identifiers** in the dataframe.
 * The **sentive attribute** (only one) in case of applying other techniques than _k-anonymity_.
 * The **level of anonymity to be applied**, e.g. _k_ (for _k-anonymity_), __ (for _ℓ-diversity_), _t_ (for _t-closeness_), _β_ (for _basic or enhanced β-likeness_), etc.
-* Maximum **level of record suppression** allowed (from 0 to 100).
+* Maximum **level of record suppression** allowed (from 0 to 100, acting as the percentage of suppressed records).
 * Dictionary containing one dictionary for each quasi-identifier with the **hierarchies** and the levels.

 ### Example: apply _k-anonymity_, _ℓ-diversity_ and _t-closeness_ to the [adult dataset](https://archive.ics.uci.edu/dataset/2/adult) with some predefined hierarchies:
@@ -137,6 +139,8 @@ For a better understanding, let's look at the following example. Supose that we
 Then, in order to create the hierarquies we can define the following dictionary:

 ```python
+import numpy as np
+
 age = data['age'].values
 # Values: [29 24 28 27 24 23 19 29 17 19] (note that the following can be automatized)
 age_5years = ['[25, 30)', '[20, 25)', '[25, 30)',
@@ -160,10 +164,34 @@ hierarchies = {
 }
 ```

-## :scroll: License
-This project is licensed under the [Apache 2.0 license](https://gitlab.ifca.es/privacy-security/anjana/-/blob/main/LICENSE?ref_type=heads).
+You can also use the function _generate_intervals()_ from _utils_ for creating the interval-based hierarchy as follows:
+
+```python
+import numpy as np
+from anjana.anonymity import utils
+
+age = data['age'].values
+
+hierarchies = {
+"age": {
+0: data["age"].values,
+1: utils.generate_intervals(data["age"].values, 0, 100, 5),
+2: utils.generate_intervals(data["age"].values, 0, 100, 10),
+},
+"gender": {
+0: data["gender"].values,
+1: np.array(["*"] * len(data["gender"].values)) # Suppression
+},
+"city": {0: data["city"].values,
+1: np.array(["*"] * len(data["city"].values))} # Suppression
+}
+```
+
+
+## License
+This project is licensed under the [Apache 2.0 license](https://github.com/IFCA-Advanced-Computing/anjana/blob/main/LICENSE).

-## :warning: Project status
+## Project status
 This project is under active development.

 ## Funding and acknowledgments

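The Getting started list in this README names the inputs: a dataframe, the identifiers to suppress, the quasi-identifiers and a level such as _k_. As a rough, library-independent illustration of what that _k_ means, the sketch below drops the identifiers, groups the rows by their quasi-identifier values and reports the size of the smallest group, which is the _k_ the table actually satisfies. It does not call ANJANA; the helper `min_class_size` and the toy table are hypothetical.

```python
import pandas as pd


def min_class_size(data: pd.DataFrame, ident: list, quasi_ident: list) -> int:
    """Return the size of the smallest equivalence class (the k actually achieved).

    Identifiers are dropped first; an equivalence class is a group of rows that
    share the same values for every quasi-identifier.
    """
    released = data.drop(columns=ident)
    return int(released.groupby(quasi_ident).size().min())


# Hypothetical toy table: "name" is an identifier, "age" and "city" are
# (already generalized) quasi-identifiers, "disease" is the sensitive attribute.
data = pd.DataFrame(
    {
        "name": ["Ana", "Luis", "Eva", "Juan", "Sara", "Pablo"],
        "age": ["[20, 25)"] * 3 + ["[25, 30)"] * 3,
        "city": ["*"] * 6,
        "disease": ["flu", "cold", "flu", "covid", "flu", "cold"],
    }
)

k_real = min_class_size(data, ident=["name"], quasi_ident=["age", "city"])
print(f"k = {k_real}")  # two classes of 3 rows each, so k = 3
print("3-anonymous:", k_real >= 3)
```

ANJANA itself also generalizes the quasi-identifiers through the hierarchies and may suppress records within the allowed suppression level; this check only inspects a table that has already been generalized.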
anjana/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -16,4 +16,4 @@

 """ANJANA is an open source framework for anonymizing data with different techniques."""

-__version__ = "0.0.2"
+__version__ = "0.0.3"

anjana/anonymity/_delta_disclosure.py

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ def delta_disclosure(
 quasi_ident_gen = copy(quasi_ident)

 if delta_real <= delta:
-print(f"The data verifies delta-disclosure with t={delta_real}")
+print(f"The data verifies delta-disclosure with delta={delta_real}")
 return data_kanon

 while delta_real > delta:

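The corrected message prints `delta_real`, the value compared against the requested `delta`. Under the definition of δ-disclosure privacy, that value is the largest |log(p(s | EC) / p(s | T))| over every equivalence class EC and every sensitive value s of the table T. The sketch below computes it directly with pandas. It illustrates the definition rather than reproducing the pycanon/ANJANA code; it assumes the natural logarithm, and the helper `delta_disclosure_level` and the data are hypothetical. A sensitive value missing from some class gives a ratio of 0, so no finite δ exists.

```python
import math

import pandas as pd


def delta_disclosure_level(data: pd.DataFrame, quasi_ident: list, sens_att: str) -> float:
    """Smallest delta with |ln(p(s|EC) / p(s|T))| <= delta for every
    equivalence class EC and every sensitive value s of the whole table T."""
    p_table = data[sens_att].value_counts(normalize=True)
    delta_real = 0.0
    for _, ec in data.groupby(quasi_ident):
        p_ec = ec[sens_att].value_counts(normalize=True)
        for s, p_t in p_table.items():
            p_e = p_ec.get(s, 0.0)
            if p_e == 0.0:
                return math.inf  # s never appears in this class
            delta_real = max(delta_real, abs(math.log(p_e / p_t)))
    return delta_real


# Hypothetical table: each class has 2/3 of one disease and 1/3 of the other,
# while the whole table is 50/50, so the worst ratio is (1/3) / (1/2).
data = pd.DataFrame(
    {
        "age": ["[20, 25)"] * 3 + ["[25, 30)"] * 3,
        "disease": ["flu", "flu", "cold", "flu", "cold", "cold"],
    }
)
delta_real = delta_disclosure_level(data, ["age"], "disease")
print(f"The data verifies delta-disclosure with delta={delta_real:.4f}")  # ln(1.5), about 0.4055
```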
anjana/anonymity/_k_anonymity.py

Lines changed: 6 additions & 2 deletions
@@ -153,6 +153,7 @@ def alpha_k_anonymity(
 data_kanon, quasi_ident
 )

+k_ec = []
 alpha_ec = []
 for ec in equiv_class:
 data_temp = data_kanon.iloc[
@@ -164,14 +165,17 @@ def alpha_k_anonymity(
 for s in values
 ]
 alpha_ec.append(max(alpha_s))
+k_ec.append(len(ec))

 if alpha > min(alpha_ec):
 if max(alpha_ec) <= alpha:
 return data_kanon

-data_ec = pd.DataFrame({"equiv_class": equiv_class, "alpha": alpha_ec})
+data_ec = pd.DataFrame(
+{"equiv_class": equiv_class, "alpha": alpha_ec, "k": k_ec}
+)
 data_ec_alpha = data_ec[data_ec.alpha > alpha]
-records_sup = sum(data_ec_alpha.alpha.values)
+records_sup = sum(data_ec_alpha.k.values)
 if (records_sup + supp_records) * 100 / len(data) <= supp_level:
 ec_elim = np.concatenate(
 [

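Before this change, `records_sup` added up the α values of the equivalence classes that exceed the bound, so it was not a number of records at all; it now adds up their sizes (`k_ec`), which is how many rows the suppression step would actually delete. The pandas sketch below reproduces that bookkeeping on a toy table. It assumes the usual (α,k)-anonymity reading, in which a class's α is the highest relative frequency of any sensitive value in it, and `alpha_suppression_cost` is a hypothetical helper, not part of ANJANA.

```python
import pandas as pd


def alpha_suppression_cost(data, quasi_ident, sens_att, alpha, supp_level, supp_records=0):
    """Return (records_sup, allowed) for the classes whose alpha exceeds the bound.

    alpha_ec: highest relative frequency of a sensitive value inside each class.
    k_ec:     number of records in each class (what suppression actually removes).
    supp_records: records already suppressed in an earlier step.
    """
    groups = data.groupby(quasi_ident)[sens_att]
    alpha_ec = groups.agg(lambda s: s.value_counts(normalize=True).max())
    k_ec = groups.size()
    data_ec = pd.DataFrame({"alpha": alpha_ec, "k": k_ec})
    records_sup = int(data_ec[data_ec.alpha > alpha].k.sum())
    # supp_level is a percentage (0-100) of the records in the table
    allowed = (records_sup + supp_records) * 100 / len(data) <= supp_level
    return records_sup, allowed


# Hypothetical table with three classes: alpha = 0.75 (4 rows), 1/3 (3 rows), 1.0 (3 rows).
data = pd.DataFrame(
    {
        "zip": ["A"] * 4 + ["B"] * 3 + ["C"] * 3,
        "disease": ["flu", "flu", "flu", "cold", "flu", "cold", "covid", "cold", "cold", "cold"],
    }
)
print(alpha_suppression_cost(data, ["zip"], "disease", alpha=0.7, supp_level=50))  # (7, False)
print(alpha_suppression_cost(data, ["zip"], "disease", alpha=0.8, supp_level=50))  # (3, True)
```

With `alpha=0.7` the two violating classes hold 7 of the 10 records, whereas the old sum of their α values (0.75 + 1.0) would have reported 1.75.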
anjana/anonymity/_l_diversity.py

Lines changed: 11 additions & 4 deletions
@@ -254,6 +254,7 @@ def recursive_c_l_diversity(
 equiv_class = pycanon.anonymity.utils.aux_anonymity.get_equiv_class(
 data_kanon, quasi_ident
 )
+k_ec = []
 c_ec = []
 for ec in equiv_class:
 data_temp = data_kanon.iloc[
@@ -262,12 +263,15 @@ def recursive_c_l_diversity(
 values = np.unique(data_temp[sens_att].values)
 r_ec = np.sort([len(data_temp[data_temp[sens_att] == s]) for s in values])
 c_ec.append(np.floor(r_ec[0] / sum(r_ec[(l_div - 1) :]) + 1))
+k_ec.append(len(ec))
 if max(c_ec) < c:
 f"Recursive (c,l)-diversity cannot be achieved for l={l_div} and c={c}"
 else:
-data_ec = pd.DataFrame({"equiv_class": equiv_class, "c_ec": c_ec})
+data_ec = pd.DataFrame(
+{"equiv_class": equiv_class, "c_ec": c_ec, "k": k_ec}
+)
 data_ec_c = data_ec[data_ec.c_ec < c]
-records_sup = sum(data_ec_c.c_ec.values)
+records_sup = sum(data_ec_c.k.values)
 if (records_sup + supp_records) * 100 / len(data) <= supp_level:
 ec_elim = np.concatenate(
 [
@@ -358,11 +362,14 @@ def _l_diversity_inner(
 ec_sensitivity = [
 len(np.unique(data_kanon.iloc[ec][sens_att])) for ec in equiv_class
 ]
+k_ec = [len(ec) for ec in equiv_class]

 if l_div > max(ec_sensitivity):
-data_ec = pd.DataFrame({"equiv_class": equiv_class, "l": ec_sensitivity})
+data_ec = pd.DataFrame(
+{"equiv_class": equiv_class, "l": ec_sensitivity, "k": k_ec}
+)
 data_ec_l = data_ec[data_ec.l < l_div]
-records_sup = sum(data_ec_l.l.values)
+records_sup = sum(data_ec_l.k.values)
 if (records_sup + supp_records_k) * 100 / len(data) <= supp_level:
 ec_elim = np.concatenate(
 [

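The same correction is applied in `recursive_c_l_diversity` and `_l_diversity_inner`: the classes that fail the check are now charged their number of records `k` instead of their `c_ec` or `l` value. A small pandas sketch of the distinct ℓ-diversity case shows why this matters. It computes the per-class counts with `groupby(...).nunique()` and `size()` instead of pycanon's `get_equiv_class`, which is an illustrative substitution; the helper `ec_summary` and the data are hypothetical.

```python
import pandas as pd


def ec_summary(data: pd.DataFrame, quasi_ident: list, sens_att: str) -> pd.DataFrame:
    """One row per equivalence class: l = distinct sensitive values, k = records."""
    groups = data.groupby(quasi_ident)[sens_att]
    return pd.DataFrame({"l": groups.nunique(), "k": groups.size()})


# Hypothetical table: a class of 5 records with a single disease, and a class of 3.
data = pd.DataFrame(
    {
        "age": ["[20, 30)"] * 5 + ["[30, 40)"] * 3,
        "disease": ["flu"] * 6 + ["cold", "covid"],
    }
)

data_ec = ec_summary(data, ["age"], "disease")
data_ec_l = data_ec[data_ec.l < 2]  # classes violating 2-diversity
records_sup = int(data_ec_l.k.sum())  # rows actually removed (after the fix)
old_estimate = int(data_ec_l.l.sum())  # what summing the l column counted (before the fix)
print(records_sup * 100 / len(data), old_estimate * 100 / len(data))  # 62.5 12.5
```

Here the old sum reports 12.5 % of the records as suppressed while the removed class actually holds 62.5 % of them, so a `supp_level` of 50 would have been wrongly accepted.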
anjana/anonymity/utils/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -20,11 +20,13 @@
 apply_hierarchy,
 check_gen_level,
 get_transformation,
+generate_intervals,
 )

 __all__ = [
 "suppress_identifiers",
 "apply_hierarchy",
 "check_gen_level",
 "get_transformation",
+"generate_intervals",
 ]

anjana/anonymity/utils/utils.py

Lines changed: 38 additions & 0 deletions
@@ -149,3 +149,41 @@ def get_transformation(
 transformation.append(0)

 return transformation
+
+
+@beartype()
+def generate_intervals(
+quasi_ident: typing.Union[typing.List, np.ndarray],
+inf: typing.Union[int, float],
+sup: typing.Union[int, float],
+step: int,
+) -> list:
+"""Given a quasi-identifier of numeric type, creates a list containing an
+interval-based generalization (hierarchy) of the values of the quasi-identifier.
+The intervals will have the length entered in the parameter step.
+
+:param quasi_ident: values of the quasi-identifier on which the interval-based
+generalization is to be obtained
+:type quasi_ident: list or numpy array
+
+:param inf: lower value of the set of intervals
+:type inf: int or float
+
+:param sup: bigger value of the set of intervals
+:type sup: int or float
+
+:param step: spacing between values of the intervals
+:type step: int
+
+:return: list with the intervals associated with the given values
+:rtype: list
+"""
+values = np.arange(inf, sup + 1, step)
+interval = []
+for num in quasi_ident:
+lower = np.searchsorted(values, num)
+if lower == 0:
+lower = 1
+interval.append(f"[{values[lower - 1]}, {values[lower]})")
+
+return interval

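To see what `generate_intervals` returns, the standalone sketch below copies its loop (without the `beartype` decorator) and runs it on a few ages. `np.searchsorted` uses `side="left"` by default, so a value lying exactly on a boundary, such as 25 with `step=5`, is mapped to `[20, 25)` even though that interval's upper bound is open. The `side="right"` variant is a hypothetical alternative, not part of this commit; it also rejects values outside the generated boundaries instead of relying on the `lower == 0` adjustment or indexing past the last boundary. Both function names are illustrative.

```python
import numpy as np


def generate_intervals_commit(quasi_ident, inf, sup, step):
    """Same loop as generate_intervals() in this commit (beartype omitted)."""
    values = np.arange(inf, sup + 1, step)
    interval = []
    for num in quasi_ident:
        lower = np.searchsorted(values, num)  # side="left" by default
        if lower == 0:
            lower = 1
        interval.append(f"[{values[lower - 1]}, {values[lower]})")
    return interval


def generate_intervals_right(quasi_ident, inf, sup, step):
    """Hypothetical variant: every value falls in the left-closed, right-open [a, b)."""
    values = np.arange(inf, sup + 1, step)
    interval = []
    for num in quasi_ident:
        if not values[0] <= num < values[-1]:
            raise ValueError(f"{num} is outside [{values[0]}, {values[-1]})")
        upper = np.searchsorted(values, num, side="right")  # first boundary > num
        interval.append(f"[{values[upper - 1]}, {values[upper]})")
    return interval


ages = [19, 24, 25, 29]
print(generate_intervals_commit(ages, 0, 100, 5))
# ['[15, 20)', '[20, 25)', '[20, 25)', '[25, 30)']
print(generate_intervals_right(ages, 0, 100, 5))
# ['[15, 20)', '[20, 25)', '[25, 30)', '[25, 30)']
```

The ages used in the README example (`[29 24 28 27 24 23 19 29 17 19]`) contain no multiple of 5, so both versions agree on them.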
docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 project = "ANJANA"
 copyright = "2024, Spanish National Research Council (CSIC)"
 author = "Judith Sáinz-Pardo Díaz (CSIC)"
-release = "0.0.2"
+release = "0.0.3"

 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
