This paper is an extended replication of the paper [_Code cloning in smart contr…

- `/duplicates` - Cleaned data.
- `openzeppelin.zip` - OpenZeppelin data. Requires unzipping into folder `openzeppelin`.
- `/metadata` - Metadata about the authors, creation date and transactions of the contracts in the corpus.
- `/prepared` - Prepared pickle files for data analysis.
- `/02_prepare` - Scripts for preparing the data in `/01_data/prepared`. Some of these scripts are potentially long-running; for those, the approximate execution times are reported in the source files.
- `/03_analysis` - Scripts for the automated analysis of the data.
- `/04_results` - Results of the analyses, including charts and numeric results. Some of these results are discussed in detail in the paper. Each analysis result corresponds to a particular observation in the paper, which is identified in the name of the generated observation file.

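To inspect the prepared pickle files, a minimal sketch like the one below could be used. It assumes it is run from the repository root and that the files sit directly in `/01_data/prepared` with a `.p`, `.pkl` or `.pickle` suffix; the suffixes and the helper name `load_prepared` are assumptions, not part of the package. If the files hold pandas objects, pandas has to be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path
from typing import Any, Dict

# Assumed suffixes; adjust them to the files actually present in /01_data/prepared.
PICKLE_SUFFIXES = {".p", ".pkl", ".pickle"}


def load_prepared(folder: str = "01_data/prepared") -> Dict[str, Any]:
    """Unpickle every file with a pickle-like suffix in `folder`, keyed by file stem.

    Only use this on trusted files: unpickling can execute arbitrary code.
    """
    loaded = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix in PICKLE_SUFFIXES:
            with path.open("rb") as fh:
                loaded[path.stem] = pickle.load(fh)
    return loaded


if __name__ == "__main__":
    prepared = Path("01_data") / "prepared"
    if prepared.is_dir():
        for name, obj in load_prepared(str(prepared)).items():
            print(f"{name}: {type(obj).__name__}")
```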

## Reproduction

The following describes four reproduction scenarios. Each scenario can be executed independently of the others.

* [Reproduction of the analyses](#reproduction-of-the-analyses): reproduces the analysis results in `/04_results`, including charts and numeric results, from the prepared data in the `/01_data/prepared` folder.
* [Reproduction of the prepared data](#reproduction-of-the-prepared-data-01_dataprepared): reproduces the prepared data in `/01_data/prepared` by (i) merging author, transaction and file length metadata into the clone data; and (ii) pre-processing the data for analysis and persisting the pre-processed data into pickle files. Some of the pre-processing steps are potentially time-consuming; for those, the approximate execution times are reported in the source files.
* [Reproduction of the cleaned data](#reproduction-of-the-cleaned-data-01_dataclonedataduplicates): reproduces the cleaned data in `/01_data/clonedata/duplicates` from the raw data in `/01_data/clonedata/raw` by consolidating the contents of the `.xml` files.
* [Reproduction of the raw data](#reproducing-the-clone-analysis-01_dataclonedataraw): reproduces the raw data in `/01_data/clonedata/raw` by running the [NiCad extension](https://github.com/eff-kay/nicad6) developed for this study.

**NOTE:** The following steps have been tested with Python `>=3.7,<3.10` (i.e., Python 3.7 to 3.9).

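Scripts that depend on the tested interpreter range can warn early. The following is a minimal sketch of such a guard; the function name `is_tested_version` and the choice to warn rather than abort are illustrative assumptions, and versions outside the range were simply not tested.

```python
import sys
from typing import Tuple

# Range reported as tested in this README: >=3.7 and <3.10.
TESTED_MIN = (3, 7)
TESTED_MAX_EXCLUSIVE = (3, 10)


def is_tested_version(version: Tuple[int, ...]) -> bool:
    """Return True if the (major, minor) part of `version` lies in the tested range."""
    major_minor = tuple(version[:2])
    return TESTED_MIN <= major_minor < TESTED_MAX_EXCLUSIVE


if __name__ == "__main__":
    current = sys.version_info
    if not is_tested_version(current):
        print(
            f"Warning: Python {current.major}.{current.minor} is outside the tested range 3.7-3.9.",
            file=sys.stderr,
        )
```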
### Reproduction of the analyses

Follow the steps below to reproduce the analysis results in `/04_results`, including charts and numeric results. The scripts use the prepared data contained in the `/01_data/prepared` folder.

1. Clone this repository.
2. Install the dependencies by running `pip install -r requirements.txt` in the root folder.
3. Extract `/01_data/clonedata/openzeppelin.zip` into folder `/01_data/clonedata/openzeppelin`, or run `python 01_unzip.py` in the `/02_prepare` folder.

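Step 3 can also be performed with Python's standard `zipfile` module. The sketch below is a hypothetical stand-in, not the contents of `01_unzip.py`: it assumes it is run from the repository root and that the archive's entries should land directly in `/01_data/clonedata/openzeppelin`. If the archive already contains a top-level `openzeppelin/` folder, extracting into `/01_data/clonedata` instead avoids a nested `openzeppelin/openzeppelin`.

```python
import zipfile
from pathlib import Path


def extract_openzeppelin(repo_root: str = ".") -> Path:
    """Extract 01_data/clonedata/openzeppelin.zip into 01_data/clonedata/openzeppelin."""
    clonedata = Path(repo_root) / "01_data" / "clonedata"
    target = clonedata / "openzeppelin"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(clonedata / "openzeppelin.zip") as archive:
        # The archive ships with the replication package, so it is treated as trusted.
        archive.extractall(target)
    return target


if __name__ == "__main__":
    if (Path("01_data") / "clonedata" / "openzeppelin.zip").is_file():
        print("Extracted to", extract_openzeppelin())
```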

### Reproduction of the prepared data (`/01_data/prepared`)

Follow the steps below to reproduce the prepared data in `/01_data/prepared` by (i) merging author, transaction and file length metadata into the clone data; and (ii) pre-processing the data for analysis and persisting the pre-processed data into pickle files. Some of the pre-processing steps are potentially time-consuming; for those, the approximate execution times are reported in the source files.

1. Run `python 03_mergeMetadata.py` in the `/02_prepare` folder.
2. Run `python 04_prepareAnalysisData.py` in the `/02_prepare` folder.

Some preparation steps can take hours to complete. Please find the benchma…

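Because some of these steps can take hours, it can help to run them unattended and log how long each one takes. The sketch below assumes it is started from the repository root and calls the two scripts above in order with the current interpreter, stopping at the first failure; the helper `run_scripts` and its timing output are illustrative, not part of the package.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import Dict, List

# Order taken from the steps above; both scripts are run from inside /02_prepare.
PREPARE_SCRIPTS = ["03_mergeMetadata.py", "04_prepareAnalysisData.py"]


def run_scripts(scripts: List[str], workdir: str = "02_prepare") -> Dict[str, float]:
    """Run each script with the current interpreter and return its duration in seconds."""
    durations = {}
    for script in scripts:
        start = time.perf_counter()
        # check=True raises CalledProcessError on a non-zero exit code, stopping the sequence.
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
        durations[script] = time.perf_counter() - start
        print(f"{script} finished in {durations[script] / 60:.1f} min")
    return durations


if __name__ == "__main__":
    if Path("02_prepare").is_dir():
        run_scripts(PREPARE_SCRIPTS)
```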

### Reproduction of the cleaned data (`/01_data/clonedata/duplicates`)

The cleaned data is used by the data preparation scripts. It is included in this replication package in folder `/01_data/clonedata/duplicates`, but it can be reproduced from the raw data in `/01_data/clonedata/raw`, by consolidating the contents of the `.xml` files, as follows.

1. Run `python 02_cleanup.py` in the `/02_prepare` folder.

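To illustrate what consolidating the `.xml` files can look like, the sketch below flattens clone pairs from NiCad-style XML reports into one list of rows. It is not the logic of `02_cleanup.py`: the element and attribute names (`clone`, `source`, `file`, `startline`, `endline`, `similarity`) are assumptions about the report layout and should be checked against the files in `/01_data/clonedata/raw`, and only top-level `.xml` files in the given folder are read.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List

# Assumed NiCad-style report layout (verify against the actual raw files):
# <clones>
#   <clone nlines="..." similarity="...">
#     <source file="..." startline="..." endline="..." pcid="..."/>
#     <source file="..." startline="..." endline="..." pcid="..."/>
#   </clone>
# </clones>


def clone_rows(xml_text: str, report: str = "") -> List[Dict[str, object]]:
    """Turn every <clone> with at least two <source> children into one row (first two sources)."""
    rows = []
    for clone in ET.fromstring(xml_text).iter("clone"):
        sources = clone.findall("source")
        if len(sources) < 2:
            continue  # not a pair; skipped rather than guessed
        first, second = sources[0], sources[1]
        rows.append({
            "report": report,
            "similarity": float(clone.get("similarity", "0")),
            "file_a": first.get("file"),
            "lines_a": (int(first.get("startline")), int(first.get("endline"))),
            "file_b": second.get("file"),
            "lines_b": (int(second.get("startline")), int(second.get("endline"))),
        })
    return rows


def consolidate(folder: str) -> List[Dict[str, object]]:
    """Collect the rows of every .xml report directly inside `folder` into a single list."""
    rows = []
    for path in sorted(Path(folder).glob("*.xml")):
        rows.extend(clone_rows(path.read_text(encoding="utf-8"), report=path.name))
    return rows


if __name__ == "__main__":
    raw = Path("01_data") / "clonedata" / "raw"
    if raw.is_dir():
        print(len(consolidate(str(raw))), "clone pairs found")
```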
### Reproducing the clone analysis (`/01_data/clonedata/raw`)

Follow the steps below to reproduce the raw clone data in `/01_data/clonedata/raw` by running the [NiCad extension](https://github.com/eff-kay/nicad6) developed for this study.

To obtain the corpus of 33,034 smart contracts, please contact the authors of the [original study](https://github.com/SAILResearch/suppmaterial-18-masanari-smart_contract_cloning).

A Docker image with the tool is maintained on [Docker Hub](https://hub.docker.com/repository/docker/faizank/nicad6) and can be obtained by running `docker pull faizank/nicad6:TSE`.

The following process assumes that [Docker](https://docs.docker.com/get-started/) is installed and working correctly, and that the image has been pulled. You can verify the latter by running `docker images` in a terminal and checking that an image named `faizank/nicad6` is listed.

**NOTE:** The following steps have been tested with Docker Engine 20.10.17 (build 100c701).

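The assumptions above (Docker running, image pulled, tested engine version) can be checked before starting a long analysis. A minimal sketch, assuming the `docker` CLI is on the `PATH`: it relies on `docker version --format '{{.Server.Version}}'` printing the engine version and on `docker image inspect` exiting with a non-zero code for a missing image. The helper names are hypothetical, and a different engine version is reported as a note rather than an error, since only 20.10.17 was tested.

```python
import subprocess
from typing import Callable, List

IMAGE = "faizank/nicad6:TSE"
TESTED_ENGINE = "20.10.17"  # engine version reported as tested in this README

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def run(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing text output; a missing executable is reported as exit code 127."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return subprocess.CompletedProcess(cmd, 127, "", f"{cmd[0]}: command not found")


def preflight(image: str = IMAGE, runner: Runner = run) -> List[str]:
    """Return human-readable notes; an empty list means all checks passed."""
    notes = []
    version = runner(["docker", "version", "--format", "{{.Server.Version}}"])
    server = version.stdout.strip()
    if version.returncode != 0 or not server:
        notes.append("Docker Engine not reachable: " + version.stderr.strip())
    elif server != TESTED_ENGINE:
        notes.append(f"Docker Engine {server} differs from the tested {TESTED_ENGINE}")
    # `docker image inspect` exits with a non-zero code when the image is not present locally.
    if runner(["docker", "image", "inspect", image]).returncode != 0:
        notes.append(f"Image {image} not found locally; run: docker pull {image}")
    return notes


if __name__ == "__main__":
    for note in preflight():
        print(note)
```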
1. Create a new folder `systems/source-code` and move the corpus to this folder.
2. Create a new folder `output` to store the results of the clone analysis.
3. From the folder that contains `systems` and `output`, execute the analysis by issuing the following command: `docker run --platform linux/x86_64 -v "$(pwd)/output:/nicad6/01_data" -v "$(pwd)/systems:/nicad6/systems" faizank/nicad6:TSE`. This generates the output artefacts inside the `output` folder. The host side of each `-v` mount must be an absolute path: a bare name such as `output` is interpreted by Docker as a named volume rather than the local folder.
4. Move the contents of the `output` folder to `/01_data` and use the Python scripts discussed above for the rest of the replication.

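Steps 1 to 4 can be scripted. The sketch below is a hypothetical helper, assuming the corpus sits in a local folder and the script runs from the folder where `systems` and `output` should be created. It copies rather than moves the corpus so the original stays intact, builds the `docker run` call with absolute host paths so that Docker bind-mounts the local folders, and merges the output into the data folder file by file, replacing files with the same relative path.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

IMAGE = "faizank/nicad6:TSE"


def prepare_folders(workdir: Path, corpus: Path) -> None:
    """Steps 1-2: copy the corpus into systems/source-code and create output/."""
    source_code = workdir / "systems" / "source-code"
    source_code.parent.mkdir(parents=True, exist_ok=True)
    # Copying instead of moving keeps the original corpus; fails if source-code already exists.
    shutil.copytree(str(corpus), str(source_code))
    (workdir / "output").mkdir(exist_ok=True)


def nicad_command(workdir: Path, image: str = IMAGE) -> List[str]:
    """Step 3: build the docker run arguments with absolute host paths (bind mounts)."""
    root = workdir.resolve()
    return [
        "docker", "run", "--platform", "linux/x86_64",
        "-v", f"{root / 'output'}:/nicad6/01_data",
        "-v", f"{root / 'systems'}:/nicad6/systems",
        image,
    ]


def run_nicad(workdir: Path, image: str = IMAGE) -> None:
    """Run the analysis; raises CalledProcessError if the container exits with an error."""
    subprocess.run(nicad_command(workdir, image), check=True)


def move_output(workdir: Path, data_dir: Path) -> None:
    """Step 4: move every file from output/ into data_dir, keeping relative paths.

    Files that already exist at the same relative path are replaced.
    """
    output = workdir / "output"
    for src in sorted(p for p in output.rglob("*") if p.is_file()):
        dst = data_dir / src.relative_to(output)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            dst.unlink()
        shutil.move(str(src), str(dst))


if __name__ == "__main__":
    # Only print the command for the current folder; call run_nicad() to execute it.
    print(" ".join(nicad_command(Path("."))))
```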

Should you prefer to build the image from scratch, please refer to the repository of the [NiCad extension](https://github.com/eff-kay/nicad6) developed for this study.

### Further experimentation with the tool

To experiment with the tool interactively, issue `docker run --platform linux/x86_64 -v "$(pwd)/output:/nicad6/01_data" -v "$(pwd)/systems:/nicad6/systems" -it faizank/nicad6:TSE bash`.