Commit 7531696

Update README.md
1 parent d1296be commit 7531696

1 file changed: +29 -15 lines changed

README.md

Lines changed: 29 additions & 15 deletions
@@ -16,14 +16,25 @@ This paper is an extended replication of the paper [_Code cloning in smart contr
1616
- `/duplicates` – Cleaned data.
1717
- `openzeppelin.zip` – OpenZeppelin data. Requires unzipping into folder `openzeppelin`.
1818
- `/metadata` – Metadata about the authors, creation date and transactions of the contracts in the corpus.
19-
- `/prepared` - Prepared data for analysis. Contains potentially long-running scripts.
20-
- `/02_analysis` - Analysis scripts.
21-
- `/03_results` - Results.
19+
- `/prepared` - Prepared pickle files for data analysis.
20+
- `/02_prepare` - Scripts for preparing the data in `/01_data/prepared`. Contains potentially long-running scripts. In such cases, the approximate execution times are reported in the source files.
21+
- `/03_analysis` - Analysis scripts for the automated analysis of data.
22+
- `/04_results` - Results of the analyses, including charts and numeric results. Some of these results are discussed in the paper in great detail. Every analysis result corresponds to a particular observation in the paper, clearly identified in the name of the generated observation file.
2223

2324
## Reproduction
2425

26+
The following describes four reproduction scenarios. Any of the scenarios can be executed independently from the others.
27+
* [Reproduction of the analyses](#reproduction-of-the-analyses): reproduces the analysis results in `/04_results`, including charts and numeric results. The scripts use the prepared data contained in the `/01_data/prepared` folder.
28+
* [Reproduction of the prepared data](#reproduction-of-the-prepared-data-01_dataprepared): reproduces the prepared data in `/01_data/prepared` by (i) merging author, transaction, and file length metadata into the clone data; and (ii) pre-processing data for analysis and persisting the pre-processed data into pickle files. Some of the pre-processing steps are potentially time-consuming. In such cases, the approximate execution times are reported in the source file.
29+
* [Reproduction of the cleaned data](#reproduction-of-the-cleaned-data-01_dataclonedataduplicates): reproduces the cleaned data in `/01_data/clonedata/duplicates` from the raw data in `/01_data/clonedata/raw` by bringing the contents of the `.xml` files into a consolidated form.
30+
* [Reproduction of the raw data](#reproducing-the-clone-analysis-01_dataclonedataraw): reproduces the raw data `/01_data/clonedata/raw` by running the [NiCad extension](https://github.com/eff-kay/nicad6) developed for this study.
31+
32+
**NOTE:** The following steps have been tested with `python>=3.7 && python<3.10`.
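
The tested interpreter range in the note above can be checked before running any script. The following is a minimal sketch, not part of the replication package; the helper name `is_supported` is hypothetical.

```python
import sys

# Tested range taken from the note above: python>=3.7 and python<3.10.
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 10)


def is_supported(version=None):
    """Return True if the (major, minor) version lies in the tested range.

    `version` is any sequence starting with (major, minor); it defaults to
    the running interpreter's version.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    if not is_supported():
        current = "{}.{}".format(*sys.version_info[:2])
        print("Warning: Python " + current + " is outside the tested range.")
```

Versions outside the range may still work; the check only reports that they have not been tested.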
33+
2534
### Reproduction of the analyses
2635

36+
Follow the steps below to reproduce the analysis results in `/04_results`, including charts and numeric results. The scripts use the prepared data contained in the `/01_data/prepared` folder.
37+
2738
1. Clone this repository.
2839
2. Install dependencies by running `pip install -r requirements.txt` in the root folder.
2940
3. Extract `/01_data/clonedata/openzeppelin.zip` into folder `/01_data/clonedata/openzeppelin`, or run `python 01_unzip.py` in the `02_prepare` folder.
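
The manual extraction in step 3 can be sketched with the standard library as follows. This is an illustration only and is not the content of `01_unzip.py`; the helper name `extract_archive` is hypothetical, and the sketch assumes the archive does not already contain a top-level `openzeppelin` folder.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract every member of a ZIP archive into target_dir.

    Returns the list of member names so the caller can inspect what was
    extracted.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(Path(archive_path)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


if __name__ == "__main__":
    # Paths from step 3, relative to the repository root.
    data_dir = Path("01_data") / "clonedata"
    archive = data_dir / "openzeppelin.zip"
    if archive.exists():
        extract_archive(archive, data_dir / "openzeppelin")
```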
@@ -33,7 +44,7 @@ This paper is an extended replication of the paper [_Code cloning in smart contr
3344

3445
### Reproduction of the prepared data (`/01_data/prepared`)
3546

36-
The prepared data is used in the analyses. The prepared data is included in this replication package in folder `/01_data/prepared`, but it can be reproduced from the cleaned data by following the steps below.
47+
Follow the steps below to reproduce the prepared data in `/01_data/prepared` by (i) merging author, transaction, and file length metadata into the clone data; and (ii) pre-processing data for analysis and persisting the pre-processed data into pickle files. Some of the pre-processing steps are potentially time-consuming. In such cases, the approximate execution times are reported in the source file.
3748

3849
1. Run `python 03_mergeMetadata.py` in the `/02_prepare` folder.
3950
2. Run `python 04_prepareAnalysisData.py` in the `/02_prepare` folder.
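
For readers unfamiliar with pickle files, persisting pre-processed data and reading it back can be sketched as below. The helper names and the example path are hypothetical; the actual file names and data structures are defined by the scripts in `/02_prepare`.

```python
import pickle
from pathlib import Path

# Protocol 4 is readable by every interpreter in the tested range
# (Python 3.7 to 3.9); protocol 5 would require Python 3.8 or newer.
PICKLE_PROTOCOL = 4


def save_prepared(data, path):
    """Persist a pre-processed Python object to a pickle file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(data, handle, protocol=PICKLE_PROTOCOL)


def load_prepared(path):
    """Load a previously persisted object; only use on trusted files."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

Unpickling executes code embedded in the file, so only pickle files from a trusted source, such as this replication package, should be loaded.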
@@ -43,28 +54,31 @@ Some preparation steps can take up to hours to complete. Please find the benchma
4354

4455
### Reproduction of the cleaned data (`/01_data/clonedata/duplicates`)
4556

57+
Follow the steps below to reproduce the cleaned data in `/01_data/clonedata/duplicates` from the raw data in `/01_data/clonedata/raw` by bringing the contents of the `.xml` files into a consolidated form.
58+
4659
The cleaned data is used in the data preparation scripts. The cleaned data is included in this replication package in folder `/01_data/clonedata/duplicates`, but it can be reproduced from the raw data by following the steps below.
4760

4861
1. Run `python 02_cleanup.py` in the `/02_prepare` folder.
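
To give an idea of what consolidating several `.xml` reports can look like, here is a minimal sketch. It is not the logic of `02_cleanup.py`: the element and attribute names (`clone`, `source`, `similarity`, `nlines`, `file`) are assumptions about the report layout, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_clone_classes(xml_text):
    """Turn one XML report into a list of plain dictionaries.

    Assumes each <clone> element carries `similarity` and `nlines`
    attributes and contains <source file="..."> children.
    """
    root = ET.fromstring(xml_text)
    records = []
    for clone in root.iter("clone"):
        records.append({
            "similarity": clone.get("similarity"),
            "nlines": clone.get("nlines"),
            "files": [source.get("file") for source in clone.iter("source")],
        })
    return records


def consolidate(xml_texts):
    """Merge the records of several XML reports into one flat list."""
    consolidated = []
    for xml_text in xml_texts:
        consolidated.extend(parse_clone_classes(xml_text))
    return consolidated
```

Reports that embed raw source code may not be well-formed XML, in which case a strict parser such as this one would reject them.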
4962

5063
### Reproducing the clone analysis (`/01_data/clonedata/raw`)
5164

52-
To obtain the corpus of 33,034 smart contracts, please, contact the authors of the original study.
65+
Follow the steps below to reproduce the raw clone data in `/01_data/clonedata/raw` by running the [NiCad extension](https://github.com/eff-kay/nicad6) developed for this study.
66+
67+
To obtain the corpus of 33,034 smart contracts, please contact the authors of the [original study](https://github.com/SAILResearch/suppmaterial-18-masanari-smart_contract_cloning).
5368

54-
To run the clone analysis, please, refer to the repository of the NiCad extension developed for this study.
55-
This replication package contains a Docker image with the installed tool. The image can be found in the `/docker` folder.
69+
A Docker image is maintained on [Docker Hub](https://hub.docker.com/repository/docker/faizank/nicad6) and can be obtained by running: `docker pull faizank/nicad6:TSE`.
5670

57-
The image is maintained at https://hub.docker.com/repository/docker/faizank/nicad6. Pull it by running `docker pull faizank/nicad6:TSE`.
58-
The repository of the tool is available at https://github.com/eff-kay/nicad6.
71+
The following process assumes that [docker](https://docs.docker.com/get-started/) is installed and working correctly, and that the image has been pulled. You can verify this by running `docker images` in a terminal and checking that an image named `faizank/nicad6` appears in the list.
5972

60-
The following process assumes [docker](https://docs.docker.com/get-started/) is installed and working correctly.
73+
**NOTE:** The following steps have been tested with `docker_engine==20.10.17(build==100c701)`.
6174

62-
1. Create a new folder `systems/source-code` and move the corpus to this folder.
63-
2. Create a new folder `output` to store the result of clone analysis.
64-
3. Execute the analysis by issuing the following command: `docker run --platform linux/x86_64 -v $(pwd)/output:/nicad6/01_data -v $(pwd)/systems:/nicad6/systems faizank/nicad6`. This will generate the output artefacts inside the `output` folder.
65-
4. Move the contents of the `output` folder to `01_data` and use the python scripts discussed above for the rest of the replication.
75+
1. Create a new folder `/systems/source-code` and move the corpus to this folder.
76+
2. Create a new folder `/output` to store the result of clone analysis.
77+
3. Execute the analysis by issuing the following command: `docker run --platform linux/x86_64 -v output:/nicad6/01_data -v systems:/nicad6/systems faizank/nicad6`. This will generate the output artefacts inside the `output` folder.
78+
4. Move the contents of the `/output` folder to `/01_data` and use the python scripts discussed above for the rest of the replication.
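
Docker treats a `-v` source that is a bare name, such as `output`, as a named volume rather than a host directory; mounting a host folder requires an absolute path. An image reference without a tag also resolves to `latest`, so an image pulled as `faizank/nicad6:TSE` has to be referenced with that tag to be reused. The sketch below assembles the command with absolute host paths. It is an illustration under those assumptions, and the helper name `build_docker_command` is hypothetical.

```python
import shlex
from pathlib import Path

# Image name as written in the command above; pass "faizank/nicad6:TSE"
# instead to reuse the tagged image obtained with `docker pull`.
DEFAULT_IMAGE = "faizank/nicad6"


def build_docker_command(workdir, image=DEFAULT_IMAGE, interactive=False):
    """Return the `docker run` argument list with absolute bind mounts.

    `workdir` is the folder that contains the `output` and `systems`
    folders created in steps 1 and 2.
    """
    workdir = Path(workdir).resolve()
    command = [
        "docker", "run",
        "--platform", "linux/x86_64",
        "-v", f"{workdir / 'output'}:/nicad6/01_data",
        "-v", f"{workdir / 'systems'}:/nicad6/systems",
    ]
    if interactive:
        # Open a shell in the container instead of running the analysis.
        command += ["-it", image, "bash"]
    else:
        command.append(image)
    return command


if __name__ == "__main__":
    # Print the command so it can be reviewed before it is executed.
    print(" ".join(shlex.quote(part) for part in build_docker_command(Path.cwd())))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the printed command has been reviewed.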
6679

80+
Should you prefer to build the image from scratch, please refer to the repository of the [NiCad extension](https://github.com/eff-kay/nicad6) developed for this study.
6781

6882
### Further experimentation with the tool
6983

70-
To experiment with the tool, issue `docker run --platform linux/x86_64 -v $(pwd)/output:/nicad6/01_data -v $(pwd)/systems:/nicad6/systems -it faizank/nicad6 bash`.
84+
To experiment with the tool, issue `docker run --platform linux/x86_64 -v output:/nicad6/01_data -v systems:/nicad6/systems -it faizank/nicad6 bash`.
