assignment4.Rmd
10 additions & 6 deletions
@@ -99,6 +99,7 @@ $$
99
99
$$
100
100
101
101
Where:
102
+
102
103
- \( D_i \) is the distance between feature *i* and its nearest neighbor
103
104
- *n* is the total number of features (points)
104
105
@@ -109,6 +110,7 @@ $$
109
110
$$
110
111
111
112
Where:
113
+
112
114
- *n* is the total number of features (points)
113
115
- *A* is the area of the study region
114
116
@@ -119,6 +121,7 @@ NNI = \frac{\text{Observed Average Distance}}{\text{Expected Average Distance (w
119
121
$$
120
122
121
123
Where:
124
+
122
125
- \( \bar{D}_O \) is the observed average distance between each point and its nearest neighbor
123
126
- \( \bar{D}_E \) is the expected average distance under complete spatial randomness (CSR)
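
The definitions above can be sketched in a few lines of base R. This is a minimal illustration, not the code used in the assignment: the function name `nni` and the toy coordinates are hypothetical, and the constants 0.5 and 0.26136 are the standard Clark and Evans values, assumed here to match the formulas in this section. No edge correction is applied.

```r
# Hypothetical helper: Nearest Neighbor Index and z-score under CSR
nni <- function(x, y, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))    # pairwise Euclidean distances
  diag(d) <- Inf                       # a point is not its own neighbor
  d_obs <- mean(apply(d, 1, min))      # observed mean nearest-neighbor distance
  d_exp <- 0.5 / sqrt(n / area)        # expected mean distance under CSR
  se <- 0.26136 / sqrt(n^2 / area)     # standard error of d_obs under CSR
  list(nni = d_obs / d_exp, z = (d_obs - d_exp) / se)
}

# Toy data: four points on the corners of a unit square in a 2 x 2 study area
res <- nni(c(0, 1, 0, 1), c(0, 0, 1, 1), area = 4)
res$nni  # 2, i.e. more dispersed than CSR
```

An index above 1 with a positive z-score points toward dispersion, while an index below 1 with a negative z-score points toward clustering.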
124
127
@@ -156,11 +159,11 @@ While Nearest Neighbor Analysis (NNA) is a useful method for detecting point pat
156
159
157
160
### Assumption of Regular Boundaries
158
161
159
-
A major limitation of NNA is that it typically assumes a **rectangular study area**, regardless of the actual shape of the region. For example, in the case of hospital locations in Philadelphia, the hospitals are **clustered in Center City**. However, because the tool uses a smaller rectangular bounding box rather than the actual city outline, the calculated area was smaller than the actual distribution area, which is primarily concentrated in Center City. This underestimation of the study area **decreased the expected average distance**\( \bar{d}_e \), leading to a false conclusion of randomness, even though the clustering in the city center was visually evident. This example highlights how misrepresenting the true shape of the study area can result in inaccurate or misleading conclusions.
162
+
NNA assumes a **rectangular study area**, regardless of the actual shape of the region. For example, hospitals in Philadelphia are **clustered in Center City**. Because the tool uses a rectangular bounding box that is smaller than the actual city outline, and the points are concentrated in Center City, the calculated area was smaller than the true study area. This underestimation **decreased the expected average distance** \( \bar{D}_E \), which pushed the NNI toward 1 and led to a false conclusion of randomness, even though the clustering in the city center was visually evident. This example highlights how misrepresenting the shape of the study area can produce inaccurate or misleading conclusions.
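
A short worked derivation may help show why the area matters. It assumes the standard CSR expression for the expected distance, \( \bar{D}_E = 0.5 / \sqrt{n / A} \). The shrink factor *k* and the NNI value of 0.7 used below are hypothetical numbers chosen for illustration, not results from the hospital data.

$$
\bar{D}_E = \frac{0.5}{\sqrt{n / A}} = 0.5 \sqrt{\frac{A}{n}}
\quad\Rightarrow\quad
\bar{D}_E' = 0.5 \sqrt{\frac{A / k}{n}} = \frac{\bar{D}_E}{\sqrt{k}},
\qquad
NNI' = \frac{\bar{D}_O}{\bar{D}_E'} = \sqrt{k} \, NNI
$$

If the rectangle covers only half of the true area (*k* = 2), a clustered pattern with a true NNI of 0.7 would be reported as roughly 0.7 × 1.414 ≈ 0.99, which is nearly indistinguishable from randomness.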
160
163
161
164
**Edge Effects**
162
165
163
-
Another common issue is**edge effects**. Points located near the boundaries may have their nearest neighbors just **outside** the study area, but these are **not considered** in the analysis. This omission can result in **overestimated nearest neighbor distances**, which in turn distorts the z-score and test conclusions, particularly in dense urban areas.
166
+
NNA is also subject to **edge effects**. Points located near the boundaries may have their nearest neighbors just **outside** the study area, but those neighbors are **not considered** in the analysis. This omission can result in **overestimated nearest neighbor distances**, which in turn distort the z-score and test conclusions, particularly in dense urban areas.
164
167
165
168
**Assumption of Homogeneity**
166
169
@@ -180,7 +183,8 @@ $$
180
183
K(d) = \frac{\frac{1}{n} \sum_{i=1}^{n} \#\left[S \in \text{Circle}(s_i, d)\right]}{\frac{n}{a}} = \frac{\text{Mean number of points in all circles of radius } d}{\text{Mean point density in entire study region } a}
181
184
$$
182
185
183
-
Where
186
+
Where:
187
+
184
188
- \( n \) is the total number of points in the dataset
185
189
- \( a \) is the area of the study region
186
190
- \( d \) is the search radius (distance threshold)
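
As a rough illustration of the formula above, the following base R sketch counts the other points within radius `d` of each point and divides the mean count by the overall density. The function name `k_naive` and the toy coordinates are hypothetical. The sketch assumes that a point does not count itself and applies no edge correction, so it should not be read as the estimator used in the analysis.

```r
# Hypothetical helper: naive K(d) without edge correction
k_naive <- function(x, y, d, area) {
  n <- length(x)
  dm <- as.matrix(dist(cbind(x, y)))    # pairwise Euclidean distances
  diag(dm) <- Inf                       # exclude each point from its own circle
  mean_count <- mean(rowSums(dm <= d))  # mean number of other points within d
  mean_count / (n / area)               # divide by mean point density n / a
}

# Toy data: four points on the corners of a unit square in a 2 x 2 study area
x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)
k_obs <- k_naive(x, y, d = 1.2, area = 4)  # 2
k_csr <- pi * 1.2^2                        # theoretical K(d) under CSR
```

An observed value below \( \pi d^2 \), as in this toy case, suggests dispersion at that distance, while a value above it suggests clustering.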
@@ -502,13 +506,13 @@ ggplot() +
502
506
labs(title = "Philadelphia Population by Zip Code")
503
507
```
504
508
505
-
Without conducting the analyses, we suspect that the absence of farmers markets in Northeastern Phildelphia nad South Philadelphia could be due to low population density in those census tract. If the population sparse, there may be less demand or fewer opportunties for farmers' markets to attract enough customer. In this came, inhomogeneous K-function analysis would be a more valuable tool. Unlike the homogeneous K-function, which assumes a uniform distribution of points across the study area, the inhomogeneous K-function accounts for variations in point density. This allows us to assess clustering or dispersion while considering the underlying population distribution. By incorporating population density as a reference measure, we can better understand how farmers' markets are distributed relative to the population and identify areas where they may be lacking.
509
+
Without conducting the analyses, we suspect that the absence of farmers' markets in Northeastern Philadelphia and South Philadelphia could be due to low population density in those census tracts. If the population is sparse, there may be less demand and fewer opportunities for farmers' markets to attract enough customers. In this case, nonhomogeneous K-function analysis would be a more valuable tool. Unlike the homogeneous K-function, which assumes a uniform distribution of points across the study area, the nonhomogeneous K-function accounts for variations in point density. This allows us to assess clustering or dispersion while considering the underlying population distribution. By incorporating population density as a reference measure, we can better understand how farmers' markets are distributed relative to the population and identify areas where they may be lacking.
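
One way to picture the nonhomogeneous (often called inhomogeneous) K-function is as the same pair count, with each pair of points down-weighted by the expected intensity at both locations. The base R sketch below is an assumption-laden illustration, not the method used in this assignment: the function name `k_inhom_naive` is hypothetical, `lambda` stands for an intensity value at each market that would have to be derived from population density, and no edge correction is applied.

```r
# Hypothetical helper: naive inhomogeneous K(d) without edge correction
k_inhom_naive <- function(x, y, d, area, lambda) {
  dm <- as.matrix(dist(cbind(x, y)))  # pairwise Euclidean distances
  diag(dm) <- Inf                     # exclude self-pairs
  w <- 1 / outer(lambda, lambda)      # weight each pair by 1 / (lambda_i * lambda_j)
  sum(w[dm <= d]) / area              # weighted pair count per unit area
}

# Toy data: four points on the corners of a unit square in a 2 x 2 study area
x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)

# With a constant intensity n / area = 1, this reduces to the homogeneous estimate
k_flat <- k_inhom_naive(x, y, d = 1.2, area = 4, lambda = rep(1, 4))  # 2

# Doubling the assumed intensity everywhere divides the estimate by four
k_dense <- k_inhom_naive(x, y, d = 1.2, area = 4, lambda = rep(2, 4))  # 0.5
```

Under this weighting, the same spacing between markets counts as less clustered where the assumed intensity is high, because more nearby markets would be expected there.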
506
510
507
511
# Discussion
508
512
509
-
The results from both the Nearest Neighbor Analysis and K-function analysis consistently indicate that the spatial distribution of farmers markets in Philadelphia is significantly clustered. The Nearest Neighbor Index (NNI) is 0.778, with a z-score of -3.345 and a p-value of 0.0000002. These values provide strong statistical evidence to reject the null hypothesis of complete spatial randomness. The K-function analysis supports this conclusion by showing that the observed K(d) begins to exceed the theoretical K(d) at a distance of 54 feet. This divergence continues to increase with distance, indicating significant clustering across multiple spatial scales.
513
+
The results from both the Nearest Neighbor Analysis and K-function analysis consistently indicate that the spatial distribution of farmers markets in Philadelphia is significantly clustered. The Nearest Neighbor Index (NNI) is 0.778, with a z-score of -3.345 and a p-value of 0.0000002. **These values provide strong statistical evidence to reject the null hypothesis of complete spatial randomness.** The K-function analysis supports this conclusion by showing that the observed K(d) begins to exceed the theoretical K(d) at a distance of 54 feet. This divergence continues to increase with distance, indicating significant clustering across multiple spatial scales.
510
514
511
-
These findings align with initial expectations based on the visual distribution of farmers markets. The point data showed that markets were concentrated in Center City and parts of West Philadelphia, while large areas such as the Northeast and South appeared underserved. Both methods confirmed these visual observations through statistically significant results. This consistency strengthens the reliability of the findings. At the same time, it is necessary to acknowledge the limitations of the methods used. Nearest Neighbor Analysis evaluates only the distance to the closest point and is highly sensitive to the shape of the study area. In a city with irregular boundaries such as Philadelphia, this can result in inaccurate estimates of expected spacing. K-function and L-function analyses offer a more detailed view by examining clustering across different distances. However, they rely on the assumption that points have an equal probability of occurring anywhere within the study area. This assumption is difficult to justify in a city where population density and land use vary significantly. Despite these limitations, the convergence of results across different methods provides strong evidence that the observed pattern is not random.
515
+
**These findings align with initial expectations based on the visual distribution of farmers markets. The point data showed that markets were concentrated in Center City and parts of West Philadelphia, while large areas such as the Northeast and South appeared underserved.** Both methods confirmed these visual observations through statistically significant results. This consistency strengthens the reliability of the findings. At the same time, it is necessary to acknowledge the limitations of the methods used. Nearest Neighbor Analysis evaluates only the distance to the closest point and is highly sensitive to the shape of the study area. In a city with irregular boundaries such as Philadelphia, this can result in inaccurate estimates of expected spacing. K-function and L-function analyses offer a more detailed view by examining clustering across different distances. However, they rely on the assumption that points have an equal probability of occurring anywhere within the study area. This assumption is difficult to justify in a city where population density and land use vary significantly. Despite these limitations, the convergence of results across different methods provides strong evidence that the observed pattern is not random.