Commit 857604a

update
1 parent 15b4b8e commit 857604a

2 files changed: +120 -100 lines changed

assignment4.Rmd

Lines changed: 10 additions & 6 deletions
@@ -99,6 +99,7 @@ $$
 $$

 Where:
+
 - \( D_i \) is the distance between feature *i* and its nearest neighbor
 - *n* is the total number of features (points)

@@ -109,6 +110,7 @@ $$
 $$

 Where:
+
 - *n* is the total number of features (points)
 - *A* is the area of the study region

@@ -119,6 +121,7 @@ NNI = \frac{\text{Observed Average Distance}}{\text{Expected Average Distance (w
 $$

 Where:
+
 - \( \bar{D}_O \) is the observed average distance between each point and its nearest neighbor
 - \( \bar{D}_E \) is the expected average distance under complete spatial randomness (CSR)
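The ratio above can be sketched in base R. This is a minimal illustration, not the assignment's code: the helper name `nni_sketch`, the toy grid, and both area values are hypothetical, and it assumes the expected distance and its standard error take their standard CSR forms, \( \bar{D}_E = 0.5 / \sqrt{n/A} \) and \( SE = 0.26136 / \sqrt{n^2/A} \).

```r
# Hypothetical helper (not from the assignment): nearest neighbor index
# for points (x, y) in a study region of the given area, no edge correction.
nni_sketch <- function(x, y, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))      # pairwise Euclidean distances
  diag(d) <- Inf                         # a point is not its own neighbor
  d_obs <- mean(apply(d, 1, min))        # observed mean nearest neighbor distance
  d_exp <- 0.5 / sqrt(n / area)          # expected mean distance under CSR (assumed form)
  se    <- 0.26136 / sqrt(n^2 / area)    # standard error under CSR (assumed form)
  list(observed = d_obs, expected = d_exp,
       nni = d_obs / d_exp, z = (d_obs - d_exp) / se)
}

# Toy data: a regular 3 x 3 grid with unit spacing (a dispersed pattern).
grid <- expand.grid(x = 0:2, y = 0:2)

# Same points, two assumed study areas: a larger area raises the expected
# distance and therefore lowers the index.
small_area <- nni_sketch(grid$x, grid$y, area = 9)
large_area <- nni_sketch(grid$x, grid$y, area = 36)
```

With these toy values the index is 2 for the smaller area and 1 for the larger one, so the area supplied to the formula can change the conclusion even though the points have not moved.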

@@ -156,11 +159,11 @@ While Nearest Neighbor Analysis (NNA) is a useful method for detecting point pat

 ### Assumption of Regular Boundaries

-A major limitation of NNA is that it typically assumes a **rectangular study area**, regardless of the actual shape of the region. For example, in the case of hospital locations in Philadelphia, the hospitals are **clustered in Center City**. However, because the tool uses a smaller rectangular bounding box rather than the actual city outline, the calculated area was smaller than the actual distribution area, which is primarily concentrated in Center City. This underestimation of the study area **decreased the expected average distance** \( \bar{d}_e \), leading to a false conclusion of randomness, even though the clustering in the city center was visually evident. This example highlights how misrepresenting the true shape of the study area can result in inaccurate or misleading conclusions.
+NNA assumes a **rectangular study area**, regardless of the actual shape of the region. For example, in the case of hospital locations in Philadelphia, the hospitals are **clustered in Center City**. However, because the tool uses a smaller rectangular bounding box rather than the actual city outline, the calculated area was smaller than the actual distribution area, which is primarily concentrated in Center City. This underestimation of the study area **decreased the expected average distance** \( \bar{D}_E \), leading to a false conclusion of randomness, even though the clustering in the city center was visually evident. This example highlights how misrepresenting the true shape of the study area can result in inaccurate or misleading conclusions.

 **Edge Effects**

-Another common issue is **edge effects**. Points located near the boundaries may have their nearest neighbors just **outside** the study area, but these are **not considered** in the analysis. This omission can result in **overestimated nearest neighbor distances**, which in turn distorts the z-score and test conclusions, particularly in dense urban areas.
+NNA is also subject to **edge effects**. Points located near the boundaries may have their nearest neighbors just **outside** the study area, but these are **not considered** in the analysis. This omission can result in **overestimated nearest neighbor distances**, which in turn distorts the z-score and test conclusions, particularly in dense urban areas.

 **Assumption of Homogeneity**

@@ -180,7 +183,8 @@ $$
 K(d) = \frac{\frac{1}{n} \sum_{i=1}^{n} \#\left[S \in \text{Circle}(s_i, d)\right]}{\frac{n}{a}} = \frac{\text{Mean number of points in all circles of radius } d}{\text{Mean point density in entire study region } a}
 $$

-Where
+Where:
+
 - \( n \) is the total number of points in the dataset
 - \( a \) is the area of the study region
 - \( d \) is the search radius (distance threshold)
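
The definition above translates almost directly into code. The following is a naive sketch in base R, assuming no edge correction and that a point is not counted inside its own circle. The name `k_naive` and the toy grid are hypothetical, and packaged estimators apply corrections that this version omits.

```r
# Hypothetical helper (not from the assignment): naive K(d) without edge correction.
k_naive <- function(x, y, area, d) {
  n <- length(x)
  dm <- as.matrix(dist(cbind(x, y)))   # pairwise Euclidean distances
  diag(dm) <- Inf                      # exclude each point from its own circle
  density <- n / area                  # mean point density, n / a
  # For each radius: mean count of other points within that radius, over density
  sapply(d, function(r) mean(rowSums(dm <= r)) / density)
}

grid <- expand.grid(x = 0:2, y = 0:2)  # toy 3 x 3 grid, unit spacing
radii <- c(0.5, 1, 1.5)
k_obs <- k_naive(grid$x, grid$y, area = 9, d = radii)
k_csr <- pi * radii^2                  # theoretical K(d) under CSR
```

Comparing `k_obs` with `k_csr` at each radius mirrors the comparison of observed and theoretical K(d): observed values above the theoretical curve suggest clustering at that scale, and values below it suggest dispersion.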
@@ -502,13 +506,13 @@ ggplot() +
 labs(title = "Philadelphia Population by Zip Code")
 ```

-Without conducting the analyses, we suspect that the absence of farmers markets in Northeastern Phildelphia nad South Philadelphia could be due to low population density in those census tract. If the population sparse, there may be less demand or fewer opportunties for farmers' markets to attract enough customer. In this came, inhomogeneous K-function analysis would be a more valuable tool. Unlike the homogeneous K-function, which assumes a uniform distribution of points across the study area, the inhomogeneous K-function accounts for variations in point density. This allows us to assess clustering or dispersion while considering the underlying population distribution. By incorporating population density as a reference measure, we can better understand how farmers' markets are distributed relative to the population and identify areas where they may be lacking.
+Without conducting the analyses, we suspect that the absence of farmers markets in Northeastern Philadelphia and South Philadelphia could be due to low population density in those census tracts. If the population is sparse, there may be less demand or fewer opportunities for farmers' markets to attract enough customers. In this case, nonhomogeneous K-function analysis would be a more valuable tool. Unlike the homogeneous K-function, which assumes a uniform distribution of points across the study area, the nonhomogeneous K-function accounts for variations in point density. This allows us to assess clustering or dispersion while considering the underlying population distribution. By incorporating population density as a reference measure, we can better understand how farmers' markets are distributed relative to the population and identify areas where they may be lacking.

 # Discussion

-The results from both the Nearest Neighbor Analysis and K-function analysis consistently indicate that the spatial distribution of farmers markets in Philadelphia is significantly clustered. The Nearest Neighbor Index (NNI) is 0.778, with a z-score of -3.345 and a p-value of 0.0000002. These values provide strong statistical evidence to reject the null hypothesis of complete spatial randomness. The K-function analysis supports this conclusion by showing that the observed K(d) begins to exceed the theoretical K(d) at a distance of 54 feet. This divergence continues to increase with distance, indicating significant clustering across multiple spatial scales.
+The results from both the Nearest Neighbor Analysis and K-function analysis consistently indicate that the spatial distribution of farmers markets in Philadelphia is significantly clustered. The Nearest Neighbor Index (NNI) is 0.778, with a z-score of -3.345 and a p-value of 0.0000002. **These values provide strong statistical evidence to reject the null hypothesis of complete spatial randomness.** The K-function analysis supports this conclusion by showing that the observed K(d) begins to exceed the theoretical K(d) at a distance of 54 feet. This divergence continues to increase with distance, indicating significant clustering across multiple spatial scales.

-These findings align with initial expectations based on the visual distribution of farmers markets. The point data showed that markets were concentrated in Center City and parts of West Philadelphia, while large areas such as the Northeast and South appeared underserved. Both methods confirmed these visual observations through statistically significant results. This consistency strengthens the reliability of the findings. At the same time, it is necessary to acknowledge the limitations of the methods used. Nearest Neighbor Analysis evaluates only the distance to the closest point and is highly sensitive to the shape of the study area. In a city with irregular boundaries such as Philadelphia, this can result in inaccurate estimates of expected spacing. K-function and L-function analyses offer a more detailed view by examining clustering across different distances. However, they rely on the assumption that points have an equal probability of occurring anywhere within the study area. This assumption is difficult to justify in a city where population density and land use vary significantly. Despite these limitations, the convergence of results across different methods provides strong evidence that the observed pattern is not random.
+**These findings align with initial expectations based on the visual distribution of farmers markets. The point data showed that markets were concentrated in Center City and parts of West Philadelphia, while large areas such as the Northeast and South appeared underserved.** Both methods confirmed these visual observations through statistically significant results. This consistency strengthens the reliability of the findings. At the same time, it is necessary to acknowledge the limitations of the methods used. Nearest Neighbor Analysis evaluates only the distance to the closest point and is highly sensitive to the shape of the study area. In a city with irregular boundaries such as Philadelphia, this can result in inaccurate estimates of expected spacing. K-function and L-function analyses offer a more detailed view by examining clustering across different distances. However, they rely on the assumption that points have an equal probability of occurring anywhere within the study area. This assumption is difficult to justify in a city where population density and land use vary significantly. Despite these limitations, the convergence of results across different methods provides strong evidence that the observed pattern is not random.

 ```{r}
 ggplot() +

assignment4.html

Lines changed: 110 additions & 94 deletions
Large diffs are not rendered by default.
