MUSA-Zhanchao
diff --git a/assignment4.Rmd b/assignment4.Rmd
Lines changed: 10 additions & 6 deletions
diff --git a/assignment4.html b/assignment4.html
Lines changed: 110 additions & 94 deletions
@@ -99,6 +99,7 @@ $$
 $$
 
 Where:
+
 - \( D_i \) is the distance between feature *i* and its nearest neighbor
 - *n* is the total number of features (points)
 
@@ -109,6 +110,7 @@ $$
 $$
 
 Where:
+
 - *n* is the total number of features (points)
 - *A* is the area of the study region
 
@@ -119,6 +121,7 @@ NNI = \frac{\text{Observed Average Distance}}{\text{Expected Average Distance (w
 $$
 
 Where:
+
 - \( \bar{D}_O \) is the observed average distance between each point and its nearest neighbor
 - \( \bar{D}_E \) is the expected average distance under complete spatial randomness (CSR)
 
@@ -156,11 +159,11 @@ While Nearest Neighbor Analysis (NNA) is a useful method for detecting point pat
 
 ### Assumption of Regular Boundaries
 
-A major limitation of NNA is that it typically assumes a **rectangular study area**, regardless of the actual shape of the region. For example, in the case of hospital locations in Philadelphia, the hospitals are **clustered in Center City**. However, because the tool uses a smaller rectangular bounding box rather than the actual city outline, the calculated area was smaller than the actual distribution area, which is primarily concentrated in Center City. This underestimation of the study area **decreased the expected average distance** \( \bar{d}_e \), leading to a false conclusion of randomness, even though the clustering in the city center was visually evident. This example highlights how misrepresenting the true shape of the study area can result in inaccurate or misleading conclusions.
+NNA assumes a **rectangular study area**, regardless of the actual shape of the region. For example, hospital locations in Philadelphia are **clustered in Center City**. However, because the tool used a smaller rectangular bounding box rather than the actual city outline, the calculated area was smaller than the actual study area. This underestimation of the study area **decreased the expected average distance** \( \bar{D}_E \), which inflated the NNI and led to a false conclusion of randomness, even though the clustering in the city center was visually evident. This example highlights how misrepresenting the true shape of the study area can result in inaccurate or misleading conclusions.
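+
+To see why an underestimated area biases the result, consider a short worked sketch. It assumes the standard expected-distance formula under CSR, \( \bar{D}_E = 0.5 / \sqrt{n/A} \); the shrink factor \( k \) and the numbers below are hypothetical and are not taken from the hospital data.
+
+$$
+A' = kA \;\; (0 < k < 1) \;\Rightarrow\; \bar{D}_E' = \frac{0.5}{\sqrt{n/(kA)}} = \sqrt{k}\,\bar{D}_E \;\Rightarrow\; NNI' = \frac{\bar{D}_O}{\bar{D}_E'} = \frac{NNI}{\sqrt{k}}
+$$
+
+For instance, with \( k = 0.64 \), a clustered pattern with a true \( NNI = 0.8 \) would be reported as \( NNI' = 0.8 / 0.8 = 1.0 \), which is the value expected under randomness.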
 
 **Edge Effects**
 
-Another common issue is **edge effects**. Points located near the boundaries may have their nearest neighbors just **outside** the study area, but these are **not considered** in the analysis. This omission can result in **overestimated nearest neighbor distances**, which in turn distorts the z-score and test conclusions, particularly in dense urban areas.
+NNA is also subject to **edge effects**. Points located near the boundaries may have their nearest neighbors just **outside** the study area, but these are **not considered** in the analysis. This omission can result in **overestimated nearest neighbor distances**, which in turn distorts the z-score and test conclusions, particularly in dense urban areas.
 
 **Assumption of Homogeneity**
 
@@ -180,7 +183,8 @@ $$
 K(d) = \frac{\frac{1}{n} \sum_{i=1}^{n} \#\left[S \in \text{Circle}(s_i, d)\right]}{\frac{n}{a}} = \frac{\text{Mean number of points in all circles of radius } d}{\text{Mean point density in entire study region } a}
 $$
 
-Where
+Where:
+
 - \( n \) is the total number of points in the dataset
 - \( a \) is the area of the study region
 - \( d \) is the search radius (distance threshold)
@@ -502,13 +506,13 @@ ggplot() +
   labs(title = "Philadelphia Population by Zip Code")
 ```
 
-Without conducting the analyses, we suspect that the absence of farmers markets in Northeastern Phildelphia nad South Philadelphia could be due to low population density in those census tract. If the population sparse, there may be less demand or fewer opportunties for farmers' markets to attract enough customer. In this came, inhomogeneous K-function analysis would be a more valuable tool. Unlike the homogeneous K-function, which assumes a uniform distribution of points across the study area, the inhomogeneous K-function accounts for variations in point density. This allows us to assess clustering or dispersion while considering the underlying population distribution. By incorporating population density as a reference measure, we can better understand how farmers' markets are distributed relative to the population and identify areas where they may be lacking.
+Without conducting the analyses, we suspect that the absence of farmers' markets in Northeastern Philadelphia and South Philadelphia could be due to low population density in those census tracts. If the population is sparse, there may be less demand or fewer opportunities for farmers' markets to attract enough customers. In this case, nonhomogeneous K-function analysis would be a more valuable tool. Unlike the homogeneous K-function, which assumes a uniform distribution of points across the study area, the nonhomogeneous K-function accounts for variations in point density. This allows us to assess clustering or dispersion while considering the underlying population distribution. By incorporating population density as a reference measure, we can better understand how farmers' markets are distributed relative to the population and identify areas where they may be lacking.
 
 # Discussion
 
-The results from both the Nearest Neighbor Analysis and K-function analysis consistently indicate that the spatial distribution of farmers markets in Philadelphia is significantly clustered. The Nearest Neighbor Index (NNI) is 0.778, with a z-score of -3.345 and a p-value of 0.0000002. These values provide strong statistical evidence to reject the null hypothesis of complete spatial randomness. The K-function analysis supports this conclusion by showing that the observed K(d) begins to exceed the theoretical K(d) at a distance of 54 feet. This divergence continues to increase with distance, indicating significant clustering across multiple spatial scales.
+The results from both the Nearest Neighbor Analysis and K-function analysis consistently indicate that the spatial distribution of farmers markets in Philadelphia is significantly clustered. The Nearest Neighbor Index (NNI) is 0.778, with a z-score of -3.345 and a p-value of 0.0000002. **These values provide strong statistical evidence to reject the null hypothesis of complete spatial randomness.** The K-function analysis supports this conclusion by showing that the observed K(d) begins to exceed the theoretical K(d) at a distance of 54 feet. This divergence continues to increase with distance, indicating significant clustering across multiple spatial scales.
 
-These findings align with initial expectations based on the visual distribution of farmers markets. The point data showed that markets were concentrated in Center City and parts of West Philadelphia, while large areas such as the Northeast and South appeared underserved. Both methods confirmed these visual observations through statistically significant results. This consistency strengthens the reliability of the findings. At the same time, it is necessary to acknowledge the limitations of the methods used. Nearest Neighbor Analysis evaluates only the distance to the closest point and is highly sensitive to the shape of the study area. In a city with irregular boundaries such as Philadelphia, this can result in inaccurate estimates of expected spacing. K-function and L-function analyses offer a more detailed view by examining clustering across different distances. However, they rely on the assumption that points have an equal probability of occurring anywhere within the study area. This assumption is difficult to justify in a city where population density and land use vary significantly. Despite these limitations, the convergence of results across different methods provides strong evidence that the observed pattern is not random.
+**These findings align with initial expectations based on the visual distribution of farmers markets. The point data showed that markets were concentrated in Center City and parts of West Philadelphia, while large areas such as the Northeast and South appeared underserved.** Both methods confirmed these visual observations through statistically significant results. This consistency strengthens the reliability of the findings. At the same time, it is necessary to acknowledge the limitations of the methods used. Nearest Neighbor Analysis evaluates only the distance to the closest point and is highly sensitive to the shape of the study area. In a city with irregular boundaries such as Philadelphia, this can result in inaccurate estimates of expected spacing. K-function and L-function analyses offer a more detailed view by examining clustering across different distances. However, they rely on the assumption that points have an equal probability of occurring anywhere within the study area. This assumption is difficult to justify in a city where population density and land use vary significantly. Despite these limitations, the convergence of results across different methods provides strong evidence that the observed pattern is not random.
 
 ```{r}
 ggplot() +