Motivation

New York City is famously known as the concrete jungle, with approximately 1 million structures distributed over an area of 778.2 km^2. However, unbeknownst to many, NYC offers more green space than any other major U.S. city. In fact, 99% of New Yorkers live within a 10-minute walk of a green space. Despite this, these areas are often overlooked because of the city’s dense, industrial layout. Our team aims to enhance access to and awareness of these green spaces by documenting the location of every garden in NYC and highlighting their features, such as plant and animal life, available amenities, and environmentally friendly practices. We are creating a comprehensive resource to empower NYC residents with this valuable information.

Initial Questions

Flora and fauna: Are gardens in specific boroughs more likely to have various types of trees (street trees, trees in gardens, fruit trees) than others, controlling for sidewalk area?

Eco-friendly practices: Are gardens in certain boroughs more likely to engage in eco-friendly practices than others?

Garden amenities: How does income in the area affect the number of garden amenities available to people? Do certain boroughs have more garden amenities available than others?

Data

We utilized three datasets in a collection on the NYC Open Data website titled “GreenThumb Gardens”. The GreenThumb program provides programming and material support to over 550 community gardens in New York City. These datasets provide information about the NYC Parks GreenThumb community garden program. The first dataset, titled “GreenThumb Garden Info”, contains location and hours information on 629 community gardens. Location is described in a multitude of columns: zip code, congressional district, assembly district, latitude, longitude and more. We are also provided open hours of the gardens for each day of the week.

Our second dataset, GreenThumb Site Visits, contains quantitative and qualitative information related to the physical status of the garden, as well as its ongoing operation, maintenance, and programming. Site visits are done by NYC Parks staff, who collect the data in this dataset to ensure each garden is operated in a safe manner and to understand the needs of different gardens. This dataset is highly relevant to our project, as it contains columns about eco-friendly practices and garden amenities for 357 gardens.

The third dataset is GreenThumb Block-Lot, which contains information about the size of 1,397 gardens through columns describing the location, size and area of the garden lot.

In addition to our primary GreenThumb datasets, we used additional datasets to supplement our analysis and include external variables. We used the US Census Bureau’s S1903 Median Income in the Past 12 Months (in 2023 Inflation-Adjusted Dollars) dataset to obtain the median income for each borough. For each county, we extracted the median income estimate (in dollars) and its margin of error. We also extracted county names and recoded them to borough names. Since the columns were all character type, we converted them to numeric for analysis. The median income variable was then used to visualize the distribution of median income by borough and as the predictor in a Poisson regression.

Cleaning & Exploratory Analysis

Our analysis was split into three categories, each with its own cleaning, exploratory and formal statistical analysis steps. After importing the Garden Info and Site Visits datasets, we prepped the data to merge them for analysis. Cleaning steps included using janitor::clean_names() to tidy variable names and recoding the “borough” variable to express each borough’s full name rather than just a representative letter (e.g. “Bronx” instead of “X”). Each analysis category involved selecting variables of interest from the Site Visits dataset before merging it with the Garden Info dataset.
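
The cleaning and merging steps described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R functions named in the report, and it makes several assumptions: the rows are hypothetical, clean_name is only a rough stand-in for janitor::clean_names(), the merge is treated as an inner join on parksid, and only the “X” to “Bronx” mapping comes from the text (the other four borough codes are assumed).

```python
import re

# Borough letter codes. Only "X" -> "Bronx" is stated in the report;
# the other four mappings are assumptions made for this sketch.
BOROUGH_NAMES = {
    "B": "Brooklyn",
    "M": "Manhattan",
    "Q": "Queens",
    "R": "Staten Island",
    "X": "Bronx",
}


def clean_name(name):
    """Lower-case a column name and collapse other characters to underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_row(row):
    """Tidy the column names of one record and expand its borough code."""
    cleaned = {clean_name(key): value for key, value in row.items()}
    if "borough" in cleaned:
        code = cleaned["borough"]
        cleaned["borough"] = BOROUGH_NAMES.get(code, code)
    return cleaned


def merge_by_parksid(garden_info, site_visits):
    """Inner-join site visits onto garden info using the shared parksid key."""
    info_by_id = {garden["parksid"]: garden for garden in garden_info}
    merged = []
    for visit in site_visits:
        info = info_by_id.get(visit["parksid"])
        if info is not None:  # drop visits with no matching garden
            merged.append({**info, **visit})
    return merged


# Hypothetical records, not taken from the GreenThumb datasets.
garden_info = [clean_row(r) for r in [
    {"ParksID": "X001", "Garden Name": "Example Garden A", "Borough": "X"},
    {"ParksID": "B002", "Garden Name": "Example Garden B", "Borough": "B"},
]]
site_visits = [clean_row(r) for r in [
    {"ParksID": "X001", "InspectionID": 1, "Composting": "1"},
    {"ParksID": "B002", "InspectionID": 2, "Composting": "0"},
    {"ParksID": "Q999", "InspectionID": 3, "Composting": "1"},
]]

merged = merge_by_parksid(garden_info, site_visits)
for row in merged:
    print(row)
```

The third hypothetical visit has no matching garden and is dropped, which is the behavior an inner join would give; a left join would keep it with missing garden fields.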

Flora and Fauna

The variables of interest for this data set were as follows: parksid, inspectionid, trees in garden, fruit trees, street trees, chickens, pond, fish in pond, turtles, and total sidewalk area. We converted these variables to numeric form through the mutate function so they could be properly analyzed. Merging this dataset by parksid with the Garden Info dataset produced flora_fauna_df, a new data frame with flora and fauna grouped by borough that was used for subsequent visualizations and analyses. A map of the flora and fauna was created with the leaflet package. The dataset was then pivoted to create a multi-series bar chart reflecting the distribution of flora and fauna features by borough. To understand how the presence of different types of trees differs across boroughs, a logistic regression was run for each tree type in the data set (trees in gardens, fruit trees, and street trees). The outcome was the probability of tree presence, and the predictor variable was borough. Sidewalk area was included as a confounder because it was the closest available proxy for garden size, which would be associated with both borough and tree presence. Brooklyn was made the reference group for borough because it has the largest number of community gardens.

Eco-friendly Practices

For this dataset, we focused on variables of interest: parksid, inspectionid, rain harvesting, composting, aquaponics, and solar panels, converting these to numeric form using the mutate function. We merged this cleaned dataset with the Gardens Info dataset to create a new data frame called eco_friendly_df. After merging, we grouped the eco-friendly practices by borough with the group_by(borough) function. To prepare for data visualization, we further refined the data to retain only the relevant variables (borough, rain harvesting, composting, aquaponics, and solar panels) and reshaped it with pivot_longer to focus solely on the eco-friendly practices. We explored the distribution of these practices using a multipart histogram to illustrate the prevalence of each practice by borough. Additionally, we created a map of the city that highlighted the location and addresses of each garden along with their corresponding eco-friendly practices. To deepen our analysis, we conducted a Chi-Square test of independence to examine the association between borough and the presence of eco-friendly practices, complemented by several logistic regression analyses to model the probability of a garden engaging in eco-friendly practices based on its borough.
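
The reshaping step can be illustrated with a small sketch. It is written in Python rather than with the pivot_longer function used in the report, the rows are hypothetical, and the column names (rain_harvesting, composting, aquaponics, solar_panels) are assumed spellings of the variables listed above, with 1 meaning the practice is present.

```python
from collections import Counter

# Assumed column names for the four practices named in the text.
PRACTICES = ["rain_harvesting", "composting", "aquaponics", "solar_panels"]


def pivot_longer(rows, value_columns):
    """Turn one wide row per garden into one long row per garden and practice."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append({
                "borough": row["borough"],
                "practice": column,
                "present": row[column],
            })
    return long_rows


def count_by_borough(long_rows):
    """Count gardens with each practice present, keyed by (borough, practice)."""
    counts = Counter()
    for row in long_rows:
        if row["present"] == 1:
            counts[(row["borough"], row["practice"])] += 1
    return counts


# Hypothetical gardens, not taken from the GreenThumb datasets.
wide_rows = [
    {"borough": "Brooklyn", "rain_harvesting": 1, "composting": 1,
     "aquaponics": 0, "solar_panels": 0},
    {"borough": "Brooklyn", "rain_harvesting": 0, "composting": 1,
     "aquaponics": 0, "solar_panels": 1},
    {"borough": "Manhattan", "rain_harvesting": 0, "composting": 1,
     "aquaponics": 0, "solar_panels": 0},
]

long_rows = pivot_longer(wide_rows, PRACTICES)
practice_counts = count_by_borough(long_rows)
for key, value in sorted(practice_counts.items()):
    print(key, value)
```

The long format gives one record per garden and practice, which is the shape a grouped chart by borough needs.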

Garden Amenities

After selecting garden-amenity-related columns from Site Visits (borough, garden name, ParksID), we merged the two datasets to create a dataset suited to garden amenity analysis. Three main exploratory analyses were performed to assess garden amenities. To explore the distribution of garden amenity types, five garden amenity types (open lawn/communal area, murals, pond, farmer’s market, food) were selected from the original Site Visits dataset. The distribution of the types of garden amenities was visualized through a histogram; food and open lawns were the most common types of garden amenity. Garden amenity counts were then visualized by borough via histogram; Brooklyn had the greatest number of garden amenities, while Staten Island had the fewest. Income distribution by borough was visualized through a column chart, where Manhattan had the highest income. The income distribution visualization was an exploratory step before running a Poisson regression assessing the association between total amenities per borough and median income by borough.

Formal Statistical Analysis

Flora and Fauna

Several logistic regression analyses were run to model the probability of various types of trees (street trees, trees in garden, fruit trees) being present in a garden based on the borough in which it is located, controlling for sidewalk area. We hypothesized that there would be an association between tree presence and borough based on socioeconomic disparities and the resources available to attend to the gardens. Most of the coefficients comparing Brooklyn to the remaining four boroughs were not statistically significant. The only significant finding at the 5% level (and its interpretation) was as follows: The odds of a fruit tree being present in a community garden in Queens were 0.202 times the odds of a fruit tree being present in a community garden in Brooklyn, controlling for sidewalk area (p-value: 0.007).
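
To make the odds ratio of 0.202 more concrete, the short sketch below converts it to the underlying log-odds coefficient and to a percent change in odds. The baseline probability used at the end is hypothetical and only illustrates how an odds ratio translates into a probability; it is not an estimate from the data.

```python
import math


def odds_ratio_summary(odds_ratio):
    """Return the log-odds coefficient and percent change in odds for an OR."""
    log_odds = math.log(odds_ratio)
    percent_change = (odds_ratio - 1.0) * 100.0
    return log_odds, percent_change


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Scale the odds of a baseline probability by an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratio reported for fruit trees, Queens versus Brooklyn.
log_odds, percent_change = odds_ratio_summary(0.202)
print(f"log-odds coefficient: {log_odds:.3f}")
print(f"percent change in odds: {percent_change:.1f}%")

# Hypothetical baseline: if half of Brooklyn gardens had a fruit tree,
# the same odds ratio would imply this probability for Queens gardens.
queens_probability = apply_odds_ratio(0.5, 0.202)
print(f"implied probability at a 0.5 baseline: {queens_probability:.3f}")
```

An odds ratio below 1 corresponds to a negative coefficient, so the odds of a fruit tree in a Queens garden are roughly 80% lower than in a Brooklyn garden, holding sidewalk area fixed.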

Eco-friendly Practices

We conducted a Chi-Square test of independence to determine whether the presence of eco-friendly practices in a garden was independent of its borough. We hypothesized that there would be a strong association between borough and eco-friendly practices. The results (X^2 = 153.11, p-value < 2.2e-16) indicate a strong association between eco-friendly practices and the boroughs in which gardens are located. To investigate further, we performed several logistic regression analyses to model the probability of each eco-friendly practice being used by a garden based on its borough. The findings from our analyses are as follows: Gardens in Brooklyn exhibited the highest likelihood of rain harvesting, with an odds ratio of 2.10 (p = 0.04). Gardens in Manhattan were most likely to engage in composting, with an odds ratio of 5.30 (p < 0.01), while the use of solar panels had an odds ratio of 0.667; however, this association was not significant, as the p-value exceeded 0.05. We excluded aquaponics from the reported logistic regressions because only n=2 gardens utilized aquaponics, a sample too small to support a stable model. To check this decision, we compared the residuals of the models, including one fit for aquaponics, using violin plots. The aquaponics model had the largest residuals of all the models, indicating poor fit compared to rain harvesting, composting and solar panels. The rain harvesting and composting models had the best fit.
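
For readers unfamiliar with the Chi-Square test of independence, the sketch below shows how the statistic is computed from a contingency table of observed counts. The table is hypothetical (rows as boroughs, columns as practices) and does not reproduce the counts behind the reported X^2 = 153.11. The p-value is omitted because it needs the Chi-Square distribution, which is not in the Python standard library.

```python
def chi_square_independence(table):
    """Return the Pearson Chi-Square statistic and degrees of freedom.

    `table` is a list of rows of observed counts. Expected counts are
    row_total * column_total / grand_total under independence.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts of gardens using each practice, by borough.
# Columns: rain harvesting, composting, solar panels.
hypothetical_counts = [
    [40, 60, 5],   # hypothetical borough A
    [15, 45, 3],   # hypothetical borough B
    [10, 12, 2],   # hypothetical borough C
]

statistic, degrees_of_freedom = chi_square_independence(hypothetical_counts)
print(f"X^2 = {statistic:.2f} on {degrees_of_freedom} degrees of freedom")
```

A large statistic relative to its degrees of freedom means the observed counts differ from what independence between borough and practice would predict.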

Garden Amenities

We used a Poisson regression to assess the association between total amenities per borough and median income by borough, with income as the predictor and the number of amenities as the outcome. A Poisson model with no offset was chosen because the outcome variable was a count. Our hypothesis was that the predictor would have a strong association with the outcome, since higher socioeconomic status is typically associated with more open, clean green space. However, this hypothesis was not supported. The coefficient for median income is negative (-0.0000092), which means that as median income increases, the expected count of garden amenities decreases slightly. Because the model is Poisson, the exponentiated coefficient (0.9999908) is a rate ratio rather than an odds ratio: a one-dollar increase in median income is associated with a negligible decrease in the expected count of garden amenities. This association is statistically significant (p-value = 0.0000013), suggesting that median income has a minor negative association with the number of garden amenities.
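
Because the coefficient is expressed per dollar, its size is easier to judge after rescaling. The sketch below exponentiates the reported coefficient for a one-dollar increase and for a larger increase; the $10,000 increment is chosen here for illustration and does not appear in the original analysis.

```python
import math


def rate_ratio(coefficient, increase=1.0):
    """Multiplicative change in the expected count for a given predictor increase."""
    return math.exp(coefficient * increase)


INCOME_COEFFICIENT = -0.0000092  # reported Poisson coefficient for median income

per_dollar = rate_ratio(INCOME_COEFFICIENT)
per_ten_thousand = rate_ratio(INCOME_COEFFICIENT, increase=10_000)

print(f"per $1 increase: {per_dollar:.7f}")
print(f"per $10,000 increase: {per_ten_thousand:.4f}")
print(f"percent change per $10,000: {(per_ten_thousand - 1.0) * 100.0:.1f}%")
```

On this scale, the model implies that a borough with a median income $10,000 higher has an expected amenity count about 9% lower, which is small but less negligible than the per-dollar figure suggests.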

Finally, we completed a multinomial logistic regression with median income and garden status as predictor variables and pond presence as the outcome. For this analysis we excluded gardens in Staten Island (due to low sample size) as well as gardens that were not under DPR jurisdiction, as those gardens would not be part of the GreenThumb program and would therefore distort the effect of the “status” variable. Gardens in the GreenThumb program have significantly lower odds (OR = 0.214) of having a pond compared to those not in the program (p-value < 0.001). This suggests that the program is associated with fewer garden ponds, which may reflect its influence on gardening practices. Median borough income does not significantly influence the likelihood of having a garden pond (OR = 0.9999998, p-value = 0.9305), indicating that changes in income levels do not substantially affect the presence of garden ponds. Gardens in Brooklyn and Manhattan have higher odds of having a pond than gardens in other boroughs.

Discussion

Garden Amenities

Our analysis revealed key insights about the distribution of garden amenities and their association with socioeconomic factors across New York City boroughs. First, we found that food and open lawn/communal areas were the most common garden amenities citywide. Brooklyn stood out as the borough with the highest number of amenities, while Staten Island had the fewest. A separate exploratory analysis of median income distribution by borough showed that Manhattan had the highest median income.

When examining the relationship between median income and the number of garden amenities per borough using a Poisson regression model, we found an unexpected result: there was no positive association between higher income and more amenities. These findings partially deviated from our expectations. While the descriptive results regarding the distribution of amenities aligned with expectations (for example, Brooklyn’s prominence as a hub for community gardens), the Poisson analysis did not support our hypothesis that higher median income would be associated with more garden amenities. This suggests that factors other than income, such as community engagement, city planning policies, or historical investment in urban green spaces, may play a more significant role in shaping the availability of garden amenities.

Our statistical analyses returned a surprising number of insights from the data. Firstly, the predominance of food and open lawn/communal areas may reflect the priorities of NYC’s community gardens, which focus on food security and spaces for community interaction. Secondly, the disparities in garden amenities by borough highlight differences in access and investment. Staten Island’s lower amenity count may suggest a need for targeted interventions to enhance green space utility and accessibility. The lack of a positive association between income and garden amenities challenges common assumptions about the socioeconomic distribution of green spaces. This finding raises important questions about equity in urban green space distribution and warrants further investigation into other factors, such as community advocacy or land availability, that influence garden development. Future analyses could include more advanced models with more predictor variables (such as racial distribution and population density) to identify the factors behind public access to green spaces and their amenities.

The multinomial regression analysis on garden ponds in NYC shows that participation in the GreenThumb program is associated with significantly lower odds of having a garden pond, which may indicate that the program redirects gardening efforts towards other amenities. Median income does not significantly impact garden pond presence, suggesting that factors beyond socioeconomic status, such as cultural preferences or local ecological conditions, are more influential. The higher odds of garden ponds in Manhattan and Brooklyn highlight geographical variability in urban gardening practices, which may reflect different community values and ecological conditions. These findings have important implications for urban policy, emphasizing the need to tailor green infrastructure strategies to local contexts and preferences to effectively enhance urban green spaces and biodiversity.

Eco-Friendly Practices

With regard to eco-friendly practices, our analysis revealed significant findings on the association between borough and the adoption of eco-friendly practices in gardens across New York City. The Chi-Square test of independence indicated a strong association (X² = 153.11, p-value < 2.2e-16), suggesting that the likelihood of engaging in eco-friendly practices varies significantly across boroughs. We had to exclude aquaponics and Staten Island from the logistic regression models because Staten Island had only three gardens and too few gardens in New York City engaged in aquaponics. The logistic regression analyses showed that gardens in Brooklyn were significantly more likely to implement rain harvesting practices, with an odds ratio of 2.10 (p = 0.04). In contrast, gardens in Manhattan exhibited a noteworthy preference for composting, as indicated by an odds ratio of 5.30 (p < 0.01). The odds ratio of 0.667 for solar panel usage, however, did not achieve statistical significance (p > 0.05). These findings align with our initial hypothesis, which anticipated a strong association between boroughs and the prevalence of eco-friendly practices. The higher likelihood of rain harvesting in Brooklyn and composting in Manhattan suggests that specific environmental initiatives may be influenced by borough characteristics, including community engagement, availability of resources, and local policies promoting sustainability.

The insights reflect the need to address geographical disparities in eco-friendly practices within urban gardening. The significant variance among boroughs highlights opportunities for targeted educational and community-driven initiatives designed to promote green practices, particularly in boroughs where engagement may be lower. Moreover, understanding the unique characteristics of each borough can inform local government and nonprofit organizations in developing tailored strategies that encourage more widespread adoption of sustainable practices, ultimately contributing to a greener New York City.

Additionally, our exploration of the gardens’ data not only amplifies awareness of existing green spaces but also emphasizes the potential for these areas to serve as catalysts for environmental change. By documenting and promoting eco-friendly practices in gardens, we can empower NYC residents to enhance their engagement with these valuable resources, leading to broader community benefits for both the environment and public health. In conclusion, our findings suggest that while certain boroughs lead in specific eco-friendly practices, there remains room for growth in promoting sustainability across all regions of NYC. Future studies may further explore the factors influencing these practices and examine how targeted outreach can enhance eco-friendly engagement citywide.

Flora and Fauna

The multi-series histogram reflecting the distribution of plants and animals in NYC gardens by borough highlighted Brooklyn as the leading borough. There were more gardens in Brooklyn with trees in the garden, fruit trees specifically, and street trees than in any other borough. In addition, there were more gardens with chickens in Brooklyn than in the other boroughs. Brooklyn has more community gardens than any other borough, which would offer an explanation for these numbers. Other flora and fauna features, like turtles and fish, only appeared in gardens in a few boroughs. For example, turtles only appeared in gardens in Manhattan. In addition to allowing for a comparison of how plants and animals differ by borough, this histogram provides information on how prevalent the various features are. It also informs statistical analyses about factors associated with the differences in how the features are distributed.

As reflected in the histogram, there are more trees of all kinds in Brooklyn than in any other borough. This prompts the following question: Is there a significant difference in tree presence based on borough when controlling for confounding variables? A major confounder of this presumed association is garden size, but the data set only includes information on sidewalk area. As this is the closest indicator of garden size, and is the only meaningful confounder present in the data set, it was used in the analyses. The logistic regression models use tree presence as the outcome and borough as the predictor, with sidewalk area included as a confounder. The presence of only one significant finding (an OR of 0.202 for fruit trees when comparing Queens to Brooklyn) suggests that the pattern in the histogram is explained by the fact that Brooklyn has more gardens than the other boroughs, and not by gardens in Brooklyn having a significantly greater frequency of trees than gardens in other boroughs. These findings are limited by the lack of information on other confounding variables that would allow for a more accurate model. Future statistical analyses could include information on boroughs from other data sources to understand how trees (and other flora/fauna features) differ by borough.