Battle of Neighborhoods — New York City vs Toronto
Introduction
New York City and Toronto are the financial capitals of their countries. Over the years, their neighborhoods have evolved and transformed, offering a wide variety of venues to their inhabitants. The venues on offer may be taken into consideration when people decide which neighborhood to live in. Both are major cities with a lot to offer and many job opportunities, so a large number of people in each city might consider moving to the other, either temporarily or permanently. When planning such a move, wouldn’t it be advantageous to get a feel for the similarities and differences between the neighborhoods of one city and those of the other? That is exactly what the following analysis aims to offer. Using Python and the Foursquare API, I hope to give you an idea of what to expect in terms of similarities between both cities by exploring their neighborhoods and their venues.
What data will be used?
Two datasets collected from the web are used for this exercise.
The New York City dataset was obtained from a JSON file available at https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json, with columns representing the boroughs, the neighborhoods, and the neighborhoods' geographical locations (latitude and longitude). The dataset was restricted to Manhattan neighborhoods for simplicity, which left a total of 40 neighborhoods.
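As a rough illustration of this step, here is a minimal sketch of how such a JSON file could be flattened into a table. It assumes a GeoJSON-like layout with a `features` list, `borough` and `name` properties, and `[longitude, latitude]` coordinates; the function name `neighborhoods_from_geojson` and the tiny sample (with rounded, illustrative coordinates) are mine, not part of the original notebook.

```python
import pandas as pd


def neighborhoods_from_geojson(data, borough=None):
    """Flatten a GeoJSON-like dict into one row per neighborhood.

    Assumes each feature has 'borough' and 'name' properties and a
    point geometry stored as [longitude, latitude].
    """
    rows = []
    for feature in data["features"]:
        props = feature["properties"]
        lon, lat = feature["geometry"]["coordinates"]
        rows.append(
            {
                "Borough": props["borough"],
                "Neighborhood": props["name"],
                "Latitude": lat,
                "Longitude": lon,
            }
        )
    df = pd.DataFrame(rows, columns=["Borough", "Neighborhood", "Latitude", "Longitude"])
    if borough is not None:
        # Keep only the requested borough, e.g. "Manhattan"
        df = df[df["Borough"] == borough].reset_index(drop=True)
    return df


# Hand-made sample in the assumed layout (coordinates are rounded, for illustration only)
sample = {
    "features": [
        {
            "properties": {"borough": "Bronx", "name": "Wakefield"},
            "geometry": {"coordinates": [-73.85, 40.89]},
        },
        {
            "properties": {"borough": "Manhattan", "name": "Marble Hill"},
            "geometry": {"coordinates": [-73.91, 40.88]},
        },
    ]
}

manhattan = neighborhoods_from_geojson(sample, borough="Manhattan")
print(manhattan)
```

With the real file, the dict would come from `json.load` on the downloaded data instead of the hand-made `sample`.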
The Toronto dataset was scraped from a Wikipedia table available at https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M using the BeautifulSoup package in Python. Using the postal codes, I merged this dataset with a geographical dataset containing latitude and longitude data, available at https://cocl.us/Geospatial_data. The dataset was restricted to boroughs with Toronto in their name (the more central neighborhoods), and a total of 39 neighborhoods were obtained.
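The merge and filter described here can be sketched with pandas as follows. The scraping itself is left out; the column names (`Postal Code`, `Borough`, `Neighborhood`, `Latitude`, `Longitude`) and the three sample rows with rounded coordinates are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for the scraped Wikipedia table (assumed column names, sample rows)
postal = pd.DataFrame(
    {
        "Postal Code": ["M4E", "M5A", "M1B"],
        "Borough": ["East Toronto", "Downtown Toronto", "Scarborough"],
        "Neighborhood": ["The Beaches", "Regent Park, Harbourfront", "Malvern, Rouge"],
    }
)

# Stand-in for the geospatial file (rounded, illustrative coordinates)
coords = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M4E", "M5A"],
        "Latitude": [43.81, 43.68, 43.65],
        "Longitude": [-79.19, -79.29, -79.36],
    }
)

# Attach coordinates to each postal code
merged = postal.merge(coords, on="Postal Code", how="inner")

# Keep only boroughs whose name contains "Toronto"
central = merged[merged["Borough"].str.contains("Toronto")].reset_index(drop=True)
print(central)
```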
Both datasets were then merged. Here is a screenshot of the final dataset of Manhattan and central Toronto neighborhoods, a total of 79 neighborhoods:
With the Folium package in Python, I also created a map depicting the geographical location of each neighborhood. The image on the left shows the Manhattan neighborhoods and the image on the right shows the Toronto neighborhoods.
Using the Foursquare API to explore the venues in each neighborhood
The top 100 venues in each neighborhood, within a radius of 500 meters of their geographical reference, were obtained using the Foursquare API and added to a new dataframe. A total of 4826 rows were created, each corresponding to a venue, with a total of 369 unique venue categories. With a little more data wrangling, I created a new dataframe listing the top 5 venue categories in each neighborhood, as you can see below.
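The wrangling step above can be sketched as follows: one-hot encode the venue categories, average them per neighborhood to get each category's share, and keep the most frequent ones. The column names and the toy neighborhoods "A" and "B" are assumptions for illustration, not the real Foursquare results.

```python
import pandas as pd

# Toy venue table: one row per venue (assumed column names)
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "B", "B"],
        "Venue Category": [
            "Coffee Shop",
            "Coffee Shop",
            "Park",
            "Italian Restaurant",
            "Bar",
        ],
    }
)

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# Mean of the one-hot columns = share of each category in the neighborhood
freq = onehot.groupby(venues["Neighborhood"]).mean()


def top_categories(row, n=5):
    """Return up to n categories present in the neighborhood, most frequent first."""
    present = row[row > 0].sort_values(ascending=False)
    return present.index[:n].tolist()


top5 = {hood: top_categories(row) for hood, row in freq.iterrows()}
print(top5)
```

Grouping by the `Neighborhood` series directly, rather than inserting it as a column of `onehot`, avoids a name clash if one of the venue categories happens to be called "Neighborhood".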
After calculating the proportion of each venue category in each neighborhood, I employed an unsupervised machine learning algorithm, k-means clustering, to group neighborhoods based on their venue category composition.
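To make the clustering step concrete, here is a minimal sketch using scikit-learn's `KMeans` on a toy matrix of category proportions. The four rows, the three columns and the choice of two clusters are illustrative assumptions, and scikit-learn itself is my assumption about the library used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows = neighborhoods, columns = venue-category proportions (toy values)
X = np.array(
    [
        [0.8, 0.1, 0.1],  # dominated by the first category
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],  # dominated by the second category
        [0.2, 0.7, 0.1],
    ]
)

# Fit k-means; n_init and random_state are fixed for reproducibility
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# One cluster label per neighborhood, in row order
labels = kmeans.labels_
print(labels)
```

The label numbers themselves are arbitrary; what matters is which neighborhoods end up sharing a label.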
Before running the algorithm, the ‘elbow method’ was used to select the number of clusters, k. This method plots the within-cluster sum of squares (WCSS) against k and looks for the point where the curve bends.
The elbow plot (on the left) does not show a clear best value for k. After trying different values, I decided to use k=7, which seemed the most informative for the purpose of this analysis.
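For readers who want to reproduce an elbow curve, the sketch below computes the WCSS for a range of k values with scikit-learn, where it is exposed as the `inertia_` attribute of a fitted `KMeans` model. The random toy matrix stands in for the real neighborhood-by-category proportions, and the range of k is an arbitrary choice of mine.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for the proportions matrix: 30 "neighborhoods", 4 "categories"
rng = np.random.default_rng(0)
X = rng.random((30, 4))
X = X / X.sum(axis=1, keepdims=True)  # make each row sum to 1, like proportions

# Fit k-means for each candidate k and record the WCSS
wcss = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss[k] = model.inertia_

for k, value in wcss.items():
    print(k, round(value, 4))
```

Plotting `wcss` against k gives the elbow curve. When no sharp bend shows up, as was the case here, the final choice of k remains a judgment call.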
After running the k-means clustering algorithm and assigning each neighborhood to its cluster, I created a new map of each city (Manhattan on the left and Toronto on the right) with each neighborhood colored according to its cluster:
- Cluster 0 = red; Cluster 1 = purple; Cluster 2 = dark blue; Cluster 3 = light blue; Cluster 4 = baby blue; Cluster 5 = olive green; Cluster 6 = orange.
Results
From the maps created above, it is possible to see that only 2 clusters span both cities: Cluster 1 and Cluster 5. The other clusters contain only Toronto neighborhoods. Moreover, these Toronto-only clusters consist of just one neighborhood (4 clusters) or two neighborhoods (1 cluster), and they mostly lie on the periphery of central Toronto.
Looking closer at Clusters 1 and 5, we can see that Cluster 1 consists mostly of Toronto neighborhoods (28) with a few from Manhattan (6), while Cluster 5 consists mostly of Manhattan neighborhoods (34) with a few from Toronto (5).
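Counts like these can be obtained with a simple cross-tabulation of cluster labels against city. The sketch below uses a handful of made-up rows and assumed column names (`City`, `Cluster`); it only shows the mechanics, not the real numbers.

```python
import pandas as pd

# Toy cluster assignments: one row per neighborhood (assumed column names)
assigned = pd.DataFrame(
    {
        "City": ["Manhattan", "Manhattan", "Manhattan", "Toronto", "Toronto", "Toronto", "Toronto"],
        "Cluster": [5, 5, 1, 1, 1, 5, 2],
    }
)

# Rows = clusters, columns = cities, cells = number of neighborhoods
counts = pd.crosstab(assigned["Cluster"], assigned["City"])
print(counts)

# Clusters that contain at least one neighborhood from every city
shared = counts[(counts > 0).all(axis=1)].index.tolist()
print(shared)
```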
Moreover, by analyzing which venue categories are the most common in those neighborhoods (on the left), we see that in Cluster 1 the leading category is by far Coffee Shops, which rank 1st in 20 neighborhoods, followed by Parks (3 neighborhoods) and Cafés (3 neighborhoods). In Cluster 5, the leading category is Italian Restaurants, which rank 1st in 8 neighborhoods, followed by Coffee Shops (4 neighborhoods), Cafés (3) and Bars (3). Cluster 5 also shows a much greater diversity of categories in 1st place than Cluster 1.
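A tally of this kind can be sketched with a grouped `value_counts`. The column name `1st Most Common Venue` and the five toy rows are assumptions for illustration.

```python
import pandas as pd

# Toy table: each neighborhood's cluster and its top venue category
top = pd.DataFrame(
    {
        "Cluster": [1, 1, 1, 5, 5],
        "1st Most Common Venue": [
            "Coffee Shop",
            "Coffee Shop",
            "Park",
            "Italian Restaurant",
            "Coffee Shop",
        ],
    }
)

by_cluster = top.groupby("Cluster")["1st Most Common Venue"]

# How many neighborhoods in each cluster have a given category in 1st place
first_place = by_cluster.value_counts()
print(first_place)

# Number of distinct 1st-place categories per cluster, a rough measure of diversity
diversity = by_cluster.nunique()
print(diversity)
```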
Discussion
Most people living in Manhattan who are looking to move to central Toronto would feel most ‘at home’ by choosing the more peripheral neighborhoods rather than the most central ones. The Toronto neighborhoods most similar to the Manhattan ones are: The Danforth West & Riverdale, India Bazaar & The Beaches West, High Park & The Junction South, Davisville and Business reply mail Processing Centre & South Central Letter Processing Plant in Toronto.
On the other hand, most people from Toronto, especially those living in the more central neighborhoods, would find the following Manhattan neighborhoods the most familiar: Marble Hill, Chelsea, Morningside Heights, Battery Park City, Financial District and Stuyvesant Town.
Toronto also has a few neighborhoods on its periphery that are rather unique, since they formed clusters of their own. They could be interesting for those wanting to move to a different kind of neighborhood, whether they are coming from Manhattan or from central Toronto. Some of the most common venues in those neighborhoods are Parks, Trails, Yoga Studios and Escape Rooms, which would appeal to particular types of people.
Conclusion
New York City and Toronto are two amazing cities, with a huge range of venues and neighborhoods with unique characteristics. By exploring the venues available in each of their neighborhoods, it became clear that neighborhoods are more similar within each city than between the two cities, which was expected, but there is some overlap. People moving between those places can choose whether to settle in a neighborhood with a more familiar feel or in one that is rather different. By using Data Science tools and harnessing the capabilities of the Foursquare API, it is possible for anyone (with some patience and will to learn) to make comparisons between different places, explore what they have to offer and make informed decisions before going on holiday or moving to a new city.