I wonder how accurate this is in areas where tourism is a large contributor to the local economy. You don't actually have to be much good at running a business if you've got an endless stream of new people for several months out of the year and don't need to rely on repeat business. You just rename the place and hire a new GM when the negative reviews start overwhelming you (this applies more generally than restaurants btw).
I would probably try new restaurants more frequently if I could be more sure I wasn't gonna pay $10 for a $5 burger and help buy some sleazy J1-slave-driver (owner is too nice of a word) a new Land Rover in the process.
Definitely more generally than restaurants. We have a tuxedo shop in our area that marks up cheap imports you can buy for like half the price straight off Amazon. They have a going-out-of-business sale, closing sale, etc., which really means they're just moving next door in the same plaza. They've been bouncing around for years, and racked up a few lawsuits at some point. Guess it works because they're still doing it.
How did he get the data? It's pretty hard to pull the reviews and the data from Yelp. I tried to do that to do some querying, but their search isn't so great and they pull a lot of stunts to prevent you from scraping.
---
Oh, I see he's using the Kaggle data. That's not guaranteed to be reliable.
I wrote a scraper which pulled address info / phone number / star rating / review count for pretty much every restaurant in the US.
It was "easy" because all of that data is available within the search page, and you just need to correctly parse it out.
The hardest part was getting around their really crazy rate limiting and IP blocking.
I managed to get myself IP banned from Yelp before ever trying to scrape, just by doing a bunch of manual searches pretty quickly over like 20 min. Next thing I knew I could no longer access anything on Yelp.
That's not surprising. You can get yourself IP blocked just by opening things in other tabs to queue them up to read. (If you notice yourself getting random 404s, that's when you're being watched.)
FWIW, I just tried gathering some current user-generated Yelp data from the Internet Archive and it was very easy to gather a list of all restaurants for a city and then all reviews for each restaurant.
This dataset contains sparse data from businesses in different cities around the world. It has a very small overlap with the dataset used in the study. Focusing on a particular city helps to understand the underlying trends better.
As the author mentioned, changes in rent are a huge factor. Did the dates of closures coincide with a new lease, which can run anywhere from 1 to 10 years? Seeing a distribution of the age of each restaurant when it closed could show that.
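For what it's worth, here's a rough sketch of what that age-at-closure distribution could look like in Python. Everything here is an assumption for illustration: the function names are made up, the sample dates are invented, and the dataset doesn't obviously include opening dates, so in practice you'd probably need a proxy such as the date of the first review.

```python
from collections import Counter
from datetime import date


def age_at_closure_years(opened: date, closed: date) -> int:
    """Whole years a restaurant was open before it closed."""
    years = closed.year - opened.year
    # Subtract one if the closing date falls before the opening anniversary.
    if (closed.month, closed.day) < (opened.month, opened.day):
        years -= 1
    return years


def closure_age_histogram(records):
    """Count closures per whole-year age from (opened, closed) date pairs."""
    return Counter(age_at_closure_years(o, c) for o, c in records)


# Hypothetical sample data, NOT taken from the Yelp/Kaggle dataset.
sample = [
    (date(2010, 3, 1), date(2015, 3, 15)),
    (date(2012, 6, 1), date(2017, 5, 20)),
    (date(2008, 1, 10), date(2013, 2, 1)),
    (date(2014, 9, 1), date(2016, 1, 1)),
]

hist = closure_age_histogram(sample)
for age in sorted(hist):
    # Simple text histogram: one '#' per closed restaurant.
    print(f"{age:>2} yr  {'#' * hist[age]}")
```

If closures really do cluster around lease renewals, you'd expect bumps at common lease lengths rather than a smooth curve.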
The other huge factor is cost of labor. Maybe looking at the minimum wage could be another feature. The news usually has those articles about how restaurants are struggling and the incremental minimum wage increase will hurt their business. It'd be interesting to see how strong of a factor that is in restaurant closures.
Also, factors that could be tough to get but are important:
* Cost of ingredients like meat, vegetables, etc.
* General economic conditions: are consumers going out to eat?
It sounds like they un-anonymized the data, which strikes me as slightly unethical. (I mean it's not medical data or anything, but I don't think that was the intended use of the anonymized data.)
Further, it seems like the results of this will be used to deny loans to restaurants that are not doing so great, thus ensuring that they fail because they can't get funding for renovations and improvements.
The original dataset already contained the names, addresses and coordinates of each restaurant. Finding the restaurant IDs does not reveal any additional information. It just makes it easier to look up recent information from Yelp, which is available through their API anyway.