Yelp’s research questions

This is an idea page for my master’s research. The work uses the recently released Yelp Dataset Challenge data to analyze a real-world “big data” set.

Data description

Research questions

  1. Descriptive statistics: visualizations that help explore the data set and reveal interesting patterns.
  2. Predict potential ratings (a recommendation problem).
  3. How reliable are the reviews? Can quirky reviewers be identified?
  4. Is there a relation between a business’s star rating and its location?
  5. Can a review’s rating be guessed from its text alone? -> Consider good/bad/true/false reviews?
  6. Given all of the reviews of a business, can we predict when it will be busiest?
  7. What makes a review useful, funny, or cool?
  8. Which business is a user likely to review next?
  9. How much of a business’s success is really just location?
  10. Classify Yelp reviews into relevant categories.
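As a starting point for question 1, here is a minimal sketch of a star-rating histogram. It assumes the reviews come as one JSON object per line with a `stars` field, which should be checked against the actual dump. The sample records and the names `star_histogram` and `render` are made up for illustration.

```python
import json
from collections import Counter

# Hypothetical sample records. The real review file is assumed to hold one
# JSON object per line with (at least) user_id, business_id and stars.
SAMPLE_LINES = [
    '{"user_id": "u1", "business_id": "b1", "stars": 5}',
    '{"user_id": "u2", "business_id": "b1", "stars": 4}',
    '{"user_id": "u1", "business_id": "b2", "stars": 1}',
    '{"user_id": "u3", "business_id": "b2", "stars": 5}',
]


def star_histogram(lines):
    """Count how many reviews were given for each star rating."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input
        counts[json.loads(line)["stars"]] += 1
    return counts


def render(counts):
    """Return one text bar per star rating, from 1 to 5 stars."""
    return [f"{stars} | {'#' * counts.get(stars, 0)}" for stars in range(1, 6)]


hist = star_histogram(SAMPLE_LINES)
for row in render(hist):
    print(row)
```

For the real data, `SAMPLE_LINES` would be replaced by an open file handle, since the function only iterates over lines. The same counting pattern, keyed by user or by business instead of by star value, would give the other distributions.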

Research Flow

  1. Import to database (create the data schema)

    • PostgreSQL
    • MongoDB
  2. Visualization

    • Distribution of the number of reviews per user (by star rating)
    • Distribution of the number of reviews per restaurant (by star rating)
    • Distribution of the number of distinct reviewers per restaurant
    • Map of locations with tags and features
  3. How reliable the reviews are

    • Difference between each user’s rating and the average rating of the restaurant
      • For each case, topic modeling of the review text for further analysis
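Steps 1 and 3 of the flow above could be prototyped roughly as follows. This is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, the `review` table and its three columns are a guessed schema, and the sample records are invented. The mean signed gap per user is one possible reading of "rating difference with the average rating", not a settled method.

```python
import json
import sqlite3

# Hypothetical sample records standing in for the review file
# (assumed format: one JSON object per line).
SAMPLE_LINES = [
    '{"user_id": "u1", "business_id": "b1", "stars": 5}',
    '{"user_id": "u2", "business_id": "b1", "stars": 4}',
    '{"user_id": "u3", "business_id": "b1", "stars": 3}',
    '{"user_id": "u1", "business_id": "b2", "stars": 5}',
    '{"user_id": "u2", "business_id": "b2", "stars": 1}',
]


def load_reviews(conn, lines):
    """Create a minimal review table (assumed schema) and fill it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review ("
        "user_id TEXT NOT NULL, "
        "business_id TEXT NOT NULL, "
        "stars INTEGER NOT NULL)"
    )
    records = (json.loads(line) for line in lines if line.strip())
    rows = [(r["user_id"], r["business_id"], r["stars"]) for r in records]
    conn.executemany("INSERT INTO review VALUES (?, ?, ?)", rows)
    conn.commit()


def user_deviation(conn):
    """Mean signed gap between each user's rating and the business average.

    The business average here includes the user's own rating.
    """
    query = """
        SELECT r.user_id, AVG(r.stars - b.avg_stars) AS mean_gap
        FROM review AS r
        JOIN (SELECT business_id, AVG(stars) AS avg_stars
              FROM review
              GROUP BY business_id) AS b
          ON b.business_id = r.business_id
        GROUP BY r.user_id
        ORDER BY r.user_id
    """
    return {user: gap for user, gap in conn.execute(query)}


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
load_reviews(conn, SAMPLE_LINES)
gaps = user_deviation(conn)
for user, gap in gaps.items():
    print(user, round(gap, 2))
```

A user whose gap is far from zero rates consistently above or below the crowd. Those are the cases where topic modeling of the review text might show whether the difference is justified.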
