Tag Archives: yelp

Opinion phrases

Opinion phrases was published on September 02, 2014 and last modified on September 02, 2014 by Vlad Sandulescu.


In "Predicting what user reviews are about with LDA and gensim" I played with extracting topics from short reviews and, given a new review, tried to predict the most probable topic(s) it can be associated with. LDA relies on a bag-of-words model, which is a very popular document representation approach. The model disregards word order in the documents as well as any syntactic dependencies between the words, i.e. any grammar. For a deeper read about the assumptions made by the LDA model, try to digest Blei's paper…if you dare!
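To make the bag-of-words point concrete, here is a minimal sketch, assuming plain whitespace tokenization and two made-up reviews. The helper `bag_of_words` is hypothetical and only for illustration; it is not part of gensim or of the post's code. It shows that two reviews with opposite meanings can collapse to the same representation once word order is thrown away.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token.

    Hypothetical helper for illustration: word order and grammar are
    discarded, only the token counts survive.
    """
    return Counter(text.lower().split())


# Two made-up reviews that use the same words in a different order.
review_a = "the pizza was great but the service was slow"
review_b = "the service was great but the pizza was slow"

bow_a = bag_of_words(review_a)
bow_b = bag_of_words(review_b)

# The sentences differ, but their bag-of-words representations do not.
print(review_a == review_b)  # False
print(bow_a == bow_b)        # True
```

This is the kind of information a model built on bag-of-words cannot see, which is why looking at phrases rather than isolated words can help with opinions.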


Predicting what user reviews are about with LDA and gensim

Predicting what user reviews are about with LDA and gensim was published on September 09, 2014 and last modified on September 09, 2014 by Vlad Sandulescu.



I was rather pleased with the response and feedback I received for my Opinion phrases prototype (code repository here). So yesterday I decided to rewrite my previous post on topic prediction for short reviews using Latent Dirichlet Allocation and its implementation in gensim.
I have previously worked with topic modeling for my MSc thesis, but there I used the Semilar toolkit and a looot of C# code. Having read many articles about gensim, I was itching to actually try it out.

8 Online Review Statistics Every Business Should Know About



There’s no denying the importance of online reviews. Consumers are using them more and more before, during, and after the buying process. From vetting a company or product to rating the customer service experience, consumers are using online reviews to publicly evaluate companies and products.

What does this mean for all of us? Simply put, we need to embrace and promote a culture that caters to the online review world.

Yelp new challenge dataset

Yelp published a new challenge dataset on February 9th.

Yelp Dataset Challenge is doubling up: Added 10 cities across 4 countries! 

The Challenge Dataset:

  • 1.6M reviews and 500K tips by 366K users for 61K businesses
  • 481K business attributes, e.g., hours, parking availability, ambience.
  • Social network of 366K users for a total of 2.9M social edges.
  • Aggregated check-ins over time for each of the 61K businesses

Cities:


  • U.K.: Edinburgh
  • Germany: Karlsruhe
  • Canada: Montreal and Waterloo
  • U.S.: Pittsburgh, Charlotte, Urbana-Champaign, Phoenix, Las Vegas, Madison

New challenge questions
