Text mining 8 - concept categories

In our last posting we introduced the main concepts of text mining and illustrated them with a customer service example: a telecoms company handling a complaint received through Facebook.  In this posting we consider a different application of text mining: analysing a large body of unstructured text.  For this we have taken data sets from the UK government’s Hansard system, which records the proceedings of the Houses of Parliament.  The example below is based on the speeches and questions of two Members of Parliament:

  • Nicholas Soames, a Conservative member (right of centre) and former Minister of Defence.
  • Dennis Skinner, a longstanding Labour member (left of centre).

We loaded the source files into SPSS Modeler’s text mining platform, parsed the text using Natural Language Processing (NLP) to identify prominent concepts (see previous posting), and then carried out some basic analysis of those concepts.

Nicholas Soames concepts

Let’s start with Nicholas Soames.  The most commonly occurring concepts identified are listed below, with “country” the most frequent.  The concept “immigration” occurred 40 times, so we explored it further.
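To give a feel for what a concept frequency count involves, here is a minimal Python sketch.  It is not how SPSS Modeler works internally: it simply treats every word that is not in a small, hypothetical stop-word list as a candidate concept and counts occurrences.  The sample sentences are invented for illustration and are not Hansard text.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real NLP engine uses far richer linguistic resources.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "is", "for", "must", "we", "our"}


def concept_counts(documents):
    """Count how often each candidate concept (any non-stop-word token) occurs."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts


# Invented sample sentences, not real Hansard text.
speeches = [
    "Immigration is a matter for the country.",
    "The country must fund defence.",
    "Defence of the country and immigration policy.",
]

counts = concept_counts(speeches)
for concept, n in counts.most_common(3):
    print(concept, n)
```

A ranked list like this is the starting point for deciding which concepts, such as “immigration”, deserve a closer look.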

 

We then created a concept map centred on “immigration”.  A concept map shows the strength of association between pairs of concepts.  In the case of “immigration”, the strongest associations are with “defence”, “society” and “social”.
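One simple way to put a number on “strength of association” is to look at how often two concepts appear in the same document.  The sketch below uses the Jaccard index (documents containing both concepts divided by documents containing either).  This measure is our assumption for illustration; we are not claiming it is the one SPSS Modeler uses.  The concept sets are invented.

```python
from collections import Counter
from itertools import combinations


def association_strengths(doc_concepts):
    """Return the Jaccard index for every pair of concepts that co-occur in a document."""
    doc_freq = Counter()   # number of documents containing each concept
    pair_freq = Counter()  # number of documents containing both concepts of a pair
    for concepts in doc_concepts:
        unique = sorted(set(concepts))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return {
        pair: both / (doc_freq[pair[0]] + doc_freq[pair[1]] - both)
        for pair, both in pair_freq.items()
    }


# Invented concept sets, one per document.
docs = [
    {"immigration", "defence", "society"},
    {"immigration", "defence"},
    {"immigration", "society"},
    {"defence", "budget"},
]

strengths = association_strengths(docs)

# Keep only the pairs that involve the concept at the centre of the map.
linked = {
    (set(pair) - {"immigration"}).pop(): score
    for pair, score in strengths.items()
    if "immigration" in pair
}
for concept, score in sorted(linked.items(), key=lambda item: -item[1]):
    print(f"immigration - {concept}: {score:.2f}")
```

In a concept map, scores like these would typically set the thickness of the line drawn between two concepts.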

 

Dennis Skinner concepts

One of the top concepts in Dennis Skinner’s comments is “pits”.  This is a good example of why understanding context is so important: here “pits” means deep coal mines, which matter for jobs in his constituency.  To illustrate the context issue a little more, what would you understand by “tiger woods”?  A well-known golfer, or something about a large cat in a jungle?

Using a domain dictionary, concepts can be grouped into categories.  Doing that with Nicholas Soames’ concepts and expanding further on his military concept, we see particularly strong links between the categories “human resources”, “finance” and “geographical location”.  If we go back to the relevant original texts, linked below, we might therefore expect to find the cost of having people in certain locations as a prominent theme.
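At its simplest, a domain dictionary is a lookup from concept to category.  The sketch below illustrates the idea in Python.  The dictionary entries and sample concepts are hypothetical; only the three category names come from the example above, and a real domain dictionary would be far larger and would handle synonyms and multi-word terms.

```python
from collections import defaultdict

# Hypothetical domain dictionary: concept -> category.
DOMAIN_DICTIONARY = {
    "soldiers": "human resources",
    "recruitment": "human resources",
    "budget": "finance",
    "cost": "finance",
    "afghanistan": "geographical location",
    "germany": "geographical location",
}


def group_by_category(concepts, dictionary=DOMAIN_DICTIONARY):
    """Group concepts under their dictionary category; unknown concepts go to 'uncategorised'."""
    grouped = defaultdict(list)
    for concept in concepts:
        category = dictionary.get(concept.lower(), "uncategorised")
        grouped[category].append(concept)
    return dict(grouped)


grouped = group_by_category(["soldiers", "cost", "Germany", "pits"])
for category, members in grouped.items():
    print(category, members)
```

Concepts that fall into “uncategorised”, like “pits” here, are a useful prompt to extend the dictionary for the domain in question.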

Considering how your business could benefit from text analytics?  If you’d like to discuss how Red Olive can help you with your text mining goals, please contact us here or by calling us on +44 1256 831100.