Tuesday, October 09, 2012

(a)R(hhh)...that's how it works!

I have done some SQL, HTML and CSS in the past...that's it for my coding. I have always thought it was not 'if' but 'when' I would again have to grapple with the viper-like serpent thing that is programming.

Well, I wrote that a few hours ago when I was stuck at a dead end, with my code going nowhere and throwing up an error Google could not help with. But you may be pleased to know things have progressed: largely at the suggestion of Tony Hirst (@psychemedia), I have persisted with RStudio and, from being quite confused, have been able to generate some quite exciting results.

What are R and RStudio? R is a programming language, with a core written largely in C and Fortran, geared towards the statistical and data analysis needs of researchers. RStudio is a user interface to R which, while it does not do away with coding, does simplify some of the more tedious processes.

It has been an interesting process, though the results in no way reflect my level of understanding of the code that generates them. And here's the thing...do you need to understand it? If you can appreciate the various processes, where the data calls are made and how you can refashion the process to collect a different set of data, is that enough?

On the web you will find more than snippets of code. I used this code submitted by Gaston Sanchez which, after a few false starts, proved able to collect up to 1500 tweets on a chosen subject (Starbucks in this case), then clean the feed of irrelevances like 'RT', and finally analyse relative favourability and emotional associations, both based on a fully trained Bayes classifier.
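To give a flavour of the cleaning step, here is a minimal sketch in base R. It is an illustration only, not Gaston Sanchez's code: the three sample tweets are made up, `clean_tweets` is a hypothetical helper name, and the exact patterns stripped out are assumptions about what counts as an irrelevance.

```r
# Made-up sample tweets, invented purely for illustration (not real data)
tweets <- c(
  "RT @coffeefan: Loving my Starbucks latte! http://example.com/abc",
  "Starbucks queue was SO long today :(",
  "RT @someone: free wifi at #Starbucks"
)

# Hypothetical helper: strips common Twitter noise from a character vector
clean_tweets <- function(x) {
  x <- gsub("\\bRT\\b", "", x, perl = TRUE)        # drop the retweet marker
  x <- gsub("@\\w+:?", "", x, perl = TRUE)         # drop @usernames and a trailing colon
  x <- gsub("http\\S+", "", x, perl = TRUE)        # drop links
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)     # drop remaining punctuation
  x <- tolower(x)                                  # normalise case
  x <- gsub("\\s+", " ", x, perl = TRUE)           # collapse runs of whitespace
  trimws(x)                                        # trim leading/trailing spaces
}

cleaned <- clean_tweets(tweets)
print(cleaned)
```

Text cleaned along these lines is what would then be handed on to a classifier; collecting the tweets and training the classifier are separate steps not shown here.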

The full code is not much short of a hundred lines, but is it understandable? Well, the syntax is challenging, but the instructions are clear enough and, once you have your head straight on RStudio, relatively straightforward to run. However, the issue will come when it does not quite do what's wanted. Then a lack of coding knowledge might become a problem! But the point to really push is that it has been possible to run some relatively complex processes, generating useful results, after only a few weeks.

















