
Microsoft Azure

Thursday, April 2nd, 2015

I have not tried Azure Machine Learning yet, but it looks interesting. It provides cloud-based access to machine learning models through a web browser.

From Microsoft's description:

Azure Machine Learning offers a streamlined experience for all data scientist skill levels, from setting up with only a web browser, to using drag and drop gestures and simple data flow graphs to set up experiments. Machine Learning Studio features a library of time-saving sample experiments, R and Python packages and best-in-class algorithms from Microsoft businesses like Xbox and Bing. Azure ML also supports R and Python custom code, which can be dropped directly into your workspace. Experiments are easily shared, so others can pick up where you left off.

Google BigQuery APIs

Tuesday, September 30th, 2014

As part of my consulting assignment I need to download geopolitical event data from the GDELT Project. This data is available as a public dataset on Google's BigQuery, and there are APIs for BigQuery in both Python and R. Google provides a BigQuery tutorial for Python and Java, and Hadley Wickham has written an R package, bigrquery, that is available from GitHub (it's not on CRAN yet).
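Here is a minimal sketch of pulling GDELT events from Python. It uses the google-cloud-bigquery client library, which is not necessarily the API the tutorial above describes, and it assumes the events live in the public table gdelt-bq.full.events with an integer SQLDATE column in YYYYMMDD form; check the table name and schema in the BigQuery console before relying on it. The helper names (build_gdelt_query, to_sqldate, fetch_events) are hypothetical.

```python
"""Sketch: query GDELT events on BigQuery with the google-cloud-bigquery client."""
from datetime import date

# Assumed public table name; GDELT publishes several tables under the gdelt-bq project.
GDELT_EVENTS_TABLE = "gdelt-bq.full.events"


def build_gdelt_query(limit: int = 100) -> str:
    """Return a parameterized Standard SQL query for GDELT events in a date range.

    SQLDATE is assumed to be an integer in YYYYMMDD form, so the bounds are
    supplied later as INT64 query parameters named @start and @end.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT SQLDATE, Actor1Name, Actor2Name, EventCode, GoldsteinScale "
        f"FROM `{GDELT_EVENTS_TABLE}` "
        "WHERE SQLDATE BETWEEN @start AND @end "
        f"LIMIT {int(limit)}"
    )


def to_sqldate(d: date) -> int:
    """Convert a date to GDELT's integer YYYYMMDD representation."""
    return d.year * 10000 + d.month * 100 + d.day


def fetch_events(project: str, start: date, end: date, limit: int = 100) -> list[dict]:
    """Run the query and return rows as dicts (needs credentials and network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)  # this project is billed for the query
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start", "INT64", to_sqldate(start)),
            bigquery.ScalarQueryParameter("end", "INT64", to_sqldate(end)),
        ]
    )
    rows = client.query(build_gdelt_query(limit), job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

Usage, with a hypothetical billing project: events = fetch_events("my-billing-project", date(2014, 9, 1), date(2014, 9, 30)). Passing the dates as query parameters rather than formatting them into the SQL keeps the query text fixed and avoids quoting mistakes.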

It is very helpful to be able to access these data sources programmatically from R or Python. One small pitfall I encountered using the Python API with OAuth 2.0: the authentication request returned an error telling me that I needed a project name. Since the project name wasn't part of client_secret.json, I was confused. It turns out that you have to set the project name in the consent screen section of the Google Developers Console. After doing that I was able to obtain a credential for my application.
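The sketch below shows the installed-application OAuth 2.0 flow using the google-auth-oauthlib package, which is an assumption on my part and not necessarily the library the tutorial uses. The hypothetical check_client_secrets helper verifies the structure of client_secret.json before a browser window opens. Note that it cannot catch the pitfall above: the project name is configured on the consent screen in the console, not in the file.

```python
"""Sketch: sanity-check client_secret.json, then run the installed-app OAuth 2.0 flow."""
import json
from pathlib import Path

# BigQuery scope; narrow or widen it to suit your application.
BIGQUERY_SCOPES = ["https://www.googleapis.com/auth/bigquery"]

# Fields the OAuth flow needs from the client section of the file.
REQUIRED_KEYS = ("client_id", "client_secret", "auth_uri", "token_uri")


def check_client_secrets(path: str) -> str:
    """Return the client type ("installed" or "web") or raise ValueError.

    This only checks the file's structure. Settings made in the console,
    such as the consent screen's project name, are not visible here.
    """
    data = json.loads(Path(path).read_text())
    for client_type in ("installed", "web"):
        if client_type in data:
            missing = [k for k in REQUIRED_KEYS if k not in data[client_type]]
            if missing:
                raise ValueError(f"{path}: missing {', '.join(missing)}")
            return client_type
    raise ValueError(f"{path}: expected an 'installed' or 'web' section")


def get_credentials(path: str = "client_secret.json"):
    """Open a browser for user consent and return OAuth 2.0 credentials."""
    # Imported lazily so check_client_secrets works without the package installed.
    from google_auth_oauthlib.flow import InstalledAppFlow

    check_client_secrets(path)
    flow = InstalledAppFlow.from_client_secrets_file(path, scopes=BIGQUERY_SCOPES)
    return flow.run_local_server(port=0)  # port=0 picks any free local port
```

The returned credentials can then be handed to the BigQuery client, for example bigquery.Client(project="my-billing-project", credentials=creds), where the project ID is again hypothetical.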

ggplot2 Color Consistency

Tuesday, February 11th, 2014

I use the ggplot2 package in R for the graphics in the analyses I perform for my clients. In my current project I am analyzing the performance of three systems compared to a base rate model. Some of the charts include both the base rate and a ground truth series, some exclude the base rate, and some exclude the ground truth. By default, ggplot2 assigns colors in the order of the factor levels, which are alphabetical, so when a series is dropped from a chart the remaining systems can end up with different colors than on other charts. It's a minor thing, but consistency is better than inconsistency.

The answer is to set the colors explicitly with the scale_colour_manual and scale_fill_manual functions in ggplot2. I have several lists of color settings, one for each combination of series that I wish to plot, in which a given system always has the same color. Then, when I want to use a color setting, I add a term like "+ scale_colour_manual(values = allSystemColors)" to my ggplot call.
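The idea is independent of R, so here is a small Python sketch that contrasts the two approaches: assigning colors by alphabetical position, which shifts when a series is dropped, versus looking each series up by name in one fixed mapping, which is what a named values vector in scale_colour_manual achieves. The series names and hex colors are hypothetical placeholders.

```python
"""Sketch: positional vs. name-keyed color assignment across charts."""

# Hypothetical series names and colors; substitute your own.
ALL_SYSTEM_COLORS = {
    "Base rate": "#999999",
    "Ground truth": "#000000",
    "System A": "#E69F00",
    "System B": "#56B4E9",
    "System C": "#009E73",
}

# A generic palette used to illustrate position-based assignment.
PALETTE = ["#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E"]


def positional_colors(series: list[str]) -> dict[str, str]:
    """Assign palette colors by alphabetical position, so colors depend on which series are present."""
    names = sorted(series)
    if len(names) > len(PALETTE):
        raise ValueError("more series than palette colors")
    return {name: PALETTE[i] for i, name in enumerate(names)}


def manual_colors(series: list[str], mapping: dict[str, str] = ALL_SYSTEM_COLORS) -> dict[str, str]:
    """Look each series up by name, so its color never depends on the other series."""
    missing = [s for s in series if s not in mapping]
    if missing:
        raise KeyError(f"no color defined for: {missing}")
    return {name: mapping[name] for name in series}
```

Dropping "Base rate" moves "System A" from the third palette slot to the second under positional_colors, while manual_colors returns the same color for it on every chart. Keeping one master mapping also means a new chart combination needs no new color list, only a subset of names.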

I found the page <> to be a handy reference to visually link the name of the color to the color itself.