Author Archive

Microsoft Azure

Thursday, April 2nd, 2015

http://azure.microsoft.com/en-us/services/machine-learning/

I have not tried this yet, but it looks interesting to me.  It provides cloud-based access to machine learning models using a web browser.

From the link above:

Azure Machine Learning offers a streamlined experience for all data scientist skill levels, from setting up with only a web browser, to using drag and drop gestures and simple data flow graphs to set up experiments. Machine Learning Studio features a library of time-saving sample experiments, R and Python packages and best-in-class algorithms from Microsoft businesses like Xbox and Bing. Azure ML also supports R and Python custom code, which can be dropped directly into your workspace. Experiments are easily shared, so others can pick up where you left off.

Esoteric Python Filename

Thursday, January 8th, 2015

I had a strange but minor problem with Python and iPython on my system.  I have a source directory for each of my projects.  Whenever I used the Terminal to launch iPython or Python from that directory I received this error message:

Traceback (most recent call last):
  File "/Applications/Canopy.app/appdata/canopy-1.4.1.1975.macosx-x86_64/Canopy.app/Contents/lib/python2.7/site.py", line 75, in <module>
    import os
  File "/Applications/Canopy.app/appdata/canopy-1.4.1.1975.macosx-x86_64/Canopy.app/Contents/lib/python2.7/os.py", line 49, in <module>
    import posixpath as path
  File "/Applications/Canopy.app/appdata/canopy-1.4.1.1975.macosx-x86_64/Canopy.app/Contents/lib/python2.7/posixpath.py", line 17, in <module>
    import warnings
  File "warnings.py", line 9, in <module>
    from osi import *
  File "osi.py", line 7, in <module>
    import pandas as pd
ImportError: No module named pandas

This confused me, as I didn’t use a profile or startup script.  I devised a workaround where I launched iPython from another directory and did a cd from within iPython, but this was unsatisfying.  Today I decided to actually read the error message in detail.  It turns out that I had a local script named “warnings.py” as part of my project, and it was shadowing the standard warnings module that the os module imports indirectly (through posixpath).  When the interpreter is launched interactively, the current directory comes first on the module search path, so my file was found before the standard one.  I renamed my local script and that fixed the problem.  I know it’s a bad idea to reuse standard module names, but I didn’t realize I was doing so.
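
A quick check along these lines could have caught the problem earlier. This is only a sketch in Python 3 (the post above was on Python 2.7), and the helper names stdlib_module_names and find_shadowing_files are made up for illustration. It lists the .py files in a directory whose names collide with a standard library module.

```python
import os
import sys
import sysconfig


def stdlib_module_names():
    """Return a set of standard library module names for this interpreter."""
    names = getattr(sys, "stdlib_module_names", None)  # available on Python 3.10+
    if names is not None:
        return set(names)
    # Fallback for older Python 3: scan the standard library directory.
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    found = set(sys.builtin_module_names)
    for entry in os.listdir(stdlib_dir):
        stem, ext = os.path.splitext(entry)
        if ext == ".py" or os.path.isdir(os.path.join(stdlib_dir, entry)):
            found.add(stem)
    return found


def find_shadowing_files(directory="."):
    """List .py files in `directory` that share a name with a stdlib module."""
    known = stdlib_module_names()
    hits = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        if ext == ".py" and stem in known:
            hits.append(entry)
    return hits
```

Calling find_shadowing_files() from a project directory returns the offending file names, if any. A directory containing warnings.py and osi.py would report only warnings.py, since osi is not a standard module.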

Google BigQuery APIs

Tuesday, September 30th, 2014

As part of my consulting assignment I need to download geopolitical event data from the GDELT Project. This data is available as a public dataset on Google’s BigQuery. There are APIs for BigQuery in Python and R. Google provides a tutorial for BigQuery using Python or Java. Hadley Wickham has written an R API that is available from GitHub (it’s not on CRAN yet).

It is very helpful to be able to access these data sources programmatically from R or Python. One little pitfall I encountered using the Python API with OAuth 2.0: the authentication request returned an error telling me that I needed a project name. Since that isn’t part of client_secret.json, I was confused. It turns out that one has to set the project name on the Consent screen portion of the Google Developers Console. After doing that I was able to obtain a credential for my application.
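
For readers who want a starting point, here is a rough sketch of pulling a few GDELT rows from Python. It is not the OAuth client_secret.json flow described above: it assumes the newer google-cloud-bigquery client library with Application Default Credentials already configured. The table name gdelt-bq.full.events, the column names, and the helper names are assumptions on my part, and the project ID must be your own.

```python
# Assumed name of the public GDELT events table on BigQuery.
GDELT_EVENTS_TABLE = "gdelt-bq.full.events"


def build_events_query(year, limit=10):
    """Build a small standard-SQL query against the (assumed) GDELT events table."""
    year = int(year)    # coerce to int so only numbers reach the SQL text
    limit = int(limit)
    return (
        "SELECT SQLDATE, Actor1CountryCode, Actor2CountryCode, EventCode, GoldsteinScale\n"
        f"FROM `{GDELT_EVENTS_TABLE}`\n"
        f"WHERE Year = {year}\n"
        f"LIMIT {limit}"
    )


def fetch_events(project_id, year, limit=10):
    """Run the query and return the rows as a list of dicts.

    Requires the google-cloud-bigquery package and working credentials.
    The query is billed to `project_id`, so a project must be specified.
    """
    # Imported here so build_events_query can be used without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    rows = client.query(build_events_query(year, limit)).result()
    return [dict(row) for row in rows]
```

A call such as fetch_events("my-project-id", 2014) would run the query against the placeholder project "my-project-id". As with the pitfall above, the request needs a project to be associated with it.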

ggplot2 Color Consistency

Tuesday, February 11th, 2014

I use the ggplot2 package in R for graphics in the analysis I perform for my clients.  In my current project I am analyzing the performance of three systems as compared to a base rate model.  Some of the charts include both the base rate and a ground truth series, some exclude the base rate, and some exclude the ground truth.  Because the default colors follow the alphabetical ordering of the names for the systems, base rate, and ground truth that appear on a chart, the color for a given system may differ from chart to chart.  It’s a minor thing, but consistency is better than inconsistency.  The answer is to use the scale_colour_manual and scale_fill_manual functions in ggplot2 to set the colors explicitly.  I have several lists of color settings, one for each combination that I wish to plot, in which the color for a given system is the same across all of them.  Then, when I want to use the color settings, I add a term like “+ scale_colour_manual(values=allSystemColors)” to my ggplot command, where allSystemColors is one of those lists.
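
The bookkeeping behind those lists can be sketched as follows. I have written it in Python to match the other examples on this page, although the charts themselves are made in R. The series names, colors, and the helper palette_for are placeholders, not the ones from my project. The idea is to keep one master mapping from series name to color and derive each chart's palette from it, instead of maintaining several lists by hand.

```python
# Hypothetical master mapping: every series that can appear on any chart.
MASTER_COLORS = {
    "Base Rate": "grey50",
    "Ground Truth": "black",
    "System A": "firebrick",
    "System B": "steelblue",
    "System C": "darkgreen",
}


def palette_for(series_names, master=MASTER_COLORS):
    """Return the name-to-color mapping for just the series on one chart."""
    missing = [name for name in series_names if name not in master]
    if missing:
        raise KeyError("no color defined for: " + ", ".join(missing))
    return {name: master[name] for name in series_names}
```

Whichever subset is plotted, "System A" always maps to the same color. In R the equivalent would be a named character vector, subset by name and passed as values= to scale_colour_manual.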

I found the page <http://sape.inf.usi.ch/quick-reference/ggplot2/colour> to be a handy reference to visually link the name of the color to the color itself.