Garmin TCX to KML: the Prelude (splitting my huge exported exercise file)

February 19, 2008

TCX is Garmin’s proprietary file format for logging exercise information. Here is a snippet:

<Activity Sport="Running">
<Id>2008-01-26T18:29:26Z</Id>
<Lap StartTime="2008-01-26T18:29:26Z">
<TotalTimeSeconds>6049.690000</TotalTimeSeconds>
<DistanceMeters>10347.431641</DistanceMeters>
<MaximumSpeed>6.847500</MaximumSpeed>
<Calories>1386</Calories>
<AverageHeartRateBpm xsi:type="HeartRateInBeatsPerMinute_t">
<Value>121</Value>
</AverageHeartRateBpm>
<MaximumHeartRateBpm xsi:type="HeartRateInBeatsPerMinute_t">
<Value>165</Value>
</MaximumHeartRateBpm>
<Intensity>Active</Intensity>
<Cadence>0</Cadence>
<TriggerMethod>Manual</TriggerMethod>
<Track>
<Trackpoint>
<Time>2008-01-26T18:29:27Z</Time>
<Position>
<LatitudeDegrees>47.297868</LatitudeDegrees>
<LongitudeDegrees>-121.287557</LongitudeDegrees>
</Position>
<AltitudeMeters>757.656250</AltitudeMeters>
<DistanceMeters>0.000000</DistanceMeters>
<HeartRateBpm xsi:type="HeartRateInBeatsPerMinute_t">
<Value>75</Value>
</HeartRateBpm>
<SensorState>Absent</SensorState>
</Trackpoint>
<Trackpoint>
...
</Track>
</Lap>
<Creator xsi:type="Device_t">
<Name>Forerunner305</Name>
<UnitId>3322440126</UnitId>
<ProductID>484</ProductID>
<Version>
<VersionMajor>2</VersionMajor>
<VersionMinor>40</VersionMinor>
<BuildMajor>0</BuildMajor>
<BuildMinor>0</BuildMinor>
</Version>
</Creator>
</Activity>

KML is Google’s file format for displaying geodata. Here is a snippet of a path that is overlaid on a map:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
  <Document>
    <name>Paths</name>
    <description>Examples of paths. Note that the tessellate tag is by default
      set to 0. If you want to create tessellated lines, they must be authored
      (or edited) directly in KML.</description>
    <Style id="yellowLineGreenPoly">
      <LineStyle>
        <color>7f00ffff</color>
        <width>4</width>
      </LineStyle>
      <PolyStyle>
        <color>7f00ff00</color>
      </PolyStyle>
    </Style>
    <Placemark>
      <name>Absolute Extruded</name>
      <description>Transparent green wall with yellow outlines</description>
      <styleUrl>#yellowLineGreenPoly</styleUrl>
      <LineString>
        <extrude>1</extrude>
        <tessellate>1</tessellate>
        <altitudeMode>absolute</altitudeMode>
        <coordinates> -112.2550785337791,36.07954952145647,2357
          -112.2549277039738,36.08117083492122,2357
          -112.2552505069063,36.08260761307279,2357
          -112.2564540158376,36.08395660588506,2357
          -112.2580238976449,36.08511401044813,2357
          -112.2595218489022,36.08584355239394,2357
          -112.2608216347552,36.08612634548589,2357
          -112.262073428656,36.08626019085147,2357
          -112.2633204928495,36.08621519860091,2357
          -112.2644963846444,36.08627897945274,2357
          -112.2656969554589,36.08649599090644,2357
        </coordinates>
      </LineString>
    </Placemark>
  </Document>
</kml>

In order to display geodata, I need to convert the geolocation-specific part of TCX to KML. Fortunately, this guy had run into this issue before, and provided some XSLT to do the job here: http://www.oe-files.de/ge/tcx2kml.xsl. Thanks, Jorn, and sorry about the missing umlaut on your name; my codepage foo is not what it should be.

Unfortunately, when I export data from my Mac-based Garmin Training Center, I get over a year’s worth of information. There is no way in this program to export a day, a week, or a month. So my first task is to break this huge a** file into digestible chunks. I’m opting to break it out by activity right now; maybe later I can break it out by time.

I thought about the quickest way to do this; after all, I’m not in the mood to do anything laborious after putting the kids to bed. I’ve written SAX parsers before, and I’m way too lazy to keep around a bunch of state I need to refer to whenever I get a ‘tag encountered’ event. Plus, I had a sneaking suspicion that sed or something sed-like would do the job using regex. One of my mentors used to tell me ‘Arun, you think you’re really smart and you go around inventing all of these rounder wheels. Why don’t you just take the time to read a couple of man pages?’ He went on to say that those man pages were written by much smarter people than he or I, which really used to piss me off 🙂

Turns out csplit does an admirable job of splitting out files based on context that matches a specific regex. There are a few ‘gotchas’.

(1) Put your regex in quotes, otherwise it will be interpreted by the command shell. This _really_ sucks when using XML tag syntax in your regex, i.e. /<Activity Sport=.*>/ gets interpreted as a pair of redirection operators (< and >) with arbitrary characters between them.

(2) csplit can execute at most 100 times; it creates files in xx00 – xx99 format by default. You can change the prefix, but as far as I could tell, not the limit. For any XML file with > 100 sections of extractable XML, this poses a problem.

(3) If you don’t specify -k (keep written files on error) and the file has fewer than 100 matching sections, csplit treats the shortfall as an error and erases all the files it wrote for that run.
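When the command is assembled by a script rather than typed by hand, gotcha (1) can be sidestepped by letting a library do the quoting. The following is a minimal Ruby sketch using the standard shellwords library; the file name is made up for illustration, and the csplit arguments are the ones used in this post.

```ruby
require 'shellwords'

# input_file is a hypothetical name, used only for illustration.
input_file = 'exercise.tcx'
args = ['csplit', '-k', '-f', 'act', input_file, '/<Activity Sport=.*>/', '{100}']

# shelljoin escapes every argument, so the shell hands the regex to csplit
# untouched instead of treating < and > as redirections.
command = args.shelljoin
puts command
```

The resulting string can be run with backticks just like a hand-quoted command; splitting it back apart with Shellwords.split gives the original argument list.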

My version of csplit that split out the chunks:
csplit -k -f act exercise.out.tcx '/<Activity Sport=.*>/' {100}

This seems like a great time to actually write some code (as opposed to writing a SAX parser): I need to drive csplit until there are no more <Activity> tags to individually extract. Ruby has become my scripting language of choice lately, primarily because I can maintain it over time, and also because of irb, the Ruby command-line shell, which allows me to ‘test drive’ commands I want to eventually put into a script.

csplit writes out the number of bytes in each created file to stdout, which we can take advantage of:

ret = `csplit -f act input.tcx '/<Activity Sport=.*>/' {100}`
This puts a newline-delimited set of byte counts, one per output file, into ret. The output files all start with ‘act’ and range from 00 to 99.

vals = ret.split

if(vals.length == 100)

allows us to see if we have more work to do, i.e. all 100 files (act00 through act99) have been created. We take the last file, act99, copy it to a new directory to start over, and repeat until vals.length < 100:


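# Assumes input_file, count, and gen_new_dir are defined earlier in the script.
newdir = Dir.pwd
continue = true

while continue
  # Run csplit in the current directory; it prints one byte count per file written.
  puts "splitting files by <Activity> tag in #{newdir}..."
  ret = `csplit -k -f act #{input_file} '/<Activity Sport=.*>/' '{100}'`
  vals = ret.split

  if vals.length == 100
    # Every slot was used, so act99 may still hold several activities:
    # copy it into a fresh directory and split it again there.
    count += 1
    newdir = "../#{gen_new_dir(count)}"
    puts "creating #{newdir}/#{input_file}"
    Dir.mkdir(newdir) unless File.exist?(newdir)
    `cp act99 #{newdir}/#{input_file}`
    Dir.chdir(newdir)
  else
    continue = false
  end
end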
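For comparison, the same kind of context split can be sketched in plain Ruby with no file-count limit. This is not the csplit-driven approach described above, just an illustration of the splitting rule: start a new chunk at every line that matches the <Activity Sport=...> regex. The method name split_by_activity and the sample input are made up for the example.

```ruby
# Start a new chunk at every line matching the <Activity ...> opening tag,
# the same rule csplit applies with '/<Activity Sport=.*>/'.
def split_by_activity(text)
  text.each_line.slice_before { |line| line =~ /<Activity Sport=.*>/ }.map(&:join)
end

# Tiny made-up input standing in for a real export.
sample = <<~TCX
  <Activities>
  <Activity Sport="Running">
  <Id>2008-01-26T18:29:26Z</Id>
  </Activity>
  <Activity Sport="Biking">
  <Id>2008-01-27T09:00:00Z</Id>
  </Activity>
  </Activities>
TCX

chunks = split_by_activity(sample)

# Report a name and a byte count per chunk, as csplit does for its output files.
chunks.each_with_index do |chunk, i|
  puts format("act%02d: %d bytes", i, chunk.bytesize)
end
```

As with csplit, the first chunk holds whatever precedes the first activity, and the last chunk picks up the trailing closing tag, so the pieces are still not standalone TCX documents.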

What is left: take these files and see if the XSLT code above works with them or pukes. These are not standard TCX files anymore, so I’m not expecting much love. Also, extracting KML is only one part of what I want to do with these files; showing heart rate vs. distance vs. altitude, etc. is also something that isn’t super well done in the existing freeware.


Why url mapping sucks in Java Servlet land, and what I did about it.

February 13, 2008

This is more of ‘notes to self’ (like anyone else actually reads this!) than anything else. I bounce around so much at my current job that I forget everything and have to figure it out again. One such example: Servlets. I’ve only written servlets when absolutely necessary, i.e. when I’ve had to prototype a service and didn’t really care about what paths were coming in, how the web app was deployed, etc. So I always go through a bit of a learning curve when working with Servlets, because I usually have forgotten everything I know about them.

I am working on converting a set of services that offer POX over HTTP (see this example) into something more RESTful. I’ll spare you the RESTafarian evangelism and just say that my life became much easier once I started thinking of infinite resources constrained by a (very) finite set of verbs. As the number of brain cells I kill increases, I have had to put my remaining ones to work figuring out how to be as effective as I was back in the day, when I had brainpower to spare.

As part of that assignment, we have had to think about combining separate services into a single, meaningful, easy-to-grasp API. I will say that thinking in resources helps here, because it’s easy to have a resource Foo that has sub-resources Bar, Star, and Var, and request those resources as /foo/bar, etc. But mapping that elegant and simple layer to a substrate of what are basically RPC calls has taken some thought.

One thing we decided to do was to combine all services that currently reside in separate WARs into one web app. The original goal was to have this web app be a very simple shell, and let web.xml route messages to specific services. All was good, less code was to be written, and we were supposed to live happily ever after... except that in order to map objects to messages, we would end up routing requests from path /foo/bar to servlet X and requests from /foo/bar/something to servlet Y. This is because unlike the happy world of self-contained RESTful resources, our services actually provide different kinds of functionality for the same resources. But we really want to fake ‘resourceful’ness.

The thing about web.xml servlet mapping is that it is limited to hierarchical path mapping, i.e.

map /foo/bar/star/* to servlet x

map /foo/glar/* to servlet y

map *.bat to servlet z

You cannot take /foo/bar/star/mar and map it to servlet z if you’ve already mapped /foo/bar/star/* to servlet x. So you can’t mix and match path hierarchies to servlets.

The solution, after not much time spent browsing the Servlet spec (good read, btw), is to use the built-in RequestDispatcher object created from the ServletContext. In web.xml, we mapped all of our service servlets to private paths that would never be called from the client API:


<servlet-mapping>
<servlet-name>ServiceX</servlet-name>
<url-pattern>/service_x/*</url-pattern>
</servlet-mapping>

<servlet-mapping>
<servlet-name>ServiceY</servlet-name>
<url-pattern>/service_y/*</url-pattern>
</servlet-mapping>

<servlet-mapping>
<servlet-name>ServiceZ</servlet-name>
<url-pattern>/service_z/*</url-pattern>
</servlet-mapping>

<servlet-mapping>
<servlet-name>Default</servlet-name>
<url-pattern>/</url-pattern>
</servlet-mapping>

Note that I’ve got a Default servlet catching all requests, because the paths above aren’t exposed in documentation (even if they are hit, they resolve to no ops). In the Default servlet init method, we created request dispatchers for all servlets that we had specified:


_servletXDispatcher = this.getServletContext().getRequestDispatcher("/service_x/*");
_serviceYDispatcher = this.getServletContext().getRequestDispatcher("/service_y/*");
_serviceZDispatcher = this.getServletContext().getRequestDispatcher("/service_z/*");

Note that in order to get valid request dispatchers, we had to pass the paths exactly as they are mapped in web.xml.

Now we have RequestDispatchers, which can forward requests on to servlets:
_servletXDispatcher.include(httpRequest,httpResponse);

However, I still needed a way to map partial paths to different request dispatchers. I ended up creating an ObjectMatcher class that regex-matched incoming strings to specified objects:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Maps regex patterns to objects of type T (here, request dispatchers).
public class ObjectMatcher<T> {

    private final Map<Pattern, T> _patternsMatchServlets;

    public ObjectMatcher() {
        _patternsMatchServlets = new HashMap<Pattern, T>();
    }

    // Compile each regex key and store it with its target object.
    public void load(Map<String, T> servletMap) {
        for (String key : servletMap.keySet()) {
            _patternsMatchServlets.put(Pattern.compile(key), servletMap.get(key));
        }
    }

    // Return the object whose pattern is found anywhere in the URI, or null if none is.
    public T match(String uriPattern) {
        for (Pattern pattern : _patternsMatchServlets.keySet()) {
            Matcher match = pattern.matcher(uriPattern);
            if (match.find()) {
                return _patternsMatchServlets.get(pattern);
            }
        }
        return null;
    }
}

The load method in this object takes a map of regex values to objects. It then compiles the regex values into pattern objects. The match method uses those regexes to match against inbound strings, and returns the appropriate object, or null if an object isn’t found.

I loaded this object with a map of regex values to objects as follows:

Map<String, RequestDispatcher> rdMap = new HashMap<String, RequestDispatcher>();

// TODO: put all new paths for client API here.
rdMap.put(".*/entities.*", _queryReqDispatcher);
rdMap.put(".*/media.*", _queryReqDispatcher);
rdMap.put(".*/actions.*", _queryReqDispatcher);
rdMap.put("person/.*", _entityReqDispatcher);
rdMap.put("popular/.*", _zgReqDispatcher);
rdMap.put("media/.*", _zgReqDispatcher);

_matcher.load(rdMap);

and called it from my default servlet’s doGet method like this:

public void doGet( HttpServletRequest request, HttpServletResponse response )
throws ServletException, IOException {
dispatch(request,response);

}

to get (fairly) pain-free routing in a central location.
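A pattern table like this is easy to sanity-check in irb before it goes into the servlet. The following is a rough Ruby sketch of the same first-match idea, not the Java code itself: the symbols are hypothetical stand-ins for the RequestDispatcher objects, and match? is unanchored in the same way Matcher.find() is. One thing the sketch makes explicit is ordering: a Java HashMap makes no promise about iteration order, so when two patterns can match the same path, an ordered list is one way to pin down which one wins.

```ruby
# Ordered table of [pattern, target] pairs. The symbols are hypothetical
# stand-ins for the dispatcher objects.
ROUTES = [
  [%r{.*/entities.*}, :query],
  [%r{.*/media.*},    :query],
  [%r{person/.*},     :entity],
  [%r{popular/.*},    :zg],
].freeze

# Return the target of the first pattern found anywhere in the path, or nil.
def route(path)
  pair = ROUTES.find { |pattern, _target| pattern.match?(path) }
  pair && pair[1]
end

puts route("/api/entities/42").inspect
puts route("/person/arun").inspect
puts route("/unknown").inspect
```

Trying a handful of real client paths against the table this way shows quickly whether a pattern is looser than intended.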


Is BDD the new TDD? Adventures with RSpec

February 5, 2008

At Evri, I have the privilege of working with people who make it their business to write software in the most productively lazy way possible; by that I mean they strenuously avoid making rounder wheels. So when I see one of them start to use a new technology, I can only conclude that the technology must be making their (coding) life easier.

The first time I heard about rspec was when Phil Hagelberg had a practice run of his RailsConf presentation ‘tightening the feedback loop’ at one of our brown bags. He mentioned rspec along with rcov and flog. While rcov and flog struck me as having immediate value, I wasn’t so convinced about rspec. After all, isn’t that what Test::Unit is for?

At the time I was head deep in some Java code and couldn’t quite get to trying out rspec. When I surfaced, I felt very complicated and was happy to dive back into Ruby. However I had still forgotten about rspec and was still doing ‘old school TDD’ until I noticed that Travis, a notoriously ‘lazy bastard’, had completely switched over to rspec.

So I tried it, still skeptical. The whole ‘BDD vs TDD’ thing still confuses me; it’s like two people arguing whether chartreuse is really yellow or green. The whole point is to specify your expectations first, right?

My skepticism quickly faded as I started to use rspec. The best thing I can say about rspec is that it makes writing tests first so much easier. I believe it’s due to the DSL. Using rspec let me focus on what I wanted my class to do in a way that felt much more natural than writing tests for specific failure conditions. Instead of saying

require 'test/unit'

class FooTest < Test::Unit::TestCase
  def test_valid_foo_returned
    class_under_test = ClassUnderTest.new
    foo = class_under_test.method1
    assert(foo != nil)
    assert_equal("Foo", foo.class.to_s)
  end
end

I would instead say

describe ClassUnderTest do
  it 'should return a valid object Foo from method1' do
    class_under_test = ClassUnderTest.new
    foo = class_under_test.method1
    foo.should_not eql(nil)
    foo.class.to_s.should eql("Foo")
  end
end

I think a lot of people would look at the two code snippets above and think ‘chartreuse’. I know that’s what I was thinking. So what is the big deal?

First: the DSL lets me express my expectations about how the class under test behaves, using should and should_not. What I found is that tests tend to write themselves, and then Red/Green testing enables me to write the smallest amount of code to get past each line. In the example above, the description ‘it should return a valid object foo from method1’ allows me to stay clear about what I’m expecting from method1.

Compare that to the standard Test::Unit approach. The test that I wrote above does the same thing as the rspec code, but it doesn’t reinforce the fact that I’m testing a specific class and expecting specific behaviors. It may validate those behaviors, but I still have to go through a layer of translation, figuring out what each assertion really means, in order to understand the test.

That extra layer of translation makes Test::Unit start to feel heavier and slower because I’m still translating what I want to test into a test method, instead of having a method help me outline desired behavior and expectations. The extra layer is more energy I have to expend to use and maintain the test: energy that I could be using to write code, energy that I will not want to expend when I’m under deadline pressure.

I’m still playing around with rspec. I’m a newbie, and am still getting used to the way other users avoid fixtures, how and when to use mocks and stubs, and which is which, but so far I think it has gone a long way toward keeping my coding restricted to fulfilling expectations and nothing more.

So life is easier with little effort — this is something that makes me extremely happy. Again, I don’t know whether to call it BDD, or TDD, or Fred, but this approach is working for me and I don’t care to debate the nuances. That said, I will continue to educate myself about the nuances and hope that some kind of enlightenment occurs 🙂

I will continue to explore rspec and other tools that make coding/maintenance easier. Specifically, I’m curious about:

  1. whether a story is an analog of a Test::Unit::TestSuite
  2. how matchers work — when I need them, etc.
  3. when to use a mock — when do I know that a real object is too painful/expensive? I’m not sure it makes sense to mock the model layer b/c I get implicit model layer testing when I use it, and if the model layer changes, my tests will (appropriately) break.