From 0f0ff3d16ee9ebfaac26e8e899ae160ffe4cb855 Mon Sep 17 00:00:00 2001 From: dakkar Date: Fri, 12 Apr 2013 18:46:08 +0100 Subject: oyster docs --- src/SW/oyster/.gitignore | 2 + src/SW/oyster/document.en.rest.txt | 107 +++++++++++++++++++++++++++++++++++++ src/SW/oyster/du2html.xsl | 1 + 3 files changed, 110 insertions(+) create mode 100644 src/SW/oyster/.gitignore create mode 100644 src/SW/oyster/document.en.rest.txt create mode 120000 src/SW/oyster/du2html.xsl (limited to 'src/SW/oyster') diff --git a/src/SW/oyster/.gitignore b/src/SW/oyster/.gitignore new file mode 100644 index 0000000..08d33d4 --- /dev/null +++ b/src/SW/oyster/.gitignore @@ -0,0 +1,2 @@ +/document.en.du.xml +/document.it.du.xml diff --git a/src/SW/oyster/document.en.rest.txt b/src/SW/oyster/document.en.rest.txt new file mode 100644 index 0000000..99698ec --- /dev/null +++ b/src/SW/oyster/document.en.rest.txt @@ -0,0 +1,107 @@ +=========================== + Logging your Oyster trips +=========================== +:CreationDate: 2013-04-12 16:53:02 +:Id: SW/oyster +:tags: - perl + - software + +Since I moved to London at the start of 2010, I've been using a Oyster +card to pay for my journeys on buses and Underground trains. I've +bought annual travelcards, because they free me from having to think +about "topping up", and also because they seemed to be cost-effective +with respect to the "pay as you go" option. + +Then, about a year ago, I noticed that I could get a list of my recent +trips from the TfL website. It was thus necessary that I wrote a +program to save that data on a machine under my control. So I did. + +Storing the journeys +-------------------- + +Let's look at the `source for the |oyster| program`_. + +It uses a SQLite database to store both the actual journey data and +the credentials for the website account, so I don't have to write them +on the command line, and I don't need a separate configuration +file. The database, by default, is called ``oyster.db`` and resides in +the same directory as the script (line 16). Command-line options +(lines 18-31) allow to specify a different database, and to set the +credentials on the first run (lines 76-83). + +The check for the presence of the database (lines 39-51) is not +particularly smart: try to read data, and if it fails, assume the db +is new. The database schema (lines 54-75) is also very +straightforward: a table for the credentials, and a table for the +journeys. The columns for the latter come from the CSV that TfL +provides. + +To get to the CSV, the script needs to pretend to be a person browsing +the website, so we use ``WWW::Mechanize`` and log into the site (lines +85-103). After reaching the "Journey History" page (line 105), we +should click a link that runs some JavaScript, which is hard to do +without an actual browser. Instead, we extract the URL from the +JavaScript that would be run (lines 107-113), and download it directly +(lines 116-118). + +Now we only have to parse the CSV: we use ``Text::CSV_XS``, and +``DateTime::Format::Strptime`` for the times (lines 120-129). Looping +over all the lines, we convert the times into ``DateTime`` object, +adjusting for a few minor issues: bus journeys don't have end times, +since you don't "tap out" of buses (line 139); also, if you tap in +before midnight, but tap out after, the CSV still shows a single date, +but the end time will appear to be about 23 hours before the start +time (line 144-146). + +Inserting into the database is trivial (lines 149-158). Since we set a +``unique on conflict replace`` constraint on the ``journeys`` table, +we don't have to worry about inserting twice the same trip: SQLite +will take care of it. + +Using the data +-------------- + +As interesting as it is to just look back at where I've been, I found +a better use for all those database records: I wanted to check how +much I would have spent if I had not been using the annual travelcard. + +Calculating that is not exactly easy: the rules are convoluted, not +obvious to find, and probably not completely documented. Also, I may +have interpreted them wrong. Anyway, I wrote `another program`_ to +read the database and give me the total cost. + +Of course it uses the same defaults and command line options as the +other one (lines 130-148). + +The main execution flow loops over all the journeys (lines 174-200), +grouping them by day; for each day, we store the journey and some +aggregate information: the minimum and maximum zones travelled, if +there were any tube journeys, and if there were any journeys during +peak time. Parsing the "journey description" is annoying but simple +(lines 185-192); extracting the day of a journey is trickier (lines +15-24), since TfL considers a day to start at 4:30am. Similarly, a +journey is during peak time (lines 26-35) if it starts between 4:30am +and 9:30am. I could find no mention of an evening peak time on TfL's +website. Mapping a station to its zone should be easy, but TfL does +not provide a nice text file with this information, so I had to write +it by hand (lines 37-118) taking the data from Wikipedia; the program +currently does not account for those stations that, being on the +border between two zones, count differently depending on which side +you're arriving from. + +After that first loop, we compute the cost of each day (lines +257-287): we add the cost of each journey in the day, using the bus +fare or the tube fare (peak or non-peak) for the zones travelled +(fares on lines 202-254 taken for the website). If the sum for a day +is above the daily cap for the set of journeys (only bus, or non-peak +for the zone range travelled in the day, or peak &c), we cap +it. Adding all the daily costs gives us the total. + +I'll have to get a couple more months of data, since I moved from zone +3 to zone 5 last June, but it looks like that, for the journeys I +make, the annual travelcard is about 15% more expensive than the "pay +as you go" option. Either that, or I got my calculations wrong… + +.. |oyster| replace:: ``oyster`` +.. _`source for the |oyster| program`: http://www.thenautilus.net/cgit/oyster/tree/oyster +.. _`another program`: http://www.thenautilus.net/cgit/oyster/tree/calc-price diff --git a/src/SW/oyster/du2html.xsl b/src/SW/oyster/du2html.xsl new file mode 120000 index 0000000..e2487e0 --- /dev/null +++ b/src/SW/oyster/du2html.xsl @@ -0,0 +1 @@ +../../../templates/du2html.xsl \ No newline at end of file -- cgit v1.2.3