src/SW/oyster/document.en.rest.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107

===========================
 Logging your Oyster trips
===========================
:CreationDate: 2013-04-12 16:53:02
:Id: SW/oyster
:tags: - perl
       - software
 
Since I moved to London at the start of 2010, I've been using a Oyster
card to pay for my journeys on buses and Underground trains. I've
bought annual travelcards, because they free me from having to think
about "topping up", and also because they seemed to be cost-effective
with respect to the "pay as you go" option.
 
Then, about a year ago, I noticed that I could get a list of my recent
trips from the TfL website. It was thus necessary that I wrote a
program to save that data on a machine under my control. So I did.
 
Storing the journeys
--------------------
 
Let's look at the `source for the |oyster| program`_.
 
It uses a SQLite database to store both the actual journey data and
the credentials for the website account, so I don't have to write them
on the command line, and I don't need a separate configuration
file. The database, by default, is called ``oyster.db`` and resides in
the same directory as the script (line 16). Command-line options
(lines 18-31) allow to specify a different database, and to set the
credentials on the first run (lines 76-83).
 
The check for the presence of the database (lines 39-51) is not
particularly smart: try to read data, and if it fails, assume the db
is new. The database schema (lines 54-75) is also very
straightforward: a table for the credentials, and a table for the
journeys. The columns for the latter come from the CSV that TfL
provides.
 
To get to the CSV, the script needs to pretend to be a person browsing
the website, so we use ``WWW::Mechanize`` and log into the site (lines
85-103). After reaching the "Journey History" page (line 105), we
should click a link that runs some JavaScript, which is hard to do
without an actual browser. Instead, we extract the URL from the
JavaScript that would be run (lines 107-113), and download it directly
(lines 116-118).
 
Now we only have to parse the CSV: we use ``Text::CSV_XS``, and
``DateTime::Format::Strptime`` for the times (lines 120-129). Looping
over all the lines, we convert the times into ``DateTime`` object,
adjusting for a few minor issues: bus journeys don't have end times,
since you don't "tap out" of buses (line 139); also, if you tap in
before midnight, but tap out after, the CSV still shows a single date,
but the end time will appear to be about 23 hours before the start
time (line 144-146).
 
Inserting into the database is trivial (lines 149-158). Since we set a
``unique on conflict replace`` constraint on the ``journeys`` table,
we don't have to worry about inserting twice the same trip: SQLite
will take care of it.
 
Using the data
--------------
 
As interesting as it is to just look back at where I've been, I found
a better use for all those database records: I wanted to check how
much I would have spent if I had not been using the annual travelcard.
 
Calculating that is not exactly easy: the rules are convoluted, not
obvious to find, and probably not completely documented. Also, I may
have interpreted them wrong. Anyway, I wrote `another program`_ to
read the database and give me the total cost.
 
Of course it uses the same defaults and command line options as the
other one (lines 130-148).
 
The main execution flow loops over all the journeys (lines 174-200),
grouping them by day; for each day, we store the journey and some
aggregate information: the minimum and maximum zones travelled, if
there were any tube journeys, and if there were any journeys during
peak time. Parsing the "journey description" is annoying but simple
(lines 185-192); extracting the day of a journey is trickier (lines
15-24), since TfL considers a day to start at 4:30am. Similarly, a
journey is during peak time (lines 26-35) if it starts between 4:30am
and 9:30am. I could find no mention of an evening peak time on TfL's
website. Mapping a station to its zone should be easy, but TfL does
not provide a nice text file with this information, so I had to write
it by hand (lines 37-118) taking the data from Wikipedia; the program
currently does not account for those stations that, being on the
border between two zones, count differently depending on which side
you're arriving from.
 
After that first loop, we compute the cost of each day (lines
257-287): we add the cost of each journey in the day, using the bus
fare or the tube fare (peak or non-peak) for the zones travelled
(fares on lines 202-254 taken for the website). If the sum for a day
is above the daily cap for the set of journeys (only bus, or non-peak
for the zone range travelled in the day, or peak &c), we cap
it. Adding all the daily costs gives us the total.
 
I'll have to get a couple more months of data, since I moved from zone
3 to zone 5 last June, but it looks like that, for the journeys I
make, the annual travelcard is about 15% more expensive than the "pay
as you go" option. Either that, or I got my calculations wrong…
 
.. |oyster| replace:: ``oyster``
.. _`source for the |oyster| program`: https://www.thenautilus.net/cgit/oyster/tree/oyster
.. _`another program`: https://www.thenautilus.net/cgit/oyster/tree/calc-price