Generating Historic General Transit Feed Specification for Sydney: 1855 to 2015
Hema Rayaprolu, Bahman Lahoorpoor, and David Levinson
January 21, 2026
1. Abstract
We digitised Sydney’s historic public transport networks and constructed General Transit Feed
Specification (GTFS) datasets covering the period from 1855 to 2015. The dataset enables
systematic, reproducible analysis of the long-term evolution of public transport provision and
accessibility in Sydney using contemporary transit analytics. To validate the approach, we
compared accessibility metrics derived from the generated GTFS for 2015 with those from the
officially published GTFS for the same year. The generated feed reproduced person-weighted average
access to within 2% of the value obtained from the official feed, supporting the reliability of the
method.
2. Methodology
For the Greater Sydney region, Transport for New South Wales (TfNSW), the state transport
authority, collates information from the region’s various public transport service operators and
publishes GTFS feeds regularly (Transport for NSW, 2021). The majority of feeds released since
March 2013 have been archived at Open Mobility Data (2021). However, the 2013 and 2014
archives are incomplete, making 2015 the earliest year for which a reliable published GTFS feed
is available.
To enable analysis of public transport provision prior to 2015, we generated historic GTFS feeds
based on archived route and timetable information. We constructed feeds for bus, train, and tram
services in the Greater Sydney region covering the period from 1855 to 2015. These were
combined with ferry services derived from the published 2019 GTFS feed. Ferry services were
assumed to be constant over the study period, as ferry routes have changed little over time
(Sandell, 2021) and their timetables have not been archived as rigorously.
The resulting historic GTFS feeds were validated by comparing public transport accessibility
metrics derived from the generated 2015 feed with those obtained from the officially published
2015 GTFS, and by calibrating key assumptions accordingly.
A GTFS feed comprises a set of comma-separated text files that follow a defined relational
structure. A detailed description of the GTFS files and their specifications is provided by
Google Developers (2021).
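To make the relational structure concrete, the following minimal Python sketch (standard library only) writes a one-route toy feed. The file names and columns broadly follow the GTFS Static Reference. Every identifier, coordinate, URL, and time is a hypothetical placeholder. The `service_id` values `WEEKDAY`, `SAT`, and `SUN` are assumptions chosen to mirror the weekday, Saturday, and Sunday calendar entries described for the bus feeds. This is not the authors' code.

```python
import csv
import os
import tempfile

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday"]
ALL_DAYS = WEEKDAYS + ["saturday", "sunday"]


def calendar_row(service_id, days, year):
    """One calendar.txt row that runs on `days` for the whole of `year`."""
    row = {"service_id": service_id}
    row.update({d: "1" if d in days else "0" for d in ALL_DAYS})
    row["start_date"] = f"{year}0101"  # GTFS dates are YYYYMMDD
    row["end_date"] = f"{year}1231"
    return row


def build_example_feed(year):
    """A one-route, one-trip toy feed; all IDs, coordinates, and times are placeholders."""
    return {
        "agency.txt": [{"agency_id": "A1", "agency_name": "Sydney Buses Network",
                        "agency_url": "https://example.org",  # placeholder URL
                        "agency_timezone": "Australia/Sydney"}],
        "stops.txt": [
            {"stop_id": "S1", "stop_name": "Stop 1", "stop_lat": "-33.8688", "stop_lon": "151.2093"},
            {"stop_id": "S2", "stop_name": "Stop 2", "stop_lat": "-33.8720", "stop_lon": "151.2060"},
        ],
        # route_type 3 denotes a bus route in GTFS
        "routes.txt": [{"route_id": "R1", "agency_id": "A1",
                        "route_short_name": "1", "route_type": "3"}],
        "calendar.txt": [calendar_row("WEEKDAY", WEEKDAYS, year),
                         calendar_row("SAT", ["saturday"], year),
                         calendar_row("SUN", ["sunday"], year)],
        # trips.txt links a route to a calendar service; stop_times.txt links trips to stops
        "trips.txt": [{"route_id": "R1", "service_id": "WEEKDAY",
                       "trip_id": "T1", "direction_id": "0"}],
        "stop_times.txt": [
            {"trip_id": "T1", "arrival_time": "08:00:00", "departure_time": "08:00:00",
             "stop_id": "S1", "stop_sequence": "1"},
            {"trip_id": "T1", "arrival_time": "08:02:00", "departure_time": "08:02:00",
             "stop_id": "S2", "stop_sequence": "2"},
        ],
    }


def write_feed(feed, directory):
    """Write each table as a CSV file with a header row; return the file names."""
    os.makedirs(directory, exist_ok=True)
    for name, rows in feed.items():
        path = os.path.join(directory, name)
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return sorted(feed)


if __name__ == "__main__":
    print(write_feed(build_example_feed(1925), tempfile.mkdtemp()))
```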
2.1 Historic tram and train GTFS: 1855 to 2015
In generating Sydney’s historic tram and train GTFS, we focused on the existence of services and
network coverage over time, disregarding technological changes in vehicle types or
infrastructure.
All stations, stops, and track alignments were georeferenced, with associated opening and
closure dates encoded to enable temporal filtering. Schedules were derived from historical
sources where available. Where they were not, synthetic timetables were generated by applying
assumed average speeds to known distances:
- Trams: 20 km/h
- Trains: 30 km/h
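The date-based temporal filtering of stations, stops, and alignments can be sketched as follows. This assumes each element carries an opening date and an optional closure date. It also assumes an element counts as closed from its closure date inclusive; how the boundary day was handled is our assumption. The class name, element names, and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class NetworkElement:
    """A station, stop, or track segment with its service dates (hypothetical schema)."""
    name: str
    opened: date
    closed: Optional[date] = None  # None: still open at the end of the study period


def in_service(elements: List[NetworkElement], day: date) -> List[str]:
    """Names of elements open on `day`.

    Assumption: open from the opening date inclusive, closed from the
    closure date inclusive.
    """
    return [e.name for e in elements
            if e.opened <= day and (e.closed is None or day < e.closed)]


if __name__ == "__main__":
    network = [  # hypothetical elements and dates
        NetworkElement("Station A", date(1860, 1, 1)),
        NetworkElement("Station B", date(1900, 1, 1), closed=date(1960, 6, 30)),
    ]
    print(in_service(network, date(1950, 1, 1)))
```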
Bidirectional services were assumed throughout. Irregular or special services were excluded due
to their minimal impact on overall service frequency.
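The speed-based synthetic timetable and the bidirectional assumption can be illustrated with the following sketch. It assumes cumulative distances along a line are known in kilometres and ignores dwell times. The reverse trip is built by reversing the stop order and measuring distances from the far terminus. Function names, stops, distances, and the departure time are hypothetical; only the 20 km/h and 30 km/h speeds come from the text.

```python
from typing import Dict, List, Tuple

# Assumed average speeds from the text (km/h)
SPEED_KMH: Dict[str, float] = {"tram": 20.0, "train": 30.0}


def gtfs_time(seconds: int) -> str:
    """Format seconds after midnight as a GTFS HH:MM:SS string."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def synthetic_trip(stop_ids: List[str], cum_km: List[float], mode: str,
                   depart_s: int) -> List[Tuple[str, str]]:
    """Arrival at each stop = departure + distance / assumed speed (no dwell time)."""
    speed = SPEED_KMH[mode]
    return [(sid, gtfs_time(depart_s + round(d * 3600 / speed)))
            for sid, d in zip(stop_ids, cum_km)]


def both_directions(stop_ids, cum_km, mode, depart_s):
    """Outbound trip plus a reverse trip over the same stops (bidirectional service)."""
    total = cum_km[-1]
    outbound = synthetic_trip(stop_ids, cum_km, mode, depart_s)
    # Distances measured from the far terminus, in reversed stop order
    reverse_km = [total - d for d in reversed(cum_km)]
    inbound = synthetic_trip(stop_ids[::-1], reverse_km, mode, depart_s)
    return outbound, inbound


if __name__ == "__main__":
    out, back = both_directions(["A", "B", "C"], [0.0, 1.5, 4.0], "tram", 8 * 3600)
    print(out)   # [('A', '08:00:00'), ('B', '08:04:30'), ('C', '08:12:00')]
    print(back)  # [('C', '08:00:00'), ('B', '08:07:30'), ('A', '08:12:00')]
```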
Tram network data were sourced mainly from historical works by Keenan (1979). Train data were
collected from online repositories such as Wikipedia, the official Sydney Trains website, and
reports from the Australasian Railway Association. Where headway or frequency
information was missing, consistent service intervals were estimated based on average travel
times.
More details on the historic GTFS generation process for trams and trains can be found in
Lahoorpoor (2022).
2.2 Historic bus GTFS: 1925 to 2015
To generate historic GTFS for buses, we used archived information on bus route and timetable
changes. Historic changes in Sydney’s metropolitan bus routes from 1925 onward have been
archived by Henderson (2021) at sydneybusroutes.com. The archives provide systematically
arranged records of changes in streets traversed, timetables, and operating arrangements for
every bus route operated in the region.
Using the archived bus route and timetable information, we generated bus service GTFS for
every year between 1925 and 2015. The process comprised five steps:
1) Convert text archives to machine-readable form
- Manually rearrange and format each route variation in the text documents.
- Feed the edited text into a Python script developed to extract the route details needed in
later steps, such as the list of streets traversed by each route variation.
2) Generate route shapes for each route variation
- Use the Python package OSMnx (Boeing, 2017).
- Step A: For each pair of consecutive streets in a route’s street list, find the intersection in
the OSMnx street network where they meet, producing a list of georeferenced intersections.
- Step B: Compute the shortest path between consecutive intersections, then merge those
paths to produce the route shape.
3) Digitise timetables and estimate service levels
- Manually tabulate route timetables archived alongside route variations.
- For each route variation, the following were available:
* off-peak travel time
* start time and direction of first and last trips
* headway between trips
- Use these to calculate trips per day and estimate start times for each trip.
4) Determine stop locations and generate stop-time itineraries
- The archives did not contain bus stop information.
- Assume that all stops served by buses in 2019 were also served historically, and create
additional stops on streets that had no bus stops in 2019.
- To create these stops, generate points at 400-metre spacing along streets using QGIS (QGIS
Development Team, 2021) and the 2019 OSM network (OpenStreetMap contributors, 2021).
- Use 400 m because:
* average 2019 GTFS bus stop spacing was 407 m
* the region’s guideline for stop spacing supports this (NSW Government, 2013)
- Once stops are set for each route shape, compute arrival times from the route speed and the
cumulative distance to each stop.
5) Compile GTFS feeds by year
- Combine generated route shapes, trips, and stop times for each route variation in each year.
- Assign agency for all bus routes as “Sydney Buses Network” (as in the 2019 GTFS feed).
- Include typical weekday services plus Saturday and Sunday services, reflected in calendar
entries.
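Step 2 can be sketched in plain Python. OSMnx returns a networkx `MultiDiGraph` whose edges carry a `length` attribute in metres and, where tagged, a `name`. To keep the example self-contained, the sketch below replaces that graph with a small hypothetical edge list. It also uses a standard-library Dijkstra search in place of the networkx/OSMnx routing functions. The text does not say how the original script resolved streets that meet more than once, so the sketch simply takes the lowest node id; that choice is an assumption. All function names are illustrative.

```python
import heapq
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency list from (u, v, length_m, street_name) tuples."""
    adj = defaultdict(list)
    for u, v, length, _name in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj


def match_intersections(street_list, edges):
    """Step A: the node where each pair of consecutive streets meets."""
    nodes_on = defaultdict(set)
    for u, v, _length, name in edges:
        nodes_on[name].update((u, v))
    junctions = []
    for a, b in zip(street_list, street_list[1:]):
        common = nodes_on[a] & nodes_on[b]
        if not common:
            raise ValueError(f"{a} and {b} do not meet in the network")
        junctions.append(min(common))  # assumption: take the lowest node id
    return junctions


def shortest_path(adj, source, target):
    """Dijkstra search on edge length; returns the node sequence."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        raise ValueError(f"no path from {source} to {target}")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def route_shape(street_list, edges):
    """Step B: concatenate shortest paths between consecutive intersections."""
    adj = build_adjacency(edges)
    junctions = match_intersections(street_list, edges)
    if not junctions:
        return []
    shape = [junctions[0]]
    for a, b in zip(junctions, junctions[1:]):
        shape.extend(shortest_path(adj, a, b)[1:])  # skip the repeated start node
    return shape


if __name__ == "__main__":
    toy_edges = [(1, 2, 100.0, "A St"), (2, 3, 100.0, "A St"), (3, 6, 80.0, "B St"),
                 (6, 4, 70.0, "B St"), (4, 5, 120.0, "C St"), (2, 4, 300.0, "D St")]
    print(route_shape(["A St", "B St", "C St"], toy_edges))  # [3, 6, 4]
```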
A detailed account of the step-by-step process, including assumptions and limitations, is
available in Rayaprolu (2023).
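Steps 3 and 4 reduce to simple arithmetic, sketched below under stated assumptions. Departures are spaced at a constant headway from the first to the last trip. The route speed is the route length divided by the off-peak travel time. Stops are placed every 400 m from the start of the route, with one at the terminus. That last placement is a simplification, because in the process described above many stops came from the 2019 GTFS. All function names and inputs are hypothetical.

```python
def hms(seconds):
    """Seconds after midnight -> GTFS HH:MM:SS."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def to_minutes(hhmm):
    """'HH:MM' -> minutes after midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def departures(first, last, headway_min):
    """Step 3: departures every `headway_min` from the first to the last trip."""
    start, end = to_minutes(first), to_minutes(last)
    return [start + i * headway_min for i in range((end - start) // headway_min + 1)]


def stop_positions(route_length_m, spacing_m=400.0):
    """Step 4 (simplified): stops every `spacing_m` from the start, plus the terminus."""
    positions = [i * spacing_m for i in range(int(route_length_m // spacing_m) + 1)]
    if positions[-1] < route_length_m:
        positions.append(route_length_m)
    return positions


def itinerary(dep_min, positions_m, route_length_m, travel_time_min):
    """Arrival times at a constant route speed (length / off-peak travel time)."""
    total_s = travel_time_min * 60
    return [hms(dep_min * 60 + round(p * total_s / route_length_m)) for p in positions_m]


if __name__ == "__main__":
    deps = departures("06:00", "08:00", 20)       # hypothetical 20-minute headway
    stops = stop_positions(1000.0)                # hypothetical 1 km route
    print(len(deps), stops)                       # 7 [0.0, 400.0, 800.0, 1000.0]
    print(itinerary(deps[0], stops, 1000.0, 10))  # hypothetical 10-minute off-peak run
```

The number of trips per day follows directly from the length of the departure list for each direction.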
2.3 Validating 2015 GTFS using access
To assess the reliability and accuracy of the generated historic GTFS, we compared the generated
2015 feed with the published 2015 feed.
Because sources and development processes differ, the datasets were not readily comparable.
Although routes could often be compared by matching route numbers, the published routes have
short-workings and other variations that are not fully captured in the archives and the generated
GTFS. Therefore, we compared the region-wide person-weighted average access (PWA) provided by
each of the two feeds, and used this comparison to calibrate key assumptions.
Access was measured as the cumulative number of opportunities reached by public transport
within a given time threshold. Jobs are the opportunity most often used, but spatially
disaggregated employment data are unavailable for historical years, so we measured access to
population instead. We used the historic
population distribution established by Lahoorpoor and Levinson (2021) for Greater Sydney.
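For reference, a conventional formulation of cumulative access and of its person-weighted average is given below. We assume, without the text stating it, that it matches the definitions used here.

```latex
\begin{aligned}
A_i(T) &= \sum_{j} P_j \,\mathbf{1}\!\left[t_{ij} \le T\right] \\
\mathrm{PWA}(T) &= \frac{\sum_{i} P_i \, A_i(T)}{\sum_{i} P_i}
\end{aligned}
```

Here $P_j$ is the population of mesh block $j$, $t_{ij}$ is the public transport travel time from the centroid of mesh block $i$ to the centroid of mesh block $j$ at the chosen departure time, $T$ is the time threshold (for example 30 minutes), and $\mathbf{1}[\cdot]$ equals 1 when its condition holds and 0 otherwise. These symbols are introduced for illustration only.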
With population counts by mesh block and GTFS for each year, we estimated the population
reachable by public transport within a time threshold for each year, as follows:
- Use OpenTripPlanner (OTP) (OpenTripPlanner, 2021), which builds a network graph from a street
network and GTFS, then computes point-to-point travel times and isochrones for a given
location, time cutoff, mode, day, and time.
- Supply OTP with the generated GTFS for each year. Hold the street network constant (downloaded
from OpenStreetMap contributors in July 2020), because historical street networks are not
consistently available (Turner et al., 2023).
- Query the travel-time isochrone from each mesh-block centroid for a weekday departure at 08:00.
- For computational ease, assign each mesh block’s population to its centroid.
- Overlay the isochrones on the population for the corresponding year to estimate access.
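The overlay and the person-weighted average can be sketched as follows. The sketch assumes each isochrone has already been retrieved from OTP and converted to a simple polygon in projected (metre) coordinates. An even-odd ray-casting test stands in for a GIS overlay, and centroids lying exactly on an isochrone boundary are not treated specially. All mesh-block ids, coordinates, and populations are hypothetical, as are the function names.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through (x, y)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def access(isochrone, centroids, population):
    """Population of mesh blocks whose centroid lies inside one isochrone."""
    return sum(population[mb] for mb, (x, y) in centroids.items()
               if point_in_polygon(x, y, isochrone))


def person_weighted_average(isochrones, centroids, population):
    """PWA: each origin's access weighted by that origin's population."""
    total = sum(population[mb] for mb in isochrones)
    weighted = sum(population[mb] * access(iso, centroids, population)
                   for mb, iso in isochrones.items())
    return weighted / total


if __name__ == "__main__":
    centroids = {"M1": (0.0, 0.0), "M2": (5.0, 0.0)}  # hypothetical mesh blocks
    population = {"M1": 100, "M2": 50}
    square = [(-10.0, -10.0), (10.0, -10.0), (10.0, 10.0), (-10.0, 10.0)]
    print(person_weighted_average({"M1": square, "M2": square}, centroids, population))
```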
Results:
- For a 30-minute threshold and an 08:00 departure, PWA from the generated 2015 GTFS differed
from that of the published 2015 GTFS by 1.9%.
- When access was computed for each departure minute from 08:00 to 08:30, the average PWA over
that window from the generated 2015 GTFS was 1.4% lower than that from the published GTFS.
- At the mesh-block level, differences were small in most core and inner areas.
Together, these results indicate that the generated 2015 GTFS closely reproduces temporal and
spatial accessibility patterns in the published data, supporting its use for historical
accessibility analysis.
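The minute-by-minute comparison can be sketched as below. The sketch assumes one departure per whole minute from 08:00 to 08:30 inclusive (31 departures) and a relative difference measured against the published feed. `pwa_at` is a hypothetical caller-supplied function returning PWA for a departure minute. Any values passed to it are illustrative only and are not the results reported above.

```python
from statistics import mean


def departure_minutes(start="08:00", end="08:30"):
    """Whole-minute departure times (minutes after midnight), inclusive of both ends."""
    h0, m0 = map(int, start.split(":"))
    h1, m1 = map(int, end.split(":"))
    return list(range(h0 * 60 + m0, h1 * 60 + m1 + 1))


def window_average(pwa_at, minutes):
    """Mean PWA over the departure window; `pwa_at` maps a minute to a PWA value."""
    return mean(pwa_at(t) for t in minutes)


def relative_difference(generated, published):
    """Percentage difference of the generated feed relative to the published feed."""
    return 100.0 * (generated - published) / published


if __name__ == "__main__":
    window = departure_minutes()
    generated = window_average(lambda t: 1000.0, window)           # illustrative values
    published = window_average(lambda t: 1000.0 + (t - 480), window)
    print(len(window), relative_difference(generated, published))
```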
References
Boeing G (2017) OSMnx: New methods for acquiring, constructing, analyzing, and visualizing
complex street networks. Computers, Environment and Urban Systems 65: 126–139.
Google Developers (2021) GTFS Static Reference.
https://developers.google.com/transit/gtfs/reference. Accessed: 2021-12-22.
Henderson R (2021) Sydney bus routes: A history of private and government bus routes in Sydney.
https://sydneybusroutes.com. Accessed: 2021-07-14.
Keenan DR (1979) Tramways of Sydney. Transit Press, Sydney. ISBN 0909338027.
Lahoorpoor B (2022) Terraces, towers, trams, and trains: Examining the growth of Sydney using
empirical models and agent-based simulation. PhD Thesis, The University of Sydney.
Lahoorpoor B and Levinson D (2021) An empirical model of land use and public transit
co-developments in Sydney. Working Paper 06, The University of Sydney.
NSW Government (2013) Sydney’s bus future. Simpler, faster, better bus service.
https://mysydneycbd.nsw.gov.au/sites/default/files/user-files/uploads/bus-future-final-web.pdf.
Accessed: 2021-07-22.
Open Mobility Data (2021) Greater Sydney GTFS.
https://transitfeeds.com/p/transport-for-nsw/237. Accessed: 2021-07-14.
OpenStreetMap contributors (2021) Planet dump retrieved from https://planet.osm.org.
https://www.openstreetmap.org.
OpenTripPlanner (2021) Multimodal Trip Planning. https://www.opentripplanner.org.
Accessed: 2021-07-22.
QGIS Development Team (2021) QGIS Geographic Information System. QGIS Association.
https://www.qgis.org.
Rayaprolu H (2023) The co-evolution of public transport access and ridership. PhD Thesis,
The University of Sydney. https://ses.library.usyd.edu.au/handle/2123/31254.
Sandell R (2021) UrbanFerryist. Tweet, 30 December 2021.
https://twitter.com/UrbanFerryist/status/1476468691625594880.
(Claim: Sydney’s ferry network of 1909 was very similar to today’s, with the main change being
the end of car ferries after the Harbour Bridge opened.)
Transport for NSW (2021) TfNSW Open Data Hub and Developer Portal.
https://opendata.transport.nsw.gov.au. Accessed: 2021-07-14.
Turner H, Lahoorpoor B, and Levinson DM (2023) Creating a dataset of historic roads in Sydney
from scanned maps. Scientific Data 10(1): 683.