Data pipeline for GTFS transit arrival and departure information
Access status:
Open Access
File/s
13_Cleaned_Daily_TU.7z
Transformed_TU_2020_06.7z
Transformed_TU_2020_07.7z
Transformed_TU_2020_08.7z
Transformed_TU_2020_09.7z
Transformed_TU_2020_10.7z
Transformed_TU_2020_11.7z
Transformed_TU_2020_12.7z
Transformed_TU_2021_01.7z
Transformed_TU_2021_02.7z
Transformed_TU_2021_03.7z
Transformed_TU_2021_04.7z
Transformed_TU_2021_05.7z
Transformed_TU_2021_06.7z
Transformed_TU_2021_07.7z
Transformed_TU_2021_08.7z
Transformed_TU_2021_09.7z
Transformed_TU_2021_10.7z
Transformed_TU_2021_11.7z
Transformed_TU_2021_12.7z
Transformed_TU_2022_01.7z
Transformed_TU_2022_02.7z
Transformed_TU_2022_03.7z
Transformed_TU_2022_04.7z
Transformed_TU_2022_05.7z
Transformed_TU_2022_06.7z
Permalink
https://hdl.handle.net/2123/29562
Type
Dataset
Abstract
Cities generate large volumes of data daily through digital services and smart city applications. These include Public Transport Authorities, which generate big data as part of their daily operations, such as vehicle positions, passenger counts and user travel patterns. The General Transit Feed Specification (GTFS) is a data format that allows public transport data to be consumed by a wide variety of software applications. This paper presents a data pipeline developed to transform GTFS feeds into a general and flexible dataset of realtime transit arrivals. Creating a one-size-fits-all data pipeline for realtime operations from GTFS addresses three barriers to widespread access to this information. First, the protocol buffer format is not human readable and requires processing before use in most transport applications. Secondly, the general specification varies from place to place, and the conditionally required and optional fields are inconsistent between locations. Thirdly, the raw data may contain errors, including a missing stop sequence or a reverse-direction bus being detected in the bus stop area. The pipeline consists of a set of data cleaning and transformation steps to address these challenges. The paper briefly presents potential use cases of the processed data to illustrate its relevance to researchers and practitioners.
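As an illustration of the third barrier named in the abstract, the following is a minimal sketch of how records with a missing stop sequence might be separated out for later repair. It is not the pipeline's actual code: the function name `split_by_stop_sequence` and the dictionary layout are hypothetical, and the keys (`trip_id`, `stop_id`, `stop_sequence`) are assumed to mirror the GTFS field names of the same name.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout: one decoded stop time update per dict.
# "stop_sequence" is None when the feed omitted the field.
Record = Dict[str, Optional[object]]


def split_by_stop_sequence(
    records: Iterable[Record],
) -> Tuple[List[Record], List[Record]]:
    """Separate records that carry a stop sequence from those that do not."""
    complete: List[Record] = []
    missing: List[Record] = []
    for rec in records:
        if rec.get("stop_sequence") is None:
            missing.append(rec)  # candidate for repair or exclusion
        else:
            complete.append(rec)
    return complete, missing


# Illustrative values only; these are not taken from the dataset.
sample = [
    {"trip_id": "T1", "stop_id": "S10", "stop_sequence": 1},
    {"trip_id": "T1", "stop_id": "S11", "stop_sequence": None},
    {"trip_id": "T1", "stop_id": "S12"},
]
complete, missing = split_by_stop_sequence(sample)
```

Keeping the incomplete records in a separate list, rather than dropping them, leaves open whether a later step repairs or discards them. The record itself does not specify which the pipeline does.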
Date
2022-09-19
Licence
Creative Commons Attribution 4.0
Faculty/School
The University of Sydney Business School, Institute of Transport and Logistics Studies (ITLS)