The Online Shortest Path Problem: Learning Travel Times Using a Multiarmed Bandit Framework

Lagos, Tomas; Auad, Ramon; Lagos, Felipe

Field	Value	Language
dc.contributor.author	Lagos, Tomas
dc.contributor.author	Auad, Ramon
dc.contributor.author	Lagos, Felipe
dc.date.accessioned	2024-08-21T05:41:14Z
dc.date.available	2024-08-21T05:41:14Z
dc.date.issued	2024	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/32973
dc.description.abstract	In the age of e-commerce, logistics companies often operate within extensive road networks without accurate knowledge of travel times for their specific fleet of vehicles. Moreover, millions of dollars are spent on routing services that fail to accurately capture the unique characteristics of the drivers and vehicles of the companies. In this work, we address the challenge faced by a logistics operator with limited travel time information, aiming to find the optimal expected shortest path between origin-destination pairs. We model this problem as an online shortest path problem, common to many last-mile routing settings; given a graph whose arcs’ travel times are stochastic and follow an unknown distribution, the objective is to find a vehicle route of minimum travel time from an origin to a destination. The planner progressively collects travel condition data as drivers complete their routes. Inspired by the combinatorial multiarmed bandit and kriging literature, we propose three methods with distinct features to effectively learn the optimal shortest path, highlighting the practical advantages of incorporating spatial correlation in the learning process. Our approach balances exploration (improving estimates for unexplored arcs) and exploitation (executing the minimum expected time path) using the Thompson sampling algorithm. In each iteration, our algorithm executes the path that minimizes the expected travel time based on data from a posterior distribution of the speeds of the arcs. We conduct a computational study comprising two settings: a set of four artificial instances and a real-life case study. The case study uses empirical data of taxis in the 17-km-radius area of the center of Beijing, encompassing Beijing’s “5th Ring Road.” In both settings, our algorithms demonstrate efficient and effective balancing of the exploration-exploitation trade-off.	en_AU
dc.language.iso	en	en_AU
dc.publisher	Institute for Operations Research and the Management Sciences	en_AU
dc.relation.ispartof	Transportation Science	en_AU
dc.subject	last-mile logistics	en_AU
dc.subject	machine learning	en_AU
dc.subject	multiarmed bandits	en_AU
dc.subject	Thompson sampling	en_AU
dc.subject	online shortest path	en_AU
dc.subject	kriging	en_AU
dc.title	The Online Shortest Path Problem: Learning Travel Times Using a Multiarmed Bandit Framework	en_AU
dc.type	Article	en_AU
dc.identifier.doi	10.1287/trsc.2023.0196
dc.type.pubtype	Author accepted manuscript	en_AU
usyd.faculty	SeS faculties schools::The University of Sydney Business School::Discipline of Business Analytics	en_AU
workflow.metadata.only	Yes	en_AU
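
The loop described in the abstract (sample arc travel times from a posterior, drive the shortest path under that sample, update the posterior with what was observed) can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce any of their three methods: it assumes independent Gaussian posteriors on arc travel times with a known noise level, and it leaves out the spatial correlation (kriging) component entirely. The toy graph, the function names `shortest_path` and `thompson_shortest_path`, and all parameter values are hypothetical and chosen only for illustration.

```python
import heapq
import random


def shortest_path(graph, costs, source, target):
    """Dijkstra on a dict-of-lists graph; costs maps (u, v) to a nonnegative cost."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v in graph.get(u, []):
            nd = d + costs[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path


def thompson_shortest_path(graph, true_mean, source, target, rounds=300,
                           noise_sd=1.0, prior_mean=5.0, prior_var=25.0, seed=0):
    """Thompson sampling with independent Gaussian arc posteriors (an assumption)."""
    rng = random.Random(seed)
    # Posterior [mean, variance] for each arc's expected travel time.
    post = {arc: [prior_mean, prior_var] for arc in true_mean}
    history = []
    for _ in range(rounds):
        # Draw one plausible travel time per arc; clip so Dijkstra stays valid.
        sampled = {arc: max(1e-6, rng.gauss(m, v ** 0.5))
                   for arc, (m, v) in post.items()}
        path = shortest_path(graph, sampled, source, target)
        # Only the arcs actually driven yield observations.
        for arc in zip(path, path[1:]):
            obs = rng.gauss(true_mean[arc], noise_sd)
            m, v = post[arc]
            new_v = 1.0 / (1.0 / v + 1.0 / noise_sd ** 2)
            new_m = new_v * (m / v + obs / noise_sd ** 2)
            post[arc] = [new_m, new_v]
        history.append(path)
    return post, history


# Hypothetical four-node network: A-B-D is faster on average than A-C-D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
true_mean = {("A", "B"): 2.0, ("B", "D"): 2.0, ("A", "C"): 4.0, ("C", "D"): 4.0}
post, history = thompson_shortest_path(graph, true_mean, "A", "D")
print(history[-1], {arc: round(m, 2) for arc, (m, _) in post.items()})
```

Sampling from the posterior, rather than using its mean, is what drives exploration in this sketch: arcs with few observations keep a wide posterior and are occasionally sampled as fast, so they still get tried.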

Associated file/s

There are no files associated with this item.

Associated collections

Research Publications and Outputs
