
Field | Value | Language
dc.contributor.author | Xu, Siyu
dc.date.accessioned | 2025-03-21T05:09:06Z
dc.date.available | 2025-03-21T05:09:06Z
dc.date.issued | 2025 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/33728
dc.description.abstract | The rapid advancements in large foundation models, particularly GPT-4V, have unlocked significant potential in various applications, including visual recognition tasks. However, the high computational and financial costs associated with GPT-4V's inference remain substantial barriers to its widespread use. In response to these challenges, this thesis introduces "Collage Prompting", a novel and budget-friendly prompt engineering technique that concatenates multiple images into a single visual input. By allowing GPT-4V to process several images simultaneously, this approach not only reduces inference costs but also opens new avenues for more efficient utilization of large-scale models in real-world scenarios. The thesis further investigates the influence of image arrangement within the collage prompt on recognition accuracy. We present a framework that uses a graph-based predictor to optimize the placement of images, improving the model's performance by identifying the most favorable configuration. To facilitate future research in this area, we introduce CollagePrompt, a comprehensive benchmark designed to evaluate the cost-effectiveness and recognition performance of collage prompts. This benchmark provides a platform for testing various image arrangements and includes a baseline optimization technique derived from genetic algorithms. Through extensive experimentation across diverse datasets, we demonstrate that collage prompts with optimized image layouts significantly outperform randomly arranged ones in terms of both accuracy and cost-efficiency. Moreover, two key metrics are proposed to measure the effectiveness of different collage configurations. This research contributes to the emerging field of prompt engineering by offering a practical solution that enhances the economic viability of large foundation models like GPT-4V, without compromising their visual recognition capabilities. | en_AU
dc.language.iso | en | en_AU
dc.subject | Large Language Models | en_AU
dc.subject | Large Multimodal Models | en_AU
dc.subject | GPT-4V | en_AU
dc.subject | Efficient Prompt Engineering | en_AU
dc.title | Efficient Prompt Engineering for Large Foundation Models | en_AU
dc.type | Thesis
dc.type.thesis | Masters by Research | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU
usyd.degree | Master of Philosophy (M.Phil) | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Xu, Chang
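
As an illustration of the technique described in dc.description.abstract above: collage prompting tiles several images into one visual input so that a single GPT-4V query can recognise all of them at once. The following is a minimal, hypothetical sketch of that collage-construction step using Pillow; the function name, grid size, and cell size are assumptions for illustration, not the author's released implementation.

```python
# Hypothetical sketch of the collage-prompt idea from the abstract:
# tile several images into one grid image so one GPT-4V query can
# classify all of them. Names and sizes are illustrative assumptions.
from PIL import Image

def build_collage(image_paths, grid_size=2, cell_size=224):
    """Paste up to grid_size * grid_size images into one square collage."""
    assert len(image_paths) <= grid_size * grid_size
    collage = Image.new("RGB", (grid_size * cell_size, grid_size * cell_size))
    for idx, path in enumerate(image_paths):
        row, col = divmod(idx, grid_size)
        tile = Image.open(path).convert("RGB").resize((cell_size, cell_size))
        collage.paste(tile, (col * cell_size, row * cell_size))
    return collage

# The accompanying text prompt would then ask the model to label each cell,
# e.g. "Classify the image in each (row, col) cell of this 2x2 collage."
```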

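The benchmark's baseline arrangement optimizer is described only as being derived from genetic algorithms. Purely as a hedged sketch of what such a search over collage layouts could look like, under assumed selection, crossover, and mutation choices, and with a placeholder fitness function (the real benchmark would score a layout by GPT-4V's recognition accuracy on it):

```python
# Hedged sketch of a genetic-algorithm search over collage arrangements.
# All operator choices and the fitness function are assumptions, not the
# benchmark's actual baseline.
import random

def evolve_arrangement(num_cells, fitness, generations=50, pop_size=20, mutation_rate=0.2):
    """Search for a permutation of collage cells that maximises `fitness`."""
    population = [random.sample(range(num_cells), num_cells) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]              # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_cells)       # order-preserving crossover
            child = a[:cut] + [g for g in b if g not in a[:cut]]
            if random.random() < mutation_rate:        # occasional swap mutation
                i, j = random.sample(range(num_cells), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Dummy usage: the lambda stands in for scoring a 2x2 layout by GPT-4V accuracy.
best_layout = evolve_arrangement(4, fitness=lambda layout: -sum(abs(c - i) for i, c in enumerate(layout)))
```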
