
Field | Value | Language
dc.contributor.author | Xu, Siyu
dc.date.accessioned | 2025-03-21T05:09:06Z
dc.date.available | 2025-03-21T05:09:06Z
dc.date.issued | 2025 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/33728
dc.description.abstract | The rapid advancements in large foundation models, particularly GPT-4V, have unlocked significant potential in various applications, including visual recognition tasks. However, the high computational and financial costs associated with GPT-4V's inference remain substantial barriers to its widespread use. In response to these challenges, this thesis introduces "Collage Prompting", a novel and budget-friendly prompt engineering technique that concatenates multiple images into a single visual input. By allowing GPT-4V to process several images simultaneously, this approach not only reduces inference costs but also opens new avenues for more efficient utilization of large-scale models in real-world scenarios. The thesis further investigates the influence of image arrangement within the collage prompt on recognition accuracy. We present a framework that uses a graph-based predictor to optimize the placement of images, improving the model's performance by identifying the most favorable configuration. To facilitate future research in this area, we introduce CollagePrompt, a comprehensive benchmark designed to evaluate the cost-effectiveness and recognition performance of collage prompts. This benchmark provides a platform for testing various image arrangements and includes a baseline optimization technique derived from genetic algorithms. Through extensive experimentation across diverse datasets, we demonstrate that collage prompts with optimized image layouts significantly outperform randomly arranged ones in terms of both accuracy and cost-efficiency. Moreover, two key metrics are proposed to measure the effectiveness of different collage configurations. This research contributes to the emerging field of prompt engineering by offering a practical solution that enhances the economic viability of large foundation models like GPT-4V, without compromising their visual recognition capabilities. | en_AU
dc.language.iso | en | en_AU
dc.subject | Large Language Models | en_AU
dc.subject | Large Multimodal Models | en_AU
dc.subject | GPT-4V | en_AU
dc.subject | Efficient Prompt Engineering | en_AU
dc.title | Efficient Prompt Engineering for Large Foundation Models | en_AU
dc.type | Thesis
dc.type.thesis | Masters by Research | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU
usyd.degree | Master of Philosophy (M.Phil) | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Xu, Chang
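
As an illustration of the technique described in dc.description.abstract above: collage prompting tiles several images into one visual input so that a single GPT-4V query can recognise all of them at once. The following is a minimal, hypothetical sketch of that collage-construction step using Pillow; the function name, grid size, and cell size are assumptions for illustration, not the author's released implementation.

```python
# Hypothetical sketch of the collage-prompt idea from the abstract:
# tile several images into one grid image so one GPT-4V query can
# classify all of them. Names and sizes are illustrative assumptions.
from PIL import Image

def build_collage(image_paths, grid_size=2, cell_size=224):
    """Paste up to grid_size * grid_size images into one square collage."""
    assert len(image_paths) <= grid_size * grid_size
    collage = Image.new("RGB", (grid_size * cell_size, grid_size * cell_size))
    for idx, path in enumerate(image_paths):
        row, col = divmod(idx, grid_size)
        tile = Image.open(path).convert("RGB").resize((cell_size, cell_size))
        collage.paste(tile, (col * cell_size, row * cell_size))
    return collage

# The accompanying text prompt would then ask the model to label each cell,
# e.g. "Classify the image in each (row, col) cell of this 2x2 collage."
```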

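The benchmark's baseline arrangement optimizer is described only as being derived from genetic algorithms. Purely as a hedged sketch of what such a search over collage layouts could look like, under assumed selection, crossover, and mutation choices, and with a placeholder fitness function (the real benchmark would score a layout by GPT-4V's recognition accuracy on it):

```python
# Hedged sketch of a genetic-algorithm search over collage arrangements.
# All operator choices and the fitness function are assumptions, not the
# benchmark's actual baseline.
import random

def evolve_arrangement(num_cells, fitness, generations=50, pop_size=20, mutation_rate=0.2):
    """Search for a permutation of collage cells that maximises `fitness`."""
    population = [random.sample(range(num_cells), num_cells) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]              # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_cells)       # order-preserving crossover
            child = a[:cut] + [g for g in b if g not in a[:cut]]
            if random.random() < mutation_rate:        # occasional swap mutation
                i, j = random.sample(range(num_cells), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Dummy usage: the lambda stands in for scoring a 2x2 layout by GPT-4V accuracy.
best_layout = evolve_arrangement(4, fitness=lambda layout: -sum(abs(c - i) for i, c in enumerate(layout)))
```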
