Abstract:
Quantum computers, the subject of much current research, offer, beyond the hope of a quantum advantage, the chance to reduce the number of trainable parameters. This is especially interesting for machine learning, since it could lead to a faster learning process and lower the use of computational resources. In the current Noisy Intermediate-Scale Quantum (NISQ) era, the limited number of qubits and quantum noise make learning a difficult task. Research therefore focuses on Variational Quantum Circuits (VQCs), hybrid algorithms that combine a parameterised quantum circuit with classical optimization and require only a few qubits to learn. Recent literature proposes several interesting approaches to solving reinforcement learning problems with VQCs, employing promising strategies to improve their results that deserve closer investigation. In this work, we investigate data re-uploading, input and output scaling, and an exponentially declining learning rate for the actor-VQC of a quantum proximal policy optimization (QPPO) algorithm in the Frozen Lake and Cart Pole environments, evaluating their ability to reduce the number of VQC parameters relative to performance. Our results show improved hyperparameter stability and performance for data re-uploading and the exponentially declining learning rate. While input scaling has no effect on parameter effectiveness, output scaling can achieve powerful greediness control and leads to increased learning speed and robustness.
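The following is a minimal sketch of the techniques named above, assuming a PennyLane-style implementation; it is not the thesis code, and all circuit choices, names, and hyperparameters are illustrative assumptions:

```python
# Sketch (assumptions, not the author's exact setup): a data re-uploading VQC
# with trainable input/output scaling and an exponentially declining learning
# rate, as they might appear in the actor of a QPPO agent.
import pennylane as qml
from pennylane import numpy as np

n_qubits, n_layers = 4, 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def actor_circuit(weights, inputs, input_scale):
    # Data re-uploading: the (scaled) input is re-encoded before every
    # variational layer instead of only once at the start of the circuit.
    for layer in range(n_layers):
        for w in range(n_qubits):
            qml.RX(input_scale[layer, w] * inputs[w], wires=w)  # input scaling
            qml.RY(weights[layer, w, 0], wires=w)
            qml.RZ(weights[layer, w, 1], wires=w)
        for w in range(n_qubits - 1):
            qml.CNOT(wires=[w, w + 1])
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

def policy_logits(weights, inputs, input_scale, output_scale):
    # Output scaling: a trainable factor sharpens or flattens the logits,
    # controlling the greediness of the resulting softmax policy.
    return output_scale * np.array(actor_circuit(weights, inputs, input_scale))

def learning_rate(step, lr0=0.01, decay=0.999):
    # Exponentially declining learning rate for the actor update.
    return lr0 * decay ** step
```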
Author:
Timo Witter
Advisors:
Michael Kölle, Philipp Altmann, Claudia Linnhoff-Popien
Student Thesis | Published February 2024 | Copyright © QAR-Lab
Please direct inquiries about this work to the advisors.