Self-Motivated Composition of Strategic Action Policies

Anthony, Tom

dc.contributor.author	Anthony, Tom
dc.date.accessioned	2019-02-15T10:57:36Z
dc.date.available	2019-02-15T10:57:36Z
dc.date.issued	2018-06-11
dc.identifier.uri	http://hdl.handle.net/2299/21088
dc.description.abstract	In the last 50 years, computers have made dramatic progress in their capabilities, but at the same time their failings have demonstrated that we, as designers, do not yet understand the nature of intelligence. Chess playing, for example, was long offered up as an example of the unassailability of the human mind by Artificial Intelligence, but now a chess engine on a smartphone can beat a grandmaster. Yet, at the same time, computers struggle to beat amateur players in simpler games, such as Stratego, where sheer processing power cannot substitute for a lack of deeper understanding. The task of developing that deeper understanding is overwhelming, and has previously been underestimated. There are many threads and all must be investigated. This dissertation explores one of those threads, namely asking the question “How might an artificial agent decide on a sensible course of action, without being told what to do?” To this end, this research builds upon empowerment, a universal utility which provides an entirely general method for allowing an agent to measure the preferability of one state over another. Empowerment requires no explicit goals, and instead favours states that maximise an agent’s control over its environment. Several extensions to the empowerment framework are proposed, which drastically increase the array of scenarios to which it can be applied, and allow it to evaluate actions in addition to states. These extensions are motivated by concepts such as bounded rationality, sub-goals, and anticipated future utility. In addition, the novel concept of strategic affinity is proposed as a general method for measuring the strategic similarity between two (or more) potential sequences of actions. It does this in a general fashion, by examining how similar the distribution of future possible states would be in the case of enacting either sequence.
This allows an agent to group action sequences, even in an unknown task space, into ‘strategies’. Strategic affinity is combined with the empowerment extensions to form soft-horizon empowerment, which is capable of composing action policies in a variety of unknown scenarios. A Pac-Man-inspired prey game and the Gambler’s Problem are used to demonstrate this self-motivated action selection, and a Sokoban-inspired box-pushing scenario is used to highlight the capability to pick strategically diverse actions. The culmination of this is that soft-horizon empowerment demonstrates a variety of ‘intuitive’ behaviours, which are not dissimilar to what we might expect a human to try. This line of thinking demonstrates compelling results, and it is suggested that there are a couple of avenues for immediate further research. One of the most promising of these would be applying the self-motivated methodology and strategic affinity method to a wider range of scenarios, with a view to developing improved heuristic approximations that generate similar results. A goal of replicating similar results, whilst reducing the computational overhead, could help drive an improved understanding of how we may get closer to replicating a human-like approach.	en_US
dc.language.iso	en	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/us/	*
dc.subject	Information Theory	en_US
dc.subject	Artificial Life	en_US
dc.subject	Empowerment	en_US
dc.subject	Strategic Affinity	en_US
dc.title	Self-Motivated Composition of Strategic Action Policies	en_US
dc.type	info:eu-repo/semantics/doctoralThesis	en_US
dc.identifier.doi	doi:10.18745/th.21088	*
dc.identifier.doi	10.18745/th.21088
dc.type.qualificationlevel	Doctoral	en_US
dc.type.qualificationname	PhD	en_US
dcterms.dateAccepted	2018-06-11
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US
rioxxterms.version	VoR	en_US
rioxxterms.licenseref.uri	https://creativecommons.org/licenses/by/4.0/	en_US
rioxxterms.licenseref.startdate	2019-02-15
herts.preservation.rarelyaccessed	true
rioxxterms.funder.project	ba3b3abd-b137-4d1d-949a-23012ce7d7b9	en_US