Abstract. Procedural content generation for games is a growing trend in both research and industry, even though there is no consensus on what good content looks like, nor on how to evaluate it automatically. A number of metrics have been developed in the past, usually focused on the artifact as a whole, and mostly lacking grounding in human experience. In this study we develop a new set of automated metrics, motivated by ideas from architecture, namely isovists and space syntax, which have a track record of capturing the human experience of space. These metrics can be computed for a specific game state, from the player’s perspective, and take into account the player’s embodiment in the game world. We show how to apply these metrics to the 3d blockworld of Minecraft. We use a dataset of generated settlements from the GDMC Settlement Generation Challenge in Minecraft and establish several rank-based correlations between the isovist properties and the ratings human judges gave those settlements. We also produce a range of heat maps that demonstrate the location-based applicability of the approach, which allows for the development of these metrics as measures of the game experience at a specific time and place.

Keywords: Game AI, Procedural Content Generation, Player Experience, Isovist, Minecraft, Procedural Architecture

1. Introduction

Can isovist theory from architecture provide us with a metric to automatically evaluate games in general, and procedurally generated content (PCG) in particular? Being able to quickly, and without human intervention, evaluate generated content would provide a massive boost to the impact PCG has on game design. It would also be methodologically helpful to the academic field of PCG (Shaker et al., 2016; Short and Adams, 2017; Compton, 2016; Liapis et al., 2014), which struggles to provide quantitative data about newly developed techniques. Approaches such as expressive range analysis (Smith and Whitehead, 2010), and its extensions (Cook et al., 2016), provide valuable insights, but focus on measuring the diversity, rather than the quality, of various artifacts. A range of existing metrics are suitable for saying whether two levels or artifacts are similar or not, but struggle to indicate their quality. Past attempts to ground these existing metrics by comparing them to human experience - either of general quality (Mariño et al., 2015; Hervé and Salge, 2021), or of specific desired experiences (Guckelsberger et al., 2017; Yannakakis and Togelius, 2011) - are a step in the right direction, but provide mixed results, and showcase several problems with existing metrics, such as some artifacts proving too complex to evaluate on a “per-level” basis, or difficulties with transitioning between 2d and 3d. In this paper, we attempt to develop a new set of metrics, based on theories from architecture, focused on isovists (Benedikt, 1979), i.e. the space visible from a given vantage point. This is a quantitative approach to space in architecture, with an established track record of reflecting human experience and behaviour (Wiener et al., 2007; Weitkamp et al., 2014). Its agent-focussed definition also allows for metrics that are influenced by the specific embodiment of the agent and that can capture the experience of a given moment - allowing for the evaluation not just of artifacts as a whole, but also of trajectories through the state space of a game, or of a specific game state.
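To make the isovist concept concrete before the formal treatment later in the paper, the following is a minimal, illustrative sketch of an isovist on a 2d occupancy grid, computed by casting rays from a vantage point until they hit an opaque cell. The grid encoding, function name, ray count and step size are our own assumptions for illustration only; the set-based 3d computation actually used in this study is described in the following sections.

```python
# Illustrative 2d isovist on an occupancy grid (not the paper's 3d
# set-based computation). '#' marks an opaque cell, '.' is open space.
import math

def isovist_2d(grid, origin, n_rays=360, step=0.1):
    """Approximate the set of grid cells visible from `origin`.

    Casts `n_rays` evenly spaced rays from the centre of the origin
    cell and walks each ray in small steps until it hits an opaque
    cell or leaves the grid.
    """
    rows, cols = len(grid), len(grid[0])
    visible = set()
    r0, c0 = origin[0] + 0.5, origin[1] + 0.5  # start at the cell centre
    for i in range(n_rays):
        angle = 2 * math.pi * i / n_rays
        dr, dc = math.sin(angle), math.cos(angle)
        r, c = r0, c0
        while 0 <= r < rows and 0 <= c < cols:
            cell = (int(r), int(c))
            if grid[cell[0]][cell[1]] == '#':
                break  # the ray is blocked by an opaque cell
            visible.add(cell)
            r, c = r + dr * step, c + dc * step
    return visible

room = ["##########",
        "#........#",
        "#..###...#",
        "#........#",
        "##########"]
iso = isovist_2d(room, (1, 1))
print(len(iso))  # the isovist area, in cells
```

The size of the returned set corresponds to the isovist area; other isovist properties, such as the perimeter, can be derived from the same set of cells.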
We develop a set-based computation approach that allows us to apply these measures to 3d discrete environments, such as Minecraft (Studios, 2011) blockworlds - but we aim not to incorporate any Minecraft-specific features, to keep the measures general. As a first study to evaluate these measures, we use a dataset from the GDMC AI Settlement Generation Challenge (Salge et al., 2018) in Minecraft. We do not focus on the actual generation, and just look at the various published maps and the evaluations from human judges. Similar to other PCG AI competitions, such as (Khalifa et al., 2016; Shaker et al., 2011; Stephenson and Renz, 2016), GDMC uses human judges rather than automatic evaluation, further indicating that there seems to be no generally agreed upon automatic measure of quality. As human judgement is provided on a “per generator” basis, we compare the average score of our measurements with the human judgements, and find some interesting correlations, particularly between the perimeter of the isovist and the perceived adaptability of the settlement. A measure for visible block types, which we consider an agent-focussed refinement of the block diversity measure from (Hervé and Salge, 2021), also correlates well with various human judgements. We also provide location-based heatmaps for selected maps and measures, to show how those values change as the player moves around the map - showcasing the location-based possibilities of the new measures. We will now first introduce some more details about PCG and the GDMC challenge, and introduce the isovist concept and its computation in more detail, before discussing the results and their implications. In general, we conclude that these isovist-based measures seem a useful approach to PCG evaluation and warrant further study.

2. PCG

PCG is the ensemble of techniques that aim to create game content algorithmically. It has been used to generate content of various kinds, from game assets to gameplay rules. These techniques are used in the video game industry, and at the same time constitute a field of research. One recurring challenge in PCG is evaluating the output (Smith and Whitehead, 2010; Summerville et al., 2017). A generator able to self-evaluate the content it produces can improve itself, or reliably curate which assets are relevant and which are not, depending on the technique being used. Evaluation metrics can also be used by designers to tune generators and optimize certain aspects of the generated artifacts.

The core constraint is usually playability - aimed at ensuring that a game can actually be used as such. But other metrics have been developed with the intent to evaluate other dimensions of an artifact, such as its looks, the narrative it conveys, its impact on game mechanics, or even how “fun” it is. However, these metrics tend to lack human grounding (Mariño et al., 2015; Hervé and Salge, 2021). They are also built to focus on the entirety of the artifact, and are rarely designed for local use - targeting, for instance, a whole level instead of a single location.

Therefore, developing new metrics and improving existing ones is necessary in order to improve PCG as a field, for various reasons (automatic curation, co-authoring, …). Beyond the question of the metrics themselves lies another one: how to properly use them? In the current paper, we try to address both of these concerns, by establishing a new range of metrics, polishing an existing one, and comparing their effectiveness both globally and locally.
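As a concrete illustration of such a global (“per generator”) comparison, the sketch below correlates per-generator metric means with judge scores using a rank-based (Spearman) correlation, the type of correlation reported in this study. All numbers, and the use of scipy, are our own assumptions for demonstration.

```python
# Illustrative only: rank-based comparison of a metric against human
# judge scores. The values below are invented for demonstration.
from scipy.stats import spearmanr

# hypothetical mean isovist perimeter per generator
metric_means = [412.0, 388.5, 501.2, 297.8, 450.3]
# hypothetical mean adaptability score per generator (0-10 scale)
judge_scores = [6.1, 5.9, 7.8, 3.9, 5.4]

rho, p_value = spearmanr(metric_means, judge_scores)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```

A rank-based correlation is a natural choice here, since judge scores are ordinal and there is no reason to expect a linear relationship between a geometric quantity and a human rating.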
3. GDMC

Minecraft (Studios, 2011) is a voxel-based game developed by Mojang Studios, where players progress in an open world made out of blocks. These blocks represent different materials, such as wood, rock and so on. Players can destroy blocks, place them in any position within the world, or even combine them through crafting mechanics in order to create new types of blocks or items. Minecraft is mostly known for its open-endedness and is mostly used as a sandbox game. Many players use the block mechanics to terraform the game world and create structures such as houses, castles or cities. Since the art style and the setting of Minecraft are very generic, the game affords the free creation of almost any kind of artifact, with only the player’s imagination setting the limits.

The Generative Design in Minecraft Competition (GDMC) is a PCG competition in which competitors submit a settlement generator (Salge et al., 2018). All the submitted generators are then tested on fixed maps, which are selected by the organizers (Salge et al., 2020). All the generated settlements are then sent to the jury. This jury includes experts in various fields, such as AI, game design or urbanism. Each judge scores each generator in each of the following categories: Adaptability, Functionality, Narrative, Aesthetics. Adaptability is how well the settlement is adapted to its location - how well it adapts to the terrain, both on a large and a small scale. Functionality is about what affordances the settlement provides, both to the Minecraft player and to the simulated villagers. It covers various aspects, such as food, production, navigability, security, etc. Narrative reflects how well the settlement itself tells an evocative story about its own history, and about who its inhabitants are (there is a separate bonus challenge about also adding a written PCG text that tells the story of the settlement (Salge et al., 2019)). Aesthetics is a rating of the overall look of the settlements. In the competition, the rating for each category is computed for each generator by averaging (mean) across all judges’ scores. Each judge provides, for each generator, after looking at all maps, one score for each of the four categories. The overall score of a generator is then obtained by taking the mean over the four categories.

The human data we are working with are the average scores for each generator. We therefore have, for each generator, 5 scores: the overall score, adaptability, functionality, narrative and aesthetics. In 2021, the competition received 20 submissions.

4. Theory of space

4.1. Isovist

Given a bounded environment, for each point x