

In document: Multi-Agent Systems (pages 46–52)

Learning Behavior Trees for Autonomous Agents with Hybrid Constraints Evolution

5. Conclusions and Future Works

This paper proposed a modified approach to evolving behavior trees, named evolving BTs with hybrid constraints (EBT-HC), to facilitate behavior modeling for autonomous agents in simulation and computer games. Our main contribution is a novel dynamic constraint that improves the evolution of behavior trees: it discovers frequent preponderant sub-trees and adjusts node crossover probabilities to accelerate the combination of preponderant structures. To improve further on evolution with the dynamic constraint alone, we proposed evolving BTs with hybrid constraints by combining the existing static structural constraint with our dynamic constraint. The hybrid EBT-HC further accelerates behavior learning and finds better solutions without losing domain independence. Preliminary experiments on the 'Ms. Pac-Man vs Ghosts' benchmark showed that the proposed EBT-HC approach produces more plausible behaviors than the other approaches. The stable final best individual with higher fitness satisfies the goal of generating a better behavior policy under the evaluation criteria provided by a domain expert, and the fast, stable learning curve showed the advantage of hybrid constraints in speeding up convergence. From the perspective of tree design and implementation, the generated BTs are human-readable and easy to analyze and fine-tune, which is a promising first step toward automatic yet transparent behavior modeling.
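The dynamic-constraint idea summarized above can be sketched in code. The following is a minimal illustration, not the authors' implementation: frequent sub-trees are mined from an elite population by counting canonical signatures, and crossover is then biased away from cutting inside frequent structures. The dict-based tree encoding, the `min_support` threshold, and the `protect` weight are all assumptions made for this sketch.

```python
from collections import Counter

# A behavior tree node is encoded as a dict: {"type": str, "children": [nodes]}.

def signature(node):
    """Canonical string for the sub-tree rooted at `node`."""
    kids = ",".join(signature(c) for c in node.get("children", []))
    return f'{node["type"]}({kids})'

def all_subtrees(node):
    """Yield every node (i.e., every sub-tree root) in the tree."""
    yield node
    for c in node.get("children", []):
        yield from all_subtrees(c)

def frequent_signatures(elite_population, min_support):
    """Mine sub-trees occurring in at least `min_support` elite individuals."""
    counts = Counter()
    for tree in elite_population:
        counts.update({signature(n) for n in all_subtrees(tree)})
    return {sig for sig, c in counts.items() if c >= min_support}

def crossover_weight(node, frequent, protect=0.2):
    """Lower the chance of choosing a crossover point inside a frequent
    (preponderant) sub-tree, so preponderant structures tend to survive."""
    return protect if signature(node) in frequent else 1.0
```

In a GP loop, these weights would be normalized into node-selection probabilities before each crossover, so preponderant sub-trees are recombined as units more often than they are broken apart.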

There are several avenues of research for improving this study. Firstly, the proposed approach should be validated in more complex task scenarios and configurations for automatic behavior modeling.

In the current work, the Pac-Man game is configured as a simple, typical simulation environment to validate the proposed approach. However, the interaction between the learning technique and the agent environment is nontrivial: the environmental model, the behavior evaluation function, and the perception and action sets are all critical to behavior performance. Thus, more complex scenarios, such as larger state-space representations, partial observation, or multiple agents in real-time strategy games [31], should be considered to provide a rich learning environment in which to validate the proposed approach.

On the other hand, it is important to investigate automatic learning of appropriate parameter settings in GP systems, and to broaden the application of the proposed approach to more scenarios and configurations.

Another interesting research topic is learning behavior trees from observation. In behavior modeling through experiential learning, a behavior is measured by how well the task is performed according to evaluation criteria provided by experts; however, the resulting "optimal" behavior may be inappropriate or unnatural. A few works on learning BTs from observation have emerged, but the problem remains open [7,25]. We believe GP is a promising method here, and frequent sub-tree mining can be a useful tool to facilitate behavior-block building and accelerate learning. A key issue for this method is the similarity metric used to evaluate the generated behavior. In [32], the authors investigate a gamalyzer-based play-trace metric to measure the difference between two play traces in computer games. Based on the above techniques, we can develop a model-free framework that generates BTs by learning from training examples.

Author Contributions: Q.Z. and J.Y. conceived and designed the paper structure and the experiments; Q.Z. performed the experiments; Q.Y. and Y.Z. contributed materials and analysis tools.

Funding: This work was partially supported by the National Science Foundation (NSF) project 61473300, China.

Acknowledgments: The authors would like to thank Xiangyu Wei, Kai Xu, and Weilong Yang for helpful discussions and suggestions.

Conflicts of Interest: The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

BTs	Behavior Trees
FSMs	Finite State Machines
NPC	Non-player Characters
CGF	Computer-Generated Force
GP	Genetic Programming
RL	Reinforcement Learning
LfO	Learning from Observation
NN	Neural Network
NSF	National Science Foundation

References

1. Tirnauca, C.; Montana, J.L.; Ontanon, S.; Gonzalez, A.J.; Pardo, L.M. Behavioral Modeling Based on Probabilistic Finite Automata: An Empirical Study. Sensors 2016, 16, 958. [CrossRef] [PubMed]

2. Toubman, A.; Poppinga, G.; Roessingh, J.J.; Hou, M.; Luotsinen, L.; Lovlid, R.A.; Meyer, C.; Rijken, R.; Turcanik, M. Modeling CGF Behavior with Machine Learning Techniques: Requirements and Future Directions. In Proceedings of the 2015 Interservice/Industry Training, Simulation, and Education Conference, Orlando, FL, USA, 30 November–4 December 2015; pp. 2637–2647.

3. Diller, D.E.; Ferguson, W.; Leung, A.M.; Benyo, B.; Foley, D. Behavior modeling in commercial games. In Proceedings of the 2004 Conference on Behavior Representation in Modeling and Simulation (BRIMS), Arlington, VA, USA, 17–20 May 2004; pp. 17–20.

Appl. Sci. 2018, 8, 1077

4. Kamrani, F.; Luotsinen, L.J.; Løvlid, R.A. Learning objective agent behavior using a data-driven modeling approach. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Budapest, Hungary, 9–12 October 2017; pp. 002175–002181.

5. Luotsinen, L.J.; Kamrani, F.; Hammar, P.; Jändel, M.; Løvlid, R.A. Evolved creative intelligence for computer generated forces. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Budapest, Hungary, 9–12 October 2017; pp. 003063–003070.

6. Yao, J.; Huang, Q.; Wang, W. Adaptive human behavior modeling for air combat simulation. In Proceedings of the 2015 IEEE/ACM 19th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Chengdu, China, 14–16 October 2015; pp. 100–103.

7. Sagredo-Olivenza, I.; Gómez-Martín, P.P.; Gómez-Martín, M.A.; González-Calero, P.A. Trained Behavior Trees: Programming by Demonstration to Support AI Game Designers. IEEE Trans. Games 2017. [CrossRef]

8. Sekhavat, Y.A. Behavior Trees for Computer Games. Int. J. Artif. Intell. Tools 2017, 1, 1–27.

9. Rabin, S. AI Game Programming Wisdom 4; Nelson Education: Scarborough, ON, Canada, 2014; Volume 4.

10. Colledanchise, M.; Ögren, P. Behavior Trees in Robotics and AI: An Introduction. arXiv 2017, arXiv:1709.00084.

11. Nicolau, M.; Perezliebana, D.; Oneill, M.; Brabazon, A. Evolutionary Behavior Tree Approaches for Navigating Platform Games. IEEE Trans. Comput. Intell. AI Games 2017, 9, 227–238. [CrossRef]

12. Perez, D.; Nicolau, M.; O’Neill, M.; Brabazon, A. Evolving behaviour trees for the mario AI competition using grammatical evolution. In Proceedings of the European Conference on the Applications of Evolutionary Computation, Torino, Italy, 27–29 April 2011; pp. 123–132.

13. Lim, C.U.; Baumgarten, R.; Colton, S. Evolving behaviour trees for the commercial game DEFCON. In Proceedings of the European Conference on the Applications of Evolutionary Computation, Torino, Italy, 7–9 April 2010; pp. 100–110.

14. Mcclarron, P.; Ollington, R.; Lewis, I. Effect of Constraints on Evolving Behavior Trees for Game AI. In Proceedings of the International Conference on Computer Games, Multimedia & Allied Technologies, Los Angeles, CA, USA, 15–18 November 2016.

15. Koza, J.R. Genetic Programming II: Automatic Discovery of Reusable Programs; MIT Press: Cambridge, MA, USA, 1994.

16. Poli, R.; Langdon, W.B.; Mcphee, N.F. A Field Guide to Genetic Programming; lulu.com: Morrisville, NC, USA, 2008; pp. 229–230.

17. Scheper, K.Y.W.; Tijmons, S.; Croon, G.C.H.E.D. Behavior Trees for Evolutionary Robotics. Artif. Life 2016, 22, 23–48. [CrossRef] [PubMed]

18. Champandard, A.J. Behavior Trees for Next-Gen Game AI. Available online: http://aigamedev.com/insider/presentations/behavior-trees/ (accessed on 12 December 2007).

19. Zhang, Q.; Yin, Q.; Xu, K. Towards an Integrated Learning Framework for Behavior Modeling of Adaptive CGFs. In Proceedings of the IEEE 9th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 10–11 December 2016; Volume 2, pp. 7–12.

20. Fernlund, H.K.G.; Gonzalez, A.J.; Georgiopoulos, M.; Demara, R.F. Learning tactical human behavior through observation of human performance. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2006, 36, 128. [CrossRef]

21. Stein, G.; Gonzalez, A.J. Building High-Performing Human-Like Tactical Agents Through Observation and Experience; IEEE Press: Piscataway, NJ, USA, 2011; p. 792.

22. Teng, T.H.; Tan, A.H.; Tan, Y.S.; Yeo, A. Self-organizing neural networks for learning air combat maneuvers. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Brisbane, Australia, 10–15 June 2012; pp. 1–8.

23. Ontañón, S.; Mishra, K.; Sugandh, N.; Ram, A. Case-Based Planning and Execution for Real-Time Strategy Games. In Proceedings of the International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development, Northern Ireland, UK, 13–16 August 2007; pp. 164–178.

24. Aihe, D.O.; Gonzalez, A.J. Correcting flawed expert knowledge through reinforcement learning. Expert Syst. Appl. 2015, 42, 6457–6471. [CrossRef]

25. Robertson, G.; Watson, I. Building behavior trees from observations in real-time strategy games. In Proceedings of the International Symposium on Innovations in Intelligent Systems and Applications, Madrid, Spain, 2–4 September 2015; pp. 1–7.


26. Dey, R.; Child, C. QL-BT: Enhancing behaviour tree design and implementation with Q-learning. In Proceedings of the IEEE Conference on Computational Intelligence in Games (CIG), Niagara Falls, ON, Canada, 11–13 August 2013; pp. 1–8.

27. Zhang, Q.; Sun, L.; Jiao, P.; Yin, Q. Combining Behavior Trees with MAXQ Learning to Facilitate CGFs Behavior Modeling. In Proceedings of the 4th International Conference on IEEE Systems and Informatics (ICSAI), Hangzhou, China, 11–13 November 2017.

28. Asai, T.; Abe, K.; Kawasoe, S.; Arimura, H.; Sakamoto, H.; Arikawa, S. Efficient Substructure Discovery from Large Semi-structured Data. In Proceedings of the SIAM International Conference on Data Mining, Arlington, VA, USA, 11–13 April 2002.

29. Chi, Y.; Muntz, R.R.; Nijssen, S.; Kok, J.N. Frequent Subtree Mining—An Overview. Fundam. Inf. 2005, 66, 161–198.

30. Williams, P.R.; Perezliebana, D.; Lucas, S.M. Ms. Pac-Man Versus Ghost Team CIG 2016 Competition. In Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG), Santorini, Greece, 20–23 September 2016.

31. Christensen, H.J.; Hoff, J.W. Evolving Behaviour Trees: Automatic Generation of AI Opponents for Real-Time Strategy Games. Master’s Thesis, Norwegian University of Science and Technology, Trondheim, Norway, 2016.

32. Osborn, J.C.; Samuel, B.; Mccoy, J.A.; Mateas, M. Evaluating play trace (Dis)similarity metrics. In Proceedings of the AIIDE, Raleigh, NC, USA, 3–7 October 2014.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Applied Sciences

Article

Incremental Design of Perishable Goods Markets through Multi-Agent Simulations

Kazuo Miyashita

National Institute of Advanced Industrial Science and Technology, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan; [email protected]; Tel.: +81-29-861-5963

Received: 28 November 2017; Accepted: 11 December 2017; Published: 14 December 2017

Abstract: In current markets of perishable goods such as fish and vegetables, sellers are typically in a weak bargaining position, since perishable products cannot be stored for long without losing their value. To avoid the risk of spoiling products, sellers have few alternatives other than selling their goods at the prices offered by buyers in the markets. The market mechanism needs to be reformed in order to resolve unfairness between sellers and buyers. Double auction markets, which collect bids from both sides of the trades and match them, allow sellers to participate proactively in the price-making process. However, in perishable goods markets, sellers have an incentive to discount their bid gradually for fear of spoiling unsold goods. Buyers can take advantage of sellers’ discounted bids to increase their profit by strategic bidding. To solve the problem, we incrementally improve an online double auction mechanism for perishable goods markets, which promotes buyers’ truthful bidding by penalizing their failed bids without harming their individual rationality. We evaluate traders’ behavior under several market conditions using multi-agent simulations and show that the developed mechanism achieves fair resource allocation among traders.

Keywords: online double auction; mechanism design; perishable goods; multi-agent simulation

1. Introduction

In the research of multi-agent simulations, several types of auction mechanisms have been investigated extensively to solve large-scale distributed resource allocation problems [1] and several applications have been proposed in different kinds of markets [2–4].

In the auctions, resources are generally supposed to have clear capacity limitations, and in some cases they also have explicit temporal limitations on their value [5]. In other words, the resources are modeled to be perishable in some problem settings. This study discusses the problem of trading perishable goods, such as fish and vegetables, in a market.

Several enterprises produce perishable goods or services and make profits by allocating them to dynamic demand before they deteriorate. In the service industries, revenue management techniques [6] have been investigated. Their objective is to maximize the revenue of a single seller, because revenue management is typically practiced by the seller for its own profit. Therefore, it is difficult to apply those techniques to markets in which perishable products of multiple sellers must be traded in a coordinated manner to maximize social utility.

Agricultural and marine products are usually traded at spot markets (the opposite of spot markets are forward markets, which trade goods before production) that deal in already-produced goods, because their quantity and quality are highly uncertain in advance of production. Hence, their production costs are fixed in the markets. Their salvage value is zero because they perish if they remain unsold. Therefore, their production costs are sunk and sellers' marginal costs are zero in traditional one-shot markets for perishable goods. This justifies the extensive use of one-sided auctions, in which only buyers submit bids, for deciding allocations and prices of perishable goods in fresh markets [7]. However, in such markets, sellers cannot straightforwardly influence price making to obtain fair profits [8].

Appl. Sci. 2017, 7, 1300; doi:10.3390/app7121300; www.mdpi.com/journal/applsci

In double auction (DA) markets [9], both buyers and sellers submit bids, and an auctioneer determines resource allocation and prices on the basis of those bids. Recent studies [10–12] have applied multi-attribute double auction mechanisms to perishable goods supply chain problems using mixed integer linear programming. Although their approach is powerful in dealing with idiosyncratic properties of perishable goods, such as lead time and shelf life, they have not considered the fluctuating nature of perishable goods markets, where supply and demand change dynamically and unpredictably.

To solve the problem, we develop an online DA market, in which multiple buyers and sellers dynamically tender their bids for trading commodities before their due dates. Our online DA market is developed as an instance of a call market, which collects bids continuously and clears the market periodically using predetermined rules. Since bids in the call market have multiple matching chances, the perishable goods in the call market can hold certain salvage values until their time limits. Therefore, sellers can participate in the price-making process of the online DA market, in which their reservation prices are equal to remaining values of the goods. However, since values of the perishable goods decrease progressively, sellers have an incentive to discount their price for fear of spoiling unsold goods. Taking advantage of such an incentive of the sellers, buyers can increase their surplus by bidding a lower price. Nevertheless, buyers need to bid a moderate price for securing the goods before time limits for procurement.
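The periodic clearing of such a call market can be sketched as a greedy price-based matching round (a simplified illustration under assumed bid formats; the actual mechanism also tracks quantities, arrival/departure times, and unsold inventory across rounds):

```python
def clear_call_market(bids, asks):
    """One clearing round with greedy price-based matching.
    bids: list of (buyer_id, price)  -- buyers' limit prices.
    asks: list of (seller_id, price) -- sellers' reservation prices.
    Returns matched trades, each priced at the bid/ask midpoint."""
    bids = sorted(bids, key=lambda b: -b[1])   # highest bid first
    asks = sorted(asks, key=lambda a: a[1])    # lowest ask first
    trades = []
    for (buyer, bid), (seller, ask) in zip(bids, asks):
        if bid < ask:                          # no more compatible pairs
            break
        trades.append((buyer, seller, (bid + ask) / 2))
    return trades
```

Because unmatched bids stay in the order book until their time limits, a seller's goods retain some salvage value across rounds, which is what lets sellers take part in price making at all.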

Related Research

Until recently, there has been little research on the online DA. As a preliminary study for an online DA, an efficient and truthful mechanism for a static DA with temporally constrained bids was developed using weighted bipartite matching from graph theory [13]. In addition, for online DAs, some studies have addressed several important aspects of the mechanism, such as the design of matching algorithms with good worst-case performance within the competitive analysis framework [14], the construction of a general framework that facilitates a truthful dynamic DA by extending static DA rules [15], and an application to electric vehicle charging problems [16]. Although these research results are innovative and significant, we cannot directly apply their mechanisms to our online DA problem, because their models assume that trade failures never cause a loss to traders, which is not true in perishable goods markets.

It has been demonstrated that no efficient and Bayesian–Nash incentive compatible exchange mechanism can be simultaneously budget balanced and individually rational [17]. For a static DA mechanism with temporal constraints, it is shown that the mechanism is dominant-strategy incentive compatible (or strategy-proof) if and only if the following three conditions are met: (1) its allocation policy is monotonic, meaning that a buyer (seller) agent that loses with a bid (ask) of value v, arrival a, and departure d also loses with every bid (ask) having v′ ≤ v (v′ ≥ v), a′ ≥ a, and d′ ≤ d (when we must distinguish between claims made by buyers and sellers, we refer to the bid from a buyer and the ask from a seller); (2) every winning trader pays (or is paid) her critical value, the value at which the trader first wins in some period; and (3) the payment is zero for losing traders [13]. In the perishable goods market, the third condition cannot be satisfied, because losing sellers must give up the value of their unsold goods. Even when buyers bid truthfully, sellers have a strong incentive to discount their valuation as their departure time approaches while their goods remain unsold. Knowing the sellers' incentive to discount, buyers in turn have an incentive to take advantage of the discounts by underbidding.
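Condition (1) above can be made concrete with a small helper encoding the dominance relation for buyer bids (an illustrative sketch; the primed symbols from the text are written here as `v2, a2, d2`):

```python
def dominated_buyer_bid(lose, other):
    """True if `other` = (v2, a2, d2) must also lose whenever the buyer
    bid `lose` = (v, a, d) lost, under a monotonic allocation policy:
    a weakly lower value, later arrival, and earlier departure can never
    turn a losing bid into a winning one."""
    (v, a, d), (v2, a2, d2) = lose, other
    return v2 <= v and a2 >= a and d2 <= d
```

For sellers, the value inequality flips (v2 ≥ v), since a seller asking more is in a weakly worse position.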

A mechanism that fails to induce truthful behavior in its participants cannot be efficient, because it lacks the information necessary to make welfare-maximizing decisions. In addition, for online DA markets, it is impossible to achieve efficiency because of imperfect foresight about upcoming bids [18]. Therefore, neither perfect efficiency nor incentive compatibility can be achieved in an online DA market for perishable goods. In order to develop stable and socially profitable markets for perishable goods, we need to investigate a mechanism that imposes (weak) budget balance and individual rationality while promoting reasonable efficiency and moderate incentive compatibility.

In perishable goods markets, sellers have an incentive to discount their price for fear of spoiling unsold goods, and buyers can exploit this incentive to increase their surplus by bidding lower prices. In our previous research [19], we designed a new online DA mechanism with a criticality-based allocation policy: in order to reduce trade failures, the mechanism prioritizes bids with closer time limits over bids with farther time limits that might produce more social surplus.

The proposed mechanism achieved fair resource allocation in computer simulations, assuming that agents report their temporal constraints truthfully. However, under that mechanism, traders have an incentive to bid late in order to improve their allocation opportunities by increasing the criticality of their bids. This causes unpredictable trader behavior and degrades market efficiency.
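The criticality-based policy of [19] can be caricatured as ordering bids by time-to-deadline rather than by price before matching (a simplified sketch with an assumed bid tuple format; the tie-breaking rule is an assumption):

```python
def criticality_order(bids, now):
    """Order bids so that the most critical ones (closest departure time)
    are matched first, breaking ties by higher price.
    Each bid is a tuple (trader_id, price, departure_time)."""
    return sorted(bids, key=lambda b: (b[2] - now, -b[1]))
```

This also shows why late bidding pays under such a policy: delaying a bid shrinks its remaining `departure_time - now`, moving it to the front of the queue — exactly the manipulation the paper's penalized mechanism is designed to remove.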

In this study, extending our previous research, we incrementally develop a heuristic online DA mechanism for perishable goods with a standard greedy price-based allocation policy and achieve fair resource allocation by penalizing buyers’ untruthful bids while maintaining their individual rationality.

In the penalized online DA mechanism, buyers have an incentive to report their truthful value because they must pay penalties for failed bids. At the same time, buyers' individual rationality is not harmed, because a buyer is asked to pay the penalty only on the occasion of a successful match, in which case the sum of the market-clearing price and the penalty does not exceed the valuation of the buyer's bid.
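One possible reading of this payment rule in code (an illustrative sketch, not the paper's exact formula): the pending penalty accumulated from failed bids is charged only up to the headroom between the clearing price and the buyer's valuation, so the total payment never exceeds the valuation and individual rationality is preserved:

```python
def buyer_payment(valuation, clearing_price, pending_penalty):
    """Payment for a matched buyer: the clearing price plus as much of the
    pending penalty as individual rationality allows. The charged penalty
    is capped so that price + penalty never exceeds the bid's valuation."""
    headroom = max(valuation - clearing_price, 0.0)
    return clearing_price + min(pending_penalty, headroom)
```

Under this cap, underbidding stops being free: a buyer whose low-ball bids fail accumulates penalties that are collected as soon as a bid finally succeeds, while a truthful buyer rarely fails and so rarely owes any penalty.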

The rest of this paper is organized as follows: Section 2 introduces our market model. Section 3 explains the settings of the multi-agent simulations used in the following sections. Section 4 presents a naive mechanism design for our online DA market and experimentally analyzes buyers' behavior when trading perishable goods. Section 5 incrementally improves the market mechanisms and evaluates their performance based on market equilibria. Finally, Section 6 concludes.
