<?xml version="1.0" encoding="UTF-8"?>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <leader>01860nam a22001697a 4500</leader>
  <controlfield tag="003">NUST</controlfield>
  <datafield tag="082" ind1=" " ind2=" ">
    <subfield code="a">005.1,KAM</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Kamal, Muhammad Haider</subfield>
    <subfield code="9">112571</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Replication of Multi-Agent Reinforcement Learning for &#x201C;Hide &amp; Seek&#x201D; Problem /</subfield>
    <subfield code="c">Muhammad Haider Kamal, </subfield>
  </datafield>
  <datafield tag="264" ind1=" " ind2=" ">
    <subfield code="a">Rawalpindi </subfield>
    <subfield code="b">MCS, NUST </subfield>
    <subfield code="c">2023</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">viii, 79 p.</subfield>
  </datafield>
  <datafield tag="505" ind1=" " ind2=" ">
    <subfield code="a">Reinforcement learning generates policies based on reward functions and hyperparameters, and slight changes in these can significantly affect results. The lack of documentation and reproducibility in reinforcement learning research makes it difficult to replicate once-deduced strategies. While previous research has identified strategies using grounded maneuvers, there is limited work in more complex environments. The agents in this study are simulated similarly to OpenAI&#x2019;s hide-and-seek agents, with the addition of a flying mechanism that enhances their mobility and expands their range of possible actions and strategies. This added functionality enables the agents to develop a chasing strategy in approximately 1.6 million steps instead of 2 million, and the hiders&#x2019; shelter strategy in approximately 2.3 million steps instead of 25 million, while using a smaller batch size of 3072 instead of 64000. We also discuss the importance of reward function design and deployment in a curriculum-based environment to encourage agents to learn basic skills, along with the challenges of replicating these reinforcement learning strategies. We demonstrate that the results of the reinforcement learning agents can be replicated in a more complex environment and that similar strategies evolve, including &#x201C;running and chasing&#x201D; and &#x201C;fort building&#x201D;.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2=" ">
    <subfield code="a">MSCSE / MSSE-27 </subfield>
    <subfield code="9">112568</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
    <subfield code="b">MSCSE / MSSE</subfield>
    <subfield code="9">112573</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Niazi, Muaz Ahmed Khan, supervisor</subfield>
    <subfield code="9">112572</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="2">ddc</subfield>
    <subfield code="c">THE</subfield>
  </datafield>
  <datafield tag="999" ind1=" " ind2=" ">
    <subfield code="c">594849</subfield>
    <subfield code="d">594849</subfield>
  </datafield>
  <datafield tag="952" ind1=" " ind2=" ">
    <subfield code="0">0</subfield>
    <subfield code="1">0</subfield>
    <subfield code="4">0</subfield>
    <subfield code="7">0</subfield>
    <subfield code="a">MCS</subfield>
    <subfield code="b">MCS</subfield>
    <subfield code="c">THE</subfield>
    <subfield code="d">2023-05-25</subfield>
    <subfield code="o">005.1,KAM</subfield>
    <subfield code="p">MCSTCS-544</subfield>
    <subfield code="y">THE</subfield>
  </datafield>
</record>
