One of your colleagues proposes using passive temporal difference learning (TDL) to solve the problem. Explain to them why passive TDL would not be well suited to solving this problem.
March 17, 2021

Sub-task 3.1: The grid below is a variant of the grid world problem studied in unit 8. It has three inaccessible states (the black shaded squares) and four terminal states (the grey shaded squares). In each state, the agent:
• receives a reward of -0.25 in a non-terminal state, or the value indicated below in a terminal state,
• ends the game if it is in a terminal state,
• otherwise, must choose to try to move to one of the neighbouring states (the horizontally or vertically adjacent states; diagonal movement is not permitted).

After attempting to move, the agent:
• reaches the state it was attempting to move to with probability 0.8,
• fails and makes a perpendicular move with probability 0.2 (each direction is equally likely),
• if, as a result of this, the agent attempts to move outside of the grid or to an inaccessible state, it instead remains where it is.

In the diagram below, the number in each state shows the (expected) utility of that state under some policy, rounded to two decimal places. Using this information, draw a diagram to show the policy which would lead to these utility values. For each of the three states highlighted in green, show how you determined the policy action for that state.

-1.27 -0.92 -0.57 -0.25 0.06 -1.55 -1.27 0.41 -0.75 -0.39 -1 1 0.72 -0.35 0.05 0.46 0.7 1 -0.62 -0.31 -1 0.5

Sub-task 3.2: You are asked to work on a different variant of the grid world problem. In this variant, you do not have access to a diagram of the form given above (showing which states are adjacent), nor do you know the rules governing how agents probabilistically transition between states after choosing an action. You are asked to propose a method for finding the optimal policy in this setting.
One of your colleagues proposes using passive temporal difference learning (TDL) to solve the problem. Explain to them why passive TDL would not be well suited to solving this problem.
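
For reference, here is a minimal sketch of the update rule usually meant by passive TDL, in its common textbook TD(0) form. The function name `passive_td_update`, the state labels, the learning rate `alpha` and the discount `gamma` are illustrative assumptions and do not come from the assignment. The sketch only shows what the method takes as input and what it produces.

```python
from collections import defaultdict


def passive_td_update(utilities, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Apply one passive TD(0) update to the utility estimate of `state`.

    `utilities` maps state -> current utility estimate.
    `reward` is the reward received in `state`.
    `next_state` is the state observed after following the fixed policy's action.
    `alpha` and `gamma` are assumed values, not taken from the assignment.
    """
    # Move U(state) a fraction alpha toward the one-step sample target.
    target = reward + gamma * utilities[next_state]
    utilities[state] += alpha * (target - utilities[state])
    return utilities[state]


# Hypothetical example: state "A" (reward -0.25) is followed by terminal "T" (utility 1).
utilities = defaultdict(float)
utilities["T"] = 1.0
new_estimate = passive_td_update(utilities, "A", -0.25, "T", alpha=0.5)
```

The update consumes transitions generated by a policy that is already fixed, and it returns utility estimates for that policy only. It never compares alternative actions within a state.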

