One of your colleagues proposes using passive temporal difference learning (TDL) to solve the problem. Explain to them why passive TDL would not be well suited to solving this problem.

March 17, 2021

Sub-task 3.1: The grid below is a variant of the grid world problem studied in unit 8. It has three inaccessible states (the black shaded squares) and four terminal states (the grey shaded squares).

In each state, the agent:
• receives a reward of -0.25 in a non-terminal state, or the value indicated below in a terminal state,
• ends the game if it is in a terminal state,
• otherwise, must choose to try to move to one of the neighbouring states (the horizontally or vertically adjacent states; diagonal movement is not permitted).

After attempting to move, the agent:
• reaches the state it was attempting to move to with probability 0.8,
• fails and makes a perpendicular move with probability 0.2 (each of the two perpendicular directions is equally likely, so 0.1 each),
• remains where it is if, as a result of this, it attempts to move outside the grid or into an inaccessible state.

In the diagram below, the number in each state shows the (expected) utility of that state under some policy, rounded to two decimal places. Using this information, draw a diagram to show the policy which would lead to these utility values. For each of the three states highlighted in green, show how you determined the policy action for that state.

-1.27 -0.92 -0.57 -0.25 0.06 -1.55 -1.27 0.41 -0.75 -0.39 -1 1 0.72 -0.35 0.05 0.46 0.7 1 -0.62 -0.31 -1 0.5

Sub-task 3.2: You are asked to work on a different variant of the grid world problem. In this variant, you do not have access to a diagram of the form given above, showing which states are adjacent, nor do you know the rules governing how agents probabilistically transition between states after choosing an action. You are asked to propose a method for finding the optimal policy in this setting. One of your colleagues proposes using passive temporal difference learning (TDL) to solve the problem. Explain to them why passive TDL would not be well suited to solving this problem.
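The movement rules in Sub-task 3.1 and the learning rule named in Sub-task 3.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not a model answer: the 2x3 layout in EXAMPLE_U, all function names, the learning rate alpha and the discount gamma = 1.0 are assumptions made here and do not come from the assignment. The first part picks the action with the highest expected utility under the 0.8 / 0.1 / 0.1 transition rule; the second part is the standard passive TD update, which only adjusts the utility estimate of a state visited while following a fixed policy.

```python
# Illustrative sketch only. The layout, names, alpha and gamma are assumptions.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                 "left": ("up", "down"), "right": ("up", "down")}

# Hypothetical utilities keyed by (row, col). Cell (1, 1) is inaccessible,
# so it is simply absent, as is every cell outside the grid.
EXAMPLE_U = {(0, 0): 0.0, (0, 1): 0.5, (0, 2): 1.0,
             (1, 0): -0.2, (1, 2): -1.0}


def step(state, direction, utilities):
    """Cell reached by one move; the agent stays put if the target is blocked."""
    row, col = state
    d_row, d_col = MOVES[direction]
    target = (row + d_row, col + d_col)
    return target if target in utilities else state


def expected_utility(state, action, utilities):
    """Expected utility of the next state: 0.8 intended, 0.1 per perpendicular slip."""
    side_a, side_b = PERPENDICULAR[action]
    return (0.8 * utilities[step(state, action, utilities)]
            + 0.1 * utilities[step(state, side_a, utilities)]
            + 0.1 * utilities[step(state, side_b, utilities)])


def greedy_action(state, utilities):
    """Action maximising expected next-state utility. Needs the transition rule."""
    return max(MOVES, key=lambda a: expected_utility(state, a, utilities))


def passive_td_update(utilities, state, reward, next_state, alpha=0.1, gamma=1.0):
    """One passive TD step: nudge U(state) towards reward + gamma * U(next_state)."""
    target = reward + gamma * utilities[next_state]
    utilities[state] += alpha * (target - utilities[state])


if __name__ == "__main__":
    print(greedy_action((0, 1), EXAMPLE_U))  # right
```

Note that greedy_action relies on knowing which states are adjacent and on the 0.8 / 0.1 / 0.1 probabilities, whereas passive_td_update uses only an observed transition and never chooses an action itself.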
