UNIT 2 - AI - Merged
ARTIFICIAL INTELLIGENCE
INFORMED (HEURISTIC) SEARCH
➢ An informed search strategy is one that uses problem-specific knowledge beyond the definition of the problem itself; it can therefore find solutions more efficiently than an uninformed strategy.
➢ A*
o MA*
o IDA*
o R-BFS
o SMA*
BEST-FIRST SEARCH
➢ The evaluation function is construed as a cost estimate, so the node with the lowest evaluation is expanded first.
➢ The implementation of best-first graph search is identical to that for uniform-cost search except for the use of f instead of g to order the priority queue.
Advantages:
➢ Best-first search can switch between BFS-like and DFS-like behaviour, thereby gaining the advantages of both algorithms.
➢ This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
➢ It can behave as an unguided depth-first search in the worst case scenario.
➢ It can get stuck in a loop, as DFS can.
➢ This algorithm is not optimal.
HEURISTIC FUNCTION
h(n) = estimated cost of the cheapest path from the state at node n to a goal state.
➢ Heuristic functions are the most common form in which additional knowledge of the
problem is imparted to the search algorithm.
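As a concrete illustration (not taken from the text above), a heuristic such as straight-line distance can be stored as a simple lookup table. The state names and values below are hypothetical:

```python
# Hypothetical straight-line-distance estimates h(n) to the goal state "G".
# The names and numbers are illustrative only.
h = {
    "S": 5,   # start state
    "A": 3,
    "B": 4,
    "C": 2,
    "G": 0,   # h is always 0 at a goal state
}

def heuristic(state):
    """Return the estimated cost of the cheapest path from state to the goal."""
    return h[state]
```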
GREEDY BFS
➢ Greedy best-first search tries to expand the node that is closest to the goal, on the
grounds that this is likely to lead to a solution quickly.
➢ Thus, it evaluates nodes by using just the heuristic function; that is, f(n) = h(n).
➢ The algorithm is called “greedy”—at each step it tries to get as close to the goal as it can.
ALGORITHM:
IMPLEMENTATION
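The implementation figure is not reproduced here. The following Python sketch shows one way greedy best-first graph search could be written, assuming the graph is an adjacency dictionary mapping each state to (neighbor, cost) pairs and h is a heuristic lookup table; both are hypothetical stand-ins for the problem definition:

```python
import heapq

def greedy_best_first_search(graph, h, start, goal):
    """Expand the node with the lowest h(n) first; f(n) = h(n)."""
    frontier = [(h[start], start, [start])]   # priority queue ordered by h
    explored = set()
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in explored:
            continue
        explored.add(state)
        for neighbor, _cost in graph[state]:  # the step cost is ignored by greedy search
            if neighbor not in explored:
                heapq.heappush(frontier, (h[neighbor], neighbor, path + [neighbor]))
    return None  # failure
```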
PERFORMANCE EVALUATION:
a. Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
b. Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where b is the branching factor and m is the maximum depth of the search space.
c. Complete: Greedy best-first search is also incomplete, even if the given state
space is finite.
d. Optimal: Greedy best first search algorithm is not optimal.
Note:
✓ With a good heuristic function, however, the complexity can be reduced substantially.
✓ The amount of the reduction depends on the particular problem and on the quality of the
heuristic.
A* ALGORITHM
➢ The most widely known form of best-first search is called A∗ search. It evaluates nodes
by combining g(n), the cost to reach the node, and h(n), the cost to get from the node to
the goal:
o f(n) = g(n) + h(n)
➢ Since g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost of the cheapest path from n to the goal, we have f(n) = estimated cost of the cheapest solution through n.
➢ Thus, if we are trying to find the cheapest solution, a reasonable thing to try first is the
node with the lowest value of g(n) + h(n). It turns out that this strategy is more than just
reasonable: provided that the heuristic function h(n) satisfies certain conditions, A∗
search is both complete and optimal. The algorithm is identical to UNIFORM-COST-
SEARCH except that A∗ uses g + h instead of g.
ALGORITHM:
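The algorithm figure is not reproduced here. A minimal Python sketch of A* graph search, under the same hypothetical graph/heuristic-table assumptions as the greedy sketch above, might look like this:

```python
import heapq

def a_star_search(graph, h, start, goal):
    """A* graph search: order the frontier by f(n) = g(n) + h(n)."""
    frontier = [(h[start], 0, start, [start])]    # entries are (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g                        # optimal if h is consistent
        if g > best_g.get(state, float("inf")):
            continue                              # stale queue entry, skip it
        for neighbor, cost in graph[state]:
            g2 = g + cost
            if g2 < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = g2
                heapq.heappush(frontier,
                               (g2 + h[neighbor], g2, neighbor, path + [neighbor]))
    return None, float("inf")                     # failure
```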
The tree-search version of A* is optimal if h(n) is admissible, while the graph-search version is
optimal if h(n) is consistent.
EXAMPLE:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of
all states is given in the below table so we will calculate the f(n) of each state using the formula
f(n)= g(n) + h(n), where g(n) is the cost to reach any node from start state.
Here we will use OPEN and CLOSED list.
Solution:
Initialization: {(S, 5)}
Iteration 1: {(S --> A, 4), (S --> G, 10)}
Iteration 2: {(S --> A --> C, 4), (S --> A --> B, 7), (S --> G, 10)}
Iteration 3: {(S --> A --> C --> G, 6), (S --> A --> C --> D, 11), (S --> A --> B, 7), (S --> G, 10)}
Iteration 4 gives the final result: S --> A --> C --> G, which provides the optimal path with cost 6.
PERFORMANCE EVALUATION:
ITERATIVE DEEPENING A*
➢ The simplest way to reduce memory requirements for A∗ is to adapt the idea of iterative
deepening to the heuristic search context, resulting in the iterative-deepening A∗ (IDA∗)
algorithm.
➢ The main difference between IDA∗ and standard iterative deepening is that the cutoff
used is the f-cost (g+h) rather than the depth; at each iteration, the cutoff value is the
smallest f-cost of any node that exceeded the cutoff on the previous iteration.
ALGORITHM:
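The algorithm figure is not reproduced here. A minimal Python sketch of IDA*, again assuming a hypothetical adjacency dictionary and heuristic table, could look as follows:

```python
def ida_star(graph, h, start, goal):
    """Iterative-deepening A*: depth-first search bounded by an f-cost cutoff."""
    def dfs(state, g, bound, path):
        f = g + h[state]
        if f > bound:
            return f, None              # report the f-value that exceeded the bound
        if state == goal:
            return f, path
        minimum = float("inf")
        for neighbor, cost in graph[state]:
            if neighbor in path:        # avoid cycles on the current path
                continue
            t, solution = dfs(neighbor, g + cost, bound, path + [neighbor])
            if solution is not None:
                return t, solution
            minimum = min(minimum, t)
        return minimum, None

    bound = h[start]
    while True:
        t, solution = dfs(start, 0, bound, [start])
        if solution is not None:
            return solution
        if t == float("inf"):
            return None                 # no solution exists
        bound = t                       # next cutoff = smallest f that exceeded the old one
```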
EXAMPLE:
SOLUTION:
We want to find the optimal path from node A to node F using the IDA* algorithm. The
first step is to set an initial cost limit. Let's use the heuristic estimate of the optimal path,
which is 7 (the sum of the costs from A to C to F).
1. Set the cost limit to 7.
2. Start the search at node A.
3. Expand node A and generate its neighbors, B and C.
4. Evaluate the heuristic cost of the paths from A to B and A to C, which are 5 and 10
respectively.
5. Since the cost of the path to B is less than the cost limit, continue the search from node B.
6. Expand node B and generate its neighbors, D and E.
7. Evaluate the heuristic cost of the paths from A to D and A to E, which are 10 and 9
respectively.
8. Since the cost of the path to D exceeds the cost limit, backtrack to node B.
9. Evaluate the heuristic cost of the path from A to C, which is 10.
10. Since the cost of the path to C is less than the cost limit, continue the search from node C.
11. Expand node C and generate its neighbor, F.
12. Evaluate the heuristic cost of the path from A to F, which is 7.
13. Since the cost of the path to F does not exceed the cost limit, return the optimal path, which is
A - C - F.
ADVANTAGES:
➢ IDA∗ is practical for many problems with unit step costs and avoids the substantial
overhead associated with keeping a sorted queue of nodes.
➢ Completeness: The IDA* method is a complete search algorithm, which means that, if an
optimum solution exists, it will be discovered.
➢ Memory effectiveness: The IDA* method only keeps one path in memory at a time,
making it memory efficient.
➢ Flexibility: Depending on the application, the IDA* method may be employed with a
number of heuristic functions.
➢ Performance: The IDA* method sometimes outperforms other search algorithms such as
uniform-cost search (UCS) or breadth-first search (BFS).
DISADVANTAGES
➢ Unfortunately, it suffers from the same difficulties with real valued costs as does the
iterative version of uniform-cost search.
➢ Although IDA* is memory-efficient in that it only saves one path at a time, there are
some situations when it may still be necessary to use a substantial amount of memory.
RECURSIVE-BFS:
Recursive best-first search (RBFS) is a simple recursive algorithm that attempts to mimic the
operation of standard best-first search, but using only linear space.
Its structure is similar to that of a recursive depth-first search, but rather than continuing
indefinitely down the current path, it uses the f_limit variable to keep track of the f-value of the
best alternative path available from any ancestor of the current node. If the current node exceeds
this limit, the recursion unwinds back to the alternative path. As the recursion unwinds, RBFS
replaces the f-value of each node along the path with a backed-up value—the best f-value of its
children. In this way, RBFS remembers the f-value of the best leaf in the forgotten subtree and
can therefore decide whether it’s worth re-expanding the subtree at some later time. RBFS is
somewhat more efficient than IDA∗, but still suffers from excessive node regeneration.
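A minimal Python sketch of RBFS under the same hypothetical graph/heuristic-table assumptions; the nested function carries the f_limit and returns both a solution (if found) and a backed-up f-value:

```python
import math

def rbfs_search(graph, h, start, goal):
    """Recursive best-first search, keeping only the current path in memory."""
    def rbfs(state, g, f, f_limit, path):
        if state == goal:
            return path, f
        successors = []
        for child, cost in graph[state]:
            g2 = g + cost
            # A child inherits its parent's backed-up f-value if that is larger.
            successors.append([max(g2 + h[child], f), g2, child])
        if not successors:
            return None, math.inf
        while True:
            successors.sort(key=lambda s: s[0])
            best = successors[0]
            if best[0] > f_limit:
                return None, best[0]          # unwind, reporting the backed-up f-value
            alternative = successors[1][0] if len(successors) > 1 else math.inf
            result, best[0] = rbfs(best[2], best[1], best[0],
                                   min(f_limit, alternative), path + [best[2]])
            if result is not None:
                return result, best[0]

    return rbfs(start, 0, h[start], math.inf, [start])[0]
```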
EXAMPLE:
PERFORMANCE EVALUATION:
1.OPTIMALITY: Like A∗ tree search, RBFS is an optimal algorithm if the heuristic function h(n) is admissible.
✓ Its space complexity is linear in the depth of the deepest optimal solution, but its time
complexity is rather difficult to characterize: it depends both on the accuracy of the
heuristic function and on how often the best path changes as nodes are expanded.
✓ IDA∗ and RBFS suffer from using too little memory. Between iterations, IDA∗ retains
only a single number: the current f-cost limit. RBFS retains more information in memory,
but it uses only linear space: even if more memory were available, RBFS has no way to
make use of it. Because they forget most of what they have done, both algorithms may
end up re-expanding the same states many times over.
SMA*
Refer handwritten notes.
LOCAL SEARCH ALGORITHM AND OPTIMISATION PROBLEM
The informed and uninformed search strategies expand the nodes systematically, which leads to a
solution path from the initial state to the goal node. But beyond these “classical search
algorithms,” we have some “local search algorithms” in which the path cost does not matter;
they focus only on the solution state needed to reach the goal node.
A local search algorithm completes its task by operating on a single current node rather than on
multiple paths, and it generally moves only to the neighbors of that node.
Although local search algorithms are not systematic, still they have the following two
advantages:
Local search algorithms use very little (usually a constant amount of) memory, as they operate
on only a single path.
Most often, they find a reasonable solution in large or infinite state spaces where the
classical or systematic algorithms do not work.
Does the local search algorithm work for a pure optimization problem?
Yes, the local search algorithm works for pure optimization problems. A pure optimization problem
is one where all the nodes can give a solution. But the target is to find the best state out of all
according to the objective function. Unfortunately, the pure optimization problem fails to find
high-quality solutions to reach the goal state from the current state.
Note: An objective function is a function whose value is either minimized or maximized in
different contexts of the optimization problems. In the case of search algorithms, an objective
function can be the path cost for reaching the goal node, etc.
Working of a Local search algorithm
Let's understand the working of a local search algorithm with the help of an example:
Consider the below state-space landscape having both:
Location: It is defined by the state.
Elevation: It is defined by the value of the objective function or heuristic cost function.
The local search algorithm explores the above landscape by finding the following two points:
Global Minimum: If the elevation corresponds to the cost, then the task is to find the
lowest valley, which is known as Global Minimum.
Global Maximum: If the elevation corresponds to an objective function, then it finds the
highest peak, which is called the Global Maximum. It is the highest point in the landscape.
HILL CLIMBING ALGORITHM
➢ Hill climbing is simply a loop that continually moves in the direction of increasing value—that is,
uphill. It terminates when it reaches a “peak” where no neighbor has a higher value.
➢ The algorithm does not maintain a search tree, so the data structure for the current node
need only record the state and the value of the objective function.
➢ Hill climbing does not look ahead beyond the immediate neighbors of the current state.
This resembles trying to find the top of Mount Everest in a thick fog while suffering from
amnesia.
To understand the concept of hill climbing algorithm, consider the below landscape representing
the goal state/peak and the current state of the climber. The topographical regions shown in the
figure can be defined as:
Global Maximum: It is the highest point on the hill, which is the goal state.
Local Maximum: It is a peak that is higher than each of its neighboring states but lower than the
global maximum.
Flat local maximum: It is a flat area over the hill where there is no uphill or downhill move; it
is a saturated point of the hill.
Shoulder: It is also a flat area where the summit is possible.
Current state: It is the current position of the person.
FLOWCHART
First-choice hill climbing implements stochastic hill climbing by generating successors randomly
until one is generated that is better than the current state. This is a good strategy when a state
has many (e.g., thousands of) successors.
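A minimal Python sketch of basic (steepest-ascent) hill climbing and the first-choice variant just described; the neighbors, random_neighbor, and value functions are hypothetical problem-specific callbacks:

```python
def hill_climbing(initial, neighbors, value):
    """Steepest-ascent hill climbing: stop when no neighbor is better."""
    current = initial
    while True:
        candidates = neighbors(current)
        best = max(candidates, key=value, default=current)
        if value(best) <= value(current):
            return current            # a peak (possibly only a local maximum)
        current = best

def first_choice_hill_climbing(initial, random_neighbor, value, max_tries=1000):
    """Generate random successors until one is better than the current state."""
    current = initial
    while True:
        for _ in range(max_tries):
            candidate = random_neighbor(current)
            if value(candidate) > value(current):
                current = candidate   # accept the first improving successor
                break
        else:
            return current            # no improving successor found within max_tries
```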
PROBLEMS IN THE HILL CLIMBING ALGORITHM
➢ Local Maxima: It is a peak of the landscape that is higher than all its neighboring states but
lower than the global maximum. It is not the goal peak because there is another peak higher than it.
➢ Plateau: It is a flat surface area where no uphill exists. It becomes difficult for the
climber to decide that in which direction he should move to reach the goal point.
Sometimes, the person gets lost in the flat area.
➢ Ridges: It is a challenging problem in which the person finds two or more local maxima of
the same height close together. It becomes difficult to navigate toward the right point, and the
search tends to get stuck there.
SIMULATED ANNEALING
A hill-climbing algorithm that never makes “downhill” moves toward states with lower
value (or higher cost) is guaranteed to be incomplete, because it can get stuck on a local
maximum. In contrast, a purely random walk—that is, moving to a successor chosen uniformly at
random from the set of successors—is complete but extremely inefficient. Therefore, it seems
reasonable to try to combine hill climbing with a random walk in some way that yields both
efficiency and completeness. Simulated annealing is such an algorithm.
In metallurgy, annealing is the process used to temper or harden metals and glass by
heating them to a high temperature and then gradually cooling them, thus allowing the material
to reach a low energy crystalline state. To explain simulated annealing, we switch our point of
view from hill climbing to gradient descent (i.e., minimizing cost) and imagine the task of
getting a ping-pong ball into the deepest crevice in a bumpy surface. If we just let the ball roll, it
will come to rest at a local minimum. If we shake the surface, we can bounce the ball out of the
local minimum. The trick is to shake just hard enough to bounce the ball out of local minima but
not hard enough to dislodge it from the global minimum.
The innermost loop of the simulated annealing algorithm is quite similar to hill climbing.
Instead of picking the best move, however, it picks a random move. If the move improves
the situation, it is always accepted. Otherwise, the algorithm accepts the move with some
probability less than 1. The probability decreases exponentially with the “badness” of the
move—the amount ΔE by which the evaluation is worsened. The probability also decreases as
the “temperature” T goes down: “bad” moves are more likely to be allowed at the start when T is
high, and they become more unlikely as T decreases. If the schedule lowers T slowly enough, the
algorithm will find a global optimum with probability approaching 1.
Simulated annealing was first used extensively to solve VLSI layout problems in the
early 1980s. It has been applied widely to factory scheduling and other large-scale optimization
tasks.
ALGORITHM:
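The algorithm figure is not reproduced here. A minimal Python sketch of simulated annealing, with random_neighbor, value, and schedule supplied as hypothetical problem-specific callbacks:

```python
import math
import random

def simulated_annealing(initial, random_neighbor, value, schedule):
    """Hill climbing plus random 'bad' moves, accepted with probability e^(ΔE/T)."""
    current = initial
    t = 1
    while True:
        T = schedule(t)                      # temperature decreases over time
        if T <= 0:
            return current
        candidate = random_neighbor(current)
        delta_e = value(candidate) - value(current)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = candidate              # always accept good moves, bad ones sometimes
        t += 1

# Example of a cooling schedule (purely illustrative):
# schedule = lambda t: 100 * (0.95 ** t) if 100 * (0.95 ** t) > 1e-3 else 0
```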
EXAMPLE:
• Problem: Sudoku - Find a completed Sudoku grid that satisfies all the rules.
• Start with an initial partially filled Sudoku grid.
• While the temperature is above the minimum temperature:
– Randomly select an empty cell and assign a random number.
– Calculate the number of conflicts in the current and neighbor grids.
– If the neighbor grid has fewer conflicts, accept it.
– If it has more conflicts, accept it with a probability based on the Boltzmann
distribution and the current temperature.
• Decrease the temperature.
• Continue until a termination criterion is met.
• Output the Sudoku grid with the fewest conflicts or a solved Sudoku puzzle if zero
conflicts are reached.
LOCAL BEAM SEARCH
The local beam search algorithm keeps track of k states rather than just one. It begins with k
randomly generated states. At each step, all the successors of all k states are generated. If any one is a
goal, the algorithm halts. Otherwise, it selects the k best successors from the complete list and repeats.
At first sight, a local beam search with k states might seem to be nothing more than running k random
restarts in parallel instead of in sequence.
In fact, the two algorithms are quite different. In a random-restart search, each search process
runs independently of the others. In a local beam search, useful information is passed among the parallel
search threads. In effect, the states that generate the best successors say to the others, “Come over
here, the grass is greener!” The algorithm quickly abandons unfruitful searches and moves its resources to
where the most progress is being made. In its simplest form, local beam search can suffer from a lack of
diversity among the k states—they can quickly become concentrated in a small region of the state space,
making the search little more than an expensive version of hill climbing.
A variant called stochastic beam search, analogous to stochastic hill climbing, helps alleviate
this problem. Instead of choosing the best k from the pool of candidate successors, stochastic beam
search chooses k successors at random, with the probability of choosing a given successor being an
increasing function of its value. Stochastic beam search bears some resemblance to the process of natural
selection, whereby the “successors” (offspring) of a “state” (organism) populate the next generation
according to its “value” (fitness).
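A minimal Python sketch covering both plain local beam search and the stochastic variant described above; neighbors, value, and goal_test are hypothetical callbacks, and values are assumed to be positive so they can be used as sampling weights:

```python
import random

def local_beam_search(initial_states, neighbors, value, goal_test, stochastic=False):
    """Keep k states; at each step pick k successors from the pooled successor list."""
    k = len(initial_states)
    current = list(initial_states)
    while True:
        pool = [s for state in current for s in neighbors(state)]
        if not pool:
            return max(current, key=value)            # nowhere left to go
        for s in pool:
            if goal_test(s):
                return s
        if stochastic:
            # Stochastic beam search: sample k successors with probability
            # proportional to their (assumed positive) value.
            current = random.choices(pool, weights=[value(s) for s in pool], k=k)
        else:
            current = sorted(pool, key=value, reverse=True)[:k]
```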
GENETIC ALGORITHM:
A genetic algorithm (or GA) is a variant of stochastic beam search in which successor states are
generated by combining two parent states rather than by modifying a single state.
1.POPULATION
GAs begin with a set of k randomly generated states, called the population.
2.INDIVIDUAL
Each state, or individual, is represented as a string over a finite alphabet—most commonly, a string of 0s
and 1s.
3.FITNESS FUNCTION
The production of the next generation of states is shown in Figure 4.6(b)–(e). In (b), each state is rated by
the objective function, or (in GA terminology) the fitness function.
A fitness function should return higher values for better states, so, for the 8-queens problem we use the
number of nonattacking pairs of queens, which has a value of 28 for a solution. The values of the four
states are 24, 23, 20, and 11.
In this particular variant of the genetic algorithm, the probability of being chosen for reproducing is
directly proportional to the fitness score, and the percentages are shown next to the raw scores.
4.CROSS OVER
For each pair to be mated, a crossover point is chosen randomly from the positions in the string. In Figure
4.6, the crossover points are after the third digit in the first pair and after the fifth digit
in the second pair. In (d), the offspring themselves are created by crossing over the parent strings at the
crossover point. For example, the first child of the first pair gets the first three digits from the first parent
and the remaining digits from the second parent, whereas the second child gets the first three digits from
the second parent and the rest from the first parent. The 8-queens states involved in this reproduction step
are shown in Figure 4.7. The example shows that when two parent states are quite different, the crossover
operation can produce a state that is a long way from either parent state. It is often the case that the
population is quite diverse early on in the process, so crossover (like simulated annealing) frequently
takes large steps in the state space early in the search process and smaller steps later on when most
individuals are quite similar.
5.MUTATION:
Finally, in (e), each location is subject to random mutation with a small independent probability. One
digit was mutated in the first, third, and fourth offspring. In the 8-queens problem, this corresponds to
choosing a queen at random and moving it to a random square in its column. Figure 4.8 describes an
algorithm that implements all these steps.
ADVANTAGE OF GENETIC ALGORITHM
Like stochastic beam search, genetic algorithms combine an uphill tendency with random exploration and
exchange of information among parallel search threads. The primary advantage, if any, of genetic
algorithms comes from the crossover operation. Yet it can be shown mathematically that, if the positions
of the genetic code are permuted initially in a random order, crossover conveys no advantage. Intuitively,
the advantage comes from the ability of crossover to combine large blocks of letters that have evolved
independently to perform useful functions, thus raising the level of granularity at which the search
operates. For example, it could be that putting the first three queens in positions 2, 4, and 6 (where they
do not attack each other) constitutes a useful block that can be combined with other blocks to construct a
solution.
The theory of genetic algorithms explains how this works using the idea of a schema, which is a
substring in which some of the positions can be left unspecified. For example, the schema 246*****
describes all 8-queens states in which the first three queens are in positions 2, 4, and 6, respectively.
Strings that match the schema (such as 24613578) are called instances of the schema. It can be shown
that if the average fitness of the instances of a schema is above the mean, then the number of instances of
the schema within the population will grow over time. Clearly, this effect is unlikely to be significant if
adjacent bits are totally unrelated to each other, because then there will be few contiguous blocks that
provide a consistent benefit. Genetic algorithms work best when schemata correspond to meaningful
components of a solution. For example, if the string is a representation of an antenna, then the schemata
may represent components of the antenna, such as reflectors and deflectors.
ALGORITHM:
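The algorithm figure is not reproduced here. A minimal Python sketch of a genetic algorithm in the style described above, assuming individuals are digit strings (as in the 8-queens encoding) and the fitness function is supplied by the caller:

```python
import random

def genetic_algorithm(population, fitness, mutation_rate=0.1, generations=1000):
    """Evolve a population of strings by fitness-proportional selection,
    single-point crossover, and random mutation."""
    def select(pop):
        # Probability of being chosen is proportional to the fitness score.
        weights = [fitness(ind) for ind in pop]
        return random.choices(pop, weights=weights, k=1)[0]

    def reproduce(x, y):
        c = random.randrange(1, len(x))          # random crossover point
        return x[:c] + y[c:]

    def mutate(ind):
        i = random.randrange(len(ind))
        gene = random.choice("12345678")         # e.g. a queen's row in the 8-queens encoding
        return ind[:i] + gene + ind[i + 1:]

    for _ in range(generations):
        new_population = []
        for _ in range(len(population)):
            child = reproduce(select(population), select(population))
            if random.random() < mutation_rate:
                child = mutate(child)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)
```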
APPLICATION OF GENETIC ALGORITHM
In practice, genetic algorithms have had a widespread impact on optimization problems, such as circuit
layout and job-shop scheduling. At present, it is not clear whether the appeal of genetic algorithms arises
from their performance or from their aesthetically pleasing origins in the theory of evolution. Much work
remains to be done to identify the conditions under which genetic algorithms perform well.
SEARCHING WITH NONDETERMINISTIC ACTIONS
➢ We assumed that the environment is fully observable and deterministic and that the agent
knows what the effects of each action are. Therefore, the agent can calculate exactly which state
results from any sequence of actions and always knows which state it is in. Its percepts provide
no new information after each action, although of course they tell the agent the initial state.
➢ When the environment is either partially observable or nondeterministic (or both), percepts
become useful. In a partially observable environment, every percept helps narrow down the set of
possible states the agent might be in, thus making it easier for the agent to achieve its goals.
When the environment is nondeterministic, percepts tell the agent which of the possible outcomes
of its actions has actually occurred. In both cases, the future percepts cannot be determined in
advance and the agent’s future actions will depend on those future percepts. So the solution to a
problem is not a sequence but a contingency plan (also known as a strategy) that specifies what
to do depending on what percepts are received.
The state space has eight states, as shown in Figure 4.9. There are three actions—Left, Right, and
Suck—and the goal is to clean up all the dirt (states 7 and 8).
Now suppose that we introduce nondeterminism in the form of a powerful but erratic vacuum cleaner. In
the erratic vacuum world, the Suck action works as follows:
• When applied to a dirty square the action cleans the square and sometimes cleans up
dirt in an adjacent square as well.
• When applied to a clean square the action sometimes deposits dirt on the carpet.
To provide a precise formulation of this problem, we need to generalize the notion of a transition
model. Instead of defining the transition model by a RESULT function that returns a single state, we use a
RESULTS function that returns a set of possible outcome states. For example, in the erratic vacuum world,
the Suck action in state 1 leads to a state in the set {5, 7}—the dirt in the right-hand square may or may
not be vacuumed up.
SOLUTION:
We also need to generalize the notion of a solution to the problem. For example, if we start in state 1,
there is no single sequence of actions that solves the problem. Instead, we need a contingency plan such
as the following:
[Suck, if State = 5 then [Right, Suck] else [ ]]
Thus, solutions for nondeterministic problems can contain nested if–then–else statements;this means that
they are trees rather than sequences. This allows the selection of actions based on contingencies arising
during execution. Many problems in the real, physical world are contingency problems because exact
prediction is impossible. For this reason, many people keep their eyes open while walking around or
driving.
AND-OR SEARCH TREE
In a deterministic environment, the only branching is introduced by the agent’s own choices in each state.
We call these nodes OR nodes. In the vacuum world, for example, at an OR node the agent chooses Left
or Right or Suck. In a nondeterministic environment, branching is also introduced by the environment’s
choice of outcome for each action; we call these nodes AND nodes.
One may also consider a somewhat different agent design, in which the agent can act before it has
found a guaranteed plan and deals with some contingencies only as they arise during execution.
This type of interleaving of search and execution is also useful for exploration problems and for game
playing.
Figure 4.11 gives a recursive, depth-first algorithm for AND–OR graph search. One key aspect of
the algorithm is the way in which it deals with cycles, which often arise in nondeterministic problems
(e.g., if an action sometimes has no effect or if an unintended effect can be corrected). If the current state
is identical to a state on the path from the root, then it returns with failure. This doesn’t mean that there is
no solution from the current state; it simply means that if there is a noncyclic solution, it must be reachable
from the earlier incarnation of the current state, so the new incarnation can be discarded.
With this check, we ensure that the algorithm terminates in every finite state space, because every
path must reach a goal, a dead end, or a repeated state. Notice that the algorithm does not check whether
the current state is a repetition of a state on some other path from the root, which is important for
efficiency.
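A minimal Python sketch of the recursive depth-first AND–OR search just described, assuming a hypothetical problem object with actions(s), results(s, a) (returning a set of outcome states), goal_test(s), and initial_state; a plan is returned as a nested [action, {outcome: subplan}] structure:

```python
def and_or_graph_search(problem):
    """Return a conditional plan (nested action/if-then-else structure) or None."""
    def or_search(state, path):
        if problem.goal_test(state):
            return []                              # empty plan: already at a goal
        if state in path:
            return None                            # cycle on the current path: fail
        for action in problem.actions(state):
            plan = and_search(problem.results(state, action), [state] + path)
            if plan is not None:
                return [action, plan]
        return None

    def and_search(states, path):
        # Every possible outcome state must lead to a goal for the plan to succeed.
        plans = {}
        for s in states:
            plan = or_search(s, path)
            if plan is None:
                return None
            plans[s] = plan                        # "if the outcome is s, follow plans[s]"
        return plans

    return or_search(problem.initial_state, [])
```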
AND–OR graphs can also be explored by breadth-first or best-first methods. The concept of a heuristic
function must be modified to estimate the cost of a contingent solution rather than a sequence, but the
notion of admissibility carries over and there is an analog of the A∗ algorithm for finding optimal
solutions.
The key concept required for solving partially observable problems is the belief state, representing the
agent’s current belief about the possible physical states it might be in, given the sequence of actions and
percepts up to that point.
SEARCHING WITH NO OBSERVATION
When the agent’s percept provides no information at all, we have what is called a sensorless
problem or sometimes a conformant problem. At first, one might think the sensorless agent has no hope
of solving a problem if it has no idea what state it’s in; in fact, sensorless problems are quite often
solvable. Moreover, sensorless agents can be surprisingly useful, primarily because they don’t rely on
sensors working properly.
In manufacturing systems, for example, many ingenious methods have been developed for
orienting parts correctly from an unknown initial position by using a sequence of actions with no sensing
at all. The high cost of sensing is another reason to avoid it: for example, doctors often prescribe a broad-
spectrum antibiotic rather than using the contingent plan of doing an expensive blood test, then waiting
for the results to come back, and then prescribing a more specific antibiotic and perhaps hospitalization
because the infection has progressed too far.
We can make a sensorless version of the vacuum world. Assume that the agent knows the
geography of its world, but doesn’t know its location or the distribution of dirt. In that case, its initial state
could be any element of the set {1, 2, 3, 4, 5, 6, 7, 8}. Now, consider what happens if it tries the
action Right. This will cause it to be in one of the states {2, 4, 6, 8}—the agent now has more
information! Furthermore, the action sequence [Right, Suck] will always end up in one of the states {4,
8}. Finally, the sequence [Right, Suck, Left, Suck] is guaranteed to reach the goal state 7 no matter what
the start state. We say that the agent can coerce the world into state 7.
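A small Python sketch of this coercion, using one possible encoding of the eight states that is consistent with the sets quoted above (the encoding itself is an assumption):

```python
# Belief-state coercion in the sensorless vacuum world.
# Each state is (agent_location, dirt_in_A, dirt_in_B), numbered 1..8.
STATES = {
    1: ("A", True,  True),  2: ("B", True,  True),
    3: ("A", True,  False), 4: ("B", True,  False),
    5: ("A", False, True),  6: ("B", False, True),
    7: ("A", False, False), 8: ("B", False, False),
}
NUMBER = {v: k for k, v in STATES.items()}

def result(state, action):
    """Deterministic (non-erratic) transition model."""
    loc, dirt_a, dirt_b = STATES[state]
    if action == "Right":
        loc = "B"
    elif action == "Left":
        loc = "A"
    elif action == "Suck":
        if loc == "A":
            dirt_a = False
        else:
            dirt_b = False
    return NUMBER[(loc, dirt_a, dirt_b)]

def predict(belief, action):
    """Belief-state transition: apply the action to every possible state."""
    return {result(s, action) for s in belief}

belief = set(range(1, 9))                 # initially the agent could be in any state
for action in ["Right", "Suck", "Left", "Suck"]:
    belief = predict(belief, action)
    print(action, sorted(belief))         # shrinks to {2,4,6,8}, {4,8}, {3,7}, {7}
```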
Steps in Solving a problem in No Observation Environment:
Searching with Partial Observation:
➢ For a general partially observable problem, we have to specify how the environment generates
percepts for the agent. For example, we might define the local-sensing vacuum world to be one in
which the agent has a position sensor and a local dirt sensor but has no sensor capable of
detecting dirt in other squares.
➢ The formal problem specification includes a PERCEPT(s) function that returns the percept
received in a given state. (If sensing is nondeterministic, then we use a PERCEPTS function that
returns a set of possible percepts.) For example, in the local-sensing vacuum world, the
PERCEPT in state 1 is [A, Dirty]. Fully observable problems are a special case in which
PERCEPT(s)=s for every state s, while sensorless problems are a special case in which
PERCEPT(s)=null.
➢ When observations are partial, it will usually be the case that several states could have produced
any given percept. For example, the percept [A, Dirty] is produced by state 3 as well as by state 1.
Hence, given this as the initial percept, the initial belief state for the local-sensing vacuum world
will be {1, 3}. The ACTIONS, STEP-COST, and GOAL-TEST are constructed from the
underlying physical problem just as for sensorless problems, but the transition model is a bit more
complicated. We can think of transitions from one belief state to the next for a particular action as
occurring in three stages.
1. The prediction stage is the same as for sensorless problems: given the action a in belief state b,
the predicted belief state is b̂ = PREDICT(b, a).
2. The observation prediction stage determines the set of percepts o that could be observed in the
predicted belief state: POSSIBLE-PERCEPTS(b̂) = {o : o = PERCEPT(s) and s ∈ b̂}.
3. The update stage determines, for each possible percept, the belief state that would result from the
percept. The new belief state b_o is just the set of states in b̂ that could have produced the
percept: b_o = UPDATE(b̂, o) = {s : o = PERCEPT(s) and s ∈ b̂}.
Notice that each updated belief state b_o can be no larger than the predicted belief state b̂; observations
can only help reduce uncertainty compared to the sensorless case. Moreover, for deterministic sensing,
the belief states for the different possible percepts will be disjoint, forming a partition of the original
predicted belief state.
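A minimal Python sketch of the three-stage belief-state transition, with predict_fn and percept_fn as hypothetical stand-ins for PREDICT and PERCEPT:

```python
def possible_percepts(predicted_belief, percept_fn):
    """Observation prediction: the percepts that could be observed in b̂."""
    return {percept_fn(s) for s in predicted_belief}

def update(predicted_belief, percept, percept_fn):
    """Update stage: keep only the states in b̂ that could have produced the percept."""
    return {s for s in predicted_belief if percept_fn(s) == percept}

def results(belief, action, predict_fn, percept_fn):
    """Full belief-state transition: prediction, observation prediction, update."""
    b_hat = predict_fn(belief, action)                    # prediction stage
    return {percept: update(b_hat, percept, percept_fn)   # one belief state per possible percept
            for percept in possible_percepts(b_hat, percept_fn)}
```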
Solving partially observable problems
The preceding section showed how to derive the RESULTS function for a nondeterministic belief-state
problem from an underlying physical problem and the PERCEPT function. Given such a formulation, the
AND–OR search algorithm of Figure 4.11 can be applied directly to derive a solution.
An agent for partially observable environments:
The design of a problem-solving agent for partially observable environments is quite similar to the simple
problem-solving agent in Figure 3.1: the agent formulates a problem, calls a search algorithm (such as
AND-OR-GRAPH-SEARCH) to solve it, and executes the solution. There are two main differences.
First, the solution to a problem will be a conditional plan rather than a sequence; if the first step is an if–
then–else expression, the agent will need to test the condition in the if-part and execute the then-part or
the else-part accordingly. Second, the agent will need to maintain its belief state as it performs actions and
receives percepts. This process resembles the prediction–observation–update process in Equation (4.5)
but is actually simpler because the percept is given by the environment rather than calculated by the agent.
➢ Figure 4.17 shows the belief state being maintained in the kindergarten vacuum world with local
sensing, wherein any square may become dirty at any time unless the agent is actively cleaning it
at that moment.
➢ In partially observable environments—which include the vast majority of real-world
environments—maintaining one’s belief state is a core function of any intelligent system. This
function goes under various names, including monitoring, filtering and state estimation.
Equation (4.6) is called a recursive state estimator because it computes the new belief state from
the previous one rather than by examining the entire percept sequence.
➢ If the agent is not to “fall behind,” the computation has to happen as fast as percepts are coming
in. As the environment becomes more complex, the exact update computation becomes infeasible
and the agent will have to compute an approximate belief state, perhaps focusing on the
implications of the percept for the aspects of the environment that are of current interest.
ONLINE SEARCH AGENTS AND UNKNOWN ENVIRONMENTS
Offline search algorithms compute a complete solution before setting foot in the real world and then
execute the solution.
In contrast, an online search agent interleaves computation and action: first, it takes an action,
then it observes the environment and computes the next action.
After each action, an online agent receives a percept telling it what state it has reached; from this
information, it can augment its map of the environment. The current map is used to decide where
to go next. This interleaving of planning and action means that online search algorithms are quite
different from the offline search algorithms.
An online search problem must be solved by an agent executing actions, rather than by pure
computation. We assume a deterministic and fully observable environment, but we stipulate that
the agent knows only the following:
ACTIONS(s), which returns a list of actions allowed in state s;
The step-cost function c(s, a, s′), which cannot be used until the agent knows that s′ is the outcome; and
GOAL-TEST(s).
Typically, the agent’s objective is to reach a goal state while minimizing cost. (Another possible
objective is simply to explore the entire environment.) The cost is the total path cost of the path
that the agent actually travels. It is common to compare this cost with the path cost of the path the
agent would follow if it knew the search space in advance—that is, the actual shortest path (or
shortest complete exploration). In the language of online algorithms, this is called the competitive
ratio; we would like it to be as small as possible.
ADVERSARY ARGUMENT:
If some actions are irreversible—i.e., they lead to a state from which no action leads back to the
previous state—the online search might accidentally reach a dead-end state from which no goal
state is reachable. Our claim, to be more precise, is that no algorithm can avoid dead ends in all
state spaces.
To an online search algorithm that has visited states S and A, the two state spaces look
identical, so it must make the same decision in both. Therefore, it will fail in one of them.
This is an example of an adversary argument.
Like depth-first search, hill-climbing search has the property of locality in its node expansions. In fact,
because it keeps just one current state in memory, hill-climbing search is already an online search
algorithm! Unfortunately, it is not very useful in its simplest form because it leaves the agent sitting at
local maxima with nowhere to go. Moreover, random restarts cannot be used, because the agent cannot
transport itself to a new state.
Augmenting hill climbing with memory rather than randomness turns out to be a more effective approach.
LRTA* algorithm:
The basic idea is to store a “current best estimate” H(s) of the cost to reach the goal from each state that
has been visited. H(s) starts out being just the heuristic estimate h(s) and is updated as the agent gains
experience in the state space.
Figure 4.23 shows a simple example in a one-dimensional state space. In (a), the agent seems to be stuck
in a flat local minimum at the shaded state.
Rather than staying where it is, the agent should follow what seems to be the best path to the
goal given the current cost estimates for its neighbors.
The estimated cost to reach the goal through a neighbor s′ is the cost to get to s′ plus the estimated
cost to get to a goal from there—that is, c(s, a, s′) + H(s′).
There are two actions in the example, with estimated costs 1 + 9 and 1 + 2, so it seems best to move
right.
Now, it is clear that the cost estimate of 2 for the shaded state was overly optimistic.
Since the best move cost 1 and led to a state that is at least 2 steps from a goal, the shaded state
must be at least 3 steps from a goal, so its H should be updated accordingly, as shown in Figure
4.23(b).
Continuing this process, the agent will move back and forth twice more, updating H each time and
“flattening out” the local minimum until it escapes to the right.
An LRTA∗ agent is guaranteed to find a goal in any finite, safely explorable environment. Unlike A∗, however,
it is not complete for infinite state spaces—there are cases where it can be led infinitely astray. It can
explore an environment of n states in O(n²) steps in the worst case.
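A minimal Python sketch of an LRTA*-style agent, assuming actions, a step-cost function, a heuristic h, and a goal test are supplied by the caller; the returned agent function is called once per step with the state it has just reached:

```python
def make_lrta_star_agent(actions, cost, h, goal_test):
    """Online LRTA* agent: call agent(current_state) once per step."""
    H = {}                                 # learned cost-to-goal estimates
    memory = {}                            # remembered outcomes: (state, action) -> next state
    prev_state, prev_action = None, None

    def lrta_cost(s, a, s2):
        # Estimated cost to the goal through s2 (optimistic h(s) if s2 is unexplored).
        return h(s) if s2 is None else cost(s, a, s2) + H.get(s2, h(s2))

    def agent(state):
        nonlocal prev_state, prev_action
        if goal_test(state):
            return None                    # stop
        if state not in H:
            H[state] = h(state)
        if prev_state is not None:
            memory[(prev_state, prev_action)] = state
            # Back up the previous state's estimate from its best-looking action.
            H[prev_state] = min(lrta_cost(prev_state, a, memory.get((prev_state, a)))
                                for a in actions(prev_state))
        # Move toward the apparently best neighbour given current estimates.
        action = min(actions(state),
                     key=lambda a: lrta_cost(state, a, memory.get((state, a))))
        prev_state, prev_action = state, action
        return action

    return agent
```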
The initial ignorance of online search agents provides several opportunities for learning.
First, the agents learn a “map” of the environment—more precisely, the outcome of each action in each
state—simply by recording each of their experiences. (Notice that the assumption of deterministic
environments means that one experience is enough for each action.)
Second, the local search agents acquire more accurate estimates of the cost of each state by using local
updating rules, as in LRTA∗.