UCLan, CO3519 (2021/22), semester 1: Glossary
Agent
An agent is a system that interacts with its surroundings. It receives
percepts through sensors and can carry out actions through actuators.
- Beside its sensors and actuators, an agent is characterized by
its agent function: The way in which the past and present percepts
determine or influence the present and future actions.
- A goal-oriented agent is an agent that exhibits the
tendency "to achieve a certain state of the
world" (Conte 2009, p. 2578).
Goal-orientation can emerge by a multitude of mechanisms, including biological evolution.
It does not necessarily require the agent to be
consciously aware of its goals.
- "Intelligent agents are goal-oriented agents using their
knowledge to solve problems, including taking decisions and planning
actions" (Conte 2009, p. 2578). This requires the agent to have some
kind of internal representation of its surroundings, and to store and
process information about its surroundings.
- A knowledge-based agent is an intelligent agent that uses
a knowledge base to store and process its
information about its surroundings.
- A rational agent is an intelligent agent that exhibits rationality,
i.e., a tendency toward optimizing a quantity: The performance measure
of the agent. As in the case of goal-orientation, this does not necessarily
require the agent to be aware of its performance measure.
- "Goal-directed agents are intelligent agents that have an
internal representation of the goals they [tend to] achieve" (Conte 2009, p. 2578).
See also: Inductive reasoning,
Knowledge base,
Rationality,
Turing test.
Dimension
Colloquially, the dimension of a space, set, or object is very clear to us:
A line or curve is one-dimensional, hence it has dimension 1; surfaces are two-dimensional,
they have dimension 2; volumes have dimension 3; and so on. There are two major ways
of defining the dimension of something that is, in the broadest possible sense,
a geometrical object:
In how many independent directions can you go while remaining on the object? That is its dimension.
Independence here means linear independence; where one direction would be linearly dependent
if it could be obtained as a linear combination of the other directions. This definition
of dimensionality originates in the theory of vectors and vector spaces, where
directions and positions are represented by vectors.
For example, on the surface of earth, starting from any point other than the poles,
we might go north or go west; these two directions are linearly independent - you cannot
formulate "going west" as some way of "going north." Add any third direction
and it becomes redundant; the third direction can be expressed as a linear combination of going
north and going west: For example, you can go southwest by going north by a negative distance
and then west by a positive distance. Therefore, the surface of earth is a two-dimensional object:
It has dimension 2.
If we scale up all the lengths (or all coordinates, or similar) in our system by
some constant factor c, how does this scale up the size of the object? If it increases
by the factor c^d, this means that d is the dimension of the object.
This definition is sometimes called the Kolmogorov dimension. It is particularly helpful
for fractals: Objects that have a non-integer Kolmogorov dimension.
However, for typical (non-fractal) objects and spaces, this is simply another way of expressing
the same concept of dimensionality as above.
For example, the surface area of a sphere with the radius r is given by 4πr^2.
If we scale up all the lengths by a factor of two, c = 2, we obtain a sphere with the radius
2r, and its surface area is 4π(2r)^2 = 16πr^2;
the size of the surface has increased by the factor c^d = 2^2 = 4, indicating that
the surface of a sphere is a two-dimensional object: It has dimension 2.
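The scaling argument can be sketched in a few lines of Python (a minimal illustration; the function names here are ours, not standard terminology):

```python
import math

def scaling_dimension(size, c=2.0):
    """Estimate the dimension d of an object from its scaling law:
    scaling all lengths by c multiplies the size by c**d, so
    d = log(size(c) / size(1)) / log(c)."""
    return math.log(size(c) / size(1.0)) / math.log(c)

def sphere_surface(r):
    return 4 * math.pi * r**2        # surface area of a sphere of radius r

def ball_volume(r):
    return (4 / 3) * math.pi * r**3  # volume of a ball of radius r

print(scaling_dimension(sphere_surface))  # approximately 2.0: a surface
print(scaling_dimension(ball_volume))     # approximately 3.0: a volume
```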
See also: Hypothesis,
Optimization parameter,
Optimization objective,
Pareto optimality.
Hypothesis
In machine learning, a hypothesis is a
function y = f(x0, x1, …) that
predicts an outcome variable y
on the basis of values of one or multiple independent
variables x0, x1, …
- In this sense, hypothesis means the same
as model or model function, where the independent variables in the hypothesis
are the arguments of the model function; it can also be called a correlation or a regression.
All these words have other meanings as well (and so does the word "hypothesis"),
creating an unfortunate level of ambiguity. Be aware of this potential for ambiguity and misunderstandings.
You can counteract it by providing clear definitions.
- A hypothesis space is a kind of model, or a model class, with free parameters (i.e., model parameters)
that can be adjusted to optimize quantitative agreement with the data. For example, with two independent
variables x0 and x1, the hypothesis
space for linear regression is given by the space (i.e., set)
of functions that have the form f(x0, x1)
= ax0 + bx1 + c, with three adjustable parameters: a, b, and c.
- Model parameterization, or the process of computing a regression,
means to solve an optimization problem where the hypothesis space is the parameter space,
and some measure for model quality is the optimization objective.
If this is done by the ordinary least squares (OLS) method,
the root mean square deviation between
the hypothesis and the actual data is used as a minimization objective.
Colloquially, it is also common to refer to the hypothesis space itself as "the hypothesis."
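The linear hypothesis space above can be illustrated with a short Python sketch, assuming NumPy is available; the data are hypothetical and noise-free, so an ordinary least squares fit should recover the parameters exactly:

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 2*x0 - x1 + 3, so the
# fitted parameters should come out as a = 2, b = -1, c = 3.
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 10.0, 50)
x1 = rng.uniform(0.0, 10.0, 50)
y = 2 * x0 - x1 + 3

# One column per adjustable parameter: a (for x0), b (for x1), c (constant).
X = np.column_stack([x0, x1, np.ones_like(x0)])
(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b, c)  # approximately 2.0, -1.0, 3.0
```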
See also: Dimension,
Linear regression,
Optimization objective,
Optimization parameter,
Overfitting,
Regression analysis,
Supervised learning,
Validation and testing.
Inductive reasoning
There are three major ways for an intelligent agent to acquire knowledge,
but only one of them - namely, inductive reasoning - is commonly called learning in an AI context.
These three ways to improve the agent's understanding of the world are:
- Direct input as part of the percepts received through the agent's sensors. Depending on the kind of agent,
this may include observation of the surroundings or data ingest by an authorized user. For example:
- I know that the sun is shining when I see it shine. (Observation)
- The administrator has ingested data on the past season's football
results into the agent's knowledge base, and our agent is programmed to
work with this information. (Data ingest)
- Logical or mathematical reasoning by which the agent explores the consequences
of its axioms, i.e., of the propositions that it accepts as true to begin with.
All that can be proven to be true on that basis must also be accepted as true.
This way of thinking is called deductive reasoning. For example: a) Jack talked to me yesterday,
b) Jack is a human, and c) humans can only talk after they have been born,
therefore d) Jack was born yesterday or before. If the three premises
a), b), and c) are indeed true, then the consequence d) must be true as well.
This is called automated reasoning if it is done by an algorithm.
- Detecting patterns and trends, i.e., correlations between phenomena,
in the knowledge available to the agent. This can be done to better explain and understand
that knowledge, creating a mental model of it. It can also be done in order to predict the
behaviour of an observed system under conditions for which no data have been provided so far.
Normally, both of these goals are pursued at the same time. This is inductive reasoning;
it is called machine learning if it is done by an algorithm.
Colloquially, all three items above might be called "learning," since they are ways
of expanding an agent's knowledge. For example, learning from a book or a teacher is of the first type,
whereas studying mathematics is often of the second type. In AI, the term learning and particularly
machine learning is typically understood to refer to inductive reasoning only,
whereas reasoning without any further qualification usually means deductive reasoning.
See also: Agent,
Knowledge base,
Regression analysis,
Supervised learning,
Validation and testing.
Knowledge base
"The central component of a knowledge-based agent is its knowledge
base" (Russell & Norvig 2021, p. 227). Interactions with a knowledge base take two forms:
- Data ingest to extend or update the information about the world.
- Data retrieval by querying.
Knowledge bases are typically designed to support deductive
reasoning (logical inference and theorem proving).
See also: Agent,
Inductive reasoning.
Linear regression
Linear regression is the most common way of conducting regression analysis.
It considers the hypothesis space where the model is linear in all the
independent variables: The outcome variable is expressed as a linear combination of the independent variables, plus a constant.
- For example, if y is to be described as a function of u, v, and w,
linear regression will determine optimal values of a, b, c, and d
such that the available data set (consisting of instances with known, given values
of u, v, w, and y) is represented by the
equation y = au + bv + cw + d as accurately as possible.
- The methodology from linear regression can be used to develop non-linear models as well.
For example, if a data set consisting of (x, y) pairs is given,
a model y = f(x) = ax^4 + bx^2 + cx ln x + d
is non-linear in x. However, it is linear in u = x^4,
v = x^2, and w = x ln x, and the
(x, y) data set can easily be converted into a (u, v, w, y)
data set. The model parameters a, b, c, and d can then be determined by linear regression.
- In Python, the statsmodels library
can be used for linear regression.
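The variable-transformation trick from the second bullet point can be sketched as follows (a minimal illustration using NumPy's least-squares solver rather than statsmodels; the data and parameter values are hypothetical and noise-free):

```python
import numpy as np

# Hypothetical (x, y) data, generated without noise from
# y = 0.5*x**4 - 2*x**2 + 1.5*x*ln(x) + 4.
x = np.linspace(0.5, 3.0, 40)
y = 0.5 * x**4 - 2 * x**2 + 1.5 * x * np.log(x) + 4

# The model is non-linear in x but linear in the transformed variables.
u, v, w = x**4, x**2, x * np.log(x)
X = np.column_stack([u, v, w, np.ones_like(x)])
(a, b, c, d), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b, c, d)  # approximately 0.5, -2.0, 1.5, 4.0
```

The same (u, v, w, y) data set could equally be passed to an ordinary least squares routine from the statsmodels library.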
See also: Hypothesis,
Regression analysis,
Root mean square deviation.
Optimization objective
An optimization objective is a quantity that is used to formulate preferences
for the outcome of a decision making scenario. In case of a maximization objective, greater
values are preferred, and in case of a minimization objective, smaller values are preferred.
- An optimization objective can also be called an optimization criterion
or a key performance indicator (KPI).
If it is a minimization objective, it can also be called cost,
and if it is a maximization objective, it can also be called utility.
- In multicriteria optimization (MCO),
multiple conflicting optimization objectives are used simultaneously.
In this case, there is a multidimensional objective space;
the dimension of the objective space is
given by the number of optimization objectives.
- The function f(x) that maps points in parameter space to points in objective
space is called the objective function; in case of maximization,
it is also referred to as a utility function, and in case of minimization,
as a cost function.
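As a minimal illustration (with two hypothetical, conflicting minimization objectives), the objective function below maps a one-dimensional parameter space to a two-dimensional objective space:

```python
def objective_function(x):
    """Maps a point x in parameter space to a point in objective space."""
    f1 = x**2         # first cost: squared distance from 0
    f2 = (x - 2)**2   # second cost: squared distance from 2
    return (f1, f2)

# The two objectives conflict: no single x minimizes both at once.
print(objective_function(0.0))  # (0.0, 4.0) -- best for f1
print(objective_function(2.0))  # (4.0, 0.0) -- best for f2
print(objective_function(1.0))  # (1.0, 1.0) -- a compromise
```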
See also: Dimension,
Hypothesis,
Optimization parameter,
Pareto optimality,
Rationality,
SMART objective.
Optimization parameter
In decision making, an optimization parameter is a quantity
over which the decision maker has direct control;
a parameter value (or parameterization) is selected in order to
obtain the best possible outcome for the optimization
objective(s).
- In multivariate optimization, there are multiple optimization parameters;
accordingly, the parameter space is multidimensional.
- If an optimization problem with multiple parameters is formulated adequately,
it should be possible to vary all optimization parameters independently.
If that is not the case and one of the parameters can be expressed as a function
of the others, the problem needs to be reformulated, eliminating redundant parameter(s).
See also: Dimension,
Hypothesis,
Optimization objective,
Pareto optimality.
Overfitting
"We say a function is overfitting the data when it pays too much
attention to the particular data set it is trained on, causing it
to perform poorly
on unseen data." Conversely,
"a hypothesis is underfitting when it
fails to find a pattern in the data" even though
such a pattern is actually present (Russell & Norvig 2021, p. 673).
Overfitting leads to a model that has excellent agreement with the training data,
but poor predictive quality for the validation
data. Therefore, such models can be eliminated during validation if they
are compared against other, simpler models that do not exhibit overfitting.
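A minimal Python sketch of this effect, assuming NumPy is available (the data are hypothetical: noisy linear data fitted by a linear model and by a degree-9 polynomial model):

```python
import numpy as np

def rmsd(y_actual, y_pred):
    return float(np.sqrt(np.mean((y_actual - y_pred)**2)))

# Hypothetical noisy data with a linear ground truth.
rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2 * x_train + 1 + rng.normal(0.0, 0.3, 10)
x_val = np.linspace(0.05, 0.95, 10)           # unseen data
y_val = 2 * x_val + 1 + rng.normal(0.0, 0.3, 10)

# A degree-9 polynomial (10 parameters for 10 points) interpolates the
# training data, so its training RMSD is near zero; on the unseen
# validation data it typically performs far worse than the linear model.
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = rmsd(y_train, np.polyval(coeffs, x_train))
    val_err = rmsd(y_val, np.polyval(coeffs, x_val))
    print(degree, train_err, val_err)
```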
See also: Hypothesis,
Regression analysis,
Supervised learning,
Validation and testing.
Pareto optimality
In multicriteria optimization (MCO), rational compromises between
multiple conflicting optimization objectives are characterized by Pareto optimality.
A point in objective space is Pareto optimal
if it is accessible and no other accessible point in objective space dominates it.
- The Pareto front consists of all the Pareto optimal points in objective space.
- A point y in objective space is accessible if there is
a point x in parameter space such that f(x) = y,
where f(x) is the objective function (utility function in case of
maximization objectives, cost function in case of minimization objectives).
- A point y in objective space dominates another point y'
if there is at least one objective for which y is better than y',
whereas there is no objective for which y' is better than y.
If that is the case, there is no possible compromise between the objectives
that would lead a rational agent to prefer
y' over y. Therefore, if y is accessible, y' cannot be Pareto optimal.
- By extension, a point x in parameter space can also be called Pareto optimal
(e.g., a Pareto optimal solution, parameterization, or design choice)
if y = f(x) is Pareto optimal, i.e., if the
point y in objective space is on the Pareto front.
- It is a common technique in AI-driven decision support to compute
the Pareto front and the associated Pareto optimal design choices,
presenting them to decision makers.
All the other possible solutions can be discarded since they cannot
correspond to a rational compromise between the objectives.
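Under the definitions above, the Pareto front of a finite set of accessible points in objective space can be computed by checking each point for domination. A minimal sketch (all objectives are minimized; the points are hypothetical):

```python
def dominates(y, y_prime):
    """y dominates y' (minimization objectives): y is better in at
    least one objective and worse in none."""
    return (all(a <= b for a, b in zip(y, y_prime))
            and any(a < b for a, b in zip(y, y_prime)))

def pareto_front(points):
    """Return all points not dominated by any other accessible point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical accessible points in a 2-D objective space:
points = [(1, 5), (2, 3), (3, 4), (4, 2), (5, 1), (4, 4)]
print(pareto_front(points))  # [(1, 5), (2, 3), (4, 2), (5, 1)]
```

Here, (3, 4) and (4, 4) are discarded because (2, 3) dominates both; no rational compromise between the two objectives would select them.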
See also: Dimension,
Optimization parameter,
Optimization objective,
Rationality,
SMART objective.
Rationality
A rational agent is an agent that exhibits a tendency toward
maximizing a performance measure (or minimizing it, depending on how it is formulated).
In particular, rational preferences, or decisions and choices made by a rational agent,
satisfy a series of constraints including, but not limited to, the following (Russell & Norvig 2021, p. 520).
- Transitivity: If the agent prefers A over B, and B over C,
then the agent also prefers A over C whenever given the choice.
- Monotonicity: Assume that the agent prefers A over B. The
lotteries (i.e., probability distributions) X and Y both have
A and B as their only possible outcomes, where the
probability of A is greater in case of the lottery X
than in case of the lottery Y. Then the agent prefers X over Y.
- Continuity: If the agent prefers A over B, and B over C,
then there is exactly one lottery X with A and C as its only possible outcomes
such that the agent is indifferent between B and X, i.e.,
the agent neither prefers B over X nor does the agent prefer X over B.
For any other lottery Y with the two possible outcomes A and C,
the agent prefers Y over B if the chance of A is greater in case of Y than in case of X;
conversely, the agent prefers B over Y if the chance of A is smaller in case of Y than in case of X.
For a more complete and more mathematically oriented discussion
of rational choice, cf. Russell & Norvig (2021, p. 520f.).
See also: Agent,
Pareto optimality,
Optimization objective.
Regression analysis
Regression is a method or process in quantitative inductive
reasoning, i.e., in machine learning applied to numerical data. The learning
problem consists in finding out how an outcome variable y (also called
the dependent variable) depends on the values
of one or multiple independent variables.
- Regression is based on a pre-selected functional form of the permitted hypotheses (models),
i.e., on a pre-selected hypothesis space.
It must be known what the model function looks like and what free (i.e., adjustable) parameters it contains; e.g.,
cubic regression produces a hypothesis according to which the outcome y
is modelled by y = ax^3 + bx^2 + cx + d,
with the model parameters a, b, c, and d.
The outcome of the regression would then consist in a set of values for these parameters.
- The outcome of the regression (i.e., the outcome of the learning process)
is often also called regression; to avoid confusion
it is advisable to refer to the outcome of the regression as a model,
a model function, a hypothesis, or a correlation. Unfortunately, all these terms can have other meanings as well.
- On the origin of the term, Russell & Norvig (2021, p. 670) comment that the name regression
for this problem and methodology is "admittedly obscure - a better name would have
been function approximation or numeric prediction. But in 1886 Francis Galton
wrote an influential
article on the concept of regression to the mean (e.g.,
the children of tall parents are likely to be taller than average, but not as tall
as the parents). Galton showed plots with what he called 'regression lines,' and
readers came to associate the word 'regression' with the statistical technique
of function approximation rather than with the topic of regression to the mean."
- In Python, the statsmodels library
can be used for regression analysis.
Regression analysis can refer to a discussion of regression
methodology (e.g., ordinary least squares fits
based on the root mean square deviation)
or to analysing the outcome of a regression,
such as assessing the confidence in the model. Standardized techniques
and concepts for analysing the regression outcome are particularly widespread
for linear regression.
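As a minimal sketch of the cubic regression mentioned above, assuming NumPy is available (the data are hypothetical and noise-free, so the fit should recover the parameters):

```python
import numpy as np

# Hypothetical, noise-free data from y = x**3 - 2*x**2 + 0.5*x + 3, so the
# regression should recover a = 1, b = -2, c = 0.5, d = 3.
x = np.linspace(-2.0, 2.0, 25)
y = x**3 - 2 * x**2 + 0.5 * x + 3

a, b, c, d = np.polyfit(x, y, 3)  # least-squares cubic fit
print(a, b, c, d)  # approximately 1.0, -2.0, 0.5, 3.0
```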
See also: Hypothesis,
Inductive reasoning,
Linear regression,
Overfitting,
Root mean square deviation,
Supervised learning,
Validation and testing.
Root mean square deviation
The root mean square deviation is a common measure for describing how far two data sets are apart.
As the name suggests, it is the square root of the mean square deviation between
the two data sets.
- The root mean square deviation between two data sets is always non-negative; it is zero if and only if the two data sets agree exactly.
The root mean square deviation between a sequence (list) of values and the average value
is also called the standard deviation of that sequence of values.
- In model parameterization (regression) by
an ordinary least squares (OLS) fit, the root mean square deviation between
model data and actual data for the outcome variable y is
used as a minimization
criterion; i.e., model parameters are adjusted such that the root mean square deviation
between correlated and actual values of y becomes as small as possible.
(In supervised learning with
multiple hypotheses, only
the training data set should be used for that purpose.)
- The root mean square deviation between actual data and predicted data for
an outcome variable y can be used as a measure for the accuracy of a model;
in validation, this can help select the
hypothesis that performs best, and in testing,
this can provide a basis for statements on the margin of error.
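The relation between the root mean square deviation and the (population) standard deviation can be illustrated in pure Python (the values are hypothetical):

```python
import math

def rmsd(a, b):
    """Root mean square deviation between two equally long data sets."""
    return math.sqrt(sum((x - y)**2 for x, y in zip(a, b)) / len(a))

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(values) / len(values)  # 5.0

# The RMSD between a sequence of values and its average value is the
# (population) standard deviation of that sequence.
print(rmsd(values, [mean] * len(values)))  # 2.0
```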
See also: Linear regression,
Regression analysis,
Validation and testing.
SMART objective
Following Doran (1981), a management "objective should be:
- Specific - target a specific area for improvement.
- Measurable - quantify or at least suggest an indicator of progress.
- Assignable - specify who will do it.
- Realistic - state what results can realistically be achieved, given available resources.
- Time-related - specify when the result(s) can be achieved."
Formulating SMART objectives is not only good organizational or management practice.
By including an "indicator of progress," which in decision making and decision support
is usually called a key performance indicator (KPI), it can help establish the
optimization objective when expressing
a scenario as an optimization problem. Multiple conflicting KPIs give rise to a
multicriteria optimization (MCO) problem with a multidimensional
objective space.
See also: Optimization objective,
Pareto optimality.
Supervised learning
Supervised learning is one of the major approaches to machine learning,
i.e., of inductive reasoning using computers.
In supervised learning, an algorithm is given input-output pairs (or, equivalently,
combinations of independent variables x and outcomes y). On this basis,
the algorithm proceeds to develop a model of the provided data. However, the
hypothesis space (that is, the kind of model),
needs to be specified by the user; the algorithm will only determine a
parameterization of the model.
It is good practice to develop multiple candidate models (i.e., hypotheses
taken from different hypothesis spaces)
and compare their performance by validation.
The other two major approaches to machine learning are unsupervised learning,
where a data set is given to the algorithm without any additional supporting information or hypothesis,
and reinforcement learning, where the algorithm receives feedback
that drives it toward developing better models.
See also: Hypothesis,
Inductive reasoning,
Overfitting,
Regression analysis,
Validation and testing.
Turing test
The Turing test is a game-like criterion devised by Turing (1950)
addressing the question: "Can machines think?"
The test is a game with
three players: The machine, a human, and an interrogator (who is also human).
- The machine and the human both exchange text with the interrogator.
- The aim of the machine is to deceive the interrogator into believing that it is human.
- The aim of the human is to make the interrogator realize that he/she, and not the machine, is the human.
- The aim of the interrogator is to figure out who of the two others is the machine and who is the human.
According to Turing, the success rate of a
machine at passing this test (winning the game) is a measure of
its capacity to intelligently emulate human behaviour and communication.
The success rate will depend on a multitude of factors, including the amount of
text that can be exchanged until the interrogator must make a decision;
Turing suggested five minutes, using typewriters.
Since the opponent of the AI is an actual human,
even a perfect AI cannot be expected to win more than 50% of the time.
See also: Agent.
Validation and testing
In supervised learning, it is often unclear
what hypothesis is the best for modelling
the phenomena underlying a given data set. In that case, it is common practice
to develop multiple candidate models based on different
hypotheses (e.g., a linear, quadratic, and cubic model),
compare them to each other by validation, and finally assess the
accuracy of the selected model by testing.
For this purpose, the overall data set can be split up into three parts:
- The training data are used to parameterize multiple candidate models, i.e.,
to adjust any free variables in the models (such as a and b in y = ax + b),
optimizing the accuracy of the model; one such method would be an ordinary least squares (OLS) fit,
which minimizes the root mean square deviation
between the actual data for the outcome y (using training data only) and the values obtained from the model.
- The validation data are used to compare the candidate models against
each other and, where appropriate, to the null hypothesis which states that
the outcome y is constant. For example, the strategy
may consist in selecting the model with the smallest
root mean square deviation between the
actual and the predicted outcome y (using validation data only).
- The test data are used to provide an independent accuracy assessment for the final model.
This permits statements on the margin of error, e.g., based on the
root mean square deviation between the test data and the
corresponding predictions for the outcome variable y.
If a normal distribution is assumed for the deviation between actual and predicted data,
using a margin of error given by two times the root mean square deviation (in both directions) leads to
a 95.4% probability for observing a deviation that is smaller than the margin of error.
The split between training and validation data is helpful to prevent overfitting.
The split between validation and test data prevents a selection bias: Since the validation data are
used to choose the best hypothesis, the performance of the
selected hypothesis will usually tend to be overestimated slightly.
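The three-way split can be sketched as follows, assuming NumPy is available (hypothetical data with a quadratic ground truth; linear, quadratic, and cubic candidate hypotheses):

```python
import numpy as np

def rmsd(y_actual, y_pred):
    return float(np.sqrt(np.mean((y_actual - y_pred)**2)))

# Hypothetical data: quadratic ground truth plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(-3.0, 3.0, 90)
y = 0.5 * x**2 - x + 2 + rng.normal(0.0, 0.2, 90)

# Split the overall data set into training, validation, and test data.
x_train, x_val, x_test = x[:60], x[60:75], x[75:]
y_train, y_val, y_test = y[:60], y[60:75], y[75:]

# Parameterize the candidate models on the training data only, then
# compare them on the validation data.
candidates = {deg: np.polyfit(x_train, y_train, deg) for deg in (1, 2, 3)}
val_errors = {deg: rmsd(y_val, np.polyval(c, x_val))
              for deg, c in candidates.items()}
best = min(val_errors, key=val_errors.get)

# Assess the selected model on the independent test data.
test_error = rmsd(y_test, np.polyval(candidates[best], x_test))
print(best, test_error)
```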
See also: Hypothesis,
Inductive reasoning,
Overfitting,
Regression analysis,
Root mean square deviation,
Supervised learning.
Referenced literature
- (Conte 2009) R. Conte, "Rational, goal-oriented agents," doi:10.1007/978-1-4614-1800-9_158, in R. A. Meyers (ed.), Encyclopedia of Complexity and Systems Science, New York: Springer (ISBN 978-1-4614-1801-6), 2009.
- (Doran 1981) G. T. Doran, "There's a S.M.A.R.T. way to write management's goals and objectives," Management Review 70(11): 35-36, 1981.
- (Galton 1886) F. Galton, "Regression towards mediocrity in hereditary stature," Journal of the Anthropological Institute of Great Britain and Ireland 15: 246-263, doi:10.2307/2841583, 1886.
- (Russell & Norvig 2021) S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 4th edn. (global), Harlow: Pearson (ISBN 978-1-29240113-3), 2021.
- (Turing 1950) A. M. Turing, "Computing machinery and intelligence," Mind 59(236): 433-460, doi:10.1093/mind/LIX.236.433, 1950.