UCLan, CO3519 (2021/22), semester 1: Glossary
Agent
An agent is a system that interacts with its surroundings. It receives
percepts through sensors and can carry out actions through actuators.
- Beside its sensors and actuators, an agent is characterized by
its agent function: The way in which the past and present percepts
determine or influence the present and future actions.
- A goal-oriented agent is an agent that exhibits the
tendency "to achieve a certain state of the
world" (Conte 2009, p. 2578).
Goal-orientation can emerge by a multitude of mechanisms, including biological evolution.
It does not necessarily require the agent to be
consciously aware of its goals.
- "Intelligent agents are goal-oriented agents using their
knowledge to solve problems, including taking decisions and planning
actions" (Conte 2009, p. 2578). This requires the agent to have some
kind of internal representation of its surroundings, and to store and
process information about its surroundings.
- A knowledge-based agent is an intelligent agent that uses
a knowledge base to store and process its
information about its surroundings.
- A rational agent is an intelligent agent that exhibits rationality,
i.e., a tendency toward optimizing a quantity: The performance measure
of the agent. As in the case of goal-orientation, this does not necessarily
require the agent to be aware of its performance measure.
- "Goal-directed agents are intelligent agents that have an
internal representation of the goals they [tend to] achieve" (Conte 2009, p. 2578).
See also: Inductive reasoning,
Knowledge base,
Rationality,
Turing test.
Dimension
Colloquially, the dimension of a space, set, or object is very clear to us:
A line or curve is one-dimensional, hence it has dimension 1; surfaces are two-dimensional,
they have dimension 2; volumes have dimension 3; and so on. There are two major ways
of defining the dimension of something that is, in the broadest possible sense,
a geometrical object:
In how many independent directions can you go while remaining on the object? That is its dimension.
Independence here means linear independence; where one direction would be linearly dependent
if it could be obtained as a linear combination of the other directions. This definition
of dimensionality originates in the theory of vectors and vector spaces, where
directions and positions are represented by vectors.
For example, on the surface of earth, starting from any point other than the poles,
we might go north or go west; these two directions are linearly independent - you cannot
formulate "going west" as some way of "going north." Add any third direction
and it becomes redundant; the third direction can be expressed as a linear combination of going
north and going west: For example, you can go southwest by going north by a negative distance
and then west by a positive distance. Therefore, the surface of earth is a two-dimensional object:
It has dimension 2.
If we scale up all the lengths (or all coordinates, or similar) in our system by
some constant factor c, how does this scale up the size of the object? If it increases
by the factor c^d, this means that d is the dimension of the object.
This definition is sometimes called the Kolmogorov dimension. It is particularly helpful
for fractals: Objects that have a non-integer Kolmogorov dimension.
However, for typical (non-fractal) objects and spaces, this is simply another way of expressing
the same concept of dimensionality as above.
For example, the surface area of a sphere with the radius r is given by 4πr^2.
If we scale up all the lengths by a factor of two, c = 2, we obtain a sphere with the radius
2r, and its surface area is 4π(2r)^2 = 16πr^2;
the size of the surface has increased by the factor c^d = 2^2 = 4, indicating that
the surface of a sphere is a two-dimensional object: It has dimension 2.
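The scaling argument can be sketched in a few lines of Python (a minimal illustration; the function names here are ours, not standard terminology):

```python
import math

def scaling_dimension(size, c=2.0):
    """Estimate the dimension d of an object from its scaling law:
    scaling all lengths by c multiplies the size by c**d, so
    d = log(size(c) / size(1)) / log(c)."""
    return math.log(size(c) / size(1.0)) / math.log(c)

def sphere_surface(r):
    return 4 * math.pi * r**2        # surface area of a sphere of radius r

def ball_volume(r):
    return (4 / 3) * math.pi * r**3  # volume of a ball of radius r

print(scaling_dimension(sphere_surface))  # approximately 2.0: a surface
print(scaling_dimension(ball_volume))     # approximately 3.0: a volume
```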
See also: Hypothesis,
Optimization parameter,
Optimization objective,
Pareto optimality.
Hypothesis
In machine learning, a hypothesis is a
function y = f(x0, x1, …) that
predicts an outcome variable y
on the basis of values of one or multiple independent
variables x0, x1, …
- In this sense, hypothesis means the same
as model or model function, where the independent variables in the hypothesis
are the arguments of the model function; it can also be called a correlation or a regression.
All these words have other meanings as well (and so does the word "hypothesis"),
creating an unfortunate level of ambiguity. Be aware of this potential for ambiguity and misunderstandings.
You can counteract it by providing clear definitions.
- A hypothesis space is a kind of model, or a model class, with free parameters (i.e., model parameters)
that can be adjusted to optimize quantitative agreement with the data. For example, with two independent
variables x0 and x1, the hypothesis
space for linear regression is given by the space (i.e., set)
of functions that have the form f(x0, x1)
= ax0 + bx1 + c, with three adjustable parameters: a, b, and c.
- Model parameterization, or the process of computing a regression,
means to solve an optimization problem where the hypothesis space is the parameter space,
and some measure for model quality is the optimization objective.
If this is done by the ordinary least squares (OLS) method,
the root mean square deviation between
the hypothesis and the actual data is used as a minimization objective.
Colloquially, it is also common to refer to the hypothesis space itself as "the hypothesis."
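The linear hypothesis space above can be illustrated with a short Python sketch, assuming NumPy is available; the data are hypothetical and noise-free, so an ordinary least squares fit should recover the parameters exactly:

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 2*x0 - x1 + 3, so the
# fitted parameters should come out as a = 2, b = -1, c = 3.
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 10.0, 50)
x1 = rng.uniform(0.0, 10.0, 50)
y = 2 * x0 - x1 + 3

# One column per adjustable parameter: a (for x0), b (for x1), c (constant).
X = np.column_stack([x0, x1, np.ones_like(x0)])
(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b, c)  # approximately 2.0, -1.0, 3.0
```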
See also: Dimension,
Linear regression,
Optimization objective,
Optimization parameter,
Overfitting,
Regression analysis,
Supervised learning,
Validation and testing.
Inductive reasoning
There are three major ways for an intelligent agent to acquire knowledge,
but only one of them - namely, inductive reasoning - is commonly called learning in an AI context.
These three ways to improve the agent's understanding of the world are:
- Direct input as part of the percepts received through the agent's sensors. Depending on the kind of agent,
this may include observation of the surroundings or data ingest by an authorized user. For example:
- I know that the sun is shining when I see it shine. (Observation)
- The administrator has ingested data on the past season's football
results into the agent's knowledge base, and our agent is programmed to
work with this information. (Data ingest)
- Logical or mathematical reasoning by which the agent explores the consequences
of its axioms, i.e., of the propositions that it accepts as true to begin with.
All that can be proven to be true on that basis must also be accepted as true.
This way of thinking is called deductive reasoning. For example: a) Jack talked to me yesterday,
b) Jack is a human, and c) humans can only talk after they have been born,
therefore d) Jack was born yesterday or before. If the three premises
a), b), and c) are indeed true, then the consequence d) must be true as well.
This is called automated reasoning if it is done by an algorithm.
- Detecting patterns and trends, i.e., correlations between phenomena,
in the knowledge available to the agent. This can be done to better explain and understand
that knowledge, creating a mental model of it. It can also be done in order to predict the
behaviour of an observed system under conditions for which no data have been provided so far.
Normally, both of these goals are pursued at the same time. This is inductive reasoning;
it is called machine learning if it is done by an algorithm.
Colloquially, all three items above might be called "learning," since they are ways
of expanding an agent's knowledge. For example, learning from a book or a teacher is of the first type,
whereas studying mathematics is often of the second type. In AI, the term learning and particularly
machine learning is typically understood to refer to inductive reasoning only,
whereas reasoning without any further qualification usually means deductive reasoning.
See also: Agent,
Knowledge base,
Regression analysis,
Supervised learning,
Validation and testing.
Knowledge base
"The central component of a knowledge-based agent is its knowledge
base" (Russell & Norvig 2021, p. 227). Interactions with a knowledge base take two forms:
- Data ingest to extend or update the information about the world.
- Data retrieval by querying.
Knowledge bases are typically designed to support deductive
reasoning (logical inference and theorem proving).
See also: Agent,
Inductive reasoning.
Linear regression
Linear regression is the most common way of conducting regression analysis.
It considers the hypothesis space where the model is linear in all the
independent variables: The outcome variable is expressed as a linear combination of the independent variables, plus a constant.
- For example, if y is to be described as a function of u, v, and w,
linear regression will determine optimal values of a, b, c, and d
such that the available data set (consisting of instances with known, given values
of u, v, w, and y) is represented by the
equation y = au + bv + cw + d as accurately as possible.
- The methodology from linear regression can be used to develop non-linear models as well.
For example, if a data set consisting of (x, y) pairs is given,
a model y = f(x) = ax^4 + bx^2 + cx ln x + d
is non-linear in x. However, it is linear in u = x^4,
v = x^2, and w = x ln x, and the
(x, y) data set can easily be converted into a (u, v, w, y)
data set. The model parameters a, b, c, and d can then be determined by linear regression.
- In Python, the statsmodels library
can be used for linear regression.
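The variable-transformation trick from the second bullet point can be sketched as follows (a minimal illustration using NumPy's least-squares solver rather than statsmodels; the data and parameter values are hypothetical and noise-free):

```python
import numpy as np

# Hypothetical (x, y) data, generated without noise from
# y = 0.5*x**4 - 2*x**2 + 1.5*x*ln(x) + 4.
x = np.linspace(0.5, 3.0, 40)
y = 0.5 * x**4 - 2 * x**2 + 1.5 * x * np.log(x) + 4

# The model is non-linear in x but linear in the transformed variables.
u, v, w = x**4, x**2, x * np.log(x)
X = np.column_stack([u, v, w, np.ones_like(x)])
(a, b, c, d), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b, c, d)  # approximately 0.5, -2.0, 1.5, 4.0
```

The same (u, v, w, y) data set could equally be passed to an ordinary least squares routine from the statsmodels library.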
See also: Hypothesis,
Regression analysis,
Root mean square deviation.
Optimization objective
An optimization objective is a quantity that is used to formulate preferences
for the outcome of a decision making scenario. In case of a maximization objective, greater
values are preferred, and in case of a minimization objective, smaller values are preferred.
- An optimization objective can also be called an optimization criterion
or a key performance indicator (KPI).
If it is a minimization objective, it can also be called cost,
and if it is a maximization objective, it can also be called utility.
- In multicriteria optimization (MCO),
multiple conflicting optimization objectives are used simultaneously.
In this case, there is a multidimensional objective space;
the dimension of the objective space is
given by the number of optimization objectives.
- The function f(x) that maps points in parameter space to points in objective
space is called the objective function; in case of maximization,
it is also referred to as a utility function, and in case of minimization,
as a cost function.
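As a minimal illustration (with two hypothetical, conflicting minimization objectives), the objective function below maps a one-dimensional parameter space to a two-dimensional objective space:

```python
def objective_function(x):
    """Maps a point x in parameter space to a point in objective space."""
    f1 = x**2         # first cost: squared distance from 0
    f2 = (x - 2)**2   # second cost: squared distance from 2
    return (f1, f2)

# The two objectives conflict: no single x minimizes both at once.
print(objective_function(0.0))  # (0.0, 4.0) -- best for f1
print(objective_function(2.0))  # (4.0, 0.0) -- best for f2
print(objective_function(1.0))  # (1.0, 1.0) -- a compromise
```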
See also: Dimension,
Hypothesis,
Optimization parameter,
Pareto optimality,
Rationality,
SMART objective.
Optimization parameter
In decision making, an optimization parameter is a quantity
over which the decision maker has direct control;
a parameter value (or parameterization) is selected in order to
obtain the best possible outcome for the optimization
objective(s).
- In multivariate optimization, there are multiple optimization parameters;
accordingly, the parameter space is multidimensional.
- If an optimization problem with multiple parameters is formulated adequately,
it should be possible to vary all optimization parameters independently.
If that is not the case and one of the parameters can be expressed as a function
of the others, the problem needs to be reformulated, eliminating redundant parameter(s).
See also: Dimension,
Hypothesis,
Optimization objective,
Pareto optimality.
Overfitting
"We say a function is overfitting the data when it pays too much
attention to the particular data set it is trained on, causing it
to perform poorly
on unseen data." Conversely,
"a hypothesis is underfitting when it
fails to find a pattern in the data" even though
such a pattern is actually present (Russell & Norvig 2021, p. 673).
Overfitting leads to a model that has excellent agreement with the training data,
but poor predictive quality for the validation
data. Therefore, such models can be eliminated during validation if they
are compared against other, simpler models that do not exhibit overfitting.
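A minimal Python sketch of this effect, assuming NumPy is available (the data are hypothetical: noisy linear data fitted by a linear model and by a degree-9 polynomial model):

```python
import numpy as np

def rmsd(y_actual, y_pred):
    return float(np.sqrt(np.mean((y_actual - y_pred)**2)))

# Hypothetical noisy data with a linear ground truth.
rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2 * x_train + 1 + rng.normal(0.0, 0.3, 10)
x_val = np.linspace(0.05, 0.95, 10)           # unseen data
y_val = 2 * x_val + 1 + rng.normal(0.0, 0.3, 10)

# A degree-9 polynomial (10 parameters for 10 points) interpolates the
# training data, so its training RMSD is near zero; on the unseen
# validation data it typically performs far worse than the linear model.
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = rmsd(y_train, np.polyval(coeffs, x_train))
    val_err = rmsd(y_val, np.polyval(coeffs, x_val))
    print(degree, train_err, val_err)
```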
See also: Hypothesis,
Regression analysis,
Supervised learning,
Validation and testing.
Pareto optimality
In multicriteria optimization (MCO), rational compromises between
multiple conflicting optimization objectives are characterized by Pareto optimality.
A point in objective space is Pareto optimal
if it is accessible and no other accessible point in objective space dominates it.
- The Pareto front consists of all the Pareto optimal points in objective space.
- A point y in objective space is accessible if there is
a point x in parameter space such that f(x) = y,
where f(x) is the objective function (utility function in case of
maximization objectives, cost function in case of minimization objectives).
- A point y in objective space dominates another point y'
if there is at least one objective for which y is better than y',
whereas there is no objective for which y' is better than y.
If that is the case, there is no possible compromise between the objectives
that would lead a rational agent to prefer
y' over y. Therefore, if y is accessible, y' cannot be Pareto optimal.
- By extension, a point x in parameter space can also be called Pareto optimal
(e.g., a Pareto optimal solution, parameterization, or design choice)
if y = f(x) is Pareto optimal, i.e., if the
point y in objective space is on the Pareto front.
- It is a common technique in AI-driven decision support to compute
the Pareto front and the associated Pareto optimal design choices,
presenting them to decision makers.
All the other possible solutions can be discarded since they cannot
correspond to a rational compromise between the objectives.
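Under the definitions above, the Pareto front of a finite set of accessible points in objective space can be computed by checking each point for domination. A minimal sketch (all objectives are minimized; the points are hypothetical):

```python
def dominates(y, y_prime):
    """y dominates y' (minimization objectives): y is better in at
    least one objective and worse in none."""
    return (all(a <= b for a, b in zip(y, y_prime))
            and any(a < b for a, b in zip(y, y_prime)))

def pareto_front(points):
    """Return all points not dominated by any other accessible point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical accessible points in a 2-D objective space:
points = [(1, 5), (2, 3), (3, 4), (4, 2), (5, 1), (4, 4)]
print(pareto_front(points))  # [(1, 5), (2, 3), (4, 2), (5, 1)]
```

Here, (3, 4) and (4, 4) are discarded because (2, 3) dominates both; no rational compromise between the two objectives would select them.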
See also: Dimension,
Optimization parameter,
Optimization objective,
Rationality,
SMART objective.
Rationality
A rational agent is an agent that exhibits a tendency toward
maximizing a performance measure (or minimizing it, depending on how it is formulated).
In particular, rational preferences, or decisions and choices made by a rational agent,
satisfy a series of constraints including, but not limited to, the following (Russell & Norvig 2021, p. 520).
- Transitivity: If the agent prefers A over B, and B over C,
then the agent also prefers A over C whenever given the choice.
- Monotonicity: Assume that the agent prefers A over B. The
lotteries (i.e., probability distributions) X and Y both have
A and B as their only possible outcomes, where the
probability of A is greater in case of the lottery X
than in case of the lottery Y. Then the agent prefers X over Y.
- Continuity: If the agent prefers A over B, and B over C,
then there is exactly one lottery X with A and C as its only possible outcomes
such that the agent is indifferent between B and X, i.e.,
the agent neither prefers B over X nor does the agent prefer X over B.
For any other lottery Y with the two possible outcomes A and C,
the agent prefers Y over B if the chance of A is greater in case of Y than in case of X;
conversely, the agent prefers B over Y if the chance of A is smaller in case of Y than in case of X.
For a more complete and more mathematically oriented discussion
of rational choice, cf. Russell & Norvig (2021, p. 520f.).
See also: Agent,
Pareto optimality,
Optimization objective.
Regression analysis
Regression is a method or process in quantitative inductive
reasoning, i.e., in machine learning applied to numerical data. The learning
problem consists in finding out how an outcome variable y (also called
the dependent variable) depends on the values
of one or multiple independent variables.
- Regression is based on a pre-selected functional form of the permitted hypotheses (models),
i.e., on a pre-selected hypothesis space.
It must be known what the model function looks like and what free (i.e., adjustable) parameters it contains; e.g.,
cubic regression produces a hypothesis according to which the outcome y
is modelled by y = ax^3 + bx^2 + cx + d,
with the model parameters a, b, c, and d.
The outcome of the regression would then consist in a set of values for these parameters.
- The outcome of the regression (i.e., the outcome of the learning process)
is often also called regression; to avoid confusion
it is advisable to refer to the outcome of the regression as a model,
a model function, a hypothesis, or a correlation. Unfortunately, all these terms can have other meanings as well.
- On the origin of the term, Russell & Norvig (2021, p. 670) comment that the name regression
for this problem and methodology is "admittedly obscure - a better name would have
been function approximation or numeric prediction. But in 1886 Francis Galton
wrote an influential
article on the concept of regression to the mean (e.g.,
the children of tall parents are likely to be taller than average, but not as tall
as the parents). Galton showed plots with what he called 'regression lines,' and
readers came to associate the word 'regression' with the statistical technique
of function approximation rather than with the topic of regression to the mean."
- In Python, the statsmodels library
can be used for regression analysis.
Regression analysis can refer to a discussion of regression
methodology (e.g., ordinary least squares fits
based on the root mean square deviation)
or to analysing the outcome of a regression,
such as assessing the confidence in the model. Standardized techniques
and concepts for analysing the regression outcome are particularly widespread
for linear regression.
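As a minimal sketch of the cubic regression mentioned above, assuming NumPy is available (the data are hypothetical and noise-free, so the fit should recover the parameters):

```python
import numpy as np

# Hypothetical, noise-free data from y = x**3 - 2*x**2 + 0.5*x + 3, so the
# regression should recover a = 1, b = -2, c = 0.5, d = 3.
x = np.linspace(-2.0, 2.0, 25)
y = x**3 - 2 * x**2 + 0.5 * x + 3

a, b, c, d = np.polyfit(x, y, 3)  # least-squares cubic fit
print(a, b, c, d)  # approximately 1.0, -2.0, 0.5, 3.0
```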
See also: Hypothesis,
Inductive reasoning,
Linear regression,
Overfitting,
Root mean square deviation,
Supervised learning,
Validation and testing.
Root mean square deviation
The root mean square deviation is a common measure for describing how far two data sets are apart.
As the name suggests, it is the square root of the mean square deviation between
the two data sets.
- The root mean square deviation between two data sets is always non-negative; it is zero if and only if the two data sets agree exactly.
The root mean square deviation between a sequence (list) of values and the average value
is also called the standard deviation of that sequence of values.
- In model parameterization (regression) by
an ordinary least squares (OLS) fit, the root mean square deviation between
model data and actual data for the outcome variable y is
used as a minimization
criterion; i.e., model parameters are adjusted such that the root mean square deviation
between correlated and actual values of y becomes as small as possible.
(In supervised learning with
multiple hypotheses, only
the training data set should be used for that purpose.)
- The root mean square deviation between actual data and predicted data for
an outcome variable y can be used as a measure for the accuracy of a model;
in validation, this can help select the
hypothesis that performs best, and in testing,
this can provide a basis for statements on the margin of error.
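The relation between the root mean square deviation and the (population) standard deviation can be illustrated in pure Python (the values are hypothetical):

```python
import math

def rmsd(a, b):
    """Root mean square deviation between two equally long data sets."""
    return math.sqrt(sum((x - y)**2 for x, y in zip(a, b)) / len(a))

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(values) / len(values)  # 5.0

# The RMSD between a sequence of values and its average value is the
# (population) standard deviation of that sequence.
print(rmsd(values, [mean] * len(values)))  # 2.0
```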
See also: Linear regression,
Regression analysis,
Validation and testing.
SMART objective
Following Doran (1981), a management "objective should be:
- Specific - target a specific area for improvement.
- Measurable - quantify or at least suggest an indicator of progress.
- Assignable - specify who will do it.
- Realistic - state what results can realistically be achieved, given available resources.
- Time-related - specify when the result(s) can be achieved."
Formulating SMART objectives is not only good organizational or management practice.
By including an "indicator of progress," which in decision making and decision support
is usually called a key performance indicator (KPI), it can help establish the
optimization objective when expressing
a scenario as an optimization problem. Multiple conflicting KPIs give rise to a
multicriteria optimization (MCO) problem with a multidimensional
objective space.
See also: Optimization objective,
Pareto optimality.
Supervised learning
Supervised learning is one of the major approaches to machine learning,
i.e., of inductive reasoning using computers.
In supervised learning, an algorithm is given input-output pairs (or, equivalently,
combinations of independent variables x and outcomes y). On this basis,
the algorithm proceeds to develop a model of the provided data. However, the
hypothesis space (that is, the kind of model),
needs to be specified by the user; the algorithm will only determine a
parameterization of the model.
It is good practice to develop multiple candidate models (i.e., hypotheses
taken from different hypothesis spaces)
and compare their performance by validation.
The other two major approaches to machine learning are unsupervised learning,
where a data set is given to the algorithm without any additional supporting information or hypothesis,
and reinforcement learning, where the algorithm receives feedback
that drives it toward developing better models.
See also: Hypothesis,
Inductive reasoning,
Overfitting,
Regression analysis,
Validation and testing.
Turing test
The Turing test is a game-like criterion devised by Turing (1950)
addressing the question: "Can machines think?"
The test is a game with
three players: The machine, a human, and an interrogator (who is also human).
- The machine and the human both exchange text with the interrogator.
- The aim of the machine is to deceive the interrogator into believing that it is human.
- The aim of the human is to make the interrogator realize that he/she, and not the machine, is the human.
- The aim of the interrogator is to figure out who of the two others is the machine and who is the human.
According to Turing, the success rate of a
machine at passing this test (winning the game) is a measure of
its capacity to intelligently emulate human behaviour and communication.
The success rate will depend on a multitude of factors, including the amount of
text that can be exchanged until the interrogator must make a decision;
Turing suggested five minutes, using typewriters.
Since the opponent of the AI is an actual human,
even a perfect AI cannot be expected to win more than 50% of the time.
See also: Agent.
Validation and testing
In supervised learning, it is often unclear
what hypothesis is the best for modelling
the phenomena underlying a given data set. In that case, it is common practice
to develop multiple candidate models based on different
hypotheses (e.g., a linear, quadratic, and cubic model),
compare them to each other by validation, and finally assess the
accuracy of the selected model by testing.
For this purpose, the overall data set can be split up into three parts:
- The training data are used to parameterize multiple candidate models, i.e.,
to adjust any free variables in the models (such as a and b in y = ax + b),
optimizing the accuracy of the model; one such method would be an ordinary least squares (OLS) fit,
which minimizes the root mean square deviation
between the actual data for the outcome y (using training data only) and the values obtained from the model.
- The validation data are used to compare the candidate models against
each other and, where appropriate, to the null hypothesis which states that
the outcome y is constant. For example, the strategy
may consist in selecting the model with the smallest
root mean square deviation between the
actual and the predicted outcome y (using validation data only).
- The test data are used to provide an independent accuracy assessment for the final model.
This permits statements on the margin of error, e.g., based on the
root mean square deviation between the test data and the
corresponding predictions for the outcome variable y.
If a normal distribution is assumed for the deviation between actual and predicted data,
using a margin of error given by two times the root mean square deviation (in both directions) leads to
a 95.4% probability for observing a deviation that is smaller than the margin of error.
The split between training and validation data is helpful to prevent overfitting.
The split between validation and test data prevents a selection bias: Since the validation data are
used to choose the best hypothesis, the performance of the
selected hypothesis will usually tend to be overestimated slightly.
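The three-way split can be sketched as follows, assuming NumPy is available (hypothetical data with a quadratic ground truth; linear, quadratic, and cubic candidate hypotheses):

```python
import numpy as np

def rmsd(y_actual, y_pred):
    return float(np.sqrt(np.mean((y_actual - y_pred)**2)))

# Hypothetical data: quadratic ground truth plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(-3.0, 3.0, 90)
y = 0.5 * x**2 - x + 2 + rng.normal(0.0, 0.2, 90)

# Split the overall data set into training, validation, and test data.
x_train, x_val, x_test = x[:60], x[60:75], x[75:]
y_train, y_val, y_test = y[:60], y[60:75], y[75:]

# Parameterize the candidate models on the training data only, then
# compare them on the validation data.
candidates = {deg: np.polyfit(x_train, y_train, deg) for deg in (1, 2, 3)}
val_errors = {deg: rmsd(y_val, np.polyval(c, x_val))
              for deg, c in candidates.items()}
best = min(val_errors, key=val_errors.get)

# Assess the selected model on the independent test data.
test_error = rmsd(y_test, np.polyval(candidates[best], x_test))
print(best, test_error)
```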
See also: Hypothesis,
Inductive reasoning,
Overfitting,
Regression analysis,
Root mean square deviation,
Supervised learning.
Referenced literature
- (Conte 2009) R. Conte, "Rational, goal-oriented agents," doi:10.1007/978-1-4614-1800-9_158, in R. A. Meyers (ed.), Encyclopedia of Complexity and Systems Science, New York: Springer (ISBN 978-1-4614-1801-6), 2009.
- (Doran 1981) G. T. Doran, "There's a S.M.A.R.T. way to write management's goals and objectives," Management Review 70(11): 35-36, 1981.
- (Galton 1886) F. Galton, "Regression towards mediocrity in hereditary stature," Journal of the Anthropological Institute of Great Britain and Ireland 15: 246-263, doi:10.2307/2841583, 1886.
- (Russell & Norvig 2021) S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 4th edn. (global), Harlow: Pearson (ISBN 978-1-29240113-3), 2021.
- (Turing 1950) A. M. Turing, "Computing machinery and intelligence," Mind 59(236): 433-460, doi:10.1093/mind/LIX.236.433, 1950.