Skip to main content

Fundamentals of AI

2727 words·13 mins
Guides Hack the Box AI Machine Learning Cybersecurity
Mike Vu
Author
Mike Vu
Table of Contents

Fundamentals of AI
#

Artificial Intelligence
#

Artificial Intelligence (AI) is a broad field focused on developing intelligent systems capable of performing tasks that typically require human intelligence:

  • Understanding natural language,
  • Recognizing objects,
  • Making decisions,
  • Solving problems
  • Learning from experience.

AI systems exhibit cognitive abilities like reasoning, perception, and problem-solving across various domains:

  • Natural Language Processing (NLP): Enabling computers to understand, interpret, and generate human language.
  • Computer Vision: Allowing computers to “see” and interpret images and videos.
  • Robotics: Developing robots that can perform tasks autonomously or with human guidance.
  • Expert Systems: Creating systems that mimic the decision-making abilities of human experts.

One of the primary goals of AI is to augment human capabilities, not just replace human efforts. 

AI systems are designed to enhance human decision-making and productivity, providing support in complex data analysis, prediction, and mechanical tasks.

AI solves complex problems in many diverse domains like healthcare, finance, and cybersecurity. For example:

  • In healthcareAI improves disease diagnosis and drug discovery.
  • In financeAI detects fraudulent transactions and optimizes investment strategies.
  • In cybersecurityAI identifies and mitigates cyber threats.

Machine Learning (ML)
#

Machine Learning (ML) is a subfield of AI that focuses on enabling systems to learn from data and improve their performance on specific tasks without explicit programming. ML algorithms use statistical techniques to identify patterns, trends, and anomalies within datasets, allowing the system to make predictions, decisions, or classifications based on new input data.

ML can be categorized into three main types:

  • Supervised Learning: The algorithm learns from labeled data, where each data point is associated with a known outcome or label. Examples include:
    • Image classification
    • Spam detection
    • Fraud prevention
  • Unsupervised Learning: The algorithm learns from unlabeled data without providing an outcome or label. Examples include:
    • Customer segmentation
    • Anomaly detection
    • Dimensionality reduction
  • Reinforcement Learning: The algorithm learns through trial and error by interacting with an environment and receiving feedback as rewards or penalties. Examples include:

For instance, an ML algorithm can be trained on a dataset of images labeled as “cat” or “dog.” By analyzing the features and patterns in these images, the algorithm learns to distinguish between cats and dogs. When presented with a new image, it can predict whether it depicts a cat or a dog based on its learned knowledge.

ML has a wide range of applications across various industries, including:

  • Healthcare: Disease diagnosis, drug discovery, personalized medicine
  • Finance: Fraud detection, risk assessment, algorithmic trading
  • Marketing: Customer segmentation, targeted advertising, recommendation systems
  • Cybersecurity: Threat detection, intrusion prevention, malware analysis
  • Transportation: Traffic prediction, autonomous vehicles, route optimization

ML is a rapidly evolving field with new algorithms, techniques, and applications emerging. It is a crucial enabler of AI, providing the learning and adaptation capabilities that underpin many intelligent systems.

Deep Learning (DL)
#

Deep Learning (DL) is a subfield of ML that uses neural networks with multiple layers to learn and extract features from complex data. These deep neural networks can automatically identify intricate patterns and representations within large datasets, making them particularly powerful for tasks involving unstructured or high-dimensional data, such as images, audio, and text.

Key characteristics of DL include:

  • Hierarchical Feature Learning: DL models can learn hierarchical data representations, where each layer captures increasingly abstract features. For example, lower layers might detect edges and textures in image recognition, while higher layers identify more complex structures like shapes and objects.
  • End-to-End Learning: DL models can be trained end-to-end, meaning they can directly map raw input data to desired outputs without manual feature engineering.
  • Scalability: DL models can scale well with large datasets and computational resources, making them suitable for big data applications.

Common types of neural networks used in DL include:

  • Convolutional Neural Networks (CNNs): Specialized for image and video data, CNNs use convolutional layers to detect local patterns and spatial hierarchies.
  • Recurrent Neural Networks (RNNs): Designed for sequential data like text and speech, RNNs have loops that allow information to persist across time steps.
  • Transformers: A recent advancement in DL, transformers are particularly effective for natural language processing tasks. They leverage self-attention mechanisms to handle long-range dependencies.

DL has revolutionized many areas of AI, achieving state-of-the-art performance in tasks such as:

  • Computer Vision: Image classification, object detection, image segmentation
  • Natural Language Processing (NLP): Sentiment analysis, machine translation, text generation
  • Speech Recognition: Transcribing audio to text, speech synthesis
  • Reinforcement Learning: Training agents for complex tasks like playing games and controlling robots

The Relationship Between AI, ML, and DL
#

Machine Learning (ML) and Deep Learning (DL) are subfields of Artificial Intelligence (AI) that enable systems to learn from data and make intelligent decisions.

ML algorithms, including DL algorithms, allow machines to learn from data, recognize patterns, and make decisions. The various types of ML, such as supervised, unsupervised, and reinforcement learning, each contribute to achieving AI’s broader goals. For instance:

  • In Computer Vision, supervised learning algorithms and Deep Convolutional Neural Networks (CNNs) enable machines to “see” and interpret images accurately.
  • In Natural Language Processing (NLP), traditional ML algorithms and advanced DL models like transformers allow for understanding and generating human language, enabling applications like chatbots and translation services.

DL has significantly enhanced the capabilities of ML by providing powerful tools for feature extraction and representation learning, particularly in domains with complex, unstructured data.

Examples of use cases:

  • In Autonomous Driving, a combination of ML and DL techniques processes sensor data, recognizes objects, and makes real-time decisions, enabling vehicles to navigate safely.
  • In Robotics, reinforcement learning algorithms, often enhanced with DL, train robots to perform complex tasks in dynamic environments.

Mathematics Refresher for AI
#

Multiplication (*)
#

The multiplication operator denotes the product of two numbers or expressions.

3 * 4 = 12

Division(/)
#

The division operator denotes dividing one number or expression by another.

10 / 2 = 5

Addition (+)
#

The addition operator represents the sum of two or more numbers or expressions.

5 + 3 = 8

Subtraction (-)
#

The subtraction operator represents the difference between two numbers or expressions. 

9 - 4 = 5

Algebraic Notations (x_t)
#

The subscript notation represents a variable indexed by t, often indicating a specific time step or state in a sequence. x_t = q(x_t | x_{t-2})

This notation is commonly used in sequences and time series data, where each x_t represents the value of x at time t.

Superscript Notation (x^n )
#

Superscript notation is used to denote exponents or powers.

x^2 = x * x

This notation is used in polynomial expressions and exponential functions.

Norm (||...|| )
#

The norm measures the size or length of a vector. The most common norm is the Euclidean norm, which is calculated as follows: ||v|| = sqrt{v_1^2 + v_2^2 + ... + v_n^2}

Other norms include the L1 norm (Manhattan distance) and the L∞ norm (maximum absolute value):

||v||_1 = |v_1| + |v_2| + ... + |v_n|
||v||_ = max(|v_1|, |v_2|, ..., |v_n|)

Norms are used in various applications, such as measuring the distance between vectors, regularizing models to prevent overfitting, and normalizing data.

Summation Symbol (Σ )
#

The summation symbol indicates the sum of a sequence of terms.

Σ_{i=1}^{n} a_i

This represents the sum of the terms a_1, a_2, ..., a_n. Summation is used in many mathematical formulas, including calculating means, variances, and series.

Logarithms and Exponentials
#

Logarithm Base 2 (log2(x))
#

The logarithm base 2 is the logarithm of x with base 2, often used in information theory to measure entropy. For example:

Code: python

log2(8) = 3

Logarithms are used in information theory, cryptography, and algorithms for their properties in reducing large numbers and handling exponential growth.

Natural Logarithm (ln(x))
#

The natural logarithm is the logarithm of x with base e (Euler’s number). For example:

Code: python

ln(e^2) = 2

Due to its smooth and continuous nature, the natural logarithm is widely used in calculus, differential equations, and probability theory.

Exponential Function (e^x)
#

The exponential function represents Euler’s number e raised to the power of x. For example:

Code: python

e^{2}  7.389

The exponential function is used to model growth and decay processes, probability distributions (e.g., the normal distribution), and various mathematical and physical models.

Exponential Function (Base 2) (2^x)
#

The exponential function (base 2) represents 2 raised to the power of x, often used in binary systems and information metrics. For example:

Code: python

2^3 = 8

This function is used in computer science, particularly in binary representations and information theory.

Matrix and Vector Operations
#

Matrix-Vector Multiplication (A * v)
#

Matrix-vector multiplication denotes the product of a matrix A and a vector v. For example:

Code: python

A * v = [ [1, 2], [3, 4] ] * [5, 6] = [17, 39]

This operation is fundamental in linear algebra and is used in various applications, including transforming vectors, solving systems of linear equations, and in neural networks.

Matrix-Matrix Multiplication (A * B)
#

Matrix-matrix multiplication denotes the product of two matrices A and B. For example:

Code: python

A * B = [ [1, 2], [3, 4] ] * [ [5, 6], [7, 8] ] = [ [19, 22], [43, 50] ]

This operation is used in linear transformations, solving systems of linear equations, and deep learning for operations between layers.

Transpose (A^T)
#

The transpose of a matrix A is denoted by A^T and swaps the rows and columns of A. For example:

Code: python

A = [ [1, 2], [3, 4] ]
A^T = [ [1, 3], [2, 4] ]

The transpose is used in various matrix operations, such as calculating the dot product and preparing data for certain algorithms.

Inverse (A^{-1})
#

The inverse of a matrix A is denoted by A^{-1} and is the matrix that, when multiplied by A, results in the identity matrix. For example:

Code: python

A = [ [1, 2], [3, 4] ]
A^{-1} = [ [-2, 1], [1.5, -0.5] ]

The inverse is used to solve systems of linear equations, inverting transformations, and various optimization problems.

Determinant (det(A))
#

The determinant of a square matrix A is a scalar value that can be computed and is used in various matrix operations. For example:

Code: python

A = [ [1, 2], [3, 4] ]
det(A) = 1 * 4 - 2 * 3 = -2

The determinant determines whether a matrix is invertible (non-zero determinant) in calculating volumes, areas, and geometric transformations.

Trace (tr(A))
#

The trace of a square matrix A is the sum of the elements on the main diagonal. For example:

Code: python

A = [ [1, 2], [3, 4] ]
tr(A) = 1 + 4 = 5

The trace is used in various matrix properties and in calculating eigenvalues.

Set Theory
#

Cardinality (|S|)
#

The cardinality represents the number of elements in a set S. For example:

Code: python

S = {1, 2, 3, 4, 5}
|S| = 5

Cardinality is used in counting elements, probability calculations, and various combinatorial problems.

Union ()
#

The union of two sets A and B is the set of all elements in either A or B or both. For example:

Code: python

A = {1, 2, 3}, B = {3, 4, 5}
A  B = {1, 2, 3, 4, 5}

The union is used in combining sets, data merging, and in various set operations.

Intersection ()
#

The intersection of two sets A and B is the set of all elements in both A and B. For example:

Code: python

A = {1, 2, 3}, B = {3, 4, 5}
A  B = {3}

The intersection finds common elements, data filtering, and various set operations.

Complement (A^c)
#

The complement of a set A is the set of all elements not in A. For example:

Code: python

U = {1, 2, 3, 4, 5}, A = {1, 2, 3}
A^c = {4, 5}

The complement is used in set operations, probability calculations, and various logical operations.

Comparison Operators
#

Greater Than or Equal To (>=)
#

The greater than or equal to operator indicates that the value on the left is either greater than or equal to the value on the right. For example:

Code: python

a >= b

Less Than or Equal To (<=)
#

The less than or equal to operator indicates that the value on the left is either less than or equal to the value on the right. For example:

Code: python

a <= b

Equality (==)
#

The equality operator checks if two values are equal. For example:

Code: python

a == b

Inequality (!=)
#

The inequality operator checks if two values are not equal. For example:

Code: python

a != b

Eigenvalues and Scalars
#

Lambda (Eigenvalue) (λ)
#

The lambda symbol often represents an eigenvalue in linear algebra or a scalar parameter in equations. For example:

Code: python

A * v = λ * v, where λ = 3

Eigenvalues are used to understand the behavior of linear transformations, principal component analysis (PCA), and various optimization problems.

Eigenvector
#

An eigenvector is a non-zero vector that, when multiplied by a matrix, results in a scalar multiple of itself. The scalar is the eigenvalue. For example:

Code: python

A * v = λ * v

Eigenvectors are used to understand the directions of maximum variance in data, dimensionality reduction techniques like PCA, and various machine learning algorithms.

Functions and Operators
#

Maximum Function (max(...))
#

The maximum function returns the largest value from a set of values. For example:

Code: python

max(4, 7, 2) = 7

The maximum function is used in optimization, finding the best solution, and in various decision-making processes.

Minimum Function (min(...))
#

The minimum function returns the smallest value from a set of values. For example:

Code: python

min(4, 7, 2) = 2

The minimum function is used in optimization, finding the best solution, and in various decision-making processes.

Reciprocal (1 / ...)
#

The reciprocal represents one divided by an expression, effectively inverting the value. For example:

Code: python

1 / x where x = 5 results in 0.2

The reciprocal is used in various mathematical operations, such as calculating rates and proportions.

Ellipsis (...)
#

The ellipsis indicates the continuation of a pattern or sequence, often used to denote an indefinite or ongoing process. For example:

Code: python

a_1 + a_2 + ... + a_n

The ellipsis is used in mathematical notation to represent sequences and series.

Functions and Probability
#

Function Notation (f(x))
#

Function notation represents a function f applied to an input x. For example:

Code: python

f(x) = x^2 + 2x + 1

Function notation is used in defining mathematical relationships, modeling real-world phenomena, and in various algorithms.

Conditional Probability Distribution (P(x | y))
#

The conditional probability distribution denotes the probability distribution of x given y. For example:

Code: python

P(Output | Input)

Conditional probabilities are used in Bayesian inference, decision-making under uncertainty, and various probabilistic models.

Expectation Operator (E[...])
#

The expectation operator represents a random variable’s expected value or average over its probability distribution. For example:

Code: python

E[X] = sum x_i P(x_i)

The expectation is used in calculating the mean, decision-making under uncertainty, and various statistical models.

Variance (Var(X))
#

Variance measures the spread of a random variable X around its mean. It is calculated as follows:

Code: python

Var(X) = E[(X - E[X])^2]

The variance is used to understand the dispersion of data, assess risk, and use various statistical models.

Standard Deviation (σ(X))
#

Standard Deviation is the square root of the variance and provides a measure of the dispersion of a random variable. For example:

Code: python

σ(X) = sqrt(Var(X))

Standard deviation is used to understand the spread of data, assess risk, and use various statistical models.

Covariance (Cov(X, Y))
#

Covariance measures how two random variables X and Y vary. It is calculated as follows:

Code: python

Cov(X, Y) = E[(X - E[X])(Y - E[Y])]

Covariance is used to understand the relationship between two variables, portfolio optimization, and various statistical models.

Correlation (ρ(X, Y))
#

The correlation is a normalized covariance measure, ranging from -1 to 1. It indicates the strength and direction of the linear relationship between two random variables. For example:

Code: python

ρ(X, Y) = Cov(X, Y) / (σ(X) * σ(Y))

Correlation is used to understand the linear relationship between variables in data analysis and in various statistical models.