Modeling Process
Modeling is a multi-stage methodology for creating trained and tested Machine Learning and AI models.
The Modeling Process is essentially a scientific experiment which includes:
Development of a Hypothesis - e.g., data collected about a specific previous consumer behavior can be used to predict future behavior
Design of the Experiment - e.g., model/algorithm selection
Execution of the Experiment - e.g., model training and testing
Evaluation and Explanation of Results - e.g., is the hypothesis true or false, what is the accuracy
Process Phases
Phases in the modeling process, which can be highly recursive/iterative, generally include:
Type Identification
Platform Selection
Data Collection
Model/Algorithm Selection
Model Hyperparameter Settings
Model Training
Model Testing
Model Evaluation
Model Deployment
Type Identification
The type of ML/AI needed can have a significant influence on the details of the modeling process phases. Type identification can be driven by:
Areas of Interest
Educational Needs
Research and Development
Major category types include:
Computer Vision (e.g., object recognition, facial recognition, handwriting recognition)
Natural Language Processing (e.g., speech to text, translation, understanding)
Pattern Recognition (e.g., event prediction, medical diagnosis)
Traditional Machine Learning vs. AI Modeling
Generative AI Models such as Large Language Models differ from more traditional Models such as Decision Trees in a number of aspects.
Model Architecture
AI Models - rely heavily on Transformer Neural Networks
ML Models - use a large variety of algorithms such as Artificial Neural Networks and Decision Trees
Training Data
AI Models - are trained on very large volumes of data
ML Models - are trained on much smaller volumes of data
Training Compute Resources
AI Models - use high levels of computing resources during training
ML Models - use relatively lower levels of computing resources during training
Model Fine Tuning
AI Models - can be fine tuned using small datasets for focused applications
ML Models - are typically not fine tuned with additional data after model training
Transfer Learning
AI Models - can be applied to a wide variety of applications
ML Models - are trained and used for specific applications
Platform Selection
ML and AI Platforms are generally of two types:
Open Source (e.g., TensorFlow, Keras, Scikit-Learn, Theano, Caffe, Torch)
Commercial/Cloud-Based (e.g., managed machine learning services offered by the major cloud providers)
Data Collection
Data Collection is the process of finding, organizing, cleaning, and storing data in a form that can be fed into model training and prediction processing.
Data Collection can involve:
Databases (e.g., Columnar, Document, Relational)
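A minimal sketch of this kind of data preparation, assuming a hypothetical customers.csv file and illustrative column names, using the pandas library:

```python
# Minimal data collection/preparation sketch (file and column names are hypothetical).
import pandas as pd

# Find/load: read raw data from a CSV export (could equally come from a database query).
df = pd.read_csv("customers.csv")

# Clean: drop duplicate rows and rows missing the prediction target.
df = df.drop_duplicates()
df = df.dropna(subset=["purchased"])

# Organize: separate input features from the target label.
X = df[["age", "income", "visits_last_month"]]
y = df["purchased"]

# Store: save the cleaned dataset so it can be fed into model training.
df.to_csv("customers_clean.csv", index=False)
```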
Model/Algorithm Selection
Model/Algorithm Options
Model algorithms to select from include:
Selection Methodologies
Methods of selecting an algorithm include:
Identifying Project Key Criteria - key criteria often include the intended model application, the need for model explainability and interpretability, and training data availability
Reviewing Model Categories - a categorization of models and their variations can provide insights useful for algorithm selection
Researching the Latest Advancements - Machine Learning is a very dynamic field; internet searches related to the type of ML being pursued can be valuable; use the Application page of this site to see a Google search for specific areas of interest
Experimenting with Various Options - running tests using various algorithms can provide insights into their effectiveness for the type of use envisioned
Comparing Models - use a method such as a spreadsheet to compare various models
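A minimal sketch of the "experimenting and comparing" approach described above, using scikit-learn and synthetic data; the candidate models chosen here are illustrative, not a recommendation:

```python
# Compare several candidate algorithms on the same dataset using cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
}

# Collect mean cross-validated accuracy for each candidate for side-by-side comparison.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```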
Model Hyperparameter Settings
Hyperparameters control aspects of model instantiation and training and can include factors, depending on the model algorithm being used, such as:
activation_function: which Activation Function is used in Activation Nodes
batch_size: the number of training samples processed in each iteration before the weights are updated; its effect interacts with the learning rate
hidden_network_layers: the number of nodes in each hidden network layer; hidden layers are those between the input and output layers
learning_rate: the step size (or schedule) controlling how much the weights change during Weight Optimization
maximum_number_of_iterations: the maximum number of iterations in which the data is processed through the neural network
number_of_data_features: the number of data features used for model training and inference processing
number_of_informative_data_features: the number of data features correlated to the training outputs; this simulates real world model training where the correlation of data features may not be known
number_of_model_classes: the number of output classes the neural network is being trained to predict
number_of_training_and_test_samples: the number of data samples generated for model training and testing
print_training_progress: whether to print the loss after each training iteration; loss is a measure of the difference between calculated outputs and expected outputs
tolerance_for_optimization: a threshold on loss improvement; training iteration cycles end when improvement falls below this value
weight_optimization_algorithm: the algorithm used for Weight Optimization, such as Stochastic Gradient Descent
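One possible mapping of the settings above onto concrete library parameters, sketched with scikit-learn's make_classification and MLPClassifier; parameter names differ between libraries and the values shown are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# number_of_training_and_test_samples, number_of_data_features,
# number_of_informative_data_features, number_of_model_classes
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# activation_function, hidden_network_layers, batch_size, learning_rate,
# maximum_number_of_iterations, tolerance_for_optimization,
# weight_optimization_algorithm, print_training_progress
model = MLPClassifier(
    activation="relu",
    hidden_layer_sizes=(32, 16),
    batch_size=64,
    learning_rate_init=0.001,
    max_iter=200,
    tol=1e-4,
    solver="sgd",      # Stochastic Gradient Descent for weight optimization
    verbose=True,      # print the loss after each training iteration
    random_state=0,
)
model.fit(X, y)
```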
Model Training
Data is iteratively processed through the model to adjust the weights and biases applied to data array links, producing increasingly accurate output results. The diagram below illustrates an Artificial Neural Network; the concepts apply to other model algorithms as well.
Data Inputs - data is fed into the training process
Iteration - data is iteratively passed through the neural network
Forward Propagation - data is passed from node to node
Outputs - output results are fed into loss calculations
Loss Calculation - the difference between output results and desired results is calculated
Weight Optimization - the amount of change to data flow weights is calculated
Backpropagation - modifies the weights and biases applied to data array links
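A compact sketch of the training loop described above for a tiny fully connected network, written in plain NumPy so that forward propagation, loss calculation, backpropagation, and weight optimization are each visible; the toy data, network size, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data inputs: 4 samples with 3 features each, and binary targets (illustrative toy data).
X = rng.random((4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Weights and biases for one hidden layer and one output layer.
W1, b1 = rng.normal(size=(3, 5)), np.zeros((1, 5))
W2, b2 = rng.normal(size=(5, 1)), np.zeros((1, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
learning_rate = 0.5

for iteration in range(1000):
    # Forward propagation: data is passed from layer to layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Loss calculation: mean squared difference between outputs and desired results.
    loss = np.mean((out - y) ** 2)
    if iteration % 200 == 0:
        print(f"iteration {iteration}: loss = {loss:.4f}")

    # Backpropagation: gradients of the loss with respect to weights and biases.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0, keepdims=True)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0, keepdims=True)

    # Weight optimization: gradient descent step adjusts weights and biases.
    W1 -= learning_rate * dW1; b1 -= learning_rate * db1
    W2 -= learning_rate * dW2; b2 -= learning_rate * db2
```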
Typically the training process is repeated while monitoring metrics such as accuracy in order to retain the best-performing model, as illustrated below:
Model Testing
Data is passed forward through the trained neural network to produce a result and an associated confidence level that the result is true. The diagram below illustrates an Artificial Neural Network; the concepts apply to other model algorithms as well.
Data Inputs - data is fed into the trained model
Forward Propagation - data is passed from node to node
Outputs - output results are fed into confidence level calculations
Confidence Level - a number from 0 to 1 indicating the probability that the output result is correct
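A small sketch of how raw output scores from a network's final layer can be turned into a predicted class and a confidence level using a softmax; the raw output values are illustrative:

```python
import numpy as np

# Illustrative raw outputs (logits) from the network's final layer for one input sample.
logits = np.array([2.0, 0.5, -1.0])

# Softmax converts raw scores into probabilities that sum to 1.
probabilities = np.exp(logits - logits.max())
probabilities /= probabilities.sum()

predicted_class = int(np.argmax(probabilities))
confidence = float(probabilities[predicted_class])  # a value between 0 and 1

print(f"predicted class {predicted_class} with confidence {confidence:.2f}")
```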
Model Evaluation
Model Evaluation involves applying Probability and Statistics using measurements such as:
Depending on the results of model evaluation, previous modeling steps may need to be adjusted and repeated.
To reduce overfitting, consider using:
Fewer Variables
Reduced Model Training Time
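A brief sketch of computing commonly used evaluation measurements; accuracy, precision, recall, and the confusion matrix are typical choices, and the labels below are illustrative:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Illustrative true labels and model predictions from a test set.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```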
Model Reinforcement Learning with Human Feedback (RLHF)
RLHF is a type of machine learning that combines reinforcement learning and human feedback to train AI models.
Key Benefits
Improved alignment: RLHF helps align the agent's objectives with human values and preferences.
Flexibility: RLHF can be applied to various domains, including those with complex or nuanced objectives.
Efficient learning: Human feedback accelerates learning, reducing the need for large amounts of data or trial and error.
Challenges and Limitations
Scalability: Obtaining high-quality human feedback can be time-consuming and expensive.
Bias and variability: Human feedback may be subjective, inconsistent, or biased.
Evaluation metrics: Assessing the effectiveness of RLHF can be challenging due to the complexity of human feedback.
By combining reinforcement learning with human feedback, RLHF enables AI agents to learn complex behaviors and make decisions that align with human values and preferences. Steps 2-4 below are repeated, with the agent refining its policy through continuous human feedback and reward signals.
Step 1: Environment and Agent
The AI agent interacts with an environment, such as a game, simulation, or text-based interface.
Step 2: Human Feedback
Humans provide feedback on the agent's actions, such as:
Rewards (e.g., +1 for good action, -1 for bad action)
Preferences (e.g., "I like this action better than that one")
Corrections (e.g., "No, do this instead")
Step 3: Reward Signal
The human feedback is converted into a reward signal (often by training a reward model on human preferences), which guides the agent's learning process through a policy optimization algorithm such as Proximal Policy Optimization (PPO).
PPO was introduced by OpenAI in 2017 and is designed to optimize the policy of an agent in a stable and efficient manner. PPO is a type of policy gradient method, which means it focuses on optimizing the policy directly rather than relying on a value function.
The key innovation of PPO lies in its use of a clipped surrogate objective function. This function constrains the policy updates by clipping the probability ratio between the new and old policies within a specified range. By doing so, PPO prevents large, destabilizing updates to the policy, ensuring that changes remain within a "trust region" that maintains training stability.
This approach allows PPO to achieve a balance between exploration and exploitation, making it more sample efficient and stable compared to previous methods like Trust Region Policy Optimization (TRPO).
PPO's simplicity, combined with its effectiveness, has made it a popular choice for various applications, including robotics, game playing, and other high-dimensional tasks.
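A minimal sketch of the clipped surrogate objective described above, in NumPy; the log-probabilities and advantage estimates are illustrative placeholders:

```python
import numpy as np

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_epsilon=0.2):
    """Clipped surrogate objective: mean of min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    # Probability ratio between the new and old policies.
    ratio = np.exp(new_log_probs - old_log_probs)
    # Clip the ratio so policy updates stay within the "trust region".
    clipped_ratio = np.clip(ratio, 1.0 - clip_epsilon, 1.0 + clip_epsilon)
    # Take the pessimistic (minimum) of the unclipped and clipped terms.
    return np.mean(np.minimum(ratio * advantages, clipped_ratio * advantages))

# Illustrative values for a small batch of actions.
new_lp = np.array([-0.9, -1.2, -0.4])
old_lp = np.array([-1.0, -1.0, -1.0])
adv = np.array([1.5, -0.5, 0.8])
print("objective to maximize:", ppo_clipped_objective(new_lp, old_lp, adv))
```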
Step 4: Policy Update
The agent updates its policy (behavior) based on the reward signal, using reinforcement learning algorithms (e.g., Q-learning, policy gradients).
Q-learning is a reinforcement learning algorithm that enables an agent to learn optimal action-selection policies in an environment. Here's how it works:
1. Q-Table Initialization
The algorithm starts by creating a Q-table, which is a matrix where rows represent states and columns represent actions. All Q-values are initially set to zero or random small values.
2. Exploration and Exploitation
The agent interacts with the environment, balancing between exploring new actions and exploiting known good actions, often using an epsilon-greedy strategy.
3. Action Selection
In each state, the agent selects an action, either randomly (exploration) or based on the highest Q-value for that state (exploitation).
4. Reward Observation
After taking an action, the agent observes the reward received and the new state it has transitioned to.
5. Q-Value Update
The Q-value for the state-action pair is updated using the Q-learning formula:
Q(s,a) = Q(s,a) + α * [R + γ * max(Q(s',a')) - Q(s,a)]
Where:
- Q(s,a) is the current Q-value
- α is the learning rate
- R is the reward received
- γ is the discount factor
- max(Q(s',a')) is the maximum Q-value for the next state
6. Iteration
Steps 3-5 are repeated for many episodes, allowing the agent to learn from various experiences.
7. Convergence
Over time, the Q-values converge to optimal values, representing the expected cumulative reward for each action in each state.
8. Policy Extraction
Once training is complete, the optimal policy can be extracted by selecting the action with the highest Q-value for each state.
Q-learning is model-free (doesn't require knowledge of the environment's dynamics) and off-policy (can learn from actions not in the current policy). It effectively learns to make optimal decisions by iteratively improving its estimates of action values based on the rewards received and the structure of the environment.
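A compact tabular Q-learning sketch following steps 1-8 above, using a toy 5-state corridor environment; the environment, rewards, and hyperparameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corridor: states 0..4, actions 0 = left, 1 = right; reaching state 4 gives reward +1.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate

# 1. Q-table initialization.
Q = np.zeros((n_states, n_actions))

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

# 6. Iteration over many episodes.
for episode in range(500):
    state, done = 0, False
    while not done:
        # 2-3. Epsilon-greedy action selection (exploration vs. exploitation).
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        # 4. Observe the reward and the next state.
        next_state, reward, done = step(state, action)
        # 5. Q-value update: Q(s,a) += alpha * [R + gamma * max Q(s',a') - Q(s,a)].
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

# 8. Policy extraction: the best action in each state.
print("learned policy (0 = left, 1 = right):", np.argmax(Q, axis=1))
```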
Model Deployment
Model software deployment typically involves: