Amazon Redshift ML lets data analysts, developers, and data scientists train machine learning (ML) models using SQL. In previous posts, we demonstrated how you can use the automatic model training capability of Redshift ML to train classification and regression models. Redshift ML lets you create a model using SQL and specify your algorithm, such as XGBoost. You can use Redshift ML to automate data preparation, preprocessing, and selection of your problem type (for more information, refer to Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML). You can also bring a model previously trained in Amazon SageMaker into Amazon Redshift through Redshift ML for local inference. For local inference on models created in SageMaker, the ML model type must be supported by Redshift ML. However, remote inference is available for model types that aren't natively available in Redshift ML.
Over time, ML models grow old, and even if nothing drastic happens, small changes accumulate. Common reasons why ML models need to be retrained or audited include:
- Data drift – Because your data has changed over time, the prediction accuracy of your ML models may begin to decrease compared to the accuracy exhibited during testing
- Concept drift – The ML algorithm that was initially used may need to be changed due to different business environments and other changing needs
You may need to refresh the model on a regular basis, automate the process, and reevaluate the accuracy of the refreshed model. As of this writing, Amazon Redshift doesn't support versioning of ML models. In this post, we show how you can use the bring your own model (BYOM) functionality of Redshift ML to implement versioning of Redshift ML models.
We use local inference to implement model versioning as part of operationalizing ML models. We assume that you have a good understanding of your data and the problem type that's most applicable to your use case, and have created and deployed models to production.
Solution overview
In this post, we use Redshift ML to build a regression model that predicts the number of people that may use the city of Toronto's bike sharing service at any given hour of a day. The model accounts for various aspects, including holidays and weather conditions, and because we need to predict a numerical outcome, we use a regression model. We use data drift as a reason for retraining the model, and use model versioning as part of the solution.
After a model is validated and is being used on a regular basis for running predictions, you can create versions of the model, which requires you to retrain the model using an updated training set and possibly a different algorithm. Versioning serves two main purposes:
- You can refer to prior versions of a model for troubleshooting or audit purposes. This enables you to ensure that your model still retains high accuracy before switching to a newer model version.
- You can continue to run inference queries on the current version of a model during the model training process of the new version.
At the time of this writing, Redshift ML doesn't have native versioning capabilities, but you can still achieve versioning by implementing a few simple SQL techniques using the BYOM capability. BYOM was introduced to support pre-trained SageMaker models to run your inference queries in Amazon Redshift. In this post, we use the same BYOM technique to create a version of an existing model built using Redshift ML.
The following figure illustrates this workflow.
In the following sections, we show how you can create a version from an existing model and then perform model retraining.
Prerequisites
As a prerequisite for implementing the example in this post, you need to set up a Redshift cluster or Amazon Redshift Serverless endpoint. For the preliminary steps to get started and set up your environment, refer to Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML.
We use the regression model created in the post Build regression models with Amazon Redshift ML. We assume that it has already been deployed, and we use this model to create new versions and retrain the model.
Create a version from the existing model
The first step is to create a version of the existing model (which means saving developmental changes of the model) so that a history is maintained and the model is available for comparison later on.
The following code is the generic format of the CREATE MODEL command syntax; in the next step, you get the information needed to use this command to create a new version:
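A minimal sketch of that generic BYOM form, following the CREATE MODEL syntax in the Amazon Redshift documentation. The names model_name and function_name and all angle-bracket values are placeholders, not values from this post:

```sql
-- Generic BYOM form of CREATE MODEL for local inference.
-- Every name and angle-bracket value below is a placeholder.
CREATE MODEL model_name
FROM '<sagemaker-job-name>'                    -- or an S3 path to the model artifacts
FUNCTION function_name (<data_type> [, ...])   -- input data types, in order
RETURNS <data_type>                            -- data type of the prediction
IAM_ROLE default                               -- or an IAM role ARN
SETTINGS (
  S3_BUCKET '<your-s3-bucket>'                 -- required
  -- KMS_KEY_ID '<kms-key-id>' can be added as an optional setting
);
```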
Next, we collect the input parameters and apply them to the preceding CREATE MODEL code. We need the job name and the data types of the model input and output values. We collect these by running the show model command on our existing model. Run the following command in Amazon Redshift Query Editor v2:
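Assuming the deployed model is named predict_rental_count, the name used later in this post, the command would look like this:

```sql
-- Display the metadata of the deployed model, including the AutoML job name
-- and the function parameter types needed for the BYOM statement.
SHOW MODEL predict_rental_count;
```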
Note the values for AutoML Job Name, Function Parameter Types, and the Target Column (trip_count) from the model output. We use these values in the CREATE MODEL command to create the version.
The following CREATE MODEL statement creates a version of the current model using the values collected from our show model command. We append the date (the example format is YYYYMMDD) to the end of the model and function names to track when this new version was created.
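As a sketch of that statement, assuming the version is created on 2023-07-06 (matching the predict_rental_count_20230706 name used later in this post). The job name, parameter types, return type, and bucket are placeholders to fill in from your own show model output, not values taken from this post:

```sql
-- Create a dated version of the existing model through BYOM local inference.
-- Replace each angle-bracket placeholder with the value from SHOW MODEL.
CREATE MODEL predict_rental_count_20230706
FROM '<AutoML Job Name from SHOW MODEL>'
FUNCTION predict_rental_count_20230706 (<Function Parameter Types from SHOW MODEL>)
RETURNS <data type of the trip_count prediction>
IAM_ROLE default
SETTINGS (
  S3_BUCKET '<your-s3-bucket>'
);
```

Because the version points at the same AutoML job as the original, no retraining takes place; the statement only registers a second inference function over the same trained artifacts.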
This command may take a few minutes to complete. When it's complete, run the following command:
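Assuming the version was named predict_rental_count_20230706 as in the example naming scheme, the check would be:

```sql
-- Inspect the metadata of the newly created version.
SHOW MODEL predict_rental_count_20230706;
```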
We can observe the following in the output:
- AutoML Job Name is the same as the original version of the model
- Function Name shows the new name, as expected
- Inference Type shows Local, which designates that this is BYOM with local inference
You can run inference queries using both versions of the model to validate the inference outputs.
The following screenshot shows the output of the model inference using the original version.
The following screenshot shows the output of model inference using the version copy.
As you can see, the inference outputs are the same.
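One way to sketch this side-by-side check in a single query is shown below. The table name validation_data and the `<input columns>` list are hypothetical placeholders; substitute the table and feature columns your model was trained on:

```sql
-- Compare predictions from the original model and its version copy.
-- validation_data and <input columns> are hypothetical placeholders.
SELECT
    predict_rental_count(<input columns>)          AS original_prediction,
    predict_rental_count_20230706(<input columns>) AS version_prediction
FROM validation_data
LIMIT 5;
```

Both functions are backed by the same AutoML job, so the two columns are expected to match row for row.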
You have now learned how to create a version of a previously trained Redshift ML model.
Retrain your Redshift ML model
After you create a version of an existing model, you can retrain the existing model by simply creating a new model.
You can create and train a new model using the same CREATE MODEL command but with different input parameters, datasets, or problem types as applicable. For this post, we retrain the model on newer datasets. We append _new to the model name so it's similar to the existing model for identification purposes.
In the following code, we use the CREATE MODEL command with a new dataset available in the training_data table:
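A minimal sketch of such a retraining statement, assuming the refreshed data is already loaded into training_data and the target column is trip_count. The IAM role choice, the MSE objective, and the bucket placeholder are assumptions rather than settings confirmed by this post:

```sql
-- Train a new regression model on the refreshed training_data table.
-- IAM_ROLE default, OBJECTIVE 'MSE', and the bucket name are assumptions.
CREATE MODEL predict_rental_count_new
FROM training_data
TARGET trip_count
FUNCTION predict_rental_count_new
IAM_ROLE default
PROBLEM_TYPE REGRESSION
OBJECTIVE 'MSE'
SETTINGS (
  S3_BUCKET '<your-s3-bucket>'
);
```

Unlike the BYOM statement, this form starts a new training job, so the existing predict_rental_count function stays available for inference while training runs.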
Run the following command to check the status of the new model:
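With the model name used in this post, the status check would be:

```sql
-- Check the training status and metadata of the retrained model.
SHOW MODEL predict_rental_count_new;
```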
Replace the existing Redshift ML model with the retrained model
The last step is to replace the existing model with the retrained model. We do this by dropping the original version of the model and recreating a model using the BYOM technique.
First, check your retrained model to ensure the MSE/RMSE scores stay stable between model training runs. To validate the models, you can run inferences with each of the model functions on your dataset and compare the results. We use the inference queries provided in Build regression models with Amazon Redshift ML.
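As an illustration of this comparison (not the queries from the referenced post), the RMSE of both models can be computed over the same labeled dataset. The table name validation_data and the `<input columns>` list are hypothetical placeholders:

```sql
-- Compute RMSE for the current and retrained models on the same rows.
-- validation_data and <input columns> are hypothetical placeholders.
SELECT
    SQRT(AVG(POWER(trip_count - predict_rental_count(<input columns>), 2)))     AS rmse_current,
    SQRT(AVG(POWER(trip_count - predict_rental_count_new(<input columns>), 2))) AS rmse_retrained
FROM validation_data;
```

Evaluating both functions against identical rows keeps the two scores directly comparable.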
After validation, you can replace your model.
Start by gathering the details of the predict_rental_count_new model with the show model command.
Note the AutoML Job Name value, the Function Parameter Types values, and the Target Column name in the model output.
Replace the original model by dropping it and then creating the model with the original model and function names, so that existing references to the model and function names continue to work:
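A sketch of the replacement, assuming the values come from the show model output of predict_rental_count_new. The job name, parameter types, return type, and bucket are placeholders, not values from this post:

```sql
-- Remove the original model and its inference function.
DROP MODEL predict_rental_count;

-- Recreate it under the original names, pointing at the retrained job (BYOM).
-- Replace each angle-bracket placeholder with the value from SHOW MODEL.
CREATE MODEL predict_rental_count
FROM '<AutoML Job Name of predict_rental_count_new>'
FUNCTION predict_rental_count (<Function Parameter Types from SHOW MODEL>)
RETURNS <data type of the trip_count prediction>
IAM_ROLE default
SETTINGS (
  S3_BUCKET '<your-s3-bucket>'
);
```

Between the DROP MODEL and the completion of CREATE MODEL, the predict_rental_count function does not exist, so it may be worth scheduling the swap for a quiet period; the dated version remains available throughout.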
The model creation should complete in a few minutes. You can check the status of the model by running the following command:
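Because the replacement reuses the original name, the check would be:

```sql
-- Check whether the recreated model is ready for inference.
SHOW MODEL predict_rental_count;
```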
When the model status is ready, the newer version predict_rental_count of your existing model is available for inference and the original version of the ML model predict_rental_count_20230706 is available for reference if needed.
Refer to this GitHub repository for sample scripts to automate model versioning.
Conclusion
In this post, we showed how you can use the BYOM feature of Redshift ML to do model versioning. This lets you keep a history of your models so that you can compare model scores over time, respond to audit requests, and run inferences while training a new model.
For more information about building different models with Redshift ML, refer to Amazon Redshift ML.
About the Authors
Rohit Bansal is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and works with customers to build next-generation analytics solutions using other AWS Analytics services.
Phil Bates is a Senior Analytics Specialist Solutions Architect at AWS. He has more than 25 years of experience implementing large-scale data warehouse solutions. He is passionate about helping customers through their cloud journey and using the power of ML within their data warehouse.