{"id":29,"date":"2022-05-19T10:00:00","date_gmt":"2022-05-19T10:00:00","guid":{"rendered":"https:\/\/quadbase.com\/blog\/?p=29"},"modified":"2023-06-21T20:10:41","modified_gmt":"2023-06-21T20:10:41","slug":"an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models","status":"publish","type":"post","link":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/","title":{"rendered":"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">One big challenge facing casualty insurance companies is that they tend to be overwhelmed by the number of claims. To streamline the workload, it would help if claims could be prioritized in some way. In the case of automobile accidents, a claims adjuster would probably like to know the potential loss based on the make and model of the vehicles and prioritize cases on that metric.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this post, we are going to use a data set from UCI. The main goal is to illustrate a powerful feature of Python&#8217;s Scikit-Learn library called a pipeline, which lets you transform data on the fly while training your machine learning model. The term pipeline is used extensively in machine learning and may mean very different things elsewhere. In this context, we&#8217;re using it to refer to Pipeline objects in Scikit-Learn.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The data set and pertinent information can be found at the link below and will not be repeated here. 
Please read the description of the data set and refer to it when necessary.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/automobile\">https:\/\/archive.ics.uci.edu\/ml\/datasets\/automobile<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The original data file, imports-85.data, is comma delimited, so column headings were added and the extension was changed to csv. Rows with missing values in the \u201cnormalized-losses\u201d column were removed, and the file was renamed to AutoInsuranceClaimNoMissingLoss.csv for training the model.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Import the modules we'll need for this exercise\nimport pandas as pd\nfrom sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor\nfrom sklearn.metrics import mean_squared_error, r2_score\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n# Load the training data set and preview the first rows\nclaim_data = pd.read_csv(&quot;c:\/doc\/AutoInsuranceClaimNoMissingLoss - 3_31_22.csv&quot;)\nclaim_data.head()<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"995\" height=\"154\" src=\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png\" alt=\"\" class=\"wp-image-51\" 
srcset=\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png 995w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1-300x46.png 300w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1-768x119.png 768w\" sizes=\"auto, (max-width: 995px) 100vw, 995px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Examine the data types in the data frame.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:&quot;file&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;base16-light&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;Data Frame&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">claim_data.dtypes\nsymboling              int64\nnormalized.losses      int64\nmake                  object\nfuel.type             object\naspiration            object\nnum.of.doors          object\nbody.style            object\ndrive.wheels          object\nengine.location       object\nwheel.base           float64\nlength               float64\nwidth                float64\nheight               float64\ncurb.weight            int64\nengine.type           object\nnum.of.cylinders      object\nengine.size            int64\nfuel.system           object\nbore                 float64\nstroke               float64\ncompression.ratio    float64\nhorsepower             int64\npeak.rpm               int64\ncity.mpg               int64\nhighway.mpg            int64\nprice                  int64\ndtype: object\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Identify the numerical features and categorical features 
and look at the statistics of the numerical features.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Column names grouped by type; normalized.losses is the label\nnumeric_features = ['symboling','wheel.base','length','width','height','curb.weight','engine.size','bore','stroke','compression.ratio','horsepower','peak.rpm','city.mpg','highway.mpg','price']\ncategorical_features = ['make','fuel.type','aspiration','num.of.doors','body.style','drive.wheels','engine.location','engine.type','num.of.cylinders','fuel.system']\n\n# Summary statistics for the numeric features and the label\nclaim_data[numeric_features + ['normalized.losses']].describe()\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1007\" height=\"258\" src=\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic3.png\" alt=\"\" class=\"wp-image-52\" srcset=\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic3.png 1007w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic3-300x77.png 300w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic3-768x197.png 768w\" sizes=\"auto, (max-width: 1007px) 100vw, 1007px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Notice that the bore and stroke columns contain 0\u2019s. 
According to the data set description, bore should have values from 2.54 to 3.94 and stroke should have values from 2.07 to 4.17. We will replace each 0 with the mean of the feature&#8217;s nonzero values.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Compute each mean from the nonzero rows only, then substitute it for the zeros\nbore_mean = claim_data[claim_data[&quot;bore&quot;] != 0][&quot;bore&quot;].mean()\nclaim_data['bore'] = np.where(claim_data['bore'].eq(0), bore_mean, claim_data['bore'])\nstroke_mean = claim_data[claim_data[&quot;stroke&quot;] != 0][&quot;stroke&quot;].mean()\nclaim_data['stroke'] = np.where(claim_data['stroke'].eq(0), stroke_mean, claim_data['stroke'])\nclaim_data[numeric_features + ['normalized.losses']].describe()\n<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1012\" height=\"257\" src=\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic8.png\" alt=\"\" class=\"wp-image-58\" srcset=\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic8.png 1012w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic8-300x76.png 300w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic8-768x195.png 768w\" sizes=\"auto, (max-width: 1012px) 100vw, 1012px\" \/><\/figure>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre 
class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Separate features and labels\n# After separating the dataset, we now have numpy arrays named **X** containing the features, and **y** containing the labels.\n\nX, y = claim_data[numeric_features + categorical_features ].values, claim_data['normalized.losses'].values\n\n# Split data 70%-30% into training set and test set\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)\n\nprint ('Training Set: %d rows\\nTest Set: %d rows' % (X_train.shape[0], X_test.shape[0]))\n<\/pre><\/div>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;base16-light&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">Training Set: 114 rows\nTest Set: 50 rows<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">So far, we are looking at the data that is virtually loaded straight from a source file with only a little preprocessing.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In 
practice, it&#8217;s common to perform much more preprocessing of the data to make it easier for the algorithm to fit a model to it. There&#8217;s a huge range of preprocessing transformations you can perform to get your data ready for modeling. In fact, according to surveys, data scientists spend about 80% of their time organizing data, engineering features and applying preprocessing transformations. We&#8217;ll limit ourselves to a few common techniques for this short demo of how a pipeline works.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scaling numeric features<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Normalizing numeric features so they&#8217;re on the same scale is important. It prevents features with large values from producing coefficients that disproportionately affect the predictions. Putting all features on the same scale also makes it easier for many algorithms to weigh them against one another.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There are multiple ways to scale numeric data. Min-max scaling calculates the minimum and maximum values for each column and assigns a proportional value between 0 and 1. Standardization subtracts the column mean and divides by the standard deviation, which keeps the same spread of values on a different scale; this is what StandardScaler does.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">More information can be found at the link below.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.kdnuggets.com\/2020\/09\/feature-engineering-numerical-data.html\">https:\/\/www.kdnuggets.com\/2020\/09\/feature-engineering-numerical-data.html<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Encoding categorical variables<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Many machine learning models do not work with text values. Therefore, you generally need to convert categorical features into numeric representations. 
There are many ways to encode text values to numerical values, such as ordinal encoding which substitutes a unique integer value for each category, and one hot encoding that creates individual binary (0 or 1) features for each possible category value.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can learn more about it in the following link.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.kdnuggets.com\/2019\/07\/categorical-features-machine-learning.html\">https:\/\/www.kdnuggets.com\/2019\/07\/categorical-features-machine-learning.html<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To apply these preprocessing transformations to the insurance claims data, we&#8217;ll make use of a Scikit-Learn feature called pipelines. These enable us to define a set of preprocessing steps that end with an algorithm. You can then fit the entire pipeline to the data, so that the model encapsulates all of the preprocessing steps as well as the regression algorithm. This is useful, because when we want to use the model to predict values from new data, we need to apply the same transformations (based on the same statistical distributions and category encodings used with the training data).<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Train the model\nfrom sklearn.compose import ColumnTransformer\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.impute import 
SimpleImputer\nfrom sklearn.preprocessing import StandardScaler, OneHotEncoder\n\n# Regression algorithms\nfrom sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor\n\nimport numpy as np\n\n# Define preprocessing for numeric columns (scale them)\n# Note: as written, this list repeats column 1 and omits column 10 (horsepower).\n# Index 14 (price) lands in the categorical list below and column 24 (fuel.system)\n# is not selected. The outputs shown in this post were produced with these lists.\nnumeric_features = [0,1,2,3,4,5,6,7,8,9,1,11,12,13]\nnumeric_transformer = Pipeline(steps=[\n    ('scaler', StandardScaler())])\n\n# Define preprocessing for categorical features (encode them)\ncategorical_features = [14,15,16,17,18,19,20,21,22,23]\ncategorical_transformer = Pipeline(steps=[\n    ('onehot', OneHotEncoder(handle_unknown='ignore'))])\n\n# Combine preprocessing steps\npreprocessor = ColumnTransformer(\n    transformers=[\n        ('num', numeric_transformer, numeric_features),\n        ('cat', categorical_transformer, categorical_features)])\n\n# Create preprocessing and training pipeline\npipeline = Pipeline(steps=[('preprocessor', preprocessor),\n                           ('regressor', GradientBoostingRegressor())])\n\n\n# Fit the pipeline to train a gradient boosting model on the training set\nmodel = pipeline.fit(X_train, y_train)\nprint(model)\n\n<\/pre><\/div>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;base16-light&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">Pipeline(steps=[('preprocessor',\n                 ColumnTransformer(transformers=[('num',\n                                                  Pipeline(steps=[('scaler',\n                      
                                             StandardScaler())]),\n                                                  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,\n                                                   1, 11, 12, 13]),\n                                                 ('cat',\n                                                  Pipeline(steps=[('onehot',\n                                                                   OneHotEncoder(handle_unknown='ignore'))]),\n                                                  [14, 15, 16, 17, 18, 19, 20,\n                                                   21, 22, 23])])),\n                ('regressor', GradientBoostingRegressor())])\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">The model is trained with GradientBoostingRegressor, including the preprocessing steps. The following code shows how it performs with the validation data.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Get predictions\npredictions = model.predict(X_test)\n\n# Display metrics\nmse = mean_squared_error(y_test, predictions)\nprint(&quot;MSE:&quot;, mse)\nrmse = np.sqrt(mse)\nprint(&quot;RMSE:&quot;, rmse)\nr2 = r2_score(y_test, predictions)\nprint(&quot;R2:&quot;, r2)\n\n# Plot predicted vs actual\nplt.scatter(y_test, predictions)\nplt.xlabel('Actual Labels')\nplt.ylabel('Predicted Labels')\nplt.title('Insurance Claim Predictions')\nz = np.polyfit(y_test, 
predictions, 1)\np = np.poly1d(z)\nplt.plot(y_test,p(y_test), color='magenta')\nplt.show()\n<\/pre><\/div>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;base16-light&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">MSE: 266.7675806643773\nRMSE: 16.3330211738177\nR2: 0.7351873720799795\n<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"390\" height=\"278\" src=\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic5.png\" alt=\"\" class=\"wp-image-53\" srcset=\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic5.png 390w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic5-300x214.png 300w\" sizes=\"auto, (max-width: 390px) 100vw, 390px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The final pipeline is composed of two pipelines that do the transformations (preprocessor) and the algorithm used to train the model. To try an alternative algorithm you can just change the final pipeline to include a different kind of estimator. 
The code example below swaps in RandomForestRegressor as the final step.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Use a different estimator in the pipeline\npipeline = Pipeline(steps=[('preprocessor', preprocessor),\n                           ('regressor', RandomForestRegressor())])\n\n\n# Fit the pipeline to train a random forest model on the training set\nmodel = pipeline.fit(X_train, y_train)\nprint(model, &quot;\\n&quot;)\n\n# Get predictions\npredictions = model.predict(X_test)\n\n# Display metrics\nmse = mean_squared_error(y_test, predictions)\nprint(&quot;MSE:&quot;, mse)\nrmse = np.sqrt(mse)\nprint(&quot;RMSE:&quot;, rmse)\nr2 = r2_score(y_test, predictions)\nprint(&quot;R2:&quot;, r2)\n\n# Plot predicted vs actual, with a fitted trend line\nplt.scatter(y_test, predictions)\nplt.xlabel('Actual Labels')\nplt.ylabel('Predicted Labels')\nplt.title('Insurance Claim Predictions - Preprocessed')\nz = np.polyfit(y_test, predictions, 1)\np = np.poly1d(z)\nplt.plot(y_test,p(y_test), color='magenta')\nplt.show()\n<\/pre><\/div>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting=\"{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;base16-light&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">Pipeline(steps=[('preprocessor',\n                 ColumnTransformer(transformers=[('num',\n                                                  Pipeline(steps=[('scaler',\n                                                                   StandardScaler())]),\n                                                  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,\n                                                   1, 11, 12, 13]),\n                                                 ('cat',\n                                                  Pipeline(steps=[('onehot',\n                                                                   OneHotEncoder(handle_unknown='ignore'))]),\n                                                  [14, 15, 16, 17, 18, 19, 20,\n                                                   21, 22, 23])])),\n                ('regressor', RandomForestRegressor())]) \n\nMSE: 228.05054600000003\nRMSE: 15.101342523100389\nR2: 0.7736206767162103\n<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"390\" height=\"278\" src=\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic6.png\" alt=\"\" class=\"wp-image-54\" srcset=\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic6.png 390w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic6-300x214.png 300w\" sizes=\"auto, (max-width: 390px) 100vw, 390px\" \/><\/figure>\n\n\n\n<p 
class=\"wp-block-paragraph\">Now we have seen how to use a pipeline to transform data and train models. Can we also include hyperparameter tuning in the pipeline? The following code shows one way to do that, using a GridSearchCV object as the final step.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">from sklearn.model_selection import GridSearchCV\nfrom sklearn.metrics import make_scorer, r2_score\n\n# Use a Gradient Boosting algorithm\nalg = GradientBoostingRegressor()\n\n# Try these hyperparameter values\nparams = {\n    'learning_rate': [0.1, 0.3, 0.5, 0.8, 1.0],\n    'n_estimators': [50, 75, 100, 125, 150]\n}\n\n# Find the best hyperparameter combination to optimize the R2 metric\nscore = make_scorer(r2_score)\ngridsearch = GridSearchCV(alg, params, scoring=score, cv=3, return_train_score=True)\n# The search is not fitted here; it runs as the last step of the pipeline below\n\n# Define preprocessing for numeric columns (scale them)\nnumeric_features = [0,1,2,3,4,5,6,7,8,9,1,11,12,13]\nnumeric_transformer = Pipeline(steps=[\n    ('scaler', StandardScaler())])\n\n# Define preprocessing for categorical features (encode them)\ncategorical_features = [14,15,16,17,18,19,20,21,22,23]\ncategorical_transformer = Pipeline(steps=[\n    ('onehot', OneHotEncoder(handle_unknown='ignore'))])\n\n# Combine preprocessing steps\npreprocessor = ColumnTransformer(\n    transformers=[\n        ('num', numeric_transformer, numeric_features),\n        ('cat', 
categorical_transformer, categorical_features)])\n\n# Create preprocessing and training pipeline\npipeline = Pipeline(steps=[('preprocessor', preprocessor),\n                           ('regressor', gridsearch)])\n\n\n# Fit the pipeline: the grid search tunes and trains the gradient boosting model on the training set\nmodel = pipeline.fit(X_train, y_train)\nprint(model)\n<\/pre><\/div>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;base16-light&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">Pipeline(steps=[('preprocessor',\n                 ColumnTransformer(transformers=[('num',\n                                                  Pipeline(steps=[('scaler',\n                                                                   StandardScaler())]),\n                                                  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,\n                                                   1, 11, 12, 13]),\n                                                 ('cat',\n                                                  Pipeline(steps=[('onehot',\n                                                                   OneHotEncoder(handle_unknown='ignore'))]),\n                                                  [14, 15, 16, 17, 18, 19, 20,\n                                                   21, 22, 23])])),\n                ('regressor',\n                 GridSearchCV(cv=3, estimator=GradientBoostingRegressor(),\n                              param_grid={'learning_rate': [0.1, 0.3, 
0.5, 0.8,\n                                                            1.0],\n                                          'n_estimators': [50, 75, 100, 125,\n                                                           150]},\n                              return_train_score=True,\n                              scoring=make_scorer(r2_score)))]\n<\/pre><\/div>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:true,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\"># Get predictions\npredictions = model.predict(X_test)\n\n# Display metrics\nmse = mean_squared_error(y_test, predictions)\nprint(&quot;MSE:&quot;, mse)\nrmse = np.sqrt(mse)\nprint(&quot;RMSE:&quot;, rmse)\nr2 = r2_score(y_test, predictions)\nprint(&quot;R2:&quot;, r2)\n\n# Plot predicted vs actual\nplt.scatter(y_test, predictions)\nplt.xlabel('Actual Labels')\nplt.ylabel('Predicted Labels')\nplt.title('Insurance Claim Predictions')\nz = np.polyfit(y_test, predictions, 1)\np = np.poly1d(z)\nplt.plot(y_test,p(y_test), color='magenta')\nplt.show()\n<\/pre><\/div>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" 
data-setting=\"{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;base16-light&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">MSE: 292.23372823226\nRMSE: 17.094845077749607\nR2: 0.709907848070147\n<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"390\" height=\"278\" src=\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic9.png\" alt=\"\" class=\"wp-image-59\" srcset=\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic9.png 390w, https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic9-300x214.png 300w\" sizes=\"auto, (max-width: 390px) 100vw, 390px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">That concludes this introduction to pipelines in Scikit-Learn. We have shown code examples of how to build pipelines that transform data, train models and tune hyperparameters.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can download the notebook and the data set from the links below.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.quadbase.com\/upload\/Predict_Insurance_Claim.ipynb\">https:\/\/www.quadbase.com\/upload\/Predict_Insurance_Claim.ipynb<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.quadbase.com\/upload\/AutoInsuranceClaimNoMissingLoss_3_31_22.csv\">https:\/\/www.quadbase.com\/upload\/AutoInsuranceClaimNoMissingLoss_3_31_22.csv<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One big challenge that faces casualty insurance companies is that they 
tend to be overwhelmed by the amount of claims. To streamline the workload, it would be desirable if the &#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","footnotes":""},"categories":[7],"tags":[3,4],"class_list":["post-29","post","type-post","status-publish","format-standard","hentry","category-software","tag-linux","tag-windows"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models - Quadbase Systems Inc.<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models - Quadbase Systems Inc.\" \/>\n<meta property=\"og:description\" content=\"One big challenge that faces casualty insurance companies is that they tend to be overwhelmed by the amount of claims. 
To streamline the workload, it would be desirable if the ...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/\" \/>\n<meta property=\"og:site_name\" content=\"Quadbase Systems Inc.\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/QuadbaseSystemsInc\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-05-19T10:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-06-21T20:10:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"995\" \/>\n\t<meta property=\"og:image:height\" content=\"154\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"quadbase\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Quadbase\" \/>\n<meta name=\"twitter:site\" content=\"@Quadbase\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"quadbase\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/\"},\"author\":{\"name\":\"quadbase\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/#\/schema\/person\/547fc659bc4b72e45049ed279a4fadc8\"},\"headline\":\"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models\",\"datePublished\":\"2022-05-19T10:00:00+00:00\",\"dateModified\":\"2023-06-21T20:10:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/\"},\"wordCount\":883,\"publisher\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png\",\"keywords\":[\"Linux\",\"Windows\"],\"articleSection\":[\"Software\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/\",\"url\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/\",\"name\":\"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models - Quadbase Systems 
Inc.\",\"isPartOf\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png\",\"datePublished\":\"2022-05-19T10:00:00+00:00\",\"dateModified\":\"2023-06-21T20:10:41+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#primaryimage\",\"url\":\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png\",\"contentUrl\":\"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"HOME\",\"item\":\"https:\/\/www.quadbase.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/#website\",\"url\":\"https:\/\/www.quadbase.com\/blog\/\",\"name\":\"Quadbase Systems Inc.\",\"description\":\"Company blog about 
enterprise reporting, java charts, business intelligence.\",\"publisher\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.quadbase.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/#organization\",\"name\":\"Quadbase Systems Inc.\",\"url\":\"https:\/\/www.quadbase.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2023\/09\/Method-Draw-Image27.png\",\"contentUrl\":\"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2023\/09\/Method-Draw-Image27.png\",\"width\":199,\"height\":90,\"caption\":\"Quadbase Systems Inc.\"},\"image\":{\"@id\":\"https:\/\/www.quadbase.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/QuadbaseSystemsInc\/\",\"https:\/\/x.com\/Quadbase\",\"https:\/\/www.youtube.com\/user\/QuadbaseSystems\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.quadbase.com\/blog\/#\/schema\/person\/547fc659bc4b72e45049ed279a4fadc8\",\"name\":\"quadbase\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/14148ecbe810a872c3aa322d49f590829742ea41bc80e6bbd234b131fcfa0746?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/14148ecbe810a872c3aa322d49f590829742ea41bc80e6bbd234b131fcfa0746?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/14148ecbe810a872c3aa322d49f590829742ea41bc80e6bbd234b131fcfa0746?s=96&d=mm&r=g\",\"caption\":\"quadbase\"},\"url\":\"https:\/\/www.quadbase.com\/blog\/author\/quadbase\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models - Quadbase Systems Inc.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/","og_locale":"en_US","og_type":"article","og_title":"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models - Quadbase Systems Inc.","og_description":"One big challenge that faces casualty insurance companies is that they tend to be overwhelmed by the amount of claims. To streamline the workload, it would be desirable if the ...","og_url":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/","og_site_name":"Quadbase Systems Inc.","article_publisher":"https:\/\/www.facebook.com\/QuadbaseSystemsInc\/","article_published_time":"2022-05-19T10:00:00+00:00","article_modified_time":"2023-06-21T20:10:41+00:00","og_image":[{"width":995,"height":154,"url":"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png","type":"image\/png"}],"author":"quadbase","twitter_card":"summary_large_image","twitter_creator":"@Quadbase","twitter_site":"@Quadbase","twitter_misc":{"Written by":"quadbase","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#article","isPartOf":{"@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/"},"author":{"name":"quadbase","@id":"https:\/\/www.quadbase.com\/blog\/#\/schema\/person\/547fc659bc4b72e45049ed279a4fadc8"},"headline":"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models","datePublished":"2022-05-19T10:00:00+00:00","dateModified":"2023-06-21T20:10:41+00:00","mainEntityOfPage":{"@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/"},"wordCount":883,"publisher":{"@id":"https:\/\/www.quadbase.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#primaryimage"},"thumbnailUrl":"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png","keywords":["Linux","Windows"],"articleSection":["Software"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/","url":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/","name":"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models - Quadbase Systems 
Inc.","isPartOf":{"@id":"https:\/\/www.quadbase.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#primaryimage"},"image":{"@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#primaryimage"},"thumbnailUrl":"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png","datePublished":"2022-05-19T10:00:00+00:00","dateModified":"2023-06-21T20:10:41+00:00","breadcrumb":{"@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#primaryimage","url":"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png","contentUrl":"https:\/\/quadbase.com\/blog\/wp-content\/uploads\/2022\/05\/pic1-1.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.quadbase.com\/blog\/an-example-of-using-scikit-learns-pipeline-in-python-to-build-machine-learning-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"HOME","item":"https:\/\/www.quadbase.com\/blog\/"},{"@type":"ListItem","position":2,"name":"An example of using Scikit-Learn\u2019s pipeline in Python to build machine learning models"}]},{"@type":"WebSite","@id":"https:\/\/www.quadbase.com\/blog\/#website","url":"https:\/\/www.quadbase.com\/blog\/","name":"Quadbase Systems Inc.","description":"Company blog about enterprise reporting, java charts, business 
intelligence.","publisher":{"@id":"https:\/\/www.quadbase.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.quadbase.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.quadbase.com\/blog\/#organization","name":"Quadbase Systems Inc.","url":"https:\/\/www.quadbase.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.quadbase.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2023\/09\/Method-Draw-Image27.png","contentUrl":"https:\/\/www.quadbase.com\/blog\/wp-content\/uploads\/2023\/09\/Method-Draw-Image27.png","width":199,"height":90,"caption":"Quadbase Systems Inc."},"image":{"@id":"https:\/\/www.quadbase.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/QuadbaseSystemsInc\/","https:\/\/x.com\/Quadbase","https:\/\/www.youtube.com\/user\/QuadbaseSystems"]},{"@type":"Person","@id":"https:\/\/www.quadbase.com\/blog\/#\/schema\/person\/547fc659bc4b72e45049ed279a4fadc8","name":"quadbase","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/14148ecbe810a872c3aa322d49f590829742ea41bc80e6bbd234b131fcfa0746?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/14148ecbe810a872c3aa322d49f590829742ea41bc80e6bbd234b131fcfa0746?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/14148ecbe810a872c3aa322d49f590829742ea41bc80e6bbd234b131fcfa0746?s=96&d=mm&r=g","caption":"quadbase"},"url":"https:\/\/www.quadbase.com\/blog\/author\/quadbase\/"}]}},"_links":{"self":[{"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/posts\/29","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:
\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/comments?post=29"}],"version-history":[{"count":1,"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/posts\/29\/revisions"}],"predecessor-version":[{"id":465,"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/posts\/29\/revisions\/465"}],"wp:attachment":[{"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/media?parent=29"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/categories?post=29"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.quadbase.com\/blog\/wp-json\/wp\/v2\/tags?post=29"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}