Autogluon AWS AutoML: Intoduction & Tabular Prediction

Hi everyone, this is Esar from GeekFeed.

Amazon Web Services recently launched the new open source automated machine learning (AutoML) library called AutoGluon. It is easy-to-use and machine learning beginner friendly. Although AutoGluon has been around for almost a couple year, it is not very well known yet. This article will give you a brief introduction about this powerful tool and an easy guide to quickly get started with AutoGluon.

 

Introduction

AutoGluon is a relatively new open source AutoML library launched by Amazon Web Services (AWS) that automates machine learning (ML) and deep learning (DL) involving text, image and tabular datasets.

Designed to be easy-to-use and easy-to-extend, AutoGluon is suitable for both machine learning beginners and experts. Experienced or new to ML, you can develop and enhance a state-of-the-art ML model using a few lines of Python code with AutoGluon.

Why use AutoGluon

Traditionally, achieving state-of-the-art ML performance required extensive background knowledge such as data preparation, missing data handling and selection of hyperparameters – as one of the most difficult tasks.

With AutoGluon the hyperparameters of each task are automatically selected using advanced tuning algorithms. you don’t have to have any familiarity with the underlying models, as all hyperparameters will be automatically tuned within default ranges that are known to perform well for the particular task and model.

Predicting Column in a Table with Tabular Prediction

AutoGluon offers several application such as Image Prediction, Object Detection, and Text Prediction.

In this article, we will demonstrate how to simply get started with AutoGluon using Tabular Prediction.

Installation

AutoGluon requires Python version 3.6, 3.7 or 3.8.

Linux and Mac are the only operating systems fully supported for now (Windows version might be available soon). If you are a Windows user, for a quick start with AutoGluon, we recommend using Jupyter Notebook or Google Colab.

Import AutoGluon Classes and Load Train Dataset

Using a sample dataset from AWS, let’s use AutoGluon to produce a classification model that predicts whether or not a person’s income exceeds $50,000.

First, import AutoGluon’s TabularPredictor and TabularDataset classes then load training data from a CSV file into AutoGluon Dataset object. 

This time we load a sample dataset file provided by AutoGluon stored in AWS S3 bucket.

And let’s view 5 first rows of the dataset.

Note: this object is equivalent to a Pandas DataFrame and the same methods can be applied to both.

For simplicity, let’s specify the column’s name that we are about to train as label.

Train Column

With fit() call, AutoGluon can produce highly-accurate models to predict the values in one column of a data table based on the rest of the columns’ values. Now let’s specify a folder to store trained models, then use AutoGluon fit() to train multiple models in train_data.

Note: training time might vary depending on the dataset’s size and complexity.

Load Test Dataset

Next, load a separate test dataset for prediction.

Since we want to compare the original value and prediction value, we use the same dataset from AWS S3 bucket. But this time we drop/delete the class column to prove that AutoGluon is not cheating.

Now we should see the same dataset without the class column.

Predict Column

Use our trained model train_data to make predictions on the new dataset test_data , then evaluate them.

 

We can see the 5 first and last index prediction for class column and the evaluated accuracy is 0.8. 

Just for fun we can also take 5 first index and compare the original data with the prediction result as below.

Predict Column Probability

Instead of predicting the data, wan can output predicted probabilities of each row.

Regression (Predicting Numeric Table Columns)

To demonstrate that fit() can also automatically handle regression tasks, we now try to predict the numeric age variable in the same table based on the other features:

We again call fit(), imposing a time-limit this time (in seconds), and also demonstrate a shorthand method to evaluate the resulting model on the test data (which contain labels):

Note that we didn’t need to tell AutoGluon this is a regression problem, it automatically inferred this from the data and reported the appropriate performance metric (RMSE by default).

 

That’s how quickly and easily we can use AutoGluon to predict column in a table using Tabular Prediction.
For more detail about Tabular Prediction and other AutoGluon applications, please refer to AutoGluon Official Documentation Website.

 

この記事が気に入ったら
いいね ! しよう

Twitter で
The following two tabs change content below.
アバター
Esar Suwandi
Software engineer based in Tokyo focused on Web Development. 4x AWS Certified: CLF, SAA, SAP, DOP. Let's build something awesome! ― Web開発に特化したソフトウェアエンジニアです。 4x AWS認定者です。素晴らしいものを作りましょう!

【採用情報】一緒に働く仲間を募集しています

採用情報
ページトップへ