
Multi-Output Image Classification with TensorFlow

In this tutorial, you will discover:

  • How to create a multi-output model in Python using TensorFlow.
  • How to perform image classification with transfer learning using TensorFlow Hub.

The CelebFaces Attributes Dataset contains 202,599 face images of various celebrities, each with 40 binary attribute annotations. The task is to take one of these images as input and perform a binary classification for each of the attributes Male, Smiling and Young.

Getting Started #

We recommend that you run this code on Kaggle. It’s the simplest way to get started. Just open the CelebFaces Attributes Dataset page and click “New Notebook”.

We also recommend making sure you have a GPU available, as this speeds up training your deep learning model. On Kaggle, you can activate a GPU in your notebook's "Settings" section by choosing "GPU" as the Accelerator.

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt

print("TF version:", tf.__version__)
print("Hub version:", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

Get the Data #

In this section, we prepare our data and we define a few model parameters.

We are going to use a pretrained EfficientNet V2 from TensorFlow Hub, which is relatively lightweight. It expects an input size of 224 by 224 pixels, which we store in IMAGE_SIZE. The pretrained model is identified by MODEL_HANDLE.

You can check out the website of TensorFlow Hub to explore hundreds of pretrained models. Choosing a different model than the currently selected one may lead to better results but it may also be more computationally expensive.

ATTR_PATH = "/kaggle/input/celeba-dataset/list_attr_celeba.csv"  # contains the image attributes (Male, Smiling, Young, ...)
PARTITION_PATH = "/kaggle/input/celeba-dataset/list_eval_partition.csv"  # contains the recommended partitioning of images into training, validation and testing sets.
IMAGES_PATH = "/kaggle/input/celeba-dataset/img_align_celeba/img_align_celeba/"  # contains the .jpg images

BATCH_SIZE = 128  # try a smaller batch size if you have limited computational resources
MODEL_HANDLE = "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_ft1k_b0/feature_vector/2"  # we use this pretrained model to obtain a feature vector from each image
IMAGE_SIZE = (224, 224)  # input size required by our pretrained model

We merge the .csv files into a single dataframe containing all the required attributes.

df = pd.merge(pd.read_csv(PARTITION_PATH), pd.read_csv(ATTR_PATH), on="image_id")
df.head()

Next, we define some helper functions for preparing the data.

The dataframe originally contains the values -1 and 1 for each binary attribute. However, we would like to use a sigmoid activation function for the binary classifications. A sigmoid activation function always outputs a value between 0 and 1. Therefore, to get the values into the right range, we apply .replace(-1, 0) on each of the dataframe columns.

def preprocess_image(image):
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, IMAGE_SIZE)
    return image

def load_and_preprocess_image(path):
    image = tf.io.read_file(path)
    return preprocess_image(image)

def load_and_preprocess_from_path_label(path, male, smiling, young):
    images = load_and_preprocess_image(path)
    return images, male, smiling, young

def build_dataset_from_df(df):
    ds = tf.data.Dataset.from_tensor_slices((
        [IMAGES_PATH + image_id for image_id in df["image_id"]],
        list(df["Male"].replace(-1, 0)),
        list(df["Smiling"].replace(-1, 0)),
        list(df["Young"].replace(-1, 0))
    ))
    ds = ds.map(load_and_preprocess_from_path_label)
    ds = ds.shuffle(buffer_size=1000)
    ds = ds.repeat()
    ds = ds.batch(BATCH_SIZE)
    ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
    return ds
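
The remapping matters because a sigmoid output can never reach -1, so a target of -1 could never be matched. Here is a minimal plain-Python sketch of both ideas. The helper names sigmoid and remap_label are illustrative only and are not part of the tutorial code.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def remap_label(value):
    """Hypothetical helper mirroring .replace(-1, 0): maps -1 to 0 and keeps 1."""
    return 0 if value == -1 else value

raw_labels = [-1, 1, 1, -1]               # CelebA-style attribute values
labels = [remap_label(v) for v in raw_labels]
print(labels)                              # [0, 1, 1, 0]

scores = [sigmoid(z) for z in (-5.0, 0.0, 5.0)]
print([round(s, 4) for s in scores])       # [0.0067, 0.5, 0.9933]
```

After the remapping, every label lies in the same range as the values the model can actually output.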

Using the “partition” attribute, we can split our dataframe into a training set and a validation set and build TensorFlow datasets with our helper function.

Note that there is also a third partition (for the test set) but we are not going to use it for now.

train_df = df.loc[df["partition"] == 0]
# if you're in a rush, add: train_df = train_df.sample(n=5000)
train_ds = build_dataset_from_df(train_df)

val_df = df.loc[df["partition"] == 1]
# if you're in a rush, add: val_df = val_df.sample(n=1000)
val_ds = build_dataset_from_df(val_df)

Explore TensorFlow Dataset #

Let’s take a look at the dimensionality of the training dataset. Because the dataset is batched, train_ds.take(1) yields one batch of 128 samples (our batch size) rather than a single sample. You can see this in the first value of each shape. Each image is 224 by 224 pixels, and the last value, 3, indicates that each pixel holds RGB (red, green, blue) values. Male, Smiling and Young have no further dimensions because each label is a scalar.

for image, male, smiling, young in train_ds.take(1):
    print("Image shape: ", image.numpy().shape)
    print("Label gender: ", male.shape)
    print("Label smiling: ", smiling.shape)
    print("Label young: ", young.shape)

Output:

Image shape:  (128, 224, 224, 3)
Label gender:  (128,)
Label smiling:  (128,)
Label young:  (128,)
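
If the batch dimension seems surprising, the following plain-Python analogy may help. It does not use the tf.data API. The generator batch is a hypothetical stand-in for ds.batch(...), and islice(..., 1) plays the role of take(1): taking one element from a batched sequence yields one whole batch.

```python
from itertools import islice

def batch(items, batch_size):
    """Group consecutive items into lists of batch_size (the last may be shorter)."""
    items = list(items)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

samples = range(10)                    # ten individual "samples"
batches = batch(samples, batch_size=4)
first = list(islice(batches, 1))       # analogous to take(1) on a batched dataset

print(len(first), len(first[0]))       # 1 4
print(first[0])                        # [0, 1, 2, 3]
```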

Let’s take a look at some example images with their respective attributes.

image, male, smiling, young = next(iter(train_ds))
plt.figure(figsize=(10, 10))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image[i].numpy().astype("uint8"))
    s = f"Male: {male[i].numpy()}, Smiling: {smiling[i].numpy()}, Young: {young[i].numpy()}"
    plt.title(s)
    plt.axis("off")

Celebrities

Normalization and Data Augmentation #

Now, we add a normalization layer which scales each value of the input down to a range of 0 to 1. For this, we divide the values by 255.

This is also where we could add data augmentation (set do_data_augmentation = True if you want to try). Data augmentation can improve model performance but we deactivate it to save computational resources.

normalization_layer = tf.keras.layers.Rescaling(1. / 255)
preprocessing_model = tf.keras.Sequential([normalization_layer])
do_data_augmentation = False
if do_data_augmentation:
    preprocessing_model.add(tf.keras.layers.RandomRotation(0.2))
    preprocessing_model.add(tf.keras.layers.RandomTranslation(0, 0.2))
    preprocessing_model.add(tf.keras.layers.RandomTranslation(0.2, 0))
    preprocessing_model.add(tf.keras.layers.RandomZoom(0.2, 0.2))
    preprocessing_model.add(tf.keras.layers.RandomFlip(mode="horizontal"))
train_ds = train_ds.map(lambda images, male, smiling, young:
                        (preprocessing_model(images), (male, smiling, young)))

val_ds = val_ds.map(lambda images, male, smiling, young:
                    (normalization_layer(images), (male, smiling, young)))
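
To make the effect of the rescaling concrete, here is a small plain-Python sketch of the same arithmetic applied to a handful of 8-bit pixel intensities. The function rescale is an illustrative name only; in the tutorial this work is done by the Rescaling layer.

```python
SCALE = 1.0 / 255   # same factor as Rescaling(1. / 255)

def rescale(pixel_row):
    """Scale 8-bit intensities (0..255) into the range 0..1."""
    return [value * SCALE for value in pixel_row]

row = [0, 51, 128, 255]
scaled = rescale(row)
print([round(v, 3) for v in scaled])   # [0.0, 0.2, 0.502, 1.0]
```

Note that the validation set receives only this rescaling, while the optional augmentation layers are applied to the training set alone.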

Build the Model #

We use a pretrained model defined by MODEL_HANDLE to obtain a feature vector from each image and build a classification network which takes the feature vector as input.

Our task is multi-output classification, so for each of the outputs we create a Dense layer with a sigmoid activation function.

do_fine_tuning = False
inputs = tf.keras.Input(shape=IMAGE_SIZE + (3,))  # named "inputs" to avoid shadowing Python's built-in input()
x = hub.KerasLayer(MODEL_HANDLE, trainable=do_fine_tuning)(inputs)
x = tf.keras.layers.Dropout(rate=0.2)(x)
x = tf.keras.layers.Dense(128, activation="relu")(x)

out_male = tf.keras.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.l2(0.0001), activation="sigmoid", name='male')(x)
out_smiling = tf.keras.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.l2(0.0001), activation="sigmoid", name='smiling')(x)
out_young = tf.keras.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.l2(0.0001), activation="sigmoid", name='young')(x)

model = tf.keras.Model(inputs=inputs, outputs=[out_male, out_smiling, out_young])
model.summary()

Output:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
keras_layer (KerasLayer)        (None, 1280)         5919312     input_1[0][0]                    
__________________________________________________________________________________________________
dropout (Dropout)               (None, 1280)         0           keras_layer[0][0]                
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          163968      dropout[0][0]                    
__________________________________________________________________________________________________
male (Dense)                    (None, 1)            129         dense[0][0]                      
__________________________________________________________________________________________________
smiling (Dense)                 (None, 1)            129         dense[0][0]                      
__________________________________________________________________________________________________
young (Dense)                   (None, 1)            129         dense[0][0]                      
==================================================================================================
Total params: 6,083,667
Trainable params: 164,355
Non-trainable params: 5,919,312
__________________________________________________________________________________________________
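
The parameter counts in the summary can be checked by hand. A fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below redoes that arithmetic in plain Python. The helper dense_params is an illustrative name, and the backbone count is simply copied from the summary above.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a Dense layer: weights (n_inputs * n_units) plus biases (n_units)."""
    return n_inputs * n_units + n_units

hidden = dense_params(1280, 128)   # "dense" layer on the 1280-dimensional feature vector
head = dense_params(128, 1)        # each of "male", "smiling" and "young"
trainable = hidden + 3 * head
backbone = 5_919_312               # frozen KerasLayer, copied from the summary

print(hidden, head, trainable, trainable + backbone)
# 163968 129 164355 6083667
```

Only the layers we added are trainable. The pretrained feature extractor stays frozen as long as do_fine_tuning is False.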

For each of our outputs, we use binary crossentropy as the loss function and accuracy as the metric.

model.compile(
    loss = {
        "male": tf.keras.losses.BinaryCrossentropy(),
        "smiling": tf.keras.losses.BinaryCrossentropy(),
        "young": tf.keras.losses.BinaryCrossentropy()
    },
    metrics = {
        "male": 'accuracy',
        "smiling": 'accuracy',
        "young": 'accuracy'
    },
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
)
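
As a rough illustration of what is being minimized, the sketch below computes binary crossentropy for each output on a made-up batch of two images and adds the three values up. It is written in plain Python, the numbers are invented, and it ignores the L2 regularization terms. It also assumes the default behaviour when no loss weights are passed to compile, namely that all outputs contribute equally.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy over a batch; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Made-up batch of two images: (labels, predicted probabilities) per output.
toy_batch = {
    "male":    ([1, 0], [0.9, 0.2]),
    "smiling": ([1, 1], [0.6, 0.7]),
    "young":   ([0, 1], [0.4, 0.8]),
}

losses = {name: binary_crossentropy(t, p) for name, (t, p) in toy_batch.items()}
total_loss = sum(losses.values())   # equal weighting of the three outputs (assumed)

for name, value in losses.items():
    print(f"{name}_loss: {value:.4f}")
print(f"total: {total_loss:.4f}")
```

Confident, correct predictions (such as 0.9 for a true label of 1) contribute little to the loss, while hesitant ones (such as 0.6) contribute more.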

Train the Model #

Let’s start training!

steps_per_epoch = len(train_df) // BATCH_SIZE
validation_steps = len(val_df) // BATCH_SIZE
hist = model.fit(
    train_ds,
    epochs=3, steps_per_epoch=steps_per_epoch,
    validation_data=val_ds,
    validation_steps=validation_steps).history
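
Because our datasets repeat indefinitely (we called ds.repeat() in the helper function), Keras needs steps_per_epoch and validation_steps to know where an epoch ends. The sketch below works through the integer division. The partition sizes are an assumption based on the standard CelebA split; check len(train_df) and len(val_df) in your own notebook.

```python
BATCH_SIZE = 128
n_train = 162_770   # assumed size of partition 0 (training)
n_val = 19_867      # assumed size of partition 1 (validation)

steps_per_epoch = n_train // BATCH_SIZE
validation_steps = n_val // BATCH_SIZE
print(steps_per_epoch, validation_steps)        # 1271 155

# Floor division rounds down, so each epoch covers slightly fewer
# samples than the partition contains.
print(n_train - steps_per_epoch * BATCH_SIZE)   # 82
```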

Evaluate Accuracy #

We plot the loss and the accuracy to see how our model improved over the course of the epochs.

fig, ax = plt.subplots(3, 2, figsize=(15, 12))
for i, c in enumerate(["male", "smiling", "young"]):
    ax[i, 0].plot(hist[f"{c}_loss"], label="train")
    ax[i, 0].plot(hist[f"val_{c}_loss"], label="val")
    ax[i, 0].set_title(f"Loss ({c})")
    ax[i, 0].legend()
    ax[i, 1].plot(hist[f"{c}_accuracy"], label="train")
    ax[i, 1].plot(hist[f"val_{c}_accuracy"], label="val")
    ax[i, 1].set_title(f"Accuracy ({c})")
    ax[i, 1].legend()
plt.show()

Loss and Accuracy

Make Predictions #

Let’s test the model on an example image.

x, y = next(iter(val_ds))
image = x[0, :, :, :]
plt.imshow(image)
plt.axis('off')
plt.show()

prediction_scores = model.predict(np.expand_dims(image, axis=0))
for i, label in enumerate(["Male", "Smiling", "Young"]):
    pred = prediction_scores[i][0][0]
    print(f"{label}: actual {y[i][0].numpy()}, predicted {1 if pred > 0.5 else 0} ({pred:.4f})")

Example Prediction
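
The indexing prediction_scores[i][0][0] can look cryptic: a multi-output model returns one array per output, and each array has shape (1, 1) when we predict on a single image. The following plain-Python sketch mimics that layout with nested lists and invented scores. The helper decode_predictions is a hypothetical name, not a Keras function.

```python
def decode_predictions(prediction_scores, names, threshold=0.5):
    """Map per-output score arrays of shape (1, 1) to {name: (label, score)}."""
    decoded = {}
    for name, scores in zip(names, prediction_scores):
        score = float(scores[0][0])   # first (and only) image, single sigmoid unit
        decoded[name] = (1 if score > threshold else 0, score)
    return decoded

# Invented scores in the same nested layout as the prediction for one image.
example_scores = [[[0.9731]], [[0.1284]], [[0.5512]]]
result = decode_predictions(example_scores, ["Male", "Smiling", "Young"])

for name, (label, score) in result.items():
    print(f"{name}: predicted {label} ({score:.4f})")
# Male: predicted 1 (0.9731)
# Smiling: predicted 0 (0.1284)
# Young: predicted 1 (0.5512)
```

A threshold of 0.5 treats both classes symmetrically. Scores close to it, like the 0.5512 above, signal an uncertain prediction.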

Conclusion #

In this tutorial, you learned how to create a multi-output model in Python using TensorFlow and how to perform image classification with transfer learning using TensorFlow Hub.

If you would like to improve the accuracy of the model presented in this tutorial, here are a number of things you can try:

  • Apply Data Augmentation (set do_data_augmentation = True).
  • Adjust the number of epochs.
  • Try a different model architecture, e.g. use a different pretrained model.
  • Fine-tune the pretrained model (set do_fine_tuning = True).
  • Perform hyperparameter optimization and use the dataset’s third partition (the test set) for the final evaluation.

If you like this tutorial or if you have suggestions for improvements, please let me know! :)