Hoppa till innehåll

Neural stilöverföring i Python

Jag skapar en AI-konstnär som tillämpar neural stilöverföring i denna handledning, detta för att kunna skapa en ny bild utifrån en kombination av två bilder. Neural stilöverföring (NST) är en maskininlärningsalgoritm som adapterar en visuell stil till en annan bild eller video. NST används för att skapa konstgjorda konstverk genom att kombinera en innehållsbild och en stilreferensbild.

Neural stilöverföring introducerades år 2015 av Leon A. Gatys, Alexander S. Ecker och Matthias Bethge, algoritmen publicerades i A Neural Algorithm of Artistic Style. Författarna använde ett neuralt faltningsnätverk (CNN) med en VGG19-arkitektur, modellen hade förtränats på bilder från ImageNet-projektet.

Datauppsättning och bibliotek

Jag använder en förtränad VGG19-modell med vikter från ImageNet i denna handledning. Datauppsättningen består av ett fotografi och en stilreferensbild, bilderna visas nedan. Jag valde att använda 256×256 bilder för att få en kort träningstid. Jag använder följande bibliotek: os, time, argparse, numpy, keras and scipy.

Neural stilöverföring

Träning

Jag valde att sätta vikten till 30 % för innehållsbilden och vikten för stilreferensbilden till 70 %, målstorleken för den kombinerade bilden är 256 rader gånger 256 kolumner. Jag har kört koden i 200 iterationer (10, 10, 80, 100) och slutbilden visas till höger i bilden ovan. Resultatet från en körning visas nedanför koden.

# Import libraries
import os
import time
import argparse
import numpy as np
import keras
import keras.preprocessing
import scipy.optimize

# Evaluator class that makes it possible to compute loss and gradients in one pass
class Evaluator(object):

    # Initialize the class
    def __init__(self, rows:int, cols:int, outputs:[]):
        self.loss_value = None
        self.grads_values = None
        self.rows = rows
        self.cols = cols
        self.outputs = outputs

    # Calculate loss
    def loss(self, x):
        loss_value, grad_values = eval_loss_and_grads(x, self.rows, self.cols, self.outputs)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    # Calculate gradients
    def grads(self, x):
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

# The gram matrix of an image tensor (feature-wise outer product)
def gram_matrix(x):
    
    # Turn a nD tensor into a 2D tensor with same 0th dimension
    if keras.backend.image_data_format() == 'channels_first':
        features = keras.backend.batch_flatten(x)
    else:
        features = keras.backend.batch_flatten(keras.backend.permute_dimensions(x, (2, 0, 1)))

    # Return gram matrix
    return keras.backend.dot(features, keras.backend.transpose(features))

# Preprocess an image
def preprocess_image(path:str, rows:int, cols:int):

    # Load the image
    x = keras.preprocessing.image.load_img(path, target_size=(rows, cols))

    # Convert to array
    x = keras.preprocessing.image.img_to_array(x)
    x = np.expand_dims(x, axis=0)

    # Proprocess with a VGG19 model
    x = keras.applications.vgg19.preprocess_input(x)

    # Return the image
    return x

# Deprocess an image
def deprocess_image(x, rows:int, cols:int):

    # Reshape image
    if keras.backend.image_data_format() == 'channels_first':
        x = x.reshape((3, rows, cols))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((rows, cols, 3))

    # Remove zero-center by mean pixel
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    
    # Convert BGR to RGB
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')

    # Return the image
    return x

# Calculate style loss
def style_loss(style, combination, rows:int, cols:int):

    # Calculate input values
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = rows * cols

    # Return style loss
    return keras.backend.sum(keras.backend.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))

# Calculate content loss
def content_loss(base, combination):
    return keras.backend.sum(keras.backend.square(combination - base))

# Calculate total variation loss
def total_variation_loss(x, rows:int, cols:int):

    # Element-wize squaring
    if keras.backend.image_data_format() == 'channels_first':
        a = keras.backend.square(x[:, :, :rows - 1, :cols - 1] - x[:, :, 1:, :cols - 1])
        b = keras.backend.square(x[:, :, :rows - 1, :cols - 1] - x[:, :, :rows - 1, 1:])
    else:
        a = keras.backend.square(x[:, :rows - 1, :cols - 1, :] - x[:, 1:, :cols - 1, :])
        b = keras.backend.square(x[:, :rows - 1, :cols - 1, :] - x[:, :rows - 1, 1:, :])

    # Return the total loss
    return keras.backend.sum(keras.backend.pow(a + b, 1.25))

# Evaluate loss and grads
def eval_loss_and_grads(x, rows:int, cols:int, outputs:[]):

    # Reshape image
    if keras.backend.image_data_format() == 'channels_first':
        x = x.reshape((1, 3, rows, cols))
    else:
        x = x.reshape((1, rows, cols, 3))

    # Get loss value
    outs = outputs([x])
    loss_value = outs[0]

    # Get gradient values
    if len(outs[1:]) == 1:
        grad_values = outs[1].flatten().astype('float64')
    else:
        grad_values = np.array(outs[1:]).flatten().astype('float64')

    # Return loss and gradient values
    return loss_value, grad_values

# The main entry point for this module
def main():

    # Variables
    base_image_path = 'C:\\DATA\\Python-data\\neural-style-transfer\\images\\giana256x256.jpg'
    style_image_path = 'C:\\DATA\\Python-data\\neural-style-transfer\\styles\\abstract-asymmetry-brown-cement.jpg'
    output_image_path = 'C:\\DATA\\Python-data\\neural-style-transfer\\images\\giana-cement-style.jpg'
    total_variation_weight = 1.0
    style_weight = 0.7
    content_weight = 0.3
    iterations = 100

    # Get base image size and set target size
    width, height = keras.preprocessing.image.load_img(base_image_path).size
    rows = 256
    cols = int(width * rows / height)

    # Preprocess images
    base_image = keras.backend.variable(preprocess_image(base_image_path, rows, cols))
    style_image = keras.backend.variable(preprocess_image(style_image_path, rows, cols))
    output_image = None

    # The output_image will contain our generated image
    if keras.backend.image_data_format() == 'channels_first':
        output_image = keras.backend.placeholder((1, 3, rows, cols))
    else:
        output_image = keras.backend.placeholder((1, rows, cols, 3))

    # Combine 3 images into a single Keras tensor
    input_tensor = keras.backend.concatenate([base_image, style_image, output_image], axis=0)

    # Build the VGG19 network with 3 images as input
    model = keras.applications.vgg19.VGG19(input_tensor=input_tensor, weights='imagenet', include_top=False)
    print('VGG19-model has been loaded!')

    # Get the symbolic outputs of each layer (we gave them unique names)
    outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

    # Combine loss functions into a single scalar
    loss = keras.backend.variable(0.0)
    layer_features = outputs_dict['block5_conv2']
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_loss(base_image_features, combination_features)
    feature_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']

    # Loop layers and calculate loss
    for layer_name in feature_layers:
        layer_features = outputs_dict[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        sl = style_loss(style_reference_features, combination_features, rows, cols)
        loss = loss + (style_weight / len(feature_layers)) * sl

    # Get total loss
    loss = loss + total_variation_weight * total_variation_loss(output_image, rows, cols)

    # Get the gradients of the generated image
    grads = keras.backend.gradients(loss, output_image)

    # Get outputs
    outputs = [loss]
    if isinstance(grads, (list, tuple)):
        outputs += grads
    else:
        outputs.append(grads)

    # Create an evaluator
    evaluator = Evaluator(rows, cols, keras.backend.function([output_image], outputs))

    # Get input image
    if(os.path.isfile(output_image_path) == True):
        x = preprocess_image(output_image_path, rows, cols)
    else:
        x = preprocess_image(base_image_path, rows, cols)

    # Loop for a predefined number of iterations
    for i in range(iterations):

        # Print start
        print('Start of iteration', i + 1)

        # Get starting time
        start_time = time.time()

        # Run scipy-based optimization (L-BFGS)
        x, min_val, info = scipy.optimize.fmin_l_bfgs_b(evaluator.loss, x.flatten(), fprime=evaluator.grads, maxfun=20)

        # Print loss value
        print('Current loss value: ', min_val)
        
        # Deprocess image
        img = deprocess_image(x.copy(), rows, cols)

        # Save generated image
        keras.preprocessing.image.save_img(output_image_path, img)

        # Print iteration done
        print('Iteration {0} completed in {1} seconds'.format(i + 1, round(time.time() - start_time, 2)))

# Tell python to run main method
if __name__ == '__main__': main()
VGG19-model has been loaded!
Start of iteration 1
Current loss value:  297102530.0
Iteration 1 completed in 31.57 seconds
Start of iteration 2
Current loss value:  282029000.0
Iteration 2 completed in 30.82 seconds
Start of iteration 3
Current loss value:  278050500.0
Iteration 3 completed in 30.69 seconds
Start of iteration 4
Current loss value:  276365820.0
Iteration 4 completed in 30.83 seconds
Start of iteration 5
Current loss value:  275439400.0
Iteration 5 completed in 31.58 seconds
Start of iteration 6
Current loss value:  274867260.0
Iteration 6 completed in 31.47 seconds
Start of iteration 7
Current loss value:  274493700.0
Iteration 7 completed in 31.94 seconds
Start of iteration 8
Current loss value:  274209700.0
Iteration 8 completed in 32.48 seconds
Start of iteration 9
Current loss value:  273964220.0
Iteration 9 completed in 32.9 seconds
Start of iteration 10
Current loss value:  273742050.0
Iteration 10 completed in 32.52 seconds
Etiketter:

Lämna ett svar

Din e-postadress kommer inte publiceras. Obligatoriska fält är märkta *