FAQ

How can I run Keras on GPU?

Kashgari will use the GPU by default if one is available, but you need to set up the TensorFlow GPU environment first. You can check the GPU status with the code below:

import tensorflow as tf
print(tf.test.is_gpu_available())
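
On TensorFlow 2.1 and later, tf.test.is_gpu_available() is deprecated in favor of tf.config.list_physical_devices('GPU'). A minimal sketch of the newer check, assuming TensorFlow 2.1+; the helper name list_gpus is hypothetical and is not part of Kashgari or TensorFlow:

```python
def list_gpus():
    """Return the GPUs TensorFlow can see, or [] if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so no GPU can be used.
        return []
    # Assumes TensorFlow 2.1+, where this call replaces tf.test.is_gpu_available().
    return list(tf.config.list_physical_devices('GPU'))


gpus = list_gpus()
print("GPU available:", len(gpus) > 0)
```

An empty list means TensorFlow will run on the CPU only.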

See the official TensorFlow GPU documentation for setup details.

How to specify which CPU or GPU to use for training and prediction

TensorFlow lets you choose a specific device for training and prediction. First, list the devices available on your machine:

import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

# This prints the device list, for example:
"""[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 3933047686559574430
 physical_device_desc: "device: XLA_GPU device", name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 11330115994
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 16701328925727941592
 physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"]"""

# Then specify which device to use, for example the CPU for prediction

with tf.device('/device:CPU:0'):
    model.predict(x)

# Or use the second GPU for training
with tf.device('/device:GPU:1'):
    model.fit(train_x, train_y, valid_x, valid_y, epochs=1)
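
Requesting a device that is not present (for example GPU:1 on a single-GPU machine) can fail at run time, depending on TensorFlow's placement settings. A minimal sketch of a fallback, assuming you have already collected the device names from the list above; pick_device is a hypothetical helper, not a TensorFlow or Kashgari function:

```python
def pick_device(available, preferred, fallback='/device:CPU:0'):
    """Return `preferred` if it is among the available device names, else `fallback`."""
    return preferred if preferred in available else fallback


# Illustrative device names, e.g. [d.name for d in device_lib.list_local_devices()].
available = ['/device:CPU:0', '/device:GPU:0']

# Only one GPU is listed, so asking for the second GPU falls back to the CPU.
print(pick_device(available, '/device:GPU:1'))
print(pick_device(available, '/device:GPU:0'))
```

The returned string can then be passed to tf.device(...) in place of a hard-coded device name.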

How to save and resume training with the ModelCheckpoint callback

You can use tf.keras.callbacks.ModelCheckpoint to save the model during training.

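from tensorflow.keras.callbacks import ModelCheckpoint
from kashgari.tasks.classification import CNN_GRU_Model

# {epoch} and {acc} are filled in from the epoch number and the training logs.
filepath = "saved-model-{epoch:02d}-{acc:.2f}.hdf5"
checkpoint_callback = ModelCheckpoint(filepath,
                                      monitor='acc',
                                      verbose=1)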

model = CNN_GRU_Model()
model.fit(train_x,
          train_y,
          valid_x,
          valid_y,
          callbacks=[checkpoint_callback])
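
The placeholders in filepath are filled in with Python string formatting: Keras formats the template with the 1-based epoch number and the values from the training logs. A minimal sketch of that expansion in plain Python; checkpoint_name is a hypothetical helper and the log values are made up for illustration:

```python
filepath = "saved-model-{epoch:02d}-{acc:.2f}.hdf5"


def checkpoint_name(template, epoch_index, logs):
    # Keras counts epochs from 0 internally and formats the name with epoch + 1.
    return template.format(epoch=epoch_index + 1, **logs)


# Illustrative logs for the fifth epoch (index 4); real values come from training.
name = checkpoint_name(filepath, 4, {'acc': 0.9612, 'loss': 0.1201})
print(name)  # saved-model-05-0.96.hdf5
```

In this sketch, a placeholder that is missing from the logs (for example {acc} when the metric is reported under another name such as accuracy) raises a KeyError, so the placeholder must match a key that actually appears in the logs.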

ModelCheckpoint saves the model structure and weights to the target file, but fully restoring a Kashgari model also requires the token dict and label dict, so we also have to save the model with the model.save() function.

The full solution looks like this:

from tensorflow.keras.callbacks import ModelCheckpoint
from kashgari.tasks.classification import CNN_GRU_Model

filepath = "saved-model-{epoch:02d}-{acc:.2f}.hdf5"
checkpoint_callback = ModelCheckpoint(filepath,
                                      monitor='acc',
                                      verbose=1)

model = CNN_GRU_Model()

# This builds the token dict, label dict and model structure.
model.build_model(train_x, train_y, valid_x, valid_y)
# Save the full model info and initial weights to the full_model folder.
model.save('full_model')

# Start Training
model.fit(train_x,
          train_y,
          valid_x,
          valid_y,
          callbacks=[checkpoint_callback])


# Load Model
from kashgari.utils import load_model

# We only need the model structure and dicts
new_model = load_model('full_model', load_weights=False)
# Load weights from ModelCheckpoint
new_model.tf_model.load_weights('saved-model-05-0.96.hdf5')

# Resume training with the restored model
# Set {'initial_epoch': 5} to continue from epoch 6;
# without it, the epoch counter starts again from 1.
new_model.fit(train_x,
              train_y,
              valid_x,
              valid_y,
              callbacks=[checkpoint_callback],
              epochs=10,
              fit_kwargs={'initial_epoch': 5})
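
Because the epoch number is embedded in each checkpoint file name, the file to load and the value for initial_epoch can be derived from the saved files instead of being typed by hand. A minimal sketch, assuming the file name pattern used above; latest_checkpoint is a hypothetical helper and the file names are illustrative:

```python
import re

# Matches names produced by "saved-model-{epoch:02d}-{acc:.2f}.hdf5".
CHECKPOINT_RE = re.compile(r"saved-model-(\d+)-(\d+\.\d+)\.hdf5$")


def latest_checkpoint(filenames):
    """Return (filename, epoch) for the highest-numbered checkpoint, or None."""
    found = []
    for name in filenames:
        match = CHECKPOINT_RE.search(name)
        if match:
            found.append((int(match.group(1)), name))
    if not found:
        return None
    epoch, name = max(found)
    return name, epoch


# Illustrative file names; in practice pass os.listdir() of the checkpoint folder.
files = ['saved-model-03-0.91.hdf5', 'saved-model-05-0.96.hdf5', 'notes.txt']
result = latest_checkpoint(files)
print(result)  # ('saved-model-05-0.96.hdf5', 5)
```

The returned epoch is the number of completed epochs, so it can be passed directly as initial_epoch, and the returned file name is the one to give to load_weights.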