PatternsInitializerMaxCover
Keras initializer that uses a corpus of text to initialize the patterns as randomly chosen grams from the corpus, weighted by how frequently each gram occurs in the corpus.
tokenization_layer.PatternsInitilizerMaxCover(
    text_corpus, chars,
    gram_lens=[5, 6, 7, 8, 9, 10, 11, 12, 13],
    filter_over=1
)

Parameters

text_corpus: the text corpus from which gram patterns are sampled.
chars: string of the characters that make up the grams' one-hot encoding.
gram_lens: the gram lengths (in characters) to sample. Defaults to [5, 6, 7, 8, 9, 10, 11, 12, 13].
filter_over: defaults to 1.
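
To make the sampling scheme concrete, here is a minimal sketch of frequency-weighted gram sampling. `sample_grams` is an illustrative helper, not part of the library's API; the actual initializer additionally one-hot encodes and pads the sampled grams, as the example below shows.

import random
from collections import Counter

def sample_grams(corpus, gram_lens, num_grams):
    # Count every gram of each requested length in the corpus.
    counts = Counter(
        corpus[i:i + n]
        for n in gram_lens
        for i in range(len(corpus) - n + 1)
    )
    grams = list(counts)
    # Draw grams with probability proportional to their corpus frequency,
    # so more common grams are more likely to become patterns.
    return random.choices(grams, weights=[counts[g] for g in grams], k=num_grams)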
Example
import re
import nltk
import pandas as pd
import tokenization_layer
nltk.download("gutenberg")
from nltk.corpus import gutenberg
corpus = gutenberg.raw("austen-emma.txt")
# Remove arbitrary runs of "\n"s and " "s
corpus = re.sub(r"[\n ]+", " ", corpus.lower())
# Characters ordered by corpus frequency, with "<UNK>" appended
chars = "".join(pd.Series(list(corpus)).value_counts(sort=True).keys()) + "<UNK>"
init = tokenization_layer.PatternsInitilizerMaxCover(corpus, chars)
# Initialize patterns of shape `(num_chars, max_len, 1, num_neurons)`,
# where there are `num_neurons` patterns (one per neuron), each with a
# random length in characters (padded to `max_len`) and each character
# one-hot encoded over `num_chars` categories.
patterns = init((len(init.chars), max(init.gram_lens), 1, 200))
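
As a quick sanity check, the sampled patterns can be decoded back into text. This assumes the initializer's output converts to a NumPy array (true for NumPy arrays and TensorFlow tensors) and relies on the `init.chars` and `init.gram_lens` attributes used above.

import numpy as np

patterns = np.asarray(patterns)
assert patterns.shape == (len(init.chars), max(init.gram_lens), 1, 200)
# Decode neuron 0's pattern: the argmax over the one-hot (character)
# axis gives one character index per position. Positions that are pure
# padding may decode arbitrarily.
idx = patterns[:, :, 0, 0].argmax(axis=0)
print("".join(init.chars[i] for i in idx))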