Hello
I am trying to vectorize Wikipedia text loaded with `tfds.load`, following [this tutorial](https://www.tensorflow.org/text/tutorials/text_classification_rnn).
The tutorial uses the IMDB reviews dataset, and I was able to follow it successfully. However, I cannot get the same approach to work for the Wikipedia dataset. Apparently the two datasets differ in how their elements are structured.
I have tried the following:
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
# Load Wikipedia dataset from tfds
dataset, info = tfds.load("wikipedia/20230601.ab", with_info=True, split=tfds.Split.TRAIN)
print(type(dataset))
for i in dataset:
    print(i['text'].numpy().decode('utf-8'))
# Create a TextVectorization layer to convert text to vectors
vectorize_layer = TextVectorization(
    max_tokens=100,
    output_mode='int',
    output_sequence_length=50
)
# Adapt the vectorization layer to the dataset
vectorize_layer.adapt(dataset.map(lambda x: x['text']))
model = tf.keras.Sequential([
    vectorize_layer,
    tf.keras.layers.Embedding(input_dim=len(vectorize_layer.get_vocabulary()), output_dim=64, mask_zero=True),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
This much runs without a problem. But when I fit the model:
model.fit(dataset, epochs=5)
I get this error:
>TypeError: Expected string passed to parameter 'input' of op 'StringLower', got {'text': <tf.Tensor 'IteratorGetNext:0' shape=() dtype=string>, 'title': <tf.Tensor 'IteratorGetNext:1' shape=() dtype=string>} of type 'dict' instead. Error: Expected string, got <tf.Tensor 'IteratorGetNext:0' shape=() dtype=string> of type 'Tensor' instead.
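Reading the error, each element of the Wikipedia dataset appears to be a dict with `text` and `title` keys rather than a plain string, so one thing that might help is mapping every element to its `text` field before it reaches the vectorization layer. Below is a minimal sketch of that idea, assuming a recent TensorFlow where `tf.keras.layers.TextVectorization` is available. `toy_dataset` and `text_only` are hypothetical names, and the two sample rows are made up to stand in for the real split.

```python
import tensorflow as tf

# Hypothetical stand-in for the Wikipedia split: every element is a dict
# with 'text' and 'title' string tensors, matching the error message.
toy_dataset = tf.data.Dataset.from_tensor_slices({
    "text": ["the cat sat on the mat", "dogs chase the cat"],
    "title": ["Cat", "Dog"],
})

vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=100,
    output_mode="int",
    output_sequence_length=50,
)

# A dict element is passed to the mapped function as one argument,
# so the lambda takes a single parameter and returns only the text.
text_only = toy_dataset.map(lambda x: x["text"])
vectorize_layer.adapt(text_only.batch(2))

# Vectorize one batch to check that the layer now receives plain strings.
vectors = vectorize_layer(next(iter(text_only.batch(2))))
print(vectors.shape)  # (2, 50)
```

Even with this mapping, `model.fit` would still need `(text, label)` pairs. The elements shown in the error carry only `text` and `title`, so the labels for a binary classifier would have to come from somewhere else.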
What can I do?
Thanks