In this blog post, we will explore how to build a classifier for 3D editor commands using the BERT model. The classifier handles natural-language requests made within a 3D editor, such as "add a cube", "change the color to red", or "move the sphere to the right". Each command is assigned two labels, a general category and a specific one, based on the type of action requested and the specificity of the object involved.
By the end of this post, you'll understand how to set up, train, and evaluate a classifier using BERT that can understand these various commands and categorize them accordingly. This has numerous potential applications, such as providing improved user interaction within 3D design software, gaming environments, or even VR experiences.
To set up the project, we first installed the necessary libraries using pip: transformers, datasets, and accelerate. A second command upgrades accelerate and sentence-transformers.
!pip install -q transformers datasets accelerate
!pip install --upgrade accelerate sentence-transformers
We read the dataset file (commands paired with their general and specific labels) using pandas, and used LabelEncoder from scikit-learn to encode the general and specific labels as integers. We then split the data into training and validation sets, with 90% of the data for training and 10% for validation.
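The data preparation described above could look roughly like the following sketch. The column names `command`, `general`, and `specific`, the example rows, and the `random_state` value are assumptions for illustration; a small in-memory DataFrame stands in for the real dataset file, which is not shown in this post.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the real dataset file (column names are assumed)
df = pd.DataFrame({
    "command": [
        "add a cube", "add a sphere", "create a new cube",
        "move the sphere to the right", "move the cube to the left",
        "shift the cube left", "change the color to red", "make it red",
        "paint the cube blue", "put a sphere in the scene",
    ],
    "general": ["add", "add", "add", "move", "move",
                "move", "color", "color", "color", "add"],
    "specific": ["add_cube", "add_sphere", "add_cube", "move_right", "move_left",
                 "move_left", "color_red", "color_red", "color_blue", "add_sphere"],
})

# One encoder per label set, so each can be inverted independently later
le_general = LabelEncoder()
le_specific = LabelEncoder()
df["general_id"] = le_general.fit_transform(df["general"])
df["specific_id"] = le_specific.fit_transform(df["specific"])

# 90% training, 10% validation
train_df, val_df = train_test_split(df, test_size=0.1, random_state=42)
print(len(train_df), len(val_df))
```

Fitting the encoders on the full label columns before splitting keeps the integer ids consistent between the training and validation sets.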
from transformers import BertTokenizer, BertForSequenceClassification
# Load the pretrained tokenizer and one classifier per label set
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model_general = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=len(le_general.classes_))
model_specific = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=len(le_specific.classes_))
Subsequently, we tokenized the input data using the BERT tokenizer, taking care to pad and truncate the inputs to maintain a consistent sequence length. We wrapped the encoded inputs and corresponding labels into a CommandDataset object, preparing it for training.
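The `CommandDataset` class itself is not shown in this post. A minimal sketch of what such a wrapper might look like follows, assuming it is a standard PyTorch `Dataset` that returns one dictionary per example. The hand-written encodings (with made-up token ids) stand in for the output of a call like `tokenizer(texts, padding=True, truncation=True)`.

```python
import torch
from torch.utils.data import Dataset


class CommandDataset(Dataset):
    """Pairs tokenizer output with integer labels, one dict per example."""

    def __init__(self, encodings, labels):
        self.encodings = encodings  # dict-like: input_ids, attention_mask, ...
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Convert each field of the chosen example into a tensor
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Made-up, already padded encodings used only to exercise the class
example_encodings = {
    "input_ids": [[101, 5194, 102, 0], [101, 1815, 1103, 102]],
    "attention_mask": [[1, 1, 1, 0], [1, 1, 1, 1]],
}
example_dataset = CommandDataset(example_encodings, labels=[0, 1])
print(len(example_dataset), example_dataset[1]["labels"])
```

The key name `labels` matters: it is the argument name that `BertForSequenceClassification` uses to compute the loss during training.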
from transformers import TrainingArguments
# Training configuration shared by both models
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=30,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=100,
    weight_decay=0.01,
)
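The construction of the two `Trainer` objects is not shown in this post. One plausible way to create them is sketched below; the helper `build_trainer` and the names `train_dataset_general` and `train_dataset_specific` are hypothetical, chosen to mirror the validation dataset names used later.

```python
from transformers import Trainer


def build_trainer(model, training_args, train_dataset, val_dataset):
    """Bundle a model with its datasets and the training arguments."""
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
    )


# Intended usage, assuming the models, arguments, and datasets already exist:
# trainer_general = build_trainer(
#     model_general, training_args, train_dataset_general, val_dataset_general)
# trainer_specific = build_trainer(
#     model_specific, training_args, train_dataset_specific, val_dataset_specific)
```

If both trainers share the same `output_dir`, their checkpoints are written to the same folder, so giving each model its own `TrainingArguments` with a separate directory may be the safer choice.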
Finally, we trained the models by calling the train method on each Trainer object.
# Train the models
trainer_general.train()
trainer_specific.train()
# Evaluate each model on its own validation set
eval_result_general = trainer_general.evaluate(eval_dataset=val_dataset_general)
print(f"General model validation loss: {eval_result_general['eval_loss']}")
eval_result_specific = trainer_specific.evaluate(eval_dataset=val_dataset_specific)
print(f"Specific model validation loss: {eval_result_specific['eval_loss']}")
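Validation loss alone can be hard to interpret for a classifier. As a possible extension (not part of the original setup), an accuracy metric could be supplied through the `compute_metrics` argument of `Trainer`, in which case `evaluate()` would also report it as `eval_accuracy`. The sketch below uses made-up logits and labels to show the computation.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Return the fraction of examples whose top-scoring class is correct."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}


# Made-up values: three examples, two classes
example_logits = np.array([[2.0, 0.1], [0.3, 1.5], [1.2, 0.4]])
example_labels = np.array([0, 1, 1])
print(compute_metrics((example_logits, example_labels)))
```

Here two of the three predictions match their labels, so the reported accuracy is two thirds.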
import torch
# Function for making predictions
def predict_command(input_sentence, k=10):
    # Run on whichever device the models live on (assumes both share one device)
    device = model_general.device
    encoding = tokenizer(input_sentence, truncation=True, padding=True, return_tensors='pt').to(device)
    # Get predictions (raw logits) with dropout and gradient tracking disabled
    model_general.eval()
    model_specific.eval()
    with torch.no_grad():
        general_output = model_general(encoding['input_ids'], attention_mask=encoding['attention_mask'])[0]
        specific_output = model_specific(encoding['input_ids'], attention_mask=encoding['attention_mask'])[0]
    # Get top k values and indices; k cannot exceed the number of classes
    general_topk_values, general_topk_indices = general_output.topk(min(k, general_output.shape[-1]))
    specific_topk_values, specific_topk_indices = specific_output.topk(min(k, specific_output.shape[-1]))
    # Move tensors back to cpu for numpy operations
    general_topk_values = general_topk_values.detach().cpu().numpy().flatten()
    general_topk_indices = general_topk_indices.detach().cpu().numpy().flatten()
    specific_topk_values = specific_topk_values.detach().cpu().numpy().flatten()
    specific_topk_indices = specific_topk_indices.detach().cpu().numpy().flatten()
    # Use the inverse_transform method to get the original labels
    general_commands = le_general.inverse_transform(general_topk_indices)
    specific_commands = le_specific.inverse_transform(specific_topk_indices)
    # Combine commands with their scores
    general_predictions = [(command, float(score)) for command, score in zip(general_commands, general_topk_values)]
    specific_predictions = [(command, float(score)) for command, score in zip(specific_commands, specific_topk_values)]
    return general_predictions, specific_predictions
# Predict commands
general_predictions, specific_predictions = predict_command('move the cube to the left')
print(general_predictions)
print(specific_predictions)
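The scores returned by `predict_command` are raw logits, which are not bounded and do not sum to one. If probabilities are preferred, a softmax can be applied over all class logits before taking the top k (in PyTorch, for example, with `torch.softmax(general_output, dim=-1)`). The NumPy sketch below illustrates the conversion on made-up logits; the helper name `logits_to_probabilities` is hypothetical.

```python
import numpy as np


def logits_to_probabilities(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()  # subtract the max to avoid overflow
    exp = np.exp(shifted)
    return exp / exp.sum()


# Made-up logits for three classes
example_logits = [2.0, 1.0, 0.1]
probs = logits_to_probabilities(example_logits)
print(probs)
```

The ranking of the classes is unchanged by softmax, so the top k labels stay the same; only the scores become comparable across inputs.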
# After training the model, save it to the specified folder
model_general.save_pretrained("./models/model_general")
model_specific.save_pretrained("./models/model_specific")
# Save the tokenizer as well
tokenizer.save_pretrained("./models/tokenizer")
We also saved the LabelEncoders to disk using joblib, which enables us to transform between the original labels and their encoded values.
import os
import joblib
# Save the LabelEncoders (joblib.dump does not create missing directories)
os.makedirs('./labels', exist_ok=True)
joblib.dump(le_general, './labels/le_general.pkl')
joblib.dump(le_specific, './labels/le_specific.pkl')
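To use the saved artifacts later, the encoders can be read back with `joblib.load`, and the models and tokenizer with `from_pretrained` pointed at the saved folders. The sketch below shows only the encoder round trip, using a temporary directory and a small example encoder (both assumptions for illustration) so that it runs on its own.

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Small example encoder standing in for le_general
le_example = LabelEncoder().fit(["add", "color", "move"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "le_example.pkl")
    joblib.dump(le_example, path)
    le_loaded = joblib.load(path)

# The reloaded encoder maps integer ids back to the original labels
decoded = le_loaded.inverse_transform([2, 0])
print(list(decoded))
```

Loading the same encoder objects that were used during training matters, because the integer ids predicted by the models are only meaningful relative to that exact class ordering.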
Future work may explore further tuning of the models and their hyperparameters, supporting more complex command structures, or integrating the classifier into a 3D editing application for a hands-on user experience.