Building a 3D Commands Classifier with BERT

Rajesh Sharma

Introduction

In this blog post, we will explore how to build a 3D commands classifier using the BERT model. The classifier is designed to interpret requests issued inside a 3D editor, such as "add a cube", "change the color to red", or "move the sphere to the right". Each command is assigned two categories, a general class and a specific class, based on the type of action requested and the specificity of the object involved.

By the end of this post, you'll know how to set up, train, and evaluate a BERT-based classifier that sorts these commands into their categories. Such a classifier could improve user interaction in 3D design software, gaming environments, or VR experiences.

Setup

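To set up the project, we first installed the necessary libraries with pip: transformers, datasets, accelerate, and sentence-transformers. The leading ! runs each line as a shell command from a Jupyter or Colab notebook cell. The later steps also use pandas, scikit-learn, PyTorch, and joblib.

```
!pip install -q transformers datasets accelerate
!pip install --upgrade accelerate sentence-transformers
```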
Data Collection and Preparation

Next, we loaded a CSV file, 3d-commands-gpt.csv, containing a dataset generated with ChatGPT. It has three columns: command (the text entered by the user), general (the command's general class), and specific (its specific class).

We read this file using pandas, and we used LabelEncoder from scikit-learn to encode the general and specific labels. Then, we split our data into training and validation sets, with 90% of the data for training and 10% for validation.
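The preparation step can be sketched as follows. This is a minimal illustration, not the post's actual code. The rows below are an invented stand-in for 3d-commands-gpt.csv, whose contents are not shown. The post also does not say how the split was made, so scikit-learn's train_test_split with a fixed random_state is an assumption.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for pd.read_csv("3d-commands-gpt.csv"); rows and class names are invented
df = pd.DataFrame({
    "command": ["add a cube", "add a sphere", "change the color to red",
                "make it blue", "move the sphere to the right",
                "move the cube up", "delete the cube", "remove everything",
                "rotate the cone", "scale the cube up"],
    "general": ["create", "create", "material", "material", "transform",
                "transform", "delete", "delete", "transform", "transform"],
    "specific": ["add_cube", "add_sphere", "set_color", "set_color", "move",
                 "move", "delete_object", "delete_all", "rotate", "scale"],
})

# Encode each label column as integer ids 0..n_classes-1 (classes_ is sorted alphabetically)
le_general = LabelEncoder()
le_specific = LabelEncoder()
df["general_id"] = le_general.fit_transform(df["general"])
df["specific_id"] = le_specific.fit_transform(df["specific"])

# 90% training / 10% validation; random_state makes the split reproducible
train_df, val_df = train_test_split(df, test_size=0.1, random_state=42)
```

Each model's output layer is later sized from `len(le_general.classes_)` and `len(le_specific.classes_)`, so the encoders fitted here must be the same ones used at prediction time.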

Model Selection and Preprocessing

We chose BERT, a pretrained transformer encoder, to classify the commands, using BertTokenizer and BertForSequenceClassification from the transformers library. We loaded two instances of the model, one for the general labels and one for the specific labels, each with an output layer sized to its number of classes.

```python
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model_general = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=len(le_general.classes_))
model_specific = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=len(le_specific.classes_))
```

We then tokenized the commands with the BERT tokenizer, padding and truncating them to a consistent sequence length. Finally, we wrapped the encoded inputs and their labels in a CommandDataset object for training.
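The post does not show CommandDataset. The class below is a hypothetical reconstruction of the common pattern: a PyTorch `Dataset` that returns one dictionary of tensors per example, including a `labels` key that the Trainer passes to the model to compute the loss. To keep the example runnable offline, the encodings are hand-written stand-ins with made-up token ids rather than real tokenizer output.

```python
import torch
from torch.utils.data import Dataset


class CommandDataset(Dataset):
    """Pairs tokenizer output with integer labels (hypothetical reconstruction).

    Assumes `encodings` is a dict of equal-length lists, as returned by
    tokenizer(texts, truncation=True, padding=True) without return_tensors.
    """

    def __init__(self, encodings, labels):
        self.encodings = encodings
        # list() accepts a pandas Series or array and avoids index-label lookups
        self.labels = list(labels)

    def __getitem__(self, idx):
        # One row of every encoding field (input_ids, attention_mask, ...) as a tensor
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        # The model computes the classification loss from the "labels" key
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)


# Made-up token ids standing in for tokenizer output, so no model download is needed
encodings = {
    "input_ids": [[101, 1909, 170, 102, 0], [101, 1815, 1103, 10682, 102]],
    "attention_mask": [[1, 1, 1, 1, 0], [1, 1, 1, 1, 1]],
}
dataset = CommandDataset(encodings, labels=[0, 3])
```

With the real tokenizer, a split would be built along the lines of `CommandDataset(tokenizer(list_of_commands, truncation=True, padding=True), encoded_labels)`, where both argument names are placeholders. One such dataset is needed per model and per split.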

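Model Training

Next, we created a Trainer from the transformers library for each model. We also specified the training arguments: the number of epochs, the per-device batch sizes, the warmup steps, and the weight decay.

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=30,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=100,
    weight_decay=0.01,
)
# One Trainer per model (train_dataset_* are assumed names mirroring the val_dataset_* used below)
trainer_general = Trainer(model=model_general, args=training_args,
                          train_dataset=train_dataset_general, eval_dataset=val_dataset_general)
trainer_specific = Trainer(model=model_specific, args=training_args,
                           train_dataset=train_dataset_specific, eval_dataset=val_dataset_specific)
```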
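To see what `warmup_steps=100` means relative to 30 epochs, it helps to compute the schedule. By default, the Trainer uses a linear schedule (`lr_scheduler_type="linear"`, base `learning_rate` 5e-5). The learning rate ramps up from 0 over the warmup steps and then decays linearly to 0 at the last optimizer step. The sketch below mirrors that shape in plain Python. It assumes a single device, no gradient accumulation, and a hypothetical 900 training commands, since the post does not give the dataset size.

```python
import math


def linear_warmup_factor(step, warmup_steps, total_steps):
    """Learning-rate multiplier with the same shape as get_linear_schedule_with_warmup."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to the base learning rate
        return step / max(1, warmup_steps)
    # Decay: fall linearly to 0 at the final optimizer step
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def total_optimizer_steps(num_examples, batch_size, epochs):
    # One optimizer step per batch, counting the last partial batch of each epoch
    return math.ceil(num_examples / batch_size) * epochs


# Hypothetical dataset size (not stated in the post)
total = total_optimizer_steps(num_examples=900, batch_size=16, epochs=30)
base_lr = 5e-5  # TrainingArguments' default learning_rate
lr_at_step_50 = base_lr * linear_warmup_factor(50, warmup_steps=100, total_steps=total)
```

Under these assumptions, training runs for 1710 steps, so the 100 warmup steps cover a little under 6% of training.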

Finally, we trained the models by calling the train method on each Trainer object.

```python
# Train the models
trainer_general.train()
trainer_specific.train()
```
Model Evaluation

We evaluated the trained models on the validation set. Trainer.evaluate returns a dictionary of metrics. By default it contains the validation loss (eval_loss) and runtime statistics; accuracy is reported only if a compute_metrics function is passed to the Trainer.

```python
eval_result = trainer_general.evaluate(eval_dataset=val_dataset_general)
print(f"General model validation loss: {eval_result['eval_loss']}")
eval_result = trainer_specific.evaluate(eval_dataset=val_dataset_specific)
print(f"Specific model validation loss: {eval_result['eval_loss']}")
```
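To have evaluate() also report accuracy, a metrics function can be passed to each Trainer as `Trainer(..., compute_metrics=compute_metrics)`. The Trainer calls it with the validation logits and labels and prefixes the returned keys with "eval_", so accuracy appears as eval_accuracy. The sketch below uses NumPy only. The logits and labels are invented so that the function can be checked in isolation.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy from the (logits, labels) pair the Trainer passes in."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return {"accuracy": float((predictions == labels).mean())}


# Toy logits for 4 validation examples over 3 classes (invented numbers)
logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 1.5, 0.2],
                   [0.0, 0.2, 3.1],
                   [1.2, 1.0, 0.9]])
labels = np.array([0, 1, 2, 1])
metrics = compute_metrics((logits, labels))
```

Here the last example is misclassified (class 0 scores highest, but the label is 1), giving an accuracy of 0.75.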
Making Predictions

We used the trained models to classify new input sentences. The predict_command function tokenizes a sentence and runs it through both models. It returns the top k general and specific labels together with their raw logit scores.

```python
import torch

# Inference setup: put both models on the same device as the inputs and disable dropout
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_general.to(device).eval()
model_specific.to(device).eval()


# Function for making predictions
def predict_command(input_sentence, k=10):
    encoding = tokenizer(input_sentence, truncation=True, padding=True, return_tensors='pt').to(device)

    # Get predictions; gradients are not needed at inference time
    with torch.no_grad():
        general_output = model_general(input_ids=encoding['input_ids'], attention_mask=encoding['attention_mask']).logits
        specific_output = model_specific(input_ids=encoding['input_ids'], attention_mask=encoding['attention_mask']).logits

    # Get top k values and indices; k is capped at each model's number of classes
    general_topk_values, general_topk_indices = general_output.topk(min(k, general_output.shape[-1]))
    specific_topk_values, specific_topk_indices = specific_output.topk(min(k, specific_output.shape[-1]))

    # Move tensors back to CPU for numpy operations
    general_topk_values = general_topk_values.cpu().numpy().flatten()
    general_topk_indices = general_topk_indices.cpu().numpy().flatten()
    specific_topk_values = specific_topk_values.cpu().numpy().flatten()
    specific_topk_indices = specific_topk_indices.cpu().numpy().flatten()

    # Use the inverse_transform method to get the original labels
    general_commands = le_general.inverse_transform(general_topk_indices)
    specific_commands = le_specific.inverse_transform(specific_topk_indices)

    # Combine commands with their (raw logit) scores
    general_predictions = [(command, float(score)) for command, score in zip(general_commands, general_topk_values)]
    specific_predictions = [(command, float(score)) for command, score in zip(specific_commands, specific_topk_values)]

    return general_predictions, specific_predictions


# Predict commands
general_predictions, specific_predictions = predict_command('move the cube to the left')
print(general_predictions)
print(specific_predictions)
```
Saving the Model

After training, we saved both models and the tokenizer so that they can be reloaded later without retraining.

```python
# After training, save each model to its own folder
model_general.save_pretrained("./models/model_general")
model_specific.save_pretrained("./models/model_specific")

# Save the tokenizer as well
tokenizer.save_pretrained("./models/tokenizer")
```

We also saved the LabelEncoders to disk with joblib, so that the models' encoded outputs can later be mapped back to the original labels.

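```python
import os

import joblib

# joblib.dump does not create missing directories
os.makedirs('./labels', exist_ok=True)

# Save the LabelEncoders
joblib.dump(le_general, './labels/le_general.pkl')
joblib.dump(le_specific, './labels/le_specific.pkl')
```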
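In a separate inference script, the models and tokenizer can be reloaded from the saved folders with `BertForSequenceClassification.from_pretrained("./models/model_general")` and `BertTokenizer.from_pretrained("./models/tokenizer")`. The encoders come back with `joblib.load`. The sketch below shows only the encoder round trip, using a temporary directory and invented class names so that it runs without the trained models.

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Stand-in encoder fitted on invented class names
le_general = LabelEncoder().fit(["create", "delete", "material", "transform"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "le_general.pkl")
    joblib.dump(le_general, path)
    # In a fresh process, this load is all that is needed to decode model outputs
    le_loaded = joblib.load(path)

# The reloaded encoder maps class ids back to the original label strings
decoded = list(le_loaded.inverse_transform([3, 0]))
```

Reusing the saved encoder, rather than refitting a new one, guarantees that output index i still corresponds to `classes_[i]`, the mapping the model was trained with.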
Conclusion

In this blog post, we fine-tuned two BERT models to classify commands in the context of a 3D editor, one for general categories and one for specific categories. The project shows how a pretrained language model like BERT can be adapted to classify the commands of a specific domain, opening up potential applications in software controls, voice assistants, and more.

Future work could explore further tuning of the models, more complex command structures, or integrating the classifier into a 3D editing application for a hands-on user experience.