15-494/694 Cognitive Robotics Lab 8:
Embeddings and Large Language Models

I. Software Update and Initial Setup

At the beginning of every lab you should update your copy of the vex-aim-tools package. Do this:

$ cd ~/vex-aim-tools
$ git pull

II. Play Semantris

You can do parts II through VII in a team of two if you wish.

Play the Semantris game in "Blocks" mode. Note: it's more fun with sound enabled.

How does Semantris know which words are related? It's using embeddings, and computing dot products to measure similarity.
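The idea of measuring relatedness with dot products can be sketched in a few lines of Python. This is only an illustration: the four-dimensional vectors below are made up, the names EMBEDDINGS and rank_by_similarity are hypothetical, and Semantris's actual model and scoring are not described here. The sketch uses cosine similarity, which is the dot product of the two vectors after each is scaled to unit length.

```python
import math

# Toy 4-dimensional "embeddings". These values are invented for illustration;
# real word embeddings have hundreds of dimensions.
EMBEDDINGS = {
    "ocean": [0.9, 0.1, 0.0, 0.2],
    "wave":  [0.8, 0.2, 0.1, 0.1],
    "bread": [0.0, 0.9, 0.3, 0.1],
    "toast": [0.1, 0.8, 0.4, 0.0],
}

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Dot product of u and v after scaling each to unit length."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def rank_by_similarity(prompt, candidates):
    """Return the candidate words sorted from most to least similar to the prompt."""
    p = EMBEDDINGS[prompt]
    return sorted(candidates, key=lambda w: cosine(p, EMBEDDINGS[w]), reverse=True)

ranking = rank_by_similarity("ocean", ["bread", "wave", "toast"])
print(ranking)
```

With these toy vectors, "wave" ranks closest to "ocean" because the two vectors point in nearly the same direction.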

Q1: Start a new game and take a screenshot of the initial state. You will use this screenshot in the next step.

III. Experiment With Word Embeddings

Run the WordEmbeddingDemo.
Try hovering over a word in the 3D plot to see the closest words.
You can add new words to the 3D plot by typing them in the text box below. Try that.
Press the "Clear all" button to erase all the words from the display.
Examine these slides to see how we can use the demo to explore the kind of matching that Semantris does.
Pick six words from your Semantris screenshot. Type in one word at a time to add it to the display. After adding the word, its dot is red. Click on one of the six slots on the right side of the screen to load the word into that slot. Continue until all six words have been loaded.
Pick one of the six words as your target word. Think of a one word prompt you could use to reach that target. Add the prompt word to the display by typing it in the text box.
Click on the newly added prompt word to turn it from red to black. Then click on it again to turn it back to red and display the similarity measures to the six words in the slots. Q2: Did you hit your target? Take a screenshot showing the similarity lines.

IV. What Do the Features Encode?

You saw from the lecture slides that feature 126 correlates with gender. What about some other semantic features?
Load the following words into the six slots: man, woman, king, boy, girl, princess.
Q3: At least three of the 300 features correlate with royalty. Find one of them by sliding the mouse across the columns and looking for a feature where man, woman, boy, and girl all have positive values for the feature while king and princess have negative values, or vice versa.
Q4: Do the same thing to find a feature that correlates with age: man, woman, and king should all have positive values and boy, girl, and princess should all have negative values, or vice versa.
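The visual search described in Q3 and Q4 amounts to a sign test on each feature column. The following sketch shows that test on made-up four-feature vectors; the values, the TOY table, and the function name separating_features are all hypothetical, and the real demo uses 300 features whose values you should read from the display rather than from this example.

```python
# Made-up 4-feature vectors, for illustration only.
TOY = {
    "man":      [0.5, 0.3, 0.4, 0.1],
    "woman":    [-0.4, 0.2, 0.3, -0.2],
    "king":     [0.6, -0.5, 0.5, 0.3],
    "boy":      [0.4, 0.1, -0.3, -0.1],
    "girl":     [-0.5, 0.2, -0.4, 0.2],
    "princess": [-0.3, -0.6, -0.2, 0.1],
}

def separating_features(vectors, group_a, group_b):
    """Return indices of features that are positive for every word in one
    group and negative for every word in the other group."""
    n_features = len(next(iter(vectors.values())))
    hits = []
    for i in range(n_features):
        a = [vectors[w][i] for w in group_a]
        b = [vectors[w][i] for w in group_b]
        a_pos_b_neg = all(x > 0 for x in a) and all(x < 0 for x in b)
        a_neg_b_pos = all(x < 0 for x in a) and all(x > 0 for x in b)
        if a_pos_b_neg or a_neg_b_pos:
            hits.append(i)
    return hits

royalty = separating_features(TOY, ["man", "woman", "boy", "girl"], ["king", "princess"])
age = separating_features(TOY, ["man", "woman", "king"], ["boy", "girl", "princess"])
print("royalty features:", royalty)
print("age features:", age)
```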

V. Analogies by Vector Arithmetic

Reload the WordEmbeddingDemo and open the "Vector analogy arithmetic" panel at the bottom of the display.
Enter the classic "man is to king as woman is to ____" analogy and verify that the answer is "queen".
Q5: Make up an analogy of your own and try it out. Take a screenshot of the result.
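As a rough sketch of what analogy arithmetic does, the code below forms the vector king - man + woman and returns the nearest remaining word by cosine similarity. The three-dimensional vectors are invented so that the arithmetic works out cleanly, and the names VECTORS and solve_analogy are hypothetical; the demo's own implementation may differ in details such as normalization.

```python
import math

# Invented 3-dimensional vectors, for illustration only.
VECTORS = {
    "man":   [0.1, 0.9, 0.2],
    "woman": [0.1, -0.9, 0.2],
    "king":  [0.9, 0.9, 0.3],
    "queen": [0.9, -0.9, 0.3],
    "apple": [-0.5, 0.0, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    d = sum(a * b for a, b in zip(u, v))
    return d / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def solve_analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by finding the word nearest b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    # Exclude the three input words, which would otherwise tend to win.
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

answer = solve_analogy("man", "king", "woman")
print(answer)
```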

VI. Defining Semantic Dimensions

Open the "Custom semantic dimensions" panel at the bottom of the display.
Open the first semantic dimension, "gender", and observe the word pairs used to define the dimension. By subtracting corresponding words ("king" minus "queen", "father" minus "mother", etc.) and averaging the result, we derive a vector that points along the "gender" axis. We can then display words as points in the 3D plot by calculating their position along the selected semantic axes.
The last two dimension slots are reserved for custom user-defined dimensions. Define a new dimension called "fruitiness", and give examples of fruits vs. vegetables. Try to pair each fruit with a vegetable of similar shape and color, e.g., "grapes" and "peas". The match doesn't have to be exact. Give at least five of these word pairs. Set this up as the Z-axis display and click Submit. Now the two semantic axes in the 3D display should be "gender" and "fruitiness".
Click on the "Clear" button to clear the 3D display. Enter all the words you used to define the "fruitiness" semantic dimension. Verify that they are clearly separated along this dimension. Then add some additional fruits and vegetables and see if they end up where you expect on the fruitiness axis. Q6: Take screenshots illustrating your dimension definition (word pairs) and the 3D display of fruit and vegetable points.
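The construction of a semantic dimension described above (subtract paired words, average the differences, then place words along the resulting axis) can be sketched as follows. The vectors are made up, the names VECTORS, PAIRS, semantic_axis, and position_on_axis are hypothetical, and the demo may scale or normalize its axes differently.

```python
import math

# Made-up 3-dimensional vectors, for illustration only.
VECTORS = {
    "grapes": [0.8, 0.5, 0.1], "peas":     [0.2, 0.6, 0.1],
    "apple":  [0.7, 0.2, 0.4], "potato":   [0.1, 0.1, 0.4],
    "banana": [0.9, 0.1, 0.5], "carrot":   [0.3, 0.0, 0.5],
    "cherry": [0.8, 0.0, 0.1], "broccoli": [0.1, 0.8, 0.3],
}

# Each pair is (fruit, vegetable); the axis points from vegetable toward fruit.
PAIRS = [("grapes", "peas"), ("apple", "potato"), ("banana", "carrot")]

def semantic_axis(pairs):
    """Average the difference vectors (first word minus second word)."""
    dims = len(VECTORS[pairs[0][0]])
    axis = [0.0] * dims
    for pos, neg in pairs:
        for i in range(dims):
            axis[i] += (VECTORS[pos][i] - VECTORS[neg][i]) / len(pairs)
    return axis

def position_on_axis(word, axis):
    """Scalar projection of the word's vector onto the axis direction."""
    length = math.sqrt(sum(a * a for a in axis))
    return sum(w * a for w, a in zip(VECTORS[word], axis)) / length

fruitiness = semantic_axis(PAIRS)
for word in ("cherry", "broccoli"):
    print(word, round(position_on_axis(word, fruitiness), 3))
```

Words that were not used to define the axis ("cherry" and "broccoli" here) play the role of the additional fruits and vegetables you add to check the dimension.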

VII. Experiment With BERT-insight

Load the BERT-insight demo. If the demo hangs with a loading icon, reload the page.
This demo uses the extractive question answering (QNA) version of BERT, as discussed in the lecture slides. Click on the Submit button and observe the answer to the question.
Q7: Make up three questions relating to Fred and/or his dog that the model can answer. Take screenshots of the results.
The BERT QNA model is a 24-layer encoder-only model. The input buffer uses some special tokens to separate the text passage from the question. The input buffer format is: "<CLS> question <SEP> text passage <SEP>", possibly followed by some <PAD> tokens. You can see this layout on the right side of the display. Q8: How is the word "superamazing" encoded as tokens?
Change the text passage to "John gave Mary a book." and the question to "What did John give Mary?" Run this query.
Scroll down to look at the attention head activation patterns for the four attention heads in layer 0 of the encoder stack. Q9: Which attention head seems particularly interested in the special marker tokens? What is your evidence?
Q10: Which attention head seems to be grouping words together into a question group, a text passage group, and a padding group? What is your evidence?
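To make the input buffer layout concrete, here is a minimal sketch that assembles "<CLS> question <SEP> text passage <SEP>" and pads it to a fixed length. It splits on whitespace purely as a stand-in: the real model uses a subword tokenizer, so its token boundaries and counts will differ from this output. The function name build_input_buffer and the max_len value are hypothetical.

```python
def build_input_buffer(question, passage, max_len=16):
    """Lay out the buffer as <CLS> question <SEP> passage <SEP>, then pad.

    Whitespace splitting is a simplification; it is not the model's tokenizer.
    """
    tokens = ["<CLS>"] + question.split() + ["<SEP>"] + passage.split() + ["<SEP>"]
    if len(tokens) > max_len:
        raise ValueError("input is longer than the buffer")
    return tokens + ["<PAD>"] * (max_len - len(tokens))

buffer = build_input_buffer("What did John give Mary?", "John gave Mary a book.")
for position, token in enumerate(buffer):
    print(position, token)
```

Comparing the positions of the <CLS>, <SEP>, and <PAD> markers in a layout like this with the rows and columns of the attention displays may help when looking for evidence for Q9 and Q10.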

VIII. Homework: Flat Domino Detection

You can do this homework alone or as a team of two if you wish, but no more than two people on a team.

In this assignment you're going to train a YOLO model to segment dominoes in VEX AIM camera images using the Roboflow tool. Unlike the previous assignment, this time the dominoes will be lying flat on the table, so there will be strong foreshortening effects due to the low camera angle, and you'll need to capture dominoes at various orientations. We're going to use polygons instead of bounding boxes to more accurately segment the dominoes.
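To see why polygons segment a foreshortened domino more tightly than a bounding box, compare the area of a four-corner outline with the area of its axis-aligned bounding box. The corner coordinates below are invented pixel values for a domino seen at an angle, and the function names are hypothetical; this is only a geometric illustration, not part of the Roboflow workflow.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def bounding_box_area(points):
    """Area of the smallest axis-aligned box containing all the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# Invented pixel coordinates for the four corners of a flat, rotated domino.
corners = [(10, 40), (60, 20), (90, 35), (40, 60)]

poly = polygon_area(corners)
box = bounding_box_area(corners)
print("polygon area:", poly)
print("bounding box area:", box)
print("fraction of box covered by the domino:", round(poly / box, 2))
```

In this made-up case the domino covers less than half of its bounding box, so a box label would mark a large amount of table surface as "domino".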

Go to Roboflow.com and create a free account using your Andrew email address.
Watch the coin detection tutorial to get an idea of how Roboflow works.
Go to "Settings" in Roboflow and click the button to apply for a free academic/research upgrade to your account. Approval should be immediate.
Select several dominoes with different pip counts (e.g., 4-3 and 5-2) to use for your initial dataset.
Create a lab8 directory, cd to it, and run simple_cli. Bring up the particle viewer so you can drive the robot around, and type "s" in the camera viewer every time you want to take a snapshot. Move the robot and/or the dominoes to get images at a variety of distances, orientations, and lighting conditions. Including two dominoes in the image isn't mandatory but will give you more training data. Collect around 50 images to start. They will be stored in your lab8/snapshots folder.
Follow the Roboflow steps to create a new project and upload your images.
Follow the training steps to train a YOLO26 segmentation model on your dataset.
To test your segmenter, use DominoSegment.fsm. Edit the WEIGHTS_VERSION variable to point to the correct folder for your weights file. Type "tm" to take an image, and then hit Enter to cause the imshow windows to pop up. (Having to hit Enter is a workaround for a bug in imshow.)

Hand In

Hand in the following:

Your Semantris and WordEmbeddingDemo screenshots.
The answers to whatever questions were asked in Parts II-VII.
Your solution to the homework problem: some screenshots showing your segmentation results on novel images, and a link to your Roboflow dataset.

Dave Touretzky
