15-494/694 Cognitive Robotics Lab 8: Embeddings and Large Language Models
I. Software Update and Initial Setup
At the beginning of every lab you should update your copy of the
vex-aim-tools package. Do this:
$ cd ~/vex-aim-tools
$ git pull
II. Play Semantris
You can do parts II through VII in a team of two if you wish.
Play the Semantris game in "Blocks" mode. Note: it's more fun with
sound enabled.
How does Semantris know which words are related? It's using
embeddings, and computing dot products to measure similarity.
Q1: Start a new game and take a screenshot of the initial state. You will
use this screenshot in the next step.
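The dot-product similarity mentioned above can be sketched in a few lines of Python. This is only an illustration, not Semantris's actual code: the four-dimensional vectors and the prompt vector are invented, and normalizing each vector to unit length (so the dot product equals cosine similarity) is an assumption.

```python
import math

# Toy 4-dimensional vectors, invented for illustration only.
# Real word embeddings have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "ocean":  [0.9, 0.1, 0.0, 0.2],
    "river":  [0.8, 0.2, 0.1, 0.1],
    "guitar": [0.0, 0.9, 0.3, 0.1],
    "piano":  [0.1, 0.8, 0.4, 0.0],
}

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length (assumed step; makes dot == cosine)."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def rank_by_similarity(prompt_vec, embeddings):
    """Return (word, score) pairs sorted from most to least similar."""
    p = normalize(prompt_vec)
    scores = {w: dot(p, normalize(vec)) for w, vec in embeddings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical embedding of a water-related prompt word.
water_prompt = [0.85, 0.15, 0.05, 0.15]
ranking = rank_by_similarity(water_prompt, TOY_EMBEDDINGS)
for word, score in ranking:
    print(f"{word:8s} {score:.3f}")
```

With these toy vectors the two water words rank above the two instrument words, which is the kind of ordering the game relies on.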
III. Experiment With Word Embeddings
- Run the WordEmbeddingDemo.
- Try hovering over a word in the 3D plot to see the closest words.
- You can add new words to the 3D plot by typing them in the text
box below. Try that.
- Press the "Clear all" button to erase all the words from the display.
- Examine these slides to see
how we can use the demo to explore the kind of matching that
Semantris does.
- Pick six words from your Semantris screenshot. Type in one word
at a time to add it to the display. After adding the word, its dot is
red. Click on one of the six slots on the right side of the screen to
load the word into that slot. Continue until all six words have been
loaded.
- Pick one of the six words as your target word. Think of a one
word prompt you could use to reach that target. Add the prompt word
to the display by typing it in the text box.
- Click on the newly added prompt word to turn it from red to
black. Then click on it again to turn it back to red and display the
similarity measures to the six words in the slots. Q2: Did you hit your
target? Take a screenshot showing the similarity lines.
IV. What Do the Features Encode?
- You saw from the lecture slides that feature 126 correlates
with gender. What about some other semantic features?
- Load the following words into the six slots: man, woman, king, boy, girl, princess.
- Q3: At least three of the 300 features correlate with royalty.
Find one of them by sliding the mouse across the columns and
looking for a feature where man, woman, boy, and girl all have
positive values for the feature while king and princess have
negative values, or vice versa.
- Q4: Do the same thing to find a feature that correlates
with age: man, woman, and king should all have positive values and
boy, girl, and princess should all have negative values, or vice
versa.
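The manual search in Q3 and Q4 amounts to looking for a feature column whose sign splits the six words into the two desired groups. A minimal sketch of that test follows. The four-feature vectors are made up so the example stays small; they are not values from the demo, and the helper name separating_features is hypothetical.

```python
def separating_features(vectors, group_a, group_b):
    """Return indices of features whose sign splits group_a from group_b.

    A feature qualifies if every word in group_a is positive and every
    word in group_b is negative, or vice versa.
    """
    n_features = len(next(iter(vectors.values())))
    hits = []
    for i in range(n_features):
        a = [vectors[w][i] for w in group_a]
        b = [vectors[w][i] for w in group_b]
        a_pos_b_neg = all(x > 0 for x in a) and all(x < 0 for x in b)
        a_neg_b_pos = all(x < 0 for x in a) and all(x > 0 for x in b)
        if a_pos_b_neg or a_neg_b_pos:
            hits.append(i)
    return hits

# Made-up 4-feature vectors (the demo's real vectors have 300 features).
TOY = {
    "man":      [ 0.5,  0.3,  0.2, -0.1],
    "woman":    [-0.4,  0.2,  0.3, -0.2],
    "king":     [ 0.6, -0.5,  0.4,  0.3],
    "boy":      [ 0.5,  0.4, -0.3, -0.3],
    "girl":     [-0.5,  0.1, -0.2,  0.1],
    "princess": [-0.6, -0.4, -0.5,  0.2],
}

royalty = separating_features(
    TOY, ["man", "woman", "boy", "girl"], ["king", "princess"])
age = separating_features(
    TOY, ["man", "woman", "king"], ["boy", "girl", "princess"])
print("royalty-like features:", royalty)
print("age-like features:", age)
```

In the toy data, feature 1 plays the royalty role and feature 2 the age role; in the demo you find the corresponding columns by eye.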
V. Analogies by Vector Arithmetic
- Reload the WordEmbeddingDemo and open the "Vector analogy arithmetic" panel at the bottom of the display.
- Enter the classic "man is to king as woman is to ____" analogy and verify that the answer is "queen".
- Q5: Make up an analogy of your own and try it out. Take a screenshot of the result.
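The arithmetic behind "a is to b as c is to ____" can be sketched as follows: compute b - a + c and return the vocabulary word closest to that point. The three-dimensional vectors are invented for illustration. Using cosine similarity and excluding the three input words from the candidates are common conventions assumed here, not confirmed details of the demo.

```python
import math

# Invented 3-dimensional vectors; feature 0 loosely plays "gender",
# feature 1 loosely plays "royalty".
TOY = {
    "man":   [ 1.0,  0.0, 0.2],
    "woman": [-1.0,  0.0, 0.2],
    "king":  [ 1.0,  1.0, 0.3],
    "queen": [-1.0,  1.0, 0.3],
    "apple": [ 0.0, -0.5, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' with the word nearest b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Assumed convention: the three input words are not allowed as answers.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

answer = analogy("man", "king", "woman", TOY)
print("man is to king as woman is to", answer)
```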
VI. Defining Semantic Dimensions
- Open the "Custom semantic dimensions" panel at the bottom of the display.
- Open the first semantic dimension, "gender", and observe the
word pairs used to define the dimension. By subtracting
corresponding words ("king" minus "queen", "father" minus
"mother", etc.) and averaging the result, we derive a vector that
points along the "gender" axis. We can then display words as
points in the 3D plot by calculating their position along the
selected semantic axes.
- The last two dimension slots are reserved for custom
user-defined dimensions. Define a new dimension called
"fruitiness", and give examples of fruits vs. vegetables. Try to
pair each fruit with a vegetable of similar shape and color, e.g.,
"grapes" and "peas". The match doesn't have to be exact. Give at
least five of these word pairs. Set this up as the Z-axis display
and click Submit. Now the two semantic axes in the 3D display
should be "gender" and "fruitiness".
- Click on the "Clear" button to clear the 3D display. Enter all
the words you used to define the "fruitiness" semantic dimension.
Verify that they are clearly separated along this dimension. Then
add some additional fruits and vegetables and see if they end up
where you expect on the fruitiness axis. Q6: Take screenshots
illustrating your dimension definition (word pairs) and the 3D display
of fruit and vegetable points.
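The construction described in this part (subtract the paired words, average the differences, then place words along the resulting axis) can be sketched like this. The vectors are invented, only two defining pairs are used to keep the example short, and normalizing the axis to unit length before projecting is an assumption rather than a documented detail of the demo.

```python
import math

# Invented 3-dimensional vectors for illustration only.
TOY = {
    "grapes":   [ 0.8,  0.1,  0.3],
    "peas":     [-0.6,  0.1,  0.3],
    "banana":   [ 0.7,  0.5, -0.2],
    "zucchini": [-0.7,  0.4, -0.1],
    "cherry":   [ 0.6, -0.3,  0.1],
    "carrot":   [-0.8, -0.2,  0.2],
}

def semantic_axis(pairs, embeddings):
    """Average the (first - second) difference vectors, then normalize."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for first, second in pairs:
        for i in range(dim):
            total[i] += embeddings[first][i] - embeddings[second][i]
    mean = [t / len(pairs) for t in total]
    length = math.sqrt(sum(x * x for x in mean))
    return [x / length for x in mean]  # unit length is an assumed choice

def position(word, axis, embeddings):
    """Coordinate of a word along the axis (dot product with the axis)."""
    return sum(a * b for a, b in zip(embeddings[word], axis))

# Fruit first, vegetable second, so positive means "more fruity".
fruitiness = semantic_axis([("grapes", "peas"), ("banana", "zucchini")], TOY)

# Words that were not used to define the axis.
for word in ("cherry", "carrot"):
    print(f"{word:8s} {position(word, fruitiness, TOY):+.3f}")
```

In this toy data the held-out fruit lands on the positive side and the held-out vegetable on the negative side, which is the check the last step asks you to make visually.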
VII. Experiment With BERT-insight
- Load the BERT-insight demo.
If the demo hangs with a loading icon, reload the page.
- This demo uses the extractive question answering (QNA) version
of BERT, as discussed in the lecture slides. Click on the Submit
button and observe the answer to the question.
- Q7: Make up three questions relating to Fred and/or his dog that
the model can answer. Take screenshots of the results.
- The BERT QNA model is a 24-layer encoder-only model. The input
buffer uses some special tokens to separate the text passage from
the question. The input buffer format is:
"<CLS> question <SEP> text passage <SEP>", possibly followed by
some <PAD> tokens. You can see this layout on the right side of the
display. Q8: How is the word "superamazing" encoded as tokens?
- Change the text passage to "John gave Mary a book." and the
question to "What did John give Mary?" Run this query.
- Scroll down to look at the attention head activation patterns
for the four attention heads in layer 0 of the encoder stack. Q9:
Which attention head seems particularly interested in the special
marker tokens? What is your evidence?
- Q10: Which attention head seems to be grouping words
together into a question group, a text passage group, and a
padding group? What is your evidence?
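The input buffer layout described above ("<CLS> question <SEP> text passage <SEP>" plus padding) can be sketched as a simple list-building function. This is only a layout illustration: the whitespace split is a stand-in for BERT's real subword tokenizer (so it says nothing about Q8), the buffer length of 20 is arbitrary, and build_input_buffer is a hypothetical helper, not part of the demo.

```python
def build_input_buffer(question_tokens, passage_tokens, buffer_len):
    """Lay out tokens as <CLS> question <SEP> passage <SEP>, then pad."""
    tokens = (["<CLS>"] + question_tokens + ["<SEP>"]
              + passage_tokens + ["<SEP>"])
    if len(tokens) > buffer_len:
        raise ValueError("question and passage do not fit in the buffer")
    # Fill the remaining slots with padding tokens.
    return tokens + ["<PAD>"] * (buffer_len - len(tokens))

# Whitespace splitting is a simplification; the real model splits
# text into subword tokens.
question = "what did john give mary ?".split()
passage = "john gave mary a book .".split()

# 20 is an arbitrary buffer length chosen for this example.
buffer = build_input_buffer(question, passage, 20)
for index, token in enumerate(buffer):
    print(index, token)
```

Printing the buffer with indices makes it easier to match positions against the rows and columns of an attention pattern.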
VIII. Homework: Flat Domino Detection
You can do this homework alone or as a team of two if you wish.
Teams may not have more than two people.
In this assignment you're going to train a YOLO model to segment
dominoes in VEX AIM camera images using the Roboflow tool. Unlike the
previous assignment, this time the dominoes will be lying flat on the
table, so there will be strong foreshortening effects due to the low
camera angle, and you'll need to capture dominoes at various
orientations. We're going to use polygons instead of bounding boxes
to more accurately segment the dominoes.
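To see why polygons fit a flat, rotated domino better than a bounding box does, compare the area of a four-corner outline with the area of the axis-aligned box around it. The pixel coordinates below are hypothetical, chosen to resemble a foreshortened domino, and the polygon area uses the standard shoelace formula.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def bounding_box_area(points):
    """Area of the axis-aligned box enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# Hypothetical pixel coordinates of a flat domino's four corners,
# listed in order around the outline.
domino = [(100, 200), (180, 170), (200, 190), (120, 225)]

poly = polygon_area(domino)
box = bounding_box_area(domino)
print(f"polygon: {poly:.0f} px^2, box: {box:.0f} px^2, "
      f"fill ratio: {poly / box:.2f}")
```

For this outline the polygon covers less than half of its bounding box, so a box label would mark mostly table pixels as domino.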
- Go to Roboflow.com and
create a free account using your Andrew email address.
- Watch the coin detection tutorial to get an idea how Roboflow works.
- Go to "Settings" in Roboflow and click the button to apply for a free
academic/research upgrade to your account. Approval should be immediate.
- Select several dominoes with different pip counts (e.g., 4-3 and
5-2) to use for your initial dataset.
- Create a lab8 directory, cd to it, and run simple_cli. Bring
up the particle viewer so you can drive the robot around, and type
"s" in the camera viewer every time you want to take a snapshot.
Move the robot and/or the dominoes to get images at a variety of
distances, orientations, and lighting conditions. Including two
dominoes in the image isn't mandatory but will give you more
training data. Collect around 50 images to start. They will be
stored in your lab8/snapshots folder.
- Follow the Roboflow steps to create a new project and upload your
images.
- Follow the training steps to train a YOLO26 segmentation model on
your dataset.
- To test your segmenter, use DominoSegment.fsm. Edit the
WEIGHTS_VERSION variable to point to the correct folder for your
weights file. Type "tm" to take an image, and then hit Enter to
make the imshow windows pop up. (Having to hit Enter works around
a bug in imshow.)
Hand In
Hand in the following:
- Your Semantris and WordEmbeddingDemo screenshots.
- Your answers to the questions (Q1-Q10) in Parts II-VII.
- Your solution to the homework problem: some screenshots showing
your segmentation results on novel images, and a link to your
Roboflow dataset.