HOMEWORK 4: Implementing Recognizers for Digital Ink

This is an INDIVIDUAL assignment.

Objective

In this assignment we'll shift gears away from low-level Swing stuff, and explore how to build recognizers for digital ink and gestures. You'll learn how to implement a simple recognizer that's robust enough to work for a wide range of gestures, and how to integrate such a recognizer into your application.

This assignment is to do an implementation of the SiGeR recognizer discussed in class and the lecture notes, and integrate it into the Photo Album application. You'll use the recognizer to detect command gestures that will let you do things like tag photos using the pen/mouse, and move and delete on-screen annotations, and delete photos, without having to use the regular GUI controls.

The learning goals for this assignment are:

Experience implementing a basic recognizer.
Experience creating recognition templates and defining custom gesture shapes.
Experience integrating a recognizer into a Swing application.

Please see the note flagged IMPORTANT in the deliverables section at the end of this document.

Description

In this homework, we'll implement the SiGeR recognizer and integrate it into your Photo Album application as a way to perform commands (meaning: certain gestures will be recognized as commands to control the application, rather than as simple digital ink for annotation).

You should extend your application to allow gestures to be performed on either PhotoComponents in either Photo View mode or Split View mode, as described below. Thumbnails and other parts of your UI do not need to be responsive to gestures. These gestures will allow you to move between photos, tag photos, delete photos, or move or delete annotations on the backs of photos.

We'll use a modal method of telling the system which strokes should be interpreted as commands: mouse strokes made with the "context/menu" (right) button down should be interpreted as command gestures, while mouse strokes with the "select" (left) mouse button down should be interpreted normally, meaning as they are currently in your application. You can tell which mouse button is being pressed by looking the modifiers in the mouse event to figure out whether or not to pass a stroke off to the recognizer.

Command gestures should have a different visual appearance than ink, and then disappear once the command has been completed. For example, if the ink in your annotations is black, you might draw command gestures in red, and then have them disappear once the gesture has been completed.

Your applications should recognize and act on the following set of gestures; these gestures should work in the "flip state" (flipped meaning showing annotation and unflipped meaning showing the photo) of the PhotoComponents as I've indicated here. These are listed in order from easiest to most complex:

Move Forward and Backward (unflipped photo side). A "right angle" gesture (>) should move to the next photo, and a "left angle" gesture (<) should move to the previous photo.
Delete Photo (unflipped photo side). A "pigtail" gesture (like a lowercase phi: a gesture in which you make a loop then draw downward to cross through the bottom of the loop) should delete the currently selected photo.
Tag Photos (unflipped photo side). you should be able to use gestures to accomplish the same thing as the toggle buttons in your UI (tagging a photo as " as "Vacation" by drawing a star gesture on the photo, for example). Tags added via gestures should interact well with tags added via the normal controls; for example, drawing the "Family" gesture on a photo should cause the corresponding toggle button to be set; drawing it again should remove the tag. Tags should be addable or removable via either gestures or the toggles, and these should stay consistent with each other. Note that if you have four types of tags, you'll need four distinct gestures that you recognize. You'll define the gestures and associated templates for these; be sure you clearly tell us what they are in your submission so we can test them!
Select Annotations (flipped annotation side). By drawing a closed circular loop around strokes and text on the back of the photo, you select the enclosed content for either movement or deletion. After this, the selected content should be rendered differently to indicate that it is selected (it could change color, or have a box drawn around it, or glow if you want to get really fancy). Once selected, a left click and drag indicates that the content should be interactively moved to a new location. Alternatively, making the pigtail command gesture on top of the selected region should delete the content in it.

Implementing the Recognizer

See the slides for details on how to implement SiGeR. Here are a few additional tips.

Decide on the representations you want to use first. By this I mean, figure out how you'll specify templates in your code, and how you'll represent the direction vector of input strokes.

I'd suggest defining a bunch of constants for the 8 ordinal directions SiGeR uses (N, S, E, W, NE, NW, SE, SW). Both direction vectors and templates will be defined as lists of these. You may also want to define some special "combination" constants (a "generally east" or "right" constant that means either E, SE, or NE for example, or a "generally north" or "up" constant that means either NW, N, or NE). These latter combination constants will only be used in the definition of templates, not in the vector strings you will produce from input gesture strokes. In other words, they allow you to define your templates a bit more loosely than just the specific 8 directions.

While defining such a set of human-readable constants isn't specifically necessary (you could just do everything in terms of the strings and regexp patterns described below), it can be very helpful for debugging to be able to write a template as a set of directions, rather than a raw regexp pattern.

Next, write a routine that takes an input gesture and produces the direction vector from it. In other words, given a series of points, it'll spit out a vector containing a list of the 8 ordinal direction constants. This direction vector represents the true shape of the input gesture.

Here's the only tricky part: you'll need to write a routine that turns the direction vector into an actual string of characters that contain the same information as in the vector, and another routine that takes the template info and produces a regexp pattern from it. The idea is that you'll see if the regexp pattern for the template actually matches the stringified representation of the direction vector.

There's a lot of flexibility in how you define the symbols in these strings. For the direction vector string, you'll probably just have a series of combinations of 8 separate letters, each representing one of the 8 ordinal directions.

For the regexp pattern, you'll want to generate a pattern that can match occurrences of any of these 8 letters, as well as "either-or" combinations of them ("generally east" for example, might be a pattern that matches either the letters representing E, SE, or NE). You'll also need to generate a pattern that can deal with "noise" at the first and end of the input string. The slides have some examples that show how to do this.

The actual matching process will then just compare an input stroke string to the list of template patterns, and report which (if any) it matches.

Defining the Templates

Your templates will be defined in your code, most likely as a set of declarations that look something like this:

int QUESTION_MARK = { UP, RIGHT, DOWN, LEFT, DOWN }
int UP_CARET = { NORTHEAST, SOUTHEAST }

You'll need to define templates for all of the required gestures. Additionally, you get to define four custom gesture shapes of your own choosing to use for tagging. It may take a bit of tweaking to come up with a gesture set that's distinguishable, and may also require some tweaks to define the templates at the proper level of specificity.

Integrating the Recognizer into Your Application

Remember that we're using a mode to distinguish ink (left mouse button) input versus gesture input (right mouse button). Gesture input should be drawn on screen while the gesture is being made, so that it provides feedback to the user. The gesture should disappear once the mouse is released.

One way to get this effect is to augment your PhotoComponent slightly, to keep a reference to the single current gesture being drawn, which may be null if no gesture is in progress. The paint code then draws the display list for strokes and text, then the current gesture (if there is one), so that the gesture appears over the top of everything else.

Note that since the gesture is only a transient feature of the photo, it should disappear once the user finishes drawing it: when the gesture is complete, it can be removed from the set of items to be displayed, and handed off to the recognizer to be processed.

If the gesture is not recognized, you should indicate this by displaying a message in the status bar that says "unrecognized gesture" or something.

If the gesture is recognized, what you do next depends on exactly what command was recognized. The next, previous, and delete photo gestures should work the same was as the Next Photo, Previous Photo, and Delete Photo controls in your application.

For the tagging gestures, you should just update the tag data associated with that photo; make sure that any changes are reflected in the state of the status buttons also. Making a tag gesture on a photo that already has that tag should remove it from the photo.

The select annotation gesture is perhaps the weirdest, because it introduces a new mode into the UI. When the loop gesture is made, you should first figure out what content it contains. The way to do this is to take the bounding box of the loop, and then iterate through your annotation content items, checking to see whether their bounding boxes are complete contained within the loop. If they are, you can consider the annotation to be a part of the selected set (you might even extend the data objects in your display list with a selected attribute). Once you know what items are selected, you can then draw them differently (bounding box, change color, etc.)

Next, if the user performs a delete annotation gesture over the selected area, any selected content should be removed. Deleted items should simply be taken out of the display list so that they do not appear.

Alternatively, once some content has been selected, the user may be preparing to move it. You detect "move mode" by looking for a mouse press and drag within the selected content area. As the user drags, the selected content should move along with the mouse pointer; this is just a matter of updating the X,Y coordinates of the content in the selected set, and redrawing. If the press happens outside of a selected item, you can "de-select" the selected stuff (take it out of the selected list and just draw it normally). This ends "move mode." The basic behavior here should be much like any paint program--when something is selected you can click into it to drag it; but as soon as you click outside or finish the move, the object is de-selected.

NOTE: You don't have to worry about what happens if the selection loop gesture cuts through one or more strokes or text blocks--in other words, you don't have to be concerned with splitting a single stroke or block of text. Objects that aren't fully contained within the gesture can be considered to be outside the selection area.

Extra Credit

As usual, there are a lot of ways to make this fancier than described.

Implement a richer command set than described here (cut/copy/paste, for instance, or changing view modes). Bonus points vary depending on how complex the new commands are, but probably +1 or +2 per new command. NOTE: if you implement additional gestures, be sure to include a description of what those gestures are so that we can test them!
Gratuitous graphical richness, such as having deleted objects crumple themselves up or vanish in a puff of smoke, or having selected items surrounded by a pulsating glow. Variable, but probably +2 up to +4 points.
Make it so that gestures can be drawn such that they don't have to be fully contained within the bounds of a PhotoComponent. This would likely involve handling the gestures in a GlassPane over the top of the components (see the Swing GlassPane documentation for details).
Allow the selection gesture to "split" objects in a clean way, or allow more complex loop shapes than just plain circles. +3 points or more.

As usual, if you do something else that's above and beyond the call of duty, let us know in your README file and we may assign some extra credit for it.

Deliverable

IMPORTANT: You'll create your own gesture shapes for the tagging part of this assignment. What this means though is that YOU MUST provide a graphical cheat-sheet description to us of what that gesture set is. Remember that SiGeR gestures are directional also (a square bracket started at the top is a different gesture than a square bracket started at the bottom). You need to provide us with enough detail that we're not having to reverse engineer your code to figure out how to make your gestures. We will deduct points from people that make us spend lots of time trying to figure out your gestures!

See here for instructions on how to submit your homework. These instructions will be the same for each assignment.