Kinect for Windows Human Interface Guidelines v1.7

©2013 Microsoft Corporation. All rights reserved.
Contents

Gesture
Learn interaction design considerations for various types of gesture.

Voice
Learn interaction design considerations for voice, including language and audio considerations.

Feedback
Learn about interaction design considerations for gestural and audio feedback, text prompts, and more.

Basic Interactions
Learn about the areas of the Kinect screen and user space, plus details on engaging, targeting, selecting, panning and scrolling, zooming, and entering text.

Conclusion
Introduction
Welcome to the world of Microsoft Kinect for
Windows–enabled applications. This document
is your roadmap to building exciting human-
computer interaction solutions you once
thought were impossible.
The Kinect for Windows sensor, SDK, and Developer Toolkit work together: the sensor passes depth data, color (RGB) camera data, and audio data from the microphone array to the SDK.
The following sections cover what this team of products does to bring natural experiences to your application. When we say "Kinect for Windows," we mean the sensor and SDK working together, unless otherwise specified.
Kinect for Windows is versatile, and can see people holistically, not just smaller hand gestures. Six people can be tracked, including two whole skeletons. The sensor has an RGB (red-green-blue) camera for color video, and an infrared emitter and camera that measure depth. The measurements for depth are returned in millimeters.

The Kinect for Windows sensor enables a wide variety of interactions, but any sensor has "sweet spots" and physical limits. Physical limits describe what the sensor can actually see; the sweet spot is the range where people experience optimal interactions, given that they often have a large range of movement and need to be tracked with their arms or legs extended.

Default mode depth ranges
• Physical limits: 0.8m (2.6ft) to 4m (13.1ft)
• Sweet spot: 1.2m (4ft) to 3.5m (11.5ft)

Near mode depth ranges
• Physical limits: 0.4m (1.3ft) to 3m (9.8ft)
• Sweet spot: 0.8m (2.6ft) to 2.5m (8.2ft)

Angle of vision (depth and RGB)
• Horizontal: 57.5 degrees
• Vertical: 43.5 degrees, with -27 to +27 degree tilt range up and down

Note that Near mode is an actual setting for Kinect for Windows, and is different from the various ranges we detail in Interaction Ranges, later in this document.

Additional sensor capabilities include skeleton tracking, seated mode, sound thresholds, and a directional microphone array.
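For illustration, the ranges above translate directly into a simple check in code. The following C++ sketch is not part of the SDK; it assumes only that your application already reads a depth value in millimeters for the tracked user, as described above.

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Default-mode ranges from the list above, in millimeters
// (the SDK reports depth measurements in millimeters).
constexpr uint16_t kPhysicalMinMm  = 800;   // 0.8m
constexpr uint16_t kPhysicalMaxMm  = 4000;  // 4m
constexpr uint16_t kSweetSpotMinMm = 1200;  // 1.2m
constexpr uint16_t kSweetSpotMaxMm = 3500;  // 3.5m

// depthMm is a hypothetical value taken from whatever depth or skeleton
// data your application already reads.
std::string ClassifyDistance(uint16_t depthMm) {
    if (depthMm < kPhysicalMinMm || depthMm > kPhysicalMaxMm)
        return "outside physical limits";
    if (depthMm < kSweetSpotMinMm || depthMm > kSweetSpotMaxMm)
        return "trackable, but outside the sweet spot";
    return "in the sweet spot";
}

int main() {
    std::cout << ClassifyDistance(2100) << "\n"; // in the sweet spot
    std::cout << ClassifyDistance(900)  << "\n"; // trackable, but outside the sweet spot
}
```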
Consider Sensor
Placement and
Environment
The situation in which your Kinect for
Windows–enabled application is used can
affect how users perceive its reliability and
usability. Don’t forget to test your application
early (and often) with the kind of environment
and setup you expect it to ultimately be used
in. Consider the following factors when you
design your application.
Interaction Design
Tenets for
Kinect for Windows
Kinect for Windows opens up a new world of
gesture design and voice design.
Overall Design
Principles
Before we go into our design guidelines,
we recommend you first read this brief
section for some very important interaction
tenets. They reflect the best ways we’ve
found to employ gesture and voice, and to
avoid some thorny issues.
Confident users are happy users.
• It's important to keep interactions simple, and easy to learn and master.
• Avoid misinterpreting user intent.
• Give constant feedback so people always know what's happening and what to expect.

The strongest designs come after user testing.
• Kinect for Windows enables a lot of new interactions, but also brings new challenges.
• It's especially hard to guess ahead of time what will work and what won't.
• Sometimes minor adjustments can make a huge difference.
• Conduct user tests often and early, and allow time in your schedule for adjustments to your design.
Strong Inputs
In order to provide a good experience and
not frustrate users, a strong voice and gesture
interaction design should fulfill a number of
requirements.
• Intuitive, with easy "mental mapping."
• Easy to back out of if mistakenly started, rather than users having to complete the action before undoing or canceling.
• Efficient at a variety of distance ranges.
• Appropriate for the amount and type of content displayed. For example, for entering text, let people use their physical keyboard or a touch interface instead of gesturing.
Do: Use each input mode for what it's naturally best at.
Don't: Require an input method that feels forced, unnatural, awkward, or tedious.

Do: Take user orientation and location into account so that input modes are switched only when it's convenient and benefits productivity.
Don't: Switch input modes for the sake of variety or effect.
Gesture
This section begins by providing some
important gesture definitions and goes on to
recommend a number of important design
considerations for gesture interaction.
Basics
In this document we use the term gesture
broadly to mean any form of movement that
can be used as an input or interaction to
control or influence an application. Gestures
can take many forms, from simply using your
hand to target something on the screen, to
specific, learned patterns of movement, to
long stretches of continuous movement using
the whole body.
Innate gestures
Gestures that users intuitively know, based on their understanding of the physical world.
Examples:
• Pointing to aim
• Grabbing to pick up
• Pushing to select

Learned gestures
Gestures that users must be taught before they can use them.
Examples:
• Waving to engage
• Making a specific pose to cancel an action

Dynamic gesture
A defined movement that allows the user to directly manipulate an object or control and receive continuous feedback (for example, sliding to confirm). Pressing to select and gripping to move are examples of dynamic gestures.

Continuous gesture
Prolonged tracking of movement where no specific pose is recognized but the movement is used to interact with the application.
Gesture
Interaction
Design
With Kinect for Windows, you can explore
the new and innovative field of gesture
interaction design. Here are some of our
key findings and considerations in making
gesture designs feel “magical.”
Accomplish gesture goals

The users' goal is to accomplish their tasks efficiently, easily, and naturally. Your goal is to enable them to fulfill theirs. Users should agree with these statements as they use gesture in your application:
• I quickly learned all the basic gestures.
• Now that I learned a gesture, I can quickly and accurately perform it.
• When I gesture, I'm ergonomically comfortable.
• When I gesture, the application is responsive and provides both immediate and ongoing feedback.
Design for reliability

Reliability should be a top priority. Without reliability, your application will feel unresponsive and difficult to use, and frustrate your users. Try to strike a good reliability balance:
• If the gesture is too circumscribed, unique, or complex, there will be fewer "false positives," but it might be hard to perform.
• If the gesture is too unspecific or simple, it will be easier to perform, but might have lots of false positives and/or conflicts with other gestures.

For more information about false positives, see Engagement, later in this document.

Do: Teach users how to effectively perform a gesture.
Don't: Drive users to other modes of input/interaction because of poor reliability.

Do: Instill confidence so users can show others how to perform a gesture.
Don't: Require such rigid gestures that users develop superstitious behaviors, like making up incorrect justifications for reactions they don't understand.

Do: Teach users how to perform a gesture early in the experience so they can use it in similar contexts.
Don't: Use different gestures for similar actions unless there is too much overlap between the gestures and natural variation in users' movement.

Do: Consider the frequency and cost of false activations.
Don't: Design multiple gestures that are too similar.
UI mindset
Challenge is frustrating. If a user is in UI mindset and can't perform a gesture, he or she will be frustrated and have low tolerance for any learning curve.

Design for natural interactions
Gesture might provide a cool new method of interacting with your application, but keep in mind that its use should be purposeful.

Do: Allow people to interact from a distance.
Don't: Try to force-fit gesture or voice on existing UI that was designed for a different input method.
Notes
Waving is one method for determining user intent. (For example, Xbox users must
wave back and forth at least three times, rotating at the elbow, for the movement to be
recognized.) Waving feels natural, yet it’s a unique movement not often misinterpreted.
It also carries meaning to people as a way to begin an interaction.
Other possible ways to recognize engagement involve how close the user is to the sensor,
or a particular pose, such as what direction he or she is facing.
You can also choose to recognize when someone disengages as a cue to explicitly turn
interactability off, rather than using a timeout or other passive method. For example, when
a user turns away from the screen, take that as a cue that they’re no longer engaged. (But
be careful not to disengage people accidentally in the middle of a task.)
Do: Use logical movements that are easy to learn and remember.
Don't: Require abstract body movements that have no relationship to the task and are hard to learn and remember.

Do: Make the size or scope of the motion match the significance of the feedback or action.
Don't: Require a big movement for a small result, like a whole arm swipe to move one item in a list.

Do: Use big, easily distinguishable movements for important and less frequent actions.
Don't: Use big movements for actions that must be repeated many times through an experience.
Design for complete gesture sets

The more gestures your application requires, the harder it is to design a strong gesture set. So, we recommend that you keep the number of gestures small, both so that they're easy to learn and remember, and so that they're distinct enough from one another to avoid gesture collisions. In addition, if you strive to be consistent with other applications, people will feel at home with the gestures and you'll reduce the number of gestures they have to learn. You'll also reduce the training you have to provide.

Here are a few things to remember when defining a gesture set:
• Make sure each gesture in an application's gesture set feels related and cohesive.
• Keep your users' cognitive load low; don't overload them with gestures to remember. Research shows that people can remember a maximum of six gestures.
• Take cues from existing gestures that have been established in other Kinect applications.
• Test thoroughly for "false positive" triggers between gestures.
• Use obvious differentiators, such as direction and hand state, to make gestures significantly different, reduce false positives, and avoid overlap.
• Make sure similar tasks use the same gesture language, so that users can guess or discover gestures through logical inference or pairing. For example, pair a swipe right (to move content right) to a swipe left (to move content left).
Do: Differentiate progression or path.
Don't: Have two gestures that follow the same path, especially if they're in the same direction.

Do: Have clear and different start and end points.
Don't: Have vague and similar start and end points that result in different actions.
Notes
Think about the whole scenario. What does the user do after completing a gesture? Might that action look like the start of an unintended gesture? Will it put them in a natural position to begin the next logical gesture for your common scenarios?

Do: Consider the movement of the whole repetition. Ignore the "return" portion of a repeated gesture if it will disrupt the ability of the user to repeat the gesture smoothly.
Don't: Design the opposite gesture to resemble the "return" portion of the first gesture.

Do: Use two-handed gestures for noncritical tasks (for example, zooming) or for advanced users. Two-handed gestures should be symmetrical because they're then easier to perform and remember.
Don't: Require the user to switch between one- and two-handed gestures indiscriminately.
Remember that
fatigue undermines
gesture
Your user shouldn’t get tired because of
gesturing. Fatigue increases messiness, which
leads to poor performance and frustration,
and ultimately a bad user experience.
Other factors to keep in mind include tracking speed, field of view, and tracking reliability.
Iterate
Finally, getting a gesture to feel just right
might take many tweaks and iterations.
Create parameters for anything you can,
and (we can’t say it enough) conduct
frequent usability tests.
Do: Design a gesture that works reliably for your whole range of users.
Don't: Design a gesture that works for you but no one else.
Voice
Besides gesture, voice is another input
method that enables new and natural-feeling
experiences.
Basics
Using voice in your Kinect for Windows–
enabled application allows you to choose
specific words or phrases to listen for and
use as triggers. Words or phrases spoken as
commands aren’t conversational and might not
seem like a natural way to interact, but when
voice input is designed and integrated well, it
can make experiences feel fast and increase
your confidence in the user’s intent.
About confidence levels

When you use Kinect for Windows voice-recognition APIs to listen for specific words, confidence values are returned for each word while your application is listening. You can tune the confidence level at which you will accept that the sound matches one of your defined commands.

• Try to strike a balance between reducing false positive recognitions and making it difficult for users to say the command clearly enough to be recognized.
• Match the confidence level to the severity of the command. For example, "Purchase now" should probably require higher confidence than "Previous" or "Next."
• It is really important to try this out in the environment where your application will be running, to make sure it works as expected. Seemingly small changes in ambient noise can make a big difference in reliability.
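One way to apply the severity advice above is to keep a per-command confidence threshold. The following C++ sketch is illustrative only and does not use the actual Kinect speech APIs; the command names and threshold values are assumptions you would tune for your own application.

```cpp
#include <iostream>
#include <map>
#include <string>

// Illustrative per-command thresholds: costly or hard-to-undo commands
// demand higher confidence than cheap, easily reversible ones.
const std::map<std::string, double> kMinConfidence = {
    {"purchase now", 0.90},
    {"delete item",  0.90},
    {"next",         0.60},
    {"previous",     0.60},
};

// 'word' and 'confidence' stand in for whatever your speech-recognition
// layer reports for a recognized phrase.
bool ShouldAccept(const std::string& word, double confidence) {
    auto it = kMinConfidence.find(word);
    if (it == kMinConfidence.end()) return false;  // unknown command
    return confidence >= it->second;
}

int main() {
    std::cout << ShouldAccept("purchase now", 0.75) << "\n"; // 0: not confident enough
    std::cout << ShouldAccept("next", 0.75)         << "\n"; // 1: accepted
}
```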
Always on,
active listening
Word length
Be wary of one-syllable keywords, because they're more likely to overlap with others.

Do (longer, distinct phrases): "Play this one," "Stop video," "Show more songs," "Scroll right," "Go back"
Don't (one-syllable keywords): "Play," "Stop," "More," "Scroll," "Back"
Simple vocabulary

Use common words where possible for a more natural-feeling experience and for easier memorization.

Do (common words): "Turn up," "Red," "First"
Don't (uncommon words): "Max out," "Crimson," "Initial"
Go home, Go back,
Next page, Previous page
Word alternatives
User prompts
You can require higher confidence levels (80% to 95%) – that is, having Kinect for Windows respond only when it's certain that the user has given the correct trigger. This might make it harder for users to interact, but reduces unintentional actions.
Acoustics

Test your words and phrases in an acoustic environment similar to where you intend your application to be used.

Triggering audio
Voice Interaction
Design
Generally, it’s best to avoid forcing people to
memorize phrases or discover them on their
own. Here are a few best practices for helping
users understand that they can use voice, and
learn what words or phrases are available.
Listening mode
User assistance
Visual notifications
Audio prompting
There are a few environmental considerations that will have a significant effect on whether or not you can successfully use voice in your application.

Ambient noise
Voice recognition tolerates only a small amount of background noise (up to around 20dB). This means that if there's other conversation in the room (usually around 60-65dB), the accuracy of your speech recognition is reduced. Amplify that to the sound level of a mall or cafeteria and you can imagine how much harder it is to recognize even simple commands in such an environment. At some level, ambient noise is unavoidable, but if your application will run in a loud environment, voice might not be the best interaction choice. Ideally, you should only use voice if:
• The environment is quiet and relatively closed off.
• There won't be multiple people speaking at once.

Ambient noise also plays a role in making it harder for the sensor to hear someone as they get farther away. You might have to make adjustments to find a "sweet spot" for your given environment and setup, where a voice of normal volume can be picked up reliably.

In an environment with low ambient noise and soft PC sounds, a user should be able to comfortably speak at normal to low voice levels (49-55dB) at both near and far distances.
Social considerations
Keep in mind the social implications of your users needing to say commands loudly while using
your application. Make sure that the commands are appropriate for the environment, so you
don’t force your users to say things that will make them uncomfortable. Also make sure that
the volume at which they have to speak is appropriate for the environment. For example,
speaking loud commands in a cubicle-based office setup might be distracting and inappropriate.
Feedback
Whether you employ gesture, voice, or
both, providing good feedback is critical
to making users feel in control and helping
them understand what’s happening in the
application. This section covers some ways you
can make feedback as strong as possible.
Basics
It’s important, especially if your users are
standing at a distance and have little direct
contact with the interface, to take extra care
in showing them how their actions map to
your application. Also, because a gesture is
only effective and reliable when users are
in the correct visibility range, it’s important
to give feedback to them in case they don’t
know when they’re out of range.
• What does the sensor see?
• Where's the field of view?
• How many people can the sensor see?
• How do I know it's seeing me and not someone else?
• How much of me can the sensor see?
• Is my head in view?
• When and where can I gesture?
Notes
Many of these questions can be answered by displaying a small User Viewer (visualizing
depth) on the screen to show what Kinect for Windows sees at any given time.
Highlighting players, hands, or other joints in the depth viewer might also be helpful.
You can also prompt people to move into the appropriate field of view whenever they’re
cropped, too close, or too far back.
Feedback
Interaction
Design
This section gives tips for designing various
kinds of feedback, including selection states,
progress indicators, and other visuals, as well
as audio feedback.
Best practices

There are several best practices that apply whether you're designing for gesture or voice.

Make it clear what content the user can take action on, and how

Differentiated controls
Use iconography, colors, or tutorials to show users how to differentiate between controls they can activate, text prompts for voice input, and other text and content. For more information, see Targeting and Selecting, later in this document.
Input suggestions
Use iconography or
tutorials to show users
what input methods are
available to them.
Gesture suggestions
Visual feedback
Command suggestions
Audio notifications
Progress feedback
For example, for targeting and selecting, with a gesture that requires
Z-axis movement, using size changes to show depth helps users
understand the action required (see Kinect Button, later in this
document).
UI controls
Selection confirmation
Ergonomics
Clear visuals
User orientation
Layout continuity
For example, horizontal animation can show that the user has moved
horizontally in space in your application’s layout.
Use skeleton tracking feedback

Full-body skeleton tracking provides a wide range of new application possibilities. You can use feedback both to lead and confirm the user's movements.

Make sure users know whether the sensor sees them

User resets
If you need to track a user but the sensor cannot see them, let the user know where to stand so that it can (for example, "Please move left").
Lead by example
Real-time movements
Realistic reactions
Smooth movements
Teaching cues
Instructional audio
Activity signals
Consider using sounds that match the action the user is taking, to
enforce his or her feeling of control and familiarity – for example,
a click noise when a button is pressed.
For example, as a user presses a button, it grows larger. The result seems like a very "hands-on" experience, where the user can almost feel the effects of his or her movements.
Basic Interactions
Although we leave it up to you to create
exciting and unique experiences with Kinect for
Windows, we’ve taken care of some of the basic
interactions and controls for you and included
them in the Developer Toolkit. Using our
interactions and controls saves you time and
also establishes some consistency that your users
will learn to expect as they encounter Kinect for
Windows experiences in various aspects of their
daily lives. We’ve spent a lot of time tuning and
testing this first set of interactions to give you
the best foundation possible, and we’ll add more
in our future releases.
Screen resolution
The Kinect for Windows controls we’ve built were designed for 1920x1080 resolution screens.
Whether you’re using these controls or creating your own, keep in mind that different screen
resolutions affect the intended size of the control. Also, because the Physical Interaction Zone
(or PHIZ, described in the following section) has a fixed size relative to the person, and doesn’t
adjust for the screen size or orientation, it might be helpful as you resize your controls to focus
on fine-tuning the ratio of control size to PHIZ size, to ensure that the experience is still reliable
and the control is still easily selectable.
The smallest button we’ve designed is 220 by 220px in 1920x1080 resolution. We’ve tested to
ensure that this button size and resolution combination makes it possible to have close to 100
percent accuracy for most users. If you have a different resolution, however, you need to resize
the button to make sure the ratio is maintained, to keep the reliability. The following chart
shows how this small button size translates for different resolutions.
Resolution and minimum button size
• 1920x1080 (Full HD): 220 px
• 720x480 (SD): 98 px
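If you target a different resolution, the scaling is a straightforward ratio. The sketch below assumes the 220px minimum scales with vertical resolution, which is consistent with the 98px figure for SD in the chart above; treat the helper as illustrative rather than part of the Toolkit.

```cpp
#include <cmath>
#include <iostream>

// Smallest recommended button: 220px on a 1080-line screen.
constexpr double kReferenceButtonPx = 220.0;
constexpr double kReferenceHeightPx = 1080.0;

// Scale the minimum button size to another vertical resolution,
// keeping the button-to-screen ratio constant.
int MinButtonSizeFor(int verticalResolution) {
    return static_cast<int>(
        std::round(kReferenceButtonPx * verticalResolution / kReferenceHeightPx));
}

int main() {
    std::cout << MinButtonSizeFor(1080) << "\n"; // 220
    std::cout << MinButtonSizeFor(480)  << "\n"; // 98 (matches the SD row above)
    std::cout << MinButtonSizeFor(768)  << "\n"; // 156 for a 1366x768 display
}
```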
Setting the
Stage: the Kinect
Region, the
PHIZ, and the
Cursor
The Kinect Region, the Physical Interaction
Zone (PHIZ), and the cursor are all things
that you need to get started with your Kinect
for Windows–enabled interface. You can
easily set them up so they’ll work the best
for your scenario.
Notes
To try out how it would feel with another setup, use a mouse to click the Restore button
at the upper-right of the Interaction Gallery sample window, change the size of the
window, and drag it to different parts of the screen. Also, notice that all controls in the
sample are mouse-enabled, but are built to be used primarily with Kinect for Windows
interactions, with mouse as more of an administrative or testing input. If part of your
application is mouse- and keyboard-enabled and part is Kinect for Windows–enabled,
consider putting the non–Kinect-enabled controls inside the Kinect Region and then
hiding them, along with the mouse cursor, when the user interacts with Kinect for
Windows. For more information, see Multiple Inputs, later in this document.
Axis
We measure X and Y
dimensions as if the curved
surface were a rectangle. We
measure Z by arm extension,
because we’ve found that
this is not always the same
as Z space between the user
and the sensor.
Ergonomics
The cursor provides visual feedback for the following states:
• Default targeting state
• Targeting over something that is actionable (grip or press)
• Progress indication (the color fills the hand as the user presses further)
• Fully pressed state
• Gripped hand detected
• Right hand vs. left hand cursors
The cursor moves freely within the Kinect Region only when the user’s hand is in the PHIZ; if
the user’s hand is anywhere above and to the sides of the PHIZ, the cursor sticks to the top, left,
or right of the window, and becomes semitransparent. This provides feedback for how to get
back into the area where the user can interact. When the user’s hand falls below the PHIZ, the
cursor drops off the bottom of the screen and is no longer visible, allowing the user to rest or
disengage.
Because we’ve built only one-handed interactions (see Vary one-handed and two-handed
gestures, earlier in this document), by default we show only one cursor for one user at any
given time. This cursor is determined by the hand of the primary user who first enters the PHIZ
(see the following section, Engagement). The primary user can switch hands at any time by
dropping one hand and raising the other.
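The cursor behavior just described is essentially a clamped mapping from the PHIZ onto the Kinect Region. The following sketch is a rough illustration of that mapping, not the Toolkit's implementation; it assumes hand coordinates have already been normalized so that 0 to 1 spans the PHIZ.

```cpp
#include <algorithm>
#include <iostream>

struct CursorState {
    double x = 0, y = 0;          // pixel position inside the Kinect Region
    bool visible = true;          // hidden when the hand drops below the PHIZ
    bool semiTransparent = false; // sticky at the top/left/right edges
};

// handX/handY: hand position normalized to the PHIZ (0..1 inside it).
// regionWidth/regionHeight: size of the Kinect Region in pixels.
CursorState MapHandToCursor(double handX, double handY,
                            double regionWidth, double regionHeight) {
    CursorState c;
    if (handY > 1.0) {            // below the PHIZ: let the user rest or disengage
        c.visible = false;
        return c;
    }
    // Above or to the sides of the PHIZ: stick to the window edge and fade,
    // which tells users how to get back into the interactive area.
    c.semiTransparent = (handX < 0.0 || handX > 1.0 || handY < 0.0);
    c.x = std::clamp(handX, 0.0, 1.0) * regionWidth;
    c.y = std::clamp(handY, 0.0, 1.0) * regionHeight;
    return c;
}

int main() {
    CursorState s = MapHandToCursor(1.2, 0.5, 1920, 1080);
    std::cout << s.x << ", " << s.y << " transparent=" << s.semiTransparent << "\n";
}
```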
Engagement
With most human-computer interactions, it’s
easy to know when users mean to interact with
the computer, because they deliberately move
the mouse, touch the keyboard, or touch the
screen. With Kinect for Windows, it’s harder
to distinguish between deliberate intent to
engage and mere natural movement in front
of the sensor. We leave the details of how to
handle this up to you, because your specific
scenario might have special requirements or
sensitivity.
The Kinect for Windows interaction model considers the engaged user to be whoever is in control of the application. By default, the engaged user is the same as the primary user, and the primary user is selected based solely on the user hand states observed. These defaults can be overridden. The hand states are:
• Not tracked – No hands are being tracked.
• Tracked – Kinect is tracking at least one skeleton.
• Active – Hand is tracked, and raised anywhere above the bottom of the PHIZ, at any reachable arm extension – for example, when the arm is reached all the way up, out to the right, or out to the left, and anywhere in between.
• Interactive – Hand is tracked, active, and inside the PHIZ (a smaller rectangle that is within the active area described above).
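These states can be expressed as a small classification routine. The sketch below is one possible reading of the definitions above, with the inputs (whether a skeleton and hand are tracked, and where the hand is relative to the PHIZ) standing in for whatever your tracking layer provides.

```cpp
#include <iostream>

enum class HandState {
    NotTracked,   // no hands are being tracked
    Tracked,      // a skeleton is tracked, but the hand isn't raised
    Active,       // hand raised above the bottom of the PHIZ, any extension
    Interactive,  // hand inside the PHIZ itself
};

// handTracked: a hand joint is being tracked for this user.
// aboveBottomOfPhiz: the hand is anywhere above the PHIZ's bottom edge.
// insidePhiz: the hand is inside the PHIZ rectangle.
HandState Classify(bool skeletonTracked, bool handTracked,
                   bool aboveBottomOfPhiz, bool insidePhiz) {
    if (!skeletonTracked || !handTracked) {
        return skeletonTracked ? HandState::Tracked : HandState::NotTracked;
    }
    if (insidePhiz)        return HandState::Interactive;
    if (aboveBottomOfPhiz) return HandState::Active;
    return HandState::Tracked;
}

int main() {
    HandState s = Classify(true, true, true, false);
    std::cout << "active? " << (s == HandState::Active) << "\n"; // 1
}
```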
Primary user
Only one user can control the single cursor at any time; this person is the primary user. Default
primary user selection can be overridden by your application to fit your scenario:
• The primary user is designated when the first tracked person raises his or her hand.
• When the primary user drops his or her hand below the PHIZ (out of the active area), or
leaves the view of the sensor, he or she stops being the primary user, and another skeleton-
tracked user can raise a hand to become the new primary user.
• While the primary user has an active hand, another user cannot take over.
• Even though the sensor can see up to six people, only two of them will have tracked
skeletons. The primary user can only be one of these two fully tracked users.
Notes
It might make sense for your application to cycle through tracked skeletons, to enable
all (up to six) users to have the opportunity to be active.
We’ve chosen some generic colors as default for this control, but you can set them to
whatever’s appropriate for your design and brand.
Including the User Viewer in your application helps your users understand how to place
themselves at the right distance and orientation for the sensor to see them. It also reduces any
feeling that the application or Kinect for Windows is buggy, unresponsive, or unexplainably
jumpy, and helps users figure out what is happening and how to make their experience better.
As mentioned above, this feedback is especially important in Kinect for Windows interactions,
because users moving freely in space will quickly feel a frustrating lack of control if the system
does not behave as they expected. For more information, see Feedback, earlier in this document.
You can size and place the User Viewer anywhere in your user interface.
In the Interaction Gallery sample, we use the user viewer in two different ways:
Notes
With the high-barrier solution, after the low-barrier criteria are met, hinting could show
what else is required to complete engagement.
Common false positives for engagement

Your challenge is to design actions that feel easy to the user, but also don't risk false positives. Here are some common user behaviors that can potentially be misinterpreted:
• Holding something (such as a drink or mobile phone)
• Moving a hand toward the body or resting it on the body
• Touching the face or hair
• Resting an arm on the back of a chair
• Yawning and extending arms
• Talking and gesturing with other people
Notes
This is a fairly high barrier to entry, requiring a user to perform a deliberate action in
order to engage. Another option would be to skip this step and go directly to the home
page after a user raises his or her hand into the PHIZ.
User handoff

After initial engagement, the Interaction Gallery sample demonstrates how users can hand off to another user. The following occurs in the event that one user decides to relinquish control to another:
• If, at any time, the primary (interacting) user drops her hand below the PHIZ, a second skeleton-tracked user has a chance to take over control and is assigned a blue color in the User Viewer.
• Messaging comes out from the small User Viewer to inform the second tracked user that he has an opportunity to raise a hand to take over.
• If the second user raises his hand, he has temporary control of the cursor and is prompted to press a button to take over.
• If, at any time before the second user presses, the first user raises her hand, she is still the primary user and will remain in control of the application, removing the ability for the second user to take over.
• If the second user completes the press, he is now the primary user, and has control over the cursor.
• If the primary user leaves, the application will give any other tracked user the chance to confirm engagement and remain in the current view.
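Taken together, these rules form a small state machine. The sketch below is one way to read them; it is not the Interaction Gallery's actual code, and the user identifiers and events are assumed to come from your own tracking and button logic.

```cpp
#include <optional>

// Illustrative handoff logic: 'primary' keeps control until the candidate
// confirms the takeover by completing a button press.
struct Handoff {
    std::optional<int> primary;    // user id currently in control
    std::optional<int> candidate;  // second tracked user offered control

    void PrimaryDroppedHand(int secondUserId) {
        // A second skeleton-tracked user gets a chance to take over.
        candidate = secondUserId;
    }
    void PrimaryRaisedHandAgain() {
        // Before the candidate presses, the original user keeps control.
        candidate.reset();
    }
    void CandidatePressedTakeOverButton() {
        if (candidate) { primary = candidate; candidate.reset(); }
    }
    void PrimaryLeft() {
        // Give any other tracked user the chance to confirm engagement.
        primary.reset();
    }
};

int main() {
    Handoff h;
    h.primary = 1;              // user 1 engaged first
    h.PrimaryDroppedHand(2);    // user 2 is offered control
    h.CandidatePressedTakeOverButton();
    return h.primary.value() == 2 ? 0 : 1;  // user 2 is now primary
}
```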
Targeting
After users have engaged with a Kinect for
Windows application, one of the first tasks is
to target an object (get to an area or item they
want to open or take action on).
Do: Make it clear which objects the user can take action on.
Don't: Make users guess which items they can interact with.
Selecting
Typically, users target an item in order to take
action on it. For example, they may want to
select a button that triggers a reaction, such as
navigation, in an application. In this document,
when we refer to selecting, we mean selecting
an item and triggering an action.
With the Kinect Button and Kinect Cursor components in the Developer Toolkit, we are
introducing a new interaction for selecting with Kinect for Windows: pressing. We’ve
found that when presented with a Kinect for Windows control, users naturally think of
pressing with their hand in space; it maps nicely to their understanding of the physical
world where they press physical buttons or extend their arm to point at objects.
Currently, the components we’ve built support only the ability to select an item to
trigger an action, but you could extend and build on the interaction to make it work
for other selection scenarios, such as toggling multiple items in a list.
Along with the press interaction, we’ve provided you with two styles of Kinect Button
as components in the Developer Toolkit, designed specifically to provide a good
pressing experience as well as feedback. We’ll cover our new button styles and tips
later in this section.
Notes
Although we’ve built only buttons so far, we think it’s a good idea to use the press
interaction for other selectable controls where it feels natural. If you’re building your own
Kinect for Windows controls, consider using the press gesture where it translates well, to
keep experiences consistent and avoid forcing users to learn a large gesture set.
Kinect Button

The Developer Toolkit provides you with two styles of Kinect Button, a component we've built to make the most common selectable control easy to add to your application and to customize for your experience. We've built in feedback to help users understand that the control is targetable, which control they're targeting, and how to interact with it in 3D space. A combination of the button's changing size and the cursor visuals helps users understand that they need to press and how far they need to press in order to select it. It also shows them if they're in the middle of a press so they can cancel if they need to.

In designing pressing with our button controls, we decided to trigger on release, similarly to standard touch and mouse behavior. This enables users to cancel a press at any time by moving their hand up, left, or right. Moving their hand down will trigger a release, however, because we found that as people get confident with their presses, they often drop their hand as they release. Although the buttons were designed for Kinect for Windows interactions, they also respond to mouse, keyboard, and touch input.
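The trigger-on-release rule can be sketched as follows. This is not the Toolkit's implementation; pressProgress is assumed to be a 0-to-1 value derived from arm extension, and the movement flags are whatever hand displacement your code measures since the press began.

```cpp
// The press commits on release (progress falling back after reaching 1.0)
// or on a downward drop, and cancels if the hand moves up, left, or right
// away from the button.
struct PressTracker {
    bool reachedFullPress = false;

    enum class Result { None, Triggered, Canceled };

    Result Update(double pressProgress, bool movedUpLeftOrRight, bool movedDown) {
        if (movedUpLeftOrRight) {            // cancel at any time
            reachedFullPress = false;
            return Result::Canceled;
        }
        if (pressProgress >= 1.0) reachedFullPress = true;
        // Trigger on release: the hand pulls back after a full press, or it
        // drops downward, which confident users often do as they release.
        if (reachedFullPress && (pressProgress < 1.0 || movedDown)) {
            reachedFullPress = false;
            return Result::Triggered;
        }
        return Result::None;
    }
};

int main() {
    PressTracker t;
    t.Update(0.5, false, false);           // pressing in
    t.Update(1.0, false, false);           // fully pressed
    auto r = t.Update(0.8, false, false);  // released: should trigger
    return r == PressTracker::Result::Triggered ? 0 : 1;
}
```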
• The design is a circle surrounding a glyph of your choice; you can also add text
below or to the right of the circle.
• You can scale the size of the circle and text to fit your needs and resolution.
• You can replace the circle and glyph with an image of your choice.
• Make sure that they are easy to target and select on your resolution and screen size.
• The button and text are inside a rectangular area that is all hit-targetable – this
enables users to be less accurate and still hit the button. The area of the button is
larger than the visuals, which helps reduce clutter without making users struggle
to select the buttons.
• By default you can use them in black or white, or you can re-template them to
use a brand color of your choice.
Panning and
Scrolling
Scrolling enables users to navigate up and
down, or left and right; panning can enable
users to navigate freely in X and Y within a
canvas, like dragging a finger over a map
on a touchscreen. Experiences often allow
users to scroll and pan continuously, pixel by
pixel through content, at intervals through
a surface or page, or between consecutive
screens or views.
With the Kinect Scroll Viewer component, we’re introducing a new direct-manipulation,
continuous panning and scrolling interaction: grip and move. Kinect for Windows can detect
the user’s hand closing into a fist, called gripping. This interaction is a solution for scrolling
through lists or panning on large canvases, but might not be the strongest interaction for
navigation between discrete pages, screens, or views. The gesture allows for a high level of
control, but can be fatiguing to do repeatedly or for large jumps. You can place content or
controls inside a Kinect Scroll Viewer to add this experience to an application.
Notes
Although we’ve built grip recognition to work specifically for scrolling or panning,
consider using it for similar interactions, such as zooming, drag and drop, or rotating.
Gripping works best if the user is no more than 2m away from the sensor.
Gripping works best if users’ wrists are easily seen. Encourage users to remove large coats
or items on their wrists before interacting by gripping.
For paging or view-changing scrolling scenarios, consider using Kinect Buttons in the
scroll viewer, or above it, to help jump users to the place they’re looking for. When there
are discrete sections, it may be faster and less frustrating to navigate straight to them,
rather than scroll to them with direct manipulation.
Why is gripping to scroll better than hovering?

The Basic Interactions sample from the 1.6 version of the Developer Toolkit showed an example of scrolling through a list of items by targeting a large button and hovering over it, making the canvas move at a constant pace. Like using hovering to select, it was very easy and reliable to use, but also frustrating and slow. Although there are ways to make hovering to scroll work better, such as allowing acceleration, we've found that direct manipulation with grip is a fun interaction and allows users to control their speed and distance more deliberately.

As we worked on this new interaction (the 1.7 grip model, replacing the 1.6 hover model) with panning and scrolling in mind, we had the following goals:
• Provide feedback when the user grips and releases.
• Enable users to successfully scroll short distances with precision.
• Enable users to scroll longer distances without frustration or fatigue.
Developer Options
• You can enable and disable scrolling in X or Y axes.
• The control allows free panning when both X and Y are enabled (imagine dragging
around a canvas).
• Use the Kinect Items Control to create a databound list of items to scroll through.
User Experience
• Users can grip anywhere within the Scroll Viewer and drag to directly manipulate
the canvas.
• Users can grip and fling to scroll longer distances, and the canvas will continue
moving while being slowed by a set amount of friction.
• The Kinect Scroll Viewer tries to correct for accidental scrolls in the wrong direction
as users repeatedly fling.
• Users can stop a moving canvas at any time by gripping or pressing on the scrolling area.
• When the end of a scrollable area is reached, it has a slight elastic bounce to provide
feedback to the user.
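The fling behavior described in the bullets above is essentially inertial scrolling with friction and an elastic boundary. The following sketch illustrates one such update loop; the friction and bounce constants are made up for the example, not the Kinect Scroll Viewer's real tuning.

```cpp
#include <iostream>

// One possible per-frame update for a flung canvas: velocity decays with
// friction, and overshooting either end springs back elastically.
struct FlingScroller {
    double position = 0;   // current scroll offset in pixels
    double velocity = 0;   // pixels per second, set when the user releases a fling
    double minPos = 0, maxPos = 2000;

    void Update(double dtSeconds) {
        const double friction = 3.0;   // illustrative decay constant
        const double bounce   = 8.0;   // illustrative spring-back strength
        position += velocity * dtSeconds;
        velocity -= velocity * friction * dtSeconds;
        if (position < minPos) velocity += (minPos - position) * bounce * dtSeconds;
        if (position > maxPos) velocity += (maxPos - position) * bounce * dtSeconds;
    }
    void StopOnGripOrPress() { velocity = 0; }  // users can stop the canvas at any time
};

int main() {
    FlingScroller s;
    s.velocity = 1500;                               // released with a fling
    for (int i = 0; i < 60; ++i) s.Update(1.0 / 60); // simulate one second
    std::cout << "position after 1s: " << s.position << "\n";
}
```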
Users should be able to scroll or pan by gripping any portion of the screen that actually
moves when scrolled (any part of the Kinect Scroll Viewer). When the Kinect Cursor is over the
Kinect Scroll Viewer, by default the background color of the Kinect Scroll Viewer changes to
indicate the grippable area. We found that this color change makes the interaction easier to
understand and complete. The Kinect Scroll Viewer enables users to move their gripped fist
slowly for finer control, or “fling” the content if they want to traverse a longer distance. The
fling gesture is particularly helpful when users want to reach the beginning or end of a list
quickly. Kinect for Windows ignores pressing when the user’s hand is gripped, except to stop
scrolling, or unless the user has already completed a press when a grip is detected.
The visual padding at either end of the Kinect Scroll Viewer, along with the elastic effect
during scrolling, and the bounce when the end is hit from a fling, help to indicate to the user
that they’ve reached the beginning or end of a list.
We suggest that you avoid using long scrollable lists of content in applications, because
repeatedly doing any gesture can be fatiguing and frustrating. Try to ensure that most users
can reach either end of a list with no more than two or three repetitions. Grip-and-move to
scroll or pan can be a fun and novel experience, but grip recognition while users are quickly
moving their hands is not extremely reliable, so we suggest that you reserve the Kinect Scroll
Viewer for non-critical tasks. Combining grip-and-move with other gesture interactions might
also make them both slightly less reliable.
Ergonomically, horizontal scrolling is usually easier and more comfortable for people than
vertical scrolling. Where possible, structure your user interface to allow for horizontally
scrolling content.
Also, remember that, as with any new gesture, user education is important. Many users
have never experienced a grip-and-move interaction before. Grip recognition works best
when users are deliberate about their hand positions. Sometimes half-closed hands are
misrecognized as grips. Many users figure this out quickly, but having clear messaging can
help avoid initial confusion or frustration.
User Interface
• The color of the overlay that appears when the Kinect Cursor enters the Kinect Scroll Viewer
is configurable. By default, it is a semi-transparent gray.
• The only visible part of the Scroll Viewer is the scrollbar.
• A small panning indicator shows the users’ current location in the canvas and the amount of
canvas available. Users cannot grab the panning indicator; it is only provided as visual feedback.
The panning indicator only appears when the user is actively scrolling by gripping (not with
a mouse).
• If mouse movement is detected, the scrollbar appears, with a traditional thumb control and
two arrows. After mouse movement stops, the scrollbar times out and disappears, returning
to the panning indicator.
Notes
Horizontal scrolling is easier ergonomically than vertical scrolling. If you have vertical
scrolling, do not design it to span the entire height of the screen.
Kinect Scroll Viewer areas that take up larger screen space are easier to scroll through.
It is less fatiguing for users if they don’t have to reach across their body to scroll.
Be sure to provide clear user education when you include grip scrolling in an interface.
In the Interaction Gallery sample, the Kinect Scroll Viewer is used in two places:
• In the Pannable Map view, for 2D panning
• In the Article view, for scrolling the text vertically
Zooming (Z-Axis
Panning)
Zooming makes objects on the screen larger
or smaller, or displays more or less detail. Many
people are familiar with using a zoom control
with a mouse and keyboard, or pinching to
zoom on a touch screen. Zooming with Kinect
for Windows can be especially challenging
because it’s much harder to be precise about
distance, or start and end points, when users
aren’t directly touching the surface.
Zoom control UI
VUI
The goal of zooming is to manipulate an object on the screen and see the results. Here
are some ways to map user actions to zoom responses. As with anything that requires
direct manipulation, try to avoid lag, and keep the UI as responsive as possible.
Proportion of change in
two-handed zoom
This might feel most intuitive and familiar to users, because it’s similar
to the touch interaction. However, it can be a challenge to implement it
well. For example, you need to define the start and end of the gesture
accurately to make the zoom level stick at the appropriate location.
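A common way to implement the two-handed approach is to scale content by the ratio of the current distance between the hands to the distance when the gesture started. The sketch below shows only that proportion; detecting the start and end of the gesture, as noted above, is assumed to happen elsewhere.

```cpp
#include <cmath>
#include <iostream>

struct Point { double x, y; };

double Distance(Point a, Point b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Zoom factor relative to the zoom level at the moment the gesture began.
double TwoHandedZoomFactor(Point leftStart, Point rightStart,
                           Point leftNow, Point rightNow) {
    double startDist = Distance(leftStart, rightStart);
    double nowDist   = Distance(leftNow, rightNow);
    if (startDist <= 0.0) return 1.0;
    return nowDist / startDist;  // >1 zooms in, <1 zooms out
}

int main() {
    double f = TwoHandedZoomFactor({-0.2, 0}, {0.2, 0}, {-0.3, 0}, {0.3, 0});
    std::cout << "zoom factor: " << f << "\n"; // 1.5
}
```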
Z-axis zoom
This can be tedious and frustrating for big zooms, but can work well
for semantic zooming.
Text Entry
Voice and gesture aren’t strong input methods
for text entry, and you should avoid designing
for it. Keep in mind that text entry with gesture
is usually a series of targeting and selecting
actions, which can get frustrating and tiring if
the user must do it multiple times in sequence.
If you expect users to need to enter text, and
they have access to alternative inputs such as
a touchscreen, it’s best to direct them to that
input method.
Virtual keyboard
A virtual keyboard is a
text-entry UI that people
might be familiar with. It allows for brief text entry by
targeting and selecting from
a list of letters.
Most targeting and selecting enhancements we’ve described for other inputs can be combined
to make text entry easier. For example, it can be useful to increase collision volume (a specified
range beyond the visual boundary within which an object responds to input) based on predicted
words, or filter letters available based on completion. As always, be sensitive to being too forceful
or presumptuous about what your user’s intent is, and leave them in control.
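One way to read the collision-volume suggestion above is to enlarge the hit targets of letters that could plausibly come next, while leaving the visuals unchanged. The sketch below is purely illustrative; the prediction source and the enlargement factor are assumptions.

```cpp
#include <iostream>
#include <set>

// Grow the invisible hit area of likely next letters, leaving visuals alone,
// so imprecise presses still land on the letter the user probably wants.
double HitRadiusFor(char letter, const std::set<char>& predictedNextLetters,
                    double baseRadiusPx) {
    const double boost = 1.4;  // illustrative enlargement for predicted letters
    if (predictedNextLetters.count(letter)) return baseRadiusPx * boost;
    return baseRadiusPx;
}

int main() {
    std::set<char> predicted = {'e', 'i'};  // e.g., letters a word predictor suggests next
    std::cout << HitRadiusFor('e', predicted, 110) << "\n"; // 154
    std::cout << HitRadiusFor('z', predicted, 110) << "\n"; // 110
}
```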
Do: Enable text entry for searching, or filter through a small set where only a few letters are required.
Don't: Require long text entry with a gesture that imitates a keyboard experience.

Do: Enable voice text entry with a small number of recognized words.
Don't: Require voice text entry with individual letters (sounds are too similar: "B," "P," "V").

Do: Use voice for short phrases and for a limited and appropriate set of tasks.
Don't: Require long phrase dictation or conversational voice input.
Additional
Interactions
In addition to the basic interactions of
targeting and selecting, scrolling, zooming,
and entering text, Kinect for Windows
enables more complex distance-dependent
interactions, as well as multiple input modes
and multiple-user scenarios.
Distance-
Dependent
Interactions
With Kinect for Windows, users no longer
need to directly touch a computer in order to
interact with it. Of course, this introduces an
interesting set of considerations for designing
interactions and interfaces for larger distances.
This section describes how you can make your
interface more usable at any distance.
Users can interact with Kinect for Windows from a variety of distances. These distances are divided into the following categories: Out of Range, Far Range, Near Range, and Tactile Range. (Note that Near Range does not refer to the Near mode setting detailed in Meet the Kinect for Windows Sensor, earlier in this document.)

Out of Range
Most Kinect for Windows interactions aren't feasible at this range. Your UI should focus on broadly informing users that an interesting interaction is available, and enticing them to move closer with an Attract view.

Far Range
Visuals must be very large and simple, and in some cases you can use audio to get users' attention. You could use this range to see general shapes – for example, to see where movement is in the room.

Near Range
Because the user is near to the sensor, you can have fairly detailed visuals, longer phrases for speech, and finer gestures and depth data. This is also a good range to work in if you plan to require object or symbol recognition.

Tactile Range
Because the user is close to the screen, this range can have the highest level of detail for visuals and controls. (Many items, small items, and fine detail.)
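In code, these categories often end up as a simple switch that decides how much detail to present. The boundary distances in the sketch below are illustrative only; the guidelines above don't prescribe exact cut-offs, so choose values that fit your sensor placement.

```cpp
#include <iostream>
#include <string>

enum class Range { Tactile, Near, Far, OutOfRange };

// Boundary values here are illustrative, not taken from the guidelines.
Range ClassifyRange(double meters) {
    if (meters < 0.4)  return Range::Tactile;
    if (meters < 2.0)  return Range::Near;
    if (meters <= 4.0) return Range::Far;
    return Range::OutOfRange;
}

std::string UiApproach(Range r) {
    switch (r) {
        case Range::Tactile:    return "highest detail: many small items and fine controls";
        case Range::Near:       return "fairly detailed visuals, longer speech phrases, finer gestures";
        case Range::Far:        return "very large, simple visuals; audio to get attention";
        case Range::OutOfRange: return "attract view: entice the user to move closer";
    }
    return "";
}

int main() {
    std::cout << UiApproach(ClassifyRange(1.5)) << "\n";
}
```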
Multiple Inputs
Much as we often use both voice and
gesture to communicate with each other,
sometimes the most natural way to interact
with an application can involve multiple input
methods. Of course, with some options and
combinations, you risk creating a confusing
experience. This section outlines the ways to
make multimodal actions clear and usable.
Based on your use scenario, think about how you want your controls to
handle multiple inputs:
• Will users interact with the application by using more than one input
at any given time?
• Do you want users to be able to use any input method or input at any time?
Only with certain controls? Only at specified times in your task flow?
• Does one input method take priority over the others if multiple methods
are used at once?
If the Kinect Button and Kinect Scroll Viewer detect multiple inputs, they respond to
the first one detected.
In the Interaction Gallery sample, either the mouse cursor or Kinect Cursor can be visible and used at any given time. The Kinect Cursor is on by default, but when mouse movement is detected, the sample switches to showing the mouse cursor instead.
Multimodal interactions

With multimodal interactions, the user employs multiple input methods in sequence to complete a single action. For example, the user points to a product, and then says "Add to cart."

Speech + touch
The user presses and holds a photo, and then says "Send photo."
Speech + skeleton
Example commands: "Record," "Send," "Volume."
Multiple Users
Kinect for Windows can track up to two
skeletons and, therefore, detailed movement
information from two users simultaneously.
This opens the way for some interesting
collaborative interactions, as well as new
challenges regarding control and inputs.
Tracking
Kinect for Windows can be aware of up to
six people, tracking two of them in skeleton
detail, returning full joint information. For the
other four users, Kinect for Windows returns
a small amount of information: a player mask
specifying the mask or volume of the person,
and a center position.
Your application needs to select the two people who will be primary. We recommend that
the first two skeletons detected become the primary users by default.
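A minimal sketch of that default selection follows; the detection order is assumed to come from whatever tracking data your application already maintains.

```cpp
#include <iostream>
#include <vector>

// Of the up-to-six people the sensor can see, pick the two who get full
// skeleton tracking. Default recommendation: the first two detected.
std::vector<int> ChoosePrimaryUsers(const std::vector<int>& detectionOrder) {
    std::vector<int> primary;
    for (int id : detectionOrder) {
        if (primary.size() == 2) break;
        primary.push_back(id);
    }
    return primary;
}

int main() {
    std::vector<int> detectionOrder = {7, 3, 5, 9};  // ids in order of appearance
    std::vector<int> primary = ChoosePrimaryUsers(detectionOrder);
    std::cout << primary[0] << ", " << primary[1] << "\n"; // 7, 3
}
```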
Collaborative interactions
In a collaborative interaction, both users share the same screen and the same set of inputs. This means that any action taken by one user affects the state of the application for both. There are two options for balancing control in this case; one is a single-driver model, where only one user controls the experience at a time.
Non-collaborative interactions
In a non-collaborative interaction, each user has his or her own sub-experience within
the interface.
You can handle this with screen partitioning or sectioning. In general, this experience
should be very similar to the single-driver experience described above; however, it’s an
extra challenge to correctly map each person’s speech and actions to the corresponding
interface. Voice commands can be a challenge. Gestures and movement should have
separate visual feedback per user.
Conclusion
Kinect for Windows is at the forefront of
the revolution known as NUI—Natural User
Interface. The next generation of human-
computer interaction, NUI lets people
interact with any device, anywhere, using the
movements and language they use every day
in their lives. Microsoft Kinect for Windows–
enabled applications open a broad range of
new possibilities for people to interact with
computers in ways that feel natural. From
business to the arts, from education to gaming,
and beyond, NUI expands the horizons of
application development.