Sven Kratz

I am passionate about inventing new interactive technologies that improve users' experience and empower them with new and exciting capabilities. I am a full-stack interactive systems engineer who invents interactive artifacts from the electronics up to the UI software. I have a keen interest in applying AI and machine learning to create new interactive applications and experiences by enabling machines to understand human gestures, activities, and mental and physical states. As a Human-Computer Interaction researcher, I recognize the importance of improving the usability and UX of new technologies by studying actual users. My work has resulted in multiple patents and has been published at top venues in the field.

Prior to joining Snapchat as a Senior Research Engineer, I worked as a Senior UX Engineer building UI for self-driving trucks at Kodiak Robotics. From 2017 to 2018, I was a member of the Future Experience Team at Harman International. Before that, I worked as a research scientist at FX Palo Alto Laboratory (FXPAL). During my Ph.D. studies, I worked as a research assistant in the Media Informatics and Human-Computer Interaction Group at the University of Munich, Germany, where I obtained my Ph.D. in Computer Science in 2012. From 2008 to 2011, I worked as a junior researcher and Ph.D. student in the Quality and Usability Group at Telekom Innovation Laboratories, TU Berlin, Germany. I interned at the Microsoft Applied Sciences Group in the summers of 2010 and 2011. I hold a Diplom degree in Computer Science from RWTH Aachen University, Germany.

I have published 32 peer-reviewed scientific publications in top journals and conferences, including one best paper award and two honorable mentions. My technical contributions have resulted in 26 granted and a multitude of pending US and international patents.

My Résumé


Snapchat: Fragrances, Communications and Dog Emotions in AR Applications

At Snapchat, I had the opportunity to work on a set of research projects that trailblazed new types of AR experiences.

Olfactory Input for AR Applications

What if you could transmit scents from within an AR lens to your friends? For scent to work in AR, and specifically in AR communications, olfactory input and output devices are required. As part of an effort to create an end-to-end olfactory-based AR communications platform, I initially focused on the input side and developed an olfactory sensor prototype that can detect a variety of household scents. The sensor is designed as an IoT appliance that can transmit updates to the cloud for integration in scent-based applications (AR or otherwise).

The sensor is modeled on the human olfactory system and uses an array of inexpensive metal-oxide gas sensors (MOGS) together with a decision tree ML model to detect scents. An Arduino reads the sensor output voltages, and a LePotato single-board computer handles input processing, runs the decision tree model, and connects the sensor to the cloud via WiFi. Detection accuracy is excellent, exceeding 97.5%.
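The classification pipeline can be sketched as follows. The sensor channels, scent labels, and voltage profiles below are synthetic stand-ins for illustration, not the prototype's actual data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
SCENTS = ["coffee", "citrus", "neutral"]  # illustrative labels

# Simulate readings from an array of 8 MOGS channels: each scent
# produces a characteristic voltage profile plus sensor noise.
profiles = rng.uniform(0.2, 3.0, size=(len(SCENTS), 8))
X = np.vstack([p + rng.normal(0, 0.05, size=(50, 8)) for p in profiles])
y = np.repeat(np.arange(len(SCENTS)), 50)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

# Classify a fresh reading that resembles the "coffee" profile.
reading = profiles[0] + rng.normal(0, 0.05, size=8)
print(SCENTS[clf.predict(reading.reshape(1, -1))[0]])
```

On the device, the feature vector would come from the Arduino's ADC readings rather than a simulation, but the train-then-predict structure is the same.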


Wormholes: Using AR as a Communications Medium

Future mobile communication will most likely make use of AR elements, or happen exclusively in AR. To explore this theme, we created an AR-based messaging lens called Wormhole Teleporter. Through this lens, users can create clone images of objects in their environment (we used U2Net for segmentation), add an audio message to the object and send it to their friends "via a wormhole". Messages are thus relocated from a traditional inbox or chat stream into the user's physical environment.
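The object-cloning step can be illustrated as compositing a segmentation mask into an alpha channel. U2Net outputs a saliency map; here a synthetic mask stands in for it, and `cutout` is a hypothetical helper, not the lens's actual code:

```python
import numpy as np

def cutout(image, mask, threshold=0.5):
    """Build an RGBA 'clone' of the salient object: the segmentation
    mask (e.g. from U2Net) is thresholded into the alpha channel."""
    alpha = (mask > threshold).astype(np.uint8) * 255
    return np.dstack([image, alpha])

# Synthetic 4x4 RGB image and a saliency mask covering its left half.
img = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4))
mask[:, :2] = 0.9
clone = cutout(img, mask)
print(clone.shape)  # RGB plus alpha
```

The resulting RGBA clone can then be rendered as a sticker-like object in the recipient's AR environment.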


PetPet: An Emotion-Sensing Companion for Your Dog

We created a lens called PetPet that provides dog owners with an AR-based companion for their pet. This companion constantly monitors the dog's emotional state and suggests activities to better engage with the dog. Based on input from researchers at Harvard, we created a set of criteria specific to certain dog emotional states (i.e., "playful", "alert", "appeasing" and "neutral"), and contracted over 50 photographers on Upwork to take pictures of dogs exhibiting these behaviors. Our final dataset comprised over 10,000 images. This lens was deployed as part of the Let's Play IRL suite of collocated lenses. PetPet has been started by millions of users on the Snapchat platform.

Autonomous Vehicle UI

At Kodiak Robotics, one of the leading self-driving truck startups, I spearheaded the development of the user interface for the autonomous vehicles. The Viewer, as we called it, catered to a number of different stakeholders in the company, from autonomous vehicle operators, QA engineers, and machine learning engineers to the Kodiak executives at demo events. I enjoyed my time at Kodiak very much. I learned a lot about robotic vehicle stack design, React.js and Three.js, as well as user-centered UI/UX design and engineering in a real-world setting. Software updates I pushed were used immediately by co-workers across the company, and it was invigorating to have this level of daily impact.

[Image credit: Kodiak voluntary safety self-assessment (VSSA), 2020]

ThermoTouch: Rethinking Thermal Haptic Displays

I noticed that most thermal haptic output devices in the existing HCI literature have only one thermal pixel. Why not an entire grid of them? Seizing this opportunity to contribute to the body of haptic UI research, I engineered a novel thermal haptic output device, ThermoTouch. ThermoTouch is a haptic hardware prototype that provides a grid of thermal pixels with an overlaid video projection. Unlike previous devices, which mainly use Peltier elements for thermal output, ThermoTouch uses liquid cooling and electro-resistive heating to produce thermal feedback at arbitrary grid locations, potentially providing faster temperature switching times and a higher temperature dynamic range. Furthermore, the PCB-based design allows us to incorporate capacitive touch sensing directly on each thermal pixel.


ThermoTouch was presented as a full paper at ACM Interactive Surfaces and Spaces (ISS) 2017, and late-breaking work at CHI 2016.

GestureSeg: Using Crowd Labeling for Reliable Motion Gesture Segmentation

Most current mobile and wearable devices are equipped with inertial measurement units (IMUs) that allow the detection of motion gestures, which can be used for interactive applications. A difficult problem, however, is separating ambient motion from actual motion gesture input. We explore the use of motion gesture data labeled with gesture execution phases for training supervised learning classifiers for gesture segmentation. We believe that using gesture execution phase data can significantly improve the accuracy of gesture segmentation algorithms. We define gesture execution phases as the start, middle, and end of each gesture. Since labeling motion gesture data with gesture execution phase information is labor-intensive, we used crowd workers to perform the labeling.

Using this labeled data set, we trained SVM-based classifiers to segment motion gestures from ambient movement of the device. Our main results show that training gesture segmentation classifiers with phase-labeled data substantially increases the accuracy of gesture segmentation: we achieved a gesture segmentation accuracy of 0.89 for simulated online segmentation using a sliding window approach.
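A minimal sketch of sliding-window gesture segmentation in this spirit, using a synthetic single-axis accelerometer stream rather than our phase-labeled dataset:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
WIN = 20  # sliding-window length in samples

def windows(signal, step=5):
    return np.array([signal[i:i + WIN]
                     for i in range(0, len(signal) - WIN + 1, step)])

# Synthetic accelerometer stream: low-amplitude ambient noise with a
# high-energy "gesture" burst in the middle (samples 150-249).
ambient = rng.normal(0, 0.1, 300)
gesture = np.sin(np.linspace(0, 6 * np.pi, 100)) * 2
stream = np.concatenate([ambient[:150], gesture, ambient[150:]])

# Train an SVM on labeled windows of each class.
X_amb = windows(rng.normal(0, 0.1, 400))
X_ges = windows(np.tile(gesture, 4) + rng.normal(0, 0.1, 400))
X = np.vstack([X_amb, X_ges])
y = np.array([0] * len(X_amb) + [1] * len(X_ges))
clf = SVC(kernel="rbf").fit(X, y)

# Simulated online segmentation: classify each window of the stream.
pred = clf.predict(windows(stream))
```

The real system additionally exploits the crowd-labeled start/middle/end phase structure; this sketch only shows the windowed gesture-vs-ambient decision.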

A full paper about GestureSeg was presented at EICS 2016.

Improving User Interfaces for Robot Teleoperation

The FXPAL robotics research group has recently explored technologies for improving the usability of mobile telepresence robots. We evaluated a prototype head-tracked stereoscopic (HTS) teleoperation interface for a remote collaboration task. The results of this study indicate that using an HTS system reduces task errors and improves the perceived collaboration success and viewing experience.


We also developed a new focus plus context viewing technique for mobile robot teleoperation. This allows us to use wide-angle camera images that provide rich contextual visual awareness of the robot's surroundings while at the same time preserving a distortion-free region in the middle of the camera view.

To this, we added a semi-automatic robot control method that lets operators navigate the telepresence robot by pointing and clicking directly on the camera image feed. This through-the-screen interaction paradigm has the advantage of decoupling operators from the robot control loop, freeing them for other tasks besides driving the robot.
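One common way to implement such point-and-click navigation is to project the clicked pixel onto the ground plane to obtain a drive target. The pinhole model, intrinsics, and camera height below are illustrative assumptions, not the actual robot's calibration:

```python
import numpy as np

def click_to_ground(u, v, fx, fy, cx, cy, cam_height):
    """Project a clicked pixel (u, v) onto the ground plane, assuming a
    level pinhole camera at height cam_height, with the ground at
    y = +cam_height in camera coordinates (y pointing down)."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    if ray[1] <= 0:            # click at or above the horizon: no ground hit
        return None
    t = cam_height / ray[1]    # scale the ray until it reaches the ground
    point = t * ray
    return point[0], point[2]  # lateral offset and forward distance (m)

# Click slightly below the image center of a 640x480 view.
target = click_to_ground(u=320, v=300, fx=500, fy=500,
                         cx=320, cy=240, cam_height=1.5)
print(target)
```

The robot's local planner can then drive toward the returned (lateral, forward) target without further operator input.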

As a result of this work, we presented two papers at the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN).

The paper Look Where You're Going: Visual Interfaces for Robot Teleoperation won best paper award at the conference!


AirAuth: Gesture-Based Mobile Authentication

AirAuth is a prototype authentication system that is intended to improve the usability of authentication by replacing password entry with mid-air gestures. AirAuth uses an Intel short-range depth camera to track the user's fingertip locations and the location of the hand center during gesture entry. Under controlled conditions, we obtained a high EER-based authentication accuracy using just a few enrollment gestures and DTW matching for the gesture input. Being touchless, AirAuth is resistant to smudge attacks. We also evaluated AirAuth's resistance to shoulder surfing (visual forgery) via a camera-based study. We presented AirAuth as a work-in-progress at CHI 2014 and a full paper at MobileHCI 2014.
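A minimal sketch of DTW-based gesture matching. The 2D traces and the accept/reject comparison here are illustrative; AirAuth matches multi-point hand traces from the depth camera against enrollment templates:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two gesture traces
    (sequences of tracked positions), tolerant of speed variation."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Enrollment template vs. a slower repetition of the same circular
# gesture, and a mirrored forgery attempt.
t = np.linspace(0, 2 * np.pi, 30)
template = np.c_[np.cos(t), np.sin(t)]
attempt = np.c_[np.cos(np.linspace(0, 2 * np.pi, 45)),
                np.sin(np.linspace(0, 2 * np.pi, 45))]
forgery = np.c_[np.cos(t), -np.sin(t)]

# The genuine attempt scores a much smaller distance than the forgery;
# authentication accepts when the distance falls below a threshold.
print(dtw(template, attempt) < dtw(template, forgery))
```

Because DTW warps time, the genuine attempt matches well even though it was performed more slowly than the enrolled template.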


PointPose: Finger Pose Estimation for Touch Input

The expressiveness of touch input can be increased by detecting additional finger pose information at the point of touch, such as finger rotation and tilt. Our PointPose prototype performs finger pose estimation at the location of touch using a short-range depth sensor viewing the touch screen of a mobile device. Our approach does not require complex external tracking hardware, and external computation is unnecessary as the finger pose extraction algorithm runs directly on the mobile device. This makes PointPose ideal for prototyping and developing novel mobile user interfaces that use finger pose estimation.


Gestural Interfaces for Large Displays

Mouse-based interaction on large, high-resolution displays can be problematic. An unscaled mouse cursor becomes so small that it can hardly be located on the screen when the screen is viewed at a comfortable distance, and the default tracking speed of regular mice makes it tedious to manipulate content on the screen. At FXPAL, we are exploring full-body gestural interfaces as an alternative to mouse-based interaction on large displays. The advantages of gesture-based interaction are that gestures can be simple to perform and cover larger spatial distances, so smaller control-display gains can be used. Gestures can be intuitive, for instance when the UI follows Natural User Interface (NUI) principles, where interactive objects expose their functionality during interaction. Finally, we feel that gestural interfaces will promote movement and activity at otherwise sedentary workplaces, improving users' health and well-being.

VPoint Prototype

The VPoint prototype aims to explore the use of a large display for collaborative content presentation and manipulation. It uses gesture-based input tracked by a Kinect sensor, and is directly integrated with the Windows 7 desktop.


PalmSpace: Manipulating 3D Content with Hand Gestures

What if mobile phones were equipped with depth imaging cameras? PalmSpace envisions the use of such cameras to facilitate interaction with 3D content using hand gestures. We developed a technique that maps the pose of the user's palm directly to 3D object rotation. Our user study shows that users could manipulate 3D objects significantly faster than with a standard virtual trackball on the touch screen.
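The direct palm-to-rotation mapping can be sketched like this. The Z-Y-X Euler convention and the angles below are illustrative assumptions, not PalmSpace's actual parameterization:

```python
import numpy as np

def rotation_from_palm(yaw, pitch, roll):
    """Absolute mapping: the palm's orientation angles (radians) become
    the object's rotation matrix directly (Z-Y-X Euler convention)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

# Turning the palm 90 degrees about the vertical axis rotates the
# object's forward axis to face sideways.
R = rotation_from_palm(np.pi / 2, 0.0, 0.0)
v = R @ np.array([1.0, 0.0, 0.0])
```

An absolute mapping like this gives a one-to-one correspondence between palm pose and object orientation, avoiding the repeated clutching that a relative (trackball-style) mapping requires.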


Protractor3D: A Tilt-Invariant Motion Gesture Recognizer

Protractor3D is a tilt-invariant, data-driven gesture recognizer for 3D motion gestures, using data obtained, for example, from the 3D accelerometers of smartphones.
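A simplified sketch of tilt-invariant template matching in this spirit, here using the Kabsch algorithm to factor out device orientation before comparing a trace to a template (Protractor3D's actual closed-form rotation search differs):

```python
import numpy as np

def aligned_distance(trace, template):
    """Distance after optimally rotating the trace onto the template
    (Kabsch algorithm), making the comparison tilt-invariant."""
    A = trace - trace.mean(axis=0)
    B = template - template.mean(axis=0)
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    return np.linalg.norm(A @ R.T - B)

# A template gesture and the same gesture performed with the device
# tilted 30 degrees: the alignment cancels the tilt almost exactly.
t = np.linspace(0, 2 * np.pi, 16)
template = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]
tilt = np.pi / 6
Rx = np.array([[1, 0, 0],
               [0, np.cos(tilt), -np.sin(tilt)],
               [0, np.sin(tilt), np.cos(tilt)]])
tilted = template @ Rx.T
score = aligned_distance(tilted, template)
```

A recognizer built this way classifies an input trace by taking the template with the smallest aligned distance, so holding the device at an angle does not change the result.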