webcam -> face sketch -> marble bust
A working implementation of my real-time activity recognition system:
How does it work?
The setup involves two iPhones: one (iPhone 4s) is worn on my forearm to capture motion, and the other (iPhone 6, right corner of the video) displays the predicted activity.
The iPhone 4s is the brains of the operation. It captures my motion through the accelerometer and gyroscope sensors, transforms this raw data into meaningful input vectors, and feeds them into a multi-layer neural network, which outputs a prediction. This value is then sent over Bluetooth to the iPhone 6 and displayed there. The whole process repeats every second to produce a local and real-time* activity recognition system.
*The recognition is not perfectly real-time. There is an inherent lag of ~3 to 4 seconds. This is because the system uses a 4-second window (with a sliding length of 1 second) to compute the input vectors for the neural network. Thus the system predicts the correct label approximately 3 to 4 seconds after an activity has been started.
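The windowing described above can be sketched roughly as follows. This is a minimal illustration, not the code running on the phone: the function names, the sample rate and the choice of mean and standard deviation as features are all assumptions made for the example.

```python
from statistics import mean, pstdev


def sliding_windows(samples, rate_hz, window_s=4, step_s=1):
    """Yield windows of `window_s` seconds, advancing `step_s` seconds each time.

    `samples` is a flat list of readings from one sensor axis and `rate_hz`
    is the (assumed) number of readings per second.
    """
    size = window_s * rate_hz
    step = step_s * rate_hz
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def window_features(window):
    """Turn one window into an input vector (hypothetical features)."""
    return (mean(window), pstdev(window))


# Six seconds of fake data at 10 readings per second.
readings = list(range(60))
vectors = [window_features(w) for w in sliding_windows(readings, rate_hz=10)]
```

Because each window needs four seconds of data before it is full, a new activity only dominates the window a few seconds after it starts, which is where the lag comes from.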
What’s with the random predictions at times?
These random predictions have a pattern. They occur during the transitional periods between exercises. The neural network does not know about transitions, so it tries to fit an activity to the observed motion. As the motion during these periods is sporadic, the predictions jump from activity to activity.
The video shows the raw output of the neural network. In my project, I have addressed this problem by employing a simple accumulator strategy. The raw prediction of the neural network is fed into an accumulator which requires a threshold (i.e. a streak of x consistent predictions) to be met before changing its prediction to a new activity.
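Here is a rough sketch of what such an accumulator could look like. The class name, the default threshold of 3 and the activity labels are made up for illustration; the version in my project may differ in the details.

```python
class PredictionAccumulator:
    """Only switch activity after `threshold` identical raw predictions in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # the "x" in "a streak of x consistent predictions"
        self.current = None         # activity currently shown to the user
        self.candidate = None       # activity that is building a streak
        self.streak = 0

    def update(self, raw_prediction):
        if raw_prediction == self.current:
            # The network agrees with what is displayed: drop any pending candidate.
            self.candidate = None
            self.streak = 0
        elif raw_prediction == self.candidate:
            self.streak += 1
        else:
            # A different activity appeared: start counting from one.
            self.candidate = raw_prediction
            self.streak = 1

        if self.candidate is not None and self.streak >= self.threshold:
            self.current = self.candidate
            self.candidate = None
            self.streak = 0
        return self.current


acc = PredictionAccumulator(threshold=3)
raw = ["squat", "squat", "squat", "pushup", "squat", "pushup", "pushup", "pushup"]
smoothed = [acc.update(p) for p in raw]
# smoothed == [None, None, "squat", "squat", "squat", "squat", "squat", "pushup"]
```

The lone "pushup" in the middle is ignored, which is the behaviour wanted during transitions. The trade-off is extra delay: a genuine change of activity is only shown after the streak has been met.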
The iOS application has three major functions:
- Viewer: This view looks for devices running the app in 'Tracker' mode so it can connect and display the input it receives.
- Tracker: Once the user hits 'Starts Tracking', this view performs the 'activity-recognition' part and then sends its output to devices running in the 'Viewer' mode.
- Trainer: Allows the user to perform additional training on any activity. Once the user's motion has been captured, the neural network learns from this data to better adapt to the user's form.
Rationale for using a neural network
Neural networks provide a level of malleability that is very important for this project. The multi-layer network is implemented using online learning, and this enables a level of personalisation. The neural network comes with a base training set (my training data), but the user is able to build on this by performing additional training. With this extra training the network can adapt to match the user's form.
This level of malleability is not available in decision trees. A decision tree would have to be rebuilt each time to accommodate additional training from the user, which becomes increasingly expensive computationally as the data set grows.
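To show what "online" means here, below is a small sketch of a model that updates its weights one sample at a time instead of being rebuilt from the full data set. To keep it short it uses a single-layer softmax classifier as a stand-in for the multi-layer network, and the class name, learning rate and toy data are all assumptions for the example.

```python
import math


class OnlineSoftmax:
    """Single-layer softmax classifier trained one sample at a time."""

    def __init__(self, n_features, n_classes, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        self.lr = lr

    def predict_proba(self, x):
        scores = [sum(wi * xi for wi, xi in zip(row, x)) + b
                  for row, b in zip(self.w, self.b)]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, x):
        probs = self.predict_proba(x)
        return probs.index(max(probs))

    def learn_one(self, x, label):
        """One gradient step on a single (input vector, label) pair."""
        probs = self.predict_proba(x)
        for k in range(len(self.w)):
            error = probs[k] - (1.0 if k == label else 0.0)
            for j, xj in enumerate(x):
                self.w[k][j] -= self.lr * error * xj
            self.b[k] -= self.lr * error


# "Base" training followed by extra samples from a user: same method, no rebuild.
model = OnlineSoftmax(n_features=2, n_classes=2)
for _ in range(50):
    model.learn_one([1.0, 0.0], 0)
    model.learn_one([0.0, 1.0], 1)
```

Each call to `learn_one` costs the same no matter how much data came before it, which is the property a rebuilt decision tree does not have.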
I experimented with a naive Bayes classifier and although it's quicker to build and has no optimisation step, the neural network had higher accuracy in almost all of the test scenarios.
The neural network has high accuracy on my activity data. It achieves accuracy levels of ~93% on a test circuit containing 5 exercises (with transitional periods removed). Such high accuracies are to be expected as it’s trained on my data. To test it out on another person, I recruited one of my friends to complete the same test circuit. The system achieved accuracy levels of 78% without additional training, and 92% with 30 seconds' worth of additional training per activity. These results look promising, but I’m going to conduct more tests with additional participants over the next week to see if the results are reproducible.
Some random/highly specific questions you may have:
Can it count repetitions for gym exercises?
Not yet. I would love to add that functionality but I haven't had time to tackle that problem yet.
Wouldn’t a better approach be to initially train the neural network on more than just one person?
That is a great point! In fact, Microsoft’s research arm published a paper last year doing exactly that. Although they achieved great results, their initial training cohort consisted of 94 participants! I, as a one-man team, can’t possibly duplicate that. This is why I created a system that can adapt, eliminating the need for Microsoft-level resources.
Last month, a couple of friends and I participated in a 24-hour hackathon (UNIHACK). We were called the Swifites, named after both the programming language and, obviously, Ms. Taylor Swift.
We wanted to be a prepared team, so we decided on an idea before the competition. This all changed 10 minutes into the hackathon, after one of the mentors informed us that our idea had already been done. The app we planned on making was already out on the App Store with basically the same core features and design aesthetics we had planned. We did Google our idea, just obviously not well enough.
The moment (or 2 hours) of panic that followed resulted in a much more fun and organic idea: Gifme. Starting from the premise that "photos are hip", we landed on the idea of creating photo mosaics with infinite freedom. The idea for the app was simple: take a selfie, tell us what you love, and we will re-create your picture from the thing you love. It's better demonstrated by a video:
Weird that the app is called Gifme, right? When we started out, we wanted to eventually add GIF support. Instead of having a still representation, it'd be a livelier, animated version of you! It turns out that downloading + displaying even a low-density GIF mosaic (a grid of 24 by 40 = 960 images) is not a trivial task. So we had to settle for regular old non-animating photos.
During the process we naturally hit a few challenges. Sourcing the photos was quite a task. We had to use Bing as our source because Google is very restrictive with their APIs. With Bing's (terribly documented) API we weren't able to search for images by colour. Our workaround was to request monochrome images from Bing, and then tint the images according to the low-density representation we had formed. This admittedly did not achieve the effect we were after, but it was the best we could do.
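For the curious, the tinting workaround looks roughly like this. It is a simplified sketch rather than the hackathon code: images are plain nested lists of RGB tuples, the function names are invented for the example, and multiplying grey levels by the target colour is just one plausible way to tint.

```python
def average_grid(pixels, rows, cols):
    """Reduce an image to a rows x cols grid of average colours.

    `pixels` is a list of rows, each a list of (r, g, b) tuples.
    Assumes the image is at least `rows` pixels tall and `cols` pixels wide.
    """
    height, width = len(pixels), len(pixels[0])
    grid = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        grid_row = []
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            grid_row.append(tuple(sum(p[i] for p in block) // len(block)
                                  for i in range(3)))
        grid.append(grid_row)
    return grid


def tint(gray_tile, colour):
    """Tint a monochrome tile (rows of 0-255 grey levels) with an (r, g, b) colour."""
    return [[tuple(grey * channel // 255 for channel in colour) for grey in row]
            for row in gray_tile]


# A tiny 2x2 "selfie": left half red, right half blue.
selfie = [[(255, 0, 0), (0, 0, 255)],
          [(255, 0, 0), (0, 0, 255)]]
cells = average_grid(selfie, rows=1, cols=2)
tile = tint([[255, 128]], cells[0][0])
```

Each cell of the grid picks the colour for one tile of the mosaic, so a 24 by 40 grid would need 960 tinted tiles.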
Even though our start to the hackathon was a bit of a mess, the end result was a cool and quirky app.
Next time, I'll personally Google the shit out of the idea we decide on.
Apple's operations chief confirmed that they'll be releasing the native Apple Watch SDK at the upcoming WWDC. He revealed that the SDK will provide direct access to the Watch sensors. This is extremely relevant to me, given my final year project (I promise to do an update on progress soon). Now, I expected Apple to eventually roll this out, but the fact that I'll have something to play with in less than two weeks is extremely exciting!
Although I'm excited, I'm also realistic about my expectations. Given Apple's record with app functionality (*cough* *cough* background processes), I have a lot of questions about the Watch SDK. Will developers have access to the heart-rate sensor or just the accelerometer and gyroscope? Given the limitations in battery, will apps have any background functionality? Can apps continuously access sensor output? Are apps allowed to be Watch-only (pretty sure I know the answer to this one)?
So here's hoping for the best (a big fat YES to all my questions), but I'll settle for basically anything.
Bring on WWDC!
I was reading the Swift programming book published by Apple, and in the closures chapter, I came across a series of code snippets. The snippets relate to the closure expression syntax used in Swift, but that is not important for the purpose of this post. Putting aside the programmer perspective (code readability etc. etc.), let's just focus on the aesthetics of the collection. Each iteration does away with a seemingly 'necessary' element, and only through this process does it arrive at the bare essentials.
Although these are just snippets of code, the process of iterating, evaluating and doing away with the unnecessary applies to all aspects of our lives.
I am currently reading this essay-turned-book by Elle Luna. With her colourful illustrations, she challenges the reader to think about what they have in their life: a job ("something typically done from 9 to 5 for pay"), a career ("system of advancements and promotions where rewards are used to optimise behaviour") or a calling ("something we feel compelled to do regardless of fame or fortune"). I wouldn't do the book justice by describing it, so I'll share a passage where Elle writes about the difference between should and must.
Elle on must:
If you haven't read the essay that inspired it all, do yourself a favour, read it now (then go read the book!).
The intersection between health and technology is my current obsession. Well, current is not exactly correct; I've been interested in this idea for quite a while. The idea of using technology to measure, record and analyse my activities and health (i.e. the quantified self movement) is very attractive to me. My interest extends beyond recording steps or having devices tell me when to stand up/move, although that's a nice start. I want to monitor the complete state of my body, from cholesterol levels to gym routines. This idea of health-tech extends much further: what if we could help monitor the behaviour of patients with neurological disorders, or help elderly people increase mobility? The point here is not to replace doctors or nurses or health assistants, but to provide them with the complete picture.
I know the area of health-tech is picking up pace and I want to be part of the effort towards making it a reality. But, you know, the journey of a thousand miles begins with ... etc etc.
So to begin this journey, I've focussed my honours/4th year on building an activity recognition system that works real-time on an iPhone. Now, this idea is not new. There are fitness bands and smartwatches that accomplish most of what I aim to do but that is not the point. There is no single solution for this problem, so the point is to create my implementation and to go through the process of solving a non-trivial problem.
Activity recognition can be separated into three main parts: sensing module, feature analysis and classification. The sensing module is responsible for collecting the sensor data, and I've implemented this part, so here are some pretty pictures (data collected through an iPhone strapped to my wrist, visualising specifically the x-axis acceleration measured by the accelerometer):
How can you not be giddy with excitement after looking at these! We can distinguish most of these quite easily, and can even count the repetitions of the pull-ups! But just because the activities are relatively easy to identify by inspection doesn't mean that building a recognition model is trivial. So my first goal is to take these four activities and try to classify them as activity vs non-activity. Then I'll progress to identification of specific activities and just keep building on it.
That's the current state of the project, and as I hit milestones/roadblocks/insights with the project, I'll share the progress!
"I want to be with those who know secret things or else alone" - Rainer Maria Rilke
Gelato place of choice: Grom.
Flavours of choice: Nocciola (hazelnut), Cafe & Crema di Grom with extra meliga (biscuits).
I have an obsession with street lamps. The photo above is from my recent trip to Italy and it's my favourite capture of the trip.
(The picture was taken in Venice, it was an overcast day and I just couldn't walk past these radiant lamps without capturing them.)