A working implementation of my real-time activity recognition system: 

Times at which I perform the various activities: Squats - 0:20, Push-ups - 0:40, Bicep curls - 1:04, Tricep extensions - 1:20, Walking - 1:41, Rest - 2:00.

How does it work?
The setup involves two iPhones: one (an iPhone 4s) is worn on my forearm to capture motion, and the other (an iPhone 6, in the right corner of the video) displays the predicted activity.

The iPhone 4s is the brains of the operation. It captures my motion through the accelerometer and gyroscope sensors, transforms this raw data into meaningful input vectors, and feeds them into a multi-layer neural network, which outputs a prediction. This prediction is then sent via Bluetooth to the iPhone 6, which displays it. The whole process repeats every second to produce a local and real-time* activity recognition system.

*The recognition is not perfectly real-time. There is an inherent lag of ~3 to 4 seconds. This is because the system computes the input vectors for the neural network over a 4-second window that slides forward 1 second at a time, so it takes a few seconds before a window is filled mostly with the new activity. Thus the system predicts the correct label approximately 3 to 4 seconds after an activity has started.
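To make the windowing concrete, here is a minimal Python sketch of a 4-second window that slides by 1 second. The sample rate and the summary statistics used as features are assumptions for illustration only; the actual app runs on iOS and its feature set is not described here.

```python
from statistics import mean, pstdev


def sliding_windows(samples, rate_hz, window_s=4, slide_s=1):
    """Yield overlapping windows of sensor samples.

    rate_hz is a hypothetical sample rate; window_s and slide_s match
    the 4-second window and 1-second slide described above.
    """
    size = window_s * rate_hz
    step = slide_s * rate_hz
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def features(window):
    """Summarise one window as a small input vector (assumed feature set)."""
    return [mean(window), pstdev(window), min(window), max(window)]


# Six seconds of fake single-axis data at 10 Hz gives three windows.
readings = [float(i) for i in range(60)]
vectors = [features(w) for w in sliding_windows(readings, rate_hz=10)]
```

Because every window still holds up to 3 seconds of the previous activity right after a switch, the first few vectors after a transition are a blend of both activities, which is where the lag comes from.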

What’s with the random predictions at times?
These random predictions have a pattern. They occur during the transitional periods between exercises. The neural network does not know about transitions, so it tries to fit an activity to the observed motion. As the motion during these periods is sporadic, the predictions jump from activity to activity.

The video shows the raw output of the neural network. In my project, I have addressed this problem by employing a simple accumulator strategy. The raw prediction of the neural network is fed into an accumulator which requires a threshold (i.e. a streak of x consistent predictions) to be met before changing its prediction to a new activity.
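The accumulator idea can be sketched in a few lines of Python. This is only an illustration of the streak rule described above, not the code from the app; the class name, the threshold of 3 and the starting label are made up for the example.

```python
class Accumulator:
    """Only switch activity after `threshold` identical predictions in a row."""

    def __init__(self, threshold, initial=None):
        self.threshold = threshold
        self.current = initial      # the activity currently shown
        self._candidate = None      # the activity trying to take over
        self._streak = 0            # consecutive votes for the candidate

    def update(self, prediction):
        if prediction == self.current:
            # The network agrees with what is shown: drop any candidate.
            self._candidate = None
            self._streak = 0
        elif prediction == self._candidate:
            self._streak += 1
        else:
            # A different activity starts a fresh streak.
            self._candidate = prediction
            self._streak = 1

        if self._candidate is not None and self._streak >= self.threshold:
            self.current = self._candidate
            self._candidate = None
            self._streak = 0
        return self.current


acc = Accumulator(threshold=3, initial="Rest")
raw = ["Squats", "Walking", "Squats", "Squats", "Squats"]
shown = [acc.update(p) for p in raw]
# shown == ["Rest", "Rest", "Rest", "Rest", "Squats"]
```

The trade-off is extra delay: a higher threshold hides more of the jitter during transitions, but each genuine change of activity takes that many more predictions to show up.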

The app

The iOS application has three major functions:

  1. Viewer: This view looks for devices running the app in 'Tracker' mode so it can connect and display the input it receives.
  2. Tracker: Once the user hits 'Start Tracking', this view performs the 'activity-recognition' part and then sends its output to devices running in the 'Viewer' mode.
  3. Trainer: Allows the user to perform additional training on any activity. Once the user's motion has been captured, the neural network learns from this data to better adapt to the user's form.

Rationale for using a neural network
Neural networks provide a level of malleability that is very important for this project. The multi-layer network is trained using online learning, which enables a level of personalisation. The neural network comes with a base training set (my training data), but the user is able to build on this by performing additional training. With this extra training the network can adapt to match the user's form.
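For readers unfamiliar with online learning, the sketch below shows the general idea in Python: the network's weights are nudged after every single example, so new training data can be folded in without rebuilding anything. Everything about it (one tanh hidden layer, softmax output, layer sizes, learning rate) is an assumption chosen for brevity, not a description of the network in the app.

```python
import math
import random


class OnlineMLP:
    """Tiny one-hidden-layer network updated one example at a time."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        top = max(z)                      # subtract max for a stable softmax
        e = [math.exp(v - top) for v in z]
        total = sum(e)
        return h, [v / total for v in e]

    def predict(self, x):
        _, p = self._forward(x)
        return p.index(max(p))

    def learn_one(self, x, label):
        """One stochastic gradient step on a single (x, label) example."""
        h, p = self._forward(x)
        # Cross-entropy gradient w.r.t. the output logits: p - one_hot(label).
        dz = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Back-propagate to the hidden layer before touching w2.
        dh = [(1.0 - hj * hj) * sum(dz[k] * self.w2[k][j] for k in range(len(dz)))
              for j, hj in enumerate(h)]
        for k, g in enumerate(dz):
            for j, hj in enumerate(h):
                self.w2[k][j] -= self.lr * g * hj
            self.b2[k] -= self.lr * g
        for j, g in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * g * xi
            self.b1[j] -= self.lr * g


net = OnlineMLP(n_in=2, n_hidden=4, n_out=2)
for _ in range(300):                      # stand-in for the base training set
    net.learn_one([1.0, 0.0], 0)
    net.learn_one([0.0, 1.0], 1)
```

Additional training from a user would simply be more calls to `learn_one` with their own feature vectors, starting from the weights the base training produced.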

This level of malleability is not available in decision trees. A decision tree would have to be rebuilt each time to accommodate additional training from the user, which becomes relatively expensive to compute as the data set grows.

I experimented with a naive Bayes classifier and although it's quicker to build and has no optimisation step, the neural network had higher accuracy in almost all of the test scenarios.

Test results
The neural network has high accuracy on my activity data. It achieves accuracy levels of ~93% on a test circuit containing 5 exercises (with transitional periods removed). Such high accuracy is to be expected, as it's trained on my data. To test it on another person, I recruited one of my friends to complete the same test circuit. The system achieved accuracy levels of 78% without additional training, and 92% with 30 seconds' worth of additional training per activity. These results look promising, but I'm going to conduct more tests with additional participants over the next week to see if the results are reproducible.

Some random/highly specific questions you may have:

Can it count repetitions for gym exercises?
Not yet. I would love to add that functionality, but I haven't had time to tackle that problem.

Wouldn’t a better approach be to initially train the neural network on more than just one person?
That is a great point! In fact, Microsoft's research arm published a paper last year doing exactly that. Although they achieved great results, their initial training cohort consisted of 94 participants! I, as a one-man team, can't possibly duplicate that. This is why I created a system that can adapt, eliminating the need for Microsoft-level resources.


Last month, a couple of friends and I participated in a 24-hour hackathon (UNIHACK). We were called the Swifites, named after both the programming language and, obviously, Ms. Taylor Swift.

We wanted to be a prepared team, so we decided on an idea before the competition. This all changed 10 minutes into the hackathon, after one of the mentors informed us that our idea had already been done. The app we planned on making was already out on the App Store with basically the same core features and design aesthetics we had planned. We did Google our idea, just obviously not well enough.

The moment (or 2 hours) of panic that followed resulted in a much more fun and organic idea: Gifme. Starting from the premise that "photos are hip", we landed on the idea of creating photo mosaics with infinite freedom. The idea for the app was simple: take a selfie, tell us what you love, and we will re-create your picture from the thing you love. It's better demonstrated by a video:

That's my face being re-created with pictures of Taylor Swift, all in real-time.

Weird that the app is called Gifme, right? When we started out, we wanted to eventually add GIF support. Instead of a still representation, it'd be a livelier, animated version of you! It turns out that downloading and displaying even a low-density GIF mosaic (a grid of 24 by 40 = 960 images) is not a trivial task. So we had to settle for regular old non-animated photos.

During the process we naturally hit a few challenges. Sourcing the photos was quite a task. We had to use Bing as our source because Google is very restrictive with its APIs. With Bing's (terribly documented) API we weren't able to search for images by colour. Our workaround was to request monochrome images from Bing, and then tint each image according to the low-density representation of the selfie we had formed. This admittedly did not achieve the effect we were after, but it was the best we could do.
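Roughly, the workaround looks like the Python sketch below: shrink the selfie to a coarse grid of average colours, then tint one monochrome tile per grid cell. The function names are hypothetical, images are plain nested lists to keep it self-contained, and the multiply-style tint is an assumption; the app itself was an iOS app and may well have tinted differently.

```python
def average_colour(pixels):
    """Average a non-empty list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) // n for c in range(3))


def low_density_grid(image, rows, cols):
    """Reduce an image (rows of (r, g, b) tuples) to rows x cols average colours.

    Assumes the image is at least rows pixels tall and cols pixels wide,
    so that every cell covers at least one pixel.
    """
    height, width = len(image), len(image[0])
    grid = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        row = []
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(average_colour(block))
        grid.append(row)
    return grid


def tint(gray_tile, colour):
    """Colour a monochrome tile (rows of 0-255 values) by scaling `colour`."""
    return [[tuple(v * ch // 255 for ch in colour) for v in row]
            for row in gray_tile]


# A 2x2 "selfie": red on top, blue on the bottom, reduced to a 2x1 grid.
selfie = [[(255, 0, 0), (255, 0, 0)],
          [(0, 0, 255), (0, 0, 255)]]
grid = low_density_grid(selfie, rows=2, cols=1)
tile = tint([[255, 128]], grid[0][0])
```

A multiply tint like this keeps the tile's light and dark structure but can only darken the target colour, which may be part of why tinted monochrome tiles look flatter than tiles that were the right colour to begin with.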

Even though our start to the hackathon was a bit of a mess, the end result was a cool and quirky app. 

Next time, I'll personally Google the shit out of the idea we decide on.