A working implementation of my real-time activity recognition system: 

Times at which I perform the various activities: Squats - 0:20, Pushups - 0:40, Bicep curls - 1:04, Tricep extensions - 1:20, Walking - 1:41, Rest - 2:00.

How does it work?
The setup involves two iPhones: one is worn on my forearm (iPhone 4s) to capture motion, and the other (iPhone 6 - right corner of the video) displays the predicted activity.

The iPhone 4s is the brains of the operation. It captures my motion through the accelerometer and gyroscope sensors, transforms this raw data into meaningful input vectors, and feeds them into a multi-layer neural network, which then outputs a prediction. This prediction is then sent via Bluetooth to the iPhone 6, which displays it. This whole process repeats every second to produce a local and real-time* activity recognition system.

*The recognition is not perfectly real-time. There is an inherent lag of ~3 to 4 seconds. This is because the system uses a 4-second window (sliding forward 1 second at a time) to compute the input vectors for the neural network. Thus the system predicts the correct label approximately 3 to 4 seconds after an activity has started.
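To make the windowing concrete, here is a minimal Python sketch (not the app's actual code). The 4-second window and 1-second slide come from the description above. The 10 Hz sample rate, the single-axis input and the mean/standard-deviation features are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def sliding_windows(samples, sample_rate_hz, window_s=4, step_s=1):
    """Yield windows of `window_s` seconds, advancing `step_s` seconds each time."""
    size = window_s * sample_rate_hz
    step = step_s * sample_rate_hz
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def features(window):
    """Hypothetical feature vector: mean and standard deviation of one axis."""
    return [mean(window), pstdev(window)]


# 10 seconds of fake single-axis sensor data at an assumed 10 Hz.
signal = [float(i % 10) for i in range(100)]
vectors = [features(w) for w in sliding_windows(signal, sample_rate_hz=10)]

# Windows start at seconds 0 through 6, so there are 7 of them.
print(len(vectors))
```

The first vector only exists once a full 4 seconds of data has arrived, and a window only contains a single activity once the previous one has slid out of it, which is where the lag comes from.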

What’s with the random predictions at times?
These random predictions have a pattern. They occur during the transitional periods between exercises. The neural network does not know about transitions, so it tries to fit an activity to the observed motion. As the motion during these periods is sporadic, the predictions jump from activity to activity.

The video shows the raw output of the neural network. In my project, I have addressed this problem by employing a simple accumulator strategy. The raw prediction of the neural network is fed into an accumulator which requires a threshold (i.e. a streak of x consistent predictions) to be met before changing its prediction to a new activity.
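A rough Python sketch of such an accumulator is below. The class name, the starting label and the threshold of 3 are assumptions for illustration; the actual value of x used in the project is not stated here.

```python
class StreakAccumulator:
    """Only switch the displayed activity after `threshold` consecutive
    raw predictions of the same new activity."""

    def __init__(self, threshold, initial="Rest"):
        self.threshold = threshold
        self.current = initial   # the activity currently displayed
        self._candidate = None   # the activity trying to take over
        self._streak = 0         # consecutive votes for the candidate

    def update(self, raw_prediction):
        if raw_prediction == self.current:
            # Agreement with the displayed activity breaks any streak.
            self._candidate, self._streak = None, 0
        elif raw_prediction == self._candidate:
            self._streak += 1
        else:
            # A different challenger starts a fresh streak.
            self._candidate, self._streak = raw_prediction, 1

        if self._candidate is not None and self._streak >= self.threshold:
            self.current = self._candidate
            self._candidate, self._streak = None, 0
        return self.current


acc = StreakAccumulator(threshold=3)
raw = ["Rest", "Squats", "Walking", "Squats", "Squats", "Squats", "Pushups", "Squats"]
smoothed = [acc.update(p) for p in raw]
print(smoothed)
```

In this run the isolated "Walking" and "Pushups" predictions never reach the display; the output only changes to "Squats" after three squat predictions in a row. The trade-off is extra lag: a higher threshold hides more noise but delays every genuine change of activity.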

The app

The iOS application has three major functions:

  1. Viewer: This view looks for devices running the app in 'Tracker' mode so it can connect and display the input it receives.
  2. Tracker: Once the user hits 'Starts Tracking', this view performs the 'activity-recognition' part and then sends its output to devices running in the 'Viewer' mode.
  3. Trainer: Allows the user to perform additional training on any activity. Once the user's motion has been captured, the neural network learns from this data to better adapt to the user's form.

Rationale for using a neural network
Neural networks provide a level of malleability that is very important for this project. The multi-layer network is implemented using online learning, and this enables a level of personalisation. The neural network comes with a base training set (my training data) but the user is able to build on this by performing additional training. With this extra training the network can adapt to fit the user's form.
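To illustrate what online learning means here, the sketch below updates a model one sample at a time, so extra training adjusts the existing weights instead of rebuilding the model. It is a deliberately simplified stand-in: a single softmax layer rather than the multi-layer network, with a made-up two-feature input, learning rate and training data.

```python
import math


class OnlineSoftmax:
    """Single softmax layer trained one sample at a time (a stand-in only)."""

    def __init__(self, n_features, labels, learning_rate=0.1):
        self.labels = list(labels)
        self.lr = learning_rate
        # One weight row per label; the last entry of each row is the bias.
        self.weights = [[0.0] * (n_features + 1) for _ in self.labels]

    def _probabilities(self, x):
        xb = list(x) + [1.0]  # append the constant bias input
        scores = [sum(w * v for w, v in zip(row, xb)) for row in self.weights]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, x):
        probs = self._probabilities(x)
        return self.labels[probs.index(max(probs))]

    def learn_one(self, x, label):
        """One gradient step on the cross-entropy loss for a single sample."""
        probs = self._probabilities(x)
        xb = list(x) + [1.0]
        for k, row in enumerate(self.weights):
            error = probs[k] - (1.0 if self.labels[k] == label else 0.0)
            for j, v in enumerate(xb):
                row[j] -= self.lr * error * v


# Base training on made-up data standing in for the bundled training set.
model = OnlineSoftmax(n_features=2, labels=["Squats", "Walking"])
base = [([1.0, 0.0], "Squats"), ([0.0, 1.0], "Walking")]
for _ in range(50):
    for x, y in base:
        model.learn_one(x, y)

# Additional training: this user's squats look different from the base data.
for _ in range(20):
    model.learn_one([0.4, 0.6], "Squats")

print(model.predict([0.4, 0.6]))
```

Each call to `learn_one` costs the same regardless of how much data came before it, which is the property that makes on-device personalisation cheap.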

This level of malleability is not available in decision trees. A decision tree would have to be rebuilt each time to accommodate additional training from the user. This becomes relatively computationally expensive as the data set grows.

I experimented with a naive Bayes classifier and although it's quicker to build and has no optimisation step, the neural network had higher accuracy in almost all of the test scenarios.

Test results
The neural network has high accuracy on my activity data. It achieves accuracy levels of ~93% on a test circuit containing 5 exercises, with transitional periods removed. Such high accuracies are to be expected as it’s trained on my data. To test it out on another person, I recruited one of my friends to complete the same test circuit. The system achieved accuracy levels of 78% without additional training, and 92% with 30 seconds' worth of additional training per activity. These results look promising but I’m going to conduct more tests with additional participants over the next week to see if the results are reproducible.
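For clarity on how a figure like this can be computed, here is a small Python sketch of accuracy with transitional periods excluded. The "Transition" label and the five sample windows are invented for illustration and are not taken from the actual test circuit.

```python
def accuracy(predicted, actual, skip_label="Transition"):
    """Fraction of correct predictions, ignoring windows marked as transitions."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a != skip_label]
    if not pairs:
        raise ValueError("no scored windows")
    return sum(p == a for p, a in pairs) / len(pairs)


# Toy labels, one per 1-second prediction (made up for this example).
actual = ["Squats", "Squats", "Transition", "Pushups", "Pushups"]
predicted = ["Squats", "Squats", "Walking", "Pushups", "Rest"]

score = accuracy(predicted, actual)
print(f"{score:.0%}")  # 3 of the 4 scored windows are correct
```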

Some random/highly specific questions you may have:

Can it count repetitions for gym exercises?
Not yet. I would love to add that functionality but I haven't had time to tackle that problem yet.

Wouldn’t a better approach be to initially train the neural network on more than just one person?
That is a great point! In fact Microsoft’s research arm published a paper last year doing exactly that. Although they achieved great results, their initial training cohort consisted of 94 participants! I, as a one-man team, can’t possibly duplicate that. This is why I created a system that can adapt, eliminating the need for Microsoft-level resources.


The intersection between health and technology is my current obsession. Well, current is not exactly correct; I've been interested in this idea for quite a while. The idea of using technology to measure, record and analyse my activities and health (i.e. the quantified self movement) is very attractive to me. My interest extends beyond recording steps or having devices tell me when to stand up/move, although that's a nice start. I want to monitor the complete state of my body, from cholesterol levels to gym routines. This idea of health-tech extends much further. What if we could help monitor the behaviour of patients with neurological disorders? Or help elderly people increase mobility? The point here is not to replace doctors or nurses or health assistants, but to provide them with the complete picture.

I know the area of health-tech is picking up pace and I want to be part of the effort towards making it a reality. But, you know, the journey of a thousand miles begins with ... etc etc.

So to begin this journey, I've focussed my honours/4th year on building an activity recognition system that works in real time on an iPhone. Now, this idea is not new. There are fitness bands and smartwatches that accomplish most of what I aim to do, but that is not the point. There is no single solution for this problem, so the point is to create my own implementation and to go through the process of solving a non-trivial problem.

Activity recognition can be separated into three main parts: sensing module, feature analysis and classification. The sensing module is responsible for collecting the sensor data, and I've implemented this part, so here are some pretty pictures (data collected through an iPhone strapped to my wrist, visualising specifically the x-axis acceleration measured by the accelerometer):

How can you not be giddy with excitement after looking at these! We can distinguish most of these quite easily, and can even count the repetitions of the pull-ups! But just because the activities are relatively easy to identify by inspection doesn't mean building a recognition model is trivial. So my first goal is to take these four activities and try to classify them as activity vs non-activity. Then I'll progress to identification of specific activities and just keep building on it.

That's the current state of the project, and as I hit milestones/roadblocks/insights with the project, I'll share the progress!