
Tuesday, May 31, 2016

Efficient Apple Watch CMSensorRecorder transport to AWS

Well, I have come full circle. A long time ago, this experiment ran through Rube Goldberg system #1:

  • Dequeue from CMSensorRecorder
  • Pivot the data
  • Send it via WCSession to the iPhone
  • iPhone picked up the data and queued it locally for Kinesis
  • Then the Kinesis client transported data to Kinesis
  • Which had a Lambda configured to dequeue the data
  • And write it to DynamoDB
Then, I flipped the data around in an attempt to have the Watch write directly to DynamoDB:
  • Dequeue from CMSensorRecorder
  • Pivot the data to a BatchPutItem request for DynamoDB
  • Put the data to DynamoDB
  • (along the way run the access key Rube Goldberg machine mentioned earlier)
The problems with both of these approaches are the cost to execute on the Watch and the Watch's lack of background processing. This meant it was virtually impossible to get data dequeued before the Watch app went to sleep.

I did a little benchmarking over the weekend and found that brute force dequeue from CMSensorRecorder is fairly quick. And the WCSession sendFile support can run in the background, more or less. So, I will now attempt an alternate approach:
  • Dequeue from CMSensorRecorder
  • Minimal pivot of the data (perhaps just raw binary) into a local file
  • WCSession:sendFile to send the file to the iPhone
  • Then the iPhone picks up the file and sends it on to AWS (perhaps a little pivot, perhaps S3 instead of DynamoDB, etc.)
  • (along the way a much simpler access key machine will be needed)
The theory is that this'll get the data out of the Watch quickly during its limited active window.
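
Here's a minimal sketch of what that file-based flow could look like, written against current Swift/WatchConnectivity API names rather than the 2016-era SDK. The CSV layout and the dumpAndSend function are just illustrations; the real pivot format (raw binary vs. text) is still to be decided:

import CoreMotion
import WatchConnectivity

// CMSensorDataList only exposes NSFastEnumeration; this shim lets a Swift
// for-in loop walk the recorded samples.
extension CMSensorDataList: Sequence {
    public func makeIterator() -> NSFastEnumerationIterator {
        return NSFastEnumerationIterator(self)
    }
}

// Sketch: dump a window of recorded accelerometer samples to a local file and
// hand the file to WCSession for background transfer to the iPhone.
func dumpAndSend(recorder: CMSensorRecorder, from start: Date, to end: Date) {
    guard let list = recorder.accelerometerData(from: start, to: end) else { return }

    // Minimal pivot: one line per sample (timestamp,x,y,z).
    var lines = ""
    for item in list {
        if let sample = item as? CMRecordedAccelerometerData {
            let a = sample.acceleration
            lines += "\(sample.startDate.timeIntervalSince1970),\(a.x),\(a.y),\(a.z)\n"
        }
    }

    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("samples-\(Int(start.timeIntervalSince1970)).csv")
    do {
        try lines.write(to: url, atomically: true, encoding: .utf8)
        // transferFile queues the file and keeps sending even after the app
        // becomes inactive, which is the whole point of this approach.
        _ = WCSession.default.transferFile(url, metadata: ["start": start, "end": end])
    } catch {
        NSLog("file dump failed: \(error)")
    }
}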

We'll see...

Saturday, May 28, 2016

The limits of AWS Cognito

Well, after a slight hiatus, I spent a little time understanding how to use AWS Cognito in an application. I've now got a more or less running Cognito-as-STS-token-generator for the Apple Watch. Features:

  • Wired to any or all of Amazon, Google, Twitter or Facebook identity providers
  • Cognito processing occurs on the iPhone (hands STS tokens to Watch as Watch can't yet run the AWS SDK)
  • Leverage Cognito's ability to 'merge' identities, producing a single CognitoId from multiple identity providers
  • Automatic refresh of identity access tokens
Here's the iPhone display showing all the identity providers wired:

OK, the good stuff. Here's what the access key flow now looks like:


There are a lot of actors in this play. The key actor for this article is the IdP. Here, as a slight generalization across all the IdPs, we have a token exchange system. The iPhone maintains the long-lived IdP session key from the user's last login. Then, the iPhone has the IdP exchange the session key for a short-lived access key to present to Cognito. For IdPs like Amazon and Google, the access key is only good for an hour and must be refreshed...

Let me say that again: today, we need to manually refresh this token for Cognito before asking Cognito for an updated STS token! Cognito can't do this! FYI, read Amazon's description here: "Refreshing Credentials from Identity Service"

Especially in our case, where our credentials provider (Cognito) is merely referenced by the other AWS resources, we need to intercept the Cognito call to make sure that on the other side of Cognito, the 'logins' are up to date.

So, I replumbed the code to do just this (the 'opt' section in the above diagram). Now, a user can log in once on the iPhone application and then each time the Watch needs a token, the whole flow tests whether or not an accessKey needs to be regenerated.
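
Here's a rough sketch of that interception, in the spirit of the 'opt' section in the diagram. The IdentityProvider protocol and refreshAccessKey() are hypothetical stand-ins for the app's per-IdP code (Amazon, Google, Twitter, Facebook), not AWS SDK API, and the five-minute lead time is just an example:

import Foundation

// Hypothetical abstraction over the four IdPs wired into the iPhone app.
protocol IdentityProvider {
    var providerName: String { get }        // e.g. "www.amazon.com"
    var accessKey: String { get }
    var accessKeyExpiration: Date? { get }  // nil when the IdP key never expires
    // Assumed to exchange the long-lived session key for a new access key,
    // calling back on the main queue.
    func refreshAccessKey(completion: @escaping (String?) -> Void)
}

let refreshLeadTime: TimeInterval = 5 * 60  // refresh a little before expiry

// Build an up-to-date 'logins' map before handing it to Cognito.
func freshLogins(for providers: [IdentityProvider],
                 completion: @escaping ([String: String]) -> Void) {
    var logins: [String: String] = [:]
    let group = DispatchGroup()
    for provider in providers {
        let expiring = provider.accessKeyExpiration.map {
            $0.timeIntervalSinceNow < refreshLeadTime
        } ?? false
        if expiring {
            group.enter()
            provider.refreshAccessKey { token in
                if let token = token { logins[provider.providerName] = token }
                group.leave()
            }
        } else {
            logins[provider.providerName] = provider.accessKey
        }
    }
    // Only once every login entry is current do we ask Cognito for an
    // updated STS token for the Watch.
    group.notify(queue: .main) { completion(logins) }
}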

For reference, here's the known lifetimes of the various tokens and keys:
  • The Watch knows its Cognito generated STS Token is good for an hour
  • Amazon accessTokens are good for an hour (implied expire time)
  • Google accessToken is good until an expire time (google actually returns a time!)
  • Twitter tokens don't expire, so its accessKey is effectively unlimited
  • Facebook's token is good for a long time (actually the timeout is 60 days)
  • TODO: do any of the IdPs enforce idle timeouts? (e.g. a sessionKey has to be exchanged within a certain time or it is invalidated...)
So, with all these constants, and a little lead time, the Watch->iPhone->DynamoDB flow looks pretty robust. The current implementation is still limited to having the Watch ask the iPhone for the STS token since I haven't figured out how to get the various SDKs working on the Watch. I don't want to rewrite all the IdP fetch code, along with the manual calls to Cognito.

Plus, I'm likely to move the AWS writes back to the iPhone as the Watch is pretty slow.

The code for this release is here. The operating code is also in TestFlight (let me know if you want to try)

Known bugs:
  • Google Signin may not work when the app is launched from the Watch (app crashes)
  • Facebook login/logout doesn't update the iPhone status section
  • The getSTS in Watch is meant to be pure async -- I've turned this off until its logic is a bit more covering of various edge cases.
  • The webapp should also support all 4 IdP (only Amazon at the moment)




Tuesday, February 23, 2016

Wow: Multiple Identity Providers and AWS Cognito

I've finally found time to experiment with multiple identity providers for Cognito. Mostly to understand how a CognitoId is formed, merged, invalidated. It turns out this is a significant finding, especially when this Id is used, say, as a primary key for data storage!

Recall, the original sensor and sensor2 projects were plumbed with Login With Amazon as the identity provider to Cognito. This new experiment adds GooglePlus as a second provider. Here you can see the test platform on the iPhone:

Keep in mind that for this sensor2 application, the returned CognitoId is used as the customer's key into the storage databases. Both for access control and as the DynamoDB hash key.

The flow on the iPhone goes roughly as follows:
  • A user can login via one or both of the providers
  • A user can logout
  • A user can also login using the same credentials on a different device (e.g. another iPhone with the application loaded)
Now here's the interesting part. Depending on the login ordering, the CognitoId returned to the application (on the watch in this case) can change! Here's how it goes with my test application (which includes a "Logins" merge):
  • Starting from scratch on a device
  • Login via Amazon where user's Amazon identity isn't known to this Cognito pool:
    • User will get a new CognitoId allocated
  • If user logs out and logs back in via Amazon, the same Id will be returned
  • If the user now logs into a second device via Amazon, the same Id will be returned
  • (so far this makes complete sense)
  • Now, if the user logs out and logs in via Google, a new Id will be returned
  • Again, if the user logs out and in again and same on second device, the new Id will continue to be returned
  • (this all makes sense)
  • At this point, the system thinks these are two users and those two CognitoIds will be used as different primary keys into the sensor database...
  • Now, if the user logs in via Amazon and also logs in via Google, a CognitoId merge will occur
    • One or the other of those existing Ids from above will be returned
    • And, the other Id will be marked via Cognito as disabled
    • This is a merge of the identities
    • And this merged Id will be returned on other devices from now on, regardless of whether they log in solely via Amazon or Google
    • (TODO: what happens if user is logged into Amazon, has a merged CognitoId and then they log in using a second Google credential?)
This is all interesting and sort of makes sense -- if a Cognito context has a map of logins that have been associated, then Cognito will do the right thing. This means that some key factors have to be considered when building an app like this:
  • As with my application, if the sensor database is keyed by the CognitoId, then there will be issues of accessing the data indexed by the disabled CognitoId after a merge
  • TODO: will this happen with multiple devices going through an anonymous -> identified flow?
  • It may be that additional resolution is needed to help with the merge -- e.g. if there is a merge, then ask the user to force a join -- and then externally keep track of the merged Ids as a set of Ids -> primary keys for this user...
Anyway, I'm adding in a couple more providers to make this more of a ridiculous effort. After which I'll think about resolution strategies.
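
To make that concrete, here's one shape a resolution strategy could take: an app-level mapping from every CognitoId ever seen for a user (including Ids later disabled by a merge) to a single canonical storage key. This is a sketch of hypothetical application code, not anything Cognito provides:

struct UserKeyResolver {
    // cognitoId -> canonical storage key; in practice this would live in
    // DynamoDB or Cognito Sync rather than in memory.
    private var canonicalKeyByCognitoId: [String: String] = [:]

    mutating func resolve(cognitoId: String) -> String {
        if let key = canonicalKeyByCognitoId[cognitoId] {
            return key
        }
        // First time this Id is seen: adopt it as its own canonical key.
        canonicalKeyByCognitoId[cognitoId] = cognitoId
        return cognitoId
    }

    // Called when a merge is detected: data written under the disabled Id
    // should be found under the surviving user from now on.
    mutating func recordMerge(survivingId: String, disabledId: String) {
        let canonical = resolve(cognitoId: survivingId)
        canonicalKeyByCognitoId[disabledId] = canonical
    }
}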


Sunday, February 7, 2016

sensor2 code cleanup -- you can try it too

After a bit of field testing, I've re-organized the sensor2 code to be more robust. Release tag for this change is here. Major changes include:
  • The Watch still sends CMSensorRecorder data directly to DynamoDB
  • However, the Watch now asks the iPhone for refreshed AWS credentials (since the AWS SDK isn't yet working on Watch, this avoids having to re-implement Cognito and login-with-amazon). This means that with today's code, the Watch can be untethered from the iPhone for up to an hour and can still dequeue records to DynamoDB (assuming the Watch has Wi-Fi access itself)
  • If the Watch's credentials are bad, empty or expired and the Watch can't access the iPhone, or the user is logged out of the iPhone part of the app, then the Watch's dequeuer loop is stopped
  • Dependent libraries (LoginWithAmazon) are now embedded in the code
  • A 'logout' on the phone will invalidate the current credentials on the Watch
This code should now be a bit easier to use for reproducing my experiments. Fewer moving parts, simpler design. I'll work on the README.md a bit more to help list the steps to set up.
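
For context, the Watch-to-iPhone credential handoff looks roughly like the sketch below. The message keys and the STSCredentials struct are illustrative, not the exact code in the repo:

import WatchConnectivity

struct STSCredentials {
    let accessKeyId: String
    let secretAccessKey: String
    let sessionToken: String
    let expiration: Date
}

// The Watch asks its paired iPhone for a fresh set of STS keys whenever its
// cached set is close to expiring.
func requestCredentials(session: WCSession,
                        completion: @escaping (STSCredentials?) -> Void) {
    session.sendMessage(["request": "sts"], replyHandler: { reply in
        guard let accessKeyId = reply["accessKeyId"] as? String,
              let secret = reply["secretAccessKey"] as? String,
              let token = reply["sessionToken"] as? String,
              let expiration = reply["expiration"] as? Date else {
            completion(nil)   // logged out on the phone, or a malformed reply
            return
        }
        completion(STSCredentials(accessKeyId: accessKeyId,
                                  secretAccessKey: secret,
                                  sessionToken: token,
                                  expiration: expiration))
    }, errorHandler: { _ in
        // Phone unreachable: the dequeuer loop stops until credentials return.
        completion(nil)
    })
}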

And finally, this demonstrates multi-tenant isolation of the data in DynamoDB. Here's the IAM policy for logged in users:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "mobileanalytics:PutEvents",
                "cognito-sync:*",
                "cognito-identity:*"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Sid": "Stmt1449552297000",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchWriteItem",
                "dynamodb:UpdateItem",
                "dynamodb:Query"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:499918285206:table/sensor2"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "${cognito-identity.amazonaws.com:sub}"
                    ]
                }
            }
        }
    ]
}

In the above example, the important lines are the Condition block -- this condition enforces that only rows with a hash key equal to the logged-in user's cognitoId can be read or written. This is why we can build applications with direct access to a data storage engine like DynamoDB!

You can read the details of IAM+DynamoDB here.
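
To make the condition concrete, here's roughly what a low-level Query payload from the client looks like when it satisfies that policy. The attribute names ("cognitoId", "time") are assumptions about the sensor2 table layout, not necessarily the exact schema:

func queryPayload(identityId: String, startTime: String, endTime: String) -> [String: Any] {
    return [
        "TableName": "sensor2",
        "KeyConditionExpression": "cognitoId = :id AND #t BETWEEN :start AND :end",
        "ExpressionAttributeNames": ["#t": "time"],
        "ExpressionAttributeValues": [
            // The hash key value must equal ${cognito-identity.amazonaws.com:sub},
            // otherwise DynamoDB rejects the request under the policy above.
            ":id": ["S": identityId],
            ":start": ["S": startTime],
            ":end": ["S": endTime]
        ]
    ]
}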

Anyway, back to performance improvements of the dequeue process. Everything is running pretty well, but the Watch still takes a long time to get its data moved.

Wednesday, January 6, 2016

A note on sensor2 dequeue performance

I've examined sensor2 dequeue performance. Some interesting observations indeed!
  • A single dequeue loop (1250 samples covering 25 seconds) takes a bit over 7 seconds
  • A little under 1 second of this time is getting data from CMSensorRecorder
  • Around 4 seconds is required to prepare this data
  • The time to send the samples to DynamoDB depends on the network configuration:
    • 3 - 6 seconds when the Watch is proxying the network through iPhone using LTE network (with a few bars of signal strength)
    • 2 - 4 seconds when the Watch is proxying the network through iPhone (6s plus) and my home WiFi
    • Around 1.5 seconds when the Watch is directly connecting to network using home WiFi

Speeding up the data preparation will help some.  I will set a goal of 1 second:
  • Hard-coded JSON serializer (see the sketch after this list)
  • Improvements to the payload signer
  • Reduce the HashMap operations (some clever pivoting of the data)
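
As a sketch of the hard-coded serializer idea: build the DynamoDB item JSON by appending to a String rather than constructing nested dictionaries and pushing them through NSJSONSerialization. The Sample struct and the column naming are illustrative:

import Foundation

struct Sample {
    let fraction: Int           // which 1/50th of the second this sample lands in
    let x: Double, y: Double, z: Double
}

// Emit one item's attribute map directly as JSON text.
func serializeSecond(second: String, samples: [Sample]) -> String {
    var json = "{\"time\":{\"S\":\"\(second)\"}"
    for s in samples {
        // One named column per sub-second sample, e.g. "a24":{"S":"x,y,z"}
        json += ",\"a\(String(format: "%02d", s.fraction))\":{\"S\":\"\(s.x),\(s.y),\(s.z)\"}"
    }
    json += "}"
    return json
}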

Monday, January 4, 2016

New "serverless" site to explore sensor data

I have updated the UI to parse and render the data from the new data model. You can try it out here.


Recall the data flow is:

  • CMSensorRecorder is activated directly on the Watch
  • When the application's dequeue is enabled, the dequeued events are:
    • Parsed directly into our DynamoDB record format
    • Directly sent to DynamoDB from the Watch
And this pure static website directly fetches those records and pivots the data into a vis.js and d3.js format for display.

Next up:

  • Get AWS Cognito into the loop to get rid of the long lived AWS credentials
  • Work on the iOS framework memory leaks
  • Speed up the dequeue (or, resort to a Lambda raw data processor)

Sunday, January 3, 2016

Progress: CMSensorRecorder directly to DynamoDB

Relative to before, the pendulum has swung back to the other extreme: a native WatchOS application directly writing to AWS DynamoDB.  Here, we see a screen grab with some events being sent:



This has been an interesting exercise. Specifically:
  • With iOS 9.2 and WatchOS 2.1, development has improved
  • However, I can't yet get the AWS iOS SDK to work on the Watch directly
  • So, I have instead written code that writes directly to DynamoDB
    • Including signing the requests
    • Including implementing low level API for batchWriteItem and updateItem
  • I have also redone the application data model to have a single DynamoDB row represent a single second's worth of data, with up to 50 samples per row (sketched after this list)
    • Initially, samples are indexed using named columns (named by the fraction of a second the sample is in)
    • Later this should be done as a more general documentDB record
    • This approach is a more efficient use of DynamoDB -- provisioning required is around 2 writes/second per Watch that is actively dequeuing (compared to 50 writes/second when a single sample is stored in a row)
  • This application also uses NSURLSession directly
  • This means that the Watch can send events to DynamoDB using configured WiFi when the iPhone is out of range!
  • I have also redone the command loop using GCD dispatch queues (instead of threads)
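
Here's a sketch of that one-row-per-second layout, expressed as the item dictionary a low-level batchWriteItem request would carry. The attribute names ("cognitoId", "time", the "aNN" columns) are illustrative, not necessarily the exact names in the table:

import Foundation

func rowItem(cognitoId: String, second: String,
             samples: [(fraction: Int, x: Double, y: Double, z: Double)]) -> [String: Any] {
    var item: [String: Any] = [
        "cognitoId": ["S": cognitoId],   // hash key: who this data belongs to
        "time": ["S": second]            // range key: one row per second
    ]
    for s in samples {
        // Column "a07" holds the sample 7/50ths of the way into this second,
        // so a fully packed row carries 50 samples in a single write.
        item[String(format: "a%02d", s.fraction)] = ["S": "\(s.x),\(s.y),\(s.z)"]
    }
    return item
}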
Anyway, it appears to be doing the right thing. Data is being recorded in CMSensorRecorder, the dequeue loop is processing data and transmitting up to 1250 samples (25 seconds) of data per network call. The custom request generator and call signing are doing the right thing. Perhaps a step in the right direction? Not quite sure:
  • I see that the actual on-Watch dequeue processing takes about 6 seconds for 25 seconds worth of data. Since all of the data preparation must occur on the Watch (there is no middle man), the additional work of pivoting the data and preparing the DynamoDB request is borne by the Watch.
  • Profiling shows the bulk of this processing time is in JSON serialization!
  • Another approach would be minimal processing on the Watch.  e.g. "dump the raw data to S3" and let an AWS Lambda take care of the detailed processing. This is probably the best approach although not the cheapest for an application with many users.
  • I'm now running tests long enough to see various memory leaks! I've been spending a bit of time with the memory allocator tools lately...
    • I have run into a few with the NSURLSession object
    • The JSON serializer also appears to leak memory
    • Possibly NSDateFormatter also is leaking memory
Here's what a dequeue loop looks like in the logs. You can see the blocks of data written and the loop processing time:

Jan  3 21:05:11 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: dequeueLoop(1)
Jan  3 21:05:11 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: flush itemCount=23, minDate=2016-01-04T05:01:54.557Z, maxDate=2016-01-04T05:01:54.998Z, length=2621
Jan  3 21:05:12 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: data(Optional("{}"))
Jan  3 21:05:13 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: commit latestDate=2016-01-04 05:01:54 +0000, itemCount=23
Jan  3 21:05:13 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: dequeueLoop(2)
Jan  3 21:05:13 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: flush itemCount=49, minDate=2016-01-04T05:01:55.018Z, maxDate=2016-01-04T05:01:55.980Z, length=5343
Jan  3 21:05:14 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: data(Optional("{}"))
Jan  3 21:05:14 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: commit latestDate=2016-01-04 05:01:55 +0000, itemCount=72
Jan  3 21:05:15 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: dequeueLoop(3)
Jan  3 21:05:20 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: flush itemCount=1250, minDate=2016-01-04T05:01:56.000Z, maxDate=2016-01-04T05:02:20.988Z, length=88481
Jan  3 21:05:23 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: data(Optional("{\"UnprocessedItems\":{}}"))
Jan  3 21:05:23 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: commit latestDate=2016-01-04 05:02:20 +0000, itemCount=1322
Jan  3 21:05:23 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: dequeueLoop(4)
Jan  3 21:05:30 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: flush itemCount=1249, minDate=2016-01-04T05:02:21.008Z, maxDate=2016-01-04T05:02:45.995Z, length=88225
Jan  3 21:05:32 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: data(Optional("{\"UnprocessedItems\":{}}"))
Jan  3 21:05:32 Gregs-AppleWatch sensor2 WatchKit Extension[152] <Warning>: commit latestDate=2016-01-04 05:02:45 +0000, itemCount=2571

And here is what a record looks like in DynamoDB. This shows the columnar encoding of a few of the X accelerometer samples:


I have a checkpoint of the code here. Note that this code is somewhat hard-coded: it only writes to my one table, and the AWS authorizations only allow writes.

TODO:
  • Update the UI to help explore this data
  • See if there is a more efficient use of the JSON serializer
  • Examine some of the framework memory leaks
  • Try to speed up the dequeue to be better than 6 seconds of wall clock for 25 seconds of data.

Thursday, November 19, 2015

CMSensorRecorder from Watch to AWS

I've done a couple of experiments with the direct networking from the WatchOS 2.0 using NSURLSession. Based on the results, I now have a strategy for getting the iPhone out of the loop.

Recall the current plumbing looks like this:

  • CMSensorRecorder data collected in Watch
  • Dequeued by a native application on the Watch
  • Sent via WCSession to iPhone
  • iPhone uses AWS KinesisRecorder to buffer received events
  • KinesisRecorder sends to AWS (using Cognito credentials)
  • Kinesis is dequeued to an AWS Lambda
  • The Lambda stores the records to DynamoDB (using a fixed IAM role)
This is fine, but has a lot of moving parts. And more importantly, Kinesis is provisioned at expected throughput and you pay by the hour for this capacity.

My next experiment will look like this:
  • CMSensorRecorder data collected in Watch
  • Dequeued by a native application on the Watch
  • Sent directly to AWS IoT (using on-watch local long lived IAM credentials)
  • An IoT rule will send these events to Lambda
  • The Lambda stores the records to DynamoDB (using a fixed IAM role)
The important difference here is that IoT is charged based on actual use, not provisioned capacity. This means the effective costs are more directly related to use and not expected use!

Also, the Watch will be able to directly transmit to AWS instead of going through the iPhone (well, it does proxy through its paired phone if in range, otherwise it goes direct via WiFi if a known network is near). This feature will be an interesting proof of concept indeed.

Anyway, for this first experiment the writes to IoT are via HTTP POST (not via the MQTT protocol). And authentication is via long-lived credentials loaded into the Watch, not Cognito. This is primarily because I haven't yet figured out how to get the AWS iOS SDK to work on the Watch (and the iPhone at the same time). So, I am instead using low-level AWS API calls. I expect this will change with a future release of the SDK.
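
A sketch of that HTTPS publish, using today's URLSession naming. The topics/<name> path is the real IoT publish endpoint shape, but the endpoint prefix and topic are made up, and sign(request:) stands in for the low-level SigV4 signing code (it is not an SDK call):

import Foundation

func publishToIoT(batchJSON: Data, sign: (inout URLRequest) -> Void) {
    // Hypothetical endpoint and topic; the real endpoint comes from the IoT console.
    let url = URL(string: "https://example123.iot.us-east-1.amazonaws.com/topics/sensor2?qos=1")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = batchJSON
    sign(&request)   // SigV4 with the long-lived IAM credentials loaded on the Watch

    URLSession.shared.dataTask(with: request) { _, response, error in
        if let error = error {
            NSLog("IoT publish failed: \(error)")
        } else if let http = response as? HTTPURLResponse {
            NSLog("IoT publish status \(http.statusCode)")
        }
    }.resume()
}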

I have toyed with the idea of having the Watch write directly to DynamoDB or Lambda instead of going through IoT. Since the Watch is already buffering the sensor data, I don't really need yet another reliable queue in place. Tradeoffs:
  1. Send via IoT
    • + Get Kinesis-like queue for reliable transfer
    • + Get some runtime on IoT and its other capabilities
    • - Paying for another buffer in the middle
  2. Send direct to Lambda
    • + One less moving part
    • - To ensure the data is sent, we need to make a synchronous call to Lambda, which can be delayed while it writes to DynamoDB; not sure how well this will work on networks in the field
  3. Send direct to DynamoDB
    • + The lowest cost and least moving parts (and lowest latency)
    • - DynamoDB batch writes can only handle 25 items (0.5 seconds) of data
Note: on DynamoDB, earlier I had discussed a slightly denormalized data storage scheme, one where each second of data is recorded in one DynamoDB row (with separately named columns per sub-second event). Since DynamoDB can do no-clobber updates, this is a nice tradeoff of rows vs. data width. This would change the data model and the reader would need to take it into account, but it may make the most sense no matter what. Basically, this gets better utilization of a DynamoDB 'row' by compacting the data as much as possible, and it probably reduces the overall cost of using DynamoDB too, since the provisioning would generally be reduced. So, I may just re-do the data model and go direct to DynamoDB for this next POC.

Stay tuned!

Thursday, October 29, 2015

Sensor: You Can Try Out Some Real Data

I've set up a rendering of some actual sensor data in a couple of formats:

  • A line chart with X as time and Y as 3 lines of x, y, z acceleration
  • A 3d plot of x, y, z acceleration with color being the sample time
It is interesting to see the actual sensor fidelity in visual form. CMSensorRecorder records at 50 samples per second and the visualizations are 400 samples, or 8 seconds of data.

You can try out the sample here at http://test.accelero.com. There are a couple of suggested start times shown on the page. Enter a time and hit the Fetch button. Recall this Fetch button allows the browser to directly query DynamoDB for the sample results. In this case anonymously, and hard coded to this particular user's Cognito Id...


Once the results are shown you should be able to drag around on the 3d plot to see the acceleration over time.

The above timeslice is a short sample where the watch starts flat and is rotated 90 degrees in a few steps. If you try out the second sample you will see a recording of a more circular motion of the watch.

Note that d3.js is used for the line charts and vis.js is used for the interactive 3d plot.

Sunday, October 25, 2015

Apple Watch Accelerometer displayed!

There you have it! A journey started in June has finally rendered the results intended. Accelerometer data from the Watch is processed through a pile of AWS services to a dynamic web page.

Here we see the very first rendering of a four second interval where the watch is rotated around its axis. X, Y and Z axes are red, green, blue respectively. Sample rate is 50/second.

The accelerometer data itself is mildly interesting. Rendering it on the Watch or the iPhone was a trivial exercise. The framework in place is what makes this fun:
  • Ramping up on WatchOS 2.0 while it was being developed
  • Same with Swift 2.0
  • Getting data out of the Watch
  • The AWS iOS and Javascript SDKs
  • Cognito federated identity for both the iPhone app and the display web page
  • A server-less data pipeline using Kinesis, Lambda and DynamoDB
  • A single-page static content web app with direct access to DynamoDB
No web servers, just a configuration exercise using AWS PaaS resources. This app will likely be near 100% uptime, is primarily charged per use, will scale with little intervention, is logged, AND is a security-first design.

Code for this checkpoint is here.

Friday, October 23, 2015

Amazon's iOS SDK KinesisRecorder: bug found!

Recall earlier posts discussing 50% extra Lambda->DynamoDB event storage. It turns out the problem is the AWS SDK KinesisRecorder running in the iPhone. Unlike the sample code provided, I actually have concurrent saveRecord() and submitAllRecords() flows -- sort of like real world. And this concurrency exposed a problem in the way KinesisRecorder selects data for submit to Kinesis.

Root Cause: rowid is not a stable handle for selecting and removing records.

Anyway, I made a few changes to KinesisRecorder:submitAllRecords(). These changes are mostly to index records by their partition_key. This seems to work ok for me. However, it may not scale for cases where the KinesisRecorder winds up managing a larger number of rows. This needs some benchmarking.

Pull request is here.  And here's updated iPhone code to do the right thing.

As they say "now we're cookin' with gas!"

Here we see the actual storage rate is around the expected 50 per second. The error and retry rates are minimal.

Sooo, back to now analyzing the data that is actually stored in DynamoDB!

Sunday, October 18, 2015

AWS Lambda: You can't improve what you don't measure

Now that there is a somewhat reliable pipeline of data from the Watch-iPhone out to AWS, I have a chance to measure the actual throughput through AWS. Interesting results indeed.

As of this moment, Lambda is performing 50% more work than is needed.

Here's a plot of DynamoDB write activity:

Here we see that during a big catchup phase when a backlog of events are being sent at a high rate, the writes are globally limited to 100/second. This is good and expected. However, the last part is telling. Here we have caught up and only 50 events/second are being sent through the system. But DynamoDB is showing 75 writes/second!

Here's a filtered log entry from one of the Lambda logs:

Indeed, individual batches are being retried. See job 636 for example. Sometimes on the same Lambda 'instance'. Sometimes on a different instance. This seems to indicate some sort of visibility timeout issue (assuming the Lambda queuer even has this concept).

Recall, the Watch is creating 50 accelerometer samples per second through CMSensorRecorder. And the code on the Watch-iPhone goes through various buffers and queues and winds up sending batches to Kinesis. Then the Kinesis->Lambda connector buffers and batches this data for processing by Lambda. This sort of a pipeline will always have tradeoffs between latency, efficiency and reliability. My goal is to identify some top level rules of thumb for future designs. Again my baseline settings:
  • Watch creates 50 samples/second
  • Watch dequeuer will dequeue up to 200 samples per batch
  • These batches are queued on the iPhone and flushed to Kinesis every 30 seconds
  • Kinesis->Lambda will create jobs of up to 5 of these batches
  • Lambda will take these jobs and store them to DynamoDB (in batches of 25)
There are some interesting observations:
  • These jobs contain up to 1000 samples and take a little over 3 seconds to process
  • The latency pipeline from actual event to DynamoDB store should be:
    • Around 2-3 minute delay in CMSensorRecorder
    • Transmit from Watch to iPhone is on the order of 0.5 seconds
    • iPhone buffering for Kinesis is 0-30 seconds
    • Kinesis batching will be 0-150 seconds (batches of 5)
    • Execution time in the Lambda of around 3 seconds
  • Some basic analysis of the Lambda flow shows items getting re-dispatched often!
This last part is interesting. In digging through the Lambda documentation and postings from other folks, there is very little definition of the retry contract. Documents say things like "if the lambda throws an exception..." or "items will retry until they get processed". Curiously, nothing spells out what constitutes 'processed', when it decides to retry, etc.

Unlike SQS, exactly what is the visibility timeout? How does Lambda decide to start up an additional worker? Will it ever progress past a corrupted record?

This may be a good time to replumb the pipeline back to my traditional 'archive first' approach: S3->SQS->worker. This has defined retry mechanisms, works at very high throughput, and looks to be a bit cheaper anyway!


Thursday, October 15, 2015

Apple Watch Accelerometer -> iPhone -> Kinesis -> Lambda -> DynamoDB

I've been cleaning up the code flow for more and more of the edge cases. Now, batches sent to Kinesis include Cognito Id and additional instrumentation. This will help when it comes time to troubleshoot data duplication, dropouts, etc. in the analytics stream.

For this next pass, the Lambda function records the data in DynamoDB -- including duplicates. The data looks like this:



The Lambda function (here in source) deserializes the event batch and iterates through each record, one DynamoDB put at a time. Effective throughput is around 40 puts/second (on a table provisioned at 75/sec).

Here's an example run from the Lambda logs (comparing batch size 10 and batch size 1):

START RequestId: d0e5b23a-54f1-4be8-b100-3a4eaabfbced Version: $LATEST
2015-10-16T04:10:46.409Z d0e5b23a-54f1-4be8-b100-3a4eaabfbced Records: 10 pass: 2000 fail: 0
END RequestId: d0e5b23a-54f1-4be8-b100-3a4eaabfbced
REPORT RequestId: d0e5b23a-54f1-4be8-b100-3a4eaabfbced Duration: 51795.09 ms Billed Duration: 51800 ms Memory Size: 128 MB Max Memory Used: 67 MB
START RequestId: 6f430920-1789-43e1-a3b9-21aa8f79218e Version: $LATEST
2015-10-16T04:13:22.468Z 6f430920-1789-43e1-a3b9-21aa8f79218e Records: 1 pass: 200 fail: 0
END RequestId: 6f430920-1789-43e1-a3b9-21aa8f79218e
REPORT RequestId: 6f430920-1789-43e1-a3b9-21aa8f79218e Duration: 5524.53 ms Billed Duration: 5600 ms Memory Size: 128 MB Max Memory Used: 67 MB

Recall, the current system configuration is:
  • 50 events/second are created by the Watch Sensor Recorder
  • These events are dequeued in the Watch into batches of 200 items
  • These batches are sent to the iPhone on the fly
  • The iPhone queues these batches in the onboard Kinesis recorder
  • This recorder flushes to Amazon every 30 seconds
  • Lambda will pick up these flushes in batches (presently a batch size of 1)
  • These batches will be written to DynamoDB [async.queue concurrency = 8]
The Lambda batch size of 1 is an interesting tradeoff.  This results in the lowest latency processing. The cost appears to be around 10% more work (mostly a lot more startup/dispatch cycles).

Regardless, this pattern needs to write to DB faster than the event creation rate...

Next steps to try:
  • Try dynamo.batchWriteItem -- this may help, but will be more overhead to deal with failed items and provisioning exceptions
  • Consider batching multiple sensor events into a single row. The idea here is to group all 50 events in a particular second into the same row. This will only show improvement if the actual length of an event record is a significant fraction of 1kb record size
  • Shrink the size of an event to the bare minimum
  • Consider using Avro for the storage scheme
  • AWS IoT
Other tasks in the queue:
  • Examine the actual data sent to DynamoDB -- what are the actual latency results?
  • Any data gaps or duplication?
  • How does the real accelerometer data look?
  • (graph the data in a 'serverless' app)

Sunday, September 27, 2015

Cognito based credentials finally refreshing

It turns out I had it wrong all along. Here's the flow:
  • Cognito is mapping an identity to a STS based role
  • We need to ask Cognito to refresh the credentials directly (not just the provider refresh)
Now, there is some debate as to whether this part of the SDK is obeying the refresh contract. So, for now I have this construct in the 'flush to kinesis' flow:
    if (self.credentialsProvider.expiration == nil ||
        self.credentialsProvider.expiration.timeIntervalSinceNow < AppDelegate.CREDENTIAL_REFRESH_WINDOW_SEC) {
        let delegate = AuthorizeUserDelegate(parentController: self.viewController)
        delegate.launchGetAccessToken()
        NSLog("refreshed Cognito credentials")
    }
This winds up triggering the usual Cognito flow. And if a persistent identity is in the app, then this finally does the right thing. Simulator based transmit now does token refresh reliably over many hours, or many versions of STS tokens.

Also, this version of the code is updated based on the release versions of Xcode 7, iOS 9 and watchOS 2. Everything is running fairly smoothly. There are still a couple of areas I'm investigating:

  • The WCSession:sendMessage seems to get wedged in a certain sequence. Watch sends a message, is waiting for a reply, phone gets message, then watch goes to sleep. The phone has processed the message and is blocked on the reply to the watch. This doesn't seem to get unwedged any way other than waiting for a 2 or 5 minute timeout.
  • This particular code does get into an initial blocked state if the phone is locked. This looks to be something where the accelerometer sensor needs to check with the phone to see if the user has granted access to the sensor.
Both of the above are a bit more than minor inconveniences. The first means that even if you live with the watch app going to sleep often, you still can't reliably transfer a bunch of data to the phone using the sendMessage method. The second means starting the app on the watch when the phone is locked or out of range isn't clean. Maybe there is a reason. But really, we are at a point where getting the sensor data out of the watch for anything close to near-realtime processing isn't yet realized.


Sunday, September 13, 2015

Sensor: running well on iOS 9 Seed and WatchOS 2

I've made a checkpoint of the sensor code that corresponds to the iOS9 GM seed and WatchOS 2.0. The release tag is here. Note, this code is configured to generate synthetic data, even on the hardware. I'm using this to prove the robustness of the Watch -> iPhone -> AWS connections across noisy connections.

I've cleaned up the transport a bit to send JSON directly from the Watch. This goes across the WCSession to the iPhone.  The iPhone does parse the data to examine it and update its engineering display. But, really this raw JSON payload is sent directly to Kinesis.

Here's a screen dump of an AWS Lambda parsing the Kinesis flow. This Lambda simply prints the JSON, enough to show what is being sent:



This code runs pretty well in background mode on the iPhone. The data flow continues even while the phone is locked, or working in another application. This key concept shows the iPhone acting as a buffered proxy to AWS.

Next up, handling a few error cases a bit better:
  • When the watch goes in and out of range
  • When the phone goes in and out of being able to reach AWS
  • And of course, when the watch goes to sleep (real goal is to keep being able to dequeue from CMSensorRecorder while watch is asleep)

Sunday, August 30, 2015

CMSensorRecorder data reliably flowing to Kinesis

I've refactored the iPhone side of the code a bit to better represent background processing of the sensor data. The good thing is that WCSession:sendMessage handles iPhone background processing properly. This is at the expense of having to handle reachability errors in code. The checkpoint of code is here.

Now the flow is roughly:
  • On Watch
    • CMSensorRecorder is activated and is recording records locally regardless of the application state of the Watch
    • When the sensor application is in the foreground, a thread attempts to dequeue data from the recorder
    • And when this data is received, a WCSession:sendMessage is used to send the data to the iPhone in the background
    • Iff a valid reply comes back from this message, the CMSensorRecorder fetch position is updated to fetch the next unprocessed sensor data
  • On iPhone
    • A background thread is always ready to receive messages from the Watch
    • Those messages are saved to a local Kinesis queue
    • A timer based flush will submit this Kinesis queue to AWS
    • AWS credentials from Cognito are now manually refreshed by checking the credentials expire time
    • The send and submit kinesis calls are now asynchronous tasks
So this is pretty close to continuous feed on the iPhone side.
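
In code, the Watch side of that handshake looks roughly like this sketch; the batch payload shape, the "ack" reply key and advanceFetchPosition are illustrative, not the exact code in the checkpoint:

import WatchConnectivity

func sendBatch(_ batch: [[String: Any]], latestSampleDate: Date,
               session: WCSession, advanceFetchPosition: @escaping (Date) -> Void) {
    guard session.isReachable else { return }   // try again on the next dequeue pass
    session.sendMessage(["samples": batch], replyHandler: { reply in
        if reply["ack"] as? Bool == true {
            // Only a valid reply moves the CMSensorRecorder fetch position forward,
            // so samples are never skipped when the phone is unreachable.
            advanceFetchPosition(latestSampleDate)
        }
    }, errorHandler: { error in
        NSLog("sendMessage failed, will retry: \(error)")
    })
}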

Some areas of durability to re-explore next:
  • How to build a Watch dequeue that can run when the application isn't in foreground?
  • Is there another way for WCSession to send to a background task other than sendMessage?
  • How reliable is the sendMessage call?
    • When the iPhone is out of range
    • When the iPhone is locked
    • When it is busy running another application
    • I do see some transient 'not paired' exceptions when sending volume
  • While this does allow for automatic background processing, is there a simpler way of transferring data that doesn't require the application handling reachability errors?
  • How reliable is the Kinesis send-retry when the iPhone can't reach AWS?
I will next be building more quantitative checks of the actual data sent through the system to understand where data gets sent more than once, or where it is lost.

Wednesday, August 26, 2015

Xcode 7 beta6: Bitcode issues between WatchOS and iPhone solved!

Getting there!  A quick upgrade to Xcode 7 beta6 fixed this issue.  We now have data transfer from Watch Accelerometer to CMSensorRecorder to Watch app to iPhone to Kinesis -- yes, data is flowing, mostly resiliently too, even with intermittent focus, connectivity, etc.  Here is the code.

And some screen dumps (explained below):


The Watch screen dump is pretty much as before.  You will see the Cognito integration (using Amazon as an identity provider).  The first 4 lines are the identity details.  The next 4 lines are information regarding the Kinesis storage: the STS token expire time, the amount of local storage consumed, how many flushes to Kinesis have occurred, and when.

Of course the current code still relies on these transfer operations being in focus, an ongoing area of research as to how to make this a background operation on both the Watch and on the iPhone.  But still, real data is finally in Kinesis.

TODO: build an auto-refreshing STS token, as this appears to be a known problem.

Next up, write an AWS Lambda function to read from Kinesis, parse the records and then put them into DynamoDB.  Once that is there, a visualization example both on iOS and on a Server-less web service...


Sunday, August 23, 2015

Marketing: make something look like what is intended, not what it is

Well, this has been a depressing past couple of days. This was the time to re-integrate the AWS SDK back into the application in preparation for sending data to Kinesis. I had basic Cognito and Kinesis hello world working back in June on WatchOS 1.0. I'd mistakenly assumed that some sort of compatibility over time would be in order. Not to be the case. Summary:
Yes, it is possible to disable the enforcement of the TLS1.2 requirement. And this I did; I am now able to get a set of temporary keys for calls to AWS services. How many applications are going to have to do this? All of them?

Worse, it doesn't look possible to use the current AWS SDK with a Watch application. This looks like a pretty ugly show stopper:
  • The 3rd party library doesn't have bitcode support. While you can disable the bitcode requirement on the iPhone,
  • the Watch and iPhone have to have the same bitcode support level. And the Watch requires bitcode enabled.
Think about what this means!  7-8 years' worth of iPhone 3rd party libraries are out there, probably used by more than a few applications. And these libraries will NOT work with any application that wants to bundle with WatchOS 2.0. The proverbial 'what were they thinking?' comes to mind.

So, I'm stuck: I can't integrate with the 3rd party library until it is rebuilt...

The calendar looks rough for Apple:
  • September 9th announcements; WatchOS2 and some new hardware
  • Then they turn on the holiday season marketing; "buy our watch, it has apps"
  • In the meantime, a mountain of developers are trying to figure out how to ship anything on the Watch
  • New message "trust us, our developers will eventually catch up"


Tuesday, June 9, 2015

Federated Login Enhancements

With a little more reading and experimenting I think I've refactored the pattern into a slightly more general solution that covers startup, login and logout states properly (github updated).  Code seems robust enough -- need to test a bit under various network error states.

Next up is to wire in another social provider or two (Google+ and Twitter).  And then to experiment with identity merge -- how to recognize multiple authenticated userIds and then join them into a single entity.

As of this point, I'm fairly certain the auto-refreshing temporary AWS credentials are loaded and working -- seems ready to use with a sensor stream to Kinesis for example.

As a footnote, my first dabbling in Swift is encouraging.  Yes, some sort of a cross of C, Java, ObjectiveC, Javascript, heck, even Gosu.  I only know enough to be dangerous, so time to read the language spec to see what I'm missing.

Monday, June 8, 2015

Federated Login Working

Well, as expected, it is quite trivial to establish the mapped identity provider callback.  Once a login to the provider completes, our service calls out to establish a login mapping to initialize the Cognito to STS mapping (GetId is the AWS API call).  Then later, as needed, a refreshing token is fetched on the fly by the Cognito credentialsProvider.  Nice.
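
The wiring is roughly the following sketch, with AWS iOS SDK class names as used at the time; the identity pool id and the token variable are placeholders:

import AWSCore

// Placeholder for the token handed back by the Login with Amazon flow.
let loginWithAmazonAccessToken = "token-from-AuthorizeUserDelegate"

// Cognito maps the provider login to an identity (GetId under the hood)
// and then vends temporary STS credentials.
let credentialsProvider = AWSCognitoCredentialsProvider(
    regionType: .USEast1,
    identityPoolId: "us-east-1:00000000-0000-0000-0000-000000000000")

// "www.amazon.com" is the provider name Cognito expects for Login with Amazon.
credentialsProvider.logins = ["www.amazon.com": loginWithAmazonAccessToken]

let configuration = AWSServiceConfiguration(
    region: .USEast1,
    credentialsProvider: credentialsProvider)
AWSServiceManager.default().defaultServiceConfiguration = configuration
// From here, SDK clients (e.g. the Kinesis recorder) pick up the temporary
// credentials and fetch refreshed ones as they expire.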

The git repo for this experiment is here.