Showing posts with label Cognito. Show all posts

Tuesday, May 31, 2016

Efficient Apple Watch CMSensorRecorder transport to AWS

Well, I have come full circle. A long time ago, this experiment ran through Rube Goldberg system #1:

  • Dequeue from CMSensorRecorder
  • Pivot the data
  • Send it via WCSession to the iPhone
  • iPhone picked up the data and queued it locally for Kinesis
  • Then the Kinesis client transported data to Kinesis
  • Which had a Lambda configured to dequeue the data
  • And write it to DynamoDB
Then, I flipped the data around in an attempt to have the Watch write directly to DynamoDB:
  • Dequeue from CMSensorRecorder
  • Pivot the data to a BatchWriteItem request for DynamoDB
  • Put the data to DynamoDB
  • (along the way run the access key Rube Goldberg machine mentioned earlier)
The problems with both of these approaches are the cost to execute on the Watch and the Watch's lack of background processing. This meant it was virtually impossible to get data dequeued before the Watch app went to sleep.

I did a little benchmarking over the weekend and found that a brute force dequeue from CMSensorRecorder is fairly quick. And WCSession's file transfer support (transferFile) can run in the background, more or less. So, I will now attempt an alternate approach:
  • Dequeue from CMSensorRecorder
  • Minimal pivot of data including perhaps raw binary to a local file
  • WCSession:transferFile to send the file to the iPhone
  • Then iPhone gets the file and sends it itself to AWS (perhaps a little pivot, perhaps S3 instead of DynamoDB, etc.)
  • (along the way a much simpler access key machine will be needed)
The theory is that this'll get the data out of the Watch quickly during its limited active window.
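
As a rough sketch of the "minimal pivot to raw binary" step, the snippet below packs samples into a flat little-endian buffer and writes it to a local file. The `Sample` layout, the 32-byte record size and the function names are assumptions for illustration, not the format the app uses, and the hand-off of the file to WCSession is left out.

```swift
import Foundation

// Hypothetical sample layout; real CMSensorRecorder records carry more fields.
struct Sample: Equatable {
    var timestamp: Double   // seconds since 1970
    var x: Double
    var y: Double
    var z: Double
}

// Four 8-byte values per sample.
let bytesPerSample = 4 * MemoryLayout<UInt64>.size

// Pack each Double as its IEEE 754 bit pattern in little-endian byte order.
func encode(_ samples: [Sample]) -> Data {
    var data = Data(capacity: samples.count * bytesPerSample)
    for s in samples {
        for value in [s.timestamp, s.x, s.y, s.z] {
            let bits = value.bitPattern.littleEndian
            withUnsafeBytes(of: bits) { data.append(contentsOf: $0) }
        }
    }
    return data
}

// Inverse of encode; trailing partial records are ignored.
func decode(_ data: Data) -> [Sample] {
    let bytes = [UInt8](data)
    func double(at offset: Int) -> Double {
        var bits: UInt64 = 0
        for i in 0..<8 {
            bits |= UInt64(bytes[offset + i]) << UInt64(8 * i)
        }
        return Double(bitPattern: bits)
    }
    var samples: [Sample] = []
    var offset = 0
    while offset + bytesPerSample <= bytes.count {
        samples.append(Sample(timestamp: double(at: offset),
                              x: double(at: offset + 8),
                              y: double(at: offset + 16),
                              z: double(at: offset + 24)))
        offset += bytesPerSample
    }
    return samples
}

// Write one batch to a local file that could then be handed to the file transfer.
func writeBatch(_ samples: [Sample], to url: URL) throws {
    try encode(samples).write(to: url, options: .atomic)
}
```

The appeal of a fixed-width layout like this is that the Watch does almost no work per sample; any heavier reshaping can wait until the file reaches the iPhone.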

We'll see...

Saturday, May 28, 2016

The limits of AWS Cognito

Well, after a slight hiatus, I spent a little time understanding how to use AWS Cognito in an application. I've now got a more or less running Cognito-as-STS-token-generator for the Apple Watch. Features:

  • Wired to any or all of Amazon, Google, Twitter or Facebook identity providers
  • Cognito processing occurs on the iPhone (hands STS tokens to Watch as Watch can't yet run the AWS SDK)
  • Leverage Cognito's ability to 'merge' identities producing a single CognitoID from multiple identity providers
  • Automatic refresh of identity access tokens
Here's the iPhone display showing all the identity providers wired:

Ok, the good stuff. Here's what the access key flow now looks like:


There are a lot of actors in this play. The key actor for this article is the IdP. Here, as a slight generalization across all the IdPs, we have a token exchange system. The iPhone maintains the long-lived IdP session key from the user's last login. Then, the iPhone has the IdP exchange the session key for a short-lived access key to present to Cognito. For IdPs like Amazon and Google, the access key is only good for an hour and must be refreshed...

Let me say that again: today, we need to manually refresh this token for Cognito before asking Cognito for an updated STS token! Cognito can't do this! FYI, read Amazon's description here: "Refreshing Credentials from Identity Service" 

Especially in our case, where our credentials provider (Cognito) is merely referenced by the other AWS resources, we need to intercept the Cognito call to make sure that on the other side of Cognito, the 'logins' are up to date.

So, I replumbed the code to do just this (the 'opt' section in the above diagram). Now, a user can log in once on the iPhone application and then each time the Watch needs a token, the whole flow tests whether or not an accessKey needs to be regenerated.

For reference, here are the known lifetimes of the various tokens and keys:
  • The Watch knows its Cognito generated STS Token is good for an hour
  • Amazon accessTokens are good for an hour (implied expire time)
  • Google's accessToken is good until an expire time (Google actually returns a time!)
  • Twitter's accessKey doesn't expire, so its lifetime is unlimited
  • Facebook's token is good for a long time (actually the timeout is 60 days)
  • TODO: do any of the IdPs enforce idle timeouts? (e.g. a sessionKey has to be exchanged within a certain time or it is invalidated...)
So, with all these constants, and a little lead time, the Watch->iPhone->DynamoDB flow looks pretty robust. The current implementation is still limited to having the Watch ask the iPhone for the STS since I haven't figured out how to get the various SDKs working in the Watch. I don't want to rewrite all the IdP fetch codes, along with manual calls to Cognito.
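
The refresh test in that flow can be sketched as below: before handing the Watch an STS token, check whether the cached STS token is still good, and if not, whether the IdP access token also needs regenerating first. `CachedToken`, `RefreshStep`, `stepsNeeded` and the lead time value are hypothetical names for illustration; the actual IdP and Cognito calls are not shown.

```swift
import Foundation

// Hypothetical cached token; a nil expiry means "never expires" (e.g. Twitter above).
struct CachedToken {
    var value: String
    var expiry: Date?
}

// A token is fresh if it exists and will not expire within the lead time.
func isFresh(_ token: CachedToken?, now: Date, leadTime: TimeInterval) -> Bool {
    guard let token = token else { return false }
    guard let expiry = token.expiry else { return true }
    return expiry.timeIntervalSince(now) > leadTime
}

enum RefreshStep: Equatable {
    case idpAccessToken   // exchange the IdP session key for a new access token
    case stsToken         // ask Cognito for new STS credentials
}

// Returns the steps to run, in order, before an STS token can be handed to the Watch.
func stepsNeeded(idp: CachedToken?, sts: CachedToken?,
                 now: Date, leadTime: TimeInterval) -> [RefreshStep] {
    if isFresh(sts, now: now, leadTime: leadTime) { return [] }
    var steps: [RefreshStep] = []
    if !isFresh(idp, now: now, leadTime: leadTime) {
        steps.append(.idpAccessToken)
    }
    steps.append(.stsToken)
    return steps
}
```

The ordering matters: refreshing the IdP access token has to happen before the Cognito call, since Cognito only sees whatever 'logins' it is given.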

Plus, I'm likely to move the AWS writes back to the iPhone as the Watch is pretty slow.

The code for this release is here. The operating code is also in TestFlight (let me know if you want to try)

Known bugs:
  • Google Signin may not work when the app is launched from the Watch (app crashes)
  • Facebook login/logout doesn't update the iPhone status section
  • The getSTS in Watch is meant to be pure async -- I've turned this off until its logic covers more of the edge cases.
  • The webapp should also support all 4 IdP (only Amazon at the moment)




Tuesday, February 23, 2016

Wow: Multiple Identity Providers and AWS Cognito

I've finally found time to experiment with multiple identity providers for Cognito. Mostly to understand how a CognitoId is formed, merged, invalidated. It turns out this is a significant finding, especially when this Id is used, say, as a primary key for data storage!

Recall, the original sensor and sensor2 projects were plumbed with Login With Amazon as the identity provider to Cognito. This new experiment adds GooglePlus as a second provider. Here you can see the test platform on the iPhone:

Keep in mind that for this sensor2 application, the returned CognitoId is used as the customer's key into the storage databases. Both for access control and as the DynamoDB hash key.

The flow on the iPhone goes roughly as follows:
  • A user can login via one or both of the providers
  • A user can logout
  • A user can also login using the same credentials on a different device (e.g. another iPhone with the application loaded)
Now here's the interesting part. Depending on the login ordering, the CognitoId returned to the application (on the watch in this case) can change! Here's how it goes with my test application (which includes "Logins" merge)
  • Starting from scratch on a device
  • Login via Amazon where user's Amazon identity isn't known to this Cognito pool:
    • User will get a new CognitoId allocated
  • If user logs out and logs back in via Amazon, the same Id will be returned
  • If the user now logs into a second device via Amazon, the same Id will be returned
  • (so far this makes complete sense)
  • Now, if the user logs out and logs in via Google, a new Id will be returned
  • Again, if the user logs out and in again and same on second device, the new Id will continue to be returned
  • (this all makes sense)
  • At this point, the system thinks these are two users and those two CognitoIds will be used as different primary keys into the sensor database...
  • Now, if the user logs in via Amazon and also logs in via Google, a CognitoId merge will occur
    • One, or the other of those existing Ids from above will be returned
    • And, the other Id will be marked by Cognito as disabled
    • This is a merge of the identities
    • And this new merge will be returned on other devices from now on, regardless of whether they log in solely via Amazon or Google
    • (TODO: what happens if user is logged into Amazon, has a merged CognitoId and then they log in using a second Google credential?)
This is all interesting and sort of makes sense -- if a Cognito context has a map of logins that have been associated, then Cognito will do the right thing. This means that some key factors have to be considered when building an app like this:
  • As with my application, if the sensor database is keyed by the CognitoId, then there will be issues of accessing the data indexed by the disabled CognitoId after a merge
  • TODO: will this happen with multiple devices going through an anonymous -> identified flow?
  • It may be that additional resolution is needed to help with the merge -- e.g. if there is a merge, then ask the user to force a join -- and then externally keep track of the merged Ids as a set of Ids -> primary keys for this user...
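
One way to picture the "set of Ids -> primary keys" idea is a small alias table kept outside Cognito. This is only a sketch: `IdentityAliases` and its methods are hypothetical, the Id strings are made up, and where such a table would live (and how it would be secured) is left open.

```swift
import Foundation

// Hypothetical alias table: remembers which CognitoIds were disabled by a merge.
struct IdentityAliases {
    // Maps a disabled CognitoId to the Id that replaced it.
    var replacedBy: [String: String] = [:]

    // Follow the chain of merges to the Id that is currently live.
    func primaryKey(for cognitoId: String) -> String {
        var id = cognitoId
        while let next = replacedBy[id] { id = next }
        return id
    }

    // Record that Cognito disabled one Id in favor of another.
    mutating func recordMerge(disabled: String, survivor: String) {
        let from = primaryKey(for: disabled)
        let to = primaryKey(for: survivor)
        if from != to { replacedBy[from] = to }
    }

    // Every Id (live or disabled) whose data belongs to the same user.
    func allIds(for cognitoId: String) -> Set<String> {
        let root = primaryKey(for: cognitoId)
        var ids: Set<String> = [root]
        for id in replacedBy.keys where primaryKey(for: id) == root {
            ids.insert(id)
        }
        return ids
    }
}

// Example with made-up Ids: the Google-derived Id was disabled by the merge.
var aliases = IdentityAliases()
aliases.recordMerge(disabled: "us-east-1:google-id", survivor: "us-east-1:amazon-id")
print(aliases.allIds(for: "us-east-1:amazon-id").sorted())
```

With a table like this, the data stored under the disabled Id can at least be located; reading it would still need some path that is allowed to query under the old key.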
Anyway, I'm adding in a couple more providers to make this more of a ridiculous effort. After which I'll think about resolution strategies.


Sunday, February 7, 2016

sensor2 code cleanup -- you can try it too

After a bit of field testing, I've re-organized the sensor2 code to be more robust. Release tag for this change is here. Major changes include:
  • The Watch still sends CMSensorRecorder data directly to DynamoDB
  • However, the Watch now asks the iPhone for refreshed AWS credentials (since the AWS SDK isn't yet working on Watch, this avoids having to re-implement Cognito and login-with-amazon). This means that with today's code, the Watch can be untethered from the iPhone for up to an hour and can still dequeue records to DynamoDB (assuming the Watch has Wi-Fi access itself)
  • If the Watch's credentials are bad, empty or expired and Watch can't access the iPhone or the user is logged out of the iPhone part of the app, then Watch's dequeuer loop is stopped
  • Dependent libraries (LoginWithAmazon) are now embedded in the code
  • A 'logout' on the phone will invalidate the current credentials on the Watch
This code should now be a bit easier to use for reproducing my experiments. Fewer moving parts, simpler design. I'll work on the README.md a bit more to help list the steps to set up.

And finally, this demonstrates multi-tenant isolation of the data in DynamoDB. Here's the IAM policy for logged in users:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "mobileanalytics:PutEvents",
                "cognito-sync:*",
                "cognito-identity:*"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Sid": "Stmt1449552297000",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchWriteItem",
                "dynamodb:UpdateItem",
                "dynamodb:Query"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:499918285206:table/sensor2"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "${cognito-identity.amazonaws.com:sub}"
                    ]
                }
            }
        }
    ]
}

In the above example the important lines are the condition. This entry ensures that a request can only touch rows whose hash key is the same as the logged in user's CognitoId. This is why we can build applications with direct access to a data storage engine like DynamoDB!
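
As a toy model of what that condition checks (this is not how IAM is implemented, and `policyAllows` is a made-up name), every hash key a request touches must equal the caller's Cognito identity:

```swift
import Foundation

// Toy model of ForAllValues:StringEquals on dynamodb:LeadingKeys.
// Every hash key touched by the request must equal the caller's identity.
// Like ForAllValues itself, an empty key list evaluates to true.
func policyAllows(requestedHashKeys: [String], callerCognitoId: String) -> Bool {
    return requestedHashKeys.allSatisfy { $0 == callerCognitoId }
}

// Made-up identity strings for illustration.
let me = "us-east-1:11111111-aaaa"
let someoneElse = "us-east-1:22222222-bbbb"
print(policyAllows(requestedHashKeys: [me], callerCognitoId: me))
print(policyAllows(requestedHashKeys: [me, someoneElse], callerCognitoId: me))
```

A batch write that mixes the caller's own rows with someone else's fails as a whole, which is the behavior wanted for multi-tenant isolation.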

You can read the details of IAM+DynamoDB here.

Anyway, back to performance improvements of the dequeue process. Everything is running pretty well, but the Watch still takes a long time to get its data moved.

Monday, January 4, 2016

New "serverless" site to explore sensor data

I have updated the UI to parse and render the data from the new data model. You can try it out here.


Recall the data flow is:

  • CMSensorRecorder is activated directly on the Watch
  • When the application's dequeue is enabled, the dequeued events are:
    • Parsed directly into our DynamoDB record format
    • Directly sent to DynamoDB from the Watch
And this pure static website directly fetches those records and pivots the data in a vis.js and d3.js format for display.

Next up:

  • Get AWS Cognito into the loop to get rid of the long lived AWS credentials
  • Work on the iOS framework memory leaks
  • Speed up the dequeue (or, resort to a Lambda raw data processor)

Thursday, October 29, 2015

Sensor: You Can Try Out Some Real Data

I've set up a rendering of some actual sensor data in a couple of formats:

  • A line chart with X as time and Y as 3 lines of x, y, z acceleration
  • A 3d plot of x, y, z acceleration with color being the sample time
It is interesting to see the actual sensor fidelity in a visual form. CMSensorRecorder records at 50 samples per second and the visualizations are 400 samples or 8 seconds of data.

You can try out the sample here at http://test.accelero.com There are a couple of suggested start times shown on the page.  Enter a time and hit the Fetch button. Recall this fetch button allows the browser to directly query DynamoDB for the sample results. In this case anonymously and hard coded to this particular user's Cognito Id...


Once the results are shown you should be able to drag around on the 3d plot to see the acceleration over time.

The above timeslice is a short sample where the watch starts flat and is rotated 90 degrees in a few steps. If you try out the second sample you will see a recording of a more circular motion of the watch.

Note that d3.js is used for the line charts and vis.js is used for the interactive 3d plot.

Sunday, October 25, 2015

Apple Watch Accelerometer displayed!

There you have it! A journey started in June has finally rendered the results intended. Accelerometer data from the Watch is processed through a pile of AWS services to a dynamic web page.

Here we see the very first rendering of a four second interval where the watch is rotated around its axis. X, Y and Z axes are red, green, blue respectively. Sample rate is 50/second.

The accelerometer data itself is mildly interesting. Rendering it on the Watch or the iPhone were trivial exercises. The framework in place is what makes this fun:
  • Ramping up on WatchOS 2.0 while it was being developed
  • Same with Swift 2.0
  • Getting data out of the Watch
  • The AWS iOS and JavaScript SDKs
  • Cognito federated identity for both the iPhone app and the display web page
  • A server-less data pipeline using Kinesis, Lambda and DynamoDB
  • A single-page static content web app with direct access to DynamoDB
No web servers, just a configuration exercise using AWS PaaS resources. This app will likely be near 100% uptime, primarily charged per use, will scale with little intervention, is logged, AND is a security first design.

Code for this checkpoint is here.

Thursday, October 15, 2015

Apple Watch Accelerometer -> iPhone -> Kinesis -> Lambda -> DynamoDB

I've been cleaning up the code flow for more and more of the edge cases. Now, batches sent to Kinesis include Cognito Id and additional instrumentation. This will help when it comes time to troubleshoot data duplication, dropouts, etc. in the analytics stream.

For this next pass, the Lambda function records the data in DynamoDB -- including duplicates. The data looks like this:



The Lambda function (here in source) deserializes the event batch and iterates through each record, one DynamoDB put at a time. Effective throughput is around 40 puts/second (on a table provisioned at 75/sec).

Here's an example run from the Lambda logs (comparing batch size 10 and batch size 1):

START RequestId: d0e5b23a-54f1-4be8-b100-3a4eaabfbced Version: $LATEST
2015-10-16T04:10:46.409Z d0e5b23a-54f1-4be8-b100-3a4eaabfbced Records: 10 pass: 2000 fail: 0
END RequestId: d0e5b23a-54f1-4be8-b100-3a4eaabfbced
REPORT RequestId: d0e5b23a-54f1-4be8-b100-3a4eaabfbced Duration: 51795.09 ms Billed Duration: 51800 ms Memory Size: 128 MB Max Memory Used: 67 MB
START RequestId: 6f430920-1789-43e1-a3b9-21aa8f79218e Version: $LATEST
2015-10-16T04:13:22.468Z 6f430920-1789-43e1-a3b9-21aa8f79218e Records: 1 pass: 200 fail: 0
END RequestId: 6f430920-1789-43e1-a3b9-21aa8f79218e
REPORT RequestId: 6f430920-1789-43e1-a3b9-21aa8f79218e Duration: 5524.53 ms Billed Duration: 5600 ms Memory Size: 128 MB Max Memory Used: 67 MB

Recall, the current system configuration is:
  • 50 events/second are created by the Watch Sensor Recorder
  • These events are dequeued in the Watch into batches of 200 items
  • These batches are sent to the iPhone on the fly
  • The iPhone queues these batches in the onboard Kinesis recorder
  • This recorder flushes to Amazon every 30 seconds
  • Lambda will pick up these flushes in batches (presently a batch size of 1)
  • These batches will be written to DynamoDB [async.queue concurrency = 8]
The Lambda batch size of 1 is an interesting tradeoff.  This results in the lowest latency processing. The cost appears to be around 10% more work (mostly a lot more startup/dispatch cycles).
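
The numbers can be checked against the two REPORT lines in the log above. The sketch below assumes that ten batch-size-1 invocations would each cost about the same as the single one logged, which is an extrapolation from one sample.

```swift
import Foundation

// Figures copied from the two REPORT lines in the Lambda log.
let batch10 = (puts: 2000.0, seconds: 51.79509, billedMs: 51800.0)
let batch1 = (puts: 200.0, seconds: 5.52453, billedMs: 5600.0)

func putsPerSecond(puts: Double, seconds: Double) -> Double {
    return puts / seconds
}

let rate10 = putsPerSecond(puts: batch10.puts, seconds: batch10.seconds)
let rate1 = putsPerSecond(puts: batch1.puts, seconds: batch1.seconds)

// Assumption: ten batch-size-1 invocations each cost the same as the one logged.
let invocationsToMatch = batch10.puts / batch1.puts
let extraDuration = (batch1.seconds * invocationsToMatch) / batch10.seconds - 1.0
let extraBilled = (batch1.billedMs * invocationsToMatch) / batch10.billedMs - 1.0

print(String(format: "batch 10: %.1f puts/s, batch 1: %.1f puts/s", rate10, rate1))
print(String(format: "batch 1 overhead: %.1f%% duration, %.1f%% billed",
             extraDuration * 100, extraBilled * 100))
```

On these two samples the rates come out at roughly 39 and 36 puts/second, and batch size 1 costs roughly 7% more duration and 8% more billed time, which is in the same ballpark as the estimate above.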

Regardless, this pattern needs to write to DB faster than the event creation rate...

Next steps to try:
  • Try dynamo.batchWriteItem -- this may help, but will be more overhead to deal with failed items and provisioning exceptions
  • Consider batching multiple sensor events into a single row. The idea here is to group all 50 events in a particular second into the same row. This should pay off when an individual event record is small relative to DynamoDB's 1 KB write unit, because every separate put is rounded up to a full unit
  • Shrink the size of an event to the bare minimum
  • Consider using Avro for the storage scheme
  • AWS IoT
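
The "one row per second" idea from the list above could look something like this. `Event`, `SecondRow` and `groupBySecond` are hypothetical names, and how a row would be serialized for DynamoDB is not shown.

```swift
import Foundation

// Hypothetical event shape: timestamp in seconds plus the three axes.
struct Event {
    var timestamp: Double
    var x: Double
    var y: Double
    var z: Double
}

// One candidate DynamoDB row: all events that fall within the same whole second.
struct SecondRow {
    var second: Int
    var events: [Event]
}

// Bucket events by the whole second they fall in, keeping arrival order within a row.
func groupBySecond(_ events: [Event]) -> [SecondRow] {
    let buckets = Dictionary(grouping: events, by: { Int($0.timestamp.rounded(.down)) })
    return buckets.keys.sorted().map { SecondRow(second: $0, events: buckets[$0]!) }
}

// Two seconds of synthetic 50 Hz data become two rows of 50 events each.
let events = (0..<100).map { Event(timestamp: Double($0) / 50.0, x: 0, y: 0, z: 1) }
let rows = groupBySecond(events)
print(rows.map { "\($0.second): \($0.events.count) events" })
```

At 50 events per second this cuts the number of puts by a factor of 50, at the price of larger items and a little more work when reading a time slice back.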
Other tasks in the queue:
  • Examine the actual data sent to DynamoDB -- what are the actual latency results?
  • Any data gaps or duplication?
  • How does the real accelerometer data look?
  • (graph the data in a 'serverless' app)

Sunday, September 27, 2015

Cognito based credentials finally refreshing

It turns out I had it wrong all along. Here's the flow:
  • Cognito is mapping an identity to a STS based role
  • We need to ask Cognito to refresh the credentials directly (not just the provider refresh)
Now, there is some debate as to whether this part of the SDK is obeying the refresh contract. So, for now I have this construct in the 'flush to kinesis' flow:
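    // Refresh when no expiration is known yet, or when it falls inside the refresh window.
    // The short-circuit || means the right side is only evaluated when expiration is non-nil.
    if (self.credentialsProvider.expiration == nil || self.credentialsProvider.expiration.timeIntervalSinceNow < AppDelegate.CREDENTIAL_REFRESH_WINDOW_SEC) {
        let delegate = AuthorizeUserDelegate(parentController: self.viewController)
        delegate.launchGetAccessToken()
        NSLog("refreshed Cognito credentials")
    }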
This winds up triggering the usual Cognito flow. And if a persistent identity is in the app, then this finally does the right thing. Simulator based transmit now does token refresh reliably over many hours, or many versions of STS tokens.

Also, this version of the code is updated based on the release versions of Xcode 7, iOS 9 and watchOS 2. Everything is running fairly smoothly. There are still a couple of areas I'm investigating:

  • The WCSession:sendMessage seems to get wedged in a certain sequence. Watch sends a message, is waiting for a reply, phone gets message, then watch goes to sleep. The phone has processed the message and is blocked on the reply to the watch. This doesn't seem to get unwedged any way other than waiting for a 2 or 5 minute timeout.
  • This particular code does get into an initial block state if the phone is locked. This looks to be something where the accelerometer sensor needs to check with the phone to see if user has granted access to sensor.
Both of the above are a bit more than minor inconveniences. The first means that even IF living with the watch app going to sleep often, you still can't reliably transfer a bunch of data to the phone using the sendMessage method. The second means it is not clean for starting the app on the watch when the phone is locked or out of range. Maybe there is a reason. But really, we are at a point where getting the sensor data out of the watch for anywhere close to near-realtime processing isn't yet realized.


Sunday, September 13, 2015

Sensor: running well on iOS 9 Seed and WatchOS 2

I've made a checkpoint of the sensor code that corresponds to the iOS9 GM seed and WatchOS 2.0. The release tag is here. Note, this code is configured to generate synthetic data, even on the hardware. I'm using this to prove the robustness of the Watch -> iPhone -> AWS connections across noisy connections.

I've cleaned up the transport a bit to send JSON directly from the Watch. This goes across the WCSession to the iPhone. The iPhone does parse the data to examine it and update its engineering display. But, really this raw JSON payload is sent directly to Kinesis.

Here's a screen dump of an AWS Lambda parsing the Kinesis flow. This Lambda simply prints the JSON, enough to show what is being sent:



This code runs pretty well in background mode on the iPhone. The data flow continues even while the phone is locked, or working on another application. This key concept shows iPhone as a buffered proxy to AWS.

Next up, handling a few error cases a bit better:
  • When the watch goes in and out of range
  • When the phone goes in and out of being able to reach AWS
  • And of course, when the watch goes to sleep (real goal is to keep being able to dequeue from CMSensorRecorder while watch is asleep)

Sunday, August 30, 2015

CMSensorRecorder data reliably flowing to Kinesis

I've refactored the iPhone side of the code a bit to better represent background processing of the sensor data. The good thing is that WCSession:sendMessage handles iPhone background processing properly. This is at the expense of having to handle reachability errors in code. The checkpoint of code is here.

Now the flow is roughly:
  • On Watch
    • CMSensorRecorder is activated and is recording records locally regardless of the application state of the Watch
    • When the sensor application is in the foreground, a thread attempts to dequeue data from the recorder
    • And when this data is received a WCSession:sendMessage is used to send the data to the iPhone in the background
    • Iff a valid reply comes back from this message, the CMSensorRecorder fetch position is updated to fetch the next unprocessed sensor data
  • On iPhone
    • A background thread is always ready to receive messages from the Watch
    • Those messages are saved to a local Kinesis queue
    • A timer based flush will submit this Kinesis queue to AWS
    • AWS credentials from Cognito are now manually refreshed by checking the credentials expire time
    • The send and submit kinesis calls are now asynchronous tasks
So this is pretty close to continuous feed on the iPhone side.
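
The "only advance on a valid reply" rule on the Watch side can be sketched as follows. `Dequeuer` and `pump` are hypothetical names, the integers stand in for recorded sensor data, and the `send` closure stands in for the WCSession:sendMessage round trip.

```swift
import Foundation

// Hypothetical dequeuer: the fetch position only moves after an acknowledged send.
struct Dequeuer {
    var records: [Int]   // stand-in for recorded sensor data, oldest first
    var cursor: Int      // index of the next unprocessed record
    let batchSize: Int

    init(records: [Int], batchSize: Int) {
        self.records = records
        self.cursor = 0
        self.batchSize = batchSize
    }

    // Sends one batch. Returns true and advances the cursor only if `send`
    // reports a valid reply; otherwise the same batch is offered next time.
    @discardableResult
    mutating func pump(send: ([Int]) -> Bool) -> Bool {
        guard cursor < records.count else { return false }
        let end = min(cursor + batchSize, records.count)
        let batch = Array(records[cursor..<end])
        guard send(batch) else { return false }
        cursor = end
        return true
    }
}

// Example: 500 records drain as batches of 200, 200 and 100.
var delivered: [[Int]] = []
var dequeuer = Dequeuer(records: Array(0..<500), batchSize: 200)
while dequeuer.pump(send: { batch in delivered.append(batch); return true }) {}
print(delivered.map { $0.count })
```

Because a batch is re-sent whenever its reply is lost, this gives at-least-once delivery, so duplicates downstream are expected and have to be tolerated or filtered.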

Some areas of durability to re-explore next:
  • How to build a Watch dequeue that can run when the application isn't in foreground?
  • Is there another way for WCSession to send to a background task other than sendMessage?
  • How reliable is the sendMessage call?
    • When the iPhone is out of range
    • When the iPhone is locked
    • When it is busy running another application
    • I do see some transient 'not paired' exceptions when sending volume
  • While this does allow for automatic background processing, is there a simpler way of transferring data that doesn't require the application handling reachability errors?
  • How reliable is the Kinesis send-retry when the iPhone can't reach AWS?
I will next be building more quantitative checks of the actual data sent through the system to understand where data gets sent more than once, or where it is lost.

Wednesday, August 26, 2015

Xcode 7 beta6: Bitcode issues between WatchOS and iPhone solved!

Getting there!  A quick upgrade to Xcode 7 beta6 fixed this issue.  We now have data transfer from Watch Accelerometer to CMSensorRecorder to Watch app to iPhone to Kinesis -- yes data is flowing, mostly resiliently too; with intermittent focus, connectivity, etc.  Here is the code.

And some screen dumps (explained below):


The Watch screen dump is pretty much as before.  You will see the Cognito integration (using Amazon as an identity provider).  The first 4 lines are the identity details.  The next 4 lines are information regarding the Kinesis storage, the STS token expire time, the amount of local storage consumed, how many flushes to Kinesis have occurred, and when.

Of course the current code still relies on these transfer operations being in focus, an ongoing area of research as to how to make this a background operation on both the Watch and on the iPhone.  But still, real data is finally in Kinesis.

TODO: build an auto-refreshing STS token, as this appears to be a known problem.

Next up, write an AWS Lambda function to read from Kinesis, parse the records and then put them into DynamoDB.  Once that is there, a visualization example both on iOS and on a Server-less web service...


Sunday, August 23, 2015

Marketing: make something look like what is intended, not what it is

Well, this has been a depressing past couple of days. This was the time to re-integrate the AWS SDK back into the application in preparation for sending data to Kinesis. I had basic Cognito and Kinesis hello world working back in June on WatchOS 1.0. I'd mistakenly assumed that some sort of compatibility over time would be in order. Not to be the case. Summary:
Yes, it is possible to disable the enforcement of the TLS1.2 requirement. And this I did; I am now able to get a set of temporary keys for calls to AWS services. How many applications are going to have to do this? All of them?

Worse, it doesn't look possible to use the current AWS SDK with a Watch application. This looks like a pretty ugly show stopper:
  • 3rd party library doesn't have bitcode support. While you can disable this on the iPhone,
  • Watch and iPhone have to have the same bitcode support level. And the Watch requires bitcode enabled.
Think about what this means!  7-8 years worth of iPhone 3rd party library support out there and probably used by a few applications. And these libraries will NOT work with any application that wants to bundle with WatchOS 2.0. The proverbial 'what were they thinking?' comes to mind.

So, I'm stuck: can't integrate with 3rd party library until rebuilt...

The calendar looks rough for Apple:
  • September 9th announcements; WatchOS2 and some new hardware
  • Then they turn on the holiday season marketing; "buy our watch, it has apps"
  • In the meantime, a mountain of developers are trying to figure out how to ship anything on the Watch
  • New message "trust us, our developers will eventually catch up"


Tuesday, June 9, 2015

Federated Login Enhancements

With a little more reading and experimenting I think I've refactored the pattern into a slightly more general solution that covers startup, login and logout states properly (github updated).  Code seems robust enough -- need to test a bit under various network error states.

Next up is to wire in another social provider or two (Google+ and Twitter).  And then to experiment with identity merge -- how to recognize multiple authenticated userIds and then join them into a single entity.

As of this point, I'm fairly certain the auto-refreshing temporary AWS credentials are loaded and working -- seems ready to use with a sensor stream to Kinesis for example.

As a footnote, my first dabbling in Swift is encouraging.  Yes, some sort of a cross of C, Java, Objective-C, JavaScript, heck, even Gosu.  I only know enough to be dangerous, so time to read the language spec to see what I'm missing.

Monday, June 8, 2015

Federated Login Working

Well, as expected, it is quite trivial to establish the mapped identity provider callback.  Once a login to the provider completes, our service calls out to establish a login mapping to initialize the Cognito to STS mapping (GetId is the AWS API call).  Then later, as needed, a refreshing token is fetched on the fly by the Cognito credentialsProvider.  Nice.

The git repo for this experiment is here.

Sunday, June 7, 2015

AWS Cognito wired in, federating to Amazon login

This is turning out to be a nice exercise.  Been forced to dig deeper into Objective-C to Swift 'porting'.  Anyway, application now federates to Amazon as the initial identity provider.  And I do have Cognito unauthenticated access working in the app.  In this case fetching and displaying an image stored in S3.

Oh, and I also reactivated Developer license so I also now have the app running on real hardware.

Next up, figure out how to take an authentication event callback and wire it into the CognitoCredentialsProvider.  This page seems to indicate how.  It feels like a lot of this is done under the hood in the CredentialsProvider.  But I need to see if the source is available to understand what is really going on here.  I haven't yet found a concrete Swift example of having an authenticated session being what is handled by Cognito.  Only pieces...