Recall that the current plumbing looks like this:
- CMSensorRecorder data collected on the Watch
- Dequeued by a native application on the Watch
- Sent via WCSession to the iPhone
- The iPhone uses the AWS KinesisRecorder to buffer the received events
- KinesisRecorder sends them to AWS (using Cognito credentials)
- Kinesis is drained by an AWS Lambda
- The Lambda stores the records in DynamoDB (using a fixed IAM role)
This works, but it has a lot of moving parts. More importantly, Kinesis is provisioned for an expected throughput, and you pay by the hour for that capacity whether or not you actually use it.
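For context, here is a rough sketch of what the first two steps look like on the Watch: pulling recorded samples out of CMSensorRecorder and queuing them for background transfer to the phone. The function name and payload shape are illustrative, not the exact code I'm running.

```swift
import CoreMotion
import WatchConnectivity

// CMSensorDataList only exposes NSFastEnumeration; this extension lets
// Swift's for-in loop walk it.
extension CMSensorDataList: Sequence {
    public func makeIterator() -> NSFastEnumerationIterator {
        NSFastEnumerationIterator(self)
    }
}

// Pull recorded accelerometer samples and hand them to WCSession for
// background transfer to the paired iPhone. Assumes the WCSession has
// already been activated elsewhere in the app.
func forwardRecordedData(since lastSent: Date) {
    guard let list = CMSensorRecorder().accelerometerData(from: lastSent, to: Date()) else {
        return  // nothing recorded in that window yet
    }
    var batch: [[String: Double]] = []
    for item in list {
        guard let sample = item as? CMRecordedAccelerometerData else { continue }
        batch.append(["t": sample.startDate.timeIntervalSince1970,
                      "x": sample.acceleration.x,
                      "y": sample.acceleration.y,
                      "z": sample.acceleration.z])
    }
    // transferUserInfo queues the payload and delivers it even if the
    // phone is not reachable at this moment.
    WCSession.default.transferUserInfo(["samples": batch])
}
```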
My next experiment will look like this:
- CMSensorRecorder data collected on the Watch
- Dequeued by a native application on the Watch
- Sent directly to AWS IoT (using long-lived IAM credentials stored locally on the Watch)
- An IoT rule sends these events to a Lambda
- The Lambda stores the records in DynamoDB (using a fixed IAM role)
The important difference here is that IoT is charged based on actual use, not provisioned capacity. That means the effective cost tracks what I actually send, not what I expected to send!
Also, the Watch will be able to transmit directly to AWS instead of relying on the iPhone (well, watchOS still proxies through the paired phone when it is in range, and otherwise goes direct over Wi-Fi when a known network is nearby). This will be an interesting proof of concept in its own right.
Anyway, for this first experiment the writes to IoT are via HTTP POST (not the MQTT protocol), and authentication is via long-lived credentials loaded onto the Watch, not Cognito. This is primarily because I haven't yet figured out how to get the AWS iOS SDK to work on the Watch (and on the iPhone at the same time), so I am instead making low-level AWS API calls. I expect this to change with a future release of the SDK.
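To give a feel for what "low-level" means here, below is a sketch of publishing one payload to the IoT HTTPS endpoint (`POST /topics/{topic}`) with a hand-rolled SigV4 signature. It uses CryptoKit for the hashing (CommonCrypto would work the same way), and the endpoint, topic, and credentials are placeholders; treat it as an outline of the signing dance rather than production code.

```swift
import Foundation
import CryptoKit

// All of these are placeholders for the real account values.
let accessKey = "AKIA..."   // long-lived IAM credentials loaded onto the Watch
let secretKey = "..."
let region = "us-east-1"
let service = "iotdata"     // signing name for the IoT data plane
let host = "example-ats.iot.us-east-1.amazonaws.com"
let topic = "sensors/accelerometer"

func hexSHA256(_ data: Data) -> String {
    SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

func hmac(_ key: Data, _ message: String) -> Data {
    Data(HMAC<SHA256>.authenticationCode(for: Data(message.utf8),
                                         using: SymmetricKey(data: key)))
}

// Build a SigV4-signed POST to the IoT HTTPS publish endpoint.
func signedPublishRequest(payload: Data) -> URLRequest {
    let formatter = DateFormatter()
    formatter.dateFormat = "yyyyMMdd'T'HHmmss'Z'"
    formatter.timeZone = TimeZone(identifier: "UTC")
    formatter.locale = Locale(identifier: "en_US_POSIX")
    let amzDate = formatter.string(from: Date())
    let dateStamp = String(amzDate.prefix(8))

    let path = "/topics/\(topic)"
    let query = "qos=1"

    // 1. Canonical request: method, path, query, headers, payload hash.
    let canonicalHeaders = "host:\(host)\nx-amz-date:\(amzDate)\n"
    let signedHeaders = "host;x-amz-date"
    let canonicalRequest = ["POST", path, query, canonicalHeaders,
                            signedHeaders, hexSHA256(payload)].joined(separator: "\n")

    // 2. String to sign, scoped to date/region/service.
    let scope = "\(dateStamp)/\(region)/\(service)/aws4_request"
    let stringToSign = ["AWS4-HMAC-SHA256", amzDate, scope,
                        hexSHA256(Data(canonicalRequest.utf8))].joined(separator: "\n")

    // 3. Derive the signing key with the HMAC chain, then sign.
    let kSigning = hmac(hmac(hmac(hmac(Data("AWS4\(secretKey)".utf8),
                                       dateStamp), region), service), "aws4_request")
    let signature = hmac(kSigning, stringToSign)
        .map { String(format: "%02x", $0) }.joined()

    var request = URLRequest(url: URL(string: "https://\(host)\(path)?\(query)")!)
    request.httpMethod = "POST"
    request.httpBody = payload
    request.setValue(amzDate, forHTTPHeaderField: "X-Amz-Date")
    request.setValue("AWS4-HMAC-SHA256 Credential=\(accessKey)/\(scope), "
        + "SignedHeaders=\(signedHeaders), Signature=\(signature)",
        forHTTPHeaderField: "Authorization")
    return request
}
```

From there it's an ordinary URLSession data task, with retries left to the Watch-side buffer.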
I have toyed with the idea of having the Watch write directly to DynamoDB or Lambda instead of going through IoT. Since the Watch is already buffering the sensor data, I don't really need yet another reliable queue in the middle. Tradeoffs:
- Send via IoT
  - (+) Get a Kinesis-like queue for reliable transfer
  - (+) Get some hands-on time with IoT and its other capabilities
  - (-) Paying for another buffer in the middle
- Send direct to Lambda
  - (+) One less moving part
  - (-) To be sure the data was delivered, the Watch has to invoke Lambda synchronously, and that call can stall while the Lambda writes to DynamoDB; I'm not sure how well that will hold up on networks in the field
- Send direct to DynamoDB
  - (+) The lowest cost, fewest moving parts, and lowest latency
  - (-) A DynamoDB batch write can only carry 25 items, which is just 0.5 seconds of 50 Hz sensor data (see the sketch after this list)
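On that last point: at CMSensorRecorder's 50 Hz sampling rate, a single BatchWriteItem call holds only 25 samples, so a minute of data takes 120 calls. A quick sketch of the chunking arithmetic (the sample values are stand-ins):

```swift
// 50 Hz sampling means 3000 samples per minute; BatchWriteItem caps out
// at 25 items per request, so that's 120 requests per minute of data.
let samplesPerSecond = 50
let batchLimit = 25  // DynamoDB BatchWriteItem maximum
let samples = Array(0..<(samplesPerSecond * 60))  // one minute of readings
let batches = stride(from: 0, to: samples.count, by: batchLimit).map {
    Array(samples[$0..<min($0 + batchLimit, samples.count)])
}
print(batches.count)  // 120
```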
Note: on DynamoDB, I earlier discussed a slightly denormalized storage scheme, where each second of data is recorded in a single DynamoDB row, with a separately named column for each sub-second sample. Since DynamoDB can do no-clobber updates, this is a nice trade of row count for row width. It would change the data model, and readers would need to account for that, but it may make the most sense no matter what: it gets better utilization out of each DynamoDB row by packing the data as tightly as possible, and it should reduce the overall cost of DynamoDB as well, since less throughput would need to be provisioned. So I may just redo the data model and go direct to DynamoDB for this next POC; a sketch of the idea follows.
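Here is roughly what a no-clobber, one-row-per-second write could look like. The struct, key names, and string encoding of the samples are made up for illustration; the point is the `if_not_exists` update expression, which lets retried writes land without overwriting samples that are already there.

```swift
import Foundation

// One sub-second accelerometer reading; at 50 Hz, subIndex runs 0...49.
struct Sample {
    let subIndex: Int
    let x: Double, y: Double, z: Double
}

// Build a DynamoDB UpdateExpression that packs one second of samples into
// a single row (hypothetical keys: deviceId + epochSecond). Each sample
// becomes its own column (s00...s49); if_not_exists makes the write
// no-clobber, so a retried batch can never overwrite recorded data.
func oneSecondUpdate(samples: [Sample]) -> (expression: String, values: [String: String]) {
    var sets: [String] = []
    var values: [String: String] = [:]
    for s in samples {
        let col = String(format: "s%02d", s.subIndex)
        sets.append("\(col) = if_not_exists(\(col), :v\(col))")
        values[":v\(col)"] = "\(s.x),\(s.y),\(s.z)"  // naive string encoding
    }
    return ("SET " + sets.joined(separator: ", "), values)
}
```

A reader would then reassemble the 50 columns back into a time series, which is the data-model change mentioned above.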
Stay tuned!