For this next pass, the Lambda function records the data in DynamoDB -- including duplicates. The data looks like this:
The Lambda function (here in source) deserializes the event batch and iterates through each record, one DynamoDB put at a time. Effective throughput is around 40 puts/second (on a table provisioned for 75 writes/second).
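A minimal sketch of that pattern is below; the actual handler is the one linked above, and the table name, event shape, and field names here are placeholders:

```javascript
// Sketch only: one DynamoDB put per sensor event, capped at 8 concurrent puts.
// 'SensorEvents' and the event payload format are illustrative assumptions.
var AWS = require('aws-sdk');
var async = require('async');
var dynamo = new AWS.DynamoDB.DocumentClient();

exports.handler = function (event, context) {
  // Decode each Kinesis record's base64 payload back into a sensor event.
  var items = event.Records.map(function (record) {
    return JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf8'));
  });
  if (items.length === 0) return context.succeed('no items');

  // One put per item, with at most 8 puts in flight (async 1.x/2.x style queue).
  var queue = async.queue(function (item, done) {
    dynamo.put({ TableName: 'SensorEvents', Item: item }, done);
  }, 8);

  queue.drain = function () { context.succeed('wrote ' + items.length + ' items'); };
  queue.push(items, function (err) { if (err) console.error('put failed', err); });
};
```

With only 8 puts in flight and one HTTP round trip per item, per-request latency (rather than provisioned capacity) is what caps throughput at roughly 40 puts/second.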
Here's an example run from the Lambda logs (comparing batch size 10 and batch size 1):
Recall, the current system configuration is:
- 50 events/second are created by the Watch Sensor Recorder
- These events are dequeued in the Watch into batches of 200 items
- These batches are sent to the iPhone on the fly
- The iPhone queues these batches in the onboard Kinesis recorder
- This recorder flushes to Amazon every 30 seconds
- Lambda will pick up these flushes in batches (presently a batch size of 1)
- These batches will be written to DynamoDB [async.queue concurrency = 8]
Regardless, this pattern needs to write to DynamoDB faster than the event creation rate, and right now it doesn't: at 50 events/second, each 30-second flush carries roughly 1,500 events, and at ~40 puts/second those take about 37 seconds to write, so the backlog only grows.
Next steps to try:
- Try dynamo.batchWriteItem -- this may help, but it adds overhead to deal with failed items and provisioning exceptions (see the batchWriteItem sketch after this list)
- Consider batching multiple sensor events into a single row. The idea here is to group all 50 events in a particular second into the same row (a grouping sketch also follows this list). This will only show improvement if an individual event record is a significant fraction of DynamoDB's 1 KB write-capacity unit
- Shrink the size of an event to the bare minimum
- Consider using Avro for the storage scheme
- AWS IoT
- Examine the actual data sent to DynamoDB -- what are the actual latency results?
- Any data gaps or duplication?
- How does the real accelerometer data look?
- (graph the data in a 'serverless' app)
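For the batchWriteItem idea, a rough sketch of what the retry handling looks like follows. The table name 'SensorEvents', the 500 ms back-off, and the assumption that the caller chunks input to 25 items or fewer are all illustrative, not the post's actual code:

```javascript
// Sketch of dynamo.batchWriteItem (via DocumentClient.batchWrite) with retry of
// UnprocessedItems. Table name and back-off interval are placeholder assumptions.
var AWS = require('aws-sdk');
var dynamo = new AWS.DynamoDB.DocumentClient();

function writeRequests(requests, callback) {
  dynamo.batchWrite({ RequestItems: { SensorEvents: requests } }, function (err, data) {
    if (err) return callback(err);
    // Throttled or failed puts come back in UnprocessedItems rather than as errors.
    var unprocessed = (data.UnprocessedItems || {}).SensorEvents || [];
    if (unprocessed.length === 0) return callback(null);
    // Back off briefly, then resubmit only what DynamoDB did not accept.
    setTimeout(function () { writeRequests(unprocessed, callback); }, 500);
  });
}

// Write up to 25 items per call (the BatchWriteItem limit); callers chunk larger sets.
function writeBatch(items, callback) {
  writeRequests(items.map(function (item) {
    return { PutRequest: { Item: item } };
  }), callback);
}
```

This is where the extra overhead shows up: every call can hand back a partial failure set, so the retry loop (and its interaction with provisioned throughput) has to be handled explicitly.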
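And for the one-row-per-second idea, a sketch of the grouping step, assuming each event carries an epoch-millisecond timestamp (the 'second' and 'readings' field names are made up for illustration):

```javascript
// Sketch: collapse all events that share the same wall-clock second into one item,
// so each put carries ~50 readings instead of one. Field names are illustrative.
function groupBySecond(events) {
  var rows = {};
  events.forEach(function (evt) {
    var second = Math.floor(evt.timestamp / 1000);   // epoch seconds
    if (!rows[second]) rows[second] = { second: second, readings: [] };
    rows[second].readings.push(evt);
  });
  return Object.keys(rows).map(function (key) { return rows[key]; });
}
```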