Posted by Brian McCallion ● Jun 26, 2018 1:18:00 PM

AWS Lambda functions for Bloomberg Datalicense

Just a quick introduction to some of our work to simplify data access. Many customers have asked about metering, access controls, and a simplified programming model for Bloomberg Datalicense. To help customers focus on analytics rather than integration, we've developed a set of APIs using AWS API Gateway, Lambda functions, and an S3 data lake cataloged with AWS Glue. This approach simplifies data acquisition and access by enabling data to be stored in S3 and queried at any time using Amazon Athena and SQL. Customers build Amazon QuickSight dashboards and tell stories rather than wrangling with integration and data access. Here's a snapshot from our development process.

Why AWS Lambda for acquiring data for Bloomberg Datalicense? When we set out to develop this application, we listened to feedback from members of our team working on enterprise projects where the firm ingests thousands of files each day using FTP. Such processes require dedicated staff. Further, because data files arrive at different times each day, the process requires either "polling" for files or an event-driven programming model that depends on continuously running servers.

[Image: bbdatalicenseasync2018-06-25_7-42-02]

We reconsidered this model and decided on one that uses CloudWatch Events to schedule jobs and AWS Step Functions to ensure that each step in processing completes or retries, and that failures become actionable alarms. Because AWS Lambda functions are stateless, there's no waiting for data: if the data is not ready, the function terminates. Statelessness also makes the process simpler to understand and to rerun as necessary. The inputs and outputs of each step are recorded by the AWS Step Functions state machine, so a single step can be rerun.
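To make the "terminate if the data is not ready, let the state machine retry" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not our production code: the `fetch_step` function, the `is_ready` and `download` callables, the `request_id` field, the retry timings, and the Lambda ARN are all hypothetical placeholders. The state machine is written in Amazon States Language, shown here as a Python dict.

```python
import json


class DataNotReady(Exception):
    """Raised when the requested file has not been published yet."""


def fetch_step(event, is_ready, download):
    """One stateless step: check once, then fetch or fail fast.

    `is_ready` and `download` are hypothetical callables standing in for
    the real availability check and retrieval logic.
    """
    request_id = event["request_id"]
    if not is_ready(request_id):
        # No sleeping or polling inside the function: exit immediately and
        # let the state machine decide when to try again.
        raise DataNotReady(request_id)
    return {"request_id": request_id, "s3_key": download(request_id)}


# Amazon States Language definition (assumed values throughout).
STATE_MACHINE = {
    "StartAt": "FetchData",
    "States": {
        "FetchData": {
            "Type": "Task",
            # Placeholder ARN, not a real function.
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:fetch-data",
            # For a Python Lambda, the error name is the exception class name.
            "Retry": [{
                "ErrorEquals": ["DataNotReady"],
                "IntervalSeconds": 300,
                "MaxAttempts": 12,
                "BackoffRate": 1.0,
            }],
            # Anything that exhausts its retries or fails for another
            # reason is routed to an explicit failure state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Fail",
            "Error": "FetchFailed",
            "Cause": "Data was not retrieved after all retries",
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(STATE_MACHINE, indent=2))
```

Because the step keeps no state between invocations, rerunning it with the same input event is safe. A failed execution ends in the `Fail` state, which is the kind of outcome an alarm could be attached to.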

Does it scale? While one might think traditional file processing requires little compute or I/O, in practice the polling and the sheer number of files destabilize the process over time. By approaching this process with instances of AWS Lambda functions, our process scales as the number of files scales, with no thread, I/O, or locking constraints. Because we can cache already retrieved datasets for the same firm, customers benefit from very low latency when a file can be retrieved from cache. Because we meter API calls, customers can track who, within their own firm, has requested what data. We're excited about the upcoming new release, and have a lot more of this story to tell you.
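The per-firm caching and metering described above can be sketched in a few lines of Python. This is only an illustration of the idea: the `MeteredCache` class, the `cache_key` function, the field names, and the in-memory dict (standing in for S3 or another shared store) are assumptions made for the example, not a description of our actual implementation.

```python
import hashlib
import json


def cache_key(firm_id, request):
    """Deterministic key: the same firm and the same request give the same key."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{firm_id}/{digest}"


class MeteredCache:
    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical callable that retrieves the data
        self._store = {}      # in-memory stand-in for a shared store
        self.log = []         # one metering record per API call

    def get(self, firm_id, user_id, request):
        key = cache_key(firm_id, request)
        hit = key in self._store
        if not hit:
            # Only a cache miss triggers a real retrieval.
            self._store[key] = self._fetch(request)
        # Meter every call, hit or miss, so a firm can see who asked for what.
        self.log.append(
            {"firm": firm_id, "user": user_id, "key": key, "cache_hit": hit}
        )
        return self._store[key]


if __name__ == "__main__":
    cache = MeteredCache(fetch=lambda req: f"data for {req['ticker']}")
    cache.get("firm-a", "alice", {"ticker": "IBM US Equity"})
    cache.get("firm-a", "bob", {"ticker": "IBM US Equity"})
    for record in cache.log:
        print(record["user"], record["cache_hit"])
```

Keying the cache by firm keeps one firm's retrieved datasets separate from another's, while the metering log records each caller regardless of whether the data came from cache.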


Topics: Serverless, Capital Markets, tick data, capital markets data