Highway Three Solutions was brought in to evaluate a custom-built Data Management Platform (DMP) that was not meeting the client's needs for scalability and reliability. H3 designed a more flexible solution using AWS tools that would allow for chaining schedules and error handling.
The DMP supported 200GB+ a day of incoming data from a variety of sources (Beeswax, Neustar, Experian) and was architected and implemented using AWS Glue. This tool provided a robust container to run the ETL scripts, chain jobs for dependencies, and allow flexibility in data stores for later querying using Zeppelin.
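As a rough illustration of what chaining jobs by dependency means, the sketch below orders a set of ETL jobs so that each one runs only after the jobs it depends on. It uses only the Python standard library and does not call AWS Glue. The job names and the dependency map are hypothetical and are not taken from the project.

```python
from graphlib import TopologicalSorter

# Hypothetical job names (assumptions for illustration only).
# Each job maps to the set of jobs that must finish before it starts.
JOB_DEPENDENCIES = {
    "ingest_beeswax": set(),
    "ingest_neustar": set(),
    "ingest_experian": set(),
    "normalize": {"ingest_beeswax", "ingest_neustar", "ingest_experian"},
    "publish_for_query": {"normalize"},
}


def run_order(dependencies):
    """Return a job order in which every job follows its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for job in run_order(JOB_DEPENDENCIES):
        print(job)
```

In a managed service the ordering is usually expressed as triggers or workflow definitions, not computed by hand; the point here is only the dependency relationship between jobs.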
To support both the DMP and its related systems, Highway Three architected and built a highly available and scalable web application infrastructure based on a microservices architecture in an AWS environment. Front-end (FE) and back-end (BE) components were designed to run in Docker containers and used Amazon’s Elastic Container Service (ECS) for container orchestration. The infrastructure ran in two availability zones, so that when one zone was down a parallel set of infrastructure could still serve end users.
AWS development and tools included:
- Built highly automated CI/CD pipelines (AWS CodePipeline) to deliver application artifacts and infrastructure-as-code changes via CloudFormation templates.
- Implemented monitoring of key metrics for the FE, APIs, and other infrastructure to detect potential issues in environments.
- Built out application infrastructure entirely as infrastructure as code via CloudFormation templates.
- Defined IAM policy permissions for various AWS resources.
- Implemented serverless functions in AWS Lambda for integration with other Amazon services such as Cognito, S3, and CodePipeline.
- Architected and built infrastructure to support user management, authentication, and authorization via Amazon Cognito, tying it together with social providers like Google.
- Deployed Amazon API Gateway to throttle, secure, manage, version, and deploy APIs.
- Created detailed access rules for S3 via bucket policies and IAM policies.
- Implemented complex solutions that require cross-account access to Amazon resources such as S3.
- Defined load-balancing infrastructure that routes traffic to different APIs or FE instances based on URL routes and hostnames using Amazon Application Load Balancer.
- Asset storage in Aspera (programmatically created nodes on S3 storage)
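To make the cross-account S3 access item above more concrete, here is a minimal sketch of a bucket policy that grants another AWS account read-only access. It only builds the policy document as JSON and makes no AWS calls. The bucket name and account ID are placeholders, and the policy is an assumption about what such a rule could look like, not the policy used on the project.

```python
import json


def cross_account_read_policy(bucket, reader_account_id):
    """Build a bucket policy granting another account read-only access.

    Both arguments are illustrative placeholders supplied by the caller.
    """
    principal = {"AWS": f"arn:aws:iam::{reader_account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "CrossAccountList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and account ID.
    policy = cross_account_read_policy("example-dmp-assets", "111122223333")
    print(json.dumps(policy, indent=2))
```

A bucket policy like this covers only the resource side. The reading account still needs an IAM policy that allows the same actions for its own users or roles.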
Highway Three Solutions was brought in to evaluate numerous tools and create a cloud infrastructure for the Integrated Data Office (IDO). H3 designed a flexible solution that aimed to be cloud-agnostic but was hosted on Azure.
IDO requires the secure drop-off of large files, processing of them (de-identification), and delivery into a data lake. Researchers can then work on the de-identified data using notebooks, gaining the benefit of the data lake.
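The text does not say how de-identification was performed. As one possible approach, the sketch below replaces identifying fields with a keyed hash (HMAC-SHA256), so the same input always maps to the same token without exposing the original value. The field names, the sample record, and the key are hypothetical, and this technique is an assumption, not a description of the IDO pipeline.

```python
import hashlib
import hmac

# Hypothetical set of identifying fields (assumption for illustration).
IDENTIFYING_FIELDS = frozenset({"name", "email"})


def deidentify(record, secret_key, fields=IDENTIFYING_FIELDS):
    """Return a copy of record with identifying fields replaced by tokens.

    secret_key must be bytes; values are converted to strings before hashing.
    """
    cleaned = {}
    for key, value in record.items():
        if key in fields:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[key] = digest.hexdigest()
        else:
            cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    # Sample record and key are made up for this example.
    sample = {"name": "Jane Doe", "email": "jane@example.com", "visits": 3}
    print(deidentify(sample, b"example-key-not-for-production"))
```

Keyed hashing is a form of pseudonymization: records remain linkable across files, which may or may not be acceptable depending on the data-governance requirements.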
As part of the IDO team, H3 stood up and evaluated numerous software packages to satisfy the needs of IDO. Eventually a Hadoop (HDP) solution was chosen for the data lake, and JupyterHub was chosen as the notebook environment. A microservice architecture was chosen to support the front end. Docker containers, intended for Kubernetes deployment, were used heavily to support a highly scalable design.
Azure development and tool highlights:
- Terraform to create infrastructure as code, with remote storage for collaboration.
- Jenkins to deploy both infrastructure changes (Terraform) and app changes (Helm Charts for Kubernetes)
- HDP for the Hadoop data lake platform (using Ambari Blueprints for setup and Azure VMs for the machines)
- Keycloak for IAM, syncing to LDAP to tie in to Hadoop (all tools use either LDAP or OIDC for auth to Keycloak); also allows social login if desired
- HashiCorp Vault for secrets and configuration management, using HCL policies for access control, with Consul for HA, which also provides service discovery/mesh
- Azure Application Gateway and API Gateway for controlling access to services
- Azure NSGs/VNets/Subnets for securing infrastructure pathways
- Azure-hosted Postgres/MySQL/Cosmos (NoSQL) for databases used by all services, and Blob Storage for storage