Organizations run mission-critical projects and operations on the VGS platform. That's why we are committed to delivering stable and secure products, applications, and networks at scale.
An Overview of VGS
Section 1: Cloud Infrastructure
Section 3: Quality and Release Practices
Section 4: Business Continuity
VGS products are designed to deliver enterprise-leading compliance and performance by descoping customers from systems that exchange electronic transactions initiated by cardholders using payment cards.
VGS offers low-latency, high-throughput transaction processing, backed by highly redundant networking, to maintain a strict performance envelope.
All components described here are designed with redundancy and high availability in mind to ensure that we continue to meet your enterprise processing and availability requirements.
VGS uses infrastructure provided by Amazon Web Services (AWS), the world's leading cloud service provider, with which we maintain a strategic partnership. AWS is responsible for protecting the infrastructure, which includes the hardware, software, networking, and data centers that run AWS cloud services.
VGS uses several AWS cloud services, such as EC2 and RDS, for its applications. It has designed its security infrastructure and configuration using AWS-recommended best practices for security and cloud architecture.
VGS services and databases use a multi-AZ (Availability Zone) deployment strategy, which provides enhanced availability and durability by deploying database replicas across multiple Availability Zones in a region. AWS documents the fault isolation benefits that its Availability Zones and regions offer. VGS is constantly working to improve its High Availability (HA) posture by expanding its fault-handling boundaries across regions and geographies. Talk to our team about our future direction.
High availability
Our core services, such as Tokenization Vault Service, use this strategy: a hot standby database instance, replicated synchronously with the active one, is readily available in a secondary AZ within the same region. This provides high availability for our application services through an AWS-managed automatic database failover that can complete in as little as 60 seconds, with zero data loss and no manual intervention. (See the AWS documentation on Multi-AZ database deployments for additional information.)
Backups
VGS performs regular backups of our database. The backups are stored in our primary region and replicated in another region. In the rare event of data loss, VGS can restore from one of the last saved snapshots. Database snapshots are encrypted and retained for multiple days with support for point-in-time recovery.
In the US geography, the VGS platform services are provisioned in the US East (Northern Virginia) region. In the EU geography, they are provisioned in the EU Central (Frankfurt) region. Within those regions, VGS uses multiple Availability Zones that are interconnected using low-latency, high-throughput, and highly redundant networking. AWS publishes more detail about its global infrastructure.
VGS has purposefully built geo-isolation between its production and pre-production (e.g. dev/test) environments. Our pre-production environments are provisioned in the US West (Oregon) region. VGS is constantly working on expanding our regional presence in other world geographies. Talk to our team about our future direction.
Most of our application services are deployed in containerized form, orchestrated by Kubernetes, an industry-standard container orchestration framework. More specifically, we use EKS, an AWS-managed Kubernetes service designed and built by AWS with resiliency in mind. AWS fully manages the EKS control plane and deploys replicas of the control plane services across multiple AZs. Our services run on “worker nodes,” for which VGS uses EKS-managed node groups to automate the provisioning and lifecycle management of the underlying EC2 nodes. Our worker nodes span multiple AZs, allowing seamless and hands-free AZ-level failure handling for our application services. (See the AWS documentation on Amazon EKS for additional information.)
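To make the AZ-level failure handling concrete, the following Python sketch models replicas placed round-robin across zones and counts how many remain after one zone is lost. It is a toy model under stated assumptions, not the Kubernetes scheduler or any VGS configuration; the zone names, the replica count, and the function names are illustrative only.

```python
from collections import Counter
from itertools import cycle


def spread_replicas(replica_count, zones):
    """Assign replicas to zones round-robin (a simplification of real scheduling)."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(replica_count)]


def surviving_replicas(placement, failed_zone):
    """Count the replicas that are not located in the failed zone."""
    return sum(1 for zone in placement if zone != failed_zone)


# Hypothetical zone names and replica count, chosen only for illustration.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
placement = spread_replicas(6, zones)
per_zone = Counter(placement)
remaining = surviving_replicas(placement, "us-east-1a")
print(per_zone, remaining)
```

With replicas spread evenly, losing a single zone removes only that zone's share of capacity, and the remaining replicas keep serving traffic while replacements are provisioned.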
Unlike an architecture built on API calls to additional services, each of which incurs an extra round trip to the service provider, a proxy-based architecture is fundamentally more conducive to high-throughput, low-latency transaction processing. The proxy acts like a hop in your regular transaction flow, similar to how ubiquitous technologies such as load balancers, CDNs, or firewalls act for other use cases.
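The round-trip argument can be illustrated with simple arithmetic. The sketch below compares a flow that makes a separate API call before sending the transaction with a flow that passes through an inline proxy. All numbers are made-up placeholders rather than VGS measurements, and the function and parameter names are hypothetical; the proxy overhead is assumed to include any extra network path the hop introduces.

```python
def latency_with_side_api_call(transaction_rtt_ms, api_rtt_ms, api_processing_ms):
    """The client calls a separate service first, then sends the transaction."""
    return api_rtt_ms + api_processing_ms + transaction_rtt_ms


def latency_with_inline_proxy(transaction_rtt_ms, proxy_overhead_ms):
    """The proxy sits on the existing path and adds only its own overhead."""
    return transaction_rtt_ms + proxy_overhead_ms


# Placeholder values in milliseconds, for illustration only.
side_call_ms = latency_with_side_api_call(
    transaction_rtt_ms=80, api_rtt_ms=40, api_processing_ms=5
)
inline_ms = latency_with_inline_proxy(transaction_rtt_ms=80, proxy_overhead_ms=5)
print(side_call_ms, inline_ms)
```

Under these assumptions the difference between the two flows is exactly the extra round trip to the side service.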
Our algorithms for secure vaulting are highly efficient and do not add significant compute / processing time to the transaction. Finally, using a modern and cloud-native tech stack (Java, Kubernetes, AWS) allows VGS to effectively balance the pace of innovation with reliability.
VGS Engineering follows modern SaaS practices steeped in DevOps and SRE culture. Our SDLC is tuned for two-week Sprints. Our engineers leverage modern tooling all the way from building / releasing (CI / CD) to monitoring / operating (observability, on-call paging, status page, synthetic testing).
Security practices are woven into all stages: designing, coding, testing, and post-deployment monitoring. In addition, we follow a code promotion strategy in which we run appropriate tests at each pre-production environment step as we propagate a change all the way to production.
VGS follows best practices for testing our services and APIs to ensure all new functionality is delivered against a written test strategy. This aligns test objectives against KPIs and requirements set forth by the business and acceptance criteria defined by the product engineering teams. These artifacts are delivered as a set of unit, integration, performance, and acceptance tests run against VGS services and systems during the development and deployment of each release through our continuous integration (CI) practice.
Adjacent to the VGS CI practice, synthetic transactions are continuously run against VGS services to simulate user activity. They are monitored alongside actual user transactions to ensure our systems perform within expected, acceptable limits.
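One way such a check could work is sketched below in Python: a batch of synthetic probe results is summarized by success rate and 95th-percentile latency, then compared with acceptable limits. This is a minimal sketch, not VGS's monitoring implementation; the `ProbeResult` type, the thresholds, and the sample data are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ProbeResult:
    ok: bool            # whether the synthetic transaction succeeded
    latency_ms: float   # observed end-to-end latency


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(results, max_p95_ms, min_success_rate):
    """Summarize probe results and flag whether they are within limits."""
    if not results:
        raise ValueError("at least one probe result is required")
    success_rate = sum(r.ok for r in results) / len(results)
    p95_ms = percentile([r.latency_ms for r in results], 95)
    healthy = success_rate >= min_success_rate and p95_ms <= max_p95_ms
    return {"success_rate": success_rate, "p95_ms": p95_ms, "healthy": healthy}


# Hypothetical sample: 20 probes with latencies 10..200 ms, one of which failed.
results = [ProbeResult(ok=(i != 20), latency_ms=10 * i) for i in range(1, 21)]
report = evaluate(results, max_p95_ms=250, min_success_rate=0.9)
print(report)
```

The same summary could be computed over real user transactions, so that synthetic and actual traffic are judged against the same limits.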
VGS platform updates (for hardware, software, performance, or scale) are hassle-free and transparent to our customers. We offer a high level of predictability while also providing a continuous stream of new features and fixes.
VGS typically updates its applications during off-peak hours. The only time we make an exception is to deliver “hot fixes” for critical service issues. Regardless of the hour, our maintenance activities are generally performed without causing any downtime.
VGS releases are not monolithic in nature: we only deploy the set of services that need to change and can roll them back individually if required. This allows us to isolate potential issues to a specific component of one application and prevent it from affecting the update of other applications.
Our releases are performed by expert service owners who effectively function as “release managers.” The service owners are specifically trained to ensure a high level of discipline in change management and risk mitigation. In addition, we have managerial governance and oversight for production releases.
As described earlier in this document, 1) VGS databases are continuously replicated across multiple AZs, and 2) the compute infrastructure for our application services spans multiple AZs. Together, these make it possible for VGS to have an in-region RPO (Recovery Point Objective) that is close to zero and one of the industry's best RTOs (Recovery Time Objectives).
RTO scenarios are based on automatically re-establishing the VGS applications when a primary database replica becomes unavailable, multiple application service compute nodes become unavailable, or both.
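The relationship between replication mode and these objectives can be shown with a short calculation. The Python sketch below treats the achieved recovery point as the span of committed writes that never reached the standby, and the achieved recovery time as the interval between failure and restored service. The timestamps are hypothetical, the 60-second failover reuses the figure quoted earlier for database failover, and the function names are illustrative assumptions.

```python
from datetime import datetime, timedelta


def data_loss_window(last_committed_write, last_replicated_write):
    """Span of committed writes missing from the standby (achieved recovery point)."""
    return max(last_committed_write - last_replicated_write, timedelta(0))


def recovery_time(failure_at, service_restored_at):
    """Elapsed time between the failure and restored service (achieved recovery time)."""
    return service_restored_at - failure_at


# Hypothetical timeline, for illustration only.
failure_at = datetime(2024, 1, 1, 12, 0, 0)
last_write = failure_at - timedelta(seconds=2)

# Synchronous replication: the standby has acknowledged every committed write.
sync_rpo = data_loss_window(last_write, last_write)

# Contrast case: an asynchronous standby that lags the primary by 5 seconds.
async_rpo = data_loss_window(last_write, last_write - timedelta(seconds=5))

achieved_rto = recovery_time(failure_at, failure_at + timedelta(seconds=60))
print(sync_rpo, async_rpo, achieved_rto)
```

With synchronous replication the data loss window is zero by construction, which is why the in-region recovery point can be close to zero; the recovery time is then governed by how quickly failover completes.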