Checking for connectivity of ElastiCache Redis instances across peering connections using VPC Reachability Analyzer

It’s quite common to work in an environment where your VPC is connected to multiple other VPCs, either via VPC Peering or Transit Gateway, and it’s no different for me. I was working on deploying an application on our in-house container platform. The platform is hosted in another account/VPC, and the application had to connect to the ElastiCache Redis instances hosted in our VPC/account. I was looking into timeouts when connecting to the Redis instance from the application containers, and I figured this would be a good time to test out the VPC Reachability Analyzer.

The VPC Reachability Analyzer is a network diagnostic tool from AWS. You can access it from the VPC section of the AWS Console. After clicking on Reachability Analyzer, you enter the source, the destination and any optional intermediate components, then click “Analyze Path” for AWS to trace the path between the source and the destination.

Finding Reachability Analyzer in the AWS Console
Creating a Path Analysis
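The same flow can also be driven through the EC2 API instead of the console. Below is a minimal sketch, assuming a boto3-style EC2 client is passed in as `ec2`. The function name `analyze_path`, the default port 6379 and the polling parameters are my own illustrative choices and not something taken from the console workflow above.

```python
import time


def analyze_path(ec2, source_id, destination_id, port=6379,
                 poll_seconds=2, max_polls=60):
    """Create a path, start one analysis on it, and wait for the result.

    `ec2` is assumed to be a boto3 EC2 client (or anything exposing the
    same three methods). Returns the finished analysis dictionary.
    """
    # Define the path: source, destination, protocol and destination port.
    path = ec2.create_network_insights_path(
        Source=source_id,
        Destination=destination_id,
        Protocol="tcp",
        DestinationPort=port,
    )
    path_id = path["NetworkInsightsPath"]["NetworkInsightsPathId"]

    # Kick off the analysis for that path.
    started = ec2.start_network_insights_analysis(NetworkInsightsPathId=path_id)
    analysis_id = started["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"]

    # Poll until the analysis leaves the "running" state.
    for _ in range(max_polls):
        analyses = ec2.describe_network_insights_analyses(
            NetworkInsightsAnalysisIds=[analysis_id]
        )["NetworkInsightsAnalyses"]
        result = analyses[0]
        if result["Status"] != "running":
            return result
        time.sleep(poll_seconds)

    raise TimeoutError(f"analysis {analysis_id} did not finish in time")
```

With boto3 installed and credentials configured, this could be called as `analyze_path(boto3.client("ec2"), source_id, destination_id)`, where the two IDs are your own peering connection and ENI. The `NetworkPathFound` key of the returned dictionary says whether the destination is reachable.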

I wanted to test the connection from the peering connection to the ElastiCache instance in the private subnet. I noticed the drop-downs for the source and the destination did not list individual AWS services. That threw me off, as I couldn’t select ElastiCache there, but Ben Bridts pointed out that each ElastiCache node has an ENI attached, so providing the ENI as the destination should work. That said, finding the ENI was not straightforward - I had to head to ElastiCache, pick up the endpoint of a node, run dig on the node’s DNS name to find its IP address, and then search for that IP address on the ENI page of the AWS Console (Elastic Network Interfaces are listed as an EC2 feature, to make it even more confusing). I think there is room to simplify this workflow.

Finding the Elastic Network Interface (ENI)
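The dig-and-search steps can be scripted. Here is a rough sketch, again assuming a boto3-style EC2 client passed in as `ec2`. The helper name `find_eni_id` and the injectable `resolve` argument are hypothetical, added only so the lookup can be tried without real DNS. It also assumes the node’s DNS name resolves to its private IP address, which is what I saw with dig.

```python
import socket


def find_eni_id(ec2, node_hostname, resolve=socket.gethostbyname):
    """Return the ID of the ENI that owns the node's private IP address.

    `ec2` is assumed to be a boto3 EC2 client. `resolve` maps a hostname
    to an IPv4 address and defaults to a normal DNS lookup.
    """
    # Equivalent of running dig on the node endpoint.
    ip_address = resolve(node_hostname)

    # Equivalent of searching for the IP on the Network Interfaces page.
    response = ec2.describe_network_interfaces(
        Filters=[{"Name": "addresses.private-ip-address", "Values": [ip_address]}]
    )
    interfaces = response["NetworkInterfaces"]
    if not interfaces:
        raise LookupError(f"no network interface found with private IP {ip_address}")

    # Private IPs can repeat across VPCs; this sketch simply takes the first match.
    return interfaces[0]["NetworkInterfaceId"]
```

If the same private IP range is used in several VPCs in the account, adding a `vpc-id` filter to the same call would narrow the match.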

I created some test paths to check how it works, and the experience was mostly positive.

For the first test, I created a path analysis with the peering connection as the source and the ENI of the ElastiCache Redis as the destination. However, I accidentally selected the wrong peering connection. When the test was completed, it showed a failure, and I was pleasantly surprised by the detailed error message, which outlined why the test failed:

Error message due to lack of peering and no direct path

For the next attempt, I analyzed the path between the correct peering connection and the Redis cluster, and the path trace failed - and yet again, the feedback and error messages were pretty detailed in indicating where it was going wrong. They showed the two places where the connection was failing:

  • The route table didn’t have the correct entries to route traffic from the peering into the private subnet.
  • The security group rules were incorrectly configured and were not allowing connections from the peering connection to the Redis port.

Clicking on the details shows exactly which route table and which security group are preventing the network communication.
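The same details are available in the analysis result returned by the API, under its `Explanations` list. As a small sketch, assuming `analysis` is one entry from a `describe_network_insights_analyses` response, the helper below (the name `summarize_explanations` is hypothetical) pulls out the explanation code together with any route table, security group or network ACL it names.

```python
def summarize_explanations(analysis):
    """Return a list of (explanation_code, component_type, component_id).

    `analysis` is assumed to be a single analysis dictionary. Explanations
    that name none of the checked components yield (code, None, None).
    """
    component_keys = ("RouteTable", "SecurityGroup", "Acl")
    summary = []
    for explanation in analysis.get("Explanations", []):
        code = explanation.get("ExplanationCode", "UNKNOWN")
        found = False
        for key in component_keys:
            component = explanation.get(key)
            if component and component.get("Id"):
                summary.append((code, key, component["Id"]))
                found = True
        if not found:
            # Keep the code so nothing the analyzer reported is dropped.
            summary.append((code, None, None))
    return summary
```

Printing the returned tuples gives a quick list of which component to fix first, without clicking through each explanation in the console.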

Once I corrected the errors, the Reachability Analyzer said all was good!

Reachability Analyzer says reachable

I was still seeing the timeout errors from the application container. And therein lies the trouble with the VPC Reachability Analyzer - if there are multiple routes to a destination, the analyzer seems to report only the shortest path and ignores the failing routes. This is mentioned briefly when you create an analysis, along with a suggestion to use an intermediate component filter to analyze the alternate paths. The intermediate components can be load balancers, NAT gateways and peering connections, but not security groups, ACLs, network interfaces or route tables. In this particular case, there doesn’t seem to be a way to show alternate paths. I wish the Reachability Analyzer could use subnets as an intermediate component filter. Where there are multiple routes via different subnets in different availability zones, that would show all the applicable paths and which ones are failing.
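For the intermediate components that are supported, the filter corresponds to the `FilterInArns` parameter of the API call that starts an analysis. The sketch below assumes a boto3-style EC2 client passed in as `ec2` and an existing path ID. The function name `start_analyses_via` is hypothetical. It starts one analysis per intermediate component, and it does not get around the subnet limitation described above.

```python
def start_analyses_via(ec2, path_id, intermediate_arns):
    """Start one analysis per intermediate component ARN.

    Each analysis is constrained to paths that traverse that one
    component. Returns a dict mapping each ARN to its analysis ID.
    """
    analysis_ids = {}
    for arn in intermediate_arns:
        response = ec2.start_network_insights_analysis(
            NetworkInsightsPathId=path_id,
            FilterInArns=[arn],  # the path must pass through this resource
        )
        analysis = response["NetworkInsightsAnalysis"]
        analysis_ids[arn] = analysis["NetworkInsightsAnalysisId"]
    return analysis_ids
```

Each returned analysis ID can then be looked up separately to see which of the constrained paths is reachable and which is failing.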

VPC Reachability Analyzer is still a fantastic tool for debugging network connectivity issues, and I can see myself using it often. Each path analysis is charged at $0.10. The fact that it is done without sending any network traffic along the path is pretty awesome. For more details, you can refer to this re:Invent talk.

Sathyajith Bhat
Author, AWS Container Hero and DevOps Specialist.
