
Notes on moving Amazon Linux AMI to Amazon Linux 2

DevOps Packer Aws AMI

Around this time last year, I worked on moving our base AMIs from Amazon Linux to Amazon Linux 2. The Amazon Linux 2 images are baked on top of a security-hardened base image provided by another team. On top of this hardened image, we tweak some Linux kernel parameters to improve performance and reliability. Some of the changes include increasing the conntrack limits, reserving some ports on boot (using the net.ipv4.ip_local_reserved_ports directive) so that an ephemeral port doesn’t take a port that a service needs when it starts (yep, we’ve run into that), and disk encryption using LUKS and dm-crypt.
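To make the kernel-parameter part concrete, here is a minimal sketch of rendering such a sysctl fragment. It is not the actual tooling used for these AMIs: the port numbers and the conntrack value are made up, and I am assuming "conntrack limits" maps to net.netfilter.nf_conntrack_max. The only detail taken from the kernel docs is that net.ipv4.ip_local_reserved_ports accepts a comma-separated list of ports and ranges.

```python
def compress_ports(ports):
    """Collapse port numbers into the comma-separated range syntax sysctl accepts."""
    ranges = []
    for port in sorted(set(ports)):
        if ranges and port == ranges[-1][1] + 1:
            # Port directly follows the previous one: extend the current range.
            ranges[-1][1] = port
        else:
            ranges.append([port, port])
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges)


def render_sysctl(reserved_ports, conntrack_max):
    """Render a sysctl.d-style fragment; both values are supplied by the caller."""
    return (
        f"net.netfilter.nf_conntrack_max = {conntrack_max}\n"
        f"net.ipv4.ip_local_reserved_ports = {compress_ports(reserved_ports)}\n"
    )


if __name__ == "__main__":
    # Hypothetical service ports and conntrack limit, for illustration only.
    print(render_sysctl([8080, 9000, 9001, 9002], 262144), end="")
```

The output could be dropped into a file under /etc/sysctl.d/ during the AMI bake, so the reservation is in place before any service starts.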

The migration took longer than expected. It should have been painless, but Amazon broke some stuff between the two distros. In one instance, Amazon Linux had a symlink from /dev/ephemeral0 to the actual NVMe ephemeral device. In Amazon Linux 2 this symlink was no longer present, causing instance boot, and consequently Chef (which provisions the instances), to fail. 20 hours and 19 revisions later, I got the AMI to work. Some of the major changes I noted:

  • boto was no longer available, causing the Chef bootstrap to fail (reason: boto is used to fetch some scripts from S3)
  • The /dev/ephemeral0 symlink no longer exists. This was a major issue, causing filesystem mounts to fail. I had to add nofail to /etc/fstab so that the failing mount would not break the boot. Without this, the instance wouldn’t provision and would fail early in cloud-init. This led to the EC2 instance checks failing.
  • /etc/crypttab had to be updated to include plain at the start of the line, indicating the filesystem isn’t yet encrypted. This was another odd bug that took many hours of obscure searching to find, only to discover, to my dismay, that it had been reported 5+ years back but went unanswered, apparently because nobody used it much?
  • More cleanups on growpart and unwanted mounts/device names (though this isn’t a breaking issue or Amazon Linux 2’s fault)
  • dnsmasq seems to listen only on lo by default. This was causing DNS timeouts, making our tests take as long as 2+ hours to run. Worse, the interface=lo line was hidden among lines of commented-out config, and I only spotted it thanks to VS Code’s syntax highlighter while I was collecting data to open a support ticket with AWS.
  • A bunch of chkconfig calls replaced by systemctl enable
  • A bunch of service start calls replaced by systemctl start (both of these due to Amazon Linux 2 bringing in systemd)
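For the nofail change, here is a rough sketch of how the option could be added to an fstab entry during the image bake. The device name and mount point are hypothetical, and this is an illustration of the idea rather than the script that was actually used.

```python
def add_nofail(fstab_text, mount_point):
    """Return fstab_text with 'nofail' appended to the options of one mount point."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # An fstab entry has at least: device, mount point, fs type, options.
        is_entry = len(fields) >= 4 and not line.lstrip().startswith("#")
        if is_entry and fields[1] == mount_point:
            options = fields[3].split(",")
            if "nofail" not in options:
                options.append("nofail")
            fields[3] = ",".join(options)
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical entry; the real device and mount point will differ.
    sample = "/dev/mapper/ephemeral0 /mnt/ephemeral ext4 defaults,noatime 0 2\n"
    print(add_nofail(sample, "/mnt/ephemeral"), end="")
```

With nofail set, a missing device is skipped at boot instead of being treated as an error, which gives cloud-init and Chef a chance to run.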
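The dnsmasq problem would have been easier to spot with the comments stripped away. Here is a small sketch that lists only the active directives of a config file. The sample config is invented for illustration; it relies only on dnsmasq treating lines that start with # as comments.

```python
def active_directives(config_text):
    """Yield (line_number, directive) for lines that are neither blank nor comments."""
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if line and not line.startswith("#"):
            yield number, line


if __name__ == "__main__":
    # Invented sample; on a real instance, read /etc/dnsmasq.conf instead.
    sample = (
        "# Listen only on the given interfaces:\n"
        "#interface=eth0\n"
        "interface=lo\n"
        "#bind-interfaces\n"
    )
    for number, directive in active_directives(sample):
        print(f"{number}: {directive}")
```

Run against the real file, this shrinks a long, mostly commented config down to the handful of lines that are actually in effect.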

My knowledge of how cloud-init and disk encryption with custom keys work was pretty poor till that point. Picking up this task made me understand cloud-init, dm-crypt, and crypttab a lot better. Thanks Amazon!

Sathyajith Bhat
Author, AWS Container Hero and DevOps Specialist.
