AWS ParallelCluster External Destinations


I’m an infrastructure, HPC and cloud nerd with a focus on scientific computing in the life sciences.

I’m a fanboy of the amazing AWS ParallelCluster stack. It’s a force multiplier for scientists and researchers.

For me as well it is amazing: it saves me insane amounts of time, and the developers are crazy active. Every new ParallelCluster release seems to solve a specific request or pain point I’ve encountered while using it in biotech and biopharmaceutical environments.

However, 99% of my AWS cloud work is performed in AWS environments created, owned and operated by others.

Sometimes I have to work in highly secured VPCs and private subnets where outbound Internet access is blocked by default and secured/monitored by firewalls.

This can suck because a lot of “scientific computing” tooling and even culture is built around a long-held assumption that unrestricted Internet access is something that everyone will have, always and forever.

And when you have to operate in a “default deny” computing environment, you are gonna have a bad time unless you can tell your InfoSec and Firewall Folk the list of external hostnames and destinations you need to communicate with.

Get to the point

The purpose of this post is to document my work to discover all external destinations that AWS Parallelcluster requires. This includes:

  • Installing the ParallelCluster CLI on an EC2 host
  • Installing the LTS version of NodeJS that ParallelCluster requires
  • Building a custom ParallelCluster AMI using the v3.x image-build pipeline

External Sites That ParallelCluster v3.5.0 Communicates With

Note: We are excluding ALL AWS destinations, as the goal here is to understand and document only the external communication requirements of AWS ParallelCluster.

Installing ‘aws-parallelcluster’ via pip3 requires access to these sites

pypi.org:443
files.pythonhosted.org:443
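For completeness, here is how that install looks from behind a proxy. This is a sketch, not an official procedure: the proxy address reuses the Squid test host described later in this post (an assumption for your environment), and the RUN_INSTALL guard is just a convenience so the snippet is safe to run offline.

```shell
# Example proxy address from this post's Squid test host -- substitute your own.
PROXY_URL="http://172.31.93.45:3128"
export HTTP_PROXY="$PROXY_URL" HTTPS_PROXY="$PROXY_URL"

# pip honors the proxy env vars; --proxy makes it explicit. The actual
# network install only runs when RUN_INSTALL=1.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    pip3 install --proxy "$PROXY_URL" --upgrade aws-parallelcluster
fi
```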

Installing the NodeJS ParallelCluster dependency requires access to these sites

nodejs.org:443
raw.githubusercontent.com:443
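Those two hosts map to the common nvm-based install path: the nvm installer script is fetched from raw.githubusercontent.com and the Node tarball itself comes from nodejs.org. A hedged sketch; the nvm version pin and proxy address are assumptions, and the downloads only run when RUN_INSTALL=1.

```shell
PROXY_URL="http://172.31.93.45:3128"   # example Squid host from this post
export HTTP_PROXY="$PROXY_URL" HTTPS_PROXY="$PROXY_URL"

NVM_VERSION="v0.39.3"   # assumption: pin whichever nvm release you trust
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    # The installer comes from raw.githubusercontent.com ...
    curl -fsSL "https://raw.githubusercontent.com/nvm-sh/nvm/${NVM_VERSION}/install.sh" | bash
    . "$HOME/.nvm/nvm.sh"
    # ... and the Node LTS tarball comes from nodejs.org
    nvm install --lts
fi
```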

ParallelCluster 3.5.0 Custom AMI Build External Destinations

Note: This was not easy info to generate as the pcluster-v3 custom AMI build process does not support a custom Proxy configuration. So there is no easy or “authoritative” way to document external access via parsing a nice clean squid proxy log file.

After spending tons of time trying to inject proxy configuration files and env vars into the EC2 ImageBuilder pipeline that pcluster-v3 uses, I gave up in defeat. What I ended up doing was downloading the full ImageBuilder CloudWatch LogGroup log stream and then parsing 36,111 lines of output to tease out any and all non-AWS URLs I could find.

This list may not be 100% accurate, as some URLs in the log output are referenced only in warning messages or comments, but I’m trying to be as inclusive as possible because I really only want to go to my InfoSec/Firewall team once with a firewall change request!

Other notes:

  • The Fedora destination is due to a download of EPEL repo information
  • The Ubuntu destinations are because my cluster OS selection was Ubuntu 20.04 LTS
# HTTP Destinations
security.ubuntu.com:80
supervisord.org:80
us-east-1.ec2.archive.ubuntu.com:80
wiki.debian.org:80
# HTTPS Destinations
bugs.centos.org:443
bugs.launchpad.net:443
cloudinit.readthedocs.io:443
d1uj6qtbmh3dt5.cloudfront.net:443
dev.mysql.com:443
developer.download.nvidia.com:443
download.fedoraproject.org:443
ftp.gnu.org:443
git.launchpad.net:443
github.com:443
help.ubuntu.com:443
mirrors.fedoraproject.org:443
us.download.nvidia.com:443
www.chef.io:443
www.python.org:443
www.ubuntu.com:443

The destination list above was itself distilled from the following list of unique URLs, parsed out of the 30,000+ lines of LogGroup stream data. The raw list is interesting in its own right; I left some AWS-specific destinations in it.

http://security.ubuntu.com/ubuntu
http://supervisord.org
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://wiki.debian.org/SecuringNFS
https://bugs.launchpad.net/ubuntu/
https://bugs.launchpad.net/ubuntu/+source/grub2-signed/+bug/1936857
https://cloudinit.readthedocs.io/en/latest/topics/cli.html#clean
https://d1uj6qtbmh3dt5.cloudfront.net/2022.1/Servers/nice-dcv-2022.1-13300-ubuntu2004-x86_64.tgz
https://d1uj6qtbmh3dt5.cloudfront.net/NICE-GPG-KEY
https://dev.mysql.com/get/mysql-apt-config_0.8.23-1_all.deb
https://developer.download.nvidia._domain_/compute/cuda/repos/ubuntu2004/x86_64
https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/disk-performance.html
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/verify-CloudWatch-Agent-Package-Signature.html
https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html
https://docs.chef.io/deprecations_unified_mode/
https://download.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-20
https://download.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-8
https://efa-installer.amazonaws.com/aws-efa-installer-1.21.0.tar.gz
https://fsx-lustre-client-repo-public-keys.s3.amazonaws.com/fsx-ubuntu-public-key.asc
https://fsx-lustre-client-repo.s3.amazonaws.com/ubuntu
https://ftp.gnu.org/gnu/gcc/gcc-9.3.0/gcc-9.3.0.tar.gz
https://git.launchpad.net/ubuntu/+source/supervisor/tree/debian/supervisor.service
https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v11.6.tar.gz
https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.3.tar.gz
https://github.com/SchedMD/slurm/archive/slurm-22-05-8-1.tar.gz
https://github.com/aws/efs-utils/archive/v1.34.1.tar.gz
https://github.com/benmcollins/libjwt/archive/refs/tags/v1.12.0.tar.gz
https://github.com/dun/munge/archive/munge-0.5.14.tar.gz
https://github.com/openpmix/openpmix/releases/download/v3.2.3/pmix-3.2.3.tar.gz
https://github.com/pyenv/pyenv-virtualenv
https://github.com/pyenv/pyenv.git
https://github.com/pypa/pip/issues/8559
https://help.ubuntu.com/
https://mirrors.fedoraproject.org/metalink?repo=epel-modular-$releasever&arch=$basearch&infra=$infra&content=$contentdir
https://mirrors.fedoraproject.org/metalink?repo=epel-modular-debug-$releasever&arch=$basearch&infra=$infra&content=$contentdir
https://mirrors.fedoraproject.org/metalink?repo=epel-modular-source-$releasever&arch=$basearch&infra=$infra&content=$contentdir
https://mirrors.fedoraproject.org/metalink?repo=testing-modular-debug-epel$releasever&arch=$basearch&infra=$infra&content=$contentdir
https://mirrors.fedoraproject.org/metalink?repo=testing-modular-epel$releasever&arch=$basearch&infra=$infra&content=$contentdir
https://mirrors.fedoraproject.org/metalink?repo=testing-modular-source-epel$releasever&arch=$basearch&infra=$infra&content=$contentdir
https://mirrors.fedoraproject.org/mirrorlist?repo=epel-20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=epel-debug-20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=epel-next-20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=epel-next-debug-20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=epel-next-source-20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=epel-source-20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=epel-testing-next-20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=epel-testing-next-debug-20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=testing-debug-epel20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=testing-epel20&arch=$basearch
https://mirrors.fedoraproject.org/mirrorlist?repo=testing-source-epel20&arch=$basearch
https://s3.amazonaws.com/amazoncloudwatch-agent/assets/amazon-cloudwatch-agent.gpg
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives/cinc/cinc-install-1.1.0.sh
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives/cinc/ubuntu/20.04/cinc_17.2.29-1_amd64.deb
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives/cinc/ubuntu/20.04/cinc_17.2.29-1_amd64.deb.sha256
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives/impi/qt-src-5.15.2-linux.src.tgz
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives/mysql
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives/mysql/el/7/x86_64/mysql-community-client-8.0.31-1.tar.gz
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives/source/mysql-8.0.31.tar.gz
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives/stunnel/stunnel-5.67.tar.gz
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/parallelcluster/3.5.0/cookbooks/aws-parallelcluster-cookbook-3.5.0.tgz
https://us.download.nvidia.com/tesla/470.141.03/NVIDIA-Linux-x86_64-470.141.03.run
https://www.chef.io/patents
https://www.python.org/ftp/python/3.7.16/Python-3.7.16.tar.xz
https://www.python.org/ftp/python/3.9.16/Python-3.9.16.tar.xz
https://www.ubuntu.com/
https://www.ubuntu.com/legal/terms-and-policies/privacy-policy

The Real Firewall Approve List I Intend To Submit

The list I will send to my client’s Firewall/InfoSec team will be slightly different than the data shown above, because of the following:

  • I need to deploy in many AWS global regions so I want to wildcard out the obviously region-specific hostnames
  • I am concerned that the Cloudfront distribution hostname may change over time or per-region
  • I had a custom build failure in the past involving the firewall blocking a Ruby gem install so I manually added rubygems.org to the list below out of caution

For those reasons I’m adding some wildcards and making the destination lists a bit more open, in hopes that I can deploy and perform custom AMI builds in any AWS global region. The list I will submit to the Firewall team will be:

# HTTP Destinations
security.ubuntu.com:80
supervisord.org:80
*.archive.ubuntu.com:80
wiki.debian.org:80
# HTTPS Destinations
pypi.org:443
files.pythonhosted.org:443
nodejs.org:443
raw.githubusercontent.com:443
rubygems.org:443
bugs.centos.org:443
bugs.launchpad.net:443
cloudinit.readthedocs.io:443
*.cloudfront.net:443
dev.mysql.com:443
developer.download.nvidia.com:443
download.fedoraproject.org:443
ftp.gnu.org:443
git.launchpad.net:443
github.com:443
help.ubuntu.com:443
mirrors.fedoraproject.org:443
us.download.nvidia.com:443
*.download.nvidia.com:443
www.chef.io:443
www.python.org:443
www.ubuntu.com:443
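Before (and after) the firewall change lands, it helps to verify which of these destinations are actually reachable. Here is a small sketch that probes each host:port using bash’s /dev/tcp; the destination list is a shortened example, and the network probing loop only runs when RUN_CHECKS=1.

```shell
# A few example entries in the same "host:port" form as the list above.
DESTINATIONS="pypi.org:443 files.pythonhosted.org:443 security.ubuntu.com:80"

# Split "host:port" into its two parts (pure string handling, no network).
dest_host() { echo "${1%:*}"; }
dest_port() { echo "${1##*:}"; }

# Actual probing is network-dependent, so it only runs when RUN_CHECKS=1.
if [ "${RUN_CHECKS:-0}" = "1" ]; then
    for dest in $DESTINATIONS; do
        host="$(dest_host "$dest")"
        port="$(dest_port "$dest")"
        if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            echo "OK   $dest"
        else
            echo "FAIL $dest"
        fi
    done
fi
```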

Ongoing Work (Watch for Updates)

  • This list was generated using t3.large EC2 instances
  • I plan to repeat the work using g5.2xlarge nodes with NVIDIA A10G GPUs to see if the destinations change
  • In particular, launching a GPU-enabled head node with DCV enabled may trigger additional external connections

If you want to reproduce this work …

Monitoring Parallelcluster & NodeJS Installation

This was the easy part, because ParallelCluster itself supports an HTTP proxy setting in its cluster configuration. All that was necessary was a Squid proxy and a test cluster-config.yaml file pointing at it, both shown below.

Squid HTTP/HTTPS Proxy

The squid package was installed on an EC2 server and the configuration for squid was minimally edited with these core settings:

# Allow requests for any destination
http_access allow all

# Squid listens on port 3128 by default
http_port 3128

# Let's do a super minimal log format that is easy to parse
logformat minimal %tl %>a %Ss/%03>Hs %rm %ru
access_log /var/log/squid/minimal-access.log minimal
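With that logformat, CONNECT lines already record host:port and plain-HTTP lines record the full URL, so the access log reduces to a firewall-style destination list with one awk pass. A sketch using two fabricated sample lines in the minimal format (the timestamps and client IP are made up):

```shell
# Two fabricated example lines in the "minimal" format defined above:
#   timestamp tz client status/code method url
cat > /tmp/minimal-access.log <<'EOF'
18/Mar/2023:16:01:02 -0400 172.31.20.11 TCP_TUNNEL/200 CONNECT pypi.org:443
18/Mar/2023:16:01:05 -0400 172.31.20.11 TCP_MISS/200 GET http://security.ubuntu.com/ubuntu/dists/focal/InRelease
EOF

# CONNECT lines already give host:port; for plain-HTTP GETs, pull the host
# out of the URL and tag it as port 80.
awk '$5 == "CONNECT" { print $6 }
     $5 == "GET"     { split($6, p, "/"); print p[3] ":80" }' \
    /tmp/minimal-access.log | sort -u > /tmp/squid-destinations.txt
```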

cluster-config.yaml file

A generic cluster config file was created with the proxy host information included in it.

Region: us-east-1
Image:
  Os: ubuntu2004
HeadNode:
  InstanceType: t3.large
  Networking:
    SubnetId: subnet-0ca8504005553e75d
    Proxy:
      HttpProxyAddress: http://172.31.93.45:3128
  Ssh:
    KeyName: pcluster-destination-testing
  Dcv:
    Enabled: true
Scheduling:
  Scheduler: slurm
  SlurmQueues:
  - Name: t3-q
    ComputeResources:
    - Name: t3large
      Instances:
      - InstanceType: t3.large
      MinCount: 1
      MaxCount: 1
    Networking:
      SubnetIds:
      - subnet-0ca8504005553e75d
      Proxy:
        HttpProxyAddress: http://172.31.93.45:3128
Monitoring:
  DetailedMonitoring: false
  Logs:
    CloudWatch:
      Enabled: true
      RetentionInDays: 14
      DeletionPolicy: Delete
  Dashboards:
    CloudWatch:
      Enabled: true
Tags:
  - Key: 'Platform'
    Value: 'HPC'
  - Key: 'Project'
    Value: 'ScientificComputing'
  - Key: 'Purpose'
    Value: 'Deploy through squid proxy to log external destinations for firewall rule tweaking'
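With the config saved, the test cluster is launched with the v3 CLI. The cluster name dest-test is a hypothetical example, and the RUN_DEPLOY guard keeps the sketch from actually deploying anything unless explicitly asked.

```shell
CLUSTER_NAME="dest-test"   # hypothetical example name
CMD="pcluster create-cluster --cluster-name ${CLUSTER_NAME} --cluster-configuration cluster-config.yaml"
echo "$CMD"

# Only launch a real cluster when explicitly requested.
if [ "${RUN_DEPLOY:-0}" = "1" ]; then
    $CMD
fi
```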

Monitoring Custom AMI Building

I utterly failed at injecting HTTP_PROXY information into the ImageBuilder pipeline, so I gave up and just decided to parse the CloudWatch log stream.

This was my image builder pipeline config file. I injected a bad Component script that exited with a non-zero status to force the image builder node to stay online in case I wanted to go in and look at things directly. I ended up not needing to log in to the build node and just used the CloudWatch log stream data.

Build:
  InstanceType: t3.2xlarge
  ParentImage: ami-09cd747c78a9add63
  UpdateOsPackages:
    Enabled: true
  SubnetId: subnet-0ca8504005553e75d
  SecurityGroupIds:
    - sg-052bbacd50005e417
  Iam:
    InstanceRole: arn:aws:iam::290725103381:role/scientificComputing-default-Ec2-Role
  Components:
    - Type: script
      Value: s3://pcluster-dest-logging-project/bad-script.sh
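The build itself is kicked off with pcluster build-image. The image id u20lts-image matches the LogGroup name shown below; the config filename image-config.yaml is an assumption, and the RUN_BUILD guard keeps the sketch from starting a real (billable) build.

```shell
IMAGE_ID="u20lts-image"   # matches the LogGroup name later in this post
CMD="pcluster build-image --image-id ${IMAGE_ID} --image-configuration image-config.yaml"
echo "$CMD"

# Only start a real ImageBuilder run when explicitly requested.
if [ "${RUN_BUILD:-0}" = "1" ]; then
    $CMD
fi
```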

AWS does not seem to have a native method for downloading ALL of a giant CloudWatch log stream, so I used a Python package called “awslogs”.

Install awslogs python package

pip3 install awslogs

Then we can use that tool to download the full image builder log stream:

awslogs get /aws/imagebuilder/ParallelClusterImage-u20lts-image -s2d > full-log-output.txt

The full log stream output is massive and looks like this before parsing:

/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Setup: Currently validating document arn:aws:imagebuilder:us-east-1:290725103381:component/parallelclusterimage-updateos-33619580-c7fd-11ed-a90e-0e5824720a3f/3.5.0/1
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Setup: Currently validating document arn:aws:imagebuilder:us-east-1:290725103381:component/parallelclusterimage-33619580-c7fd-11ed-a90e-0e5824720a3f/3.5.0/1
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Setup: Currently validating document arn:aws:imagebuilder:us-east-1:290725103381:component/parallelclusterimage-tag-33619580-c7fd-11ed-a90e-0e5824720a3f/3.5.0/1
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Setup: Currently validating document arn:aws:imagebuilder:us-east-1:290725103381:component/parallelclusterimage-script-0-33619580-c7fd-11ed-a90e-0e5824720a3f/3.5.0/1
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Executor: STARTED EXECUTION OF ALL DOCUMENTS
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Document arn:aws:imagebuilder:us-east-1:290725103381:component/parallelclusterimage-updateos-33619580-c7fd-11ed-a90e-0e5824720a3f/3.5.0/1
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Phase build
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Step OperatingSystemRelease
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    ExecuteBash: STARTED EXECUTION
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Stdout: ubuntu.20.04
/aws/imagebuilder/ParallelClusterImage-u20lts-image 3.5.0/1    Stderr: FILE=/etc/os-release
...
...

So I ran a cheesy sed command …

sed -ne 's/.*\(http[^"]*\).*/\1/p' full-log-output.txt > extracted-urls.txt

That left an “extracted-urls.txt” file that looked like this:

...
https://www.ubuntu.com/
https://help.ubuntu.com/
https://bugs.launchpad.net/ubuntu/
https://www.ubuntu.com/legal/terms-and-policies/privacy-policy
http2-14
httplib2
https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/archives
https://ftp.gnu.org/gnu/gcc/gcc-9.3.0/gcc-9.3.0.tar.gz
https://github.com/SchedMD/slurm/archive/slurm-22-05-8-1.tar.gz
https://github.com/openpmix/openpmix/releases/download/v3.2.3/pmix-3.2.3.tar.gz
https://github.com/dun/munge/archive/munge-0.5.14.tar.gz
https://github.com/benmcollins/libjwt/archive/refs/tags/v1.12.0.tar.gz
https://us.download.nvidia.com/tesla/470.141.03/NVIDIA-Linux-x86_64-470.141.03.run
...

So then ‘awk’ was used to pull just the first column:

awk '{print $1}' extracted-urls.txt > single-column-urls.txt

That got us a list of just our URLs but with a lot of repetitive entries:

...
http://security.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://security.ubuntu.com/ubuntu
http://security.ubuntu.com/ubuntu
http://security.ubuntu.com/ubuntu
http://security.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://security.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
http://us-east-1.ec2.archive.ubuntu.com/ubuntu
...
...

So the final step was to use the “uniq” command to distill a unique list of URLs:

sort single-column-urls.txt | uniq > uniq-urls.txt
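The three passes (sed, then awk, then sort/uniq) can also be condensed into a single pipeline. Here is a self-contained sketch using a few fabricated log lines in place of the real 36,000-line file, with sort -u standing in for the separate sort | uniq step:

```shell
# A few fabricated log lines standing in for full-log-output.txt:
cat > /tmp/full-log-output.txt <<'EOF'
Stdout: Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease
Stdout: fetching https://www.python.org/ftp/python/3.9.16/Python-3.9.16.tar.xz done
Stdout: Get:2 http://security.ubuntu.com/ubuntu focal-security/main amd64
EOF

# sed extracts everything from "http" onward, awk keeps only the first
# token, and sort -u deduplicates in one shot.
sed -ne 's/.*\(http[^"]*\).*/\1/p' /tmp/full-log-output.txt \
    | awk '{print $1}' \
    | sort -u > /tmp/uniq-urls.txt
```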

The “unique” list looked like this and was manually cleaned up to generate the destination lists shown above:

...
https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v11.6.tar.gz
https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.3.tar.gz
https://github.com/SchedMD/slurm/archive/slurm-22-05-8-1.tar.gz
https://github.com/aws/efs-utils/archive/v1.34.1.tar.gz
https://github.com/benmcollins/libjwt/archive/refs/tags/v1.12.0.tar.gz
https://github.com/dun/munge/archive/munge-0.5.14.tar.gz
https://github.com/openpmix/openpmix/releases/download/v3.2.3/pmix-3.2.3.tar.gz
https://github.com/pyenv/pyenv-virtualenv
https://github.com/pyenv/pyenv.git
https://github.com/pypa/pip/issues/8559
https://help.ubuntu.com/
https://localhost:${ext_auth_port}
https://mirrors.fedoraproject.org/metalink?repo=epel-modular-$releasever&arch=$basearch&infra=$infra&content=$contentdir
https://mirrors.fedoraproject.org/metalink?repo=epel-modular-debug-$releasever&arch=$basearch&infra=$infra&content=$contentdir
https://mirrors.fedoraproject.org/metalink?repo=epel-modular-source-$releasever&arch=$basearch&infra=$infra&content=$contentdir
...