
AWS ENA Driver Not Enabled On Default AMI #1558

Closed
tcf909 opened this issue Jan 20, 2017 · 29 comments
Labels: area/image, blocks-next, lifecycle/rotten

Comments

@tcf909

tcf909 commented Jan 20, 2017

Hello,

I noticed that the ENA (Elastic Network Adapter) driver isn't enabled in the AMIs (1.4) that kops uses by default:

root@ip-172-21-35-87:~# cat /etc/debian_version
8.6
root@ip-172-21-35-87:~# ethtool -i eth0
driver: vif
$ kops version
Version 1.5.0-alpha3 (git-51b7644)

AMI Version: k8s-1.4-debian-jessie-amd64-hvm-ebs-2016-12-05 (ami-03fdf814)

References:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html
https://wiki.debian.org/Cloud/AmazonEC2Image/Jessie
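A quick way to tell which path an instance is actually using is the driver name in `ethtool -i` output: `vif` means no enhanced networking, `ixgbevf` is the Intel VF path, and `ena` is the Elastic Network Adapter. A minimal sketch, assuming bash and awk; `classify_nic_driver` is a hypothetical helper that reads ethtool output on stdin so it can be checked against the transcripts in this thread.

```shell
# Hypothetical helper: classify the NIC driver reported by `ethtool -i <iface>`.
# Reads ethtool output on stdin so it can be tried without a real NIC.
classify_nic_driver() {
  local driver
  # The "driver:" line holds the kernel driver name; stop at the first one.
  driver=$(awk -F': *' '$1 == "driver" { print $2; exit }')
  case "$driver" in
    ena)     echo "ena: Elastic Network Adapter (enhanced networking)" ;;
    ixgbevf) echo "ixgbevf: Intel 82599 VF (enhanced networking)" ;;
    vif)     echo "vif: Xen netfront, no enhanced networking" ;;
    *)       echo "unknown: ${driver:-<none>}" ;;
  esac
}

# On a live host (requires ethtool):
#   ethtool -i eth0 | classify_nic_driver
```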

@justinsb
Member

Which instance size was this on? Did it have enhanced networking available?

@tcf909
Author

tcf909 commented Jan 20, 2017 via email

@justinsb justinsb modified the milestone: 1.5.0 Jan 20, 2017
@justinsb
Member

justinsb commented Jan 20, 2017

I see now - this is a separate driver from the ixgbevf driver - my mistake.

We'll have to add a module to the base image:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html

And we'll also have to enable EnaSupport on the base image.
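Setting the EC2 attribute is separate from installing the kernel module. A minimal sketch with the AWS CLI, assuming the `ena` module is already installed on a builder instance and the AMI is baked from that instance: stop it, set `--ena-support`, then create the image, which should inherit the attribute from the instance. `bake_ena_ami`, `run`, and `DRY_RUN` are hypothetical names used only for illustration.

```shell
# Hypothetical wrapper: print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: set enaSupport on a builder instance and bake an AMI from it.
# Assumes the ena module is already installed there and the AWS CLI is configured.
bake_ena_ami() {
  local instance_id=$1 ami_name=$2
  # enaSupport can only be changed while the instance is stopped.
  run aws ec2 stop-instances --instance-ids "$instance_id"
  run aws ec2 wait instance-stopped --instance-ids "$instance_id"
  run aws ec2 modify-instance-attribute --instance-id "$instance_id" --ena-support
  # The new AMI should inherit enaSupport from the instance; verify with describe-images.
  run aws ec2 create-image --instance-id "$instance_id" --name "$ami_name"
}
```

Running it once with `DRY_RUN=1` shows the exact commands before anything touches the account.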

@justinsb
Member

It's not entirely relevant, but I confirmed that the ixgbevf driver is installed and enabled on e.g. a c4.large:

> ethtool -i eth0
driver: ixgbevf
version: 2.12.1-k
firmware-version: 
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

@tcf909
Author

tcf909 commented Jan 20, 2017 via email

@zytek
Contributor

zytek commented Jan 31, 2017

AWS recommends ixgbevf > 2.14 for stability and performance.
ENA driver is needed on those beefy 20Gbit/s instances (R4 family, m4.16xlarge). This driver is not available on stock Ubuntu images and has to be installed manually.

Does kops/Kubernetes provide any 'official' AMIs? I always thought it used 'bare' images.

Edit: OK, I see it does, so I guess this should be added (a bumped ixgbevf and the ENA driver).

@ottoyiu
Copy link
Contributor

ottoyiu commented Feb 3, 2017

Does anyone know if the ixgbevf 2.12.1-k in Debian 8.6 k8s-1.5-debian-jessie-amd64-hvm-ebs-2017-01-09 (ami-aaf84aca) is affected by the stability issues (TCP timeouts and random packet corruption)?

Edit: I ask because I understand that ixgbevf 2.12.1-k is an out-of-tree version, and the version number does not indicate which patches from the out-of-tree driver were actually added to the kernel.
I'm getting many of these as well:

[22117.455919] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[22117.489707] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
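These messages appear when eth0 goes down and comes back up. A small sketch for counting how often that happens, assuming the kernel log format shown above; `count_link_flaps` is a hypothetical helper, and a nonzero count on its own does not show that the driver is at fault.

```shell
# Hypothetical helper: count "link becomes ready" events for one interface
# in kernel log text read from stdin (e.g. `dmesg | count_link_flaps eth0`).
count_link_flaps() {
  local iface=${1:-eth0}
  # -F: match the message literally; grep -c prints 0 and exits 1 when nothing
  # matches, so `|| true` keeps the exit status clean.
  grep -F -c "ADDRCONF(NETDEV_CHANGE): ${iface}: link becomes ready" || true
}
```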

@justinsb
Member

justinsb commented Feb 3, 2017

On ixgbevf:

On the "k8s AMIs" (which are Debian Jessie with a 4.4 kernel), we're running the ixgbevf driver from the Linux kernel, not the out-of-tree version. The version numbering does not appear to correspond directly. We switched to this as part of the move to the 4.4 kernel; with the Jessie kernel we were seeing kernel panics, particularly on m4 instances (with the AWS-recommended driver): kubernetes/kubernetes#30706
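To check which ixgbevf build a node actually loads, `modinfo -n ixgbevf` prints the path of the module file. A heuristic sketch, assuming the usual layout where in-tree modules live under `/lib/modules/<release>/kernel/` and out-of-tree builds (DKMS or a vendor `make install`) land under `updates/` or `extra/`; `module_origin` is a hypothetical helper, and distributions may lay things out differently.

```shell
# Hypothetical helper: guess whether a module file is in-tree or out-of-tree
# from its path. Heuristic only; layouts vary between distributions.
module_origin() {
  case "$1" in
    /lib/modules/*/kernel/*) echo "in-tree" ;;
    /lib/modules/*/updates/*|/lib/modules/*/extra/*) echo "out-of-tree" ;;
    *) echo "unknown" ;;
  esac
}

# On a live host:
#   module_origin "$(modinfo -n ixgbevf)"
```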

I compared the 2.12.1 driver from sourceforge with the 2.14.2 driver (I could not find the upstream version control):

  • 2.14.2 introduced ixgbevf_check_tx_hang, added here: torvalds/linux@e08400b, and in kernel >= 4.0
  • 2.14.2 introduced ixgbevf_set_ivar, added in the initial commit of the driver into the kernel torvalds/linux@92915f7. Note that a version somewhere in between 2.12.1 and 2.14.2 was labeled in the kernel as 1.0.0-k0. This suggests that the -k scheme is not comparable to the non-k scheme.
  • 2.14.2 introduced an errata check, in the kernel in torvalds/linux@8bae1b2.

(there are more differences, but these seemed a reasonable sample of non-trivial changes)
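Because the in-kernel `-k` numbering does not line up with the out-of-tree releases, a naive version comparison is misleading. A minimal sketch that compares only non-`-k` version strings against 2.14.2 (the out-of-tree version discussed here) and declines to compare `-k` builds; `ixgbevf_version_check` is hypothetical and relies on GNU `sort -V`.

```shell
# Hypothetical helper: compare an ixgbevf version string with a minimum.
# In-kernel "-k" versions use a different scheme, so they are not compared.
ixgbevf_version_check() {
  local ver=$1 min=${2:-2.14.2}
  case "$ver" in
    *-k*) echo "not comparable (in-kernel)"; return 0 ;;
  esac
  # sort -V -C exits 0 when "min, ver" is already in version order, i.e. ver >= min.
  if printf '%s\n%s\n' "$min" "$ver" | sort -V -C; then
    echo "ok"
  else
    echo "older than $min"
  fi
}

# e.g. ixgbevf_version_check "$(ethtool -i eth0 | awk -F': *' '$1 == "version" { print $2 }')"
```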

@Jkirsher I see you do a lot of the work on the ixgbevf driver in the kernel... Is it reasonable to run the ixgbevf driver from the 4.4 LTS kernel on AWS? Any guidance is greatly appreciated!

@Jkirsher

Jkirsher commented Feb 3, 2017 via email

@justinsb
Member

justinsb commented Feb 5, 2017

Thanks @Jkirsher so much for the guidance on the ixgbevf driver :-)

@erulabs

erulabs commented Apr 27, 2017

FWIW, the linux-aws package in Ubuntu 16.04 is a huge performance win for a number of reasons, as well as providing the ENA/ixgbevf drivers out of the box. Other than that package, nothing is required save marking the image as "SR-IOV" ready. Perhaps kube should install this when it detects it's installing on Ubuntu on AWS? I've seen pretty dramatic wins on the networking and IO layer using this kernel, and I believe there are fixes for T* instances as well, which are fairly common in the kube ecosystem. Additionally, it works everywhere, even on legacy servers that only support the old Xen vif driver!

https://insights.ubuntu.com/2017/04/05/ubuntu-on-aws-gets-serious-performance-boost-with-aws-tuned-kernel/

Also for reference, packer sets the two flags together when enhanced_networking is enabled: https://github.com/hashicorp/packer/blob/81522dced0b25084a824e79efda02483b12dc7cd/builder/amazon/instance/step_register_ami.go#L32-L40
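The AWS CLI equivalent of setting both flags at registration time is `aws ec2 register-image` with `--sriov-net-support simple` and `--ena-support`. A minimal sketch, assuming an HVM image registered from an existing EBS snapshot with root device `/dev/xvda`; `register_enhanced_ami`, `run`, and `DRY_RUN` are hypothetical names for illustration.

```shell
# Hypothetical wrapper: print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: register an HVM AMI from an EBS snapshot with both
# enhanced-networking attributes set (ixgbevf via sriovNetSupport, plus ENA).
register_enhanced_ami() {
  local name=$1 snapshot_id=$2 root_dev=${3:-/dev/xvda}
  # The block-device mapping is quoted so the shell leaves the braces alone.
  run aws ec2 register-image \
    --name "$name" \
    --architecture x86_64 \
    --virtualization-type hvm \
    --root-device-name "$root_dev" \
    --block-device-mappings "DeviceName=${root_dev},Ebs={SnapshotId=${snapshot_id}}" \
    --sriov-net-support simple \
    --ena-support
}
```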

@zapman449

Another +1 for this.

Running kops 1.5.3, kubernetes 1.5.5 on r4.xlarge shows the vif driver in play:

# ethtool -i eth0
driver: vif
# cat /etc/debian_version
8.8

But I concur that a c4.2xlarge shows:

$ sudo ethtool -i eth0
driver: ixgbevf
$ cat /etc/debian_version
8.7

(The latter cluster will be updated later this week).

@jnicholls

jnicholls commented Jun 2, 2017

I have an m4.large instance that supports enhanced networking and thus should run the ixgbevf driver. However, it's running the vif driver. Can we get kops to set all of this up for us in the k8s debian AMI? Otherwise I'll probably just move over to Ubuntu 16.04.

@jmasonISP

From my limited experience the ENA driver is a must.

I had a gRPC service that was experiencing poor throughput on a kops-created cluster, and I narrowed it down to the fact that the default Debian image kops uses did not have the ENA driver installed. After making my own AMI from the kops default Debian image + ENA, I have seen a ~7x improvement in throughput on i3.xlarge nodes (single-node throughput increased from 1.2Gbps to 8.03Gbps).

Cluster Setup:
Node Size: i3.xlarge
Topology: private
Networking: weave

@bcorijn
Copy link
Contributor

bcorijn commented Aug 17, 2017

After seeing this mentioned on HN I double-checked my own cluster, and sure enough my R4.XL machines running the kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28 AMI are not running with ENA enabled.

$ cat /etc/debian_version
8.9
$ sudo modinfo ena
modinfo: ERROR: Module ena not found.
$ sudo ethtool -i eth0
driver: vif
version:
firmware-version:

What was your process to build a custom image, @jmasonISP? The official Debian image claims this should already be supported, so I'm wondering what the missing piece is here.

@suneeta-mall
Copy link

We are on k8s 1.7.0 (with kops) and using AMIs:

ami-b2137ea4 k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02 us-east-1
ami-800803e3 k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02 ap-southeast-2

Driver seems to be vif and not ixgbevf:

driver: vif
version: 
firmware-version: 
bus-info: vif-0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

@jmasonISP
Copy link

@bcorijn I found a k8s-1.7-debian-jessie AMI that I spun up on an EC2 instance in my k8s VPC. I then followed this guide to install and enable ENA on it. Once installed I made an AMI from that instance which is what I'm using now for my kops created nodes.

@chrislovecnm
Copy link
Contributor

Related #3868

@mv78
Copy link

mv78 commented Jan 17, 2018

We are running kops 1.8 with k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02 (ami-06a57e7e), where it says that ENA is enabled. However, when I run a check on that instance, it returns simple networking:
aws ec2 describe-instance-attribute --instance-id 123 --attribute sriovNetSupport
{
    "InstanceId": "123",
    "SriovNetSupport": {
        "Value": "simple"
    }
}

I also SSHed into the instance and ran lsmod | grep ixgbevf to verify that the module needed for ENA is installed, and it is not there!
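Note that `sriovNetSupport` reports the Intel 82599 VF (ixgbevf) mechanism, while ENA is tracked by a separate `enaSupport` attribute, and the kernel module for ENA is `ena`. A small sketch that interprets both values side by side, assuming they were fetched with the AWS CLI queries shown in its comments; `enhanced_net_attrs` is a hypothetical helper.

```shell
# Hypothetical helper: interpret an instance's two enhanced-networking attributes.
# Values are assumed to come from the AWS CLI, e.g.:
#   aws ec2 describe-instance-attribute --instance-id "$ID" \
#     --attribute sriovNetSupport --query 'SriovNetSupport.Value' --output text
#   aws ec2 describe-instances --instance-ids "$ID" \
#     --query 'Reservations[].Instances[].EnaSupport' --output text
# An unset attribute may come back as "None" or as empty output; both count as unset.
enhanced_net_attrs() {
  local sriov=$1 ena=$2
  if [ "$sriov" = "simple" ]; then
    echo "sriovNetSupport=simple: Intel 82599 VF (ixgbevf) path enabled"
  else
    echo "sriovNetSupport unset: Intel 82599 VF path not enabled"
  fi
  # Text output prints booleans as "True"; accept lowercase too.
  if [ "$ena" = "True" ] || [ "$ena" = "true" ]; then
    echo "enaSupport=true: ENA path enabled (the instance still needs the ena module)"
  else
    echo "enaSupport unset/false: ENA path not enabled"
  fi
}
```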

@ottoyiu
Copy link
Contributor

ottoyiu commented Jan 18, 2018

@mv78 have you tried the debian stretch image?

@mv78
Copy link

mv78 commented Jan 18, 2018

I have not, but looking at the source code https://github.com/kubernetes/kube-deploy/blob/master/imagebuilder/templates/1.8-stretch.yml, I don't see the "ixgbevf" module included either.

@bcorijn
Copy link
Contributor

bcorijn commented Jan 25, 2018

@mv78 that is because the base image already has it in-kernel, so there is no need to install it on top.
Your SriovNetSupport looks fine to me: if ENA is not supported, that property will be empty, while a value of simple means that enhanced networking is enabled (see documentation).

It all depends on which instance type you are using; there are different ways to do ENA, as described here. If you use one of the newer types, you need to check for the ENA driver, not the ixgbevf one.
What instance type are you using?
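Which driver to look for depends on the instance family. A rough lookup, assuming the enhanced-networking family lists from the AWS documentation around early 2018 (newer families are missing, so check the current docs); `expected_nic_driver` is a hypothetical helper. On a node, its output can be compared with the `driver:` line from `ethtool -i eth0`.

```shell
# Hypothetical helper: map an EC2 instance type to the enhanced-networking
# driver it is expected to use. Family lists are an assumption (AWS docs,
# circa early 2018) and are not exhaustive for newer families.
expected_nic_driver() {
  local type=$1
  case "$type" in
    m4.16xlarge) echo "ena" ;;  # the one m4 size that uses ENA; must precede m4.*
    c5.*|f1.*|g3.*|h1.*|i3.*|m5.*|p2.*|p3.*|r4.*|x1.*|x1e.*) echo "ena" ;;
    c3.*|c4.*|d2.*|i2.*|m4.*|r3.*) echo "ixgbevf" ;;
    *) echo "vif (or unknown)" ;;
  esac
}

# On a node, using the instance metadata service:
#   expected_nic_driver "$(curl -s http://169.254.169.254/latest/meta-data/instance-type)"
```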

@veksler
Copy link

veksler commented Jan 25, 2018

I was using m5.

@chrislovecnm
Copy link
Contributor

@veksler you probably were, but you need to be using the stretch AMI with m5s.

@justinsb justinsb modified the milestones: 1.8.0, 1.9 Feb 21, 2018
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label May 22, 2018
@justinsb justinsb modified the milestones: 1.9.0, 1.10 May 26, 2018
@bcorijn
Copy link
Contributor

bcorijn commented May 30, 2018

I am not sure this one should still be open. As far as I am aware, the current kops default AMIs support both types of enhanced networking (ixgbevf and ENA).
The C5/M5 instances still have issues with EBS support, but ENA should not be blocking.

@mv78
Copy link

mv78 commented May 30, 2018 via email

@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Jun 29, 2018
@geojaz
Copy link
Member

geojaz commented Jun 29, 2018

/close
