cannot unmarshal array into Go value of type hcsshim.HNSNetwork #26982

Closed
fabiendibot opened this issue Sep 28, 2016 · 47 comments
Labels
area/runtime kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. platform/windows

Comments

@fabiendibot

fabiendibot commented Sep 28, 2016

Hello,

There seems to be an issue starting the Docker daemon 1.13-dev on the newly released Windows Server 2016. The error appears to be network-related. After this error, I can pull images from the repository, but I can't start any containers; the start seems to hang and never finishes.

Regards

ping @jhowardmsft

Here are the logs

PS C:\temp\docker\docker> .\dockerd.exe
time="2016-09-28T15:46:30.070585300+02:00" level=info msg="Windows default isolation mode: process"
time="2016-09-28T15:46:30.104587700+02:00" level=info msg="[graphdriver] using prior storage driver: windowsfilter"
time="2016-09-28T15:46:30.122586900+02:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2016-09-28T15:46:30.122586900+02:00" level=info msg="Loading containers: start."
......time="2016-09-28T15:46:30.178588600+02:00" level=error msg="Resolver Setup/Start failed for container none, "json: cannot unmarshal array into Go value of type hcsshim.HNSNetwork""
time="2016-09-28T15:46:30.243593400+02:00" level=info msg="Loading containers: done."
time="2016-09-28T15:46:30.244592400+02:00" level=info msg="Daemon has completed initialization"
time="2016-09-28T15:46:30.244592400+02:00" level=info msg="Docker daemon" commit=784b601 graphdriver=windowsfilter version=1.13.0-dev
time="2016-09-28T15:46:30.254595000+02:00" level=info msg="API listen on //./pipe/docker_engine"

@cpuguy83 cpuguy83 added platform/windows kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. area/runtime labels Sep 28, 2016
@lowenna
Member

lowenna commented Sep 28, 2016

@msabansal

@msabansal
Contributor

@fabiendibot Is it Windows RTM with the GA package (9D)? You might be getting stuck at endpoint creation. The 1.13-dev branch requires the 9D package. There was a parsing issue in Server 2016 RTM that was fixed in 9D.

@eugeneagafonov

This also happens to me on Windows 10 Anniversary Update.
Here is the output from the Get-NestedVirtStatus PowerShell script, which includes system info. Somehow it says that the host doesn't support nested virtualization (which it should).

PS C:\Projects\NestedVirtualization> Invoke-WebRequest https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/hyperv-tools/Nested/Get-NestedVirtStatus.ps1 -OutFile Get-NestedVirtStatus.ps1
PS C:\Projects\NestedVirtualization> .\Get-NestedVirtStatus.ps1
Getting system information...done.
Getting build information...done.
Validating host information...done.

The virtualization host DELLSKYLAKE supports nested virtualization: NO
The following host configuration errors have been detected:
Virtualization Based Security is running


Computer                     : DELLSKYLAKE
Manufacturer                 : Dell Inc.
Model                        : XPS 15 9550
ProccessorManufacturer       : GenuineIntel
Product Name                 : Windows 10 Pro
Installation Type            : Client
Edition ID                   : Professional
Build Lab                    : 14393.187.amd64fre.rs1_release_inmarket.160906-1818
HypervisorRunning            : True
FullHyperVRole               : True
HostNestedSupport            : False
HypervisorLoadOptionsPresent : False
HypervisorLoadOptionsValue   :
IumInstalled                 : False
VbsRunning                   : True
VbsRegEnabled                : False
BuildSupported               : True
VbsPresent                   : False

@msabansal
Contributor

The server patch 9D has a fix. There is a client patch (9D') being released Thursday which also contains the fix. I will validate when the patch comes out.

Just FYI, this is the commit that caused this issue: d3139fc

@eugeneagafonov

Thank you! Sorry to bother you, but could you please expand a little on what patch 9D' is? Do I get it as a newer version of Docker, or does it come from Windows Update?

@fabiendibot
Author

Hi @msabansal, same question. What is this 9D package? I have the RTM GA version of Windows Server 2016. Do I need to install a patch?

@msabansal
Contributor

@fabiendibot @eugeneagafonov 9D is a mandatory update for Windows Server 2016. 9D' is the corresponding client update. The 9D update was released on Tuesday, and 9D' should be available on the client today.

@pbering

pbering commented Sep 29, 2016

How and when can we update to this?

@eugeneagafonov

I confirm that the issue has been resolved with the latest windows update. Thank you!
If your machine is inside a corporate network, Windows updates can be delivered with a delay, because they first need to reach the in-corp Windows Update server. The Windows 10 update is already out; I've just installed it.

@fabiendibot
Author

Still not working after updating Windows Server 2016 with the latest patches.
I've tried with a completely new VM without any containers, NetNat or anything else... same error.

@msabansal
Contributor

@fabiendibot Can you please provide the version of hostnetsvc.dll in the system32 directory? On my patched system it is 10.0.14393.206.
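
For anyone comparing their own hostnetsvc.dll file version against the one quoted above, the dotted parts have to be compared numerically rather than as strings. The following is only a sketch: `compareVersions` and `part` are hypothetical helpers (not part of Docker or Windows), and treating 10.0.14393.206 as the baseline is an assumption drawn solely from this comment.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// part returns the i-th dotted component as an integer, or 0 if it is missing.
func part(parts []string, i int) (int, error) {
	if i >= len(parts) {
		return 0, nil
	}
	return strconv.Atoi(parts[i])
}

// compareVersions compares two dotted numeric versions such as "10.0.14393.206".
// It returns -1 if a < b, 0 if they are equal, and 1 if a > b.
func compareVersions(a, b string) (int, error) {
	pa, pb := strings.Split(a, "."), strings.Split(b, ".")
	n := len(pa)
	if len(pb) > n {
		n = len(pb)
	}
	for i := 0; i < n; i++ {
		x, err := part(pa, i)
		if err != nil {
			return 0, err
		}
		y, err := part(pb, i)
		if err != nil {
			return 0, err
		}
		if x < y {
			return -1, nil
		}
		if x > y {
			return 1, nil
		}
	}
	return 0, nil
}

func main() {
	// Assumption: the version reported on the patched system is used as the baseline.
	const baseline = "10.0.14393.206"
	// The last entry is a made-up value, included only to show the "older" case.
	for _, v := range []string{"10.0.14393.206", "10.0.14393.351", "10.0.14393.0"} {
		c, err := compareVersions(v, baseline)
		if err != nil {
			fmt.Println(v, "is not a valid version:", err)
			continue
		}
		fmt.Println(v, "compared with", baseline, "->", c)
	}
}
```

A result of 0 or 1 would suggest the file is at least as new as the one on the patched system; it says nothing about whether other parts of the update are installed.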

@fabiendibot
Author

@msabansal I've got the same file version as you.

(screenshot: capture from 2016-10-01 04-55-39)

@msabansal
Contributor

@fabiendibot Are you getting the same error where you are stuck at endpoint creation or is it a different error?

@richardgavel

I think there is some confusion here as there is an issue with Windows 10 that requires a Windows Update. But it seems like the issue with Windows Server 2016 is something that requires a Docker update, not an OS update, based on the patch @msabansal mentions.

Is the 9D patch integrated into the CS-1.12 Docker mentioned in https://msdn.microsoft.com/en-us/virtualization/windowscontainers/quick_start/quick_start_windows_server or should a different URL be used to pull Docker?

@msabansal
Contributor

The 9D patch is a Windows Server 2016 patch that you will receive through the normal update channel. It requires the latest Docker (currently master) to enable service discovery, but it should not cause any issues with previous Docker versions; service discovery simply will not be available with them.

@lowenna
Member

lowenna commented Oct 3, 2016

Actually service discovery SHOULD be in the CS version of docker. As well as latest master.

Whether server or client though, you still need the 9D update regardless to run containers on Windows.

Two separate things, but for service discovery you need both updates.

@richardgavel

richardgavel commented Oct 4, 2016

I have gone thru the entire OS install + updates + Docker install based on the MSDN site to no effect. I also have the same version of the HostNetSvc.dll mentioned above. The Docker engine being installed is 1.12.2-cs2-ws-beta-rc1. I have also tried using the 1.13.0-dev Docker engine with the same result.

@msabansal
Contributor

@richardgavel Can you please give details of the error you see? Are you getting stuck at endpoint creation? I will also start with a fresh install and see if I get this error. The error:

time="2016-09-30T11:05:13.855535200-07:00" level=debug msg="Launching DNS server for network%!(EXTRA string=none)"
time="2016-09-30T11:05:13.862010300-07:00" level=error msg="Resolver Setup/Start failed for container none, "json: cannot unmarshal array into Go value of type hcsshim.HNSNetwork""

is expected, because none is a null network that does not actually exist. I will submit a patch at some point to hide this error.
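
For readers wondering how a message like this arises at all: Go's encoding/json reports it whenever a JSON array is decoded into a struct. The sketch below reproduces that general behaviour under stated assumptions. The `HNSNetwork` type here is a two-field stand-in, not the real hcsshim definition, and the assumption that a lookup for a missing network yields an array instead of a single object is a guess, not something confirmed in this thread. `decodeNetwork` and `isArrayTypeError` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// HNSNetwork is a stand-in with two illustrative fields.
// It is NOT the real hcsshim.HNSNetwork definition.
type HNSNetwork struct {
	Name string `json:"Name"`
	Type string `json:"Type"`
}

// decodeNetwork decodes a payload that is expected to hold a single JSON object.
func decodeNetwork(payload []byte) (HNSNetwork, error) {
	var network HNSNetwork
	err := json.Unmarshal(payload, &network)
	return network, err
}

// isArrayTypeError reports whether err was produced by decoding a JSON array
// into a Go value that cannot hold an array.
func isArrayTypeError(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr) && typeErr.Value == "array"
}

func main() {
	// A single object decodes without error.
	_, err := decodeNetwork([]byte(`{"Name":"nat","Type":"nat"}`))
	fmt.Println("object:", err)

	// An array (assumed here to be what comes back when no single network
	// matches) cannot be decoded into a struct.
	_, err = decodeNetwork([]byte(`[]`))
	fmt.Println("array:", err, "| array type error:", isArrayTypeError(err))
}
```

Checking for `*json.UnmarshalTypeError` rather than matching the message text is one way a caller could tell this case apart from other decode failures.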

@msabansal
Contributor

@richardgavel I am really confused. I just installed Windows Server 2016 (it was a VHD from an internal share) and am able to launch containers seamlessly.

PS C:\Users\Administrator> docker run -td --name=c2 nanoserver cmd
b819df1672f26e3f9c4bb3622efa2952cedabc418681745dc9e74eb03bc9d019
PS C:\Users\Administrator> docker run -td --name=c1 nanoserver cmd
feb68a1a8fbaf55c5d8cbafe25de9a5a36ddfd7d93ef9b36946ae94b0904b05b
PS C:\Users\Administrator> docker exec c1 ping c2.

Pinging c2 [172.27.63.247] with 32 bytes of data:
Reply from 172.27.63.247: bytes=32 time<1ms TTL=128
Reply from 172.27.63.247: bytes=32 time<1ms TTL=128
Reply from 172.27.63.247: bytes=32 time<1ms TTL=128
PS C:\Users\Administrator> docker version
Client:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   694ba71
 Built:        Tue Oct 4 14:52:33 2016
 OS/Arch:      windows/amd64

Server:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   694ba71
 Built:        Tue Oct 4 14:52:33 2016
 OS/Arch:      windows/amd64

I am now trying the steps mentioned in the blog post at https://msdn.microsoft.com/en-us/virtualization/windowscontainers/quick_start/quick_start_windows_server as well.

@richardgavel

I made an assumption, an incorrect one, as it turns out. In my original attempt, I tried to launch a container, saw the error in the dockerd log, googled the error to come here and saw a solution. However, since the error still appeared after the updates, I assumed that I had not solved the issue with container launch and did not actually attempt it. I have been able to successfully launch a container now.

@msabansal
Contributor

Thanks for the confirmation.

@fabiendibot
Author

fabiendibot commented Oct 6, 2016

@msabansal Sorry for the delay. I still have the error during pipeline creation, and I get new errors after a docker run.

Client error:

PS C:\Users\Administrateur> .\docker.exe run -p 80:80 microsoft/iis
C:\Users\Administrateur\docker.exe: Error response from daemon: failed to create endpoint grave_bartik on network nat: HNS failed with error : L’objet existe déjà..

Daemon error (the French message "L’objet existe déjà" means "The object already exists"):

time="2016-10-06T10:34:59.759777600+02:00" level=error msg="Handler for POST /v1.25/containers/8a85a5cfd8a50079da7d4d61d9fcea244fd9b0661077c87586f5696279f52952/start returned error: failed to create endpoint grave_bartik on network nat: HNS failed with error : L’objet existe déjà. "

@msabansal
Contributor

@fabiendibot This is probably because another container already has a port mapping that uses port 80. Can you please check that? Try running without the port mapping, or with a different one, and see if you still get the error.

@happysysadm

happysysadm commented Oct 18, 2016

Hi,
I am encountering the very same problem as Fabien. I have installed patch 9D and rebooted, but the Docker service doesn't start.
For info, I am using Windows Server 2016 RTM with Hyper-V, and I have tried three Docker versions: 1.12, 1.12wsbeta and 1.13.
Any idea?

Here's the error:

dockerd
time="2016-10-18T02:21:35.312426300+02:00" level=info msg="Windows default isolation mode: process"
time="2016-10-18T02:21:35.339425700+02:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2016-10-18T02:21:35.339425700+02:00" level=info msg="Loading containers: start."
time="2016-10-18T02:21:35.393424100+02:00" level=error msg="Resolver Setup/Start failed for container none, \"json: cannot unmarshal array into Go value of type hcsshim.HNSNetwork\""
Error starting daemon: Error initializing network controller: Error creating default network: HNS failed with error : Unspecified error

@fabiendibot
Author

fabiendibot commented Oct 18, 2016

@msabansal Sorry for the delay. In fact I launched this container twice. The first attempt resulted in a timeout, and the second gave me this error.
I'll try to fire up an Azure VM to test outside my VMware lab.

Edit: OK, it seems to work on a freshly installed Windows 2016 on Azure after patching it with the updates, using this link: https://msdn.microsoft.com/en-us/virtualization/windowscontainers/quick_start/quick_start_windows_server

OK, it seems to work using this procedure. I'd like to know what is in the docker package, because using the *.exe from master.dockerproject.com and installing the container feature isn't enough :)

@happysysadm

@msabansal It looks like I have a different issue from the one opened here, since this is the first container I am starting on a vanilla 2016. At the same time, the message is exactly the same: "json: cannot unmarshal array into Go value of type hcsshim.HNSNetwork". So do you want me to open another case? Thanks

@richardgavel

Even though that message displays in log, containers will start OK,
assuming you have run Windows update.


@sasiezilmadhavan

I have had a Win 2016 GA server running in Azure since Oct 15th. Until yesterday I was able to bring up my application cluster (4 services with docker-compose) and ping the linked containers by their service name from other containers. I also ran Resolve-DnsName "servicename" within the container and it resolved successfully.

But now, when I run the same docker-compose.yml on the same server, the containers are created successfully, but I am unable to ping or resolve the container names from other containers. Please let me know if this behaviour is related to the recent patch (9D) you are talking about. What is the KB number of that Windows update?

Please let me know if this should go into a new issue!

docker-version : Server Version: 1.12.2-cs2-ws-beta
windows updates in server:
Hotfix(s): 3 Hotfix(s) Installed.
[01]: KB3176936
[02]: KB3192137
[03]: KB3194798

Thank you!

@happysysadm

@richardgavel no, containers don't start. Updates have been installed.
@sasiezilmadhavan it's KB3194798 (check http://www.happysysadm.com/2016/10/first-steps-with-microsoft-containers_18.html at the end of post)

@sasiezilmadhavan

thanks @happysysadm

The 9D update is installed. I just updated the Docker engine to v1.13 dev, and the problem is still there. All of a sudden, DNS resolution stopped working in the containers. To confirm, I created a new 2016 GA server in Azure, and DNS resolution does not work there either.

@msabansal
Contributor

msabansal commented Oct 19, 2016

@sasiezilmadhavan @happysysadm Can you please help with the error you are facing with name resolution? Does using the service name with a trailing dot (servicename.) work? The trailing . is important. Also please share the IP configuration of one of the containers (ipconfig /all).

@msabansal
Contributor

@happysysadm What is the error for container startup? Can you please post what docker run shows?

@sasiezilmadhavan

sasiezilmadhavan commented Oct 19, 2016

@msabansal, no, using the service name doesn't work; I am unable to ping it. Resolve-DnsName doesn't resolve it either.

ipconfig/all from the container :
Windows IP Configuration
Host Name . . . . . . . . . . . . : 4e58e32aa5c2
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : vm-windkr-demo.h7.internal.cloudapp.net

Ethernet adapter vEthernet (Temp Nic Name) 3:

Connection-specific DNS Suffix . : vm-windkr-demo.h7.internal.cloudapp.net
Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #4
Physical Address. . . . . . . . . : 00-15-5D-D9-B8-3D
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::6403:312f:c670:d7d%27(Preferred)
IPv4 Address. . . . . . . . . . . : 172.28.63.75(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.240.0
Default Gateway . . . . . . . . . : 172.28.48.1
DNS Servers . . . . . . . . . . . : 172.28.48.1
168.63.129.16
NetBIOS over Tcpip. . . . . . . . : Disabled

the docker-compose.yml that worked before and now doesn't resolve DNS inside container is

services:
  kpiweb:
    image: 10.5.0.5:5000/kpiwebui:v1
    ports:
      - "8081:80"
    links:
      - kpicoreservice
  kpicoreservice:
    image: 10.5.0.5:5000/kpicoreservice:v1
networks:
  default:
    external:
      name: nat

FYI, the links section in docker inspect 4e58e32aa5c2 has the appropriate entries:
"Networks": {
    "nat": {
        "IPAMConfig": null,
        "Links": [
            "default_kpicoreservice_1:default_kpicoreservice_1",
            "default_kpicoreservice_1:kpicoreservice",
            "default_kpicoreservice_1:kpicoreservice_1"
        ],
        "Aliases": [
            "4e58e32aa5c2",
            "kpiweb"
        ],

@msabansal
Contributor

@sasiezilmadhavan I have some really high-priority work right now, sorry. I am looking at this whenever I get time.
Right now I am updating a machine to the latest patches. I will ping you soon if I have questions.

@msabansal
Contributor

@sasiezilmadhavan Is it possible that your containers are terminating? What does docker ps say? Is kpicoreservice active?

I tried with this compose file on a latest patched machine:

version: '2'
services:

    kpiweb:
        image: nanoserver
        ports:
            - "8081:80"
        links:
            - kpicoreservice
        tty: true

    kpicoreservice:
        image: nanoserver
        tty: true

networks:
    default:
        external:
            name: nat

PS C:\Tools\ctest3> docker-compose up -d
Recreating ctest3_kpicoreservice_1
Recreating ctest3_kpiweb_1
PS C:\Tools\ctest3> docker exec ctest3_kpiweb_1 ping kpicoreservice

Pinging kpicoreservice [172.24.10.10] with 32 bytes of data:
Reply from 172.24.10.10: bytes=32 time<1ms TTL=128
Reply from 172.24.10.10: bytes=32 time<1ms TTL=128
Reply from 172.24.10.10: bytes=32 time<1ms TTL=128

@sasiezilmadhavan

@msabansal No, the containers are not terminating. All the containers, including kpicoreservice, are running. I am able to ping kpicoreservice by its IP.

I also tried your compose file and am still unable to resolve the names.

version: '2'
services:

    kpiweb:
        image: nanoserver
        ports:
            - "8081:80"
        links:
            - kpicoreservice
        tty: true

    kpicoreservice:
        image: nanoserver
        tty: true

PS D:\> docker ps
CONTAINER ID        IMAGE                  COMMAND                    CREATED             STATUS              PORTS                  NAMES
c0ef06cfdc8e        microsoft/nanoserver   "c:\\windows\\system..."   4 minutes ago       Up 4 minutes        0.0.0.0:8081->80/tcp   default_kpiweb_1
a5aa5d9036a6        microsoft/nanoserver   "c:\\windows\\system..."   4 minutes ago       Up 4 minutes                               default_kpicoreservice_1
PS D:\> docker exec c0e ping kpicoreservice
Ping request could not find host kpicoreservice. Please check the name and try again.
PS D:\>

@msabansal
Contributor

@sasiezilmadhavan Can you please post the output for following:

$c1=docker run -id --name=c1 nanoserver cmd
$c2=docker run -id --name=c2 nanoserver cmd
docker exec c1 ping c2.
docker exec c1 ipconfig /all

@sasiezilmadhavan

@msabansal, it's working now. Your script lines above, as well as the previously used compose file, are working now. I can ping one container from another by its name. Strange!! Note that no change was made to the system. This machine is in Azure; I wonder if that is somehow causing these inconsistencies.

Thanks a lot for your help

@msabansal
Contributor

@sasiezilmadhavan This may be related to #27499. A workaround is documented. We are planning to fix this so that you don't need the workaround.

@thaJeztah
Member

should we close this?

@happysysadm

Actually I lost the whole environment I was running this upon: my OS ssd disk got broken (kind of voltage issue which makes it whistle, you should hear that!) and so I can't repro the problem for now...

At the same time I don't have this error on my azure VMs. So close it and I'll re-open in case.

@thaJeztah
Member

ouch! Good luck with that @happysysadm

Let me go ahead and close, but happy to reopen if anyone is running into this again

@clarity99

hm, I seem to have the same error when starting docker service: Resolver Setup/Start failed for container none, "json: cannot unmarshal array into Go value of type hcsshim.HNSNetwork"

Server 2016 with all patches installed, hostnetsvc.dll version 10.0.14393.351

This server is an upgrade from 2012r2 server and is running as a VM in Hyper-V on Windows 10.

docker version:
Version: 1.12.2-cs2-ws-beta
API version: 1.25
Go version: go1.7.1
Git commit: 050b611

@msabansal
Contributor

@clarity99 That error is an incorrect log message that we need to clean up in the Docker daemon. It should not cause any issues.

@clarity99

@msabansal Hmm, well, dockerd is not starting. Here are the last lines from the debug output:

time="2016-11-01T12:34:48.890755600+01:00" level=error msg="Resolver Setup/Start failed for container none, \"json: cannot unmarshal array into Go value of type hcsshim.HNSNetwork\""
time="2016-11-01T12:34:48.901737200+01:00" level=debug msg="Allocating IPv4 pools for network nat (c9baf78a40c8ae87a03ef07942ca9a6c58d244823487bae7c81ba9fd22dcab0b)"
time="2016-11-01T12:34:48.901737200+01:00" level=debug msg="RequestPool(LocalDefault, , , map[], false)"
time="2016-11-01T12:34:48.902737200+01:00" level=debug msg="RequestAddress(0.0.0.0/0, <nil>, map[RequestAddressType:com.docker.network.gateway])"
time="2016-11-01T12:34:48.903736800+01:00" level=debug msg="HNSNetwork Request ={\"Name\":\"nat\",\"Type\":\"nat\",\"Subnets\":[{\"AddressPrefix\":\"0.0.0.0/0\",\"GatewayAddress\":\"0.0.0.0\"}]} Address Space=[{0.0.0.0/0 0.0.0.0 []}]"
time="2016-11-01T12:34:50.429431400+01:00" level=debug msg="releasing IPv4 pools from network nat (c9baf78a40c8ae87a03ef07942ca9a6c58d244823487bae7c81ba9fd22dcab0b)"
time="2016-11-01T12:34:50.429431400+01:00" level=debug msg="ReleaseAddress(0.0.0.0/0, 0.0.0.0)"
time="2016-11-01T12:34:50.431454300+01:00" level=debug msg="ReleasePool(0.0.0.0/0)"
time="2016-11-01T12:34:50.431454300+01:00" level=debug msg="starting clean shutdown of all containers..."
Error starting daemon: Error initializing network controller: Error creating default network: HNS failed with error : Unspecified error

If you think this is another issue I can open a separate issue for it.

@msabansal
Contributor

@clarity99 Separate issue for sure. But might be something that is easily resolvable. How did you start facing this issue? Did you deploy a new build and start getting this error? Did you try anything to fix this issue?

@clarity99

@msabansal I've opened #27984 for this
