Reference documentation for FBOSS #61

Open
ggiamarchi opened this issue Nov 21, 2017 · 20 comments

Comments

@ggiamarchi

Hi there, is there reference documentation for FBOSS anywhere? I need to test FBOSS on Wedge 16x (40Gbps) switches. Primarily, I'd love to find reference documentation for the fboss_wedge_agent configuration JSON input file.

@capveg

capveg commented Nov 22, 2017

Hi - two points:

  1. We did a bunch of work with ONL to make sure that a lot of the baseline setup work was done for you, so check out http://opennetlinux.org/wedge .

  2. A lot of the stuff is poorly documented and honestly more complicated than it needs to be, so a bunch of us are trying to both document things better and clean them up as we go. Ultimately this is a work in progress and I'll leave this issue open as we improve.

Do check out what comes with the wedge ONL build and let me know if there's more specific things I can describe.

The JSON configuration file is particularly not well documented: it contains some implicit dependencies and some of the fields are now vestigial. Was there a specific change you were trying to make? Perhaps I can help you with that while the larger refactoring/documentation is happening.

FYI: @aeckert because you had a recent diff to clean up some of the config terms.

@mimizone

Following on from @ggiamarchi's question, I also need a bit more documentation on the JSON file. One particular need is to be able to configure the 40G ports of the Wedge 40 as either 40G or 4x10G. It seems that out of the box everything is 4x10G. Strangely enough I see only 63 ports instead of the expected 64, but that's another debate...

@capveg

capveg commented Jan 29, 2018

Hi both.

This is definitely a big issue with the current config. There is no good documentation for it as of this moment but there is an effort underway to clean up the config as well as better document it.

For context, we use the config internally as the output of another script, so there hasn't been any effort made to make it easy for humans to consume :-( I've intentionally left this issue open so that I can post documentation updates here as things get better.

That said, I can try to answer specific questions if you have them. Right now, you need to specify your break out cables (e.g., 4x10g or 1x40g) in the configuration at start time.

@capveg

capveg commented Jan 29, 2018

Also, FYI @sonoble

@mimizone

mimizone commented Jan 30, 2018

thanks @capveg for keeping the thread alive.
I am ok so far editing the json file "manually".

I would need more details to actually understand what you mean by configuring the breakout cables at start time. I've tried unsuccessfully to set the speed to 40000 as I found in the thrift code switch_config.thrift.
I've seen there is a concept of aggregate_ports in the code. Trying to understand code I don't understand (thrift and cpp...), I wonder if I should add a section a bit like the following in my config.json file. (the following doesn't work)

  "aggregate_port": [
    {
      "key": 1,
      "name": "port1",
      "description": "description 1",
      "memberPorts": [
        {
          "memberPortID": 53,
          "priority": 1,
          "rate": 1,
          "activity": 1
        },
        {
          "memberPortID": 54,
          "priority": 1,
          "rate": 1,
          "activity": 1
        },
        {
          "memberPortID": 55,
          "priority": 1,
          "rate": 1,
          "activity": 1
        },
        {
          "memberPortID": 56,
          "priority": 1,
          "rate": 1,
          "activity": 1
        }
      ]
    }
  ],

And btw, do you also know why list_ports returns only 63 ports on a wedge40? Is that a bug?
When I enable port 64 with fboss_route.py, it doesn't complain but still the port is not listed when calling list_ports afterwards.

@capveg

capveg commented Aug 29, 2018

Sorry - missed this comment.

Aggregate ports are different (I know.. it's confusing.. but that's networking) than breakout cables.

So, are you trying to get one 40G port to show up as 1x40G or 4x10G?

FYI: additional documentation and motivation can be found in our new Sigcomm paper - https://dl.acm.org/authorize?N666958

@mimizone

yes.
trying to have the QSFP ports behave as either 4x10G (to the compute nodes) or 1x40G (to compute or spines)

@capveg

capveg commented Aug 30, 2018

So, honestly right now - this is a bit of a mess :-(

A few things that make this complicated.

  1. There is a Broadcom config, typically named config.bcm, which configures the chip for each port configuration, or even dynamically ("flexports"). This config needs to be in sync with the agent config, as the agent config doesn't actually control the mapping but must be aware of it.

  2. Depending on the version of opennsl you're using, the config.bcm is an explicit file (e.g., with newer versions) or implicitly built into the binary (with older versions, e.g., 8e0b499f02dcef751a3703c9a18600901374b28a - which fboss uses by default). If it's implicit in the binary, you have to change it differently.

  3. Every four logical ports can combine to be one 40G port, e.g., port 1 can either be a 10G port (with ports 2-4 also being 10G), or ports 2-4 can effectively "go away" and then port 1 becomes 40G. The ports that can do this are fixed, e.g., only ports where the number (mod 4) = 1 can be combined into a 40G port.
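
The grouping rule in point 3 can be sketched in a few lines of Python. This only illustrates the arithmetic, assuming logical ports are numbered from 1 and grouped four at a time as described; the function names are hypothetical and not part of FBOSS.

```python
from typing import List

LANES_PER_GROUP = 4  # four logical ports share one 40G-capable group


def group_leader(port: int) -> int:
    """Return the first logical port of the group that `port` belongs to."""
    if port < 1:
        raise ValueError("logical ports are assumed to be numbered from 1")
    return port - (port - 1) % LANES_PER_GROUP


def can_be_40g(port: int) -> bool:
    """Only ports whose number (mod 4) equals 1 can become the 40G port."""
    return port % LANES_PER_GROUP == 1


def absorbed_ports(leader: int) -> List[int]:
    """Ports that effectively "go away" when `leader` runs at 40G."""
    if not can_be_40g(leader):
        raise ValueError(f"port {leader} cannot be combined into a 40G port")
    return [leader + i for i in range(1, LANES_PER_GROUP)]
```

Under this numbering, ports 53-56 from the aggregate_port attempt earlier in the thread form one group: `group_leader(56)` is 53, and `absorbed_ports(53)` lists 54, 55 and 56 as the ports that would disappear in 40G mode.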

@mimizone : to your question about "why only 63 ports"; there are 16 front panel ports and each one can be broken out 4 ways, so we get 64. I'm assuming there's a zero indexed port so your '63' is really '64', but hopefully that helps. Now, a separate question is : if the chip is capable of 32x40G ports, why are there only 16 ports used on the front panel? Answer: the initial wedge40 was designed to replace a 16 port x 40G switch and this was (I'm told) the easiest way to create a drop in replacement.

@bluecmd

bluecmd commented Oct 7, 2018

@mimizone If you need help getting what @capveg said above running, have a look at https://github.com/dhtech/fboss/pull/4/files where I managed to pull in OpenNSL 3.5.0.1, and see this for how to run it.
I have not yet hit the opennsl_pkt_alloc crash mentioned in getdeps.sh but I guess I'll cross that bridge when I come to it.

Personally, what we're looking at is running a 1G SFP module in one of the ports using a QSFP+->SFP+ adapter, so that's my current goal. But the 40G is of course also interesting. I raised a question at https://github.com/Broadcom-Switch/OpenNSL/issues/37 and I've found some BCM configs lying around here and there for ideas, and also the official docs, which are somewhat useful. I have not succeeded in setting anything viable, however.

@cubic1271

@bluecmd thanks for the link (and the pointers), that's incredibly helpful.

@capveg is this the right issue to be reporting issues related to the packages / binaries for fboss and the Wedge 100 on ONL, or should those be reported elsewhere? Since they're related to the hardware that seems built for FBOSS, seems like it could go either way.

@pjd-nu

pjd-nu commented Nov 29, 2020

Hate to revive a dead conversation, but...
"implicitly built into the binary" ... "If it's implicit in the binary, you have to change it differently."
Any suggestions on what that "change it differently" is? I've got an old Wedge-16x that I'm trying to resurrect - should I just load a new image on it, or can I easily configure it to run the ports in 40g mode?
Thanks!

@robs-zeynet

@pjd-nu Thanks for reviving this, as a lot of the SDK internals have changed. In particular, Broadcom has open-sourced its SDK, so we've moved the fboss dependency from OpenNSL to depending directly on the open-sourced SDK: https://github.com/Broadcom-Network-Switching-Software/OpenBCM .

Once you follow the new build instructions, then it should be just a matter of setting up an agent.conf file that configures the ports as 40G (as opposed to 4x10G ports in a breakout).

The format of the agent.conf file is specified in the switch_config.thrift file here: https://github.com/facebook/fboss/blob/master/fboss/agent/switch_config.thrift and is in JSON. Here are some example configs but they have not been kept up to date and may not work perfectly out of the box.

https://github.com/facebook/fboss/tree/master/fboss/agent/configs

Hope this helps.
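
As a rough sketch of the agent-side edit described above, the snippet below rewrites port speeds in an already-loaded JSON config. The layout it assumes (a top-level "sw" object holding a "ports" list whose entries carry "logicalID" and "speed" fields, with speeds in Mbps so that 40000 means 40G) is an assumption based on switch_config.thrift and should be checked against the version you build; the helper name is hypothetical. It only touches the agent config, so the Broadcom-side config discussed earlier in the thread still has to agree with it.

```python
import json


def set_port_speed(config: dict, logical_ids, speed_mbps: int) -> int:
    """Set "speed" on the listed ports in place; return how many were changed.

    Assumes config["sw"]["ports"] is a list of dicts keyed by "logicalID".
    """
    wanted = set(logical_ids)
    changed = 0
    for port in config.get("sw", {}).get("ports", []):
        if port.get("logicalID") in wanted:
            port["speed"] = speed_mbps
            changed += 1
    return changed


# Tiny stand-in config; a real agent.conf has many more fields.
example = json.loads(
    '{"sw": {"ports": ['
    '{"logicalID": 1, "speed": 10000}, '
    '{"logicalID": 5, "speed": 10000}]}}'
)
set_port_speed(example, [1], 40000)
print(json.dumps(example, indent=2))
```

Whether the other three ports of a group must also be removed or disabled in the same file is not covered here; the example configs linked above are the place to check.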

@shri-khare
Contributor

One correction: FBOSS does not use OpenBCM. FBOSS uses OpenNSA: https://github.com/facebook/fboss/blob/master/build/fbcode_builder/manifests/OpenNSA#L5

@shri-khare
Contributor

@pjd-nu one way to approach this would be to follow the instructions here: https://github.com/facebook/fboss/blob/master/installer/centos-7-x86_64/README.md

and get to the point of being able to run the tests. Running the FBOSS agent is a superset of that, and the bulk of the work towards running the agent will already be done once it is possible to run the tests.

@pjd-nu

pjd-nu commented Dec 15, 2020

[sorry for the delay - there was a big paper submission deadline last week...]

So I just want to double check something in the directions shri-khare pointed me to - I should just install a normal copy of Centos 7 on the box as if it were some random generic server? No Open Network Linux or anything?

I was a bit surprised because all the stuff I'd found so far talked about ONL as sort of a fundamental part, and I guess subconsciously I was thinking it was something more than yet another small linux distribution. And I guess it's just a server with a few PCIe devices that we want to control...

@robs-zeynet

robs-zeynet commented Dec 15, 2020 via email

@pjd-nu

pjd-nu commented Dec 17, 2020

Yup, OSDI.

So now I'm in the middle of build hell, trying to figure out why folly can't find libatomic. I'll file a bug on installer/centos-7-x86_64/README.md - there are a couple of dependencies (including I think devtoolset-8-gcc, since folly requires gcc 5.0) that aren't mentioned.

@shri-khare
Contributor

Did you follow the README steps in that order?
1.4 install-tools installs the compiler and 1.5 sets the right path.

@pjd-nu

pjd-nu commented Dec 17, 2020

weird - not everything got installed the first time. (but it must have run through to the end, because devtoolset-8-libasan-devel was there...)

Now I'm getting:
gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)

Thanks!
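
For anyone scripting the same check, the version line can be parsed to confirm the compiler on the PATH is new enough. A minimal sketch, assuming the `gcc version X.Y.Z` banner format shown above and the gcc 5.0 minimum mentioned earlier in the thread; the function name is illustrative only.

```python
import re

MIN_GCC = (5, 0, 0)  # minimum taken from the comment above about folly


def gcc_version(banner: str) -> tuple:
    """Extract (major, minor, patch) from a 'gcc version X.Y[.Z]' line."""
    m = re.search(r"gcc version (\d+)\.(\d+)(?:\.(\d+))?", banner)
    if m is None:
        raise ValueError("no 'gcc version X.Y[.Z]' found in banner")
    # A missing patch component is treated as 0.
    return tuple(int(g) for g in m.groups(default="0"))


banner = "gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)"
new_enough = gcc_version(banner) >= MIN_GCC
```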

@pjd-nu

pjd-nu commented Dec 17, 2020

All set now. Thanks!

BTW, for some reason at the very beginning of the build script it consistently got a 403 trying to download openNSA from Broadcom, but it downloaded fine manually with curl and I copied it to the indicated name in the download directory.

arajeev-ARISTA pushed a commit to arajeev-ARISTA/fboss that referenced this issue Sep 26, 2023
Meru800bia: initial sensor_service and data_corral_service support

8 participants