Connections

A connection is how LAVA handles talking to a device, for example via an automated login session on a serial port on the device or within a virtual machine hosted by a device. Some devices can support multiple connections. LAVA can use more than one connection in a test job, but the first connection is particularly important - this is where initial control of booting will happen, and kernel messages will normally go here.

The most common type of connection that LAVA will use is a serial connection to the test device, but other connection methods are also supported (such as SSH and USB). For a connection method to work, it needs to be supported by software in LAVA, services within the software running on the device and typically lab infrastructure too, e.g. a serial console server.

As an example, many devices are capable of supporting SSH connections as long as:

  • the device can be configured to raise a usable network interface
  • the device is booted into a suitable software environment which will run an ssh server

USB connections for Android support can be implemented inside test shells using the LXC support.

Connections in device configuration

For the connections to a device to be made available for test usage, they need to be declared in the device configuration, e.g.:

deploy:
  methods:
    tftp
    ssh

boot:
  connections:
    - serial
    - ssh
  methods:
    qemu:
  prompts:
    - 'linaro-test'
    - 'root@debian:~#'

Connections in test jobs

Connections are created in a test job using the boot action; it must specify the connection method as a parameter:

- boot:
    method: qemu
    media: tmpfs
    connection: serial
    failure_retry: 2
    prompts:
      - 'root@debian:~#'

Note

Avoid defaults in dispatcher code - although serial is the traditional (and previously default) way of connecting to LAVA devices, it is not assumed and must be explicitly specified in the test job YAML.

The connection created here will be used later by the test action blocks defined in the test job, so they depend on the boot action to define the connection. Even where test jobs may not actually cause a test device to boot per se, LAVA needs the test job to include a boot action for this purpose.

For basic test jobs, this describes all the information that most test writers will need to understand. However, there are several more advanced connection options that may be useful, depending on the type of device and the tests required.

Connections and namespaces

Internally, LAVA uses a namespace structure to track dynamic data inside a test job. One of the pieces of data tracked in a namespace is the connection in use; this is how the connection created in the boot action block can be shared throughout the job in further actions.

In a job definition where multiple deploy, boot and test actions are specified, there must be a mechanism to describe how the actions are connected. This is the primary purpose of a namespace; it is the way to tie related actions together. This is important - consider how an overlay created during a deploy action will be consumed by a test action somewhere down the job definition, for example.

See also

Namespaces.

In a simple job, it is often not necessary to even consider this use of namespaces. If no other namespaces are defined explicitly in a test job, LAVA will create an implicit namespace called common. The default primary serial connection created in the boot action block will be stored in the common namespace, and the test action block(s) will use it from there.

If more than one connection is desired in a test job, then the way to control which connection is used in each action block is by explicitly defining appropriate namespaces. Here’s an example using serial and lxc connections with a Beaglebone device. Look for namespace: inside_lxc and namespace: testdevice in the action blocks:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
actions:
- deploy:
    namespace: inside_lxc
    timeout:
      minutes: 5
    to: lxc
    packages:
    - usbutils
    - procps
    - lsb-release
    - util-linux

- deploy:
    namespace: testdevice
    timeout:
      minutes: 3
    to: tftp
    kernel:
      url: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/4/vmlinuz
      type: zimage
    ramdisk:
      url: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/4/initramfs.cpio.gz
      compression: gz
    modules:
      url: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/4/modules.tar.gz
      compression: gz
    dtb:
      url: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/4/dtbs/am335x-bone.dtb

- boot:
    namespace: inside_lxc
    connection: lxc
    prompts:
    - 'root@(.*):/#'
    timeout:
      minutes: 5
    method: lxc

- boot:
    namespace: testdevice
    connection: serial
    method: u-boot
    commands: ramdisk
    prompts:
    # escape the brackets to ensure that the prompt does not match
    # kernel debug lines which may mention initramfs
    - '\(initramfs\)'
    timeout:
      minutes: 3

- test:
    namespace: testdevice
    connection: serial
    timeout:
      minutes: 5
    definitions:
    - repository: git://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests-bbb

Download or view the complete example: examples/test-jobs/namespace-connections-example1.yaml:

Note

It is not allowed to combine the common namespace with any others - it is special-cased. If you are defining more namespaces in your job, give them clear descriptive names that are unique within that job.

Multiple serial port support

Background

On common test devices where LAVA interacts with the device using a serial connection, there is typically a simple flow to the test job, running through the action blocks defined in that test job:

  • deploy block:
    • Set up job artifacts for the test job
  • boot block:
    • Start the device
    • Connect to the serial port
    • Control the boot loader to boot the desired artifacts
    • Read kernel boot messages - the serial connection is the kernel console
    • Wait for the specified prompt to appear, and log in if needed
  • test block:
    • Assume the device is waiting for input at a shell prompt
    • Send shell commands over the serial connection to run tests
    • Read back the results of those tests from that same serial connection
  • (implicit) job cleanup block:
    • Disconnect from the serial port
    • Shutdown the device

Even in this simple setup, there is a potential problem. The single serial connection is used for output both by the test shell and the kernel. Ideally, this should not cause any problems, but in the real world it is all too common for kernel log messages to be interleaved with test output. This could be something simple and expected (e.g. a test action bringing up a network interface will cause the driver for that interface to announce link state), or it could be something unexpected (e.g. the kernel reporting hardware failure from an unrelated driver, or a malformed network packet). As the kernel messages and the test shell output are interleaved on a character-by-character basis, it often is impossible to parse each. This can cause tests to fail in unexpected ways, or it can cause LAVA to fail to parse test output and so log incomplete or incorrect test results.

Over the years, the LAVA developers have worked to reduce the impact of this problem, but it is fundamentally impossible to solve it while kernel messages and test shell output are sharing the same connection. It is possible to change the logging level of the kernel to reduce the number or frequency of its messages, but often those messages are critical information when debugging a fault so this is not a good solution for everybody.

Isolating kernel messages from test output

The only reliable way to solve the problem with interleaving and corrupted test shell output and kernel messages is to isolate them physically, on different connections.

For some time, LAVA has had support for driving multiple connections to a device independently, using secondary connections. This can be a great solution for many test requirements, but doesn’t solve all problems for all people. It depends on being able to start extra connections via the network (typically SSH or telnet), so on test devices without functional networking support it cannot work. It also requires tests to be written using the MultiNode protocol, which can add considerable complexity to an otherwise simple test job.

As an alternative, many test devices include more than one hardware serial port. Most such devices will use just one of those serial ports for firmware, bootloader and kernel messages (a primary console). Linux will then start a getty process (a serial login program) on that primary console. If more serial ports are available, connecting those ports and configuring the test OS to spawn more getty processes is an easy way to get more connections. These extra connections are all independent from the primary, so it is safe to run test shell commands on these without interleaving test output with kernel message output.

LAVA supports using these multiple serial connections in a simple way, avoiding the need for MultiNode complexity.

See also

Using kernel messages in a test shell for a mitigation of the interleaving problem if multiple UART support is not available.

Changes needed in the test image

In common with the ssh method, the use of multiple serial connections involves some risks because the creation of the getty on the additional serial port(s) is managed by the kernel and rootfs of the test image. When using multiple serial connections, always test that the booted system raises the getty correctly and that the login process works before committing to using this method.

Adding extra serial ports to a test device

There is one hard dependency here: if you want to use multiple serial ports, your device must have them available! This can be a problem. Some embedded devices may have many UARTs available, but not all of them might be exposed for use. If you’re testing x86 devices, many computers may only have one serial port. This is not necessarily an insurmountable problem. Although the firmware, bootloader and kernel may only talk to the primary serial console for boot messages, once you have a Linux kernel up and running it’s normally easy to add extra ports by simply plugging in a USB serial adapter. Configure the test system to start a getty on /dev/ttyUSB0 and connect cables. Don’t get carried away, though - adding more than one USB serial adapter can lead to confusion on the device as to which connection is which! If you have PCI ports, PCI serial cards are also readily available.

Also consider the cost and complexity of extra cabling (and dispatcher-side connections) when adding more serial connections. You may benefit from daughter cards or mezzanine boards to expose multiple serial connections over one cable.

Configuring serial ports

To configure LAVA to connect to one or more serial ports of a device, create a list of connection_commands in the device dictionary. LAVA will use the command tagged with primary to open the connection early in test job startup (in the first boot action) , and will keep this connection open right until the end of the test job.

In earlier versions of LAVA, only a single connection command could be used:

{% extends 'beaglebone-black.jinja2' %}
{% set power_off_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command off --port 12' %}
{% set hard_reset_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command reboot --port 12' %}
{% set connection_command = 'telnet dispatcher01 7001' %}
{% set power_on_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command on --port 12' %}

This has worked fine when just using a single serial connection but is now deprecated to support working with more than one and other improvements in connection handling. The connection_list is a more flexible way to configure one or more serial ports:

{% extends 'beaglebone-black.jinja2' %}
{% set power_off_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command off --port 12' %}
{% set hard_reset_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command reboot --port 12' %}
{% set power_on_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command on --port 12' %}
{% set connection_list = ['uart0'] %}
{% set connection_commands = {'uart0': 'telnet dispatcher01 7001'} %}
{% set connection_tags = {'uart0': ['primary', 'telnet']} %}

primary denotes the serial connection which will be started automatically with each test job.

Other tags describe how LAVA should close the connection at the end of the test job, possible values are telnet, ssh. If your connection command does not use telnet or ssh, the connection will be forcibly closed using kill -9.

Or with two serial connections:

{% extends 'beaglebone-black.jinja2' %}
{% set power_off_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command off --port 12' %}
{% set hard_reset_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command reboot --port 12' %}
{% set power_on_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command on --port 12' %}
{% set connection_list = ['uart0', 'uart1'] %}
{% set connection_commands = {'uart0': 'telnet dispatcher01 7001', 'uart1': 'telnet dispatcher01 7002'} %}
{% set connection_tags = {'uart0': ['primary', 'telnet'], 'uart1': ['telnet']} %}

This defines two serial ports (labeled uart0 and uart), then describes how to connect to each one. Finally, it sets a tag of primary on uart0 - this tells LAVA that uart0 is the primary connection, the one used for boot and kernel messages. This makes uart0 exactly equivalent to the single serial connection defined in the previous example. In future, more connection tags might be added with extra meaning.

LAVA now (as of writing in October 2017) supports either of these methods to configure serial ports, but at some point in the future the older connection_command method may be deprecated. The two methods may not be mixed in the same device dictionary - either define a single ``connection_command`` or use the new list of connections.

Using multiple serial ports

For a typical device with multiple serial ports, we can extend our simple job flow above (changes in bold):

  • deploy block:
    • Set up job artifacts for the test job
  • first boot block:
    • Start the device
    • Connect to the primary serial port, creating an explicit namespace for later actions to use it
    • Control the boot loader to boot the desired artifacts
    • Read kernel boot messages - the primary serial connection is the kernel console
    • Wait for the specified prompt to appear, and log in if needed
  • (optional) first test block:
    • Tests to run using the primary serial port, via the namespace created for it. (This is likely to be empty, in which case you can just leave this test block out altogether)
  • second boot block:
    • Start a new connection to a non-primary serial port, and create a new namespace for it
    • Wait for the specified prompt to appear on the non-primary serial port, and log in if needed
  • second test block:
    • Assume the device is waiting for input at a shell prompt
    • Send shell commands over the non-primary serial connection to run tests
    • read back the results of those tests on the non-primary serial connection
    • in the background, listen to the existing serial port connection for kernel messages (feedback)
  • (implicit) job cleanup:
    • disconnect from the serial ports
    • shutdown the device

Note

To use the extra serial port here, the operating system image on the test device will also need to be configured to start a getty on the extra serial port. This can be done in the image as it is prepared, or alternatively it could be started by logging in using the test action on the primary console. That latter approach might seem to be the obvious path, but again beware of serial corruption causing problems. The period during and immediately after boot is when kernel messages are most likely to be intermingled with attempts to control a device on the primary console.

Example job 1: Simple beaglebone-black job with a second serial port

Here’s a simple test job on a common board, a Beaglebone Black. The board only exposes one serial port for easy use, so we’ve added a USB serial adapter as a second port.

Download or view the complete example: examples/test-jobs/bbb-2serial.yaml:

  1. Deploy and boot the device
  2. Create the connection to the second serial port
  3. Tell the test shell to use the new connection

Deploy and boot the device

This is using a simple Debian Stretch nfs rootfs and initramfs. The rootfs is easy to generate using standard tools; the only change in there is to define the second serial console on /dev/ttyUSB0. Remember that a similar change will likely be needed in any test image you want to test this way. Note the explicit namespace bbb defined in the deploy action, and created in the boot action:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
- deploy:
    namespace: bbb
    timeout:
      minutes: 4
    to: tftp
    kernel:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/vmlinuz-4.9.0-4-armmp
      type: zimage
    ramdisk:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/initrd.img-4.9.0-4-armmp
      compression: gz
    modules:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/modules.tar.gz
      compression: gz
    nfsrootfs:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/stretch-armhf-nfs-extra-getty.tar.gz
      compression: gz
    dtb:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/dtbs/am335x-boneblack.dtb

- boot:
    namespace: bbb
    method: u-boot
    commands: nfs
    prompts:
    - 'login:'
    timeout:
      minutes: 2

# startup the extra UART
- boot:

A boot action would typically include an auto_login section, but in this test we’re not going to be doing any testing using the primary serial connection. Hence, we just add a prompts section looking for login: to check when this boot is complete.

Create the connection to the second serial port

Next, we use a second boot action block to create a new connection in a new namespace called isolation. We’re using the new_connection method, using the uart1 connection defined in the device dictionary. As we’re going to be using this new connection for testing, we now run auto_login here.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
    # support auto-login
    namespace: isolation
    method: new_connection
    connection: uart1
    prompts:
    - 'root@stretch:'
    auto_login:
      login_prompt: 'login:'
      username: root
    timeout:
      # shorter timeout as the device should exist.
      minutes: 2

- test:
    # Use the bbb namespace - we depend on that for the deployed data

Tell the test shell to use the new connection

Finally, we start our tests.

  • The namespace of the test action matches the bbb namespace used in the deploy and boot actions of the device. This ensures that the test shell has access to the dynamic data created by the correct deployment action to be able to know what rootfs is in use, and where to find the test shell files on that rootfs.
  • The test action also has connection-namespace defined to isolation - this tells it to use the connection tracked in the isolation namespace, rather than the default connection in the bbb namespace. This is the key part of the isolation, running tests on the second serial port.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
    namespace: bbb
    # Run the test shell using the "isolation" connection
    connection-namespace: isolation
    timeout:
      minutes: 5
    definitions:
    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests

Download or view the complete example: examples/test-jobs/bbb-2serial.yaml:

Example job 2: A more complicated setup including LXC

Here’s a more complicated example job, including the use of LXC for deployment. This was the first real-world use case for the multiple serial port support, running Linux kernel functional testing on a HiKey 6220. The HiKey 6220 hardware includes an extra serial port, but deploying to the board is more involved - we use fastboot in an LXC container, which means we have another namespace to track in the test job. Let’s unpick the test job.

  1. Define the container
  2. Deploy and boot the container
  3. Use the container to deploy and boot the device
  4. Create the connection to the second serial port
  5. Tell the test shell to use the new connection

Download or view the complete example: examples/test-jobs/multiple-serial-ports-lxc.yaml:

Define the container

The distribution and suite of the container, as well as the name, are defined using the lava-lxc protocol block.

1
2
3
4
5
protocols:
  lava-lxc:
    name: lxc-hikey-oe
    template: debian
    distribution: debian

Deploy and boot the container

The deploy and boot step for the LXC set the timeouts and prompts for this container. Note the name of the namespace used in these actions.

The connection to the LXC is defined within the tlxc namespace and the connection is created in the boot action. In the case of LXC support, this is done by running lxc-attach on the dispatcher instead of a connection command from the device configuration.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
actions:
- deploy:
    namespace: tlxc
    timeout:
      minutes: 15
    to: lxc
    packages:
    - fastboot

- boot:
    namespace: tlxc
    prompts:
    - 'root@(.*):/#'
    - 'hikey:/'
    timeout:
      minutes: 5
    method: lxc

Use the container to deploy and boot the device

Next, the dispatcher runs commands inside that LXC container to download and deploy an OE image to a HiKey 6220 board, then boot it. This example uses the hikey-oe namespace. The details of how the HiKey 6220 is deployed and booted are not relevant to how the multiple serial support operates, but do take note of the namespace used to boot the device. The boot operation is responsible for creating the connection (in this case by running a connection command specified in the device configuration).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
- deploy:
    namespace: hikey-oe
    timeout:
      minutes: 30
    to: fastboot
    connection: lxc
    images:
      ptable:
        url: http://images.validation.linaro.org/snapshots.linaro.org/openembedded/lkft/morty/hikey/rpb/4.9/83/bootloader/ptable-linux-8g.img
        reboot: hard-reset
      boot:
        url: http://images.validation.linaro.org/snapshots.linaro.org/openembedded/lkft/morty/hikey/rpb/linux-mainline/588/boot-0.0+AUTOINC+06e4def583-fb1158a365-r0-hikey-20180128213254-588.uefi.img
        reboot: hard-reset
      system:
        url: http://images.validation.linaro.org/snapshots.linaro.org/openembedded/lkft/morty/hikey/rpb/linux-mainline/588/rpb-console-image-hikey-20180128213254-588.rootfs.img.gz
        compression: gz
        apply-overlay: true
        # ensure that this job raises a network interface with DHCP before relying on internet access
    protocols:
      lava-lxc:
      - action: fastboot-deploy
        request: pre-power-command
        timeout:
          minutes: 2

- boot:
    namespace: hikey-oe
    # terminate monitoring of this connection at login prompt
    prompts:
    - 'login:'
    timeout:
      minutes: 5
    method: grub
    commands: installed
    protocols:
      lava-lxc:
      - action: grub-sequence-action
        request: pre-os-command
        timeout:
          minutes: 2

# boot uart0 block

Create the connection to the second serial port

As with making the connection to the LXC and making the connection to the primary UART of the HiKey 6220 DUT, making the connection to the second or additional serial ports involves a boot action. The action must create a new namespace to store the connection to the second serial port. (Any subsequent connections to other serial ports would similarly require a unique namespace for each connection.) This namespace will be used later to isolate a test shell from the primary connection used for the deployment and boot actions of the device.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
    # make the connection to the second uart for use in the test shell
    # support auto-login
    namespace: isolation
    connection: uart0
    prompts:
    - 'root@hikey:~#'
    method: new_connection
    auto_login:
      login_prompt: 'login:'
      username: root
    timeout:
      # shorter timeout as the device should exist.
      minutes: 2

# test isolation block

Tell the test shell to use the new connection

This is where it all comes together.

  • The namespace of the test shell matches the namespace of the deployment and boot actions of the device. This ensures that the test shell has access to the dynamic data created by the correct deployment action to be able to know what rootfs is in use and where to find the test shell files on that rootfs.

    In this example, the test shell needs a namespace of hikey-oe

  • The connection-namespace of the same test shell matches the namespace of the boot action of the second serial port. This ensures that the test shell communicates with the DUT over the isolated connection instead of the connection which is stored in the main namespace.

    In this example, the test shell needs a connection-namespace of isolation

1
2
3
4
5
6
7
8
9
    namespace: hikey-oe
    connection-namespace: isolation
    timeout:
      minutes: 5
    definitions:
    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests-basic-oe

Download or view the complete example: examples/test-jobs/multiple-serial-ports-lxc.yaml:

Limitations with multiple serial ports

The method described here is reasonably simple to configure and use, but it is does have limitations. While LAVA will read from multiple connections (almost) in parallel this way, it will only write to one of them at once. The others will all be read-only. This may well suit your needs, but if not then there is another option - using MultiNode with secondary connections. This is more powerful, but much more complex to describe in a test job.

Secondary Connection

Secondary Connections are a way to have two simultaneous connections to the same physical device, equivalent to two logins. Each connection needs to be supported by a distinct TestJob, so a MultiNode group needs to be created so that the output of each connection can be viewed as the output of a single TestJob, just as if you had two terminals. The second connection does not have to use the same connection method as the current connection and many devices can only support secondary connections over a network interface, for example SSH or telnet.

A Secondary Connection has a deploy step and the device is already providing output over the primary connection (typically serial) before the secondary connection is established. This is closer to having the machine on your desk. The TestJob supplies the kernel and rootfs or image to boot the device and can optionally use the secondary connection to push other files to the device (for example, an ssh secondary connection would use scp).

A Secondary Connection can have control over the daemon via the deployment using the primary connection. The client connection is still made by the dispatcher.

Secondary Connections require authorization to be configured, so the deployment must specify the authorization method. This allows the overlay for this deployment to contain a token (e.g. the ssh public key) which will allow the connection to be made. The token will be added to the overlay tarball alongside the directories containing the test definitions.

- deploy:
    to: tmpfs
    authorize: ssh
    kernel:
      url: http://....
    nfsrootfs:
      url: http://...
    dtb:
      url: http://....

Certain deployment Actions (like SSH) will also copy the token to a particular location (e.g. /root/.ssh/authorized_keys) but test writers can also add a run step which enables authorization for a different user, if the test requires this.

Note

The /root/.ssh/authorized_keys file will be replaced when the LAVA overlay is unpacked, if it exists in the test image already. This is a security precaution (so that test images can be shared easily without allowing unexpected access). Hacking sessions append to this file after the overlay has been unpacked.

Deployment can also include delivering the LAVA overlay files, including the LAVA test shell support scripts and the test definitions specified by the submitter, to the host device to be executed over the secondary connection. So for SSH, the secondary connection typically has a test action defined and uses scp to put the overlay into place before connecting using ssh and executing the tests. The creation of the overlay is part of the deployment, the delivery of the overlay is part of the boot process of the secondary connection, i.e. deploy is passive, boot is active. To support this, use the MultiNode protocol on the host to declare the IP address of the host and communicate that to the guest as part of the guest deployment. Then the guest uses the data to copy the files and make the connection as part of the boot action. See Writing jobs using Secondary Connections.

Note

A failure to connect to a Primary remote connection would be an InfrastructureError Exception. A failure to connect to a Secondary Connection is a TestError Exception.

Considerations with a secondary connection

  1. The number of host devices
  2. Which secondary connections connect to which host device

In LAVA, this is handled using the MultiNode role using the following rules:

  1. All connections declare a host_role which is the role label for the host device for that connection. e.g. if the connection has a declared role of client and declares a host_role of host, then every client connection will be expected to be able to connect to the host device.
  2. The TestJob for each connection with the same role will be started on a single dispatcher which is local to the device with the role matching the specified host_role.
  3. There is no guarantee that a connection will be possible to any other device in the MultiNode group other than devices assigned to a role which matches the host_role requirement of the connection.

Note

The count of any role acting as the host_role must be set to 1. Multiple roles can be defined, each set as a host_role by at least one of the other roles, if more than one device in the MultiNode group needs to host secondary connections in the one submission. Multiple connections can be made to devices of any one host_role.

This allows for devices to be hosted in private networks where only a local dispatcher can access the device, without requiring that all devices are accessible (as root) from all dispatchers as that would require all devices to be publicly accessible.

Secondary connections are affected by Security issues due to the requirements of automation.

The device providing a Secondary Connection is running a TestJob and the deployment will be erased when the job completes.

Note

Avoid confusing host_role with expect_role. host_role is used by the scheduler to ensure that the job assignment operates correctly and does not affect the dispatcher or delayed start support. The two values may often have the same value with secondary connections but do not mean the same thing.

Note

Avoid using constrained resources (like dpkg or apt) from multiple tests (unless you take care with synchronization calls to ensure that each operation happens independently). Check through the test definitions for installation steps or direct calls to apt and change the test definitions.

Connections and hacking sessions

A hacking session using a Secondary Connection is the only situation where the client is configurable by the user and the daemon can be controlled by the test image. It is possible to adjust the hacking session test definitions to use different commands and options - as long as both daemon and client use compatible options. As such, a hacking session user retains security over their private keys at the cost of the loss of automation.

Hacking sessions can be used with secondary connections, depending on the use case.

Using secondary connections with VM groups

One example of the use of a secondary connection is to launch a VM on a device already running a test image. This allows the test writer to control both the kernel on the bare metal and the kernel in the VM as well as having a connection on the host machine and the guest virtual machine.

The implementation of VMGroups created a role for a delayed start MultiNode job. This would allow one job to operate over serial, publish the IP address, start an SSH server and signal the second job that a connection is ready to be established. This may be useful for situations where a debugging shell needs to be opened around a virtualization boundary.

There is an option for downloading or preparing the guest VM image on the host device within a test shell, prior to the VM delayed start. Alternatively, a deploy stage can be used which would copy a downloaded image from the dispatcher to the host device.

Each connection is a different job in a MultiNode group so that the output of each connection is tracked separately and can be monitored separately.

Sequence

  1. The host device is deployed with a test image and booted.
  2. LAVA then manages the download of the files necessary to create the secondary connection.
    • e.g. for QEMU, this would be a bootable image file
  3. LAVA also creates a suitable overlay containing the test definitions to be run inside the virtual machine.
  4. The test image must start whatever servers are required to provide the secondary connections, e.g. ssh. It does not matter whether this is done using install steps in the test definition or pre-existing packages in the test image or manual setup. The server must be configured to allow the (insecure) LAVA automation SSH private key to log in as authorized - this key is available in the /usr/lib/python3/dist-packages/lava_dispatcher/device/dynamic_vm_keys directory when lava-dispatcher is installed or in the lava-dispatcher git tree.
  5. The test image on the host device starts a test definition over the existing (typically serial) connection. At this point, the image file and overlay for the guest VM are available on the host for the host device test definition to inspect, although only the image file should actually be modified.
  6. The test definition includes a signal to the LAVA MultiNode API which allows the VM to start. The signal includes an identifier for which VM to start, if there is more than one.
  7. The second job in the MultiNode group waits until the signal is received from the coordinator. Upon receipt of the signal, the lava dispatch process running the second job will initiate the secondary connection to the host device, e.g. over SSH, using the specified private key. The connection is used to run a set of commands in the test image running on the host device. It is a TestError if any of these commands fail. The last of these commands must hold the connection open for as long as the test writer needs to execute the task inside the VM. Once those tasks are complete, the test definition running in the test image on the host device signals that the VM has completed.

The test writer is given full control over the commands issued inside the test image on the host device, including those commands which are responsible for launching the VM. The test writer is also responsible for making the overlay available inside the VM. This could be by passing arguments to the commands to mount the overlay alongside the VM or by unpacking the overlay inside the VM image before calling QEMU. If set in the job definition, the test writer can ask LAVA to unpack the overlay inside the image file for the VM and this will be done on the host device before the host device boots the test image - however, this will require an extra boot of the host device, e.g. using the dynamic master support.

Basic use cases

Prebuilt files can be downloaded, kernel, ramdisk, dtb, rootfs or complete image. These will be downloaded to the host device and the paths to these files substituted into the commands issued to start the VM, in the same way as with bootloader like u-boot. This provides support for tests within the VM using standard, packaged tools. To simplify these tests further, it is recommended to use NFS for the root filesystem of the host device boot - it leads to a quicker deployment as the files for the VM can be downloaded directly to the NFS share by the dispatcher. Deployments of the host device system to secondary media, e.g. SATA, require additional steps and the job will take longer to get to a point where the VM can be started.

The final launch of the VM will occur using a shell script (which will then be preserved in the results alongside the overlay), containing the parsed commands.

Advanced use cases

It is possible to use a test shell to build files to be used when launching the VM. This allows for a test shell to operate on the host device, building, downloading or compiling whatever files are necessary for the operation of the VM, directly controlled by the test shell.

To avoid confusion and duplication, LAVA does not support downloading some files via the dispatcher and some via the test shell. If there are files needed for the test job which are not to be built or generated within the test shell, the test shell will need to use wget or curl or some other tool present in the test image to obtain the files. This also means that LAVA is not able to verify that such URLs are correct during the validation of the job, so test writers need to be aware that LAVA will not be able to fail a job early if the URL is incorrect as would happen in the basic use case.

Any overlay containing the test definitions and LAVA test scripts which are to be executed inside the VM after the VM has booted still needs to be downloaded from the dispatcher. The URL of this overlay (a single tarball containing all files in a self-contained directory) will be injected into the test shell files on the host device, in a similar way to how the MultiNode API provides dynamic data from other devices in the group.

The test writer is responsible for extracting this tarball so that it is present or is bind mounted into the root directory of the VM so that the scripts can be launched immediately after login.

The test shell needs to create the final shell script, just as the basic use case does. This allows the dispatcher running the VM to connect to the host device and use a common interface to launch the VM in each use case.

LAVA initiates and controls the connection to the VM, using this script, so that all output is tracked in the MultiNode job assigned to the VM.

Sample job definition for the VM job

# second half of a new-style VM group job
# each connection is a different job
# even if only one physical device is actually powered up.
device_type: kvm-arm
job_name: wandboard-qemu
timeouts:
  job:
    minutes: 15
  action:
    minutes: 5
priority: medium
target_group: asd243fdgdfhgf-45645hgf
group_size: 2
parameters:
  # the test definition on the host device manages how
  # the overlay is applied to the VM image.
  overlay: manual  # use automatic for LAVA to do the overlay
# An ID appended to the signal to start this VM to distinguish
# it from any other VMs which may start later or when this one
# completes.
vm_id: gdb_session

actions:

 - boot:
    # as kvm-arm, this happens in a test image via
    # the other half of this MultiNode job
    timeout:
      minutes: 3
    # alternative to u-boot
    connection: ssh
    method: vm
    # any way to launch a vm
    commands:
      # full access to the commands to run on the other device
      - qemu-system-arm -hda {IMAGE}
    type: qemu
    prompts:
      - 'linaro-test'
      - 'root@debian:~#'

 - test:
    name: kvm-basic-singlenode
    timeout:
      minutes: 5
    definitions:
        - repository: git://git.linaro.org/lava-team/lava-functional-tests.git
          from: git
          path: lava-test-shell/smoke-tests-basic.yaml
          name: smoke-tests

Primary remote connection

When a test device does not have support at all for a primary serial connection, there is another, more limited way of using it in LAVA - the Primary Remote Connection. For this to work, the test device must boot automatically and start a remote login daemon (e.g. sshd) with configured authentication. The TestJob for a primary remote connection then skips the deploy stage and uses a simple boot method which just establishes the connection. A device providing a primary remote connection in LAVA only provides access to that connection via a single submitted TestJob at a time. A MultiNode job can make multiple connections, but other jobs will see the device as busy and not be able to start their connections.

Warning

Primary remote connections can raise issues of Persistence - the test writer is solely responsible for deleting any sensitive data copied, prepared or downloaded using a primary remote connection. Do not leave sensitive data for the next TestJob to find. Wherever possible, use primary remote connections with schroot support so that each job is kept within a temporary chroot, thereby also allowing more than one primary (schroot) remote connection on a single machine.

It is not necessarily required that a device offering a primary remote connection is permanently powered on. The only connections being made to the device are done via the scheduler, which ensures that only one TestJob can use any one device at a time. Depending on how long it takes to boot the device, it is feasible to have a device offering primary remote connections which is powered down between jobs.

A Primary Remote Connection is established by the dispatcher, and is therefore constrained in the options which are available to the client requesting the connection, The TestJob has no control over the arguments passed to the connection.

Primary remote connections are affected by Security issues due to the requirements of automation.

Devices supporting Primary Remote Connections

A device offering a primary remote connection needs a particular configuration in the device dictionary table:

  1. Only primary remote connection deployment methods defined in the deploy_methods parameter, e,g, ssh.
  2. Support in the device_type template to replace the list of deployment methods with the list supplied in the deploy_methods parameter.
  3. No serial connection support in the boot connections list.
  4. No methods in the boot parameters.
  5. No Power Commands can be used in the device dictionary.

This prevents other jobs being submitted which would cause the device to be rebooted or have a different deployment prepared. This can be further enhanced with device tag support.

Hacking sessions can also be supported with primary remote connections, depending on the use case.

Warning

Remember that in addition to issues related to the Persistence of a primary remote connection device, hacking sessions on primary remote connections also have all of the issues of a shared access device - do not copy, prepare or download sensitive data when using a shared access device.

SSH as the primary remote connection

Certain devices can support SSH as the primary remote connection - the filesystems on such devices are not erased at the end of a TestJob and provide Persistence for certain tasks. These devices declare this support in the device configuration:

deploy:
  # primary remote connection device has only connections as deployment methods
  methods:
    ssh
boot:
  connections:  # not serial
    - ssh

TestJobs then use SSH as a boot method which simply acts as a login to establish a connection:

- deploy:
    to: ssh
    os: debian

- boot:
    method: ssh
    connection: ssh
    failure_retry: 2
    prompts:
      - 'linaro-test'
      - 'root@debian:~#'

The deploy action in this case simply prepares the LAVA overlay containing the test shell definitions and copies those to a pre-determined location on the device. This location will be removed at the end of the TestJob. The os parameter is specified so that any LAVA overlay scripts are able to pick up the correct shell, package manager and other deployment data items in order to run the lava test shell definitions.

Security

A primary SSH connection from the dispatcher needs to be controlled through the device configuration, allowing the use of a private SSH key which is at least hidden from test writers. (Only protect the essential components).

The key is declared as a path on the dispatcher, so is device-specific. Devices on the same dispatcher can share the same key or may have a unique key - all keys still need to not have any passphrase - as long as all devices supported by the SSH host have the relevant keys configured as authorized for login as root. [1]

[1]Securing such private keys when the admin process is managed in a public VCS is left as an exercise for the admin teams.

LAVA provides a default (completely insecure) private key which can be used for these connections. This key is installed within lava-dispatcher and is readable by anyone inspecting the lava-dispatcher codebase in git. (This has not been changed in the refactoring.)

It is conceivable that a test image could be suitably configured before being submitted to LAVA, with a private key included inside a second job which deploys normally and executes the connection instead of running a test definition. However, anyone with access to the test image would still be able to obtain the private key. Keys generated on a per job basis would still be open for the lifetime of the test job itself, up to the job timeout specified. While this could provide test writers with the ability to control the options and commands used to create the connection, any additional security is minimal and support for this has not been implemented, yet.

See also the Considerations with a secondary connection for information on how access to devices is managed.

Persistence

Devices supporting primary SSH connections have persistent deployments and this has implications, some positive, some negative - depending on your use case.

  1. Fixed OS - the operating system (OS) you get is the OS of the device and this must not be changed or upgraded.
  2. Package interference - if another user installs a conflicting package, your test can fail.
  3. Process interference - another process could restart (or crash) a daemon upon which your test relies, so your test will fail.
  4. Contention - another job could obtain a lock on a constrained resource, e.g. dpkg or apt, causing your test to fail.
  5. Reusable scripts - scripts and utilities your test leaves behind can be reused (or can interfere) with subsequent tests.
  6. Lack of reproducibility - an artifact from a previous test can make it impossible to rely on the results of a subsequent test, leading to wasted effort with false positives and false negatives.
  7. Maintenance - using persistent filesystems in a test action results in the overlay files being left in that filesystem. Depending on the size of the test definition repositories, this could result in an inevitable increase in used storage becoming a problem on the machine hosting the persistent location. Changes made by the test action can also require intermittent maintenance of the persistent location.

Only use persistent deployments when essential and always take great care to avoid interfering with other tests. Users who deliberately or frequently interfere with other tests can have their submit privilege revoked.

See Disposable chroot deployments for a solution to some of these issues but the choice of operating system (and the versions of that OS available) within the chroot is down to the lab admins, not the test writer. The principal way to get full control over the deployment is to use a Secondary Connection.

Disposable chroot deployments

Some devices can support mechanisms like LVM snapshots which allow for a self-contained environment to be unpacked for a single session and then discarded at the end of the session. These deployments do not suffer the same entanglement issues as simple SSH deployments and can provide multiple environments, not just the OS installed on the SSH host system.

This support is similar to how distributions can offer “porter boxes” which allow upstream teams and community developers to debug platform issues in a native environment. It also allows tests to be run on a different operating system or different release of an operating system. Unlike distribution “porter boxes”, however, LAVA does not allow more than one TestJob to have access to any one device at the same time.

A device supporting disposable chroots will typically follow the configuration of Devices supporting Primary Remote Connections. The device will show as busy whenever a job is active, but although it is possible to use a secondary connection as well, the deployment methods of the device would have to disallow access to the media upon which the chroots are installed or deployed or upon which the software to manage the chroots is installed. e.g. a device offering disposable chroots on SATA could offer ramdisk or NFS tests.

LAVA support for disposable chroots is implemented via schroot (forming the replacement for the dummy-schroot device in the old dispatcher).

Typical device configuration:

deploy:
  # list of deployment methods which this device supports
  methods:
    ssh:
    schroot:
      - unstable
      - trusty
      - jessie
boot:
  connections:
    - ssh

Optional device configuration allowing secondary connections:

deploy:
  # list of deployment methods which this device supports
  methods:
    tftp:
    ssh:
    schroot:
      - unstable
      - trusty
      - jessie
boot:
  connections:
    - serial
    - ssh

The test job YAML would simply specify:

- deploy:
    to: ssh
    chroot: unstable
    os: debian

- boot:
    method: ssh
    connection: ssh
    failure_retry: 2
    prompts:
      - 'linaro-test'
      - 'root@debian:~#'

Note

The OS still needs to be specified, LAVA does not guess based on the chroot name. There is nothing to stop an schroot being named testing but actually being upgraded or replaced with something else.

The deployment of an schroot involves unpacking the schroot into a logical volume with LVM. It is an InfrastructureError Exception if this step fails, for example if the volume group has insufficient available space.

schroot also supports directories and tarballs but LVM is recommended as it avoids problems of Persistence. See the schroot man page for more information on schroot. A common way to create an schroot is to use tools packaged with sbuild or you can use debootstrap.