Connections
A connection is how LAVA handles talking to a device, for example
via an automated login session on a serial port on the device or
within a virtual machine hosted by a device. Some devices can support
multiple connections. LAVA can use more than one connection in a test
job, but the first connection is particularly important - this is
where initial control of booting will happen, and kernel messages will
normally go here.
The most common type of connection that LAVA will use is a serial
connection to the test device, but other connection methods are also
supported (such as SSH and USB). For a connection method to work, it
needs to be supported by software in LAVA, services within the
software running on the device and typically lab infrastructure too,
e.g. a serial console server.
As an example, many devices are capable of supporting SSH connections
as long as:
- the device can be configured to raise a usable network interface
- the device is booted into a suitable software environment which will
  run an SSH server
USB connections for Android support can be implemented inside test
shells using the LXC support.
Connections in device configuration
For the connections to a device to be made available for test usage,
they need to be declared in the device configuration, e.g.:
deploy:
  methods:
    tftp
    ssh
boot:
  connections:
    - serial
    - ssh
  methods:
    qemu:
  prompts:
    - 'linaro-test'
    - 'root@debian:~#'
Connections in test jobs
Connections are created in a test job using the boot action; it must
specify the connection method as a parameter:
- boot:
    method: qemu
    media: tmpfs
    connection: serial
    failure_retry: 2
    prompts:
      - 'root@debian:~#'
Note
Avoid defaults in dispatcher code - although serial is the traditional
(and previously default) way of connecting to LAVA devices, it is not
assumed and must be explicitly specified in the test job YAML.
The connection created here will be used later by the test action
blocks defined in the test job, so they depend on the boot action to
define the connection. Even where test jobs may not actually cause a
test device to boot per se, LAVA needs the test job to include a boot
action for this purpose.
For basic test jobs, this describes all the information that most test
writers will need to understand. However, there are several more
advanced connection options that may be useful, depending on the type
of device and the tests required.
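To illustrate the point above that a boot action is needed even when nothing is physically booted, here is a minimal sketch of a boot action that only opens a login over SSH to a device that is already running. It assumes that ssh is declared under the boot connections in the device configuration, and that the LAVA version in use accepts an ssh boot method with these parameters; the prompt string and retry count are placeholders.

```yaml
# Sketch only: the method name, prompt and retry count are assumptions
# to be checked against your own LAVA instance.
- boot:
    method: ssh          # assumed boot method for an SSH login
    connection: ssh      # must be declared in the device configuration
    failure_retry: 2
    prompts:
      - 'root@debian:~#'
```

The test actions that follow would then use this SSH connection in the same way as they would use a serial one.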
Connections and namespaces
Internally, LAVA uses a namespace structure to track dynamic data
inside a test job. One of the pieces of data tracked in a namespace is
the connection in use; this is how the connection created in the boot
action block can be shared throughout the job in further actions.
In a job definition where multiple deploy, boot and test actions are
specified, there must be a mechanism to describe how the actions are
connected. This is the primary purpose of a namespace; it is the way
to tie related actions together. This is important - consider how an
overlay created during a deploy action will be consumed by a test
action somewhere down the job definition, for example.
In a simple job, it is often not necessary to even consider this use
of namespaces. If no other namespaces are defined explicitly in a test
job, LAVA will create an implicit namespace called common. The default
primary serial connection created in the boot action block will be
stored in the common namespace, and the test action block(s) will use
it from there.
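As a rough sketch of such a simple job, the actions below define no namespace at all, so everything lands in the implicit common namespace and the test action picks up the connection made by the boot action. The URLs are placeholders and the timeouts are arbitrary; only the absence of namespace keys matters here.

```yaml
# Sketch: no "namespace" keys anywhere, so LAVA uses the implicit
# "common" namespace for all three actions.
actions:
- deploy:
    timeout:
      minutes: 3
    to: tftp
    kernel:
      url: http://example.com/vmlinuz            # placeholder URL
      type: zimage
    ramdisk:
      url: http://example.com/initramfs.cpio.gz  # placeholder URL
      compression: gz
- boot:
    method: u-boot
    commands: ramdisk
    connection: serial     # stored in the "common" namespace
    prompts:
    - '\(initramfs\)'
    timeout:
      minutes: 3
- test:
    # uses the serial connection from the "common" namespace
    timeout:
      minutes: 5
    definitions:
    - repository: git://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests-basic
```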
If more than one connection is desired in a test job, then the way to
control which connection is used in each action block is by explicitly
defining appropriate namespaces. Here’s an example using serial and lxc
connections with a Beaglebone device. Look for namespace: inside_lxc
and namespace: testdevice in the action blocks:
actions:
- deploy:
    namespace: inside_lxc
    timeout:
      minutes: 5
    to: lxc
    packages:
    - usbutils
    - procps
    - lsb-release
    - util-linux
- deploy:
    namespace: testdevice
    timeout:
      minutes: 3
    to: tftp
    kernel:
      url: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/4/vmlinuz
      type: zimage
    ramdisk:
      url: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/4/initramfs.cpio.gz
      compression: gz
    modules:
      url: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/4/modules.tar.gz
      compression: gz
    dtb:
      url: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/standard/debian/jessie/armhf/4/dtbs/am335x-bone.dtb
- boot:
    namespace: inside_lxc
    connection: lxc
    prompts:
    - 'root@(.*):/#'
    timeout:
      minutes: 5
    method: lxc
- boot:
    namespace: testdevice
    connection: serial
    method: u-boot
    commands: ramdisk
    prompts:
    # escape the brackets to ensure that the prompt does not match
    # kernel debug lines which may mention initramfs
    - '\(initramfs\)'
    timeout:
      minutes: 3
- test:
    namespace: testdevice
    connection: serial
    timeout:
      minutes: 5
    definitions:
    - repository: git://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests-bbb
Download or view the complete example:
examples/test-jobs/namespace-connections-example1.yaml:
Note
The common namespace cannot be combined with any other namespace - it
is special-cased. If you are defining more namespaces in your job, give
them clear descriptive names that are unique within that job.
Multiple serial port support
Background
On common test devices where LAVA interacts with the device using a
serial connection, there is typically a simple flow to the test job,
running through the action blocks defined in that test job:
- deploy block:
  - Set up job artifacts for the test job
- boot block:
  - Start the device
  - Connect to the serial port
  - Control the boot loader to boot the desired artifacts
  - Read kernel boot messages - the serial connection is the kernel
    console
  - Wait for the specified prompt to appear, and log in if needed
- test block:
  - Assume the device is waiting for input at a shell prompt
  - Send shell commands over the serial connection to run tests
  - Read back the results of those tests from that same serial
    connection
- (implicit) job cleanup block:
  - Disconnect from the serial port
  - Shut down the device
Even in this simple setup, there is a potential problem. The single
serial connection is used for output both by the test shell and
the kernel. Ideally, this should not cause any problems, but in the
real world it is all too common for kernel log messages to be
interleaved with test output. This could be something simple and
expected (e.g. a test action bringing up a network interface will
cause the driver for that interface to announce link state), or it
could be something unexpected (e.g. the kernel reporting hardware
failure from an unrelated driver, or a malformed network packet). As
the kernel messages and the test shell output are interleaved on a
character-by-character basis, it is often impossible to parse either
stream reliably. This can cause tests to fail in unexpected ways, or it can cause
LAVA to fail to parse test output and so log incomplete or incorrect
test results.
Over the years, the LAVA developers have worked to reduce the impact
of this problem, but it is fundamentally impossible to solve it while
kernel messages and test shell output are sharing the same
connection. It is possible to change the logging level of the
kernel to reduce the number or frequency of its messages, but often
those messages are critical information when debugging a fault so this
is not a good solution for everybody.
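As a hedged illustration of the log-level workaround mentioned above, a test definition could lower the kernel console log level before running its tests. The definition name and the chosen level are assumptions; dmesg -n only changes what is printed on the console, so the messages are still kept in the kernel ring buffer.

```yaml
# Hypothetical test definition; the name and log level are illustrative.
metadata:
  format: Lava-Test Test Definition 1.0
  name: quieten-kernel-console
  description: "Reduce kernel console logging before running tests"
run:
  steps:
    # Only messages more severe than level 4 (warning) reach the
    # console after this; less severe messages stay in the ring buffer.
    - dmesg -n 4
    # ... test commands would follow here ...
```

This reduces the chance of interleaving but does not remove it, since errors and more severe messages are still written to the shared serial connection.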
Isolating kernel messages from test output
The only reliable way to solve the problem with interleaving and
corrupted test shell output and kernel messages is to isolate them
physically, on different connections.
For some time, LAVA has had support for driving multiple connections
to a device independently, using secondary connections. This can be a great solution for many test
requirements, but doesn’t solve all problems for all people. It
depends on being able to start extra connections via the network
(typically SSH or telnet), so on test devices without functional
networking support it cannot work. It also requires tests to be
written using the MultiNode protocol, which can add
considerable complexity to an otherwise simple test job.
As an alternative, many test devices include more than one hardware
serial port. Most such devices will use just one of those serial ports
for firmware, bootloader and kernel messages (a primary console). Linux
will then start a getty process (a serial login program) on that
primary console. If more serial ports are available, connecting those
ports and configuring the test OS to spawn more getty processes is an
easy way to get more connections. These extra connections are all
independent from the primary, so it is safe to run test shell commands
on these without interleaving test output with kernel message output.
LAVA supports using these multiple serial connections in a simple way,
avoiding the need for MultiNode complexity.
Changes needed in the test image
In common with the ssh method, the use of multiple serial connections
involves some risks because the creation of the getty on the additional
serial port(s) is managed by the kernel and rootfs of the test image.
When using multiple serial connections, always test that the booted
system raises the getty correctly and that the login process works
before committing to using this method.
Configuring serial ports
To configure LAVA to connect to one or more serial ports of a device, create a
list of connection_commands in the device dictionary. LAVA will use the
command tagged with primary to open the connection early in test job startup
(in the first boot action), and will keep this connection open right until the
end of the test job.
In earlier versions of LAVA, only a single connection command could be used:
{% extends 'beaglebone-black.jinja2' %}
{% set power_off_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command off --port 12' %}
{% set hard_reset_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command reboot --port 12' %}
{% set connection_command = 'telnet dispatcher01 7001' %}
{% set power_on_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command on --port 12' %}
This worked fine with a single serial connection, but it is now deprecated
in favour of a scheme that supports more than one connection and brings
other improvements in connection handling. The connection_list is a more
flexible way to configure one or more serial ports:
{% extends 'beaglebone-black.jinja2' %}
{% set power_off_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command off --port 12' %}
{% set hard_reset_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command reboot --port 12' %}
{% set power_on_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command on --port 12' %}
{% set connection_list = ['uart0'] %}
{% set connection_commands = {'uart0': 'telnet dispatcher01 7001'} %}
{% set connection_tags = {'uart0': ['primary', 'telnet']} %}
primary denotes the serial connection which will be started automatically
with each test job.
Other tags describe how LAVA should close the connection at the end of the
test job; the possible values are telnet and ssh. If your connection command
does not use telnet or ssh, the connection will be forcibly closed using
kill -9.
Or with two serial connections:
{% extends 'beaglebone-black.jinja2' %}
{% set power_off_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command off --port 12' %}
{% set hard_reset_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command reboot --port 12' %}
{% set power_on_command = '/usr/bin/pduclient --daemon localhost --hostname pdu01 --command on --port 12' %}
{% set connection_list = ['uart0', 'uart1'] %}
{% set connection_commands = {'uart0': 'telnet dispatcher01 7001', 'uart1': 'telnet dispatcher01 7002'} %}
{% set connection_tags = {'uart0': ['primary', 'telnet'], 'uart1': ['telnet']} %}
This defines two serial ports (labeled uart0 and uart1), then describes
how to connect to each one. Finally, it sets a tag of primary on uart0
- this tells LAVA that uart0 is the primary connection, the one used
for boot and kernel messages. This makes uart0 exactly equivalent to
the single serial connection defined in the previous example. In
future, more connection tags might be added with extra meaning.
LAVA now (as of writing in October 2017) supports either of these
methods to configure serial ports, but at some point in the future the
older connection_command method may be deprecated. The two methods may
not be mixed in the same device dictionary - either define a single
connection_command or use the new list of connections.
Using multiple serial ports
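For a typical device with multiple serial ports, we can extend our
simple job flow above. The additions are the explicit namespaces, the
optional first test block, the second boot block and the background
listening for kernel messages: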
- deploy block:
  - Set up job artifacts for the test job
- first boot block:
  - Start the device
  - Connect to the primary serial port, creating an explicit
    namespace for later actions to use it
  - Control the boot loader to boot the desired artifacts
  - Read kernel boot messages - the primary serial connection is
    the kernel console
  - Wait for the specified prompt to appear, and log in if needed
- (optional) first test block:
  - Tests to run using the primary serial port, via the
    namespace created for it. (This is likely to be empty, in which
    case you can just leave this test block out altogether)
- second boot block:
  - Start a new connection to a non-primary serial port, and create
    a new namespace for it
  - Wait for the specified prompt to appear on the non-primary
    serial port, and log in if needed
- second test block:
  - Assume the device is waiting for input at a shell prompt
  - Send shell commands over the non-primary serial connection to
    run tests
  - Read back the results of those tests on the non-primary serial
    connection
  - In the background, listen to the existing serial port
    connection for kernel messages (feedback)
- (implicit) job cleanup:
  - Disconnect from the serial ports
  - Shut down the device
Note
To use the extra serial port here, the operating system
image on the test device will also need to be configured to
start a getty on the extra serial port. This can be done
in the image as it is prepared, or alternatively it could be
started by logging in using the test action on the primary
console. That latter approach might seem to be the obvious
path, but again beware of serial corruption causing
problems. The period during and immediately after boot is
when kernel messages are most likely to be intermingled with
attempts to control a device on the primary console.
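For the second approach in the note above, a test definition run over the primary console could start the extra login prompt. This is only a sketch: it assumes a systemd-based test image and that the extra port appears as /dev/ttyUSB0, and the definition name is hypothetical. Preparing the image in advance remains the more robust option.

```yaml
# Hypothetical test definition, run on the primary serial connection.
metadata:
  format: Lava-Test Test Definition 1.0
  name: start-extra-getty
  description: "Start a login prompt on a second serial port"
run:
  steps:
    # Assumes systemd and a second serial port at /dev/ttyUSB0
    - systemctl start serial-getty@ttyUSB0.service
    # Check that the getty unit is now running
    - systemctl is-active serial-getty@ttyUSB0.service
```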
Example job 1: Simple beaglebone-black job with a second serial port
Here’s a simple test job on a common board, a Beaglebone Black. The
board only exposes one serial port for easy use, so we’ve added a USB
serial adapter as a second port.
Download or view the complete example:
examples/test-jobs/bbb-2serial.yaml:
- Deploy and boot the device
- Create the connection to the second serial port
- Tell the test shell to use the new connection
Deploy and boot the device
This is using a simple Debian Stretch nfs rootfs and initramfs. The
rootfs is easy to generate using standard tools; the only change in
there is to define the second serial console on /dev/ttyUSB0. Remember
that a similar change will likely be needed in any test image you want
to test this way. Note the explicit namespace bbb defined in the deploy
action, and created in the boot action:
- deploy:
    namespace: bbb
    timeout:
      minutes: 4
    to: tftp
    kernel:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/vmlinuz-4.9.0-4-armmp
      type: zimage
    ramdisk:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/initrd.img-4.9.0-4-armmp
      compression: gz
    modules:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/modules.tar.gz
      compression: gz
    nfsrootfs:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/stretch-armhf-nfs-extra-getty.tar.gz
      compression: gz
    dtb:
      url: http://people.linaro.org/~steve.mcintyre/lava/images/stretch-armhf-multi-serial/dtbs/am335x-boneblack.dtb
- boot:
    namespace: bbb
    method: u-boot
    commands: nfs
    prompts:
    - 'login:'
    timeout:
      minutes: 2
# startup the extra UART
- boot:
A boot action would typically include an auto_login section, but in
this test we’re not going to be doing any testing using the primary
serial connection. Hence, we just add a prompts section looking for
login: to check when this boot is complete.
Create the connection to the second serial port
Next, we use a second boot action block to create a new connection in a
new namespace called isolation. We’re using the new_connection method,
using the uart1 connection defined in the device dictionary. As we’re
going to be using this new connection for testing, we now run
auto_login here.
    # support auto-login
    namespace: isolation
    method: new_connection
    connection: uart1
    prompts:
    - 'root@stretch:'
    auto_login:
      login_prompt: 'login:'
      username: root
    timeout:
      # shorter timeout as the device should exist.
      minutes: 2
- test:
    # Use the bbb namespace - we depend on that for the deployed data
Tell the test shell to use the new connection
Finally, we start our tests.
- The namespace of the test action matches the bbb namespace used in
  the deploy and boot actions of the device. This ensures that the test
  shell has access to the dynamic data created by the correct
  deployment action to be able to know what rootfs is in use, and where
  to find the test shell files on that rootfs.
- The test action also has connection-namespace set to isolation - this
  tells it to use the connection tracked in the isolation namespace,
  rather than the default connection in the bbb namespace. This is the
  key part of the isolation, running tests on the second serial port.
    namespace: bbb
    # Run the test shell using the "isolation" connection
    connection-namespace: isolation
    timeout:
      minutes: 5
    definitions:
    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests
Download or view the complete example:
examples/test-jobs/bbb-2serial.yaml:
Example job 2: A more complicated setup including LXC
Here’s a more complicated example job, including the use of LXC for
deployment. This was the first real-world use case for the multiple
serial port support, running Linux kernel functional testing on a
HiKey 6220. The HiKey 6220 hardware includes an extra serial port, but
deploying to the board is more involved - we use fastboot in an LXC
container, which means we have another namespace to track in the
test job. Let’s unpick the test job.
- Define the container
- Deploy and boot the container
- Use the container to deploy and boot the device
- Create the connection to the second serial port
- Tell the test shell to use the new connection
Download or view the complete example:
examples/test-jobs/multiple-serial-ports-lxc.yaml:
Define the container
The distribution and suite of the container, as well as the name, are defined
using the lava-lxc protocol block.
protocols:
  lava-lxc:
    name: lxc-hikey-oe
    template: debian
    distribution: debian
Deploy and boot the container
The deploy and boot steps for the LXC set the timeouts and prompts for this
container. Note the name of the namespace used in these actions.
The connection to the LXC is defined within the tlxc namespace and the
connection is created in the boot action. In the case of LXC support, this
is done by running lxc-attach on the dispatcher instead of a connection
command from the device configuration.
actions:
- deploy:
    namespace: tlxc
    timeout:
      minutes: 15
    to: lxc
    packages:
    - fastboot
- boot:
    namespace: tlxc
    prompts:
    - 'root@(.*):/#'
    - 'hikey:/'
    timeout:
      minutes: 5
    method: lxc
Use the container to deploy and boot the device
Next, the dispatcher runs commands inside that LXC container to download and
deploy an OE image to a HiKey 6220 board, then boot it. This example uses the
hikey-oe namespace. The details of how the HiKey 6220 is deployed and
booted are not relevant to how the multiple serial support operates, but do
take note of the namespace used to boot the device. The boot operation is
responsible for creating the connection (in this case by running a connection
command specified in the device configuration).
- deploy:
    namespace: hikey-oe
    timeout:
      minutes: 30
    to: fastboot
    connection: lxc
    images:
      ptable:
        url: http://images.validation.linaro.org/snapshots.linaro.org/openembedded/lkft/morty/hikey/rpb/4.9/83/bootloader/ptable-linux-8g.img
        reboot: hard-reset
      boot:
        url: http://images.validation.linaro.org/snapshots.linaro.org/openembedded/lkft/morty/hikey/rpb/linux-mainline/588/boot-0.0+AUTOINC+06e4def583-fb1158a365-r0-hikey-20180128213254-588.uefi.img
        reboot: hard-reset
      system:
        url: http://images.validation.linaro.org/snapshots.linaro.org/openembedded/lkft/morty/hikey/rpb/linux-mainline/588/rpb-console-image-hikey-20180128213254-588.rootfs.img.gz
        compression: gz
        apply-overlay: true
    # ensure that this job raises a network interface with DHCP before relying on internet access
    protocols:
      lava-lxc:
      - action: fastboot-deploy
        request: pre-power-command
        timeout:
          minutes: 2
- boot:
    namespace: hikey-oe
    # terminate monitoring of this connection at login prompt
    prompts:
    - 'login:'
    timeout:
      minutes: 5
    method: grub
    commands: installed
    protocols:
      lava-lxc:
      - action: grub-sequence-action
        request: pre-os-command
        timeout:
          minutes: 2
# boot uart0 block
Create the connection to the second serial port
As with making the connection to the LXC and making the connection to the
primary UART of the HiKey 6220 DUT, making the connection to the second or
additional serial ports involves a boot action. The action must create a new
namespace to store the connection to the second serial port. (Any subsequent
connections to other serial ports would similarly require a unique namespace
for each connection.) This namespace will be used later to isolate a test
shell from the primary connection used for the deployment and boot actions of
the device.
    # make the connection to the second uart for use in the test shell
    # support auto-login
    namespace: isolation
    connection: uart0
    prompts:
    - 'root@hikey:~#'
    method: new_connection
    auto_login:
      login_prompt: 'login:'
      username: root
    timeout:
      # shorter timeout as the device should exist.
      minutes: 2
# test isolation block
Tell the test shell to use the new connection
This is where it all comes together.
The namespace of the test shell matches the namespace of the
deployment and boot actions of the device. This ensures that the test
shell has access to the dynamic data created by the correct deployment action
to be able to know what rootfs is in use and where to find the test shell
files on that rootfs.
In this example, the test shell needs a namespace of hikey-oe.
The connection-namespace of the same test shell matches the namespace
of the boot action of the second serial port. This ensures that the test
shell communicates with the DUT over the isolated connection instead
of the connection which is stored in the main namespace.
In this example, the test shell needs a connection-namespace of isolation.
    namespace: hikey-oe
    connection-namespace: isolation
    timeout:
      minutes: 5
    definitions:
    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests-basic-oe
Download or view the complete example:
examples/test-jobs/multiple-serial-ports-lxc.yaml:
Limitations with multiple serial ports
The method described here is reasonably simple to configure and use,
but it does have limitations. While LAVA will read from multiple
connections (almost) in parallel this way, it will only write to one
of them at once. The others will all be read-only. This may well suit
your needs, but if not then there is another option - using MultiNode
with secondary connections. This is more
powerful, but much more complex to describe in a test job.
Secondary Connection
Secondary Connections are a way to have two simultaneous connections
to the same physical device, equivalent to two logins. Each connection
needs to be supported by a distinct TestJob, so a MultiNode group
needs to be created so that the output of each connection can be
viewed as the output of a single TestJob, just as if you had two
terminals. The second connection does not have to use the same
connection method as the current connection and many devices can only
support secondary connections over a network interface, for example
SSH or telnet.
A Secondary Connection has a deploy step and the device is already
providing output over the primary connection (typically serial) before
the secondary connection is established. This is closer to having the
machine on your desk. The TestJob supplies the kernel and rootfs or
image to boot the device and can optionally use the secondary
connection to push other files to the device (for example, an ssh
secondary connection would use scp).
A Secondary Connection can have control over the daemon via the
deployment using the primary connection. The client connection is
still made by the dispatcher.
Secondary Connections require authorization to be configured, so the
deployment must specify the authorization method. This allows the
overlay for this deployment to contain a token (e.g. the ssh public
key) which will allow the connection to be made. The token will be
added to the overlay tarball alongside the directories containing the
test definitions.
- deploy:
    to: tmpfs
    authorize: ssh
    kernel:
      url: http://....
    nfsrootfs:
      url: http://...
    dtb:
      url: http://....
Certain deployment Actions (like SSH) will also copy the token to a particular
location (e.g. /root/.ssh/authorized_keys) but test writers can also add a
run step which enables authorization for a different user, if the test requires
this.
Note
The /root/.ssh/authorized_keys file will be replaced when the LAVA overlay is
unpacked, if it exists in the test image already. This is a security
precaution (so that test images can be shared easily without allowing
unexpected access). Hacking sessions append to this file after the overlay
has been unpacked.
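As a sketch of a run step that enables authorization for a different user, a test definition could copy the key that LAVA placed for root into another account. The tester account is hypothetical and must already exist in the test image; the definition name is likewise an assumption.

```yaml
# Hypothetical test definition; "tester" is an illustrative account name.
metadata:
  format: Lava-Test Test Definition 1.0
  name: authorize-extra-user
  description: "Copy the LAVA-provided SSH key to a non-root account"
run:
  steps:
    # Create ~/.ssh for the user with the permissions sshd expects
    - install -d -m 700 -o tester -g tester /home/tester/.ssh
    # Copy the key file that the deployment placed for root
    - install -m 600 -o tester -g tester /root/.ssh/authorized_keys /home/tester/.ssh/authorized_keys
```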
Deployment can also include delivering the LAVA overlay files, including the
LAVA test shell support scripts and the test definitions specified by the
submitter, to the host device to be executed over the secondary connection.
So for SSH, the secondary connection typically has a test action defined and
uses scp to put the overlay into place before connecting using ssh and
executing the tests. The creation of the overlay is part of the deployment;
the delivery of the overlay is part of the boot process of the secondary
connection, i.e. deploy is passive, boot is active. To support this,
use the MultiNode protocol on the host to declare the IP address of the host
and communicate that to the guest as part of the guest deployment. Then the
guest uses the data to copy the files and make the connection as part of the
boot action. See Writing jobs using Secondary Connections.
Considerations with a secondary connection
- The number of host devices
- Which secondary connections connect to which host device
In LAVA, this is handled using the MultiNode role using the following
rules:
- All connections declare a host_role which is the role label for the host
  device for that connection. For example, if the connection has a declared
  role of client and declares a host_role of host, then every client
  connection will be expected to be able to connect to the host device.
- The TestJob for each connection with the same role will be started on a
  single dispatcher which is local to the device with the role matching
  the specified host_role.
- There is no guarantee that a connection will be possible to any other device
  in the MultiNode group other than devices assigned to a role which
  matches the host_role requirement of the connection.
Note
The count of any role acting as the host_role must be set to 1. Multiple
roles can be defined, each set as a host_role by at least one of the other
roles, if more than one device in the MultiNode group needs to host secondary
connections in the one submission. Multiple connections can be made to
devices of any one host_role.
This allows for devices to be hosted in private networks where only a local
dispatcher can access the device, without requiring that all devices are
accessible (as root) from all dispatchers as that would require all devices to
be publicly accessible.
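The role rules above might translate into a MultiNode protocol block along the following lines. This is a sketch, not a definitive job: the device type, counts and timeouts are assumptions, and the connection, expect_role and request keys are written as they are understood to appear in LAVA secondary connection jobs, so check them against Writing jobs using Secondary Connections before relying on them.

```yaml
# Sketch only; values and some key names are assumptions.
protocols:
  lava-multinode:
    roles:
      host:
        device_type: beaglebone-black
        count: 1            # a role used as a host_role must have count 1
        timeout:
          minutes: 10
      client:
        connection: ssh
        count: 2            # two secondary connections to the one host
        host_role: host     # every client must be able to reach this role
        expect_role: host   # related to delayed start; not the same thing
        request: lava-start
        timeout:
          minutes: 15
    timeout:
      minutes: 10
```

Here both client jobs would be started on the dispatcher that is local to the single host device.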
Secondary connections are affected by Security issues due to
the requirements of automation.
The device providing a Secondary Connection is running a TestJob and the
deployment will be erased when the job completes.
Note
Avoid confusing host_role with expect_role. host_role is used by the
scheduler to ensure that the job assignment operates correctly and does not
affect the dispatcher or delayed start support. The two values may often
have the same value with secondary connections but do not mean the same
thing.
Note
Avoid using constrained resources (like dpkg or apt) from multiple tests
(unless you take care with synchronization calls to ensure that each
operation happens independently). Check through the test definitions for
installation steps or direct calls to apt and change the test definitions.
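Where two test definitions really do need apt on the same device, the MultiNode helpers lava-send and lava-wait can serialize them. The sketch below assumes exactly two jobs in the group; the message ID apt-done, the definition names and the package names are illustrative.

```yaml
# Definition run by the first job: install, then announce completion.
metadata:
  format: Lava-Test Test Definition 1.0
  name: first-install
run:
  steps:
    - apt-get -y install usbutils
    # Tell the other job that apt is free again
    - lava-send apt-done
---
# Definition run by the second job: wait before touching apt.
metadata:
  format: Lava-Test Test Definition 1.0
  name: second-install
run:
  steps:
    # Block until the first job has sent apt-done
    - lava-wait apt-done
    - apt-get -y install procps
```

With more than two jobs, each additional job would need its own message so that no two of them call apt at the same time.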
Connections and hacking sessions
A hacking session using a Secondary Connection is the only situation
where the client is configurable by the user and the daemon can be
controlled by the test image. It is possible to adjust the hacking session test
definitions to use different commands and options - as long as both daemon and
client use compatible options. As such, a hacking session user retains security
over their private keys at the cost of the loss of automation.
Hacking sessions can be used with secondary connections, depending on
the use case.
Using secondary connections with VM groups
One example of the use of a secondary connection is to launch a VM on a device
already running a test image. This allows the test writer to control both the
kernel on the bare metal and the kernel in the VM as well as having a
connection on the host machine and the guest virtual machine.
The implementation of VMGroups created a role for a delayed start MultiNode
job. This would allow one job to operate over serial, publish the IP address,
start an SSH server and signal the second job that a connection is ready to be
established. This may be useful for situations where a debugging shell needs to
be opened around a virtualization boundary.
There is an option for downloading or preparing the guest VM image on the host
device within a test shell, prior to the VM delayed start. Alternatively, a
deploy stage can be used which would copy a downloaded image from the
dispatcher to the host device.
Each connection is a different job in a MultiNode group so that the output of
each connection is tracked separately and can be monitored separately.
Sequence
- The host device is deployed with a test image and booted.
- LAVA then manages the download of the files necessary to create
the secondary connection.
- e.g. for QEMU, this would be a bootable image file
- LAVA also creates a suitable overlay containing the test definitions to be
run inside the virtual machine.
- The test image must start whatever servers are required to provide the
secondary connections, e.g. ssh. It does not matter whether this is done
using install steps in the test definition or pre-existing packages in the
test image or manual setup. The server must be configured to allow the
(insecure) LAVA automation SSH private key to log in as authorized - this
key is available in the
/usr/lib/python3/dist-packages/lava_dispatcher/device/dynamic_vm_keys
directory when lava-dispatcher is installed or in the lava-dispatcher git
tree.
- The test image on the host device starts a test definition over the existing
(typically serial) connection. At this point, the image file and overlay for
the guest VM are available on the host for the host device test
definition to inspect, although only the image file should actually be
modified.
- The test definition includes a signal to the LAVA MultiNode API which
allows the VM to start. The signal includes an identifier for which VM to
start, if there is more than one.
- The second job in the MultiNode group waits until the signal is received
from the coordinator. Upon receipt of the signal, the lava dispatch
process running the second job will initiate the secondary connection to the
host device, e.g. over SSH, using the specified private key. The connection
is used to run a set of commands in the test image running on the host
device. It is a TestError if any of these commands fails. The last of these
commands must hold the connection open for as long as the test writer
needs to execute the task inside the VM. Once those tasks are complete, the
test definition running in the test image on the host device signals that
the VM has completed.
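The ordering in the sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Coordinator` class, the `start-<vm_id>` message name and the `run_group` function are hypothetical stand-ins invented for illustration and are not part of LAVA or the MultiNode API. It shows only that the second job blocks until the host job has sent the start signal carrying the VM identifier.

```python
import threading


class Coordinator:
    """Toy stand-in for the MultiNode coordinator (hypothetical, not LAVA code)."""

    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def _event(self, message_id):
        # Create the event on first use so send and wait can happen in any order.
        with self._lock:
            return self._events.setdefault(message_id, threading.Event())

    def send(self, message_id):
        self._event(message_id).set()

    def wait(self, message_id, timeout=None):
        return self._event(message_id).wait(timeout)


def run_group(vm_id):
    """Run a host job and a VM job as threads and return the ordered log."""
    coordinator = Coordinator()
    log = []

    def host_job():
        # Host is deployed and booted, servers started, then the VM is released.
        log.append("host: ready, signalling start for " + vm_id)
        coordinator.send("start-" + vm_id)

    def vm_job():
        # The second job does nothing until the start signal arrives.
        if coordinator.wait("start-" + vm_id, timeout=5):
            log.append("vm: opening secondary connection for " + vm_id)

    threads = [threading.Thread(target=vm_job), threading.Thread(target=host_job)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Even though the VM job thread is started first, its log entry can only appear after the host entry, which mirrors the delayed start described above.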
The test writer is given full control over the commands issued inside the test
image on the host device, including those commands which are responsible for
launching the VM. The test writer is also responsible for making the
overlay available inside the VM. This could be by passing arguments to the
commands to mount the overlay alongside the VM or by unpacking the overlay
inside the VM image before calling QEMU. If set in the job definition, the test
writer can ask LAVA to unpack the overlay inside the image file for the VM and
this will be done on the host device before the host device boots the test
image - however, this will require an extra boot of the host device, e.g. using
the dynamic master support.
Basic use cases
Prebuilt files can be downloaded: kernel, ramdisk, dtb, rootfs or a complete
image. These will be downloaded to the host device and the paths to these files
substituted into the commands issued to start the VM, in the same way as with a
bootloader like U-Boot. This provides support for tests within the VM using
standard, packaged tools. To simplify these tests further, it is recommended to
use NFS for the root filesystem of the host device boot - it leads to a quicker
deployment as the files for the VM can be downloaded directly to the NFS share
by the dispatcher. Deployments of the host device system to secondary media,
e.g. SATA, require additional steps and the job will take longer to get to a
point where the VM can be started.
The final launch of the VM will occur using a shell script (which will then be
preserved in the results alongside the overlay), containing the parsed
commands.
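As a rough illustration of the substitution step, the sketch below replaces `{NAME}` placeholders in a list of commands with file paths and renders the result as a shell script. The function name `render_launch_script`, the `set -e` line and the example path are assumptions made for this example; the `{IMAGE}` placeholder is the one used in the sample job definition in this section. How LAVA itself parses the commands may differ.

```python
import shlex


def render_launch_script(commands, paths):
    """Substitute {NAME} placeholders and return a POSIX shell script as text.

    commands: list of command strings, e.g. "qemu-system-arm -hda {IMAGE}"
    paths: mapping of placeholder name to the path on the host device
    """
    lines = ["#!/bin/sh", "set -e"]
    for command in commands:
        for name, path in paths.items():
            # Quote the path so that spaces or shell metacharacters are safe.
            command = command.replace("{" + name + "}", shlex.quote(path))
        lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical location of the downloaded image on the host device.
    print(render_launch_script(
        ["qemu-system-arm -hda {IMAGE}"],
        {"IMAGE": "/tmp/vm/guest.img"},
    ))
```

Plain string replacement is used instead of `str.format` so that any other braces in a command are left untouched.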
Advanced use cases
It is possible to use a test shell to build files to be used when launching the
VM. This allows for a test shell to operate on the host device, building,
downloading or compiling whatever files are necessary for the operation of the
VM, directly controlled by the test shell.
To avoid confusion and duplication, LAVA does not support downloading some
files via the dispatcher and some via the test shell. If there are files needed
for the test job which are not to be built or generated within the test shell,
the test shell will need to use wget or curl or some other tool present
in the test image to obtain the files. This also means that LAVA is not able to
verify that such URLs are correct during the validation of the job, so test
writers need to be aware that LAVA will not be able to fail a job early if the
URL is incorrect as would happen in the basic use case.
Any overlay containing the test definitions and LAVA test scripts which are to
be executed inside the VM after the VM has booted still needs to be downloaded
from the dispatcher. The URL of this overlay (a single tarball containing all
files in a self-contained directory) will be injected into the test shell files
on the host device, in a similar way to how the MultiNode API provides
dynamic data from other devices in the group.
The test writer is responsible for extracting this tarball so that it is
present or is bind mounted into the root directory of the VM so that the
scripts can be launched immediately after login.
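One way a test writer might handle the extraction is sketched below, assuming Python is available in the test image and the overlay tarball has already been fetched to the host device. The function name `unpack_overlay` and both paths are hypothetical; a shell one-liner using tar would serve equally well.

```python
import tarfile
from pathlib import Path


def unpack_overlay(tarball, vm_root):
    """Extract the overlay tarball into the directory used as the VM root.

    tarball: path to the downloaded overlay (a single self-contained tarball)
    vm_root: directory that is, or will be mounted as, the root of the VM
    """
    vm_root = Path(vm_root)
    vm_root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases can reject members that escape vm_root.
            archive.extractall(vm_root, filter="data")
        else:
            # Older releases have no extraction filters; only use trusted tarballs.
            archive.extractall(vm_root)
    return vm_root
```

The same directory could instead be bind mounted into the VM root rather than unpacked into the image, as described above.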
The test shell needs to create the final shell script, just as the basic use
case does. This allows the dispatcher running the VM to connect to the host
device and use a common interface to launch the VM in each use case.
LAVA initiates and controls the connection to the VM, using this script, so
that all output is tracked in the MultiNode job assigned to the VM.
Sample job definition for the VM job
# second half of a new-style VM group job
# each connection is a different job
# even if only one physical device is actually powered up.
device_type: kvm-arm
job_name: wandboard-qemu
timeouts:
  job:
    minutes: 15
  action:
    minutes: 5
priority: medium
target_group: asd243fdgdfhgf-45645hgf
group_size: 2
parameters:
  # the test definition on the host device manages how
  # the overlay is applied to the VM image.
  overlay: manual  # use automatic for LAVA to do the overlay
  # An ID appended to the signal to start this VM to distinguish
  # it from any other VMs which may start later or when this one
  # completes.
  vm_id: gdb_session
actions:
- boot:
    # as kvm-arm, this happens in a test image via
    # the other half of this MultiNode job
    timeout:
      minutes: 3
    # alternative to u-boot
    connection: ssh
    method: vm
    # any way to launch a vm
    commands:
      # full access to the commands to run on the other device
      - qemu-system-arm -hda {IMAGE}
    type: qemu
    prompts:
    - 'linaro-test'
    - 'root@debian:~#'
- test:
    name: kvm-basic-singlenode
    timeout:
      minutes: 5
    definitions:
    - repository: git://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests
Primary remote connection
When a test device does not have support at all for a primary serial
connection, there is another, more limited way of using it in LAVA -
the Primary Remote Connection. For this to work, the test device must
boot automatically and start a remote login daemon (e.g. sshd) with
configured authentication. The TestJob for a primary remote connection
then skips the deploy stage and uses a simple boot method which just
establishes the connection. A device providing a primary remote
connection in LAVA only provides access to that connection via a
single submitted TestJob at a time. A MultiNode job can make multiple
connections, but other jobs will see the device as busy and not be
able to start their connections.
Warning
Primary remote connections can raise issues of
Persistence - the test writer is solely responsible for
deleting any sensitive data copied, prepared or downloaded using a
primary remote connection. Do not leave sensitive data for the next
TestJob to find. Wherever possible, use primary remote connections
with schroot support so that each job is kept within a
temporary chroot, thereby also allowing
more than one primary (schroot) remote connection on a single
machine.
It is not necessarily required that a device offering a primary remote
connection is permanently powered on. The only connections being made
to the device are done via the scheduler, which ensures that only one
TestJob can use any one device at a time. Depending on how long it
takes to boot the device, it is feasible to have a device offering
primary remote connections which is powered down between jobs.
A Primary Remote Connection is established by the dispatcher, and is
therefore constrained in the options which are available to the client
requesting the connection. The TestJob has no control over the
arguments passed to the connection.
Primary remote connections are affected by Security issues due
to the requirements of automation.
Devices supporting Primary Remote Connections
A device offering a primary remote connection needs a particular
configuration in the device dictionary table:
- Only primary remote connection deployment methods defined in the
deploy_methods parameter, e.g. ssh.
- Support in the device_type template to replace the list of deployment
methods with the list supplied in the deploy_methods parameter.
- No serial connection support in the boot connections list.
- No methods in the boot parameters.
- No Power Commands can be used in the device dictionary.
This prevents other jobs being submitted which would cause the device
to be rebooted or have a different deployment prepared. This can be
further enhanced with device tag support.
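The requirements listed above can be expressed as a simple check. This is a sketch only: the function name `primary_remote_problems` and the flat dictionary keys `boot_connections`, `boot_methods` and `power_commands` are hypothetical names chosen for this example (only `deploy_methods` appears in the text), and the set of methods treated as remote is assumed to be just ssh by default.

```python
def primary_remote_problems(device, remote_methods=("ssh",)):
    """Return reasons why a device does not fit the primary remote pattern.

    device: a plain dict using hypothetical keys, not a real device dictionary
    remote_methods: deployment methods treated as remote connection methods
    """
    problems = []
    deploy_methods = device.get("deploy_methods", [])
    if not deploy_methods:
        problems.append("deploy_methods is empty")
    extra = [method for method in deploy_methods if method not in remote_methods]
    if extra:
        problems.append("non-remote deployment methods: " + ", ".join(extra))
    if "serial" in device.get("boot_connections", []):
        problems.append("serial is listed in the boot connections")
    if device.get("boot_methods"):
        problems.append("boot methods are defined")
    if device.get("power_commands"):
        problems.append("power commands are defined")
    return problems
```

An empty list means the device description satisfies every item in the list; each string names one requirement that is not met.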
Hacking sessions can also be supported with primary remote
connections, depending on the use case.
Warning
Remember that in addition to issues related to the
Persistence of a primary remote connection device, hacking
sessions on primary remote connections also have all of the issues
of a shared access device - do not copy, prepare or download
sensitive data when using a shared access device.
SSH as the primary remote connection
Certain devices can support SSH as the primary remote connection - the
filesystems on such devices are not erased at the end of a TestJob and
provide Persistence for certain tasks. These devices declare
this support in the device configuration:
deploy:
  # primary remote connection device has only connections as deployment methods
  methods:
    ssh
boot:
  connections:  # not serial
  - ssh
TestJobs then use SSH as a boot method which simply acts as a login to
establish a connection:
- deploy:
    to: ssh
    os: debian
- boot:
    method: ssh
    connection: ssh
    failure_retry: 2
    prompts:
    - 'linaro-test'
    - 'root@debian:~#'
The deploy action in this case simply prepares the LAVA overlay containing
the test shell definitions and copies those to a pre-determined location on the
device. This location will be removed at the end of the TestJob. The os
parameter is specified so that any LAVA overlay scripts are able to pick up the
correct shell, package manager and other deployment data items in order to run
the lava test shell definitions.
Security
A primary SSH connection from the dispatcher needs to be controlled through the
device configuration, allowing the use of a private SSH key which is at least
hidden from test writers. (Only protect the essential components).
The key is declared as a path on the dispatcher, so is device-specific. Devices
on the same dispatcher can share the same key or may each have a unique key, as
long as all devices supported by the SSH host have the relevant keys configured
as authorized for login as root. In every case, the keys must not have a
passphrase.
LAVA provides a default (completely insecure) private key which can be used for
these connections. This key is installed within lava-dispatcher and is readable
by anyone inspecting the lava-dispatcher codebase in git. (This has not been
changed in the refactoring.)
It is conceivable that a test image could be suitably configured before being
submitted to LAVA, with a private key included inside a second job which
deploys normally and executes the connection instead of running a test
definition. However, anyone with access to the test image would still be able
to obtain the private key. Keys generated on a per job basis would still be
open for the lifetime of the test job itself, up to the job timeout specified.
While this could provide test writers with the ability to control the options
and commands used to create the connection, any additional security is minimal
and support for this has not yet been implemented.
See also the Considerations with a secondary connection for information on how access to devices is
managed.
Persistence
Devices supporting primary SSH connections have persistent deployments and this
has implications, some positive, some negative - depending on your use case.
- Fixed OS - the operating system (OS) you get is the OS of the device and
this must not be changed or upgraded.
- Package interference - if another user installs a conflicting package,
your test can fail.
- Process interference - another process could restart (or crash) a daemon
upon which your test relies, so your test will fail.
- Contention - another job could obtain a lock on a constrained resource,
e.g. dpkg or apt, causing your test to fail.
- Reusable scripts - scripts and utilities your test leaves behind can be
reused by (or can interfere with) subsequent tests.
- Lack of reproducibility - an artifact from a previous test can make it
impossible to rely on the results of a subsequent test, leading to wasted
effort with false positives and false negatives.
- Maintenance - using persistent filesystems in a test action results in
the overlay files being left in that filesystem. Depending on the size of
the test definition repositories, this could result in an inevitable
increase in used storage becoming a problem on the machine hosting the
persistent location. Changes made by the test action can also require
intermittent maintenance of the persistent location.
Only use persistent deployments when essential and always take great care
to avoid interfering with other tests. Users who deliberately or frequently
interfere with other tests can have their submit privilege revoked.
See Disposable chroot deployments for a solution to some of these issues.
Note, however, that the choice of operating system (and the versions of that
OS available) within the chroot is down to the lab admins, not the test
writer. The principal way to get full control over the deployment is to use a
Secondary Connection.
Disposable chroot deployments
Some devices can support mechanisms like LVM snapshots which allow for a
self-contained environment to be unpacked for a single session and then
discarded at the end of the session. These deployments do not suffer the same
entanglement issues as simple SSH deployments and can provide multiple
environments, not just the OS installed on the SSH host system.
This support is similar to how distributions can offer “porter boxes” which
allow upstream teams and community developers to debug platform issues in a
native environment. It also allows tests to be run on a different operating
system or different release of an operating system. Unlike distribution “porter
boxes”, however, LAVA does not allow more than one TestJob to have access to
any one device at the same time.
A device supporting disposable chroots will typically follow the
configuration of Devices supporting Primary Remote Connections. The device
will show as busy whenever a job is active. It is possible to use a
secondary connection as well, but the deployment methods of the device
would then have to disallow access to the media upon which the chroots
are installed or deployed, or upon which the software to manage the
chroots is installed. For example, a device offering disposable chroots
on SATA could offer ramdisk or NFS tests.
LAVA support for disposable chroots is implemented via schroot (forming the
replacement for the dummy-schroot device in the old dispatcher).
Typical device configuration:
deploy:
  # list of deployment methods which this device supports
  methods:
    ssh:
    schroot:
      - unstable
      - trusty
      - jessie
boot:
  connections:
    - ssh
Optional device configuration allowing secondary connections:
deploy:
  # list of deployment methods which this device supports
  methods:
    tftp:
    ssh:
    schroot:
      - unstable
      - trusty
      - jessie
boot:
  connections:
    - serial
    - ssh
The test job YAML would simply specify:
- deploy:
    to: ssh
    chroot: unstable
    os: debian
- boot:
    method: ssh
    connection: ssh
    failure_retry: 2
    prompts:
    - 'linaro-test'
    - 'root@debian:~#'
Note
The OS still needs to be specified; LAVA does not guess based on the chroot
name. There is nothing to stop an schroot being named testing but actually
being upgraded or replaced with something else.
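The relationship between the device configuration and the test job shown above can be sketched as a small consistency check. The function name `check_schroot_job` is hypothetical and this is not how LAVA validates jobs; it assumes the device configuration and the job actions have already been loaded into plain dictionaries with the same keys as the YAML samples.

```python
def check_schroot_job(device, deploy, boot):
    """Return a list of mismatches between a job and a device configuration.

    device: dict mirroring the device YAML (deploy.methods, boot.connections)
    deploy: dict mirroring the job's deploy action (to, chroot, os)
    boot: dict mirroring the job's boot action (connection)
    """
    errors = []
    methods = device["deploy"]["methods"]
    if deploy.get("to") not in methods:
        errors.append("deploy method not offered by the device")
    # A method declared with no value, e.g. "schroot:", loads as None.
    chroots = methods.get("schroot") or []
    if deploy.get("chroot") not in chroots:
        errors.append("chroot not offered by the device")
    if "os" not in deploy:
        # The OS is never guessed from the chroot name.
        errors.append("os must be specified")
    if boot.get("connection") not in device["boot"]["connections"]:
        errors.append("connection not offered by the device")
    return errors
```

With the typical device configuration above, a job requesting `chroot: unstable`, `os: debian` and `connection: ssh` produces no errors, while a job requesting an undeclared chroot name is reported.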
The deployment of an schroot involves unpacking the schroot into a logical
volume with LVM. It is an InfrastructureError Exception if this step
fails, for example if the volume group has insufficient available space.
schroot also supports directories and tarballs but LVM is recommended as it
avoids problems of Persistence. See the schroot man page for more information
on schroot. A common way to create an schroot is to use tools packaged with
sbuild or you can use debootstrap.