Traditionally, debugging errors during image builds requires a developer to scatter print commands throughout the build commands in order to reason about the state of the system before the failure occurs. This can be slow and cumbersome.
Earthly provides an interactive mode that gives you access to a root shell when an error occurs; this guide covers how to use it.
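Conceptually, interactive mode behaves like a wrapper that runs a build command and, if the command fails, opens a shell in the same environment before reporting the failure. The Python sketch below illustrates that idea only. It is not Earthly's implementation, and `run_with_debug_shell` and its `open_shell` parameter are hypothetical names.

```python
import subprocess
import sys


def run_with_debug_shell(command, interactive=False, open_shell=None):
    """Run a shell command; on failure, optionally open a shell for inspection.

    Returns the exit code of `command`, whether or not a shell was opened.
    """
    if open_shell is None:
        # Assumed default: an interactive /bin/sh attached to this terminal.
        def open_shell():
            subprocess.call(["/bin/sh"])

    result = subprocess.run(command, shell=True)
    if result.returncode != 0:
        print(
            f"Command {command!r} failed with exit code {result.returncode}",
            file=sys.stderr,
        )
        if interactive:
            # Inspect the environment exactly as the failed command left it.
            open_shell()
    return result.returncode
```

The original exit code is returned even after the shell closes, so anything fixed by hand inside the shell does not turn a failed command into a successful one.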
Let's consider a test example that prints out a randomly generated phrase:
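A minimal sketch of such a script, assuming a word-level Markov chain built with only the standard library. The helper names `build_chain`, `generate_phrase`, and `main` are illustrative and are not taken from the guide's actual generate_phrase.py; only the `sherlock.txt` file name comes from the guide.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that directly follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate_phrase(chain, length=12, rng=random):
    """Walk the chain from a random starting word for at most `length` words."""
    word = rng.choice(list(chain))
    phrase = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            # The current word was never followed by another word in the corpus.
            break
        word = rng.choice(followers)
        phrase.append(word)
    return " ".join(phrase)


def main(path="sherlock.txt"):
    # A relative path is resolved against the current working directory,
    # so this raises FileNotFoundError if the corpus is stored elsewhere.
    with open(path) as corpus:
        text = corpus.read()
    print(generate_phrase(build_chain(text)))
```

A real script would end by calling `main()`. Because the default path is relative, that call only succeeds when the corpus sits in the directory the script is run from.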
Now we can run it with earthly +test, and we'll see a failure has occurred:
=========================== FAILURE ===========================
+test *failed* | --> RUN python3 generate_phrase.py
+test *failed* | Traceback (most recent call last):
+test *failed* | File "generate_phrase.py", line 3, in <module>
+test *failed* | text = open('sherlock.txt').read()
+test *failed* | FileNotFoundError: [Errno 2] No such file or directory: 'sherlock.txt'
+test *failed* | Command /bin/sh -c python3 generate_phrase.py failed with exit code 1
+test *failed* | ERROR: Command exited with non-zero code: RUN python3 generate_phrase.py
Error: solve side effects: solve: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c /usr/bin/earth_debugger /bin/sh -c 'python3 generate_phrase.py']: buildkit-runc did not terminate successfully
Why can't it find the sherlock.txt file? Let's re-run earthly with the --interactive (or -i) flag: earthly -i +test
This time we see a slightly different message:
+test | --> RUN python3 generate_phrase.py
+test | Traceback (most recent call last):
+test | File "generate_phrase.py", line 3, in <module>
+test | text = open('sherlock.txt').read()
+test | FileNotFoundError: [Errno 2] No such file or directory: 'sherlock.txt'
+test | Command /bin/sh -c python3 generate_phrase.py failed with exit code 1
+test | Entering interactive debugger (**Warning: only a single debugger per host is supported**)
+test | root@buildkitsandbox:/code#
This time, rather than exiting, Earthly drops us into an interactive root shell inside the container of the build environment. From this shell we can execute arbitrary commands within the container to figure out the problem:
root@buildkitsandbox:/code# ls
generate_phrase.py
root@buildkitsandbox:/code# find / | grep sherlock.txt
/sherlock.txt
root@buildkitsandbox:/code# ls /
bin boot code dev etc home lib lib64 media mnt opt proc root run sbin sherlock.txt srv sys tmp usr var
root@buildkitsandbox:/code# ls /sherlock.txt
/sherlock.txt
Ah ha! The corpus text file is located in the root directory rather than under /code. We can try moving it manually to see if that fixes the problem:
root@buildkitsandbox:/code# mv /sherlock.txt /code/.
root@buildkitsandbox:/code# python3 generate_phrase.py
I struck him down with the servants and with the lantern and left a fragment in the midst of my work during the last three years, although he has cruelly wronged.
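The underlying cause is that open('sherlock.txt') uses a relative path, which is resolved against the current working directory (/code here) and not against the filesystem root. The sketch below reproduces the situation in a throwaway temporary directory rather than a real build container; `can_open` and `demo` are illustrative helpers, not part of the guide's code.

```python
import os
import tempfile
from pathlib import Path


def can_open(path):
    """Return True if `path` can be opened for reading, False if it is missing."""
    try:
        with open(path):
            return True
    except FileNotFoundError:
        return False


def demo():
    """Mimic the layout from the session: corpus in the root, script run from code/."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sherlock.txt").write_text("corpus")
        workdir = root / "code"
        workdir.mkdir()
        os.chdir(workdir)
        try:
            # The relative name is looked up in code/, where the file is absent.
            before = can_open("sherlock.txt")
            # Equivalent of `mv /sherlock.txt /code/.` in the debugging session.
            os.replace(root / "sherlock.txt", workdir / "sherlock.txt")
            after = can_open("sherlock.txt")
        finally:
            os.chdir(original_cwd)
    return before, after


print(demo())  # prints (False, True)
```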
At this point we know what needs to be done to fix the test, so we can type exit (or press Ctrl-D) to leave the interactive shell.
+test | time="2020-09-16T22:23:53Z" level=error msg="failed to read from ptmx: read /dev/ptmx: input/output error"
+test | time="2020-09-16T22:23:53Z" level=error msg="failed to read data from conn: read tcp 127.0.0.1:36672->127.0.0.1:5000: use of closed network connection"
+test | ERROR: Command exited with non-zero code: RUN python3 generate_phrase.py
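Note that even though we fixed the problem during debugging, changes made inside the interactive shell are not saved to the image, so we must go back to our Earthfile and fix the problem there, making sure sherlock.txt ends up under /code, where the script looks for it.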
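The interactive debugger is also useful for integration tests that start Docker containers during the build. Let's fire up our integration test with earthly -P -i +test (the -P flag allows privileged mode, which running Docker inside the build requires):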
buildkitd | Found buildkit daemon as docker container (earthly-buildkitd)
+base | --> FROM python:3
context | --> local context .
+base | resolve docker.io/library/python:3@sha256:e9b7e3b4e9569808066c5901b8a9ad315a9f14ae8d3949ece22ae339fff2cad0 100%
context | transferring .: 100%
+base | *cached* --> WORKDIR /code
+server | *cached* --> COPY server.py .
+test | --> FROM docker:19.03.12-dind
+test | resolve docker.io/library/docker:19.03.12-dind@sha256:674f1f40ff7c8ac14f5d8b6b28d8fb1f182647ff75304d018003f1e21a0d8771 100%
+test | *cached* --> RUN apk add curl
+test | --> WITH DOCKER RUN docker run --rm -d --network=host server:latest python3 server.py && sleep 5 && curl -s localhost:8000 | grep hello
+test | Loading images...
+test | Loaded image: server:latest
+test | ...done
+test | 1dc054c647cb75bde4897a2828edb095739cb9f864ed203ed2ddb54e62554aad
+test | Command /bin/sh -c docker run --rm -d --network=host server:latest python3 server.py && sleep 5 && curl -s localhost:8000 | grep hello failed with exit code 1
+test | Entering interactive debugger (**Warning: only a single debugger per host is supported**)
There was a failure checking that the server output contained the string hello; let's see what is going on:
/ # docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b8a31c54dd17 server:latest "python3 server.py" 5 seconds ago Up 4 seconds frosty_rhodes
The good news is our server container is running; let's see what happens when we try to connect to it:
/ # curl -s localhost:8000
Hello, world!/
Ah ha! The problem is that our test greps for hello with a lowercase h, while the server responds with an uppercase H, so we can fix our grep to look for Hello instead.
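grep matches case-sensitively by default, so the pattern hello never matches the response Hello, world!, and grep exits with a non-zero code when nothing matches. The Python sketch below mimics that behavior to show why the original check fails and the corrected one passes. The `grep` function is a simplified stand-in for the real tool, and its `ignore_case` option corresponds to grep -i, an alternative fix that this guide does not use.

```python
import re


def grep(pattern, text, ignore_case=False):
    """Return the lines of `text` that match `pattern`, like a basic grep."""
    flags = re.IGNORECASE if ignore_case else 0
    return [line for line in text.splitlines() if re.search(pattern, line, flags)]


# Body returned by curl in the debugging session.
response = "Hello, world!"

print(grep("hello", response))                    # prints []
print(grep("Hello", response))                    # prints ['Hello, world!']
print(grep("hello", response, ignore_case=True))  # prints ['Hello, world!']
```

Matching the exact string keeps the test strict about the server's output, whereas a case-insensitive match would also accept responses the test author may not have intended.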