rolfwr.net
Too much time has been spent arguing about error return codes vs exceptions already
Structured log formats and GELF are pretty neat, but that is beside the point
UX ≠ UI + animation
Let's look at a motivating example
Task: Write a program that reads two integers a and b and prints the result of a modulo b.
#include <iostream>
int main() {
    int a, b;
    std::cin >> a;
    std::cin >> b;
    int remainder = a % b;
    std::cout << remainder << std::endl;
}
What, if anything, is wrong with this code?
— Steven Wright, comedian
No, this is not an Einstein quote, despite what you read on the internet.
How would you deal with these reported issues?
Author: Fails when using Chicago Manual of Style which spells out single digits. | FR/UX/CC |
Astronomer: Sun to Saturn distance in meters gives wrong AU remainder | B/UX/FR |
Mathematician: I need a domain and codomain of ℤ, not significand × 2^exponent | UX/CC |
Mathematician: Sign should be equal to divisor, not dividend. See Mathematica. | Doc/FR |
Dinosaur: Sometimes crashes and reboots our salary processing machine | B/RP |
Tester: Inconsistent error messages under low memory conditions | Doc |
Junior dev: Unable to reproduce reported crash. Can we add logging? | CC |
Trekkie: The core should only be dumped in the event of a warp core breach | Joke |
UI designer: Technical error messages are confusing and scary. Hide them. | UX |
UX designer: User would benefit from seeing preview of results while he types | UX/RP |
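The mathematician's request for a remainder whose sign follows the divisor can be sketched as below. This is a minimal illustration, not part of the original program: `floor_mod` is a hypothetical helper name, and it assumes `b` is nonzero and that the inputs are not the `INT_MIN` and `-1` pair, since `%` is undefined for both cases in C++.

```cpp
// Hypothetical helper: remainder that takes the sign of the divisor
// (floored modulo). Assumes b != 0 and excludes a == INT_MIN with b == -1,
// because the built-in % is undefined for those inputs.
int floor_mod(int a, int b) {
    int r = a % b;  // built-in %: the result takes the sign of the dividend
    if (r != 0 && ((r < 0) != (b < 0))) {
        r += b;     // move the result into the divisor's sign range
    }
    return r;
}
```

With this sketch, `floor_mod(-7, 3)` gives 2 where the built-in `-7 % 3` gives -1.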
When a failure occurs, abort the directly affected activity. Let other isolated activities continue running.
Most modern OSes provide process isolation.
Service managers like Linux systemd, macOS launchd, and the Windows Service Control Manager provide the ability to restart service processes if they fail.
If an application performs many independent tasks, it often makes sense to let unaffected tasks run to completion even if one of the tasks fails.
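Inside a single process, the same idea can be sketched by giving each independent task its own error boundary. The `task` struct and `run_all` function below are hypothetical names chosen for illustration, and the sketch assumes tasks share no state that a failure could leave half-updated.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical task record: a name plus the work to perform.
struct task {
    std::string name;
    std::function<void()> run;
};

// Runs every task. A failure aborts only the task that raised it;
// the remaining tasks still run. Returns one "name: reason" entry
// per failed task so the caller can report them.
std::vector<std::string> run_all(const std::vector<task>& tasks) {
    std::vector<std::string> failures;
    for (const task& t : tasks) {
        try {
            t.run();
        } catch (const std::exception& e) {
            failures.push_back(t.name + ": " + e.what());
        } catch (...) {
            failures.push_back(t.name + ": unknown error");
        }
    }
    return failures;
}
```

If the second of three tasks throws `std::runtime_error("disk full")`, the first and third still complete and the returned list holds a single entry for the second.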
int acquire_int() {
    while (true) {
        try {
            return request_int();
        } catch (const parse_error& err) {
            print_error(err.what(), err.state);
        }
        std::cout << "Try entering an integer value again." <<
            std::endl;
    }
}
int main() {
    int a = acquire_int();
    int b = acquire_int();
    if (b == 0) {
        std::cout << "Remainder undefined when dividing by "
            "zero." << std::endl;
    } else {
        int remainder = a % b;
        std::cout << remainder << std::endl;
    }
}
Imagine a modern HTTP server running as a service
Memory and data corruption side effects are very hard to debug when the visible failure occurs long after the initial error.
For low-level code, the most useful behavior is to fail fast on error.
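A minimal sketch of failing fast in low-level code: the routine checks its precondition and refuses to continue, rather than writing outside the buffer and letting the damage surface somewhere else later. `store_sample` is a hypothetical function used only for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical low-level routine: refuses to write outside the buffer.
// Throwing at the point of the mistake keeps the failure next to its cause.
void store_sample(std::vector<int>& samples, std::size_t index, int value) {
    if (index >= samples.size()) {
        throw std::out_of_range("store_sample: index " + std::to_string(index) +
                                " is outside buffer of size " +
                                std::to_string(samples.size()));
    }
    samples[index] = value;
}
```

The error message carries the offending index and the buffer size, so the outer layers do not have to guess what went wrong.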
Only do retries at the outermost context. In interactive applications, the user should be kept in the loop.
Implementing retry logic at multiple layers causes unwanted delay amplification and catastrophic cascades.
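The amplification is easy to underestimate, so here is a back-of-the-envelope sketch. It assumes every layer independently retries a failing call the same number of times; the function and parameter names are illustrative.

```cpp
// Worst-case number of calls that reach the innermost operation when each
// of `layers` nested layers makes up to `attempts_per_layer` attempts.
// Attempts multiply across layers instead of adding up.
long long worst_case_attempts(int layers, int attempts_per_layer) {
    long long total = 1;
    for (int i = 0; i < layers; ++i) {
        total *= attempts_per_layer;
    }
    return total;
}
```

Three layers with three attempts each hit the failing resource up to 27 times. If each attempt waits one second for a timeout, the user waits up to 27 seconds instead of 3.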
Duh.
— Anonymous Al-Anon attendee
(probably)
Still not an Einstein quote, despite what the internet says.
As you abort an activity and pass the error condition up to outer contexts, don't throw away the information known only to the inner contexts.
Each layer of context can add to the description of the error condition, with the outer layers contributing the higher-level information.
Context information is frequently lost when propagating the error condition from the lower levels of a system to the higher levels of the system.
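One way to let each layer add its own context without discarding the inner details is the standard library's nested exceptions. This is a sketch of that alternative technique, not the approach used by the `parse_error` type in these slides; `parse_port`, `load_port` and `describe` are hypothetical names.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical innermost layer: knows only that the text is not a number.
int parse_port(const std::string& text) {
    if (text.empty() ||
        text.find_first_not_of("0123456789") != std::string::npos) {
        throw std::runtime_error("Expected integer digit in '" + text + "'");
    }
    return std::stoi(text);
}

// Hypothetical outer layer: knows which file the value came from.
int load_port(const std::string& file, const std::string& text) {
    try {
        return parse_port(text);
    } catch (const std::exception&) {
        // Wrap the active exception instead of replacing it.
        std::throw_with_nested(std::runtime_error("Could not load " + file));
    }
}

// Flattens the chain into one message, outermost context first.
std::string describe(const std::exception& e) {
    std::string text = e.what();
    try {
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        text += ": " + describe(inner);
    } catch (...) {
        text += ": unknown error";
    }
    return text;
}
```

Calling `load_port("server.conf", "80a")` and passing the caught exception to `describe` yields a message that names both the file and the offending text.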
int parse_int(parser_state& state) {
    skip_whitespace(state);
    int sign = parse_optional_sign(state);
    auto digit = parse_digit(state);
    if (!digit) {
        throw parse_error("Expected integer digit.", state);
    }
    const int int_min = std::numeric_limits<int>::min();
    const int int_max = std::numeric_limits<int>::max();
    int value = 0;
    do {
        int step = sign * digit.value();
        // Check the range before multiplying: signed overflow is undefined behavior.
        if (sign > 0 ? value > (int_max - step) / 10
                     : value < (int_min - step) / 10) {
            std::ostringstream oss;
            oss << "Only integers between " << int_min <<
                " and " << int_max << " are supported.";
            throw parse_error(oss.str().c_str(), state);
        }
        value = value * 10 + step;
        digit = parse_digit(state);
    } while (digit);
    // Allow trailing spaces, but not spaces between digits.
    skip_whitespace(state);
    if (state.pos != state.line.size()) {
        throw parse_error("Unexpected character.", state);
    }
    return value;
}
struct parser_state {
    std::string line;
    size_t pos;
};
struct parse_error : public std::runtime_error {
    parser_state state;
    parse_error(const char* what,
                const parser_state& error_state)
        : std::runtime_error(what), state(error_state)
    {
    }
};
What do we do when we're forced to implement an interface that does not allow us to report back all the error information that we have?
Side channels!
Examples of standard side channels:
Side channel | Where it is used |
---|---|
errno | POSIX API |
SetLastError() | Windows Win32 API |
Log files | Common practice |
Process dump files | Common OS mechanism |
Thread Local Storage is often used to create ad-hoc side-channels.
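A minimal sketch of such an ad-hoc side channel, in the spirit of `errno`. Everything here is hypothetical: the variable `last_error_detail`, the function `is_valid_percentage`, and the premise that its `bool` return type is dictated by an interface we cannot change.

```cpp
#include <string>

// Hypothetical side channel: the most recent error detail on this thread.
thread_local std::string last_error_detail;

// Assume this signature is fixed by someone else's interface:
// it can only report success or failure.
bool is_valid_percentage(int value) {
    if (value < 0 || value > 100) {
        // The interface has no room for this, so it goes through the side channel.
        last_error_detail = "Value " + std::to_string(value) +
                            " is outside the range 0 to 100.";
        return false;
    }
    last_error_detail.clear();  // unlike errno, this sketch resets on success
    return true;
}
```

A caller that sees `false` can read `last_error_detail` on the same thread to recover the explanation the return value could not carry.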
A unique identifier given to an ongoing activity and included in both the main reporting channel and the side channels, so that information passed through each channel can be correlated.
Pass correlation IDs across network boundaries.
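A small sketch of the idea: one ID is generated per activity and then stamped on both the user-facing message and the developer-facing log line. All three function names are hypothetical, and the counter-based ID is an assumption made for brevity; IDs that cross machine boundaries usually need to be random or UUID-like to stay unique.

```cpp
#include <atomic>
#include <string>

// Hypothetical ID source: a process-wide counter.
std::string next_correlation_id() {
    static std::atomic<unsigned long> counter{0};
    return "req-" + std::to_string(++counter);
}

// Main channel: short text shown to the user, carrying the ID.
std::string user_message(const std::string& id) {
    return "Something went wrong. Please quote reference " + id +
           " when contacting support.";
}

// Side channel: detailed log line for developers, carrying the same ID.
std::string log_line(const std::string& id, const std::string& detail) {
    return "[" + id + "] " + detail;
}
```

When a user reports the reference from the message, a developer can search the logs for the same string and find the technical details.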
Identifiers you can paste into Google to find other people complaining about the same problem.
UX | The User Experience: what users who just want to get on with their work with as little fuss as possible experience when an error occurs. |
---|---|
DX | The Developer Experience: what developers experience when they try to diagnose and possibly eliminate errors that have been reported. |
Don't confuse the two. Don't communicate with the user through log files.
Translation: The UI layer has no idea what happened.
Good UX often requires improving context capture.
void print_error(const std::string& message,
                 const parser_state& state)
{
    std::cerr << message << std::endl;
    // Indent the input line by four spaces so the caret below lines up with it.
    std::cerr << "    " << state.line << std::endl;
    std::cerr << std::string(4 + state.pos, ' ') << "^" <<
        std::endl;
}
Uninitialized, incomplete or unfinished are common states which users don't consider an error.
— Anonymous
— Albert Einstein
Make the best of what you've got.
Don't close down the restaurant when you run out of parsley
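As a sketch of the parsley principle in code: when a nice-to-have input is missing, substitute something reasonable and still deliver the essential result. The function `render_profile`, the placeholder file name and the HTML shape are all hypothetical.

```cpp
#include <optional>
#include <string>

// Hypothetical page fragment: the name is essential, the avatar is garnish.
std::string render_profile(const std::string& name,
                           const std::optional<std::string>& avatar_url) {
    // A missing avatar is the parsley: serve the dish with a placeholder.
    const std::string image = avatar_url.value_or("placeholder.png");
    return "<img src=\"" + image + "\"> " + name;
}
```

Passing `std::nullopt` for the avatar still produces a usable profile, just with the placeholder image.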
void skip_whitespace(parser_state& state) {
    while (state.pos < state.line.size() && state.line[state.pos] == ' ') {
        ++state.pos;
    }
}
int parse_optional_sign(parser_state& state) {
    if (state.pos < state.line.size() && state.line[state.pos] == '-') {
        ++state.pos;
        return -1;
    }
    return 1;
}
std::optional<int> parse_digit(parser_state& state) {
    if (state.pos < state.line.size()) {
        char c = state.line[state.pos];
        if (c >= '0' && c <= '9') {
            ++state.pos;
            return c - '0';
        }
    }
    return std::nullopt;
}
#include <cstddef>    // size_t
#include <iostream>
#include <ostream>
#include <limits>
#include <string>
#include <optional>
#include <sstream>
#include <stdexcept>  // std::runtime_error
int request_int() {
    parser_state state {};
    std::getline(std::cin, state.line);
    return parse_int(state);
}
Context Capture behavior effectively becomes part of the API as soon as outer layers start depending on it.