Error Management on Android — Part 1: Warm-up

Julián Ezequiel Picó
Apr 5, 2021

Hey there! I’m glad to meet you once again 😄

In this new series, we will talk about a very interesting topic in software development: error management. Much of what we’ll cover applies to almost any language or platform out there, but we will focus on Android. So, if you’re not an Android developer, don’t run away! This could be useful to you too 😉

The series will have 3 articles:

  • Part 1: Warm-up
  • Part 2: Error management key points
  • Part 3: Designing your error management system

Let’s start!

Why invest in Error Management?

Maybe the most important reason is that it’s hard. There is no single clear way to manage errors: every team (and every person) does it differently, following some predefined rules, but it’s not something we really care about.

Another important reason is that we don’t talk about it. There are a lot of articles and documentation about architecture, libraries, frameworks and tools, but just a few about how to manage errors.

Some time ago I read a really interesting article, The Value of Inconvenient Design, not related to our topic, but enlightening about two points:

  • We are obsessed with making things easier, removing any kind of friction we find out there. Sadly, we always end up having more difficulties, more friction.
  • The design decisions we make are important: is it necessary to work on making everything easier and automatic? What do we leave out in pursuit of removing all the friction?

Ok, but how is this related to error management? Well, I started to remember my first steps as an Android dev and, of course, my first fears: errors and how to deal with them. Some of my first instincts were “Try-Catch everything”, “Add null checks all over the code” and “Sweep this error under the carpet”: generalize and automate all the error handling so that failures are easy to avoid. Over the years, we find out that this is not the right way… or we don’t. So, here we are!

Management and Friction

When designing and developing our system/app, we face the unavoidable: errors. When we start our developer journey, we think that the way to deal with errors is to catch all of them and keep the app from crashing, removing all of the friction. So, we wrap everything in a try-catch and move on.
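
A minimal sketch of that catch-everything instinct in Java; `AgeParser` and `parseAge` are hypothetical names made up for illustration:

```java
// Hypothetical helper, invented for illustration.
class AgeParser {
    // Anti-pattern: every problem, expected or not, collapses into the same default value.
    static int parseAge(String rawAge) {
        try {
            return Integer.parseInt(rawAge.trim());
        } catch (Exception e) {
            // A NumberFormatException (bad user input) and a NullPointerException
            // (a bug in the caller) end up here together, and neither is reported.
            return 0;
        }
    }
}
```

The app never crashes, but `parseAge("abc")` and `parseAge(null)` both quietly return 0, and the rest of the code keeps working with an age that nobody entered.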

Well, that’s not good. At the very beginning it looks like a good approach, but it isn’t. When you catch everything like this, you lose the context of the error, such as:

  • Which is the real error? 🤔
  • Is it an error that I’m handling in my code, or is it, for example, a NullPointerException? 🤔🤔
  • How do I need to handle the error? Is there any other place where the error is already handled? Do I need to hide the failure? 🤔🤔🤔

Furthermore, a paradox appears: by hiding the error or handling it in the wrong way, we’re not only keeping that error in our code, we’re almost certainly causing new errors to appear somewhere else. If an error occurred, we must accept it and take the necessary measures.

At the other end, we can choose to have too much friction: let everything explode and don’t handle anything, because I’m the best developer who has ever existed and everything I write will work perfectly. Well, no comment on this, right? We’re all human and we all make mistakes. Please, don’t do it. You could say “by letting things explode, errors get found and fixed faster”. That’s not a lie, but:

  1. You can’t check every scenario and every potential error in your app, just because it’s impossible. Not only because of the different customizations, device capabilities or context conditions (like internet connection, low memory, etc.), but because we’re biased by our code.
  2. You don’t want to release an app that will for sure crash in production because you didn’t want to spend some time working on how to manage the errors. And, let me be clear, you will have errors. “Find and Fix Fast” is a good strategy only for the development stage.

So, what’s the right amount of friction to consider when developing and designing our system?

Right: error management. We will define it and talk about what we need to consider to be able to say “We’re doing things right”. But first, let’s warm up and set our baseline for the upcoming articles.

Warming up: setting the Baseline

So, if we want to talk about error management, we need to set a baseline. We will go over some concepts and define terms that will help us dive into the next 2 articles. Let’s talk about:

  • Errors.
  • Errors classification.
  • Failures.
  • Error Handling responsibility.
  • Fail Fast.
  • Exceptions.

Here we go 🚀

Errors

We can say that an error is a problem inside our system that produces an undesired result. When I talk about errors, I mean runtime errors, leaving compilation errors aside (we need to consider them, but we have a lot of tools for that, so we can keep them out of this definition).

Furthermore, we will also consider as an error anything the user can see that represents incorrect behaviour. Why? Because if we don’t, we are ignoring things like:

  • UI freeze while waiting for some submission to finish.
  • UI freeze while waiting for the UI to be rendered.
  • Information loss while using the app.
  • Long wait times while interacting with the app.
  • Etc, etc, etc.

Errors classification

I will make 3 big (big) groups for classification. This is just a way to summarize all the errors that can appear; it’s not a strict classification.

  • Business errors: they’re not a problem in my code or a technical issue, but something that, because of business definitions, is an error or something that cannot happen. For example, a bank account with no money cannot pay for a service.
  • Exceptional errors: they represent an exception in my system, something unexpected that can interrupt the normal execution flow of my app. In general, they’re associated with technical errors, but they can also be related to a communication failure with an external system, device capabilities, etc.
  • Other errors: here we have all the errors that don’t fit in the categories above. Some of them can be UX errors, an exception that doesn’t crash my app but triggers a wrong behaviour (e.g. exceeding the storage capacity of a Bundle), or endless waiting times.
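
One way to keep the first two groups apart in code is to give them distinct types. The sketch below reuses the bank account example; every class name in it is hypothetical, and it is only one possible modelling, not a recommended hierarchy. The third group often has no exception to model at all.

```java
// Business error: the code works, but a business rule forbids the operation.
class InsufficientFundsException extends RuntimeException {
    InsufficientFundsException(long balanceCents, long amountCents) {
        super("Balance of " + balanceCents + " cents cannot cover " + amountCents + " cents");
    }
}

// Hypothetical account, invented for illustration.
class BankAccount {
    private long balanceCents;

    BankAccount(long balanceCents) {
        this.balanceCents = balanceCents;
    }

    long balanceCents() {
        return balanceCents;
    }

    void payService(long amountCents) {
        if (amountCents > balanceCents) {
            throw new InsufficientFundsException(balanceCents, amountCents);
        }
        balanceCents -= amountCents;
    }
}

// Exceptional error: something unexpected and technical, here a failed
// call to an external system (the gateway itself is hypothetical).
class PaymentGatewayUnavailableException extends java.io.IOException {
    PaymentGatewayUnavailableException(Throwable cause) {
        super("Payment gateway could not be reached", cause);
    }
}
```

With separate types, a caller can react to a rejected payment (tell the user) differently from an unreachable gateway (retry, report).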

Failures

Why? Isn’t this an error? Well, no. We will use failure to describe the representation of an error that has been handled. In other words, if a component (A) detects an error in another component (B), A will propagate that error as a failure. It’s really important to map an error to a failure as fast as possible, so we avoid working with a wrong state.

We can compare them and say that:

  • The error is bad because it makes my system behave in the wrong way.
  • The failure is good because it shows an error instead of hiding it, allowing someone to handle the error responsibly.
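
As a rough sketch of the A/B description above, component A below catches the error raised by component B and hands it on as an explicit failure value. `Failure`, `Result`, `ProfileSource` and `ProfileRepository` are all hypothetical names, and a result type is just one of several ways to represent a failure.

```java
// Hypothetical representation of a handled error.
final class Failure {
    final String reason;
    final Throwable cause;

    Failure(String reason, Throwable cause) {
        this.reason = reason;
        this.cause = cause;
    }
}

// Hypothetical result type: holds either a value or a Failure.
final class Result<T> {
    private final T value;
    private final Failure failure;

    private Result(T value, Failure failure) {
        this.value = value;
        this.failure = failure;
    }

    static <T> Result<T> success(T value) {
        return new Result<>(value, null);
    }

    static <T> Result<T> failure(Failure failure) {
        return new Result<>(null, failure);
    }

    boolean isSuccess() {
        return failure == null;
    }

    T value() {
        return value;
    }

    Failure failure() {
        return failure;
    }
}

// Component B: a data source that may throw.
interface ProfileSource {
    String loadName(String userId) throws java.io.IOException;
}

// Component A: detects the error in B and maps it to a failure right away,
// so callers never continue with a half-loaded profile.
class ProfileRepository {
    private final ProfileSource source;

    ProfileRepository(ProfileSource source) {
        this.source = source;
    }

    Result<String> loadName(String userId) {
        try {
            return Result.success(source.loadName(userId));
        } catch (java.io.IOException e) {
            return Result.failure(new Failure("Profile could not be loaded", e));
        }
    }
}
```

The original exception travels inside the `Failure` as its cause, so the error is shown rather than hidden.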

Error Handling responsibility

Errors must be handled at the point where we have enough context to determine which actions we need to perform to recover from them.

Handling errors is one of a component’s many responsibilities. Just as we don’t want a component to do more than it has to, we don’t want it to handle errors it doesn’t know how to handle.

Some examples can be:

  • If we have an issue with some attributes in an object, who will handle it? A component responsible for setting up that object, or any other place in my system?
  • If we have a network-related issue, who will handle it? Some Repository, some Gateway or any Activity?
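
To make the network question concrete, here is one possible answer, sketched under the assumption that a repository owns both the network gateway and a cache. `ArticleGateway` and `ArticleRepository` are hypothetical names. The repository recovers when it has the context to do so and passes the error up when it doesn’t.

```java
// Hypothetical gateway: it knows how to talk to the network,
// not what should happen when the network is down.
interface ArticleGateway {
    String fetchTitle(int articleId) throws java.io.IOException;
}

// The repository knows about the cache, so it has enough context to recover.
class ArticleRepository {
    private final ArticleGateway gateway;
    private final java.util.Map<Integer, String> cache = new java.util.HashMap<>();

    ArticleRepository(ArticleGateway gateway) {
        this.gateway = gateway;
    }

    String title(int articleId) throws java.io.IOException {
        try {
            String fresh = gateway.fetchTitle(articleId);
            cache.put(articleId, fresh);
            return fresh;
        } catch (java.io.IOException networkError) {
            String cached = cache.get(articleId);
            if (cached != null) {
                return cached; // recovered here, the caller gets a usable value
            }
            throw networkError; // nothing to recover with: a higher layer must decide
        }
    }
}
```

An Activity calling `title` would only see the error when there is genuinely nothing to show, which is the case it is equipped to handle (for example, by displaying a message).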

Fail Fast

When we detect an error, we need to fail as fast as possible. Why? Because we must stop that error from propagating through our system.
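
A small sketch of the idea, assuming a hypothetical `Transfer` value object: the checks run in the constructor, so an invalid instance can never be created and passed around.

```java
// Hypothetical value object, invented for illustration.
final class Transfer {
    final String targetAccountId;
    final long amountCents;

    Transfer(String targetAccountId, long amountCents) {
        // Fail at construction time, before the invalid data can reach other components.
        if (targetAccountId == null || targetAccountId.trim().isEmpty()) {
            throw new IllegalArgumentException("targetAccountId must not be blank");
        }
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amountCents must be positive, was " + amountCents);
        }
        this.targetAccountId = targetAccountId;
        this.amountCents = amountCents;
    }
}
```

The exception points at the place where the bad data entered, instead of surfacing several layers later as a confusing symptom.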

Exceptions

Lastly, we have our beloved Exceptions. An exception is an event that changes the normal flow of execution in our app. At a higher level of abstraction, we can say it changes the communication flow between several components.

A key point about exceptions is their ability to represent an error. It sounds like something small, but it saves us from representing errors with ad hoc objects or primitive types (like ints or booleans). They don’t remove the effort of detecting, reporting and handling errors, but they’re a really useful abstraction that lets us work on each of those tasks independently.
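
To illustrate the contrast, the hypothetical `ConfigStore` below reads the same value twice: once signalling problems with a primitive, once with exceptions. It is a sketch for comparison, not a suggestion of how configuration should be read.

```java
// Hypothetical store, invented for illustration.
class ConfigStore {
    private final java.util.Map<String, String> values;

    ConfigStore(java.util.Map<String, String> values) {
        this.values = new java.util.HashMap<>(values);
    }

    // Primitive-based: -1 means "something went wrong". The caller cannot tell
    // a missing key from a malformed value, and nothing forces a check.
    int readPortOrMinusOne(String key) {
        String raw = values.get(key);
        if (raw == null) {
            return -1;
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Exception-based: each error has its own type, message and cause.
    int readPort(String key) {
        String raw = values.get(key);
        if (raw == null) {
            throw new java.util.NoSuchElementException("No value for key '" + key + "'");
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalStateException("Value for '" + key + "' is not a number: " + raw, e);
        }
    }
}
```

In the second version, detection happens inside `readPort`, while the decision about how to report or recover can live in whichever caller has the context for it.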

And that’s it for this article! Now we are familiar with a lot of useful concepts that will help us in the next article.

I hope you find this article helpful, and hope to see you again in the next one! 🙂

Any comments are welcome, see you! 👋
