
Functions

→ Exercises

Try these exercises BEFORE reading so you know where your gaps lie and to generate 'need' for knowledge so it doesn't just seem like a bunch of useless trivia. If you can answer all of these without referring to anything, you can skip this entirely.

→ Make a function

Make a function that takes a name as an argument and returns "Hello, NAME!". That is,

greeter("Brie")
# "Hello, Brie!"

→ Early return

Make a function that takes a name as an argument and returns "Hello, NAME!" unless that name is Kai, in which case it should return "...":

greeter_2("Brie")
# "Hello, Brie!"
greeter_2("Kai")
# "..."

Use an early return in your solution.

→ Functions as an argument

Use sapply to greet a whole vector of people (that is, people <- c("Kai", "Gjenni", "Paula")). Do it two ways: one with an anonymous function, one with a named function.

→ Code comprehension: Early returns

Given the following code (do not run this code, only think about it in your head):

early_returns <- function(number) {
    if (number > 100) {
        return("a")
    }
    print("b")

    if (number < -100) {
        return("c")
    }

    print("d")
    return("e")
    "f"
}

What is the output (including print output) of...

early_returns(2025)
early_returns(2)
early_returns(-2025)

→ Code comprehension: Argument order, named/unnamed, defaults

If we have the following code (again, don't run it, just think it):

argument_order <- function(a = 1, b = 2, c = 3, d = 4) {
    c(a, b, c, d)
}

What is the output of:

argument_order()
argument_order(20)
argument_order(b = 11)
argument_order(b = 11, 8, 91)
argument_order(b = 20, 99, d = 100, a = 4)

→ Code comprehension: Lexical scoping

What are the outputs (if any) (again, just think it)?

special_number <- 3

lexical_scoping <- function(n) {
    special_number <- n
    special_name <- "gerald"
    special_number
}

lexical_scoping_2 <- function(n) {
    special_number + n
}

lexical_scoping(10)   # Output A
special_number        # Output B
special_name          # Output C
lexical_scoping_2(10) # Output D

→ Challenge 1

We didn't need to use sapply due to something called 'vectorization'. Can you find a way to do exercise 3 without sapply? You will not need to add any additional code, just rearrange and remove code.

→ Challenge 2

We have talked about another feature in base R (that is, no packages required) that is just syntactic sugar for something else. What is it?

→ Why functions?

Things we do in life - play the guitar, run, perform a western blot - are complex sequences of events we give shorthand names. It would be as arduous as it would be insane to say "Today I picked up my left leg and then moved it forward and then put it down very quickly and then..." instead of just saying "Today I ran." You can modify how you use these verbs: with the 'leg moving' example, if I wanted to say that I ran farther (2km rather than 1km), I'd just have to repeat myself (left leg, right leg, left leg, right leg...) for twice as long. But with our shorthand, we can just say "I ran 1km" or "I ran 2km".

In programming, functions offer us similar benefits: they provide brevity and clarity through succinct names that represent a bundle of actions, and they can be modified by various arguments that change the results of the function.

You've already used a lot of functions. Anything that does something in R is a function - for instance, DESeq() is a function. Today we're going to make our own functions.

→ Syntax

To write a function, you need to know the format the language requires - the syntax. It varies based on the language, but for R, it's this:

my_function_name <- function() {
    "Wow a function :)"
}

Let's look at it bit by bit:

my_function_name

Choose something short and descriptive (this is super hard). Function names generally follow the same rules and best practices as variable names. (Best practices/requirements: separate words with an underscore (aka snake_case, not camelCase), and use lower case. Function names can't start with numbers, but can have numbers in them (3_cats doesn't work but cats_3 does). No spaces or dashes ('-') allowed.)

<-

Assignment operator: It says to store this function in my_function_name

How is `<-` pronounced?

'gets'

If you're reading "x <- 4", you'd say "x gets four"

function(){}

The function keyword, parentheses (for arguments - we'll talk about that later) and the body (everything between {}). Put this in there so R knows it's a function. This is set in stone.

"Wow a function :)"

The body. This is all the code that will be run every time the function is used (when the function is 'called')
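To make 'calling' concrete, here is a minimal sketch using the function from this section. The variable name `result` is only there for illustration.

```r
# Define the function from this section
my_function_name <- function() {
    "Wow a function :)"
}

# Call it by putting parentheses after the name; the body runs and gives back its value
result <- my_function_name()
result
# "Wow a function :)"

# Without the parentheses you get the function itself, not its result
is.function(my_function_name)
# TRUE
```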

→ Arguments/Parameters

Functions can also have parameters:

add_1 <- function(x) {
    x + 1
}

subtract_y_from_x <- function(x, y) {
    x - y
}
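As a quick sketch of how these are used, the values you put between the parentheses when calling are matched to the parameters in order. The specific numbers below are arbitrary examples.

```r
add_1 <- function(x) {
    x + 1
}

subtract_y_from_x <- function(x, y) {
    x - y
}

# 5 is handed to x
add_1(5)
# 6

# 10 is handed to x, 3 is handed to y
subtract_y_from_x(10, 3)
# 7
```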

Aside: Parameters vs Argument Terminology

'Parameters' are 'stand-in' names you use when defining the function:

adder <- function(param_1, param_2) {
    param_1 + param_2
}

'Arguments' are the things you supply when you use (call) the function:

arg_1 <- 1
arg_2 <- 2
adder(arg_1, arg_2)
# 3

I sometimes slip up and call one the other, but it usually doesn't matter too much.

Parameters can have default values. Here the default value is specified between the parentheses. If you don't set x, it will use 0.

add_1 <- function(x = 0) {
    x + 1
}

add_1()
# 1

Defaults are nice when you might want to change something later, but there's a value that's generally pretty good. I'd recommend you only make a value default if it wouldn't be catastrophic (or, even worse, subtly wrong) if you forget to change it. This is one of those judgment-call things.

If you have a function with multiple defaults, you can set one, the other, both, or neither:

subtract_y_from_x <- function(x = 3, y = 4) {
    x - y
}

subtract_y_from_x()
# -1

subtract_y_from_x(x = 5)
# 1

subtract_y_from_x(y = 10)
# -7

subtract_y_from_x(x = 5, y = 10)
# -5

If you don't specify a parameter name, the arguments will be matched 'positionally':

subtract_y_from_x(5, 10) # x = 5, y = 10
# -5

subtract_y_from_x(10, 5) # x = 10, y = 5
# 5

LOOK OUT! Named arguments before positional arguments.

What?

Long story short: If you mix named and positional arguments, anything beyond your first named argument should also be named.

my_function(a = 1, b = 2, c = 3) # Good, robust
my_function(1, 2, 3) # Not robust, common (I do this a lot)
my_function(1, b = 2, c = 3) # Common as well, usually fine.
my_function(1, b = 2, 3) # Do not do this! Will work but DANGEROUS!!
my_function(b = 2, 1, 3) # Do not do this! Will work but DANGEROUS & CONFUSING!!
my_function(b = 2, 3, a = 1) # What are you DOING

Why?

If you explicitly set arguments with their parameter names, those parameters will be 'taken out of the running' for positional argument matching.

This is explained best by an example:

say_stuff <- function(a, b, c) {
    print(a)
    print(b)
    print(c)
}

say_stuff("I'm first", "I'm second", "I'm third")
# "I'm first"
# "I'm second"
# "I'm third"

say_stuff(
  b = "I'm in the first position, printed second",
  "I'm in the second position, printed first",
  "I'm in the third position, printed third"
)
# "I'm in the second position, printed first"
# "I'm in the first position, printed second"
# "I'm in the third position, printed third"

In the second example, b is taken out of the running, so the remaining parameters are a and c, and they will be filled in that order. Since the next argument ("I'm in the second position, printed first") is positional (it is not named), it fills the first available parameter (a and c are available, a is first, so it fills that one). The last argument then fills c.
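The same matching rule can be checked against the call styles listed in this box. This is only a sketch: `describe_args` is a hypothetical helper invented for illustration, and it simply reports which value each parameter ended up with.

```r
# Hypothetical helper (not from the text above): reports what each parameter received
describe_args <- function(a, b, c) {
    paste0("a=", a, ", b=", b, ", c=", c)
}

describe_args(1, 2, 3)
# "a=1, b=2, c=3"

# b is claimed by name, so the unnamed 1 and 3 fill a and c, in that order
describe_args(b = 2, 1, 3)
# "a=1, b=2, c=3"

# b and a are claimed by name, so the unnamed 3 can only go to c
describe_args(b = 2, 3, a = 1)
# "a=1, b=2, c=3"

# Same rule, surprising result: 3 fills a, and the trailing 1 fills c
describe_args(3, b = 2, 1)
# "a=3, b=2, c=1"
```

All four calls run without error, which is exactly why the mixed styles are risky: nothing warns you when a value lands in a parameter you didn't intend.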

→ Why specify parameter names

Technically, the best way to write code is to always specify the parameter name, rather than using its 'position'. That is, it's technically better to do this:

add_two_things(x = 2, y = 4)
# 6

Rather than this:

add_two_things(2, 4)
# 6

There are a variety of reasons to prefer this.

→ Readability

It's easier to read what the function is doing, since the parameter names can describe the arguments (see the 'Parameters vs Argument Terminology' aside).

Suppose you stumble across code with this function:

repeater("Hello how are you", "o", 2)

It's difficult to tell what this function does. It can be partly solved with a better function name, but often that can only do so much - particularly if the function has a lot of parameters. You could just run it to see what it chugs out, but that's not always an option and sometimes that doesn't tell you much. This is a bit better:

repeater(
  input = "Hello how are you",
  letter_to_repeat = "o",
  times_to_repeat = 2
)

While this might not be enough for you to know what the output is instantly (nothing beats documentation), it can give you a good guess and is usually enough if you're reading back code you wrote previously.
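For the curious, here is one guess at what such a function might look like. The body is entirely an assumption made for illustration (the text above never defines `repeater`); it replaces every occurrence of the letter with that letter repeated the given number of times.

```r
# Hypothetical implementation: one guess at what `repeater` might do
repeater <- function(input, letter_to_repeat, times_to_repeat) {
    # Build the replacement, e.g. "o" repeated 2 times is "oo"
    replacement <- strrep(letter_to_repeat, times_to_repeat)
    # fixed = TRUE treats letter_to_repeat as literal text, not a regular expression
    gsub(letter_to_repeat, replacement, input, fixed = TRUE)
}

repeater(
    input = "Hello how are you",
    letter_to_repeat = "o",
    times_to_repeat = 2
)
# "Helloo hoow are yoou"
```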

→ Robust to function changes

If the function changes (maybe by you, maybe by the package writer if you updated the package) it will either error if the parameter name changed (which is good! we want our misconceptions to be loud!) or still likely work as intended if just the parameter order changed.

Suppose you've written this function:

number_incrementer <- function(numbers, number_to_increment = 1) {
    # Takes a vector of numbers, adds 1 only to those equal to number_to_increment
    ifelse(numbers == number_to_increment, numbers + 1, numbers)
}

And you call it like this:

number_incrementer(c(1, 2, 3, 2, 4, 2), 2)
# 1 3 3 3 4 3

Then you decide you want to allow for the user to select how MUCH the number should change by:

number_incrementer <- function(numbers, amount_to_add = 1, number_to_increment = 1) {
    # Takes a vector of numbers, adds `amount_to_add` to every number that equals `number_to_increment`
    ifelse(numbers == number_to_increment, numbers + amount_to_add, numbers)
}

If you forget to update your code (verrrry easy to do, particularly if you used your function in many different places), your previous code will now do this:

number_incrementer(c(1, 2, 3, 2, 4, 2), 2)
# 3 2 3 2 4 2

Note that it will not error, but it will give us a different result. Not what we wanted! We just wanted to make the function more flexible, not change the results.

If you wrote your code like this, it will still work as expected:

number_incrementer(
  numbers = c(1, 2, 3, 2, 4, 2),
  number_to_increment = 2
)
# 1 3 3 3 4 3
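The other case mentioned at the start of this section, a parameter being renamed, can be sketched the same way. Everything here is an assumption for illustration: the rename from `numbers` to `values` is hypothetical, and the `ifelse()` body is just one plausible way to implement the behaviour described in the comments above.

```r
# Hypothetical later version: the first parameter was renamed from `numbers` to `values`
number_incrementer <- function(values, amount_to_add = 1, number_to_increment = 1) {
    ifelse(values == number_to_increment, values + amount_to_add, values)
}

# The old named call no longer matches any parameter, so R stops with an
# "unused argument" error. tryCatch() is used here only to capture that fact.
outcome <- tryCatch(
    number_incrementer(numbers = c(1, 2, 3, 2, 4, 2), number_to_increment = 2),
    error = function(e) "error"
)
outcome
# "error"

# Updating the name at the call site restores the original behaviour
number_incrementer(values = c(1, 2, 3, 2, 4, 2), number_to_increment = 2)
# 1 3 3 3 4 3
```

A loud failure like this tells you exactly which call sites need updating, which is far easier to deal with than a silently different result.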

→ Sometimes you have to.

Some functions have a lot of parameters (ggplot2's theme function comes to mind) but you only want to change the third one, say:

function_with_many_params <- function(a = 1, b = 2, c = 3, d = 4) {
    c(a, b, c, d)
}

The easiest way to change c and leave the others as default is by using a named argument:

function_with_many_params(c = 99)
# 1 2 99 4

→ Being reasonable

The reasons to not specify argument names are small but incredibly seductive: Less typing, less code to look at. So. This is actually what I do. In general, though, as soon as I hit my first named argument, I name all the arguments after that.

→ Return

What 'comes out' of the function is known as what the function 'returns' or the 'return value' of a function. Some languages require you to specify what the function churns out using some kind of return function. Indeed, R has this as well, but it's not required:

add_1 <- function(x) {
    return(x + 1)
}

This is useful when you need an 'early return' - give back a result immediately if some condition is met:

add_1 <- function(x) {
    if (x > 10000) {
        return("You don't need 1 - you have enough.")
    }
    x + 1
}

add_1(10001)
# "You don't need 1 - you have enough."

LOOK OUT! Assignment as a return value

Occasionally, you might write a function like this:

my_function <- function(x) {
    y <- x + 2
}

When you call it, though, nothing will print to the console:

my_function(2) # No output!!!! what's happening!!!

Even stranger, this WILL print something to the console:

my_result <- my_function(2)
my_result
# 4

What's going on here??

It's because the last statement in our function body was an assignment. An assignment still produces a value (here, 4), but R returns it 'invisibly', so nothing is printed automatically. That is, when you type x <- 3 in your console, it doesn't print anything back, but when you then type x in your console, it prints 3. The value was returned all along, which is why storing it in my_result and then typing my_result shows 4.

This functions as expected:

my_function_2 <- function(x) {
  y <- x + 2
  y
}

my_function_2(2)
# 4

my_result_2 <- my_function_2(2)
my_result_2
# 4
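If you want to see the 'invisible' return for yourself, base R's `withVisible()` reports both the value a call returns and whether R would have printed it. A minimal sketch, reusing the assignment-ending function from this box (the name `checked` is just for illustration):

```r
my_function <- function(x) {
    y <- x + 2
}

# withVisible() returns a list with the value and a visibility flag
checked <- withVisible(my_function(2))
checked$value
# 4
checked$visible
# FALSE

# Wrapping the call in parentheses makes the value visible again
(my_function(2))
# 4
```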

→ Lexical Scoping

One nice feature of functions is that - in general - the things you do inside do not affect the outside space except for the return value. This means that the variables you define within the function are specific to that function:

print(my_variable) # ERROR!!! DOESN'T EXIST

my_function <- function() {
    my_variable <- "bees"
    my_variable
}
my_function()
# "bees"

print(my_variable) # STILL ERRORS!!! my_variable exists only inside the mind of `my_function`

Imagine the alternative, where you had to remember every variable name you've ever used for every function in order to avoid overwriting one. Nightmarish!

However, you can still use variables defined outside of the function:

my_global_variable <- "apples"

my_function <- function() {
    my_global_variable
}
my_function()
# "apples"

Notice that we are using a variable inside the function that was defined outside of it. You should almost always avoid this. If someone (probably you in the future) were reading your function in isolation, they would look at my_global_variable and wonder where it came from. It's much better to instead pass it as an argument:

my_global_variable <- "apples"

my_function <- function(thing_to_return) {
    thing_to_return
}
my_function(my_global_variable)
# "apples"

Finally, if you 're-define' a variable within a function that was originally defined outside the function, it won't change the variable outside the function (again, this is very good - we like this).

my_global_variable <- "apples"

my_function <- function() {
    my_global_variable <- "pears"
    my_global_variable
}
my_function()
# "pears"
my_global_variable
# "apples"

The main purpose of all of this is so you can understand functions in isolation: you don't need any additional information beyond the inputs. That is, you don't have to worry about the 'state' of the system. It's a button that you press that does a thing, and it doesn't matter what the weather is or what clothes you're wearing - it'll always do the same thing.
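Here is a small sketch of what 'worrying about state' looks like in practice. The function and variable names are made up for illustration: one function reads a global, the other takes everything it needs as arguments.

```r
multiplier <- 2

# Depends on a variable defined outside the function
scale_with_global <- function(x) {
    x * multiplier
}

# Depends only on its inputs
scale_with_argument <- function(x, multiplier) {
    x * multiplier
}

before <- scale_with_global(10)
before
# 20

multiplier <- 3   # something elsewhere in the script changes the global

after <- scale_with_global(10)   # same call, different answer
after
# 30

isolated <- scale_with_argument(10, multiplier = 2)   # unaffected by the global
isolated
# 20
```

The first function is the button whose behaviour depends on the weather; the second always does the same thing for the same inputs.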

→ Parting words

In general, you should use functions when you notice you've written the same piece of code down twice, particularly if it's a lot of code. This helps because:

As in our running example, we can use evocative names to help us remember what's going on instead of arcane blocks of code
Without a function, if you change one block, you have to remember to change the other block as well (I've been burned by this 10 trillion times). A function makes it so that one change propagates everywhere.
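As a sketch of the 'written twice' situation, suppose you rescale two vectors to run from 0 to 1. The data and the name `rescale_01` are invented for illustration.

```r
heights <- c(150, 160, 170)
weights <- c(50, 60, 80)

# Without a function: the same formula is written out twice,
# and any fix has to be made in both places
heights_scaled <- (heights - min(heights)) / (max(heights) - min(heights))
weights_scaled <- (weights - min(weights)) / (max(weights) - min(weights))

# With a function: the formula lives in one place and has a name
rescale_01 <- function(values) {
    (values - min(values)) / (max(values) - min(values))
}

rescale_01(heights)
# 0.0 0.5 1.0
```

If you later decide the formula should ignore missing values, say, you change the body of `rescale_01` once and every call picks it up.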

Aside: Anonymous Functions

I didn't intend to talk about this since it's an additional complication, but you sickos made me.

Functions don't need names so long as you're prepared to use them immediately. Which makes sense: how would you refer back to something later if it didn't have a name?

There are two ways to write anonymous functions in R; both are equivalent. One is shorter (it needs R 4.1.0 or later), so I use that.

# Long way:
function(x) {
  x + 1
}

# Short way:
\(x) {
  x + 1
}

# No need for braces if it fits on one line:
function(x) x + 1
\(x) x + 1

The second way is 'syntactic sugar' for the first, and uses '\' because it looks vaguely like a lambda (λ). That's because a field of mathematics called 'lambda calculus' uses...

...you know what, doesn't matter.
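To show what 'use them immediately' means, here is a minimal sketch that calls an anonymous function on the spot by wrapping it in parentheses. The backslash form assumes R 4.1.0 or later; the variable names are only there so the results can be compared.

```r
# Long form: define and call in one go
long_result <- (function(x) x + 1)(2)
long_result
# 3

# Short form: same function, same call
short_result <- (\(x) x + 1)(2)
short_result
# 3

# Both spellings give the same answer
identical(long_result, short_result)
# TRUE
```

You will rarely write code like this on purpose, but it shows that an anonymous function is an ordinary function that simply never got stored under a name.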

In general, if you're using an anonymous function it's because you're supplying it as an argument to some other function. For instance, the function sapply applies a function to each item of a vector:

sapply(c("a", "b", "c"), toupper)
# "A" "B" "C"

This function lends itself well to using anonymous functions:

sapply(c(1, 2, 3), \(x) x + 4)
# 5 6 7

Please use anonymous functions responsibly. If your function gets to be more than one line, don't use braces - just name the function and use it:

add_3 <- function(x) {
  a <- x + 4
  b <- a - 1
  b
}
sapply(c(1, 2, 3), add_3)
# 4 5 6

Otherwise the person reading your code (which is probably you) will need to understand the whole un-named mess instead of just looking at the function and being like 'oh I bet this adds three to stuff'. This is also why good names are important.

→ Other Resources

The functions chapter in R for Data Science (link)