# Sort Data in R With Examples

R is a powerful tool for data analysis and manipulation. Sorting data in R is a common task that involves arranging data in ascending or descending order based on one or more columns.

This can be especially useful in exploratory data analysis, as it allows you to quickly identify patterns and outliers in your data.

In this article, we will discuss various ways to sort data in R, including sorting by a single column, sorting by multiple columns, sorting by a specific order, and sorting based on a custom function.

We will also demonstrate these methods using several built-in datasets in R.

## Sorting Data in R by a Single Column

Sorting data by a single column is the most basic type of sorting in R. To sort a dataset by a single column, we can use the `order()` function. This function returns a vector of row indices that would put the data in sorted order.

Let’s illustrate this with the `mtcars` dataset, which contains information about various car models:

```
data(mtcars)
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

The `data()` function loads the dataset into the R environment (built-in datasets such as `mtcars` are also available without this call). The `head()` function displays the first 6 rows of the dataset.

To sort the `mtcars` dataset by the `mpg` column in ascending order, we can use the following code:

```
sorted_mpg <- mtcars[order(mtcars$mpg), ]
head(sorted_mpg)
#>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
#> Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
#> Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
#> Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#> Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8
```

The `order()` function takes the column we want to sort by as the first argument and returns a vector of row indices. We then use this vector to select and rearrange the rows of the `mtcars` data frame.

The output shows the first 6 rows of the sorted `mtcars` dataset, with the lowest `mpg` values at the top.
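To make the role of `order()` more concrete, here is a minimal sketch on a made-up vector (the name `scores` is purely illustrative and not part of `mtcars`). It contrasts `order()`, which returns positions, with `sort()`, which returns the sorted values, and shows the `decreasing = TRUE` argument as one way to get a descending sort.

```r
# Hypothetical example vector, not part of mtcars
scores <- c(30, 10, 20)

idx <- order(scores)   # positions of the values, smallest to largest
idx                    # 2 3 1
scores[idx]            # same values as sort(scores)

# Descending sort of mtcars by mpg using the decreasing argument
mpg_desc <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]
rownames(mpg_desc)[1]  # car with the highest mpg
```

Indexing a data frame with the result of `order()` keeps each row intact, which is why `order()` rather than `sort()` is the usual choice for data frames.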

### Sorting Data in R by Multiple Columns

Sorting data by multiple columns can be useful if we want to arrange data according to more than one criterion.

To sort a dataset by multiple columns, we can pass several columns to the `order()` function. Rows are sorted by the first column, and each additional column is used only to break ties in the columns before it.

For example, let’s sort the `mtcars` dataset by decreasing `mpg` values and then by decreasing `hp` values:

```
sorted_mpg_hp <- mtcars[order(-mtcars$mpg, -mtcars$hp), ]
head(sorted_mpg_hp)
#>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
```

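Here, we use the `-` sign before each column to sort in descending order. The `mtcars` dataset is now sorted first by `mpg` and then by `hp`, so the cars with the highest `mpg` come first, and `hp` only decides the order among cars with equal `mpg` (for example, Lotus Europa is listed before Honda Civic).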
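The sort directions do not have to match. The sketch below sorts ascending by one column and descending by another, and then shows one possible way to handle a character key, where the `-` sign does not apply. The `car_models` data frame is a helper built here for illustration only; the `method = "radix"` option of `order()` accepts one `decreasing` flag per key.

```r
# Ascending by cyl, then descending by mpg within each cyl group
by_cyl_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]
head(by_cyl_mpg[, c("cyl", "mpg")], 3)

# Illustrative helper data frame with a character column
car_models <- data.frame(model = rownames(mtcars), cyl = mtcars$cyl,
                         stringsAsFactors = FALSE)

# Ascending by cyl, descending by model name (one flag per key)
by_cyl_model <- car_models[order(car_models$cyl, car_models$model,
                                 decreasing = c(FALSE, TRUE),
                                 method = "radix"), ]
head(by_cyl_model, 3)
```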

### Sorting Data in R by a Specific Order

Sometimes, we may want to sort data in a specific order that is not alphabetical or numerical. In such cases, we can use a factor variable to specify the desired order. When we sort a data frame by a factor variable, R sorts the data according to the order of levels in the factor.

To illustrate this, let’s use the `iris` dataset, which contains measurements of various iris flower species:

```
data(iris)
head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
```

To sort the `iris` dataset by the `Species` column in the order `setosa`, `versicolor`, `virginica`, we first convert the `Species` column to a factor variable with the desired levels:

```
iris$Species <- factor(iris$Species, levels = c("setosa", "versicolor", "virginica"))
sorted_species <- iris[order(iris$Species), ]
head(sorted_species)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
```

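The `factor()` function converts the `Species` column to a factor whose levels are in the desired order. In `iris`, `Species` is already a factor with exactly these levels, so the sorted output here matches the original row order.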

Sorting the `iris` data frame by the `Species` column then arranges the rows according to the order of the factor levels.
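The effect of the level order is easier to see with an order that is not alphabetical. The following sketch works on a copy (the name `iris_custom` and the chosen level order are assumptions made for illustration) and puts `virginica` first:

```r
# Work on a copy so the built-in dataset is left untouched
iris_custom <- iris
iris_custom$Species <- factor(iris_custom$Species,
                              levels = c("virginica", "setosa", "versicolor"))

# order() follows the level order, not the alphabetical order
sorted_custom <- iris_custom[order(iris_custom$Species), ]
unique(as.character(sorted_custom$Species))
```

The rows now appear as `virginica`, then `setosa`, then `versicolor`, following the order given in `levels`.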

### Sorting Data in R Based on a Custom Function

In some cases, we may want to sort data based on a custom function that does not rely on a standard ordering criterion such as alphabetical or numerical order.

For example, we may want to sort a dataset of people based on their age, but we may also want to prioritize people who have a higher income or a certain job title.

To sort data based on a custom criterion, we compute a sort key for each row, for example inside a helper function, and pass that key to the `order()` function.

Note that `order()` does not accept a comparison function as an argument; it sorts by the vectors it is given and returns a vector of indices indicating the sorted order.

Let’s illustrate this with the `mtcars` dataset, where we want to sort by the ratio of `mpg` to `wt` values:

```
sort_by_ratio <- function(df) {
  ratio <- df$mpg / df$wt
  sorted_indices <- order(ratio)
  return(df[sorted_indices, ])
}
sorted_ratio <- sort_by_ratio(mtcars)
head(sorted_ratio)
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
```

Here, we define a function `sort_by_ratio()` that takes a data frame as its argument and computes the ratio of `mpg` to `wt`. We then apply the `order()` function to the `ratio` vector to get the row indices in ascending order of the ratio. Finally, we use these indices to rearrange the rows of the data frame and return the sorted data frame.
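The same idea covers the people example mentioned at the start of this section. The sketch below uses an entirely hypothetical data frame: the names, values, and the priority rule (managers first, then higher income, then younger age) are assumptions chosen only to illustrate how several derived keys can be combined.

```r
# Hypothetical data; names, columns, and values are illustrative only
people <- data.frame(
  name   = c("Ana", "Ben", "Cho", "Dev"),
  age    = c(41, 29, 35, 29),
  income = c(52000, 61000, 48000, 75000),
  title  = c("Analyst", "Manager", "Analyst", "Manager"),
  stringsAsFactors = FALSE
)

# Derived key: TRUE for managers. FALSE sorts before TRUE,
# so negating the key places managers first.
is_manager <- people$title == "Manager"

# Managers first, then higher income, then younger age
sorted_people <- people[order(!is_manager, -people$income, people$age), ]
sorted_people$name
```

Each key passed to `order()` is just a vector with one value per row, so any rule that can be expressed as such a vector can drive the sort.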

## Summary

Sorting data in R is a common task that involves arranging data in ascending or descending order based on one or more columns.

We can sort a dataset in various ways, such as sorting by a single column, sorting by multiple columns, sorting by a specific order, and sorting based on a custom function.

In this article, we demonstrated these sorting methods using several built-in datasets in R, including the `mtcars` and `iris` datasets.

By using these methods, we can quickly identify patterns and outliers in our data and make informed decisions based on our findings.