Your data is in a character format. To get the average value and confidence bounds you may split the strings of X
at appropriate patterns and convert them to numeric format. Note, however that you have 195 countries which would make the plot unreadable, I'll show you the way on a subset.
After reshaping your data into long format dl
(I use reshape
here where you used tidyr::gather
), there are some "No data"
values which we first want to mark as NA
.
dl <- `rownames<-`(reshape(d, idvar="country", varying=2:17, direction="long", sep="",
timevar="year"), NULL)
dl$X <- ifelse(dl$X == "No data", NA, dl$X)
Then we split the strings on "["
or "]"
or "-"
using a regular expression "\[|\]|-"
in strsplit
. This gives a list of each three elements which we want to rbind
and type.convert
from "character"
to "numeric"
: also we set proper names using setNames
. The result we cbind
to the first two columns of our long data set.
num <- setNames(type.convert(do.call(rbind.data.frame, strsplit(dl$X, " \[|\]|-"))),
c("bmi", "lo", "up"))
dl <- cbind(dl[1:2], num)[order(dl$country, dl$year), ]
Now we extract some values we need, unique
countries, years and the range
.
cy <- unique(dl$country)
yr <- unique(dl$year)
rg <- range(dl[3:5], na.rm=T)
This subsets the countries from 195 to 35 for demonstration purposes:
cy <- cy[1:(7*5)]
Finally we use matplot
in an sapply
..
x11() ## opens a window
op <- par(mfrow=c(7, 5), mar=c(4, 4, 3, 1))
sapply(cy, function(x) {
matplot(dl[dl$country %in% x, 3:5], type="l", lty=c(1, 2, 2), col=4, lwd=2,
main=x, xlab="year", ylab="BMI", xaxt="n", ylim=rg)
axis(1, at=axTicks(1), labels=yr[axTicks(1)])
})
par(op)
You may want to put this into a png
or pdf
as shown in this answer.
Result
Data:
d <- read.csv("https://raw.githubusercontent.com/tanaytuncer/LifeExpectancy_BMI/main/BMI.csv")[-(1:3), ]
names(d)[1] <- "country"