Edit: Finishing documentation of plotte.R script

janwey 1 year ago
2 changed files with 77 additions and 0 deletions

docs/

@@ -5,6 +5,8 @@
* [Packages used](#packages)
* [The ggplot2 package](#the-ggplot2-package)
* [The gridExtra package](#the-gridextra-package)
* [Plot: Participation by Platform](#participation-by-platform)
* [Plot: Participation by Time](#participation-by-time)

* * *

@@ -97,3 +99,78 @@ well as `legend()` to provide some extra information, necessary to understand th
graphic. Our first plot uses the previously constructed `platform` variable as
input. Since this is a factor variable, R will automatically create a barplot
from this data. For the color, we use red for the Fediverse and blue for Twitter.
The y-axis limit is constructed so that it *should* keep working in the future,
but it may need some minor adjustment eventually. In its current form, it looks
for the highest number of posts on a single platform and rounds it up to the
next multiple of 100 (110 would become 200, 401 would become 500):

    ylim = c(0, ceiling(max(table(platform))/100) * 100)
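
The rounding rule can be tried out on its own. The following is a minimal sketch; the `platform` factor is filled with invented counts, and `round_up_100()` is a hypothetical helper that is not part of the script:

```r
# Hypothetical data: 110 Twitter posts and 401 Fediverse posts
platform <- factor(c(rep("Twitter", 110), rep("Fediverse", 401)))

# Highest count on any single platform
highest <- max(table(platform))

# Divide by 100, round up, multiply back: the next multiple of 100
upper <- ceiling(highest / 100) * 100

# Hypothetical helper to apply the same rule to plain numbers
round_up_100 <- function(n) ceiling(n / 100) * 100
```

A count that is already a multiple of 100 is left unchanged by this rule, so in that case the tallest bar reaches the upper edge of the plot.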

The color in the second plot is generated with the `rainbow()` function, which
returns `n` colors evenly spaced across the full color spectrum, starting at
*red* and stopping just before the hues wrap around to red again:

    col = rainbow(n = length(unique(instances)))
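
As a small illustration of that call, assuming `instances` holds one instance name per post (the names below are invented):

```r
# Hypothetical instance names, with repeats as they would occur per post
instances <- c("mastodon.social", "chaos.social", "mastodon.social",
               "fosstodon.org", "chaos.social")

n_instances <- length(unique(instances))   # number of distinct instances
cols <- rainbow(n = n_instances)           # one hex color string per instance

# The first color is pure red; compare only the "#RRGGBB" part, since some
# R versions append an alpha channel to the string
first_is_red <- substr(cols[1], 1, 7) == "#FF0000"
```

With many instances, neighbouring colors become hard to tell apart, which is worth keeping in mind when reading the legend.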

To place both plots next to each other in a single graphic, `par()` can
be called before plotting. The argument used here generates a grid with one
row and two columns.

By calling `pdf()` before plotting and `dev.off()` afterwards, we can export
the graphic directly to a PDF file (as a vector graphic).
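
Put together, the base-R steps might look like the sketch below. The `mfrow` argument is an assumption about which `par()` argument the script uses (it is the usual way to request a 1 x 2 grid); the data and the temporary output file are invented for illustration:

```r
# Hypothetical data standing in for the real variables
platform  <- factor(c(rep("Fediverse", 30), rep("Twitter", 45)))
instances <- factor(c(rep("mastodon.social", 18), rep("chaos.social", 12)))

out_file <- tempfile(fileext = ".pdf")         # hypothetical output path

pdf(file = out_file, width = 14, height = 7)   # open the PDF device
par(mfrow = c(1, 2))                           # 1 row, 2 columns (assumed argument)

# A factor passed to plot() is drawn as a barplot
plot(platform, col = c("red", "blue"),
     ylim = c(0, ceiling(max(table(platform)) / 100) * 100))
plot(instances, col = rainbow(n = length(unique(instances))))

dev.off()                                      # close the device and write the file
```

Without the closing `dev.off()`, the PDF device stays open and the file is not finalized.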

* * *

## Participation by Time

For our second graphic we use functions from the `ggplot2` package instead of
R's default `plot()` function, simply because `ggplot2` handles time-series
data much better.

Before doing so, we first have to create the time series, for which we use the
`date` and `time` variables in our datasets and the `strptime()` function. We
connect these strings with `paste0()` (`paste()` would insert a space between
them), so they have the form `YYYYMMDDhhmmss`, which we also specify in the
`format` argument:

    twitter_time <- strptime(paste0(twitter$date, twitter$time),
                             format = "%Y%m%d%H%M%S")
    mastodon_time <- strptime(paste0(mastodon$date, mastodon$time),
                              format = "%Y%m%d%H%M%S")
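
A minimal sketch of the conversion on two invented values, assuming `date` and `time` are stored as character strings. The `tz = "UTC"` argument is added here only to make the example reproducible and is not taken from the script:

```r
# Hypothetical values in the same layout as the `date` and `time` columns
date <- c("20190116", "20190117")
time <- c("093000", "184512")

stamp      <- paste0(date, time)   # no separator: "20190116093000", ...
with_space <- paste(date, time)    # space-separated: "20190116 093000", ...

# Parse the combined string; the format mirrors YYYYMMDDhhmmss
parsed <- strptime(stamp, format = "%Y%m%d%H%M%S", tz = "UTC")

# strptime() returns POSIXlt; as.POSIXct() converts to a single number per
# timestamp, which is often easier to store in a data frame
parsed_ct <- as.POSIXct(parsed)
```

If `paste()` were used instead, the format string would need a matching space (`"%Y%m%d %H%M%S"`), otherwise `strptime()` returns `NA`.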

`ggplot2` has a rather unconventional syntax for R functions. You can combine
several ggplot functions with a `+`, which lets you build quite complex plots
rather easily. The `ggplot()` function itself only initializes the plot object
and specifies the data we are going to use. We combine (`+`) this function with
`geom_histogram()` which, as the name implies, creates the histogram itself.
`scale_x_datetime()` and `scale_y_continuous()` define the x-axis as a timeline
and the y-axis as a continuous count. Lastly, we set the title of the plot with
`ggtitle()` and fill the bars with a color gradient from low to high with
`scale_fill_gradient()`. In the case of Twitter, we save the entire plot into
the `twitter_plot` variable for later use, and do the same for Mastodon/the
Fediverse:

    twitter_plot <- ggplot(data = twitter, aes(x=twitter_time)) +
      # binwidth is given in seconds: 60*180 = 3 hours per bar
      geom_histogram(aes(fill=..count..), binwidth=60*180) +
      scale_x_datetime("Date") +
      scale_y_continuous("Frequency") +
      ggtitle("Participation on Twitter") +
      scale_fill_gradient("Count", low="#002864", high="#329cc3")
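
The Mastodon counterpart is not shown here, so the following is only a sketch of what an analogous plot could look like. The data is randomly generated, the column name `stamp` and the gradient colors are placeholders, and `after_stat(count)` is used as the newer spelling of `..count..` (available in `ggplot2` 3.3.0 and later):

```r
library(ggplot2)

# Hypothetical timestamps: 200 posts spread over two days
set.seed(1)
start <- as.POSIXct("2019-01-16 00:00:00", tz = "UTC")
mastodon <- data.frame(stamp = start + runif(200, min = 0, max = 2 * 24 * 3600))

mastodon_plot <- ggplot(data = mastodon, aes(x = stamp)) +
  # 3-hour bins (binwidth is in seconds); bar color follows the bin count
  geom_histogram(aes(fill = after_stat(count)), binwidth = 60 * 180) +
  scale_x_datetime("Date") +
  scale_y_continuous("Frequency") +
  ggtitle("Participation on Mastodon") +
  # Placeholder colors, not the ones used in the script
  scale_fill_gradient("Count", low = "#640000", high = "#c33232")
```

Storing the timestamps as a `POSIXct` column inside the data frame keeps the plot self-contained, since `aes()` then refers only to the data passed to `ggplot()`.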

Unlike with R's `plot()` function, we cannot use `par()` to create a combined
graphic with `ggplot2`. Instead we use `grid.arrange()` from the `gridExtra`
package. It takes both plots as arguments, as well as the number of columns it
should arrange them in. With `pdf()` and `dev.off()` we can save the graphic
directly into a PDF file (as a vector graphic):

    pdf(file="./plots/ilfs-participation-by-date.pdf", width=14, height=7)
    grid.arrange(twitter_plot, mastodon_plot, ncol = 2)
    dev.off()

docs/plotter.pdf