
Edit: (WIP) documentation for plotte.R

janwey 11 months ago
parent
commit
e98fbe13d9
4 changed files with 100 additions and 2 deletions
  1. docs/README.md (+1, -0)
  2. docs/plotter.md (+99, -0)
  3. docs/plotter.pdf (BIN)
  4. todo.txt (+0, -2)

+ 1
- 0
docs/README.md

@@ -1,3 +1,4 @@
1 1
 # Documentation
2 2
 
3 3
 * **[collecto.R](./collector.md)** the collector/scraper for data from different socialmedia-sources
4
+* **[plotte.R](./plotter.md)** the visualizer script for the collected data

+ 99
- 0
docs/plotter.md

@@ -0,0 +1,99 @@
1
+# Documentation: [plotte.R](../plotte.R)
2
+
3
+## Table of Contents
4
+* [General information about the script](#the-script)
5
+* [Packages used](#packages)
6
+  * [The ggplot2 package](#the-ggplot2-package)
7
+  * [The gridExtra package](#the-gridextra-package)
8
+
9
+* * *
10
+
11
+## The Script
12
+The R script documented here does not have as streamlined a structure as the
13
+**Collecto.R** script. Each section
14
+
15
+## Packages
16
+As of writing this script and documentation, we only use two packages:
17
+* [ggplot2](https://cran.r-project.org/package=ggplot2) (Version 2.2.1)
18
+* [gridExtra](https://cran.r-project.org/package=gridExtra) (Version 2.3)
19
+
20
+### The ggplot2 package
21
+ggplot2 is a powerhouse of a package when it comes to data visualisation. Our
22
+usage is rather basic and limited, but it can certainly create much
23
+more elegant graphics than R's default `plot()` command, which we will also use
24
+in this script at some point. From the ggplot2 package, we combine the following
25
+functions:
26
+```
27
+  ggplot()              # initializing the actual plot
28
+  aes()                 # create "aesthetic" mappings in the plot object
29
+  geom_histogram()      # declaring the histogram style of the plot
30
+  scale_x_datetime()    # positioning scales for date and time
31
+  scale_y_continuous()  # positioning scales for continuous data / index
32
+  ggtitle()             # setting a title for the plot
33
+  scale_fill_gradient() # give the visualized data a gradient color
34
+```
35
+
36
+### The gridExtra package
37
+gridExtra will only be used to arrange several plots produced by the `ggplot2`
38
+package next to each other, as this does not work with the `par()` function,
39
+commonly used in conjunction with R's default `plot()`. So, we only need
40
+gridExtra for ggplot2-objects:
41
+```
42
+  grid.arrange() # arrange two ggplot2 objects in a grid
43
+```
44
+
45
+* * *
46
+
47
+## Participation by Platform
48
+The *by-platform-graphic* actually consists of two plots, arranged next to each
49
+other. One side simply divides the collected data between the two categories
50
+"Twitter" and "Fediverse". This is especially easy to divide, since the data we
51
+collected already comes in two discrete datasets for both platforms. Knowing
52
+this, we can simply create a factor variable `platform`, which contains the
53
+string `twitter` exactly as many times as we have tweets. The same is true for
54
+`fediverse`. For this, we use the `rep()` (repeat) as well as the `factor()`
55
+functions. The appropriate code looks like this:
56
+```
57
+  twitter_number <- rep(x = "twitter", times = length(twitter$text))
58
+  fediver_number <- rep(x = "fediverse", times = length(mastodon$text))
59
+  platform <- factor(c(twitter_number, fediver_number),
60
+                     levels = c("fediverse", "twitter"))
61
+```
62
+This data can now be visualized in a barplot later on.
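
As a minimal sketch of the step described above, the following builds the `platform` factor from toy data and counts the entries per level. The two small data frames are hypothetical stand-ins for the scraped `twitter` and `mastodon` objects, and `platform_counts` is a name introduced here only for illustration.

```r
# Hypothetical stand-ins for the scraped data; the real objects are
# produced by the collector script and contain more columns.
twitter  <- data.frame(text = c("a", "b", "c"), stringsAsFactors = FALSE)
mastodon <- data.frame(text = c("d", "e"), stringsAsFactors = FALSE)

# One label per post, repeated as often as there are posts
twitter_number <- rep(x = "twitter", times = length(twitter$text))
fediver_number <- rep(x = "fediverse", times = length(mastodon$text))

# Combine both label vectors into one factor with a fixed level order
platform <- factor(c(twitter_number, fediver_number),
                   levels = c("fediverse", "twitter"))

# Counts per level (illustrative helper, not part of the original script)
platform_counts <- table(platform)
print(platform_counts)
```

Setting `levels` explicitly fixes the order of the bars in any later plot, independent of the order in which the labels were concatenated.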
63
+
64
+The second plot separates all fediverse-data into the individual instances. Our
65
+scraped data contains the account name of each poster, which usually includes
66
+the instance-domain as well, for example: `fsfe@status.fsfe.org`.
67
+
68
+In order to only extract the domains of the instances, we use the `sub()`
69
+function in conjunction with regex and save the results into the `instances`
70
+variable:
71
+```
72
+  instances <- sub(x = as.character(mastodon$acct), pattern = ".*\\@", replacement = "")
73
+```
74
+
75
+However, for all accounts on the instance you scraped your data from - in this case
76
+from [mastodon.social](https://mastodon.social) - only the username is displayed,
77
+not the domain of the instance. For example: `fsfe`.
78
+
79
+In order to catch these as well, we look for all strings that do not contain an
80
+`@` symbol with the `grep()` function and save their position into a variable
81
+(here: `msoc`). The `invert = TRUE` argument makes sure that we get exactly
82
+those accounts that do **not** contain the searched pattern:
83
+```
84
+  msoc <- grep(x = as.character(mastodon$acct), pattern = "@", invert = TRUE)
85
+```
86
+
87
+Now we can replace those positions in the `instances` variable with the domain of
88
+the instance we scraped our data from. Afterwards, we convert the
89
+`instances` variable to a factor with `as.factor()`:
90
+```
91
+  instances[msoc] <- "mastodon.social"
92
+  instances <- as.factor(instances)
93
+```
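
The three steps above can be traced end to end on a small example. This is only a sketch: the `acct` vector is made up for illustration (only `fsfe@status.fsfe.org` and `fsfe` appear in the documentation), and it stands in for `mastodon$acct`.

```r
# Hypothetical account names; "fsfe" has no domain because it lives on the
# instance the data was scraped from.
acct <- c("fsfe@status.fsfe.org", "fsfe", "alice@example.org")

# Strip everything up to and including the "@"; strings without an "@"
# are returned unchanged by sub().
instances <- sub(x = acct, pattern = ".*@", replacement = "")

# Positions of the accounts that contain no "@" at all
msoc <- grep(x = acct, pattern = "@", invert = TRUE)

# Fill in the home instance for those positions, then convert to a factor
instances[msoc] <- "mastodon.social"
instances <- as.factor(instances)

print(instances)
```

Note that `sub()` leaves domain-less names such as `fsfe` untouched, which is why the `grep()` step is needed before the result can be treated as a list of instance domains.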
94
+
95
+Finally, we can start plotting. For this we use the default `plot()` function, as
96
+well as `legend()` to provide some extra information, necessary to understand the
97
+graphic. Our first plot uses the previously constructed `platform` variable as
98
+input. Since this is a factor variable, R will automatically create a barplot
99
+from this data. For the color, we use red for the Fediverse and blue for Twitter.
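
A minimal sketch of such a plot, assuming a toy `platform` factor instead of the real data. The title, axis label, legend position and the `platform_colors` name are illustrative choices, not taken from the script.

```r
# Toy factor standing in for the real `platform` variable
platform <- factor(c(rep("twitter", 3), rep("fediverse", 2)),
                   levels = c("fediverse", "twitter"))

# One color per level, in level order: red for the Fediverse, blue for Twitter
platform_colors <- c(fediverse = "red", twitter = "blue")

# plot() on a factor draws a barplot of the level counts
plot(platform, col = platform_colors,
     main = "Participation by platform (toy data)",
     ylab = "Number of posts")

# Legend explaining the colors
legend("topleft", legend = levels(platform), fill = platform_colors)
```

Keeping the colors in a named vector whose order matches `levels(platform)` makes it harder to swap the two colors by accident between the bars and the legend.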

BIN
docs/plotter.pdf


+ 0
- 2
todo.txt

@@ -1,2 +0,0 @@
1
-# TODO
2
-- replace regex of date/time in collecto.R with strptime() https://stackoverflow.com/questions/15838548/parsing-iso8601-date-and-time-format-in-r
