
Add: Wordcloud documentation

janwey 8 months ago
commit da5ab05349
3 changed files with 76 additions and 1 deletions
  1. docs/README.md (+2, -1)
  2. docs/wordcloud.md (+74, -0)
  3. docs/wordcloud.pdf (BIN)

+ 2
- 1
docs/README.md

@@ -1,4 +1,5 @@
1 1
 # Documentation
2 2
 
3 3
 * **[collecto.R](./collector.md)** the collector/scraper for data from different socialmedia-sources
4
-* **[plotte.R](./plotter.md)** the visualizer script for the collected data
4
+* **[plotte.R](./plotter.md)** the visualizer script for the collected data
5
+* **[word_cloud.py](./wordcloud.md)** another visualizer script, generating a wordcloud graphic

+ 74
- 0
docs/wordcloud.md

@@ -0,0 +1,74 @@
1
+# Documentation: [word_cloud.py](../word_cloud.py)
2
+
3
+## Table of Contents
4
+* [Imports and Dependencies](#imports-and-dependencies)
5
+* [Word Scrambling](#scrambling)
6
+* [Creating the Wordcloud](#wordcloud)
7
+
8
+* * *
9
+
10
+## Imports and Dependencies
11
+
12
+The first import is the
13
+[regular expressions library](https://docs.python.org/2/library/re.html), which
14
+is used here to split the string of project names that we want to
15
+visualize into single words.
16
+
17
+Next, we also need the
18
+[random library](https://docs.python.org/2/library/random.html) to scramble the
19
+words in order to work around an issue in the `wordcloud_cli.py` script.
20
+
21
+As a dependency of this script, we need Python 2, including the
22
+specified import libraries. Additionally, the script described here does not
23
+create the wordcloud itself, but prepares the text that we can then forward to
24
+[wordcloud_cli.py](https://github.com/amueller/word_cloud).
25
+
26
+* * *
27
+
28
+## Scrambling
29
+
30
+In order to scramble the words, we define the `scrambled()` function. It simply
31
+takes a list of strings, shuffles a copy of it with `random.shuffle()`
32
+and returns the result:
33
+```
34
+  def scrambled(orig):
35
+    dest = orig[:]  # shallow copy, so the caller's list stays untouched
36
+    random.shuffle(dest)  # shuffles the copy in place
37
+    return dest
38
+```
39
+
40
+In conjunction with the defined function `get_words_from_string()`, which splits
41
+a string into its individual words, the entire script boils down to:
42
+
43
+* split string of projects into individual words (projects)
44
+* scramble the order of the words (projects)
45
+* join the words back together into a single string
46
+* `print()` / output the resulting string
47
+
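The four steps listed above can be sketched as follows. This is a minimal sketch, not the script's actual code: `scrambled()` matches the snippet shown earlier, but the body of `get_words_from_string()`, the helper `scramble_string()` and the sample input are assumptions, since the documentation only describes what they do. The sketch is written so that it runs under Python 3 as well as Python 2.

```python
import random
import re


def get_words_from_string(text):
    # Assumption: a "word" is any run of non-whitespace characters.
    # The real helper in word_cloud.py may use a different pattern.
    return re.findall(r"\S+", text)


def scrambled(orig):
    dest = orig[:]          # copy, so the input list is left untouched
    random.shuffle(dest)    # shuffle the copy in place
    return dest


def scramble_string(text):
    # Hypothetical wrapper: split, scramble, join back into one string.
    words = get_words_from_string(text)
    return " ".join(scrambled(words))


if __name__ == "__main__":
    # Hypothetical input; the real script works on the collected project names.
    print(scramble_string("linux linux gnome vlc vlc vlc firefox"))
```

The output contains the same words as the input, only in a random order, so every run prints a different line.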
48
+This is necessary because the `wordcloud_cli.py` script may otherwise treat several
49
+adjacent words as a single project, for example `linux linux` instead of just
50
+`linux`. Scrambling the words makes this effect far less likely.
51
+
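To see how much scrambling helps, one can count how often a word is directly followed by itself before and after the shuffle. The sketch below uses made-up data, and the helper `adjacent_repeats()` is hypothetical and not part of `word_cloud.py`.

```python
import random


def adjacent_repeats(words):
    # Number of positions where a word is directly followed by itself.
    return sum(1 for a, b in zip(words, words[1:]) if a == b)


if __name__ == "__main__":
    # Hypothetical data: project names grouped together, as they might be collected.
    grouped = ["linux"] * 5 + ["vlc"] * 5 + ["gnome"] * 5
    shuffled = grouped[:]
    random.shuffle(shuffled)
    print("grouped: ", adjacent_repeats(grouped))
    print("shuffled:", adjacent_repeats(shuffled))
```

For the grouped list the count is 12. After shuffling it is usually much lower, although frequently mentioned names can still end up next to each other by chance.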
52
+* * *
53
+
54
+## Wordcloud
55
+
56
+[Wordcloud](http://amueller.github.io/word_cloud/index.html) has rather
57
+sparse documentation, and since we did not write any of its code but simply use
58
+it, we omit discussing the project itself.
59
+
60
+Important for our project is how we invoke the creation of the wordcloud. For
61
+this purpose, there's a `Makefile` in the root directory of this project:
62
+```
63
+  img:
64
+	python word_cloud.py | wordcloud_cli.py --relative_scaling 0.6 \
65
+	--imagefile graphics/word_cloud.png --width=2000 --height=2000 \
66
+	--no_collocations --background="#ffffff"
67
+```
68
+
69
+Typing `make` there invokes the `word_cloud.py` script, which scrambles and
70
+prints the names of the projects mentioned on ILoveFS Day, and pipes the
71
+resulting string to `wordcloud_cli.py`. As options, we choose a relative
72
+size-scale of 0.6 (at 0 only a word's rank matters, at 1 size is proportional to frequency), a width and height of
73
+2000 pixels and a white (`#FFFFFF`) background. The `--no_collocations` argument
74
+turns off word-pair (bigram) detection, which gives us better spacing, but you may want to experiment with that.
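
Where `make` is not available, the same pipeline could be started from Python. The following is only a sketch under assumptions: the function names `wordcloud_command()` and `make_image()` are hypothetical, the options are copied from the `Makefile` recipe above, and it assumes that `python`, `word_cloud.py` and `wordcloud_cli.py` can be found from the current directory and `PATH`.

```python
import subprocess
import sys


def wordcloud_command(imagefile="graphics/word_cloud.png"):
    # Same options as the Makefile recipe.
    return [
        "wordcloud_cli.py",
        "--relative_scaling", "0.6",
        "--imagefile", imagefile,
        "--width=2000",
        "--height=2000",
        "--no_collocations",
        "--background=#ffffff",
    ]


def make_image():
    # Run word_cloud.py and pipe its output into wordcloud_cli.py.
    producer = subprocess.Popen(["python", "word_cloud.py"],
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen(wordcloud_command(), stdin=producer.stdout)
    producer.stdout.close()  # the consumer now holds the only read end
    consumer.wait()
    producer.wait()
    return consumer.returncode


if __name__ == "__main__":
    sys.exit(make_image())
```

Keeping the argument list in its own function makes it easy to try other values, for example leaving out `--no_collocations` to compare the results.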

BIN
docs/wordcloud.pdf