Add: Wordcloud documentation

pull/15/head
janwey, 1 year ago
commit da5ab05349

3 changed files with 76 additions and 1 deletion:
  1. docs/README.md (+2, -1)
  2. docs/wordcloud.md (+74, -0)
  3. docs/wordcloud.pdf (binary)

docs/README.md (+2, -1)

@@ -1,4 +1,5 @@
# Documentation

* **[collecto.R](./collector.md)** the collector/scraper for data from different socialmedia-sources
* **[plotte.R](./plotter.md)** the visualizer script for the collected data
* **[word_cloud.py](./wordcloud.md)** another visualizer script, generating a wordcloud graphic

docs/wordcloud.md (+74, -0)

@@ -0,0 +1,74 @@
# Documentation: [word_cloud.py](../word_cloud.py)

## Table of Contents
* [Imports and Dependencies](#imports-and-dependencies)
* [Word Scrambling](#scrambling)
* [Creating the Wordcloud](#wordcloud)

* * *

## Imports and Dependencies

The first import is the
[regular expressions library](https://docs.python.org/2/library/re.html), which
is used here to split the string of project names we want to visualize into
single words.
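
As a minimal sketch of that splitting step: the real `get_words_from_string()` is not reproduced in this document, so the pattern `\w+` below is an assumption chosen for illustration, not the pattern the script actually uses.

```python
import re

def get_words_from_string(text):
    # Assumed pattern: every run of word characters counts as one word.
    # The pattern used by the real script may differ.
    return re.findall(r"\w+", text)

print(get_words_from_string("linux, firefox; linux\nvlc"))
```

Note that `\w+` would split hyphenated names such as `gnu-linux` into two words, so the pattern would need adjusting if such names occur in the data.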

Next, we also need the
[random library](https://docs.python.org/2/library/random.html) to scramble the
words in order to work around an issue in the `wordcloud_cli.py` script.

The script itself depends only on Python 2 and the two standard-library modules
named above. It does not create the wordcloud itself; it prepares the text that
we then forward to
[wordcloud_cli.py](https://github.com/amueller/word_cloud).

* * *

## Scrambling

In order to scramble the words, we define the `scrambled()` function. It takes
a list of strings, copies it, shuffles the copy with `random.shuffle()` and
returns the result, leaving the original list unchanged:
```
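import random

def scrambled(orig):
    # Work on a copy so the caller's list is left untouched
    dest = orig[:]
    # random.shuffle() shuffles in place and returns None
    random.shuffle(dest)
    return dest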
```

In conjunction with the defined function `get_words_from_string()`, which splits
a string into its individual words, the entire script boils down to:

* split string of projects into individual words (projects)
* scramble the order of the words (projects)
* join the words back into a single string
* `print()` / output the resulting string

This is necessary because the `wordcloud_cli.py` script may otherwise treat
several adjacent words as a single project, for example `linux linux` instead
of just `linux`. Scrambling the words makes this effect far less likely.
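
The effect of scrambling can be sketched by counting how often a word is directly followed by itself, before and after the shuffle. The helper `adjacent_repeats()` and the sample input are hypothetical and exist only for this illustration; they are not part of `word_cloud.py`.

```python
import random

def scrambled(orig):
    # Same approach as above: shuffle a copy, keep the original intact
    dest = orig[:]
    random.shuffle(dest)
    return dest

def adjacent_repeats(words):
    # Hypothetical helper: count positions where a word follows itself
    return sum(1 for a, b in zip(words, words[1:]) if a == b)

# Sample input in which identical project names sit next to each other
words = "linux linux linux firefox firefox vlc vlc gimp gimp".split()
shuffled = scrambled(words)

print(adjacent_repeats(words))     # repeats in the grouped input
print(adjacent_repeats(shuffled))  # typically lower, varies per run
print(" ".join(shuffled))          # the string that would be piped on
```

Shuffling only reduces the chance of repeated neighbours; with few distinct words and many duplicates, some repeated pairs can still remain.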

* * *

## Wordcloud

[Wordcloud](http://amueller.github.io/word_cloud/index.html) is only sparsely
documented, and since we did not write any of its code but simply use it, we
omit discussing the project itself.

What matters for our project is how we invoke the creation of the wordcloud.
For this purpose, there is a `Makefile` in the root directory of this project:
```
img:
	python word_cloud.py | wordcloud_cli.py --relative_scaling 0.6 \
	    --imagefile graphics/word_cloud.png --width=2000 --height=2000 \
	    --no_collocations --background="#ffffff"
```

Running `make` in that directory invokes the `word_cloud.py` script, which
scrambles and outputs the names of the projects mentioned on ILoveFS Day, and
pipes the resulting string to `wordcloud_cli.py`. The image is written to
`graphics/word_cloud.png`. As options, we choose a relative size-scaling of 0.6
(0 is no size-scaling, 1 is maximum), a width and height of 2000 pixels and a
white (`#ffffff`) background. The `--no_collocations` argument gives us better
spacing, but you may want to experiment with that.

docs/wordcloud.pdf (binary)

