<?xml version="1.0" encoding="UTF-8" ?>

<html>
<version>1</version>

<head>
<title>Minimalistic Data Format – Open Standards</title>
</head>

<body class="article">
<p id="category">
<a href="/freesoftware/standards/standards.html">Open Standards</a>
</p>

<h1>The minimal principle: because being an open standard is not enough.</h1>

<p>A tool is useless without something to work on. So what do we shape with our computing tools? Data, information, knowledge, opinions, art – in short: Content. Content is created, processed and transmitted, and nowadays it is more and more often created directly in an electronic format. The number of people who own devices and connect to the internet is constantly rising, and they use them to develop new ways of working together.</p>

<p>Content is sent from one user to another and back. To do this, the content must take on some form: The data-format. It defines how content and its wrapping are to be handled, what is allowed, and how each part looks within a file or stream. Anyone who wants to take part in the data exchange must use a software application that understands the data-format in question. Otherwise the content would look like an unknown foreign language to their computer. If a data-format does not allow for the inclusion of pictures, for example, then there is no way to include pictures with it. The choice of data-format dictates for how many years I will be able to access the content (backwards compatibility) and what I am able to do with it.</p>

<p>A single user will probably not feel any effect of her decision when saving a file in a particular data-format. When an IT department or a public administration decides upon a data-format, the impact is far greater: It will dominate their choice of software for several years, possibly decades. The more an organisation stores its precious writings, recordings or pictures electronically, the more important it becomes to secure continued access to those documents. These decisions, directly or indirectly, fund the initial development and maintenance of data-formats, whether they are "good" or "bad" formats. The choices made at one time naturally affect the choices available in the future: Many software producers intentionally try to get users to adopt a data-format that they (the producer) control. For example, when the technical schematics of vehicles, buildings or machinery are all held in a format controlled by the producer of the CAD application, that producer can in essence hold the data for ransom when it is time to renew the contracts. From the vendor’s point of view, this is a strong position to be in for the next pricing negotiation. Occasionally, whole countries have managed to maneuver themselves onto the losing end of this situation.</p>

<p>As you can see, a good data-format can only be an <a href="/freesoftware/standards/def.html">Open Standard</a>. This requirement alone, however, is not enough. The data-format also needs to solve the problem adequately: It should be a good fit from a functional point of view as well as on a technical level. Judging this involves a number of considerations. The <a href="http://www.w3.org/People/Bos/DesignGuide/introduction">essay by Bert Bos</a> explains the design principles of the W3C, the organisation that develops the formats of the World Wide Web. He mentions efficiency, maintainability, accessibility, extensibility, learnability, simplicity, longevity and a few more.</p>

<p>Two central questions here are:</p>
<ul>
<li>How well does the data-format solve the problem?</li>
<li>Is there a simpler format that could solve the problem just as well?</li>
</ul>

<p>The first question is self-explanatory: Whoever wants to save, transmit and search a text would not want a format for pixel-based images, although such a format is unavoidable in the first step of scanning paper documents or incoming faxes.</p>

<p>The second question is much more interesting: Is the format as simple as possible and as complicated as necessary? It is very hard to design or choose a data-format which follows this principle of minimalism.</p>

<p>Firstly, there is the anti-pattern of <a href="http://sourcemaking.com/antipatterns/design-by-committee">“Design by Committee”</a>, where many decision makers take part in every decision. Decisions about which software product to use within an organisation, especially a public one, are also often made by large committees. In such settings, too many cooks easily spoil the broth and add more to the standard than is actually necessary. The W3C at least <a href="http://www.w3.org/People/Bos/DesignGuide/committee.html">is aware of this pattern</a>. Many groups are not.</p>

<p>A second problem is the common use of checklists when evaluating software solutions. Typically it goes like this: Every stakeholder can add something to the list; the wishes are often specific solution ideas and get condensed into a checklist for the procurement department; the software product that promises to fulfill most of the items on the checklist wins. Most of the time, this means buying into a single data-format with many rarely used and unneeded features. It would be better to phrase the checklist items in terms of the problem (rather than a solution) from the start. The evaluation process should give higher grades to solutions that consist of a number of simple, easily extensible and complementary data-formats, which can be combined to meet more complex needs.</p>

<p>But software vendors know their customers: The more features on a checklist are ticked off, the more valuable a piece of software appears, because at first glance it seems to serve many needs; all except the need for simple elegance. And so this is what the software and the data-format will look like: Bloated with features, to reflect as many specific solution ideas as possible. This gives the software producer another advantage: Any competitor will have a hard time supporting the full feature list of the format, or offering a superior solution for just a few of its elements. The customer is forced to buy all or nothing. Why bother with another data-format when there is already one that claims to do everything?</p>

<p>Every additional feature or guideline complicates the data-format, and because features can be combined with each other, the number of cases to consider grows exponentially. The disadvantages of bloated formats are enormous. Developers of software that needs to handle a data-format must understand its description in full: This includes the complete text of the specification and then all possible combinations of its elements. Having less to read and understand means the resulting software implementation will be simpler and more accurate. This leads to more software that handles the data-format well. What follows is more competition, more choice and therefore more users of the format.</p>
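<p>A small worked example may make this growth concrete. It rests on a simplifying assumption: Every feature is optional and independent of the others. Under that assumption, n features allow 2<sup>n</sup> different combinations that a document might use and that an implementation should handle. The following Python sketch counts these combinations. The feature names are hypothetical and chosen only for illustration.</p>

```python
from itertools import combinations

# Hypothetical optional features of a document format, for illustration only.
FEATURES = ["tables", "footnotes", "macros", "forms", "embedded fonts"]


def feature_sets(features):
    """Return every subset of the given optional features (2**n of them)."""
    return [
        set(combo)
        for size in range(len(features) + 1)
        for combo in combinations(features, size)
    ]


for n in range(1, len(FEATURES) + 1):
    count = len(feature_sets(FEATURES[:n]))
    print(f"{n} optional feature(s): {count} combinations")
```

<p>Each added feature doubles the count, so the loop prints 2, 4, 8, 16 and 32 combinations for one to five features. Real specifications also contain features that depend on or exclude each other, so the exact numbers differ, but the trend stays the same.</p>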

<p>The more complex a data-format is, the greater the chance that it has rarely needed features. The format and its implementations are then comparable to a huge, sprawling mansion: Some rooms are popular and well frequented, while other parts are hardly ever visited. Such a house is, of course, harder to secure. Burglars could push open a lonely, forgotten basement window, or hide tools in a cobwebbed corner during an official visit to the premises.</p>

<p>Experts see complexity as the biggest threat to software security. This is why many of them are critical of, or even hostile towards, standards. <a class="fn" id="ref-complexity" href="#fn-complexity">1</a></p>

<p>To get an understanding of the risks, let us take a look at how a computer deals with written characters. A commonly used standard is Latin-9 (ISO/IEC 8859-15). It enables a computer to handle text in more than 20 languages, mostly Western European ones. A single character encoded in Latin-9 can take one of 256 different values. A newer standard, Unicode (ISO/IEC 10646), is meant to encode the characters of all the world's languages, so it offers more than a million possible values per character. To make things worse, the same character can be encoded in several different ways, for example in "UTF-8" or "UCS-2". On the one hand, Unicode is a blessing: Once it is implemented correctly, an application is prepared to handle hundreds of languages. On the other hand, a programmer looking at the source code of a program can no longer work out in her head all the effects a character might have. With the 256 cases of Latin-9 she could; with Unicode, that overview is lost. A clever attacker might find combinations the developer did not think of, and this happens on a regular basis. Here are two examples:</p>
<ol>
<li><a href="http://en.wikipedia.org/wiki/IDN_homograph_attack">The IDN homograph attack</a> plays tricks on users with similar-looking Internet addresses. Cyrillic letters, which in many fonts look just like Latin ones, are well suited to this.</li>
<li>The developers of a well-known web server fell prey to <a href="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CAN-2000-0884">the possibilities of Unicode in URLs</a>.</li>
</ol>
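<p>The following Python sketch uses only the standard library to illustrate four points: the 256 values of Latin-9, the size of the Unicode code space, the same character "é" stored as different bytes, and a homograph host name. The host name and the look-alike letter are illustrative choices. The "idna" codec used at the end is Python's built-in implementation of the older IDNA 2003 rules, which may differ from what a given browser applies.</p>

```python
import sys
import unicodedata

# Latin-9 (ISO/IEC 8859-15) uses one byte per character: exactly 256 values.
latin9 = bytes(range(256)).decode("iso8859_15")
print(len(latin9))                   # 256

# Unicode code points run from U+0000 to U+10FFFF.
print(sys.maxunicode + 1)            # 1114112

# One visible character, "é", stored as different bytes.
e_acute = "\u00e9"
print(e_acute.encode("iso8859_15"))  # b'\xe9'
print(e_acute.encode("utf-8"))       # b'\xc3\xa9'
print(e_acute.encode("utf-16-be"))   # b'\x00\xe9' (big-endian UCS-2 gives the same bytes here)

# Unicode can also spell "é" as "e" followed by a combining accent.
decomposed = unicodedata.normalize("NFD", e_acute)
print(len(decomposed), decomposed == e_acute)  # 2 False

# Homograph: the first letter below is CYRILLIC SMALL LETTER IE, not Latin "e".
latin_host = "example.org"
spoof_host = "\u0435xample.org"
print(latin_host == spoof_host)         # False, yet both usually render alike
print(unicodedata.name(spoof_host[0]))  # CYRILLIC SMALL LETTER IE
print(spoof_host.encode("idna")[:4])    # b'xn--': the real, different host name
```

<p>Both host names usually look the same on screen, yet they are different strings that name different domains. This is exactly what the homograph attack exploits.</p>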

<p>Unsurprisingly, there are more applications out there that handle Latin-9 well than there are applications that handle Unicode well. The same problem affects every "fat" data-format: Some applications do not understand its more exotic features, if only because it has become impossible to test the myriad of features. The software will advertise that it can read data-format “X”, but whether this works in practice is questionable.</p>
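<p>The web server example mentioned earlier shows how a gap between "reads format X" and reading it correctly can turn into a security hole. That vulnerability is commonly described as involving "overlong" UTF-8 sequences such as %c0%af, an invalid second spelling of "/". The Python sketch below is a simplified model of this class of bug, not the server's actual code. The functions naive_is_safe and lenient_decode are hypothetical and exist only for this illustration. The path check runs before decoding, and the sloppy decoder then produces the "../" that the check never saw.</p>

```python
from urllib.parse import unquote_to_bytes


def naive_is_safe(raw_path: bytes) -> bool:
    # Hypothetical check: refuse paths that climb out of the document root.
    return b"../" not in raw_path


def lenient_decode(data: bytes) -> str:
    # Hypothetical, deliberately sloppy UTF-8 decoder. It turns any lead byte
    # C0..DF plus the next byte into a code point without rejecting
    # "overlong" forms such as C0 AF, which a conforming decoder must refuse.
    chars = []
    it = iter(data)
    for b in it:
        if b >= 0xE0:
            raise ValueError("this sketch only handles 1- and 2-byte sequences")
        if b >= 0xC0:
            cont = next(it, None)
            if cont is None:
                raise ValueError("truncated sequence")
            chars.append(chr((b - 0xC0) * 64 + (cont - 0x80)))
        else:
            chars.append(chr(b))
    return "".join(chars)


request = "/docs/..%c0%af..%c0%afprivate/report.txt"
raw = unquote_to_bytes(request)  # percent-decoding only, no UTF-8 decoding yet

print(naive_is_safe(raw))        # True: the raw bytes contain no "../"
print(lenient_decode(raw))       # /docs/../../private/report.txt

try:
    raw.decode("utf-8")          # a conforming decoder rejects the overlong form
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

<p>Python's own UTF-8 decoder refuses these bytes, as the standard requires. The safer design is to decode strictly first and then run the security check on the decoded result.</p>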

<p>Some data-formats create this problem on purpose: They come in different revisions. To be sure that software packages are compatible, the user has to specify the precise version of the data-format used. For example, there are three versions (1.0, 1.1 and 1.2) of the Open Document Format (ODF), and the complexity most likely grows with each version number. There are certainly many cases where the features of version 1.0 would be perfectly sufficient, but the default is probably to save files in the newest version the software supports. For PDF, this problem is even more significant: Some <a href="http://pdfreaders.org/os.en.html">versions or parts of PDF</a> do not even qualify as an open standard.</p>

<p>One of the first things people learn when they want to understand how computers work is that there are two different kinds of things: Data and programs (also called "applications"). Data is merely processed, while programs contain the instructions that command the computer. Imagine a note on a piece of paper that reads: "Jump off the bridge!" I can read this data, copy it or hand it to someone else without any problem. But if I treat it as an instruction and follow it, I may easily get hurt. It is the same for computers. Data-formats like ODF, DOC and PDF may contain, besides data, instructions for automatic execution ("macros") or interactive elements (e.g. JavaScript). This turns a regular file into a potential application controlling your computer. Naturally, attackers try to take advantage of this, as with the <a href="http://www.cert.org/tech_tips/Melissa_FAQ.html">Melissa macro virus from 1999</a>.</p>

<p>Most texts that are exchanged need only a small fraction of what common data-formats offer in terms of formatting, mark-up or layout. A plain file of Latin-9 characters has been editable for decades on every computer, with a simple text editor or any word processor. A small subset of HTML 2 could cater for more advanced needs such as headings, bulleted lists and hyperlinks. Alternatively, a <a href="http://en.wikipedia.org/wiki/Creole_%28markup%29">simple text-based markup language</a> like those used by wikis would work for many tasks. The world's Wikipedia pages and web logs ("blogs") prove that a lot of content can be expressed by simple means.</p>

<p>Everyone, except vendors of proprietary software, profits from different software products that compete with each other while being secure and interoperable. The minimal principle for data-formats promotes all this. It has just one rule: Remove everything that is not absolutely necessary. Aim for your design to be <a href="http://www.paulgraham.com/taste.html">simple and elegant</a>. A good solution resembles a set of building blocks from which an infinite number of buildings can be made, just by combining a few types of elements.</p>

<p>Even though there may be good reasons to choose a data-format that covers several requirements, we should ask ourselves each time: “Can this be done more simply?”</p>

<h2 id="fn">Footnotes</h2>
<ol>
<li id="fn-complexity">"Complexity is the main enemy of security": Niels Ferguson, Bruce Schneier, Practical Cryptography, Wiley, 2003, ISBN 0-471-22357-3, p. 146 (9.4.1 "Simplicity") and pp. 365 ff. (23 "Standards"). <a href="https://www.schneier.com/book-practical.html">https://www.schneier.com/book-practical.html</a> [<a href="#ref-complexity">↲</a>]</li>
</ol>

<p>Thanks to Peter Bubestinger, Philipp Kammerer, the folks from the FSFE DE mailing list and Anna F J Morris for suggestions, proofreading and translation work.</p>

</body>

<legal type="cc-license">
<license>https://creativecommons.org/licenses/by-sa/3.0/</license><notice>In addition to the website's standard license, this article is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)</notice>
</legal>
<author id="reiter" />
<date>
<original content="2014-02-27" />
</date>
<sidebar/>
<translator>Philipp Kammerer</translator>
</html>