<?xml version="1.0" encoding="UTF-8" ?>
<html>
<head>
<title>Minimalistic Data Format – Open Standards – FSFE</title>
</head>
<body id="article">
<p id="category">
<a href="/activities/os/os.html">Open Standards</a>
</p>
<h1>The minimal principle: because being an open standard is not enough.</h1>

<p>A tool is useless without something to work on. So what do we shape with
our computing tools? Data, information, knowledge, opinions, art – in
short: Content. Content is created, processed and transmitted. Nowadays
this happens much more often directly in some electronic format. The number
of people who own devices and connect to the internet is constantly rising.
And they use these devices to evolve their ways of working together.</p>

<p>Content is sent from one user to another and back. To do this, the
content must take on some form: The data-format. This defines how content
and its wrapping are to be handled, what is allowed and how each part looks
within a file or stream. Anyone who wants to participate in the data
exchange must use a software application that understands the data-format
in question. Otherwise the content would appear like an unknown foreign
language to their computer. If a data-format does not allow for the
inclusion of pictures, for example, then there is no way to include
pictures with it. The choice of data-format dictates the number of years
for which I may access the content (backwards compatibility) and what I am
able to do with it.</p>

<p>A single user will probably not feel any effect of her decision when
saving a file in a particular data-format. When an IT department or a
public administration decides upon a data-format, the impact is far greater:
It will dominate their choice of software for several years, possibly
decades. The more an organisation saves its precious writings, recordings
or pictures electronically, the more important it becomes to secure
continued access to the documents. These decisions, directly or indirectly,
lead to the funding of the initial development and maintenance of
data-formats, whether they be "good" or "bad" formats. The choices taken at
one time naturally affect the available choices in the future: Many
software producers intentionally try to influence users to use a
data-format that they (the producers) control. For example, when technical
schematics of vehicles, buildings or machinery are all held in a format
controlled by the producer of the CAD application, that producer can in
essence hold the data for ransom when it's time to renew the contracts.
From the vendor’s point of view this is a strong position to be
in for the next pricing negotiation. Occasionally, whole countries have
managed to manoeuvre themselves into the losing end of this situation.</p>

<p>As you can see, a good data-format can only be an <a
href="/activities/os/def.html">Open Standard</a>. This requirement
alone, however, is not enough. The data-format needs to solve a problem
adequately: It should be a good fit from a functional point of view, as
well as on a technical level. In order to judge this, there are a number
of things to consider. The <a
href="http://www.w3.org/People/Bos/DesignGuide/introduction">essay by Bert
Bos</a> explains the design principles of the W3C, the organisation which
develops the formats of the World Wide Web. He mentions efficiency,
maintainability, accessibility, extensibility, learnability, simplicity,
longevity and a few more.</p>

<p>Two central questions here are:</p><ul>
<li>How well does the data-format solve the problem?</li>
<li>Is there a simpler format that could solve the problem just as well?</li></ul>

<p>The first question is self-explanatory: Whoever wants to save, transmit
and search within a text would not want a format for pixel-based images,
although such a format is unavoidable during the first step of
scanning paper documents or incoming faxes.</p>

<p>The second question is much more interesting: Is the format as simple as
possible and as complicated as necessary? It is very hard to design or
choose a data-format which follows this principle of minimalism.</p>

<p>Firstly, there is the anti-pattern of <a
href="http://sourcemaking.com/antipatterns/design-by-committee">“Design
by Committee”</a>, where several decision makers participate in each decision.
Decisions about which software product to use within an organisation,
especially a public one, are also often made by large committees.
It then easily happens that too many cooks spoil the broth and add more to
the standards than is actually necessary. The W3C at least <a
href="http://www.w3.org/People/Bos/DesignGuide/committee.html">
is aware of this pattern</a>. Many groups are not.</p>

<p>A second problem is the common use of checklists when evaluating
software solutions. Typically it goes like this: Every stakeholder can add
something to the list; the given wishes are often specific solution ideas
and get condensed into the checklist for the procurement department; the
software product promising to fulfill most of the items on the checklist
wins; most of the time this means buying into a single data-format which has
many rarely used and unneeded features. It would be better to describe
requirements with a focus on the problem (rather than the solution) to begin
with. The evaluation process should award higher grades to solutions
which consist of a number of simple, easily extensible and complementary
data-formats which can be combined for the more complex needs.</p>

<p>But software vendors know their customers: The more features on a
checklist are ticked off, the more precious a software product will appear.
That is because it seems, at first glance, to serve many needs. Except for
the need for simple elegance. And so this is what the software and the
data-format will look like: Bloated with many features, to reflect as many
specific solution ideas as possible. This gives the software producer
another advantage: Any competitor will have a hard time supporting the full
feature list of the format, or providing a superior solution to just a few
elements. The customer is forced to buy all or nothing. Why bother with
another data-format when there is already one that claims to do everything?
</p>

<p>Every additional feature or guideline complicates the description of the
data-format exponentially, because it can be combined with everything that
is already there. The disadvantages of bloated formats are
enormous. The developers of software that needs to handle a data-format
must understand the description in total: this includes the complete text
of the specification and then all possible combinations of its elements.
Having to read and understand less means the resulting software
implementation will be simpler and more accurate. This leads to more
software which can handle the data-format on a high level. What follows is
more competition, choice and therefore more users of this format.</p>
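
<p>As a rough illustration of this growth, assume (as a simplification that
is not taken from any real specification) that a format has a number of
optional features, each of which can be present or absent independently of
the others. The following Python sketch counts the combinations an
implementer would then have to consider; the function name is
hypothetical.</p>

```python
def feature_combinations(optional_features: int) -> int:
    """Count the on/off combinations of independent optional features.

    Assumption: every feature is optional and independent of the others,
    so each additional feature doubles the number of combinations.
    """
    if optional_features < 0:
        raise ValueError("the number of features cannot be negative")
    return 2 ** optional_features


if __name__ == "__main__":
    # Print the count for a few illustrative feature-list sizes.
    for n in (5, 10, 20, 40):
        print(f"{n:>2} optional features -> {feature_combinations(n):,} combinations")
```

<p>Under this assumption, going from 10 to 20 optional features raises the
count from 1,024 to 1,048,576 combinations. Real formats are not this
regular, but the sketch shows why testing every combination quickly becomes
impractical.</p>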

<p>The more complex a data-format is, the greater the chance that it has
rarely needed features. So the format and the implementation are comparable
to a huge and sprawling mansion: Some rooms are very popular and
well-frequented, while other places are hardly ever visited. Of
course such a house is harder to secure. Burglars could push open a lonely
forgotten window to the basement or hide tools in a cobwebbed corner during
an official visit to the premises.</p>

<p>Experts see complexity as the biggest threat to software security. This is why
many of them are critical or even hostile towards standards.
<a class="fn" id="ref-complexity" href="#fn-complexity">1</a></p>

<p>To get an understanding of the risks let us take a look at how a
computer deals with written characters. A commonly used standard is Latin-9
(ISO/IEC 8859-15). It enables a computer to handle text in more than 20
languages, mostly western European ones. For a single electronic
character, encoded in Latin-9, there are 256 different possible values it
can have. A newer standard called Unicode (ISO/IEC 10646) is supposed to
encode all languages of the world. Therefore it comes with more than a million
possible values per character. To make things worse, a single character
can be encoded in several different ways, for example in "UTF-8" or
"UCS-2". On the one hand Unicode is a blessing: Once implemented correctly an
application is prepared to handle hundreds of languages. On the other hand
a programmer cannot fully calculate in her head all the effects a character
might have when looking at the source code of a program. With the 256
cases of Latin-9 she could. With Unicode this overview gets
lost. A clever attacker might find combinations the developer did not
think of. This happens on a regular basis. Here are two examples: 1. <a
href="http://en.wikipedia.org/wiki/IDN_homograph_attack">The IDN homograph
attack</a> plays tricks on users with similar-looking Internet addresses.
Cyrillic letters from Unicode are well suited to this. 2. The developers of a
well-known webserver fell prey to <a
href="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CAN-2000-0884">the
possibilities of Unicode in URLs</a>.</p>
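
<p>The two points above, several byte encodings for one character and
look-alike letters from different scripts, can be made concrete with a
short Python sketch. It uses only the standard library; the function names
and the sample label are hypothetical, "utf-16-be" stands in for UCS-2
(the two produce the same bytes for characters in the Basic Multilingual
Plane), and the script check is a deliberately crude heuristic based on
character names, not the method any browser or registry actually uses.</p>

```python
import unicodedata


def byte_forms(ch: str) -> dict:
    """Return the bytes of one character in three encodings."""
    return {
        "latin9": ch.encode("iso8859_15"),   # ISO/IEC 8859-15, one byte per character
        "utf8": ch.encode("utf-8"),          # variable length, 1 to 4 bytes
        "utf16be": ch.encode("utf-16-be"),   # same bytes as UCS-2 for BMP characters
    }


def mixes_latin_and_cyrillic(label: str) -> bool:
    """Crude check: True if the label contains both Latin and Cyrillic letters.

    Assumption: the script is guessed from the Unicode character name,
    which is enough for an illustration but not for production use.
    """
    scripts = set()
    for ch in label:
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("LATIN")
        elif name.startswith("CYRILLIC"):
            scripts.add("CYRILLIC")
    return len(scripts) > 1


if __name__ == "__main__":
    for encoding, data in byte_forms("é").items():
        print(encoding, data.hex())
    # "\u0430" is CYRILLIC SMALL LETTER A, which looks like the Latin "a".
    print(mixes_latin_and_cyrillic("p\u0430ypal"))
    print(mixes_latin_and_cyrillic("paypal"))
```

<p>A label that swaps in the Cyrillic "а" (U+0430) is flagged by this
check, while the all-Latin spelling is not. A program limited to Latin-9
never has to ask this question, which is the trade-off the paragraph above
describes.</p>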

<p>Unsurprisingly there are more applications out there that handle
Latin-9 well than applications that handle Unicode well. It is the same
problem with every "fat" data-format: There are applications that do not
understand the more exotic features, if only because it has become
impossible to test the myriad of features. The software will advertise that
it can read data-format “X”, but whether this works in practice is
questionable.</p>

<p>Some data-formats create this problem on purpose: They come in different
revisions. To be sure that software packages are compatible, the user has
to define the precise version of the data-format used. For example there
are three variants (1.0, 1.1 and 1.2) of the Open Document Format (ODF).
It is likely that the complexity grows with the version number. Certainly
there are many cases where using the features of version 1.0 would be
completely okay. But the default probably is to save files in the newest
version the software supports. For PDF this problem is even more
significant. Some <a
href="http://pdfreaders.org/os.en.html">versions or parts of PDF</a> do
not even qualify as an open standard.</p>

<p>Anyone who wants to understand how computers work is told early on that
there are two different kinds of things: data and programs
(aka "applications"). While data is merely being processed, the programs
contain the instructions that command the computer. Imagine a note on a
piece of paper that says: Jump off the bridge! I can read the data, process
it by copying it or handing it to someone else without problems. But if I
consider it to be instructions, I may easily get hurt following them. It is
the same for computers. Data-formats like ODF, DOC and PDF may, besides
data, also contain instructions for automatic execution ("macros") or
interactive elements (e.g. JavaScript). This turns a regular file into a
potential application controlling your computer. Naturally attackers try to
take advantage of this, as happened with the <a
href="http://www.cert.org/tech_tips/Melissa_FAQ.html">Melissa Macro
Virus from 1999</a>.</p>

<p>Most texts that are being exchanged only need a small fraction of
what common data-formats have to offer in terms of formatting, mark-up or
layout. A simple file composed of Latin-9 characters has been editable for
decades on every computer by means of a simple text editor or any word
processor. A small subset of HTML 2 could cater for advanced needs like
headlines, bullet-lists and hyperlinks. Alternatively any <a
href="http://en.wikipedia.org/wiki/Creole_%28markup%29">simple text-based
markup language</a> like those used by wikis would work for many tasks. The
Wikipedia pages and web-logs ("blogs") of the world are proof that a lot of
content can be expressed by simple means.</p>

<p>Everyone, except vendors of proprietary software, profits from
different software products competing with each other, while being secure
and interoperable. The minimal principle for data-formats promotes all
this. It just has one rule: Remove everything that is not absolutely
necessary. Aim for your design to be <a
href="http://www.paulgraham.com/taste.html">simple and elegant</a>. A good
solution resembles a set of building blocks from which an infinite number of
buildings can be made, just by combining a few types of elements.</p>

<p>Even though there may be good reasons to choose a data-format which
covers several requirements, we should ask ourselves each time: “Can this be
done more simply?”</p>

<h2 id="fn">Footnotes</h2>
<ol>
<li id="fn-complexity">"Complexity is the main enemy of security",
Ferguson, Niels, and Schneier, Bruce: Practical Cryptography, Wiley, 2003,
ISBN 0-471-22357-3. p. 146 "9.4.1 Simplicity", pp. 365 ff. "23 Standards"
<a href="http://www.schneier.com/book-practical.html">http://www.schneier.com/book-practical.html</a> [<a href="#ref-complexity">↲</a>]</li>
</ol>

Thanks for suggestions, proofreading and translation work to Peter Bubestinger, Philipp Kammerer, the folks from the FSFE DE mailing list and
Anna F J Morris.
</body>

<timestamp>$Date: 2012-11-10 23:10:43 +0000 (Sat, 10 Nov 2012) $ $Author: bernhard $</timestamp>
<tags>
<tag>openstandards</tag>
</tags>
<legal type="cc-license">
<license>https://creativecommons.org/licenses/by-sa/3.0/</license><notice>In addition to the website's standard license, this article is available under the Creative Commons Attribution-ShareAlike 3.0 Unported license (CC BY-SA 3.0)</notice>
</legal>
<author id="reiter" />
<date>
<original content="2014-02-27" />
</date>
<translator>Philipp Kammerer</translator>
</html>