  1. <?xml version="1.0" encoding="UTF-8" ?>
  2. <html>
  3. <version>1</version>
  4. <head>
  5. <title>Minimalistic Data Format – Open Standards – FSFE</title>
  6. </head>
  7. <body class="article">
  8. <p id="category">
  9. <a href="/activities/os/os.html">Open Standards</a>
  10. </p>
  11. <h1>The minimal principle: because being an open standard is not enough.</h1>
  12. <p>A tool is useless without something to work on. So what do we shape with
  13. our computing tools? Data, information, knowledge, opinions, art – in
  14. short: Content. Content is created, processed and transmitted. Nowadays
  15. this happens much more often directly in an electronic format. The number of people
  16. who own devices and connect to the internet is constantly rising. And they
  17. use it to evolve their ways of working together. </p>
  18. <p>Content is sent from one user to another and back. To do this, the
  19. content must take on some form: The data-format. This defines how content
  20. and its wrapping are to be handled, what is allowed and how each part looks
  21. within a file or stream. Anyone who wants to participate in the data
  22. exchange must use a software application that understands the data-format
  23. in question. Otherwise the content would appear like an unknown foreign
  24. language to your computer. If a data-format does not allow for the
  25. inclusion of pictures, for example, then there is no way to include
  26. pictures with it. The choice of data-format dictates the number of years
  27. for which I may access the content (backwards compatibility) and what I am
  28. able to do with it. </p>
  29. <p>A single user will probably not feel any effect of her decision when
  30. saving a file in a particular data-format. When an IT-department or a
  31. public administration decides upon a data-format the impact is far greater:
  32. It will dominate their choice of software for several years, possibly
  33. decades. The more an organisation saves its precious writings, recordings
  34. or pictures electronically, the more important it becomes to secure
  35. continued access to the documents. These decisions, directly or indirectly,
  36. lead to the funding of the initial development and maintenance of
  37. data-formats, whether they be "good or bad" formats. The choices taken at
  38. one time naturally affect the available choices in the future: Many
  39. software producers intentionally try to influence users to use a
  40. data-format that they (the producer) control. For example, when technical
  41. schematics of vehicles, buildings or machinery are all held in a format
  42. controlled by the software producer, the producer of the CAD application
  43. can in essence hold the data for ransom when it's time to renew the
  44. contracts. From the vendor’s point of view this is a strong position to be
  45. in for the next pricing negotiation. Occasionally, whole countries have
  46. managed to maneuver themselves into the losing end of this situation. </p>
  47. <p>As you can see, a good data-format can only be an <a
  48. href="/activities/os/def.html">Open Standard</a>. This requirement
  49. alone, however, is not enough. The data-format needs to solve a problem
  50. adequately: It should be a good fit from a functional point of view, as
  51. well as on a technical level. In order to judge this, there are a number
  52. of things to consider. The <a
  53. href="">Essay by Bert
  54. Bos</a> explains the design principles of the W3C - the organisation which
  55. develops the formats of the world wide web. He mentions efficiency,
  56. maintainability, accessibility, extensibility, learnability, simplicity,
  57. longevity and a few more.</p>
  58. <p>Two central questions here are:</p><ul>
  59. <li>How well does the data-format solve the problem? </li>
  60. <li>Is there a simpler format that could solve the problem just as well?</li></ul>
  61. <p>The first question is self-explanatory: Whoever wants to save, transmit
  62. and search within a text would not want a format for pixel based images –
  63. though it would be inevitable to use such a format during the first step of
  64. scanning papers or incoming faxes.</p>
  65. <p>The second question is much more interesting: Is the format as simple as
  66. possible and as complicated as necessary? It is very hard to design or
  67. choose a data-format which follows this principle of minimalism.</p>
  68. <p>Firstly, there is the anti-pattern of <a
  69. href="">“Design
  70. by Committee”</a>, where several decision makers participate in each decision.
  71. Decisions about which software product to use within an organisation – especially in public ones – are also often made by large committees.
  72. Then it easily happens that too many cooks spoil the broth and add more into the standards than is
  73. actually necessary. The W3C at least <a
  74. href="">
  75. is aware of this pattern</a>. Many groups are not.</p>
  76. <p>A second problem is the common use of checklists when evaluating
  77. software solutions. Typically it goes like this: Every stakeholder can add
  78. something to the list; the given wishes are often specific solution ideas
  79. and get condensed into the checklist for the procurement department; the
  80. software product promising to fulfill most of the items on the checklist
  81. wins; most of the time this means buying a single data-format which has
  82. many rarely used and unneeded features. It would be better to add
  83. features with a focus on the problem (rather than the solution) to begin
  84. with. The evaluation process should award higher grades to solutions
  85. which consist of a number of simple, easily extensible and complementary
  86. data-formats which can be combined for the more complex needs.</p>
  87. <p>But software vendors know their customers: The more features on a
  88. checklist are ticked off, the more valuable a software product will appear. That is
  89. because it seems to – at first glance – serve many needs. Except for the
  90. need for simple elegance. And so this is what the software and the
  91. data-format will look like: Bloated with many features, to reflect as many
  92. specific solution ideas as possible. This gives the software producer
  93. another advantage: Any competitor will have a hard time supporting the full
  94. feature list of the format, or providing a superior solution to just a few
  95. elements. The customer is forced to buy all or nothing. Why bother with
  96. another data-format when there is already one that claims to do everything?
  97. </p>
  98. <p>Every additional feature or guideline complicates the description of the
  99. data-format exponentially. The disadvantages of bloated formats are
  100. enormous. The developers of software which needs to handle a data-format
  101. must understand the description in total: this includes the complete text
  102. of the specification and then all possible combinations of its elements.
  103. Having to read and understand less means the resulting software
  104. implementation will be simpler and more accurate. This leads to more
  105. software which can handle the data-format on a high level. What follows is
  106. more competition, choice and therefore more users of this format.</p>
  107. <p>The more complex a data-format is, the greater the chance that it has
  108. rarely needed features. So the format and the implementation are comparable
  109. to a huge and sprawling mansion: Some rooms are very popular and
  110. well-frequented, while other places are hardly ever visited by people. Of
  111. course such a house is harder to secure. Burglars could push open a lonely
  112. forgotten window to the basement or hide tools in a cobwebbed corner during
  113. an official visit to the premises. </p>
  114. <p>Experts see complexity as the biggest threat to software security. This is why
  115. many of them are critical or even hostile towards standards.
  116. <a class="fn" id="ref-complexity" href="#fn-complexity">1</a></p>
  117. <p>To get an understanding of the risks let us take a look at how a
  118. computer deals with written characters. A commonly used standard is Latin-9
  119. (ISO/IEC 8859-15). It enables a computer to handle text in more than 20
  120. languages - mostly western European ones. For a single electronic
  121. character, encoded in Latin-9, there are 256 different possible values it
  122. can have. A newer standard called Unicode (ISO/IEC 10646) is supposed to encode
  123. all languages of the world. Therefore it comes with more than a million
  124. possible values per character. To make things worse, a single character
  125. can be encoded in several different ways, for example in "UTF-8" or
  126. "UCS-2". On the one hand Unicode is a blessing: Once implemented correctly an
  127. application is prepared to handle hundreds of languages. On the other hand
  128. a programmer cannot fully work out in her head all the effects a character
  129. might have when looking at the source code of a piece of software. With the 256
  130. cases of Latin-9 she could. With Unicode that overview gets
  131. lost. A clever attacker might find combinations the developer did not
  132. think of. This happens on a regular basis. Here are two examples: 1. <a
  133. href="">the IDN homograph
  134. attack</a> plays tricks on users with similar-looking Internet addresses.
  135. Cyrillic letters in Unicode are well suited to this. 2. The developers of a
  136. well known webserver fell prey to <a
  137. href="" >the
  138. possibilities of Unicode in URLs</a>.</p>
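<p>The difference in scale, and the two kinds of attack mentioned above, can
be illustrated with a short Python sketch. It is not taken from the linked
reports: the names <code>encodings_of</code>, <code>non_ascii_chars</code> and
<code>is_strict_utf8</code>, the host name <code>apple.example</code> and the
choice of an overlong UTF-8 sequence as the URL example are all assumptions
made for illustration. Python has no "UCS-2" codec, so "utf-16-be" stands in
for it; the two produce the same bytes for characters in the Basic
Multilingual Plane.</p>

```python
import sys
import unicodedata

# Latin-9 (ISO/IEC 8859-15): one byte per character, so 256 values in total.
LATIN9_TABLE = bytes(range(256)).decode("iso8859-15")

# Unicode: code points run from U+0000 up to sys.maxunicode (U+10FFFF).
UNICODE_CODE_POINTS = sys.maxunicode + 1


def encodings_of(char):
    """Return the bytes of one character in several encodings.

    "utf-16-be" is used as a stand-in for UCS-2 (assumption of this sketch).
    """
    names = ("iso8859-15", "utf-8", "utf-16-be")
    return {name: char.encode(name) for name in names}


def non_ascii_chars(text):
    """List every non-ASCII character in text with its Unicode name."""
    return [(ch, unicodedata.name(ch)) for ch in text if ord(ch) > 127]


# Hypothetical host names: the second starts with CYRILLIC SMALL LETTER A,
# which most fonts draw exactly like the Latin "a".
LATIN = "apple.example"
SPOOF = "\u0430pple.example"


def is_strict_utf8(raw):
    """True if raw is valid UTF-8 according to Python's strict decoder."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(len(LATIN9_TABLE), "values in Latin-9")
    print(UNICODE_CODE_POINTS, "code points in Unicode")
    print(encodings_of("\u20ac"))          # the euro sign, three ways
    print(LATIN == SPOOF, non_ascii_chars(SPOOF))
    # b"\xc0\xaf" is an overlong (invalid) UTF-8 spelling of "/".
    print(is_strict_utf8(b"/"), is_strict_utf8(b"\xc0\xaf"))
```

<p>A lenient decoder that accepts the overlong sequence would turn it into a
slash after a path check has already been passed; a strict decoder rejects
it. Listing the Unicode names of non-ASCII characters is only a crude hint
for spotting look-alike addresses, not a complete defence.</p>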
  139. <p>Unsurprisingly, more applications out there handle Latin-9 well
  140. than handle Unicode well. It is the same problem with every "fat"
  141. data-format: There are applications that do not understand the more exotic
  142. features, if only because it has become impossible to test the myriad
  143. features. The software will advertise that it can read data-format “X”
  144. - but whether this works in practice is questionable.</p>
  145. <p>Some data-formats create this problem on purpose: They come in different
  146. revisions. To be sure that software packages are compatible, the user has
  147. to define the precise version of the data-format used. For example there
  148. are three variants (1.0, 1.1 and 1.2) of the Open Document Format (ODF).
  149. It is likely that complexity grows with the version number. Certainly there are
  150. many cases where using the features of version 1.0 would be completely
  151. okay. But the default probably is to save files in the newest version the
  152. software supports. For PDF this problem is even more significant. Some <a
  153. href="">versions or parts of PDFs</a> do
  154. not even constitute an open standard.</p>
  155. <p>One of the first things told to anyone who wants to understand how
  156. computers work is that there are two different kinds of things: data and programs
  157. (aka "applications"). While data is merely being processed, the programs
  158. contain the instructions that command the computer. Imagine a note on a
  159. piece of paper: "Jump off the bridge!" I can read the data, process it by
  160. writing it down or handing it to someone else without problems. But if I
  161. consider it to be instructions, I may easily get hurt following them. It is
  162. the same for computers. Data-formats like ODF, DOC and PDF may, besides
  163. data, also contain instructions for automatic execution ("macros") or
  164. interactive elements (e.g. JavaScript). This turns a regular file into a
  165. potential application controlling your computer. Naturally attackers try to
  166. take advantage of this, as with the <a
  167. href="" >Melissa Macro
  168. Virus from 1999</a>.</p>
  169. <p>Most texts that are being exchanged only need a small fraction of
  170. what common data-formats have to offer in terms of formatting, mark-up or
  171. layout. A simple file composed of Latin-9 characters has been editable for
  172. decades on every computer by means of a simple text editor or any word
  173. processor. A small subset of HTML 2 could cater for advanced needs like
  174. headlines, bullet-lists and hyperlinks. Alternatively any <a
  175. href="">simple text-based
  176. markup language</a> like those used by wikis would work for many tasks. The
  177. Wikipedia pages and web-logs ("blogs") of the world are proof that a lot of
  178. content can be expressed by simple means.</p>
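<p>How little is needed for headlines, bullet lists and hyperlinks can be
sketched in a few lines of Python. The markup below is hypothetical and
invented for this example (a line starting with "= " is a headline, "* " is a
list item, and "[url text]" is a link); it is not the syntax of any
particular wiki, and <code>render</code> and <code>inline</code> are
illustrative names.</p>

```python
import html
import re

# Hypothetical link syntax for this sketch: [url link text]
LINK = re.compile(r"\[(\S+) ([^\]]+)\]")


def inline(text):
    """Escape HTML special characters, then turn [url text] into links."""
    escaped = html.escape(text, quote=True)
    return LINK.sub(r'<a href="\1">\2</a>', escaped)


def render(source):
    """Render the tiny markup to HTML: headlines, bullet lists, paragraphs."""
    out = []
    in_list = False
    for line in source.splitlines():
        line = line.strip()
        is_item = line.startswith("* ")
        # Close an open list as soon as a non-item line is seen.
        if in_list and not is_item:
            out.append("</ul>")
            in_list = False
        if not line:
            continue
        if line.startswith("= "):
            out.append("<h1>" + inline(line[2:]) + "</h1>")
        elif is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>" + inline(line[2:]) + "</li>")
        else:
            out.append("<p>" + inline(line) + "</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


if __name__ == "__main__":
    print(render("= Title\n\n* one\n* two\n\nSee [https://example.org this page]."))
```

<p>The whole format fits in one's head, which is the point of the minimal
principle: a second implementation can be written and checked in an
afternoon.</p>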
  179. <p>Everyone – except vendors of proprietary software – profits from
  180. different software products competing with each other, while being secure
  181. and interoperable. The minimal principle for data-formats promotes all
  182. this. It just has one rule: Remove everything that is not absolutely
  183. necessary. Aim for your design to be <a
  184. href="">simple and elegant</a>. A good
  185. solution resembles a set of building blocks from which an infinite number of
  186. buildings can be made, just by combining a few types of elements.</p>
  187. <p>Even though there may be good reasons to choose a data-format which
  188. covers several requirements we should ask ourselves each time: “Can this be
  189. done more simply?”</p>
  190. <h2 id="fn">Footnotes</h2>
  191. <ol>
  192. <li id="fn-complexity">"Complexity is the main enemy of security",
  193. Ferguson, Niels, and Schneier, Bruce - Practical Cryptography, Wiley, 2003,
  194. ISBN 0-471-22357-3. p146 "9.4.1 Simplicity", pp365- "23 Standards"
  195. <a href=""></a> [<a href="#ref-complexity">&#8626;</a>]</li>
  196. </ol>
  197. Thanks for suggestions, proofreading and translation work to Peter Bubestinger, Philipp Kammerer, the folks from the FSFE DE mailing list and
  198. Anna F J Morris.
  199. </body>
  200. <legal type="cc-license">
  201. <license></license><notice>In addition to the standard license of the website, this article is available under the Creative Commons Attribution-ShareAlike 3.0 Unported license (CC BY-SA 3.0)</notice>
  202. </legal>
  203. <author id="reiter" />
  204. <date>
  205. <original content="2014-02-27" />
  206. </date>
  207. <sidebar/>
  208. <translator>Philipp Kammerer</translator>
  209. </html>