Source files of fsfe.org, pdfreaders.org, freeyourandroid.org, ilovefs.org, drm.info, and test.fsfe.org. Contribute: https://fsfe.org/contribute/web/

minimalisticstandards.en.xhtml 13KB


<?xml version="1.0" encoding="UTF-8" ?>
<html>
<head>
<title>Minimalistic Data Format – Open Standards – FSFE</title>
</head>
<body id="article">
<p id="category">
<a href="/activities/os/os.html">Open Standards</a>
</p>
<h1>Minimalistic Standards, because being an Open Standard is not enough.</h1>
<p>A tool is useless without a workpiece. What are the workpieces of our
computers? Data, information, knowledge, opinions, art – in short: content. It is
created, processed and transmitted, often directly in electronic form. More and more
people have a device with an internet connection and use it to change
the way they work together.
</p>
<p>Content is sent from one user to another and back. For this it needs to take
on some form: the data format, which defines how content and its
wrapping are handled, what is allowed and how the bits look within a file or on an
online connection. Whoever wants to participate must use software that understands this
data format. Otherwise the content would look like an unknown foreign language to the application.
If a data format doesn’t allow pictures to be included, then I simply can’t save pictures with it.
The choice of data format dictates how long I can access the content
and what I can do with it.
</p>
<p>When saving a file in a particular data format, a single user probably won’t feel any effect
of that decision. When an IT department or a public authority
decides which data format to use, the impact is great. The choice of software
that goes along with the data format has an effect for years or decades. The more precious
writings, recordings or pictures are saved electronically, the more valuable it becomes to be
able to access them.
Consciously or indirectly, these decisions drive the funding of the initial development
or the maintenance of data formats.
Many software producers intentionally try to steer users towards
a data format the vendor controls, for example for technical schematics of
vehicles, buildings or machinery. The producer of the corresponding CAD application
can basically hold the data for ransom. From the vendor’s point of view this is a strong position
in the upcoming negotiation about the price of the new software version.
Sometimes whole countries end up in such a situation.
</p>
<p>Therefore a good data format can only be an <a
href="/activities/os/def.html">Open Standard</a>.
This requirement, however, is not enough. The data format needs to solve a problem properly.
It needs to fit from a functional as well as from a technical point of view.
Many aspects can be considered here. The <a
href="http://jendryschik.de/wsdev/trans/designguide/">essay by Bert
Bos</a> explains the design principles of the W3C, the organisation which develops the formats
of the World Wide Web. Among others he mentions efficiency, maintainability, accessibility,
extensibility, learnability, simplicity and durability.</p>
<p>Two central questions here are:</p><ul>
<li>How well does the data format solve the problem?</li>
<li>Is it the simplest data format available, or is there an even simpler one?</li></ul>
<p>The first question is self-explanatory: whoever wants to save, transmit and search within
a text would not want a format for pixel-based images, although using such a format
is inevitable during the first step of scanning papers or facsimiles.</p>
<p>The second question is much more interesting: is the format as simple as possible and as
complicated as necessary? It is hard to design or choose a data format which corresponds
to this rule of minimalism.</p>
<p>For one, there is the bad influence of a pattern called <a
href="http://sourcemaking.com/antipatterns/design-by-committee">“Design
by Committee”</a>, which stands for the <a
href="http://webstandard.kulando.de/post/2010/07/21/design-by-committee-gestaltung-durch-viele-entscheider">
participation of several decision makers</a> in a technical question. Often many people
are involved in the development of a standard. Decisions about which software is to be used
within an organisation – especially a public one – are also often made by large committees.
It easily happens that too many cooks spoil the broth and add more than
is actually necessary. The W3C at least <a
href="http://www.w3.org/People/Bos/DesignGuide/committee.html">
is aware of this pattern, says Bos</a>. Many others are not.</p>
<p>In addition, many use a checklist when evaluating software solutions. Everyone can
add something to the list. These wishes are often specific ideas for a solution, and afterwards
they are compiled into a dense list of all the requirements for the new software.
The software solution that ticks the most boxes wins. Most of the time this leads to one data format
which has many unneeded features. It would be better if wishes were added
in a problem-oriented manner and higher grades were given to solutions which
work with a number of simple, easily extensible and combinable data formats.</p>
<p>Software manufacturers know their customers. The more features on the checklist are ticked,
the more valuable a piece of software appears, because at a quick glance it can
serve many needs. Except the need for simple elegance. And that is how the software and the data format
often end up looking: bloated with many features, each directly corresponding to one of the
proposed technical solution ideas. This gives the software producer another edge:
any competitor will have a hard time processing the complete format or offering a superior,
complete alternative. The customer is forced to buy all or nothing. Why use another data format
when there is one that can do everything?</p>
<p>Every additional feature or guideline complicates the description of the data format
exponentially. The disadvantages are immense. The developers of software
that needs to handle a data format need to understand the description fully. This includes
the whole text as well as all possible combinations of the contained elements. Having less
to read and understanding more of it leads to simpler and more secure software. This leads to
more software packages that can handle the data format at a high level. What follows is more
competition, more choice and therefore more users for the format.</p>
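<p>A rough way to see this growth, assuming that every feature is optional and can be combined
freely with every other one: n features already allow 2^n different combinations that an
implementation may meet. The feature names in the following Python sketch are invented for
illustration and are not taken from any real data format.</p>

```python
from itertools import combinations


def feature_combinations(features):
    """Return every subset of optional features a file might use."""
    subsets = []
    for size in range(len(features) + 1):
        subsets.extend(combinations(features, size))
    return subsets


# Hypothetical feature list, for illustration only.
features = ["tables", "images", "macros", "scripts"]
count = len(feature_combinations(features))
print(count)  # 16, which is 2 ** 4

# One more feature doubles the number of cases to understand and test.
count_plus_one = len(feature_combinations(features + ["forms"]))
print(count_plus_one)  # 32
```

<p>Real formats restrict some combinations, so the true number is usually lower, but the
doubling with each independent feature is what makes complete testing impractical.</p>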
<p>The more intricate a data format is, the greater the chance that it contains
rarely needed features. Such a format and its implementation are comparable to a
huge house full of nooks and crannies. Some rooms are heavily used, others are virtually never entered.
Of course such a house is hard to secure. Burglars could open a long-forgotten window to
the basement or, while walking through the hallway, hide something in a dark staircase.</p>
<p>Experts see complexity as the greatest problem for software security. Because of this,
many are critical of or even hostile towards standards.
<a class="fn" id="ref-complexity" href="#fn-complexity">1</a></p>
<p>To grasp the risks, just take a look at how a computer encodes characters: there is the very
commonly used standard ISO/IEC 8859-15 (Latin-9). More than 20, mostly western European
languages can be processed with it. For a single character there are 256 different possibilities.
A newer standard, Unicode (ISO/IEC 10646), is supposed to encode all languages. It needs many
more – more than one million – possibilities. In addition, a character can be encoded in
different ways, for example with UTF-8 or UCS-2. On the one hand Unicode is a blessing:
programmed correctly once, an application is prepared to handle hundreds of languages. On
the other hand a programmer can’t possibly predict what could happen with all the characters
in the source code. With the 256 cases of Latin-9 she could. With Unicode this overview
is missing. A resourceful attacker might find combinations the developer didn’t think of. This
happens on a regular basis. Here are two examples: 1. (DE)<a
href="http://de.wikipedia.org/wiki/Homographischer_Angriff">the
homograph attack</a> / (EN)<a href="http://en.wikipedia.org/wiki/IDN_homograph_attack">
the homograph attack</a>
deceives the user with similar-looking Internet addresses. Cyrillic letters from Unicode are
suitable for this. 2. The developers of a well-known web server have been <a
href="https://www.bsi.bund.de/ContentBSI/grundschutz/kataloge/m/m05/m05102.html">pwned by URIs in
Unicode</a>.</p>
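<p>The numbers and the homograph problem can be reproduced with a few lines of Python. This is
only a sketch under some assumptions: the codec <code>utf-16-be</code> stands in for UCS-2 (for
the character used here both give the same bytes), the host names are made up, and the helper
<code>scripts</code> is a crude heuristic that reads the script from the first word of the
Unicode character name. It is not how browsers actually detect spoofed addresses.</p>

```python
import unicodedata

# Latin-9 uses exactly one byte per character, so there are 2 ** 8 cases.
latin9_cases = 2 ** 8
# Unicode code points range from U+0000 to U+10FFFF.
unicode_cases = 0x10FFFF + 1
print(latin9_cases, unicode_cases)  # 256 1114112

# The same character has different byte sequences depending on the encoding.
euro = "\u20ac"  # EURO SIGN
print(euro.encode("iso-8859-15").hex())  # a4
print(euro.encode("utf-8").hex())        # e282ac
print(euro.encode("utf-16-be").hex())    # 20ac


def scripts(text):
    """Return the scripts of all letters, taken from their Unicode names."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


genuine = "example.org"
spoofed = "ex\u0430mple.org"  # U+0430 is CYRILLIC SMALL LETTER A

print(genuine == spoofed)        # False, although both may render alike
print(sorted(scripts(genuine)))  # ['LATIN']
print(sorted(scripts(spoofed)))  # ['CYRILLIC', 'LATIN']
```

<p>With Latin-9 a developer could enumerate all 256 values by hand. With more than a million code
points, checks like the mixed-script test above have to be added deliberately, and each one is
easy to forget.</p>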
<p>It is no surprise that more applications handle Latin-9 correctly
than Unicode. The problem is the same with every more “thickly” specified data format:
there are applications that don’t understand the exotic features, especially because there
are so many features that complete testing is impossible. The adverts say the software can read
data format “X”, but whether this works in practice is questionable.</p>
<p>Some data formats respond to this problem deliberately with different versions. Whoever wants
certainty that all applications are compatible needs to state exactly which version is meant.
For example, there are three versions (1.0, 1.1 and 1.2) of the Open Document Format (ODF),
probably with increasing complexity. There are probably many uses for which version 1.0 is
sufficient, but the default would probably be the newest version the application supports.
For PDF this problem is even more significant. Some <a
href="http://pdfreaders.org/os.de.html">versions or parts of PDF</a> don’t even
qualify as an Open Standard.</p>
<p>Whoever wants to understand computers is told that there are two different things:
data and programs. While data is merely processed, programs contain commands for
the computer. The difference can be illustrated with a sticky note saying: Jump from a bridge!
I can read this note, write it and pass it on (process it) without any problems. But if I
regard it as a command and execute it, then I will probably land on my nose. With computers
it’s the same. Data formats like ODF, DOC and PDF may contain data as well as commands for automatic
processing (“macros”) or interactive elements (JavaScript). This turns a regular file into
a potential application with commands for your computer. Naturally attackers try to take
advantage of this, as with (DE)<a
href="https://www.bsi-fuer-buerger.de/ContentBSIFB/GefahrenImNetz/Schadprogramme/Viren/viren.html">macro viruses</a> / (EN)<a href="http://en.wikipedia.org/wiki/Macro_virus">macro viruses</a>.</p>
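<p>The sticky-note comparison can be acted out in Python. The sketch below is an illustration
only: the names <code>process</code>, <code>execute</code> and <code>log</code> are invented
here, and Python’s built-in <code>exec</code> merely stands in for the macro or script
interpreter of an office application.</p>

```python
def process(note):
    """Treat the note as data: copy it and pass it on, nothing else happens."""
    return str(note)


def execute(note, log):
    """Treat the same text as a command by handing it to the interpreter."""
    exec(note, {"log": log})


# A harmless stand-in for the sticky note; `log` records what actually happened.
note = "log.append('jumped from a bridge')"
log = []

copied = process(note)
print(copied == note, log)  # True []

execute(note, log)
print(log)  # ['jumped from a bridge']
```

<p>The text of the note is identical in both cases. Only the decision to interpret it as a
command gives it the power to change anything, which is why a format that cannot carry
commands removes this whole class of attack.</p>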
<p>Most texts which are transmitted only need a small fraction of what common
data formats have to offer in formatting, mark-up or layout. For decades a simple file
composed of Latin-9 characters has been editable on every computer with a simple text editor
and all word processors. With increasing demands, a small part of HTML 2 could suffice for
headlines, lists and links. Or a (DE)<a
href="http://de.wikipedia.org/wiki/Creole_(Markup)">simple text-based markup</a> / (EN)<a href="http://en.wikipedia.org/wiki/Creole_%28markup%29">
simple text-based markup</a>, as it is used in wikis. The Wikipedias and weblogs of the world
prove that lots of content can be expressed with these simple means.</p>
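<p>How little is needed can be shown with a reader for a tiny Creole-like subset: headings
marked with leading equals signs, list items marked with an asterisk, and links written as
<code>[[target|label]]</code>. The following Python sketch is an assumption-laden illustration,
not a conforming Creole parser; the function names and the sample text are made up.</p>

```python
import re

# Matches Creole links of the form [[target|label]].
LINK = re.compile(r"\[\[([^|\]]+)\|([^\]]+)\]\]")


def parse(text):
    """Split a text into (kind, content) pairs for headings, items and text."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("="):
            # The number of leading equals signs gives the heading level.
            level = len(line) - len(line.lstrip("="))
            blocks.append(("h%d" % level, line.strip("= ")))
        elif line.startswith("* "):
            blocks.append(("item", line[2:]))
        else:
            blocks.append(("text", line))
    return blocks


def links(text):
    """Return all (target, label) pairs found in the text."""
    return LINK.findall(text)


sample = """= Minimal formats =
* simple
* see [[https://fsfe.org|FSFE]]
Plain text."""

for block in parse(sample):
    print(block)
print(links(sample))  # [('https://fsfe.org', 'FSFE')]
```

<p>A reader for headlines, lists and links fits on one screen and can be reviewed completely,
which is exactly the property the thicker formats lose.</p>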
<p>Everyone – except manufacturers of proprietary software – is interested in competing
software and in secure, interoperable products. The minimal rule for data formats
facilitates all this. It means leaving out everything that is not strictly
needed. The aim is a (DE)<a
href="http://magplot.de/TasteForMakers">simple and elegant design</a> / (EN)<a href="http://www.paulgraham.com/taste.html">
simple and elegant design</a>. A nice solution is a kit with which countless works can
be created from just a few elements.</p>
<p>Even though there are good reasons to choose a data format which covers several
requirements, we should ask ourselves: “Can’t we do this more simply?”</p>
<h2 id="fn">Footnotes</h2>
<ol>
<li id="fn-complexity">"Complexity is the main enemy of security",
Ferguson, Niels, and Schneier, Bruce - Practical Cryptography, Wiley, 2003,
ISBN 0-471-22357-3. p. 146 "9.4.1 Simplicity", pp. 365- "23 Standards"
<a href="http://www.macfergus.com/pc">http://www.macfergus.com/pc</a> [<a href="#ref-complexity">&#8626;</a>]</li>
</ol>
</body>
<timestamp>$Date$ $Author$</timestamp>
<tags>
<tag>open-standards</tag>
</tags>
<legal type="cc-license">
<license>https://creativecommons.org/licenses/by-sa/3.0/</license><notice>In addition to the website's standard licence, this article is available under the Creative Commons Attribution-ShareAlike 3.0 Unported licence (CC BY-SA 3.0)</notice>
</legal>
<author id="reiter" />
<!-- <date>
<original content="2012-03-23" />
</date> -->
<translator>Philipp Kammerer</translator>
</html>