<?xml version="1.0" encoding="UTF-8" ?>
<title>Artificial Intelligence and Free Software</title>
<body class="article">
Controlling technology at the age of Artificial Intelligence: a Free
Software perspective
<div id="introduction">
Technical improvements, the accumulation of large, detailed datasets and
advances in computer hardware have led to an Artificial Intelligence
(AI) revolution. For example, breakthroughs in computer vision, the
building of large datasets, and improvements in text analysis
coupled with the gathering of personal data have given birth to
countless AI applications. These new AI applications have brought many
benefits to European Union citizens. However, because of its inherent
complexity and its requirements in technical resources and knowledge, AI may
undermine our ability to control technology and put fundamental freedoms
at risk. Therefore, introducing new legislation on AI is a worthwhile
objective. In the context of new legislation, this article explains how releasing
AI applications under Free Software licences paves the way for more
accessibility, transparency, and fairness.
<h2 id="freesoftware">What is Free Software?</h2>
<a href="freesoftware.html">Free Software</a> (also known as Open Source)
empowers people to control technology by granting four freedoms to each
user:
<ul>
<li>The freedom to use software for any purpose, without geographical
restrictions;</li>
<li>The freedom to study software, without any non-disclosure agreement;</li>
<li>The freedom to share software and copy it at no cost;</li>
<li>The freedom to improve software and share the improvements.</li>
</ul>
These freedoms are granted by releasing software under a Free Software
licence, whose terms are compatible with the aforementioned freedoms.
Multiple Free Software licences exist, with different goals, and software may
be licensed under more than one licence. Because an AI application can only
be freely modified if its training code and training data are available,
both need to be released under a Free Software licence for the AI to be
considered Free.
<h2 id="accessibility">Accessibility</h2>
Accessibility for AI means making it reusable, so that everyone may tinker
with it, improve it and use it for their own purposes. To make AI reusable, it can
be released under a Free Software licence. The advantages of this approach
are many. By providing open legal grounds, Free AI fosters innovation,
because one does not have to deal with artificial restrictions that prevent
people from reusing work. Making AI Free therefore saves everyone from
having to reinvent the wheel, letting researchers and developers alike
focus on creating new, better AI software instead of rebuilding blocks
and reproducing previous work again and again. In addition to improving
efficiency by sharing expertise, Free AI lowers the cost of
development by saving time and removing licence fees. All of this improves
the accessibility of AI, which leads to better and more democratic solutions as
everyone can participate.
Making AI reusable also makes it easier to base specialised AI models upon
more generic ones. If a generic AI model is released as Free Software,
rather than training a new model from scratch, one can leverage the
generic model as a starting point for a specific, downstream prediction
task. For example, one can use a generic computer vision model<a
href="#fn-1" id="ref-1" class="fn">1</a><span class="fn">,</span><a
href="#fn-2" id="ref-2" class="fn">2</a> as a starting point for managing
public infrastructure which requires specific image processing. Just as
with accessibility in general, this approach has a key advantage: generic
models with many parameters, trained on large datasets, may make the
downstream task easier to learn. This makes AI more accessible, because
reusing existing work lowers the barrier to entry.
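The reuse pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method of the cited models: `generic_features` is a hypothetical stand-in for a frozen, already trained generic model, and only a small logistic-regression "head" is trained on a handful of made-up downstream samples.

```python
import math

def generic_features(x):
    """Hypothetical stand-in for a frozen, pre-trained generic model.

    In practice this would be, for example, the feature layers of a
    published vision model; here it is a fixed map that is never updated.
    """
    return [x[0] + x[1], x[0] - x[1]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only a small logistic-regression head on the frozen features."""
    feats = [generic_features(x) for x in samples]  # computed once, reused
    w = [0.0] * len(feats[0])
    b = 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for i, fi in enumerate(f):
                grad_w[i] += err * fi / n
            grad_b += err / n
        # Gradient descent step on the head's parameters only.
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    f = generic_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0

# Made-up downstream task with very few labelled samples.
samples = [(1, 1), (2, 0.5), (0.5, 2), (-1, -1), (-2, -0.5), (-0.5, -2)]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_head(samples, labels)
accuracy = sum(predict(x, w, b) == y for x, y in zip(samples, labels)) / len(samples)
print(accuracy)
```

The design choice to note is that the generic part is never retrained: only the head's few parameters are learned, which is why a small dataset and modest hardware can suffice.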
However, making both the source code used to train the AI application and
the corresponding data Free is sometimes not enough to make it
accessible. AI requires a huge amount of data in order to identify
patterns and correlations which lead to correct predictions. Conversely,
not having enough data reduces its ability to understand the
world. Furthermore, big datasets and their inherent complexity tend to
make AI models large, making their training time-consuming and
resource-intensive. The complexity of handling the data required to train AI
models, coupled with the knowledge required to develop them and to manage
large computing capacity, demands a lot of human resources. Therefore, it may be
hard to exercise the freedoms offered by Free AI, even though its
training source code and data might be released as Free Software. In
those cases, releasing the trained AI models as Free Software would
greatly improve accessibility.
Finally, it should be noted that, just like any other technology, making AI
reusable by everyone can potentially be harmful. For example, reusing a face
detector released as Free Software as part of facial recognition software
can cause human rights issues. However, this holds true regardless of the
technology involved. If a software use case is deemed harmful, that use case
should therefore be prohibited, without an explicit ban on AI technology as such.
<h2 id="transparency">Transparency</h2>
AI transparency can be subdivided into openness and interpretability. In this
context, openness is defined as the right to be informed about the AI
software, and interpretability is defined as being able to understand how
the input is processed so that one can identify the factors taken into
account to make predictions, and their relative importance. In Europe, the
right to be informed about the decision of an algorithm is granted by
Recital 71 of the General Data Protection Regulation (GDPR) 2016/679: “<em>
In any case, such processing should be subject to suitable safeguards, which
should include specific information to the data subject and the right to
obtain human intervention, to express his or her point of view, to obtain an
explanation of the decision reached after such assessment and to challenge
the decision.</em>”. Transparency can thus be defined as the ability to
understand what led to the predictions.
AI needs to be transparent because it is used for critical matters. For
example, it is used to determine creditworthiness<a href="#fn-3" id="ref-3"
class="fn">3</a>, in self-driving cars<a href="#fn-4" id="ref-4"
class="fn">4</a>, in predictive policing<a href="#fn-5" id="ref-5"
class="fn">5</a> or in healthcare<a href="#fn-6" id="ref-6"
class="fn">6</a>. In these contexts, knowing how the
predictions are made is critical, and information about the data
used and how it was processed by the AI should be made available. Trust
in AI, and its adoption, would consequently be higher. Furthermore, modern
AI technologies such as deep learning are not designed to be transparent:
they are composed of millions or billions of individual parameters<a
href="#fn-7" id="ref-7" class="fn">7</a>, making them very complex and hard
to understand. This calls for Free Software which can assist in analysing this
complexity.
Technologies released as Free Software to make AI more transparent already
exist. For example, Local Interpretable Model-Agnostic Explanations
(LIME)<a href="#fn-8" id="ref-8" class="fn">8</a> is a software package
which approximates a complex prediction model with a simpler,
more interpretable one, thus enabling users of the AI to understand the
factors that played a role in the prediction. Figure 1 illustrates this
process by comparing predictions made by two different models. Captum<a
href="#fn-9" id="ref-9" class="fn">9</a> is a library released as Free
Software providing an attribution mechanism allowing one to understand the
relative importance of each input variable and each parameter of a deep
learning model. Making AI more transparent is therefore possible.
<img src="" />
<figcaption>Figure 1: example of prediction explanations by LIME<a href="#fn-8" id="ref-8" class="fn">8</a></figcaption>
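The idea of explaining one prediction with a simpler local model can be sketched as follows. This is a minimal illustration in the spirit of the approach, not the API of the LIME package: `black_box`, `local_surrogate` and `solve_linear_system` are hypothetical names, and the opaque model is a made-up function. The sketch perturbs the input around one instance, weights the perturbed samples by proximity, and fits a weighted linear model whose slopes indicate how much each input feature matters locally.

```python
import math
import random

def black_box(x):
    """Hypothetical opaque model: nonlinear in x[0], linear in x[1]."""
    return x[0] ** 2 + 3.0 * x[1]

def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def local_surrogate(model, instance, n_samples=500, sigma=0.1, seed=0):
    """Fit a proximity-weighted linear model around `instance`.

    Returns (intercept, slopes); slopes[i] approximates the local
    sensitivity of the model's output to feature i.
    """
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the normal equations (X^T W X) w = X^T W y.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        offset = [rng.gauss(0.0, sigma) for _ in range(d)]
        sample = [v + o for v, o in zip(instance, offset)]
        # Samples closer to the instance get a larger weight.
        weight = math.exp(-sum(o * o for o in offset) / (2 * sigma ** 2))
        row = [1.0] + offset  # intercept term plus centred features
        y = model(sample)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    coef = solve_linear_system(xtwx, xtwy)
    return coef[0], coef[1:]

intercept, slopes = local_surrogate(black_box, [1.0, 0.0])
print([round(s, 2) for s in slopes])
```

Around the point (1, 0), the made-up model behaves approximately like a linear function with slopes close to 2 and 3, so the surrogate's slopes can be read as the relative local importance of the two inputs.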
Although a proprietary AI model can be transparent, Free Software facilitates
transparency by making auditing and inspection easier. While some data might be
too sensitive to be released under a Free Software licence, statistical
properties of the data can still be published. With Free Software, everyone
is able to run the AI to understand how it is made, and look up the data
that went through it. However, it should be noted that the AI model itself,
being composed of millions or billions of parameters, is not designed to be
transparent. Approximating the AI model with a much simpler one would
nevertheless make it easier to inspect.
Another benefit of Free Software in this context is that by granting the
right to improve the AI software and share improvements with others, it
allows everybody to improve transparency, thereby preventing vendor lock-in
where one has to wait until the software provider makes the AI software more
transparent.
<h2 id="fairness">Fairness</h2>
In Artificial Intelligence (AI), fairness is defined as making it free of
harmful discrimination based on one’s sensitive characteristics such as
gender, ethnicity, religion, disabilities, or sexual orientation. Because AI
models are trained on datasets containing human behaviors and activities
that can be unfair, and AI models are designed to recognise and reproduce
existing patterns, they can create harmful discrimination and human rights
violations. For example, Correctional Offender Management Profiling for
Alternative Sanctions (COMPAS)<a href="#fn-10" id="ref-10"
class="fn">10</a>, an algorithm attributing scores which indicate how
likely a defendant is to re-offend, was found to be unfair
towards African Americans<a href="#fn-11" id="ref-11" class="fn">11</a>
because for them 44.9% of cases were false positives: the algorithm
attributed a high chance of recidivism despite the defendants not
re-offending. Conversely, 47.7% of the cases for white people were labelled
as low risk of recidivism despite them re-offending. Suspected unfairness
has also been found in healthcare<a href="#fn-12" id="ref-12"
class="fn">12</a>, where an algorithm was used to attribute risk scores to
patients, thereby identifying those needing additional care resources. To
have the same risk scores as white people, black people needed to be in a
worse health situation, in terms of severity of hypertension, diabetes,
anemia, bad cholesterol, or renal failure. Therefore, real fairness issues
may exist in AI algorithms. Moreover, from a legal perspective, checking for
fairness issues is required by Recital 71 of the GDPR, which calls for measures to
“<em>prevent, inter alia, discriminatory effects on natural persons on the
basis of racial or ethnic origin, political opinion, religion or beliefs,
trade union membership, genetic or health status or sexual orientation, or
processing that results in measures having such an effect.</em>”. We thus
need solutions to detect potential fairness issues in datasets on which AI
is trained and correct them when they occur.
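Error rates such as those quoted above can be computed per group from predictions and observed outcomes. The following is a minimal sketch assuming a hypothetical record format of `(group, predicted_high_risk, reoffended)`; the records are made up for illustration and are not the COMPAS data.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """Compute false positive and false negative rates for each group.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            counts[group]["tp"] += 1
        elif predicted and not actual:
            counts[group]["fp"] += 1
        elif not predicted and actual:
            counts[group]["fn"] += 1
        else:
            counts[group]["tn"] += 1
    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not re-offend
        positives = c["tp"] + c["fn"]  # people who did re-offend
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return rates

# Made-up records for two hypothetical groups "A" and "B".
records = (
    [("A", True, False)] * 2 + [("A", False, False)] * 2
    + [("A", True, True)] * 3 + [("A", False, True)] * 1
    + [("B", True, False)] * 1 + [("B", False, False)] * 3
    + [("B", True, True)] * 2 + [("B", False, True)] * 2
)
rates = error_rates_by_group(records)
print(rates)
```

In this made-up example, group A is wrongly flagged as high risk twice as often as group B, while group B is wrongly labelled low risk twice as often as group A. A gap of this kind is what a group-level fairness check looks for.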
To detect unfairness, one needs to quantify it. There are many ways to
define fairness for AI, which fall into two categories of approaches. The first one
verifies that people grouped according to some sensitive characteristic are
treated similarly by the algorithm, e.g. in terms of accuracy, true positive
rate and false positive rate. The second approach measures fairness at the
individual level by ensuring that similar individuals are treated similarly
by the algorithm<a href="#fn-13" id="ref-13" class="fn">13</a>. More
formally, a distance measure between samples of the dataset and a distance
measure between the predictions of the algorithm are compared to ensure
that their ratio stays bounded. However, satisfying group fairness and individual
fairness at the same time might be impossible<a href="#fn-14" id="ref-14"
class="fn">14</a>. There are three commonly used methods to mitigate
unfairness, if detected:
<ul>
<li>Remove the sensitive attribute (e.g. gender, ethnicity, religion)
from the dataset. This approach may not work in real-world scenarios:
when the sensitive attribute is correlated with other attributes of the
dataset, removing it is not enough to completely mask it, and removing all
attributes correlated with it may lead to a lot of information loss;</li>
<li>Ensure that the dataset has an equal representation of people when grouped
by a sensitive characteristic;</li>
<li>Optimise the AI model for accuracy and fairness at the same time. While
the algorithm is trained on an existing dataset that contains unfair
discrimination, consider both its accuracy and its fairness<a
href="#fn-15" id="ref-15" class="fn">15</a>. In other words, add fairness
to the goal of the algorithm.</li>
</ul>
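Returning to the individual-level criterion described before the list above, one way to read it is as a Lipschitz-style condition: the gap between two predictions should not exceed a constant times the distance between the two individuals. The sketch below assumes a Euclidean distance between samples and an absolute difference between scores; both choices, the constant, and the sample values are illustrative assumptions.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def lipschitz_violations(samples, scores, lipschitz_constant=1.0, distance=euclidean):
    """Return index pairs (i, j) whose score gap exceeds L * distance.

    A pair is flagged when two similar individuals receive
    disproportionately different predictions.
    """
    violations = []
    for i, j in combinations(range(len(samples)), 2):
        gap = abs(scores[i] - scores[j])
        if gap > lipschitz_constant * distance(samples[i], samples[j]):
            violations.append((i, j))
    return violations

# Made-up individuals: the first two are nearly identical, yet scored very differently.
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
scores = [0.20, 0.90, 0.95]
violations = lipschitz_violations(samples, scores)
print(violations)  # [(0, 1)]
```

Choosing the distance between individuals is the hard part in practice, since it encodes which differences between people are considered relevant to the task.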
Even if those methods are used, having a perfectly accurate and fair algorithm is
impossible<a href="#fn-14" id="ref-14" class="fn">14</a>. However, if the accuracy
is measured on a dataset known to contain unfair treatment of a particular
group, a less than perfect accuracy may be deemed acceptable.
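The trade-off between accuracy and fairness can be made concrete with a penalised objective: the error rate plus a weight times a fairness gap. The sketch below is a simplification that only chooses a decision threshold rather than training a model, and it uses the difference in false positive rates between groups as the gap. The function names, the weight, and the data are illustrative assumptions, not the method of the cited work.

```python
def evaluate(threshold, data):
    """data: list of (group, score, label). Returns (error_rate, fpr_gap)."""
    errors = 0
    false_positives = {}
    negatives = {}
    for group, score, label in data:
        predicted = score >= threshold
        errors += predicted != bool(label)
        if not label:
            negatives[group] = negatives.get(group, 0) + 1
            false_positives[group] = false_positives.get(group, 0) + int(predicted)
    fprs = [false_positives[g] / negatives[g] for g in negatives]
    return errors / len(data), max(fprs) - min(fprs)

def best_threshold(data, candidates, fairness_weight):
    """Pick the candidate minimising error + fairness_weight * FPR gap."""
    def objective(threshold):
        error, gap = evaluate(threshold, data)
        return error + fairness_weight * gap
    return min(candidates, key=objective)

# Made-up scores for two hypothetical groups.
data = [
    ("A", 0.3, 0), ("A", 0.6, 0), ("A", 0.7, 1), ("A", 0.8, 1),
    ("B", 0.2, 0), ("B", 0.4, 0), ("B", 0.55, 1), ("B", 0.6, 1), ("B", 0.9, 1),
]
accuracy_only = best_threshold(data, [0.5, 0.65], fairness_weight=0.0)
with_fairness = best_threshold(data, [0.5, 0.65], fairness_weight=1.0)
print(accuracy_only, with_fairness)  # 0.5 0.65
```

With the fairness term switched off, the more accurate threshold is chosen even though it produces false positives in one group only. With the fairness term switched on, the threshold that equalises false positive rates is chosen at the cost of one additional error.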
Because an AI application released as Free Software may be used and inspected
by everyone, verifying whether it is free of potentially harmful discrimination
is easier than if it were proprietary. Moreover, this synergises with AI
transparency (see Section <a href="#transparency">Transparency</a>), as a
transparent AI application facilitates the understanding of the factors considered for
making predictions. While necessary, releasing an AI application as Free Software
is not sufficient to make it fair. However, it makes fairness easier to evaluate and enforce.
<h2 id="conclusions">Conclusions</h2>
In this article, potential issues around the democratisation
of artificial intelligence (AI) and implications for human rights are
highlighted, and potential Free Software solutions are presented to tackle
them. In particular, it is shown that AI needs to be accessible, transparent
and fair in order to be usable. While not a sufficient solution, releasing
AI under Free Software licences is necessary for its widespread use
throughout our information systems by making it more scrutable, trustworthy
and safe for everyone.
<h2 id="fn">References</h2>
<li id="fn-1">K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” <em>arXiv:1512.03385 [cs]</em>, Dec. 2015. <a href="#ref-1">&#8617;</a></li>
<li id="fn-2">K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” <em>arXiv:1409.1556 [cs]</em>, Apr. 2015. <a href="#ref-2">&#8617;</a></li>
<li id="fn-3">X. Dastile, T. Celik, and M. Potsane, “Statistical and machine learning models in credit scoring: A systematic literature survey,” <em>Applied Soft Computing</em>, vol. 91, p. 106263, 2020, doi: <a href="">10.1016/j.asoc.2020.106263</a>. <a href="#ref-3">&#8617;</a></li>
<li id="fn-4">C. Badue <em>et al.</em>, “Self-Driving Cars: A Survey,” <em>arXiv:1901.04407 [cs]</em>, Oct. 2019. <a href="#ref-4">&#8617;</a></li>
<li id="fn-5">D. Ensign, S. A. Friedler, S. Neville, C. Scheidegger, and S. Venkatasubramanian, “Runaway Feedback Loops in Predictive Policing,” in <em>Conference on Fairness, Accountability and Transparency</em>, Jan. 2018, pp. 160–171. <a href="#ref-5">&#8617;</a></li>
<li id="fn-6">N. Schwalbe and B. Wahl, “Artificial intelligence and the future of global health,” <em>The Lancet</em>, vol. 395, no. 10236, pp. 1579–1586, May 2020, doi: <a href="">10.1016/S0140-6736(20)30226-9</a>. <a href="#ref-6">&#8617;</a></li>
<li id="fn-7">A. Canziani, A. Paszke, and E. Culurciello, “An Analysis of Deep Neural Network Models for Practical Applications,” <em>arXiv:1605.07678 [cs]</em>, Apr. 2017. <a href="#ref-7">&#8617;</a></li>
<li id="fn-8">M. T. Ribeiro, S. Singh, and C. Guestrin, “"Why Should I Trust You?": Explaining the Predictions of Any Classifier,” <em>arXiv:1602.04938 [cs, stat]</em>, Aug. 2016. <a href="#ref-8">&#8617;</a></li>
<li id="fn-9">N. Kokhlikyan <em>et al.</em>, <em>Captum: A unified and generic model interpretability library for PyTorch</em>. 2020. <a href="#ref-9">&#8617;</a></li>
<li id="fn-10">“Practitioners Guide to COMPAS.” Northpointe, Mar. 2015. <a href="#ref-10">&#8617;</a></li>
<li id="fn-11">J. Angwin, J. Larson, S. Mattu, and L. Kirchner, “Machine Bias,” <em>ProPublica</em>, May 2016. <a href="#ref-11">&#8617;</a></li>
<li id="fn-12">Z. Obermeyer, B. Powers, C. Vogeli, and S. Mullainathan, “Dissecting racial bias in an algorithm used to manage the health of populations,” <em>Science (New York, N.Y.)</em>, vol. 366, no. 6464, pp. 447–453, Oct. 2019, doi: <a href="">10.1126/science.aax2342</a>. <a href="#ref-12">&#8617;</a></li>
<li id="fn-13">C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness Through Awareness,” <em>arXiv:1104.3913 [cs]</em>, Nov. 2011. <a href="#ref-13">&#8617;</a></li>
<li id="fn-14">J. Kleinberg, S. Mullainathan, and M. Raghavan, “Inherent Trade-Offs in the Fair Determination of Risk Scores,” <em>arXiv:1609.05807 [cs, stat]</em>, Nov. 2016. <a href="#ref-14">&#8617;</a></li>
<li id="fn-15">M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi, “Fairness Beyond Disparate Treatment &amp; Disparate Impact: Learning Classification without Disparate Mistreatment,” <em>Proceedings of the 26th International Conference on World Wide Web</em>, pp. 1171–1180, Apr. 2017, doi: <a href="">10.1145/3038912.3052660</a>. <a href="#ref-15">&#8617;</a></li>
<author id="lequertier" />
<original content="2021-04-17" />