<?xml version="1.0" encoding="UTF-8" ?>
<html>
<version>1</version>

<head>
<title>Artificial Intelligence and Free Software</title>
</head>
<body class="article">
<h1>
Controlling technology in the age of Artificial Intelligence: a Free
Software perspective
</h1>
<div id="introduction">
|
||
<p>
|
||
Technical improvements, the accumulation of large, detailed datasets and
|
||
advancement in computer hardware have led to an Artificial Intelligence
|
||
(AI) revolution. For example, breakthroughs in computer vision as well
|
||
as the building of large datasets and amelioration in text analysis
|
||
coupled with the gathering of personal data have given birth to
|
||
countless AI applications. These new AI applications have given many
|
||
benefits to European Union citizens. However, because of its inherent
|
||
complexity and requirements in technical resources and knowledge, AI may
|
||
undermine our ability to control technology and put fundamental freedoms
|
||
at risk. Therefore, introducing new legislation on AI is a worthwhile
|
||
objective.
|
||
</p>
|
||
|
||
<p>
|
||
In the context of a new legislation, this article explains how releasing
|
||
AI applications under Free Software licences paves the way for more
|
||
accessibility, transparency, and fairness.
|
||
</p>
|
||
</div>
|
||
|
||
<h2 id="freesoftware">What is Free Software?</h2>
|
||
|
||
<p>
|
||
<a href="freesoftware.html">Free Software</a> (also known as Open Source)
|
||
empowers people to control technology by granting four freedoms to each
|
||
user:
|
||
</p>
|
||
<ol>
|
||
<li>
|
||
The freedom to use software for any purpose, without geographical
|
||
limitations;
|
||
</li>
|
||
<li>
|
||
The freedom to study software, without any non-disclosure agreement;
|
||
</li>
|
||
<li>The freedom to share software and copy it at no cost;</li>
|
||
<li>The freedom to improve software and share the improvements.</li>
|
||
</ol>
|
||
|
||
<p>
These freedoms are granted by releasing software under a Free Software
licence, whose terms are compatible with the four freedoms above. There
are multiple Free Software licences with different goals, and software
may be licensed under more than one of them. Because an AI application
needs both its training code and its training data in order to be freely
modified, both must be released under a Free Software licence for the AI
to be considered Free.
</p>

<h2 id="accessibility">Accessibility</h2>
|
||
|
||
<p>
|
||
Accessibility for AI means making it reusable, so that everyone may tinker
|
||
with it, improve it and use for their own purposes. To make AI reusable, it can
|
||
be released under a Free Software license. The advantages of this approach
|
||
are many. By having open legal grounds, Free AI fosters innovation,
|
||
because one does not have to deal with artificial restrictions that prevent
|
||
people from reusing work. Making AI Free therefore saves everyone from
|
||
having to reinvent the wheel, making researchers and developers alike able
|
||
to focus on creating new, better AI software instead of rebuilding blocks
|
||
and reproducing previous work again and again. In addition to improving
|
||
efficiency, by sharing expertise, Free AI lowers the cost of
|
||
development by saving time and removing license fees. All of this improves
|
||
accessibility of AI, which leads to better and more democratic solutions as
|
||
everyone can participate.
|
||
</p>
|
||
|
||
<p>
Making AI reusable also makes it easier to base specialised AI models upon
more generic ones. If a generic AI model is released as Free Software,
rather than training a new model from scratch, one can leverage the
generic model as a starting point for a specific, downstream prediction
task. For example, one can use a generic computer vision model<a
href="#fn-1" id="ref-1" class="fn">1</a><span class="fn">,</span><a
href="#fn-2" id="ref-2" class="fn">2</a> as a starting point for managing
public infrastructure that requires specific image processing. Just as
with accessibility in general, this approach has a key advantage: generic
models with many parameters, trained on large datasets, may make the
downstream task easier to learn. This lowers the barrier to entry and
makes AI more accessible by easing the reuse of existing work.
</p>
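<p>
As an illustration of this fine-tuning approach, here is a minimal sketch
in Python, using the Free Software libraries PyTorch and torchvision. It
reuses a generic ResNet-50 model<a href="#fn-1" class="fn">1</a>
pre-trained on ImageNet as a starting point for a specialised classifier;
the downstream task and its four classes are hypothetical examples:
</p>
<pre><code>import torch
import torch.nn as nn
from torchvision import models

# Load a generic vision model with pre-trained ImageNet weights.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze the generic feature extractor: only the new head will be trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for a hypothetical downstream task, e.g.
# classifying images of public infrastructure into four damage categories.
model.fc = nn.Linear(model.fc.in_features, 4)

# Only the parameters of the new head are optimised during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
</code></pre>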

<p>
However, making both the source code used to train an AI application and
the corresponding data Free is sometimes not enough to make it accessible.
AI requires a huge amount of data in order to identify the patterns and
correlations that lead to correct predictions; conversely, not having
enough data reduces its ability to understand the world. Furthermore, big
datasets and their inherent complexity tend to make AI models large,
making their training time-consuming and resource-intensive. The
complexity of handling the data required to train AI models, coupled with
the knowledge needed to develop them and to manage huge computing
capacity, demands a lot of human resources. It may therefore be hard to
exercise the freedoms offered by Free AI, even when its training source
code and data are released as Free Software. In those cases, releasing
the trained AI models themselves as Free Software would greatly improve
accessibility.
</p>
<p>
Finally, it should be noted that, just like any other technology, making
AI reusable by everyone can potentially be harmful. For example, reusing a
face detector released as Free Software as part of facial recognition
software can raise human rights issues. However, this holds true
regardless of the technology involved. If a use case is deemed harmful, it
is the use case that should be prohibited, rather than AI technology as
such.
</p>

<h2 id="transparency">Transparency</h2>
|
||
|
||
<p>
|
||
AI transparency can be subdivided into openness and interpretability. In this
|
||
context, openness is defined as the right to be informed about the AI
|
||
software, and interpretability is defined as being able to understand how
|
||
the input is processed so that one can identify the factors taken into
|
||
account to make predictions, and their relative importance. In Europe, the
|
||
right to be informed about the decision of an algorithm is granted by the
|
||
Recital 71 of the General Data Protection Regulation (GDPR) 2016/679 “<em>
|
||
In any case, such processing should be subject to suitable safeguards, which
|
||
should include specific information to the data subject and the right to
|
||
obtain human intervention, to express his or her point of view, to obtain an
|
||
explanation of the decision reached after such assessment and to challenge
|
||
the decision.</em>”. Transparency can thus be defined as the ability to
|
||
understand what led to the predictions.
|
||
</p>
|
||
|
||
<p>
AI needs to be transparent because it is used for critical matters. For
example, it is used to determine creditworthiness<a href="#fn-3"
id="ref-3" class="fn">3</a>, in self-driving cars<a href="#fn-4"
id="ref-4" class="fn">4</a>, in predictive policing<a href="#fn-5"
id="ref-5" class="fn">5</a> and in healthcare<a href="#fn-6" id="ref-6"
class="fn">6</a>. In these contexts, information about how predictions
are made is critical, and information about the data used and how the AI
processed it should be made available. Such openness would also increase
trust in and adoption of AI. Yet modern AI technologies such as deep
learning are not inherently transparent: they are composed of millions or
billions of individual parameters<a href="#fn-7" id="ref-7"
class="fn">7</a>, making them very complex and hard to understand. This
calls for Free Software that can help analyse this complexity.
</p>
<p>
Technologies released as Free Software to make AI more transparent already
exist. For example, Local Interpretable Model-Agnostic Explanations
(LIME)<a href="#fn-8" id="ref-8" class="fn">8</a> is a software package
that approximates a complex prediction model locally with a simpler, more
interpretable one, thus enabling users of the AI to understand the
parameters that played a role in a prediction. Figure 1 illustrates this
process by comparing predictions made by two different models. Captum<a
href="#fn-9" id="ref-9" class="fn">9</a> is a library released as Free
Software that provides attribution mechanisms for understanding the
relative importance of each input variable and each parameter of a deep
learning model. Making AI more transparent is therefore possible.
</p>
<figure>
<img src="https://pics.fsfe.org/uploads/big/79c895a423457c9b60b4106a2401d631.png" alt="Example of prediction explanations produced by LIME" />
<figcaption>Figure 1: example of prediction explanations by LIME<a href="#fn-8" class="fn">8</a></figcaption>
</figure>
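<p>
As a brief illustration of how LIME is used in practice, the following
minimal sketch in Python explains a single prediction of a random forest
model; the classifier and dataset are stand-ins chosen for the example:
</p>
<pre><code># Train an opaque model, then explain one of its predictions with a
# local, interpretable surrogate model fitted by LIME.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain the prediction for one sample; LIME fits a simple linear model
# around it and reports the most influential features with their weights.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.3f}")
</code></pre>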

<p>
Although a proprietary AI model can be transparent, Free Software
facilitates transparency by making auditing and inspection easier. While
some data might be too sensitive to be released under a Free Software
licence, statistical properties of the data can still be published. With
Free Software, everyone is able to run the AI to understand how it is
made, and to examine the data that went through it. It should be noted
that the AI model itself, being composed of millions or billions of
parameters, is not inherently transparent; but approximating it with a
much simpler model, as described above, makes it feasible to inspect.
</p>
<p>
Another benefit of Free Software in this context is that, by granting the
right to improve the AI software and share improvements with others, it
allows everybody to improve transparency. This prevents the vendor
lock-in in which one has to wait until the software provider makes the AI
software more transparent.
</p>

<h2 id="fairness">Fairness</h2>
|
||
|
||
<p>
|
||
In Artificial Intelligence (AI), fairness is defined as making it free of
|
||
harmful discrimination based on one’s sensitive characteristics such as
|
||
gender, ethnicity, religion, disabilities, or sexual orientation. Because AI
|
||
models are trained on datasets containing human behaviors and activities
|
||
that can be unfair, and AI models are designed to recognise and reproduce
|
||
existing patterns, they can create harmful discrimination and human rights
|
||
violations. For example, (COMPAS)<a href="#fn-10" id="ref-10"
|
||
class="fn">10</a>, an algorithm attributing scores which indicate how
|
||
likely one would recidivate, was found to be unfair
|
||
towards African Americans<a href="#fn-11" id="ref-11" class="fn">11</a>
|
||
because for them 44.9% of cases were false positives. The algorithm
|
||
attributed a high chance of recidivism despite the defendants not
|
||
re-offending. Conversely, 47.7% of the cases for white people were labeled
|
||
as low risk of recidivism despite them re-offending. Suspected unfairness
|
||
has also been found in healthcare<a href="#fn-12" id="ref-12"
|
||
class="fn">12</a>, where an algorithm was used to attribute risks scores to
|
||
patients, thereby identifying those needing additional care resources. To
|
||
have the same risks scores as white people, black people needed to be in a
|
||
worse health situation, in term of severity in hypertension, diabetes,
|
||
anemia, bad cholesterol, or renal failure. Therefore, real fairness issues
|
||
may exist in AI algorithms. Moreover, from a legal perspective, checking for
|
||
fairness issues is required by Recital 71 of the GDPR, which requires to
|
||
“<em>prevent, inter alia, discriminatory effects on natural persons on the
|
||
basis of racial or ethnic origin, political opinion, religion or beliefs,
|
||
trade union membership, genetic or health status or sexual orientation, or
|
||
processing that results in measures having such an effect.</em>”. We thus
|
||
need solutions to detect potential fairness issues in datasets on which AI
|
||
is trained and correct them when they occur.
|
||
</p>
|
||
<p>
To detect unfairness, one first needs to quantify fairness. There are many
ways to define fairness for AI, falling into two categories of approaches.
The first verifies that people grouped according to a sensitive
characteristic are treated similarly by the algorithm, e.g. in terms of
accuracy, true positive rate, and false positive rate. The second approach
measures fairness at the individual level by ensuring that similar
individuals are treated similarly by the algorithm<a href="#fn-13"
id="ref-13" class="fn">13</a>. More formally, a distance measure between
samples of the dataset and a distance measure between the predictions of
the algorithm are compared to ensure their ratio is consistent. However,
satisfying group fairness and individual fairness at the same time might
be impossible<a href="#fn-14" id="ref-14" class="fn">14</a>. There are
three commonly used methods to mitigate unfairness, if detected:
</p>
<ol>
<li>
Remove the sensitive attribute (e.g. gender, ethnicity, religion) from
the dataset. This approach may not work in real-world scenarios: when
the sensitive attribute is correlated with other attributes of the
dataset, removing it alone is not enough to completely mask it, and
removing all attributes correlated with it may lead to substantial
information loss;
</li>
<li>
Ensure that the dataset has an equal representation of people when
grouped by a sensitive characteristic;
</li>
<li>
Optimise the AI model for accuracy and fairness at the same time: even
though the algorithm is trained on an existing dataset that contains
unfair discrimination, the training objective considers both its
accuracy and its fairness<a href="#fn-15" id="ref-15" class="fn">15</a>.
In other words, fairness is added to the goal of the algorithm, as the
sketch after this list illustrates.
</li>
</ol>
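<p>
A minimal sketch of the third method, in Python with PyTorch: the training
loss combines the usual accuracy term with a penalty on the gap in mean
predicted probability between two groups. This is a simple differentiable
proxy for group fairness rather than the exact formulation of reference
15, and the weight <em>lam</em> is a hypothetical hyperparameter trading
accuracy for fairness:
</p>
<pre><code>import torch
import torch.nn.functional as F

def fairness_aware_loss(logits, labels, sensitive, lam=1.0):
    # logits and labels: float tensors of shape (batch,);
    # sensitive: integer tensor of 0s and 1s marking the two groups.
    # Standard accuracy objective for a binary classifier.
    base = F.binary_cross_entropy_with_logits(logits, labels)
    # Fairness penalty: absolute gap between the mean predicted
    # probabilities of the two groups (a demographic parity proxy).
    probs = torch.sigmoid(logits)
    gap = (probs[sensitive == 1].mean() - probs[sensitive == 0].mean()).abs()
    # lam controls the accuracy/fairness trade-off.
    return base + lam * gap
</code></pre>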

<p>
When these methods are used, a perfectly accurate and fair algorithm is
impossible<a href="#fn-14" class="fn">14</a>; but if accuracy is measured
on a dataset known to contain unfair treatment of a particular group, less
than perfect accuracy may be deemed acceptable.
</p>
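<p>
Measuring group fairness itself is straightforward once predictions and
the sensitive attribute are available. The following minimal sketch, in
Python with NumPy on randomly generated stand-in data, compares false
positive rates across two groups, the kind of check applied to COMPAS
above:
</p>
<pre><code>import numpy as np

def false_positive_rate(y_true, y_pred):
    # Fraction of actual negatives that were wrongly predicted positive.
    negatives = y_true == 0
    return (y_pred[negatives] == 1).mean()

# Stand-in data: true outcomes, model predictions and a binary
# sensitive attribute for 1000 hypothetical individuals.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
group = rng.integers(0, 2, size=1000)

# A large gap between the two rates indicates a group fairness issue.
for g in (0, 1):
    mask = group == g
    print(f"group {g}: FPR = {false_positive_rate(y_true[mask], y_pred[mask]):.3f}")
</code></pre>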

<p>
Because an AI application released as Free Software may be used and
inspected by everyone, verifying that it is free of potentially harmful
discrimination is easier than if it were proprietary. Moreover, this
synergises with AI transparency (see Section <a
href="#transparency">Transparency</a>), as a transparent AI application
facilitates the understanding of the factors considered when making
predictions. While necessary, releasing an AI application as Free Software
does not by itself make it fair; it does, however, make fairness easier to
evaluate and enforce.
</p>

<h2 id="conclusions">Conclusions</h2>
|
||
|
||
<p>
|
||
|
||
In this article, potential issues around the democratisation
|
||
of artificial intelligence (AI) and implications for human rights are
|
||
highlighted, and potential Free Software solutions are presented to tackle
|
||
them. In particular, it is shown that AI needs to be accessible, transparent
|
||
and fair in order to be usable. While not a sufficient solution, releasing
|
||
AI under Free Software licences is necessary for its widespread use
|
||
throughout our information systems by making it more scrutable, trustworthy
|
||
and safe for everyone.
|
||
|
||
</p>
|
||
|
||
<h2 id="fn">References</h2>
|
||
<ol>
|
||
<li id="fn-1">K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” <em>arXiv:1512.03385 [cs]</em>, Dec. 2015. <a href="#ref-1">↩</a></li>
|
||
<li id="fn-2">K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” <em>arXiv:1409.1556 [cs]</em>, Apr. 2015. <a href="#ref-2">↩</a></li>
|
||
<li id="fn-3">X. Dastile, T. Celik, and M. Potsane, “Statistical and machine learning models in credit scoring: A systematic literature survey,” <em>Applied Soft Computing</em>, vol. 91, p. 106263, 2020, doi: <a href="https://doi.org/10.1016/j.asoc.2020.106263">10.1016/j.asoc.2020.106263</a>. <a href="#ref-3">↩</a></li>
|
||
<li id="fn-4">C. Badue <em>et al.</em>, “Self-Driving Cars: A Survey,” <em>arXiv:1901.04407 [cs]</em>, Oct. 2019. <a href="#ref-4">↩</a></li>
|
||
<li id="fn-5">D. Ensign, S. A. Friedler, S. Neville, C. Scheidegger, and S. Venkatasubramanian, “Runaway Feedback Loops in Predictive Policing,” in <em>Conference on Fairness, Accountability and Transparency</em>, Jan. 2018, pp. 160–171. <a href="#ref-5">↩</a></li>
|
||
<li id="fn-6">N. Schwalbe and B. Wahl, “Artificial intelligence and the future of global health,” <em>The Lancet</em>, vol. 395, no. 10236, pp. 1579–1586, May 2020, doi: <a href="https://doi.org/10.1016/S0140-6736(20)30226-9">10.1016/S0140-6736(20)30226-9</a>. <a href="#ref-6">↩</a></li>
|
||
<li id="fn-7">A. Canziani, A. Paszke, and E. Culurciello, “An Analysis of Deep Neural Network Models for Practical Applications,” <em>arXiv:1605.07678 [cs]</em>, Apr. 2017. <a href="#ref-7">↩</a></li>
|
||
<li id="fn-8">M. T. Ribeiro, S. Singh, and C. Guestrin, “"Why Should I Trust You?": Explaining the Predictions of Any Classifier,” <em>arXiv:1602.04938 [cs, stat]</em>, Aug. 2016. <a href="#ref-8">↩</a></li>
|
||
<li id="fn-9">N. Kokhlikyan <em>et al.</em>, <em>Captum: A unified and generic model interpretability library for PyTorch</em>. 2020. <a href="#ref-9">↩</a></li>
|
||
<li id="fn-10">“Practitioners Guide to COMPAS.” Northpointe, Mar. 2015. <a href="#ref-10">↩</a></li>
|
||
<li id="fn-11">L. K. Mattu Jeff Larson, “Machine Bias,” <em>ProPublica</em>. Mar. 2015. <a href="#ref-11">↩</a></li>
|
||
<li id="fn-12">Z. Obermeyer, B. Powers, C. Vogeli, and S. Mullainathan, “Dissecting racial bias in an algorithm used to manage the health of populations,” <em>Science (New York, N.Y.)</em>, vol. 366, no. 6464, pp. 447–453, Oct. 2019, doi: <a href="https://doi.org/10.1126/science.aax2342">10.1126/science.aax2342</a>. <a href="#ref-12">↩</a></li>
|
||
<li id="fn-13">C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness Through Awareness,” <em>arXiv:1104.3913 [cs]</em>, Nov. 2011. <a href="#ref-13">↩</a></li>
|
||
<li id="fn-14">J. Kleinberg, S. Mullainathan, and M. Raghavan, “Inherent Trade-Offs in the Fair Determination of Risk Scores,” <em>arXiv:1609.05807 [cs, stat]</em>, Nov. 2016. <a href="#ref-14">↩</a></li>
|
||
<li id="fn-15">M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi, “Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment,” <em>Proceedings of the 26th International Conference on World Wide Web</em>, pp. 1171–1180, Apr. 2017, doi: <a href="https://doi.org/10.1145/3038912.3052660">10.1145/3038912.3052660</a>. <a href="#ref-15">↩</a></li>
|
||
</ol>

</body>
<author id="lequertier" />
<date>
<original content="2021-04-17" />
</date>
<sidebar/>
</html>
|