Connect the webserver to a registration service #10

Closed
opened 2019-08-05 10:37:17 +00:00 by carmenbianca · 24 comments
Owner

Because we might not be able to carry the load of accepting any and all queries, we can require the users to register their project before they can request badges.

Member

Are you thinking about an external registration service, or a web page of our own where people have to register their repository before they can access the API?

Author
Owner

@max.mehl can answer that question better, I think. Extract from e-mail:

> But I also wanted to avoid
> that too many projects end up in our scheduler. Imagine someone requests
> badges for a thousand projects. The registration adds some threshold
> (obviously not for evil minds, but at least for jokers or just testers).
>
> I am aware that this adds some complexity for inclusion with other
> services, because people always have to register with us manually, but
> I'm not sure what's worse: some inconvenience or completely overloaded
> servers that need hours/days to return a result.

Owner

Yes, let me quote our email conversation:

>> Here, I thought about basically offering a form which sends data to
>> forms.fsfe.org, asking one to confirm their email address. On success, a
>> JSON file will be generated on the Docker host, e.g.
>> /srv/forms/reuse-api/repos.json. This could be picked up by the API.
>
> I have created this application [^1], and I attached the JSON file that
> can be found at /srv/forms/reuse-api/repos.json, filled with 2 test
> entries.

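The file repos.json, as created by the forms app (https://git.fsfe.org/fsfe-system-hackers/forms/src/branch/master/fsfe_forms/applications.json#L38), is attached.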

Member

This would mean that we have to parse the (whole!) JSON file each time a check for a new repository is requested, and then create a record in our SQLite database.

Maybe it could be much easier if the forms server was able to store the data directly in our SQLite database, so we can skip the whole step of writing and parsing the JSON file. Would you agree?
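To make the cost concrete, here is a minimal sketch of the parse-and-insert step described above. The layout of repos.json is not shown in this thread, so the sketch assumes a list of objects with a `url` key; the `repos` table and the function name `import_repos` are likewise hypothetical.

```python
import json
import sqlite3


def import_repos(json_text: str, conn: sqlite3.Connection) -> int:
    """Copy repository URLs from the JSON text into SQLite.

    Returns the number of newly added rows. The JSON layout (a list of
    objects with a "url" key) and the table schema are assumptions.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS repos (url TEXT PRIMARY KEY)")
    before = conn.execute("SELECT COUNT(*) FROM repos").fetchone()[0]

    # The whole file has to be parsed, even if only one entry is new.
    entries = json.loads(json_text)
    for entry in entries:
        # Already known URLs are skipped thanks to the primary key.
        conn.execute(
            "INSERT OR IGNORE INTO repos (url) VALUES (?)", (entry["url"],)
        )
    conn.commit()

    after = conn.execute("SELECT COUNT(*) FROM repos").fetchone()[0]
    return after - before
```

Every call re-reads all entries, which is the overhead in question; writing straight to the database would remove both the file and this loop.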

Member

Thinking about it again: we have defined some requirements, e.g. about what we accept as a repository, and this is something that the forms server will not be able to check. In the end, people will be able to register a repository URL in a format we do not support, and only when they try to use the API will they receive an error message telling them so.

Considering, on the other hand, that a double opt-in is not rocket science, I am increasingly unsure whether our case is a good application for the form server, or whether we would be better off implementing the registration here in a clean and self-contained way.

Owner

You have a good point on the invalid format. How about making this API the entry point, checking the requirements, and only then sending a request to forms? This would outsource the opt-in part and also the storage of personal data (GDPR!) to this service.

This would not solve the problem of constantly reading the JSON file, but I don't consider that a great performance impact TBH.

But I cannot fully evaluate the technical dimensions, so this is only my 2 cents ;)

Member

I see your point about outsourcing the storage of personal data. When thinking about this further, I got to the following idea:

  1. User fills in a form at https://api.reuse.software/register with the fields "email address" and "repository URL". The latter is checked according to our rules.

  2. Server sends email to the given address with a confirmation link like https://api.reuse.software/confirm?url=<repository-url>&signature=<signature-hash>. Server does not store any data.

  3. User clicks on confirmation link in email, gets a page with button to confirm registration.

  4. Server enters the confirmed URL in the database and starts the first lint.

I guess that's kind of the "least complexity solution". What do you think?
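Steps 2 to 4 can work without stored state if the signature is a keyed hash of the repository URL. The following is only a sketch of that idea: the secret, the choice of HMAC-SHA256, and the function names are assumptions, not something decided in this thread.

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server-side secret; it must never be sent to the user.
SECRET_KEY = b"change-me"
CONFIRM_ENDPOINT = "https://api.reuse.software/confirm"


def sign(repo_url: str) -> str:
    """Return a hex HMAC-SHA256 signature of the repository URL."""
    return hmac.new(SECRET_KEY, repo_url.encode("utf-8"), hashlib.sha256).hexdigest()


def confirmation_link(repo_url: str) -> str:
    """Build the link that is mailed to the user (step 2)."""
    query = urlencode({"url": repo_url, "signature": sign(repo_url)})
    return f"{CONFIRM_ENDPOINT}?{query}"


def verify(link: str) -> Optional[str]:
    """Return the repository URL if the signature matches, else None (step 4)."""
    params = parse_qs(urlsplit(link).query)
    repo_url = params.get("url", [""])[0]
    signature = params.get("signature", [""])[0]
    if not repo_url or not signature:
        return None
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(sign(repo_url).encode(), signature.encode()):
        return repo_url
    return None
```

Because the signature can only be produced with the server's secret, a valid link shows that the server issued it for exactly that URL, so nothing has to be stored until the user confirms. A link built this way never expires; adding a timestamp to the signed data would be one way to limit its lifetime.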

Owner

Sure, that would work. The benefit of using forms would indeed be that we actually store data. Perhaps we need a way to communicate with the repository owners (or at least those that have entered the projects), e.g. because the API changed.

The advantage of using forms would be that personal data resides where it is stored anyway.

Member

Okay, I was not aware that we actually want to store the registration data.

Some caveats you might want to be aware of:

  • There will be no check for duplicates, because by design forms checks for duplicates based on the email address.
  • If the confirmation button (https://git.fsfe.org/fsfe-system-hackers/forms/issues/12) is going to be implemented, that web page will be FSFE-branded, not REUSE-branded.
Owner

Thank you for your thoughts!

> There will be no check for duplicates, because by design forms checks for duplicates based on the email address.

True. Can the API make the first sanity checks here?

> If the confirmation button is going to be implemented, that web page will be FSFE-branded, not REUSE-branded.

Also true. I have responded to that issue with an idea of how it could be easily fixed. In general, we don't need to hide that the FSFE is behind REUSE, so that's fair ;)

Member

Yes, the form on api.reuse.software can make some sanity checks. These would actually be the same checks as in step 1 of my proposal, and they would include a check for duplicates.
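As a rough illustration of such checks, the sketch below normalizes a submitted URL and rejects duplicates. The actual repository rules are not spelled out in this thread, so the rule used here (an https URL with at least an owner and a repository name in the path) is purely an assumption, as are the `repos` table and the function names.

```python
import sqlite3
from urllib.parse import urlsplit


def normalize_repo_url(raw: str) -> str:
    """Return a canonical URL or raise ValueError (hypothetical rules)."""
    parts = urlsplit(raw.strip())
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Assumed rule: https only, a host, and at least "/owner/repo".
    if parts.scheme != "https" or not parts.hostname or path.count("/") < 2:
        raise ValueError(f"unsupported repository URL: {raw!r}")
    return f"https://{parts.hostname}{path}"


def check_registration(conn: sqlite3.Connection, raw: str) -> str:
    """Validate the URL and make sure it is not registered yet."""
    url = normalize_repo_url(raw)
    row = conn.execute("SELECT 1 FROM repos WHERE url = ?", (url,)).fetchone()
    if row is not None:
        raise ValueError(f"repository already registered: {url}")
    return url
```

Normalizing before the lookup means that variants such as a trailing slash or a `.git` suffix are treated as the same repository.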

Owner

Excellent. So weighing the pros and cons, would you rather choose:

  1. api -> forms -> api: we outsource registration + mails etc to forms, store personal data there, and only take repo info from its files. perhaps larger complexity, interdependencies
  2. all inside the api: baked for our use case specifically, independent, but duplicates work of forms, and personal data storage would have to be discussed
Member

Personally, I am not convinced that it makes much sense to store the email address that registered the project. My feeling is that in a few years, when we might really have a use case where we need to contact the developers, more than 50% of these email addresses would no longer work or would belong to people who are no longer involved in the project.

About the double opt-in procedure, I maintain the point that it can be solved near-trivially.

So the bottom line is that, in my impression, the benefit of storing the registrant's data and reusing an existing double opt-in framework is not worth the increased complexity arising from involving the form server.

Note that this is my personal input; I can live with whatever the project leads decide.

Owner

@carmenbianca @florian.vuillemot what do you think?

Author
Owner

I wish I had an educated opinion, but I'm a little bit in the dark about the functionality of the forms project. However, I agree with @reinhard that we probably should not store personal information.

max.mehl added this to the 0.1 milestone 2019-08-07 14:51:36 +00:00
Owner

After a lot of thinking, I would - personally - still prefer to use forms because I know the service, and because we can contact all repo owners if something changes dramatically in the first months.

But let me try to find a compromise:

  • Email confirmation is done solely by reuse-api, as proposed by @reinhard here: https://git.fsfe.org/reuse/api/issues/10#issuecomment-6529
  • After everything is set up, each submitter will be asked to optionally sign up for technical and other information about REUSE, using the forms service, so basically just an HTML form.

This way, we might not catch all submitters, but still have some way to reach interested parties. Not only for technical messages, but perhaps also for large updates of REUSE in general.

What do you think?

Member

We could do this, but we could also use a simple mailing list for the voluntary sign-up for further information.

(You know I love the form server as much as you do, but I guess for some purposes it's just not the best tool...)

Owner

Hm, this will decrease the number of people we can contact yet again, I'm afraid, since mailing lists are a high threshold...

Member

Didn't we have some magic that allows us to automatically add somebody to a mailing list? So we could just add a checkbox to the registration form "[x] keep me informed about this service (recommended)"? And the user just has to do the standard mailman subscription confirmation?

Owner

> Didn’t we have some magic that allows us to automatically add somebody to a mailing list?

I worked on this for the Spread the Word page, but my first attempts failed. The XSRF protection makes a few things more complicated, and there seems to be another protection in place which I was not aware of. It's on my todo list to try it again.

Owner

Please let's try it as I proposed here: https://git.fsfe.org/reuse/api/issues/10#issuecomment-6598

The existing forms app is not a perfect fit for this, as the text suggests that it's about registration of a project, and requires its URL, but we can change it later.

Member

@max.mehl just to avoid misunderstandings: should the signup on forms for future information about the project require double opt-in?

Owner

> @max.mehl just to avoid misunderstandings: should the signup on forms for future information about the project require double opt-in?

Yes. All email addresses we send mails to should be somehow confirmed, otherwise we'll run into GDPR issues. Since the registration with forms is separate from this API's registration, I'm afraid we have to ask interested users twice for email confirmation.

reinhard self-assigned this 2019-08-27 12:19:33 +00:00
Member

I'm working on this, will issue a PR once the other open PR is merged (to avoid conflicts).

Reference: reuse/api#10