Review #518
openReview branch backup
Description
This is a start: the committed configuration backups only PostgreSQL databases hosted on Toushirou.
Other hosts/directories will be added latter.
Updated by Pierre-Louis Bonicoli over 7 years ago
- Related to Review #519: Review burp role added
Updated by Marc Dequènes over 7 years ago
- Status changed from New to In Progress
I did not look at the role yet, first remarks about what i can see from the outside.
The database backup rewrites what /usr/local/sbin/srv_backup_databases did. So this script is most probably to be improved, but it was complex because it involved using fifo to avoid storing local data. On Orfeo we lack space and this is not gonna work as the DSPAM database is huge. So we need to use the feature to read from fifos.
We don't have any playbook separation, we're using tags at the moment. Is there any plan to merge it to dc.yml or do you think we should arrange differently?
Where is defined variable 'backup_excludes'?
I see "ratelimit:" without any value; I don't recall this is a valid YAML syntax. Is it accepted?
Every 5 minutes is not a lot of stress? I guess it is simply retries if for eg the network is down?
More later.
Updated by Pierre-Louis Bonicoli over 7 years ago
Marc Dequènes wrote:
I did not look at the role yet, first remarks about what i can see from the outside.
The database backup rewrites what /usr/local/sbin/srv_backup_databases did. So this script is most probably to be improved, but it was complex because it involved using fifo to avoid storing local data. On Orfeo we lack space and this is not gonna work as the DSPAM database is huge. So we need to use the feature to read from fifos.
Fifos are used again.
We don't have any playbook separation, we're using tags at the moment. Is there any plan to merge it to dc.yml or do you think we should arrange differently?
I plan to add include backup.yml
in dc.yml
.
Where is defined variable 'backup_excludes'?
variable removed.
I see "ratelimit:" without any value; I don't recall this is a valid YAML syntax. Is it accepted?
yes (None
), ratelimit
removed anyway.
Every 5 minutes is not a lot of stress? I guess it is simply retries if for eg the network is down?
We don't want to miss the short windows. There is a bug, the cron task is executed on Toushirou
which use Paris timezone.
More later.
Updated by Marc Dequènes over 7 years ago
- Category set to Service :: Backup
Pierre-Louis Bonicoli wrote:
Fifos are used again.
Could you make the global catalog a fifo too please?
I plan to add
include backup.yml
indc.yml
.
Ok, fine.
I see "ratelimit:" without any value; I don't recall this is a valid YAML syntax. Is it accepted?
yes (
None
),ratelimit
removed anyway.
I know None, but without value at all?
Every 5 minutes is not a lot of stress? I guess it is simply retries if for eg the network is down?
We don't want to miss the short windows. There is a bug, the cron task is executed on
Toushirou
which use Paris timezone.
So TZ can cause problems when asking the server if it is the right time? Not sure to get it.
I think the BACKUP_DIR should be specific to the task, just in case we have concurrent backups, and it's clearer too.
To be clean, the list of excluded databases should be computed with a parameter. postgres and template0 are always to be excluded, but bacula is an admin-specific choice.
Also it would be nice to be able to kill all remaining processes in the post action, but be sure to only touch the ones we started in this script. This can be done in a future PR though.
Ok, the rest is fine, I'm gonna read the role now.
Updated by Pierre-Louis Bonicoli over 7 years ago
Marc Dequènes wrote:
Pierre-Louis Bonicoli wrote:
Fifos are used again.
Could you make the global catalog a fifo too please?
Done
I see "ratelimit:" without any value; I don't recall this is a valid YAML syntax. Is it accepted?
yes (
None
),ratelimit
removed anyway.I know None, but without value at all?
None
or without value at all, that's the same.
See the following python snippet:
import yaml; print(yaml.load("{'1':, '2':null, '3':None, '4':~}"))
outputs:
{'1': None,
'3': 'None',
'2': None,
'4': None}
Every 5 minutes is not a lot of stress? I guess it is simply retries if for eg the network is down?
We don't want to miss the short windows. There is a bug, the cron task is executed on
Toushirou
which use Paris timezone.So TZ can cause problems when asking the server if it is the right time? Not sure to get it.
No: cron task is executed on Toushirou which is using PARIS timezone, Korutopi is using Japan timezone.
I think the BACKUP_DIR should be specific to the task, just in case we have concurrent backups, and it's clearer too.
There is a lock on the client and a lock on the server, concurrent backups aren't possible.
To be clean, the list of excluded databases should be computed with a parameter. postgres and template0 are always to be excluded, but bacula is an admin-specific choice.
Done
Also it would be nice to be able to kill all remaining processes in the post action, but be sure to only touch the ones we started in this script. This can be done in a future PR though.
Implemented. I guess we should use systemd units instead of bash script/cron task :-/ Besides systemd time can use UTC timezone.
Updated by Marc Dequènes over 7 years ago
- Category deleted (
Service :: Backup)
Pierre-Louis Bonicoli wrote:
None
or without value at all, that's the same.
Thanks for clarifying.
I guess it was not the same in the past in Ruby, but all languages seem to use libyaml now so same behavior.
No: cron task is executed on Toushirou which is using PARIS timezone, Korutopi is using Japan timezone.
Not very easy to setup then.
Implemented. I guess we should use systemd units instead of bash script/cron task :-/ Besides systemd time can use UTC timezone.
In the "jobs directory will be ignored" part, why a mkdir and not a touch? It worked with a file with Bacula so it would be nice current config in user data dirs (anywhere on DC) would work too.
I guess using UTC would be better for our heads and avoid calculation errors.
As for using systemd I'll be in favor but I'm still loking for a solution for non-linux systems (like the Hurd). There is a systemd-shim
package, but only builds on linux-any systems and is orphaned, so…
Updated by Pierre-Louis Bonicoli over 7 years ago
Marc Dequènes wrote:
In the "jobs directory will be ignored" part, why a mkdir and not a touch? It worked with a file with Bacula so it would be nice current config in user data dirs (anywhere on DC) would work too.
Both file and directory are allowed/recognized.
Updated by Pierre-Louis Bonicoli over 7 years ago
- Related to Enhancement #497: Change Backup System added
Updated by Marc Dequènes over 7 years ago
New review on current code.
Why are cron_*
, timer_*
and keep_
variables not backup_
prefixed like other such variables. Looks like PG config stuff, and even if now I know there is no such thing in our Ansible config, it might happen one day.
Except from this the config in backup.yml
files is clear.
The play on Orfeo:Toushirou
is I guess supposed to setup the generic file backup, but I see no roles or tasks defined.
I will discuss the rest in #519 because it is related to the role's API.
Updated by Marc Dequènes over 7 years ago
If i understand well, include_vars
in client_specific_conf_from_file.yml
loads the files in playbooks/files/burp/Nicecity_*
.
So what I see is a way to prepare the parameters for burp using *_vars
parameters and a higher level template. This is not bad but does not allow to share config between similar hosts easily (for example all DB hosts). Why not generate the entry in backups
for all hosts of a group instead? (using moving the content of playbooks/files/burp/Nicecity_Orfeo_postgres_db.yml
into group_vars/postgresql_servers/backup.yml
in the backups
tree for example)
Updated by Marc Dequènes over 7 years ago
So I've got this error:
TASK [burp : Render BURP clientconfdir configuration] ********************************************************************************************************* fatal: [Toushirou -> nicecity.duckcorp.org]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute u'keep_db_mysql'"}
Unfortunately the file playbooks/files/burp/Nicecity_Toushirou_root_db.yml
is both caught is the file backup and database backup, and playbooks/templates/burp/duckcorp_ccd.conf
used for file backup does not cope well with the defined parameters because this file is made with the database backup in mind.
As the client name is used for the user
doing the backup, I'm not sure how to solve this.
Updated by Marc Dequènes over 7 years ago
I pushed the backup_duck
branch with my test changes. I just moved Toushirou's db config file away anbd back again to work around the previous bug and was able to update all configs.
- PG backup on Orfeo worked fine
- MySQL backup on Toushirou hang: Burp hang when running the pre script while manually run it works well, no idea what's going on. I disabled the crontab for the time being
- file backup works well
So I need to check things more in depth and also see if automatic cron backups are fine. I will also add the other hosts soon.
Updated by Marc Dequènes over 7 years ago
The exclude paths were not taken into account. I pushed a change to fix this and other related problems. It does not have to be the final solution but I hope to get the backup done properly without taking ages (and more TB than Pilou is OK to provide).
Updated by Pierre-Louis Bonicoli over 7 years ago
Marc Dequènes wrote:
- MySQL backup on Toushirou hang: Burp hang when running the pre script while manually run it works well, no idea what's going on. I disabled the crontab for the time being
Fixed by 8d12debba60888f655a3d246a4af2d082a8c2b1d.
Updated by Pierre-Louis Bonicoli over 7 years ago
Marc Dequènes wrote:
So I've got this error:
TASK [burp : Render BURP clientconfdir configuration]
fatal: [Toushirou -> nicecity.duckcorp.org]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute u'keep_db_mysql'"}
submodule wasn't up to date, this error was fixed by 406a38a47748886eb9ade4e540dc8e05d1fa325b.
Updated by Marc Dequènes over 7 years ago
Could you have a look at #10 and #11 remarks when you have time too (no urgency).
I rebased my branch to get your fixes. There's no urgency to review it, but keep in mind these are production settings (instead of the original branch), so please deploy using it. I did this while you're busy but of course everything is up to discussion.
Updated by Marc Dequènes over 7 years ago
MySQL backup is working fine, thanks a lot :-).
Updated by Pierre-Louis Bonicoli over 7 years ago
Marc Dequènes wrote:
New review on current code.
Why are
cron_*
,timer_*
andkeep_
variables notbackup_
prefixed like other such variables. Looks like PG config stuff, and even if now I know there is no such thing in our Ansible config, it might happen one day.
Updated (f2a79cc24f954fda184a22cf31c40255a272186f/053921e500b6948692245441ba7748b4da5accfd).
The play on
Orfeo:Toushirou
is I guess supposed to setup the generic file backup, but I see no roles or tasks defined.
Fixed (df297d537c2449928359ec67906604c1f46be4d4)
I will discuss the rest in #519 because it is related to the role's API.
Updated by Pierre-Louis Bonicoli over 7 years ago
Marc Dequènes wrote:
If i understand well,
include_vars
inclient_specific_conf_from_file.yml
loads the files inplaybooks/files/burp/Nicecity_*
.So what I see is a way to prepare the parameters for burp using
*_vars
parameters and a higher level template. This is not bad but does not allow to share config between similar hosts easily (for example all DB hosts). Why not generate the entry inbackups
for all hosts of a group instead? (using moving the content ofplaybooks/files/burp/Nicecity_Orfeo_postgres_db.yml
intogroup_vars/postgresql_servers/backup.yml
in thebackups
tree for example)
In order to share configuration:
# One client is able to have many client configurations: # ... defined by variables - include: client_specific_conf.yml with_dict: '{{ client.value.backups }}' loop_control: { loop_var: 'backup' } when: '"backups" in client.value' # ... and/or defined in YAML files - include: client_specific_conf_from_file.yml # FIXME: one client is able to define only one configuration using YAML file # TODO: username should not be key OR use add one level in backups dict with_fileglob: 'burp/{{ burp_clients._server }}_{{ inventory_hostname }}_{{ client.key }}_*.yml' loop_control: { loop_var: 'backup_conf_path' } when: '"backups" not in client.value'
- variables can be used
- symlinks can be used (when configuration is defined in YAML file).
Updated by Marc Dequènes about 7 years ago
Beware the branch is still in the original repository, so it should be ported to the new Open Infra one.
As for sharing, I think it will be easier for me to grasp it when we have a real life example. Thanks for the explanation though.
It would be nice to finish polishing the remaining few things and merge. The other improvements can be done later. This would avoid too much divergence and more future work,
Updated by Marc Dequènes about 7 years ago
- Assignee changed from Marc Dequènes to Pierre-Louis Bonicoli