Installation

This section describes how to set up your own DCOR production instance.

Ubuntu and CKAN

Please use an Ubuntu 20.04 installation for any development or production usage. This makes it easier to give support and track down issues.
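
Before installing anything, it can help to confirm the release of the host. The following is a minimal sketch, assuming the standard /etc/os-release file is present; is_ubuntu_focal is a hypothetical helper name and not part of CKAN or DCOR.

```shell
#!/bin/sh
# Sketch: check that a host runs Ubuntu 20.04 before installing CKAN/DCOR.
# is_ubuntu_focal is a hypothetical helper, not part of DCOR or CKAN.
is_ubuntu_focal() {
    # os-release is a file of shell-style KEY=value assignments
    os_release="${1:-/etc/os-release}"
    [ -r "$os_release" ] || return 1
    # Source in a subshell so the variables do not leak into the caller
    (
        . "$os_release"
        [ "$ID" = "ubuntu" ] && [ "$VERSION_ID" = "20.04" ]
    )
}

if is_ubuntu_focal; then
    echo "Ubuntu 20.04 detected"
else
    echo "Warning: this host does not look like Ubuntu 20.04" >&2
fi
```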

Before proceeding with the installation of CKAN, install the following packages:

apt update
# CKAN requirements
apt install -y libpq5 redis-server nginx supervisor
# needed for building packages that DCOR depends on (dclab)
apt install -y gcc python3-dev
# additional tools that you might find useful, but are not actually required
apt install -y aptitude net-tools mlocate screen needrestart python-is-python3

Install CKAN:

wget https://packaging.ckan.org/python-ckan_2.9-py3-focal_amd64.deb
dpkg -i python-ckan_2.9-py3-focal_amd64.deb

Note

Do NOT set up file uploads when following the instructions at https://docs.ckan.org. DCOR has its own dedicated directories for data uploads. The command dcor inspect will try to set up or fix that for you.

Follow the remainder of the installation guide at https://docs.ckan.org/en/2.9/maintaining/installing/install-from-package.html#install-and-configure-postgresql. Make sure to note down the PostgreSQL password which you will need in the initialization step.

Make sure to initialize the CKAN database with

source /usr/lib/ckan/default/bin/activate
export CKAN_INI=/etc/ckan/default/ckan.ini
ckan db init

DCOR by default stores all data on /data. This makes it easier to control backups and separate the CKAN/DCOR software from the actual data. If you have not mounted a block device or a network share on /data, please create this directory with

mkdir /data
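
To see whether /data is backed by its own block device or network share, something like the following could be used. This is a sketch under the assumption that the mountpoint tool from util-linux is installed; ensure_data_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: create the data directory if needed and report how it is backed.
# ensure_data_dir is a hypothetical helper, not part of DCOR.
ensure_data_dir() {
    data_dir="${1:-/data}"
    # -p: do not fail if the directory already exists
    mkdir -p "$data_dir" || return 1
    if mountpoint -q "$data_dir"; then
        echo "$data_dir is a mount point"
    else
        echo "$data_dir is a plain directory on its parent file system"
    fi
}

# Example (as root):
#   ensure_data_dir /data
```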

DCOR Extensions

Installation

Whenever you need to run the ckan/dcor commands or have to update Python packages, you have to first activate the CKAN virtual environment.

source /usr/lib/ckan/default/bin/activate
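
A script that calls ckan or dcor can guard against a missing activation. The sketch below relies on the VIRTUAL_ENV variable that the activate script exports; in_ckan_venv is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: detect whether the CKAN virtual environment is active.
# in_ckan_venv is a hypothetical helper, not part of CKAN or DCOR.
in_ckan_venv() {
    # The activate script exports VIRTUAL_ENV with the environment's path
    [ "${VIRTUAL_ENV:-}" = "${1:-/usr/lib/ckan/default}" ]
}

if ! in_ckan_venv; then
    echo "Run: source /usr/lib/ckan/default/bin/activate" >&2
fi
```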

With the active environment, first install some basic requirements.

pip install --upgrade pip
pip install wheel

Then, install DCOR, which will install all extensions including their requirements.

pip install dcor_control

Initialization

The dcor_control package installs the entry point dcor, which allows you to manage your DCOR installation. Type dcor --help to find out what you can do with it.

For the initial setup, you have to run the inspect command. You can also run this command routinely to make sure that your DCOR installation is set up correctly.

source /usr/lib/ckan/default/bin/activate
dcor inspect

Testing

For testing, common practice is to create separate test databases. We adapt the recipe from the CKAN docs to test the DCOR extensions (e.g. we don’t need datastore).

  • Activate the virtual environment:

    source /usr/lib/ckan/default/bin/activate
    
  • Install the requirements:

    pip install -r /usr/lib/ckan/default/src/ckan/dev-requirements.txt
    # https://github.com/ckan/ckan/issues/5570
    pip install pytest-ckan
    
  • Create the test database:

    sudo -u postgres createdb -O ckan_default ckan_test -E utf-8
    
  • Create ckan.ini for testing:

    cp /etc/ckan/default/ckan.ini /etc/ckan/default/test-dcor.ini
    

    Modify test-dcor.ini:

    #sqlalchemy.url = postgresql://ckan_default:passw@localhost/ckan_default
    sqlalchemy.url = postgresql://ckan_default:passw@localhost/ckan_test
    
    #solr_url=http://127.0.0.1:8983/solr
    solr_url=http://127.0.0.1:8983/solr/ckan
    
  • Configure Solr Multi-core.

  • Initialize the testing db:

    export CKAN_INI=/etc/ckan/default/test-dcor.ini
    ckan db init
    

You can then run the tests with e.g.:

export CKAN_INI=/etc/ckan/default/test-dcor.ini
pytest /path/to/ckanext-dcor_depot
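
The two modifications to test-dcor.ini described above could also be scripted. This is a sketch, assuming the ini file contains the sqlalchemy.url and solr_url lines in exactly the form shown; make_test_ini is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: derive a test configuration from the production ckan.ini.
# make_test_ini is a hypothetical helper, not part of CKAN or DCOR.
make_test_ini() {
    src="$1"
    dst="$2"
    # 1st expression: point sqlalchemy.url at the ckan_test database
    # 2nd expression: append the "ckan" core to a bare solr_url
    sed -E \
        -e 's|^(sqlalchemy\.url[[:space:]]*=[[:space:]]*postgresql://.*/)ckan_default[[:space:]]*$|\1ckan_test|' \
        -e 's|^(solr_url[[:space:]]*=[[:space:]]*http://127\.0\.0\.1:8983/solr)/?[[:space:]]*$|\1/ckan|' \
        "$src" > "$dst"
}

# Example:
#   make_test_ini /etc/ckan/default/ckan.ini /etc/ckan/default/test-dcor.ini
```

Lines that already point at ckan_test or at the ckan core do not match the patterns and are copied unchanged.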

SSL

You have two options. If your server is reachable through the internet, you should use Let’s Encrypt (or a certificate from your organization) to set up SSL. If you are hosting your server on the intranet (clinics scenario), then you should create your own certificate and distribute it to your users.

Creating an SSL certificate (Intranet only)

Start by creating your certificate (valid for 10 years):

openssl req -newkey rsa:4096 -x509 -sha256 -days 3650 -nodes -out fqdn.cert -keyout fqdn.key

where fqdn is your fully qualified domain name (FQDN), which maps to the server’s IP address. Make sure to enter it as the Common Name in the dialog (otherwise use the IP address). This makes connection tests easier, e.g. if you only have SSH access to the machine and need to use SSH tunneling to connect to the CKAN instance, you can map its FQDN to 127.0.0.1 in the /etc/hosts file on the testing client.
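
The interactive dialog can be skipped by passing the subject on the command line. The following is a sketch, assuming OpenSSL 1.1.1 or later (needed for -addext); the function names are hypothetical, and the subjectAltName entry is an addition that many current TLS clients expect, not something the command above sets.

```shell
#!/bin/sh
# Sketch: create a self-signed certificate without the interactive dialog.
# make_selfsigned_cert and show_cert are hypothetical helper names.
make_selfsigned_cert() {
    fqdn="$1"
    # -subj sets the Common Name; -addext adds the same name as a SAN entry
    openssl req -newkey rsa:4096 -x509 -sha256 -days 3650 -nodes \
        -subj "/CN=$fqdn" \
        -addext "subjectAltName=DNS:$fqdn" \
        -out "$fqdn.cert" -keyout "$fqdn.key"
}

# Print subject and expiry date of a certificate for a quick sanity check
show_cert() {
    openssl x509 -in "$1" -noout -subject -enddate
}

# Example:
#   make_selfsigned_cert dcor.example.org
#   show_cert dcor.example.org.cert
```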

You may want to create an encrypted access token for your users.

Now proceed with the SSL configuration below, replacing “dcor.mpl.mpg.de” with your FQDN.

Configuring nginx (SSL and uWSGI proxy)

Encrypting data transfer should be a priority for you. If your server is available online, you can use e.g. Let’s Encrypt to obtain an SSL certificate. If you are hosting CKAN/DCOR internally in your organization, you will have to create a self-signed certificate and distribute the public key to the client machines manually.

First copy the certificate to /etc/ssl/certs and the private key to /etc/ssl/private:

cp dcor.mpl.mpg.de.cert /etc/ssl/certs/
cp dcor.mpl.mpg.de.key /etc/ssl/private/

Note

If dclab, Shape-Out, or DCOR-Aid cannot connect to your CKAN instance, it might be because the certificate in /etc/ssl/certs/ does not contain the full certificate chain. In this case, just download the entire certificate chain using Firefox (right-click on the shield symbol and look at the certificate - there should be a download option for the chained certificate somewhere) and replace the content of the .cert file with that.
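
A quick way to see whether a .cert file holds a chain is to count its PEM blocks. This is a sketch; count_pem_certs is a hypothetical helper name. A result of 1 means only a single certificate is present, which is expected for a self-signed certificate but suggests a missing chain for a certificate issued by a certificate authority.

```shell
#!/bin/sh
# Sketch: count the certificates contained in a PEM file.
# count_pem_certs is a hypothetical helper, not part of DCOR.
count_pem_certs() {
    # Each certificate in a PEM bundle starts with this marker line;
    # "--" keeps grep from reading the pattern as an option
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Example:
#   count_pem_certs /etc/ssl/certs/dcor.mpl.mpg.de.cert
```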

Then, edit /etc/nginx/sites-enabled/ckan and replace its content with the following (change dcor.mpl.mpg.de to whatever domain you use):

# Note that nginx only caches GET and HEAD (not POST) by default:
# http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_methods
proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=cache:30m max_size=250m;

server {
   client_max_body_size 100G;
   # Use this if you don't have enough space on your root partition
   # for caching large uploads (rw-access to www-data).
   # client_body_temp_path /data/tmp/nginx/client_body 1 2;
   listen       443 ssl http2;
   listen       [::]:443 ssl http2;
   server_name  dcor.mpl.mpg.de;

   ssl_certificate "/etc/ssl/certs/dcor.mpl.mpg.de.cert";
   ssl_certificate_key "/etc/ssl/private/dcor.mpl.mpg.de.key";

   # Enables byte-range support for both cached and uncached responses
   # from the proxied server regardless of the "Accept-Ranges" field
   # in these responses. This is important for resuming downloads.
   proxy_force_ranges on;

   # Uncomment to avoid robots (only on development machines)
   #location = /robots.txt { return 200 "User-agent: *\nDisallow: /\n"; }

   # Do not cache downloads of .rtdc data
   location ~ \.(rtdc)$ {
       proxy_pass http://127.0.0.1:8080$request_uri;
       proxy_set_header Host $host;

       # Cache each and every download on disk to get load off of
       # the ckan workers (see ckan-uwsgi.ini).
       proxy_max_temp_file_size 100000m;

       # Use this if you don't have enough space on your root partition
       # for caching large downloads (rw-access to www-data).
       # proxy_temp_path /data/tmp/nginx/proxy 1 2;

       # Do not keep any files on disk (only temp files above).
       proxy_store off;
       proxy_cache off;
       gzip off;
   }

   # allow-list for ckan-related directories
   location ~ /(api|ckan-admin|dashboard|dataset|favicon.ico|fonts|group|images|login_generic|organization|revision|user|webassets) {
       proxy_pass http://127.0.0.1:8080$request_uri;
       proxy_set_header Host $host;
       proxy_read_timeout 7200;
       proxy_send_timeout 7200;
       proxy_cache cache;
       proxy_cache_bypass $cookie_auth_tkt;
       proxy_no_cache $cookie_auth_tkt;
       proxy_cache_valid 30m;
       proxy_cache_key $host$scheme$proxy_host$request_uri;
   }

   # ckan root
   location = / {
       proxy_pass http://127.0.0.1:8080/;
       proxy_set_header Host $host;
       proxy_cache cache;
       proxy_cache_bypass $cookie_auth_tkt;
       proxy_no_cache $cookie_auth_tkt;
       proxy_cache_valid 30m;
       proxy_cache_key $host$scheme$proxy_host$request_uri;
   }

   # Deny all access to other directories that bots search
   # (e.g. "/wp", "/wordpress", "/old", "/.git") which takes
   # load off of the uWSGI workers.
   location / {
       return 404;
   }

}

# Redirect all traffic to SSL
server {
   listen 80;
   listen [::]:80;
   server_name dcor.mpl.mpg.de;
   return 301 https://$host$request_uri;
}

# Optional: Reject traffic that is not directed at `dcor.mpl.mpg.de:80`
server {
   listen 80 default_server;
   listen [::]:80 default_server;
   server_name _;
   return 444;
}

# Optional: Reject traffic that is not directed at `dcor.mpl.mpg.de:443`
server {
   listen       443 default_server;
   listen       [::]:443 default_server;
   server_name  _;
   return 444;
   ssl_certificate "/etc/ssl/certs/ssl-cert-snakeoil.pem";
   ssl_certificate_key "/etc/ssl/private/ssl-cert-snakeoil.key";
}
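
After editing the site file, the configuration can be checked before it goes live. The sketch below assumes that nginx is managed by systemd, as on a default Ubuntu 20.04 installation; reload_nginx_if_valid is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: reload nginx only if the new configuration parses cleanly.
# reload_nginx_if_valid is a hypothetical helper, not part of DCOR.
reload_nginx_if_valid() {
    # "nginx -t" parses the configuration and reports syntax errors
    nginx -t || return 1
    # A reload applies the new configuration without dropping connections
    systemctl reload nginx
}

# Example (as root):
#   reload_nginx_if_valid
```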

Now, we need to modify the CKAN uWSGI file at /etc/ckan/default/ckan-uwsgi.ini:

[uwsgi]

; Since we are behind a webserver (proxy), we use the socket variant.
; We use HTTP1.1 (keep-alives)
http11-socket        =  127.0.0.1:8080
uid                  =  www-data
gid                  =  www-data
wsgi-file            =  /etc/ckan/default/wsgi.py
virtualenv           =  /usr/lib/ckan/default
module               =  wsgi:application
master               =  true
pidfile              =  /tmp/%n.pid
harakiri             =  7200
max-requests         =  5000
vacuum               =  true
callable             =  application
buffer-size          =  32768

; Make sure all options in this file exist.
strict               =  true

; Disable post-buffering, because nginx buffers the entire upload
; anyway and no worker will be idle when consuming it from nginx.
post-buffering       =  0

; Set the number of workers to something > 1, otherwise
; only one client can connect via nginx to uWSGI at a time.
; See https://github.com/ckan/ckan/issues/5933
workers              =  4
; Use lazy apps to avoid the `__Global` error.
; See https://github.com/ckan/ckan/issues/5933#issuecomment-809114593
lazy-apps            =  true
; If we don't want to cache the files that users want to download
; (i.e. set `proxy_max_temp_file_size 0;` in nginx), then we have to
; set socket-timeout to a very large number (e.g. 7200).
; We may also want to increase this number if the storage location for
; resources has a low write speed (e.g. NFS). From the uWSGI sources,
; it looks like the default value is 4s.
socket-timeout       =  500
; (Note that we are serving CKAN via http11-socket behind nginx).
; Otherwise, downloads will fail with `uwsgi_response_sendfile_do() TIMEOUT !!!`,
; because the client cannot download the file from nginx as fast as
; uWSGI can send the file to nginx. But in this case, we can really only
; have as many connections as we have workers.
; On the other hand, if we set `proxy_max_temp_file_size 100000m;`
; in nginx, then all downloads will be cached by nginx. And nginx will
; handle all users. The purpose of setting `workers` to `4` in uWSGI
; is now only so that CKAN does not block for as long as it takes the
; system to copy the download from uwsgi to nginx's `proxy_temp_path`.
; In other words, CKAN will only be unresponsive if 4 downloads are
; started at the same time for as long as it takes the smallest download
; to be copied over the http socket from uWSGI to nginx.
; Good to know: nginx also caches uploads, so no uWSGI worker is
; blocked *during* an upload.

; Custom logging
; disable logging in general (files easily get above 50MB)
disable-logging      =  true
; enable logging for a few specific cases
log-4xx              =  true
log-5xx              =  true
log-ioerror          =  true
; set the log format to match that of CKAN
log-date             =  %%Y-%%m-%%d %%H:%%M:%%S
logformat-strftime   =  true
logformat            =  %(ftime) uWSGI %(addr) (%(proto) %(status)) %(method) %(uri) => %(size) bytes in %(msecs) msecs to %(uagent)
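
Once nginx and uWSGI are running with the new settings, the chain from nginx to CKAN can be probed through the CKAN API, which is covered by the allow-list in the nginx configuration. This is a sketch, assuming curl is installed; ckan_status_code is a hypothetical helper name, and -k is only there to accept a self-signed certificate.

```shell
#!/bin/sh
# Sketch: print the HTTP status code of CKAN's status_show API action.
# ckan_status_code is a hypothetical helper, not part of CKAN or DCOR.
ckan_status_code() {
    # -s: silent, -k: accept a self-signed certificate (intranet scenario),
    # -o /dev/null: discard the body, -w: print only the HTTP status code
    curl -sk -o /dev/null -w '%{http_code}' \
        "https://$1/api/3/action/status_show"
}

# Example:
#   ckan_status_code dcor.mpl.mpg.de
```

A status of 200 indicates that the request reached CKAN; a 502 from nginx usually means that uWSGI is not listening on 127.0.0.1:8080.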

Unattended upgrades

Unattended upgrades offer a simple way of keeping the server up-to-date and patched against security vulnerabilities.

apt-get install unattended-upgrades apt-listchanges

Edit the file /etc/apt/apt.conf.d/50unattended-upgrades to your liking. The default settings should already work, but you might want to set up email notifications and automated reboots.
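
As an illustration of the email and reboot settings, the sketch below writes them to a separate drop-in file instead of editing 50unattended-upgrades directly. The file name, the recipient and the reboot time are assumptions chosen for this example; write_uu_overrides is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: write email and reboot settings for unattended-upgrades.
# write_uu_overrides is a hypothetical helper; the values are examples.
write_uu_overrides() {
    cat > "$1" <<'EOF'
// Send a report to this address (requires a working local mail setup)
Unattended-Upgrade::Mail "root";
// Reboot automatically if an upgrade requires it, at the given time
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";
EOF
}

# Example (as root):
#   write_uu_overrides /etc/apt/apt.conf.d/52unattended-upgrades-local
# Simulate a run without changing the system:
#   unattended-upgrade --dry-run --debug
```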

Note

If you have access to an internal email server and wish to get email notifications from your system, install

apt install bsd-mailx ssmtp

and edit /etc/ssmtp/ssmtp.conf:
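
The exact values depend on your mail server. The following is a minimal sketch of what such a configuration could contain, assuming a relay that accepts unauthenticated mail from the server; the host name and recipient in the example are placeholders, and write_ssmtp_conf is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: write a minimal ssmtp.conf.
# write_ssmtp_conf is a hypothetical helper; all values are placeholders.
write_ssmtp_conf() {
    # $1: output path, $2: mail server (mailhub), $3: recipient of system mail
    cat > "$1" <<EOF
# Recipient for mail addressed to local system accounts
root=$3
# SMTP server that relays the mail
mailhub=$2
# Host name that the mail appears to come from
hostname=$(hostname)
# Allow the From: header of a message to set the sender address
FromLineOverride=YES
EOF
}

# Example (as root):
#   write_ssmtp_conf /etc/ssmtp/ssmtp.conf mail.example.org admin@example.org
```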

Note that this is different from CKAN email notifications.