Troubleshooting

  • If, after setting up CKAN error email notifications, an email is sent for every file accessed on the server, set the logging level to "WARNING" in all sections of /etc/ckan/default/ckan.ini.
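
    To find the sections that still need changing, a small helper like the following may be useful. It is only a sketch: the function name non_warning_levels is made up for this example, and it assumes that levels are written as "level = VALUE" lines below "[section]" headers.

    ```shell
    # List every "level = ..." setting in an INI file that is not WARNING,
    # prefixed with the section header it appears under.
    non_warning_levels() {
        awk '
            /^\[/ { section = $0 }
            /^[ \t]*level[ \t]*=/ {
                value = $0
                sub(/^[^=]*=[ \t]*/, "", value)   # drop "level ="
                sub(/[ \t]+$/, "", value)         # drop trailing blanks
                if (value != "WARNING") print section ": " value
            }
        ' "$1"
    }
    ```

    Usage: non_warning_levels /etc/ckan/default/ckan.ini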

  • If you get the following errors in /var/log/ckan/ckan-uwsgi.stderr.log:

    Error processing line 1 of /usr/lib/ckan/default/lib/python3.8/site-packages/ckanext-dcor-theme-nspkg.pth:
    
      Traceback (most recent call last):
        File "/usr/lib/python3.8/site.py", line 175, in addpackage
          exec(line)
        File "<string>", line 1, in <module>
        File "<frozen importlib._bootstrap>", line 553, in module_from_spec
      AttributeError: 'NoneType' object has no attribute 'loader'
    
    Remainder of file ignored
    

    The cause is unclear, but editing the relevant .pth file resolved it for me: add a line break after the first semicolon.

    From

    import sys, types, os;has_mfs = sys.version_info > (3, 8);p = os.path.join(sys._getframe(1).$
    

    to

    import sys, types, os;
    has_mfs = sys.version_info > (3, 8);p = os.path.join(sys._getframe(1).$
    
    The same edit can be applied to all affected files with sed:

    sed -i -- 's/os;has_mfs/os;\nhas_mfs/g' /usr/lib/ckan/default/lib/python3.8/site-packages/ckan*.pth
    
  • If you get import errors like this and you are running a development server:

    Traceback (most recent call last):
      File "/etc/ckan/default/wsgi.py", line 12, in <module>
        application = make_app(config)
      File "/usr/lib/ckan/default/src/ckan/ckan/config/middleware/__init__.py", line 56, in make_app
        load_environment(conf)
      File "/usr/lib/ckan/default/src/ckan/ckan/config/environment.py", line 123, in load_environment
        p.load_all()
      File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 140, in load_all
        load(*plugins)
      File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 154, in load
        service = _get_service(plugin)
      File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 257, in _get_service
        raise PluginNotFoundException(plugin_name)
    ckan.plugins.core.PluginNotFoundException: dcor_schemas
    

    Make sure that the user running the CKAN process has read permission on the files and read and execute permission on the directories. The following commands might help; alternatively, you can run uWSGI as root:

    chmod a+x /dcor-repos/*
    find /dcor-repos -type d -name ckanext -print0 | xargs -0 chmod -R a+rx
    chmod -R a+rx /dcor-repos/dcor_control
    chmod -R a+rx /dcor-repos/dcor_shared
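
    To see where permissions are still too strict, you could list the directories that other users cannot read or enter. This is a sketch only; the function name dirs_not_world_accessible is made up for this example, and it looks at the "others" permission bits rather than at the actual CKAN user.

    ```shell
    # Print all directories below "$1" that lack the o+r or the o+x
    # permission bit, i.e. that other users cannot list or traverse.
    dirs_not_world_accessible() {
        find "$1" -type d ! -perm -005
    }
    ```

    Usage: dirs_not_world_accessible /dcor-repos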
    
  • If you are having issues with HDF5 file locking and are storing your data on network file storage:

    Traceback (most recent call last):
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/rq/worker.py", line 812, in perform_job
        rv = job.perform()
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/rq/job.py", line 588, in perform
        self._result = self._execute()
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/rq/job.py", line 594, in _execute
        return self.func(*self.args, **self.kwargs)
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/ckanext/dcor_schemas/jobs.py", line 27, in set_dc_config_job
        with dclab.new_dataset(path) as ds:
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/dclab/rtdc_dataset/load.py", line 63, in new_dataset
        return load_file(data, identifier=identifier, **kwargs)
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/dclab/rtdc_dataset/load.py", line 22, in load_file
        return fmt(path, identifier=identifier, **kwargs)
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/dclab/rtdc_dataset/fmt_hdf5.py", line 194, in __init__
        self._h5 = h5py.File(h5path, mode="r")
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/h5py/_hl/files.py", line 424, in __init__
        fid = make_fid(name, mode, userblock_size,
      File "/usr/lib/ckan/default/lib/python3.8/site-packages/h5py/_hl/files.py", line 190, in make_fid
        fid = h5f.open(name, flags, fapl=fapl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5f.pyx", line 96, in h5py.h5f.open
    OSError: Unable to open file (unable to lock file, errno = 37, error message = 'No locks available')
    

    You have to disable file locking via the environment variable HDF5_USE_FILE_LOCKING='FALSE'. The most convenient fix is to add the line:

    export HDF5_USE_FILE_LOCKING='FALSE'
    

    to /usr/lib/ckan/default/bin/activate.

    You also have to set the environment variable in all supervisor configuration files (uwsgi and worker jobs in /etc/supervisor/conf.d/*.conf):

    # put this before the "command=" option.
    environment=HDF5_USE_FILE_LOCKING=FALSE
    

    Just to be sure, you could also add this to /etc/environment:

    HDF5_USE_FILE_LOCKING="FALSE"
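
    To check that no supervisor configuration file was forgotten, something along these lines might help. It is a sketch with a made-up function name (confs_missing_var) and it only tests whether the variable shows up on an "environment=" line, not whether its value is correct.

    ```shell
    # Print those of the given config files whose "environment=" line
    # does not mention the variable named in "$1".
    confs_missing_var() {
        var=$1
        shift
        for conf in "$@"; do
            grep -q -- "^[[:space:]]*environment[[:space:]]*=.*$var" "$conf" \
                || printf '%s\n' "$conf"
        done
    }
    ```

    Usage: confs_missing_var HDF5_USE_FILE_LOCKING /etc/supervisor/conf.d/*.conf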
    
  • If uploads to DCOR fail and you are getting these errors in the nginx logs:

    [crit] 983#983: *623 pwrite() "/var/lib/nginx/body/0000000001" failed (28: No space left on device)
    

    This means that your root partition does not have enough free space to cache uploaded files. A workaround is to move the data directly to the block storage on /data. Add this in the nginx configuration file (server section):

    client_body_temp_path /data/tmp/nginx 1 2;
    

    and make sure that www-data has rw access to this directory.
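
    Before moving the temporary directory, it can be worth comparing the free space of the two locations. A minimal sketch, assuming a POSIX df; the function name free_kib is made up for this example:

    ```shell
    # Print the free space, in KiB, of the file system that holds "$1".
    # Relies on the POSIX output format of df, where the fourth column
    # of the second line is the available space.
    free_kib() {
        df -Pk "$1" | awk 'NR == 2 { print $4 }'
    }
    ```

    Usage: free_kib /var/lib/nginx; free_kib /data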

  • If your root partition is suddenly full, this might be due to the systemd journal in /var/log/journal. You can free up space by running:

    journalctl --vacuum-files=2
    

    To add a general limit on how large the journal may become, edit the file /etc/systemd/journald.conf and set:

    SystemMaxUse=200M
    

    It might also help to purge the snapd package (do not do this if you are using snaps, e.g. for certbot):

    apt purge snapd
    rm -rf /snap
    rm -rf /var/snap
    rm -rf /var/lib/snapd
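
    If it is not clear what is filling the partition, you could first look at the largest directories. The following is only a sketch; the function name largest_subdirs is made up, and it assumes a du that supports the -d option (GNU and BSD du do).

    ```shell
    # Print the total size of "$1" followed by its largest direct
    # subdirectories (sizes in KiB), without crossing into other file
    # systems. "$2" is the number of lines to show (default: 5).
    largest_subdirs() {
        du -xk -d 1 "$1" 2>/dev/null | sort -rn | head -n "${2:-5}"
    }
    ```

    Usage: largest_subdirs /var 10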
    
  • If uploads of large files fail with OSError: [Errno 28] No space left on device, the reason might be that uwsgi stores temporary files in /tmp. You can check this with:

    (default) root@server:/# lsof / | grep "/tmp"
    uwsgi      1301         www-data    7u   REG   0,28 2038633555 1304952 /tmp/#1304952 (deleted)
    uwsgi      1301         www-data   12u   REG   0,28 1558086333 1304953 /tmp/#1304953 (deleted)
    

    You could also check whether your CKAN installation is responsible for this (df -h shows less space than there should be) by restarting all services:

    supervisorctl restart all
    

    As far as I could find out, uwsgi stores its temporary files under /tmp by default, a behavior that can be controlled via the environment variable TMPDIR. Thus, the solution is to edit the uwsgi supervisor file /etc/supervisor/conf.d/ckan-uwsgi.conf and set TMPDIR to a directory under /data:

    environment=SOMEOTHERVAR=FALSE,TMPDIR=/data/tmp/uwsgi
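
    To get an idea of how much space such deleted temporary files occupy, the lsof output can be summed up. This is a sketch with a made-up function name (deleted_bytes); it assumes the column layout shown in the lsof output above, where the seventh column holds the file size.

    ```shell
    # Read lsof output from standard input and print the total size,
    # in bytes, of all entries that are marked "(deleted)".
    deleted_bytes() {
        awk '/\(deleted\)/ { total += $7 } END { printf "%.0f\n", total }'
    }
    ```

    Usage: lsof / | grep "/tmp" | deleted_bytes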
    
  • If downloads of large resources are aborted by the server after a short time, this might be because nginx caches the download on the root partition, which does not have enough free space. You have to specify a cache location with sufficient free space in /etc/nginx/sites-enabled/ckan by uncommenting the line:

    proxy_temp_path /data/tmp/nginx/proxy 1 2;
    
  • If uploads fail with a timeout error message and in the logs you get:

    OSError: timeout during read(57344) on wsgi.input
    2021-09-07 09:20:43 uWSGI 127.0.0.1 (HTTP/1.0 500) POST /api/3/action/resource_create => 0 bytes in 8644 msecs to DCOR-Aid/0.6.4
    

    that probably means that the socket-timeout value for uWSGI is too low. A possible reason is that the resources are written to a location with low write speed (e.g. NFS). A solution is to set socket-timeout in /etc/ckan/default/ckan-uwsgi.ini:

    socket-timeout     = 7200
    
  • If uploads fail with the following error message in the ckan-uwsgi logs:

    2021-09-08 18:46:16 - [uwsgi-body-read] Error reading 6563 bytes. Content-Length: 15428164609 consumed: 2150065757 left: 13278098852 message: Client closed connection
    
    [...]
    
    OSError: error during read(8192) on wsgi.input
    

    Then you probably have to disable proxy buffering in nginx.
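
    To see what is currently configured, you could search the nginx site configuration for buffering directives. This sketch assumes that the relevant nginx directives are proxy_buffering (responses) and proxy_request_buffering (request bodies); which of the two needs to be switched off for your setup is not covered here. The function name show_proxy_buffering is made up for this example.

    ```shell
    # Print all lines (with line numbers) of the nginx config file "$1"
    # that set proxy_buffering or proxy_request_buffering.
    show_proxy_buffering() {
        grep -nE '^[[:space:]]*proxy_(request_)?buffering[[:space:]]' "$1"
    }
    ```

    Usage: show_proxy_buffering /etc/nginx/sites-enabled/ckan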

  • If you are getting RuntimeErrors in the CKAN logs on startup:

    RuntimeError: CKAN config option not found: /usr/lib/ckan/default/src/ckan/ckan.ini
    

    This is not a big problem, but to resolve it, you can add CKAN_INI to the environment option in the supervisor configuration file /etc/supervisor/conf.d/ckan-uwsgi.conf:

    environment=SOMEVAR=FALSE,CKAN_INI=/etc/ckan/default/ckan.ini
    
  • If on CKAN>=2.10.1 you are getting errors about not being able to connect to SOLR on startup, such as:

    requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8983): Max retries exceeded
      with url: /solr/ckan/select/?q=%2A%3A%2A&rows=1&wt=json (Caused by NewConnectionError(
      '<urllib3.connection.HTTPConnection object at 0x7f18ec3f4310>: Failed to establish a new connection:
      [Errno 111] Connection refused'))
    2023-09-19 18:24:48,312 WARNI [ckan.lib.search] Problems were found while connecting to the SOLR server
    pysolr.SolrError: Failed to connect to server at http://localhost:8983/solr/ckan/select/?q=%2A%3A%2A&rows=1&wt=json:
      HTTPConnectionPool(host='localhost', port=8983): Max retries exceeded with url:
      /solr/ckan/select/?q=%2A%3A%2A&rows=1&wt=json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection
      object at 0x7f18ec3f4310>: Failed to establish a new connection: [Errno 111] Connection refused'))
    

    This means that supervisor is starting before SOLR (or at the same time). The solution is to edit the supervisor systemd unit via:

    systemctl edit supervisor
    

    and add the SOLR dependency like so:

    [Unit]
    Requires=solr.service
    After=solr.service
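
    If you prefer a non-interactive setup (e.g. in a provisioning script), the same drop-in can be written directly. This is a sketch: the function name write_solr_dropin is made up, and it assumes the drop-in location that "systemctl edit" normally uses, /etc/systemd/system/supervisor.service.d/override.conf.

    ```shell
    # Write a systemd drop-in that makes a unit require solr.service and
    # start after it. "$1" is the drop-in directory (created if missing).
    write_solr_dropin() {
        mkdir -p "$1" &&
        printf '%s\n' '[Unit]' 'Requires=solr.service' 'After=solr.service' \
            > "$1/override.conf"
    }
    ```

    Usage: write_solr_dropin /etc/systemd/system/supervisor.service.d && systemctl daemon-reload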