Troubleshooting

If, after setting up CKAN error email notifications, an email is sent for every file accessed on the server, set the logging level to "WARNING" in all sections of /etc/ckan/default/ckan.ini.

If you get the following errors in /var/log/ckan/ckan-uwsgi.stderr.log:

Error processing line 1 of /usr/lib/ckan/default/lib/python3.8/site-packages/ckanext-dcor-theme-nspkg.pth:

  Traceback (most recent call last):
    File "/usr/lib/python3.8/site.py", line 175, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored

The cause is unclear, but editing the relevant .pth file solved it for me: add a line break after the first semicolon.

From

import sys, types, os;has_mfs = sys.version_info > (3, 8);p = os.path.join(sys._getframe(1).$

to

import sys, types, os;
has_mfs = sys.version_info > (3, 8);p = os.path.join(sys._getframe(1).$

The following sed command applies this change to all matching .pth files:

sed -i -- 's/os;has_mfs/os;\nhas_mfs/g' /usr/lib/ckan/default/lib/python3.8/site-packages/ckan*.pth
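
The same edit can be sketched in Python, which makes it easy to preview the result before touching any file. This is only an illustration: the helper name `split_first_statement` is hypothetical, and it assumes the affected line starts with `import sys, types, os;has_mfs` as in the excerpt above.

```python
def split_first_statement(text: str) -> str:
    """Insert a line break after the first semicolon of every line that
    starts with the namespace-package import shown above."""
    fixed_lines = []
    for line in text.splitlines(keepends=True):
        if line.startswith("import sys, types, os;has_mfs"):
            head, tail = line.split(";", 1)
            line = head + ";\n" + tail
        fixed_lines.append(line)
    return "".join(fixed_lines)


# Preview on an in-memory example (the statement is shortened here).
original = "import sys, types, os;has_mfs = sys.version_info > (3, 8)\n"
print(split_first_statement(original), end="")
```

To apply it to a real file, read the .pth file, pass its content through the function, and write the result back after making a backup copy. Running the function a second time leaves already-fixed content unchanged.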

If you get import errors like this and you are running a development server:

Traceback (most recent call last):
  File "/etc/ckan/default/wsgi.py", line 12, in <module>
    application = make_app(config)
  File "/usr/lib/ckan/default/src/ckan/ckan/config/middleware/__init__.py", line 56, in make_app
    load_environment(conf)
  File "/usr/lib/ckan/default/src/ckan/ckan/config/environment.py", line 123, in load_environment
    p.load_all()
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 140, in load_all
    load(*plugins)
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 154, in load
    service = _get_service(plugin)
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 257, in _get_service
    raise PluginNotFoundException(plugin_name)
ckan.plugins.core.PluginNotFoundException: dcor_schemas

Make sure that the user running the CKAN process has read permission on the files and execute permission on the directories. The following might help; alternatively, run uWSGI as root:

chmod a+x /dcor-repos/*
find /dcor-repos -type d -name ckanext -print0 | xargs -0 chmod -R a+rx
chmod -R a+rx /dcor-repos/dcor_control
chmod -R a+rx /dcor-repos/dcor_shared
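
Before or after changing permissions, it can help to list what is still inaccessible. The sketch below is a rough check, not part of DCOR: the function name `find_inaccessible` is hypothetical, and it only looks at the "other" permission bits, so it assumes the CKAN user is neither the owner nor in the group of the files.

```python
import os
import stat


def find_inaccessible(root: str) -> list:
    """Return paths below `root` that "other" users cannot use:
    directories lacking r or x, and files lacking r.
    The `root` directory itself is not checked."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                continue  # e.g. a broken symlink
            needed = stat.S_IROTH
            if stat.S_ISDIR(mode):
                needed |= stat.S_IXOTH
            if (mode & needed) != needed:
                problems.append(path)
    return sorted(problems)
```

Calling `find_inaccessible("/dcor-repos")` as root should return an empty list once the permissions are in order. Run as an unprivileged user, `os.walk` silently skips directories it cannot read, so the result may be incomplete.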

If you are having issues with HDF5 file locking and are storing your data on a network file storage:

Traceback (most recent call last):
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/rq/worker.py", line 812, in perform_job
    rv = job.perform()
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/rq/job.py", line 588, in perform
    self._result = self._execute()
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/rq/job.py", line 594, in _execute
    return self.func(*self.args, **self.kwargs)
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/ckanext/dcor_schemas/jobs.py", line 27, in set_dc_config_job
    with dclab.new_dataset(path) as ds:
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/dclab/rtdc_dataset/load.py", line 63, in new_dataset
    return load_file(data, identifier=identifier, **kwargs)
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/dclab/rtdc_dataset/load.py", line 22, in load_file
    return fmt(path, identifier=identifier, **kwargs)
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/dclab/rtdc_dataset/fmt_hdf5.py", line 194, in __init__
    self._h5 = h5py.File(h5path, mode="r")
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/h5py/_hl/files.py", line 424, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/usr/lib/ckan/default/lib/python3.8/site-packages/h5py/_hl/files.py", line 190, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 96, in h5py.h5f.open
OSError: Unable to open file (unable to lock file, errno = 37, error message = 'No locks available')

You have to disable file locking via the environment variable HDF5_USE_FILE_LOCKING='FALSE'. The most convenient fix is to add the line:

export HDF5_USE_FILE_LOCKING='FALSE'

to /usr/lib/ckan/default/bin/activate.

Also, you will have to set the environment variable in all supervisor configuration files (uwsgi and worker jobs in /etc/supervisor/conf.d/*.conf):

# put this before the "command=" option.
environment=HDF5_USE_FILE_LOCKING=FALSE

Just to be sure, you could also add this to /etc/environment:

HDF5_USE_FILE_LOCKING="FALSE"
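
To confirm that a running uwsgi or worker process actually received the variable, its start-up environment can be read from /proc on Linux. The following is a minimal sketch; the function names are hypothetical, and reading /proc/<pid>/environ requires root or the same user as the process.

```python
def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, value = entry.split(b"=", 1)
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def locking_disabled(pid: int) -> bool:
    """Return True if process `pid` was started with
    HDF5_USE_FILE_LOCKING=FALSE in its environment."""
    with open(f"/proc/{pid}/environ", "rb") as fd:
        env = parse_environ(fd.read())
    return env.get("HDF5_USE_FILE_LOCKING") == "FALSE"
```

The process IDs can be taken from `supervisorctl status`. Note that /proc/<pid>/environ reflects the environment at process start, so services have to be restarted after changing the configuration.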

If uploads to DCOR fail and you are getting these errors in the nginx logs:
```
[crit] 983#983: *623 pwrite() "/var/lib/nginx/body/0000000001" failed (28: No space left on device)
```
This means that your root partition does not have enough free space to cache uploaded files. A workaround is to move the data directly to the block storage on /data. Add this in the nginx configuration file (server section):
```
client_body_temp_path /data/tmp/nginx 1 2;
```
and make sure that www-data has rw access to this directory.

If your root partition is suddenly full, this might be due to the systemd journal in /var/log/journal. You can free up space by running:
```
journalctl --vacuum-files=2
```
To add a general limit on how large the journal may become, edit the file /etc/systemd/journald.conf and set:
```
SystemMaxUse=200M
```
It might also help to purge the snapd package (do not do this if you are using snaps, e.g. for certbot):
```
apt purge snapd
rm -rf /snap
rm -rf /var/snap
rm -rf /var/lib/snapd
```
If you get OSError: [Errno 28] No space left on device when uploading large files, the reason might be that uwsgi stores temporary files in /tmp. You can check this with:
```
(default) root@server:/# lsof / | grep "/tmp"
uwsgi      1301         www-data    7u   REG   0,28 2038633555 1304952 /tmp/#1304952 (deleted)
uwsgi      1301         www-data   12u   REG   0,28 1558086333 1304953 /tmp/#1304953 (deleted)
```
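To see how much space such deleted-but-still-open files occupy, the lsof output can be summed up. The sketch below is a rough helper with a hypothetical name (`deleted_tmp_bytes`); it assumes the default lsof column layout shown above and file names without spaces.
```python
def deleted_tmp_bytes(lsof_output: str) -> int:
    """Sum the SIZE/OFF column of lsof lines that refer to deleted
    regular files below /tmp."""
    total = 0
    for line in lsof_output.splitlines():
        fields = line.split()
        # COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (deleted)
        if (len(fields) >= 10
                and fields[4] == "REG"
                and fields[8].startswith("/tmp/")
                and fields[-1] == "(deleted)"):
            total += int(fields[6])
    return total


# The two lines from the lsof output above.
sample = """\
uwsgi      1301         www-data    7u   REG   0,28 2038633555 1304952 /tmp/#1304952 (deleted)
uwsgi      1301         www-data   12u   REG   0,28 1558086333 1304953 /tmp/#1304953 (deleted)
"""
print(f"{deleted_tmp_bytes(sample) / 1e9:.1f} GB held by deleted files in /tmp")
```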
You could also check whether your CKAN installation is responsible for this (df -h shows less space than there should be) by restarting all services:
```
supervisorctl restart all
```
Apparently, uwsgi stores its temporary files under /tmp by default, a behavior that can be controlled via the environment variable TMPDIR. Thus, the solution is to edit the uwsgi supervisor file /etc/supervisor/conf.d/ckan-uwsgi.conf and set TMPDIR to a directory under /data:
```
environment=SOMEOTHERVAR=FALSE,TMPDIR=/data/tmp/uwsgi
```
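Whichever temporary directory is configured, it has to exist, be writable by the service user, and sit on a partition with enough free space. A minimal sketch for checking this is given below; the function name `check_temp_dir` and the size threshold are assumptions for illustration. Run it as the service user (www-data in the lsof output above) so that the write check is meaningful.
```python
import os
import shutil


def check_temp_dir(path: str, min_free_bytes: int) -> list:
    """Return a list of human-readable problems with a temporary
    directory; an empty list means the directory looks usable."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path} has only {free} bytes free")
    return problems


# Hypothetical threshold: room for one 20 GB upload.
for problem in check_temp_dir("/data/tmp/uwsgi", 20 * 10**9):
    print(problem)
```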
If downloads of large resources are aborted by the server after a short time, this might be because nginx caches the download on the root partition, which does not have enough free space. You have to specify a cache location with sufficient free space in /etc/nginx/sites-enabled/ckan by uncommenting the line:
```
proxy_temp_path /data/tmp/nginx/proxy 1 2;
```
If uploads fail with a timeout error message and in the logs you get:
```
OSError: timeout during read(57344) on wsgi.input
2021-09-07 09:20:43 uWSGI 127.0.0.1 (HTTP/1.0 500) POST /api/3/action/resource_create => 0 bytes in 8644 msecs to DCOR-Aid/0.6.4
```
that probably means that the socket-timeout value for uWSGI is too low. One possible reason is that the resources are written to a location with low write speed (e.g. NFS). A solution is to set socket-timeout in /etc/ckan/default/ckan-uwsgi.ini:
```
socket-timeout     = 7200
```

If uploads fail with the following error message in the ckan-uwsgi logs:

2021-09-08 18:46:16 - [uwsgi-body-read] Error reading 6563 bytes. Content-Length: 15428164609 consumed: 2150065757 left: 13278098852 message: Client closed connection

[...]

OSError: error during read(8192) on wsgi.input

Then you probably have to disable proxy-buffering in nginx.

If you are getting RuntimeErrors in the CKAN logs on startup:
```
RuntimeError: CKAN config option not found: /usr/lib/ckan/default/src/ckan/ckan.ini
```
This is not a big problem, but to resolve it, you can add CKAN_INI to the environment option in /etc/supervisor/conf.d/ckan-uwsgi.conf:
```
environment=SOMEVAR=FALSE,CKAN_INI=/etc/ckan/default/ckan.ini
```
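
Several fixes on this page add variables to the environment option of a supervisor program. A rough way to review what each program section currently defines is sketched below. It is a simplification: it assumes plain comma-separated KEY=VALUE pairs without commas inside quoted values, and the function name, the section name `program:ckan-uwsgi`, and the `command` value in the example are placeholders.

```python
import configparser


def supervisor_env(conf_text: str) -> dict:
    """Map each section of a supervisor config to the variables set in
    its "environment" option (an empty dict if the option is absent)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    result = {}
    for section in parser.sections():
        raw = parser.get(section, "environment", fallback="")
        pairs = {}
        for item in raw.split(","):
            if "=" in item:
                key, value = item.split("=", 1)
                pairs[key.strip()] = value.strip().strip("\"'")
        result[section] = pairs
    return result


example = """\
[program:ckan-uwsgi]
environment=HDF5_USE_FILE_LOCKING=FALSE,CKAN_INI=/etc/ckan/default/ckan.ini
command=/bin/true
"""
for section, env in supervisor_env(example).items():
    print(section, env)
```

For a real check, pass the content of each file in /etc/supervisor/conf.d/ to the function and compare the result against the variables mentioned on this page.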

If on CKAN>=2.10.1 you are getting errors about not being able to connect to SOLR on startup, such as:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8983): Max retries exceeded
  with url: /solr/ckan/select/?q=%2A%3A%2A&rows=1&wt=json (Caused by NewConnectionError(
  '<urllib3.connection.HTTPConnection object at 0x7f18ec3f4310>: Failed to establish a new connection:
  [Errno 111] Connection refused'))
2023-09-19 18:24:48,312 WARNI [ckan.lib.search] Problems were found while connecting to the SOLR server
pysolr.SolrError: Failed to connect to server at http://localhost:8983/solr/ckan/select/?q=%2A%3A%2A&rows=1&wt=json:
  HTTPConnectionPool(host='localhost', port=8983): Max retries exceeded with url:
  /solr/ckan/select/?q=%2A%3A%2A&rows=1&wt=json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection
  object at 0x7f18ec3f4310>: Failed to establish a new connection: [Errno 111] Connection refused'))

This means that supervisor is starting before SOLR (or at the same time). The solution is to edit the supervisor systemd unit via:

systemctl edit supervisor

and add the SOLR dependency like so:

[Unit]
Requires=solr.service
After=solr.service
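
To check by hand whether SOLR is accepting connections, for example after a reboot, a small polling helper can be used. This is only a diagnostic sketch with a hypothetical function name, not a replacement for the systemd dependency above. Host and port are taken from the error message, and a successful TCP connection only shows that the port is open, not that the ckan core is ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds,
    or False if `timeout` seconds pass without success."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8983, timeout=120)` returns True as soon as something listens on the SOLR port and False if nothing does within two minutes.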