The intent of the test is to make sure that log_json defaults
to True when JUPYTER_ENABLE_JSON_LOGGING is set to "true" so
this makes the test more explicit and to pass when json-logging
is not installed so that does not interfere due to validation.
Make sure the default value is handled properly based
on environment variables. Also checks the validation
code based on whether or not json_logging is imported.
This commit updates the super usage. Because Python 2 is not supported
anymore, super usage can be updated such that super is called without
any arguments in the default case where super is called with the class
name and self.
Note that all usage of super has not been updated - a few cases which
smelled funny have been ignored.
On Python 3, the default source file encoding for Python files is utf-8
and because Python 2 is no longer supported, the utf8 coding cookies can
be removed
In order to use the same logger from the notebook app we
need to pass it through to the log_request function. To
do that we need to use a partial function so the `log` kwarg
is available. If not provided the default logger is the same
as before, the tornado web access logger.
This is necessary for the `log_json` settings to be used for
both the notebook app logs and the web app request logging.
There is still a todo needed here so that the individual
request fields get formatted separately rather than dumped
into the `msg` field.
This adds the `NotebookApp.log_json` CLI option which defaults to
False. The default value comes from the `JUPYTER_ENABLE_JSON_LOGGING`
environment variable. A validation hook is added such that if json
logging is enabled but the `json-logging` package is not installed,
an error is logged and the effective `log_json` value is False.
If enabled and the `json-logging` package is installed, the `init_logging`
method will initialize the non-web [1] json logging formatter.
[1] https://github.com/bobbui/json-logging-python#21-non-web-application-log
This gets us closer to logging parts of the request
as separate json fields but not all the way there.
For example:
```json
{"written_at": "2020-10-07T21:52:16.975Z", "written_ts": 1602107536975497000, "msg": "{'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 1.9807815551757812, 'referer': 'http://localhost:8888/notebooks/test.ipynb'}", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 54}
```
This adds a new extra package dependency on `json-logging`
and an environment variable, which when set to "true" regardless
of case, will try to use the json-logging non-web app log formatter.
If the `json-logging` package is not installed but the environment
variable is True, something like this will be logged but it will not
crash the application:
```
$ ENABLE_JSON_LOGGING=true jupyter notebook
Unable to use json logging due to missing packages. Run "pip install notebook[json-logging]" to fix.
Traceback (most recent call last):
File "/home/osboxes/jupyter/notebook/notebook/notebookapp.py", line 144, in <module>
import json_logging
ModuleNotFoundError: No module named 'json_logging'
```
Initially I tried to add a new Bool config option to toggle this but
the problem is (from my limited understanding of traitlets and tornado)
is that `_log_formatter_cls` needs to be set early and trying to se the
log formatter later in `init_logging` is too late - or at least I couldn't
figure out a way to reset the log formatter dynamically (I tried calling
`_log_format_changed` [1] but that didn't work).
With this working you get logs like this:
```
{"written_at": "2020-10-07T21:10:51.606Z", "written_ts": 1602105051606265000, "msg": "404 GET /nbextensions/widgets/notebook/js/extension.js (127.0.0.1) 9.26ms referer=http://localhost:8888/notebooks/Untitled.ipynb", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 49}
{"written_at": "2020-10-07T21:10:54.443Z", "written_ts": 1602105054443309000, "msg": "Starting buffering for f260ddd7-938c-42d0-ac3b-455bea76694f:49f30b53fc4b4ec6a8f2fb748a171613", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "kernelmanager", "line_no": 222}
{"written_at": "2020-10-07T21:10:54.446Z", "written_ts": 1602105054446264000, "msg": "Kernel shutdown: f260ddd7-938c-42d0-ac3b-455bea76694f", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "multikernelmanager", "line_no": 201}
```
An obvious improvement here would to be able to split the `msg` fields apart
so that we can log things like response status code, request method, request
URL, response_time_ms etc. That should be possible with the `JSONLogWebFormatter`
from `json-logging` but when I tried using that I was getting errors from
the library about a correlation id (which is based on a request header we
don't use). The `json-logging` library supports several web frameworks like
Flask but unfortunately does not have built in support for Tornado, but it does
support custom formatters so that might be a follow up option to improve on this.
[1] https://github.com/ipython/traitlets/blob/4.3.3/traitlets/config/application.py#L195Closes#5798