Source code for PyConf.tonic

###############################################################################
# (c) Copyright 2019-2021 CERN for the benefit of the LHCb Collaboration      #
#                                                                             #
# This software is distributed under the terms of the GNU General Public      #
# Licence version 3 (GPL Version 3), copied verbatim in the file "COPYING".   #
#                                                                             #
# In applying this licence, CERN does not waive the privileges and immunities #
# granted to it by virtue of its status as an Intergovernmental Organization  #
# or submit itself to any jurisdiction.                                       #
###############################################################################
"""
Tonic is a small Python package that aims to allow modular configuration whose
behaviour can be overridden in a way that is easy to comprehend and debug.

You might have seen uses of the ``@configurable`` decorator in Moore code. This
is part of Tonic. If you want to understand the motivation behind this, and
when you might want to use it, read on. The detailed documentation of the
`Tonic API`_ follows after that.

.. _tonic-design:

Designing flexible configuration
--------------------------------

We've structured the Moore configuration in a way that packages each small
step in separate functions. In this way, your HLT2 line calls a function that
makes pions, say, and this calls a function which produces tracks, which calls
a function that loads raw data. Your line creates a stack of function calls::

    hlt2_line()
        calls make_pions()
            calls make_tracks()
                calls make_hits()
                    calls make_raw()

This separation of code helps to organise things.

But now you want to modify the behaviour of some function in the middle of this stack, say ``make_tracks``. How can you do this? There are a couple of approaches you might come up with:

1. Modify the source code of the function directly.
2. Copy the function as some new version and use that.

Option 1 is perfectly valid, and is how many Moore developers work, as explained in :doc:`../tutorials/developing`.

Option 2 brings a problem. The *caller* of the function you copied is still
calling the original version. So, now you need to modify or copy that function.
And now you have to repeat this for the caller of *that* function! This gets
very cumbersome very quickly.

The ``@configurable`` decorator
-------------------------------

The `configurable <PyConf.tonic.configurable>` decorator is designed to help in these situations. It allows
you to override the arguments of a function wherever it happens to be called.
So even if it's called deep down the call stack, you still have some ability to
override its behaviour.

Take this representative example::

    from PyConf import configurable
    from PyConf.Algorithms import HitMaker, TrackMaker, SelectPions, ParticleCombiner
    from Moore.lines import DecisionLine
    from GaudiKernel.SystemOfUnits import MeV


    def make_raw():
        pass


    def make_hits():
        return HitMaker(RawEvent=make_raw())


    @configurable
    def make_tracks(pT_threshold=200 * MeV):
        return TrackMaker(Hits=make_hits())


    @configurable
    def make_pions(max_pidk=5):
        return SelectPions(MaxPIDK=max_pidk, tracks=make_tracks())


    def dipion_line(name="Hlt2DiPionLine", prescale=1.0):
        pions = make_pions()
        dipions = ParticleCombiner(Decay="B0 -> pi+ pi-", Particles=[pions, pions])
        return DecisionLine(name, algs=[dipions], prescale=prescale)

You can see the full chain going from the line to the raw event.

Now we want to study what effect changing the ``PIDK`` cut has on the rate of
our line. We could just modify the call to ``make_pions`` directly. This
requires modifying the source of the ``dipion_line`` function::

    def dipion_line(name="Hlt2DiPionLine", prescale=1.0):
        # Remember to uncomment this back when we're done!
        # pions = make_pions()
        pions = make_pions(max_pidk=0)
        dipions = ParticleCombiner(Decay="B0 -> pi+ pi-", Particles=[pions, pions])
        return DecisionLine(name, algs=[dipions], prescale=prescale)

Instead of doing this, we can use the ``bind`` method that's made available on
all functions decorated with `configurable <PyConf.tonic.configurable>`::

    with make_pions.bind(max_pidk=0):
        line = dipion_line()

It's as easy as that! When you use ``bind`` with a `context manager`_ like
this, any calls to the 'bound function' (``make_pions`` in this case) will be
intercepted, and the argument value you've specified will override the original
value. In the example, the value of the ``max_pidk`` argument will be ``0``
inside the ``with`` block rather than the default of ``5``.

Multiple calls to `bind` can be used in the same ``with`` statement. This means
we could also modify the tracking threshold along with the PID cut::

    with make_pions.bind(max_pidk=0), make_tracks.bind(pT_threshold=500 * MeV):
        line = dipion_line()
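
The interception that ``bind`` performs can be pictured with a stripped-down
sketch: a decorator keeps a per-function stack of overrides, and ``bind`` is a
context manager that pushes keyword arguments onto that stack and pops them
again on exit, so overrides only apply inside the ``with`` block. Note that
``mini_configurable`` below is a made-up illustration, not PyConf's actual
decorator, and it omits the real implementation's precedence rules, warnings,
substitution, and caching:

```python
from contextlib import contextmanager
from functools import wraps


def mini_configurable(func):
    """Toy stand-in for @configurable: keyword defaults can be overridden."""
    overrides = []  # stack of {param: value} dicts pushed by bind()

    @wraps(func)
    def wrapper(**kwargs):
        merged = {}
        for layer in overrides:
            merged.update(layer)
        merged.update(kwargs)  # explicit call-site arguments win
        return func(**merged)

    @contextmanager
    def bind(**kwargs):
        overrides.append(kwargs)
        try:
            yield
        finally:
            overrides.pop()  # restore previous state on scope exit

    wrapper.bind = bind
    return wrapper


@mini_configurable
def make_pions(max_pidk=5):
    return "pions with MaxPIDK < {}".format(max_pidk)


def dipion_line():
    return make_pions()  # called deep in a stack, no arguments passed


print(dipion_line())  # default: MaxPIDK < 5
with make_pions.bind(max_pidk=0):
    print(dipion_line())  # override applies inside the scope
print(dipion_line())  # default restored
```

The ``try``/``finally`` in ``bind`` removes the override even if the body
raises, which is what makes the scoping reliable.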

So, when should you use `configurable <PyConf.tonic.configurable>` to
decorate *your* functions? There are some cases when it *never* makes sense:

1. Functions with no arguments, as there's nothing to override.
2. Functions that you don't expect to be buried inside a call stack. Line
   functions, like our ``dipion_line``, are an example. They are usually
   called at the top level of a script, so if we wanted to override some
   argument values we would just do so directly::

    line_standard = dipion_line()
    line_prescaled = dipion_line(name="Hlt2DiPionPrescaledLine", prescale=0.5)

Outside of these, it depends how you expect the function to be used. It's
generally safe to add `configurable <PyConf.tonic.configurable>`, but you can
also just omit it. We can always add it later if it turns out it's needed.

Remember that the standard development flow has the full source code checked
out; it's often easier just to modify it directly rather than jumping through
``bind`` calls!

.. note::

    Overriding of deeply nested components was something quite common when we
    used objects called ``Configurables``. These could be retrieved inside any
    scope based on a name: if you knew the name of the Configurable, you could
    retrieve it and modify its properties. This permitted ultimate flexibility.

    The trouble with this approach is that anyone can modify any Configurable
    at any time. It becomes tricky to keep track of exactly who is modifying
    what, and what piece of code sets the *final* value the Configurable ends
    up with. In an application like the trigger, it's very important to be able
    to understand exactly what's going on!

    Using the `configurable <PyConf.tonic.configurable>` decorator is an
    alternative that tries to make overriding more explicit. Everything
    happens in the callstack, and nothing outside it can mess around inside
    it. Using `bind` only modifies things within a very specific scope.

There are a couple of other useful features available when using
`configurable <PyConf.tonic.configurable>`, such as `tonic.debug <PyConf.tonic.debug>` and `substitute`. These are described in
the `PyConf.tonic` documentation below.

.. _context manager: https://stackabuse.com/python-context-managers/

Tonic API
---------

Wrappers for defining functions that can be configured higher up the call stack.

Tonic provides the `@configurable <configurable>` decorator, which allows the
default values of keyword arguments to be overridden from higher up the
callstack with `bind`.

    >>> from PyConf import configurable
    >>> @configurable
    ... def f(a=1):
    ...     return a
    ...
    >>> with f.bind(a=2):
    ...     f()
    ...
    2
    >>> f()
    1

This allows for high-level configuration of behaviour deep within an
application; all that's needed is a reference to the `configurable` function
that one wishes to modify the behaviour of.

The idiomatic way of using tonic is to define small, self-contained functions
which construct some object of interest. These functions should call other,
similarly self-contained functions to retrieve any components which are
dependencies. This way, callers can override the behaviour of any function in
the call stack with `bind`. Each function should expose configurable parameters
as keyword arguments.

To help debugging, bindings can be inspected using the `debug` context manager.

    >>> from PyConf import tonic
    >>> with tonic.debug():
    ...     f()
    ...     f(a=3)
    ...     with f.bind(a=2):
    ...         f()
    ...
    1
    3
    2

Functions marked `configurable` can also be substituted entirely with
`substitute`.

    >>> @configurable
    ... def echo(arg=123):
    ...     return arg
    ...
    >>> def echo_constant():
    ...     return 456
    ...
    >>> with echo.substitute(echo_constant):
    ...     echo()
    ...
    456
    >>> echo()
    123
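
Scoped substitution follows the same context-manager pattern: swap in the
replacement, run the block, then restore. The sketch below is a hypothetical
stand-in (``substitutable`` is not the module's real decorator) that shows the
core idea; the real ``substitute`` additionally warns about repeated
substitutions and hidden binds, and rejects replacements that are themselves
marked ``@configurable``:

```python
from contextlib import contextmanager


def substitutable(func):
    """Toy decorator: the wrapped function's body can be swapped in a scope."""
    def wrapper(*args, **kwargs):
        # Call the replacement if one is active, else the original
        target = wrapper._substitute or func
        return target(*args, **kwargs)

    @contextmanager
    def substitute(replacement):
        wrapper._substitute = replacement
        try:
            yield
        finally:
            wrapper._substitute = None  # restore the original on scope exit

    wrapper._substitute = None
    wrapper.substitute = substitute
    return wrapper


@substitutable
def echo(arg=123):
    return arg


def echo_constant():
    return 456


print(echo())  # 123
with echo.substitute(echo_constant):
    print(echo())  # 456
print(echo())  # 123
```

Because the swap happens on the wrapper rather than at any call site, every
caller of ``echo`` inside the ``with`` block sees the replacement, no matter
how deep in the call stack it sits.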

tonic is named for Google's gin configuration framework [1]_, which served as
inspiration.

.. [1] https://github.com/google/gin-config
"""
from __future__ import absolute_import, division, print_function
import json
import logging
import os
import inspect
import sys
import warnings
from collections import namedtuple
from functools import partial
from traceback import extract_stack, format_exc
from contextlib import contextmanager
from types import FunctionType, BuiltinFunctionType
from wrapt import FunctionWrapper

__cache_disabled = False
__all_configurables = []
__bound_args_state = {}
log = logging.getLogger(__name__)


def __default_serializer(obj):
    if isinstance(obj, (FunctionType, BuiltinFunctionType)):
        # only hash objects where the hash depends on the contents
        return type(obj).__name__ + '#' + str(hash(obj))
    if isinstance(obj, frozenset):
        return list(obj)
    raise TypeError(repr(obj) + " is not serializable")


__cache_serializer = __default_serializer


def add_cache_serializer(f, *args, **kwargs):
    """Insert a function that can serialize custom objects.

    The function f is tried and if it raises TypeError, the previous global
    serializer function is attempted.

    Args:
        f (callable): ``f(obj, *args, **kwargs)`` must convert obj to a
            serializable representation or raise TypeError.
    """
    if log.isEnabledFor(logging.DEBUG):
        log.debug("Adding cache serializer {}(*{}, **{})".format(
            _print_live_object(f), args, kwargs))
    global __cache_serializer
    prev = __cache_serializer

    def cache_serializer(obj):
        try:
            return f(obj, *args, **kwargs)
        except TypeError as e:
            if sys.version_info[0] >= 3:
                return prev(obj)
            else:
                # Python 2 compatibility (print chain of exceptions)
                last_exception = format_exc()
                try:
                    return prev(obj)
                except:
                    raise TypeError(
                        str(e) +
                        '\nException occurred while handling another exception:\n\n'
                        + last_exception)

    __cache_serializer = cache_serializer
def _print_live_object(x):
    try:
        name = x.__qualname__
    except AttributeError:
        name = x.__name__
    try:
        source = os.path.basename(inspect.getsourcefile(x))
        try:
            loc = source + ":" + str(inspect.getsourcelines(x)[1])
        except OSError:
            loc = source
        return '<{}() at {}>'.format(name, loc)
    except TypeError:
        return str(x)


def _is_configurable(func):
    """Return True if `func` has been marked `@configurable`."""
    return hasattr(func, "_bound_args_stack")


def _has_bound_args(func):
    """Return True if `func` has bound arguments."""
    return _is_configurable(func) and len(func._bound_args_stack) > 0


def _has_substitution(func):
    """Return True if `func` has been substituted."""
    return _is_configurable(func) and func._substitute is not None


def _keyword_params(func):
    """Return the function parameters that can be passed by name."""
    try:
        sig = inspect.signature(func)
        kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                 inspect.Parameter.KEYWORD_ONLY)
        return [n for n, p in sig.parameters.items() if p.kind in kinds]
    except AttributeError:
        return inspect.getargspec(func).args


def _has_var_keyword(func):
    """Return whether a function accepts **kwargs."""
    try:
        sig = inspect.signature(func)
        return any(p.kind == inspect.Parameter.VAR_KEYWORD
                   for p in sig.parameters.values())
    except AttributeError:
        return bool(inspect.getargspec(func).keywords)


def _bound_bind(configurable, scoped=True):
    """Return a `bind` method that is bound to a configurable.

    If scoped (default), the returned bind method returns a context manager.
    If not scoped, the return value of the method is None.

    Args:
        configurable: `@configurable` function to bind to.
        scoped (bool): If the bind is scoped or global.
    """
    # Record when this configurable was called within a `bind` scope
    configurable._called = False
    keyword_params = _keyword_params(configurable)
    has_no_var_keyword = not _has_var_keyword(configurable)

    def bind(**kwargs):
        """Bind a value to the parameters of a configurable.

        The changes made to the default argument values implied by
        `.bind(...)` are only valid within the scope that the `.bind(...)`
        call is made. The changes made by `bind` then go 'out of scope' when
        leaving the scope.

        Scoping is implemented as a context manager. A warning is raised if
        the `bind` target function is not called within the context.

        Args:
            **kwargs: Parameters and values.
        """
        if _has_substitution(configurable):
            warnings.warn(
                "bind call on {} will have no effect; substituted by {}".
                format(configurable, configurable._substitute))
        bound_args_stack = configurable._bound_args_stack
        if not scoped and bound_args_stack:
            last_scoped = any(a.scoped for a in bound_args_stack[-1].values())
            if last_scoped:
                raise RuntimeError(
                    'Cannot call global_bind after bind ({})'.format(
                        next(iter(bound_args_stack[-1].values()))))
        if scoped:
            # Reset the called flag; will check it later and warn if the
            # function was not called within the `bind` scope
            configurable._called = False
        for param_name, param_value in kwargs.items():
            if has_no_var_keyword and param_name not in keyword_params:
                raise ValueError("{} does not have a parameter '{}'".format(
                    _print_live_object(configurable), param_name))
            # TODO how can we do type checking here on `value`?
        if log.isEnabledFor(logging.DEBUG):
            # record stack trace to be shown in errors and warnings
            stack = extract_stack()[:-2]
        else:
            stack = None
        bound_args = {
            k: BoundArgument(v, scoped, [stack])
            for k, v in kwargs.items()
        }
        # TODO can we detect some overriding selectors already here?
        configurable._bound_args_stack.append(bound_args)
        __bound_args_state[hash(configurable)] = configurable._bound_args_stack
        try:
            yield
        finally:
            if scoped:
                configurable._bound_args_stack.pop()
                if not configurable._bound_args_stack:
                    del __bound_args_state[hash(configurable)]
                if not configurable._called:
                    # Report the stack frame of the `bind` call for debugging.
                    # Frame 0 is here, 1 is the bind partial, 2 is the bind
                    # call itself
                    stack_frame = inspect.stack()[2]
                    tb = '  File "{}", line {}, in {}\n    {}'.format(
                        stack_frame[1], stack_frame[2], stack_frame[3],
                        stack_frame[4][0] if stack_frame[4] else '')
                    warnings.warn(
                        ('Bound function {} was not called within a bind. '
                         'Stack trace:\n{}'.format(
                             _print_live_object(configurable), tb)))

    if scoped:
        return contextmanager(bind)
    else:
        return lambda **kwargs: next(bind(**kwargs))
        # TODO (RM): split context manager case and unscoped bind


def _stack_warn_summary(stack):
    if stack is not None:
        return '{}:{}'.format(os.path.split(stack[-1][0])[1], stack[-1][1])
    else:
        return '<enable debugging to get file:line>'
class BoundArgument(
        namedtuple('BoundArgument', ['value', 'scoped', 'stacks'])):
    def __str__(self):
        locs = map(_stack_warn_summary, self.stacks)
        return '{!r} ({})'.format(self.value, ', '.join(locs))

    def __repr__(self):
        locs = map(_stack_warn_summary, self.stacks)
        return 'BoundArgument(value={!r}, scoped={!r}, stacks=<{}>)'.format(
            self.value, self.scoped, ', '.join(locs))
ForcedArgument = namedtuple('ForcedArgument', ['value'])
def forced(value):
    """Force bind an argument, overriding higher-level binds."""
    return ForcedArgument(value)
def _update_bound_args(bound_args, updates, stacklevel):
    """Return updated bound arguments according to the precedence semantics."""
    bound_args = bound_args.copy()
    for param, new_value in updates.items():
        is_forced = isinstance(new_value.value, ForcedArgument)
        if is_forced:
            # strip the ForcedArgument wrapper
            new_value = BoundArgument(new_value.value.value, new_value.scoped,
                                      new_value.stacks)
        if param not in bound_args:
            bound_args[param] = new_value
        elif new_value.value != bound_args[param].value:
            bound_arg = BoundArgument(
                new_value.value, new_value.scoped,
                bound_args[param].stacks + new_value.stacks)
            # Higher-level binds take precedence over deeper binds
            verb = 'overridden by forced' if is_forced else 'shadows'
            warnings.warn(
                'multiple matches for {}: higher-level {} {} {}'.format(
                    param, bound_args[param], verb, new_value),
                stacklevel=stacklevel + 1)
            if is_forced:
                # unless forced("value") is used
                bound_args[param] = bound_arg
    return bound_args


def _bound_parameters(configurable, stacklevel):
    """Return the parameters bound to configurable given the scope stacks."""
    bound_args = {}
    for updates in configurable._bound_args_stack:
        bound_args = _update_bound_args(
            bound_args, updates, stacklevel=stacklevel + 1)
    return bound_args
def bound_parameters(configurable):
    """Return the parameters bound to configurable in the current stack scope."""
    bound_args = _bound_parameters(configurable, stacklevel=2)
    return {k: v.value for k, v in bound_args.items()}
def _bound_substitute(configurable, scoped=True):
    def substitute(func):
        """Substitute the body of the bound configurable with `func`.

        After substitution, any call to the original configurable will be
        replaced with a call to `func`. Scoping is implemented as a context
        manager.

        >>> @configurable
        ... def echo(arg=123):
        ...     return arg
        ...
        >>> def replacement():
        ...     return 456
        ...
        >>> with echo.substitute(replacement):
        ...     print(echo())
        ...
        456

        Calling this on a configurable function that has already been
        substituted will raise a warning, and the original substitution will
        not be overridden. To further increase clarity, warnings are also
        raised if `bind` has already been called on the configurable before
        this method is invoked, and if `bind` is called after this method is
        invoked.

        Raises
        ------
        ValueError: The substitute function `func` is marked `@configurable`.
        """
        if _is_configurable(func):
            raise ValueError(
                "Substitute function should not be marked @configurable")
        if _has_bound_args(configurable):
            warnings.warn(
                "binds of {} will be hidden by substitution with {}".format(
                    _print_live_object(configurable), func))
        # Do nothing if the function already has a substitute
        noop = _has_substitution(configurable)
        if noop:
            warnings.warn("Function {} already has a substitute {}".format(
                _print_live_object(configurable),
                _print_live_object(configurable._substitute)))
        else:
            configurable._substitute = func
        yield
        if not noop:
            configurable._substitute = None

    return contextmanager(substitute)


def _configurable_wrapper(wrapped, _, args, kwargs):
    """Wrapper for methods marked `@configurable`."""
    if _has_substitution(wrapped):
        # do not expect calls to the substituted function to be cached
        return wrapped._substitute(*args, **kwargs)

    if args:
        wrapped_params = _keyword_params(wrapped)
        if len(args) > len(wrapped_params):
            raise TypeError(
                'too many positional arguments given, expected <={}, gave {}'.
                format(len(wrapped_params), len(args)))
        named_args = dict(zip(wrapped_params, args))
        duplicates = set(kwargs).intersection(named_args)
        if duplicates:
            raise TypeError('{} got multiple values for {}'.format(
                _print_live_object(wrapped), duplicates))
        kwargs.update(named_args)

    if log.isEnabledFor(logging.DEBUG):
        stack = extract_stack()[:-1]
    else:
        stack = None
    direct_args = {
        k: BoundArgument(v, True, [stack])
        for k, v in kwargs.items()
    }
    bound_args = _bound_parameters(wrapped, stacklevel=2)
    new_bound_args = _update_bound_args(bound_args, direct_args, stacklevel=2)
    kwargs = {k: v.value for k, v in new_bound_args.items()}

    if log.isEnabledFor(logging.DEBUG):
        # Prepare the debugging print-out
        current_frame = inspect.currentframe()
        caller_frame = inspect.getouterframes(current_frame, 2)[1]
        descriptors = []
        for pname in sorted(kwargs.keys()):
            # Show how we arrived at the value we're going to use
            if pname in kwargs:
                if pname in bound_args and kwargs[pname] == bound_args[
                        pname].value:
                    # Value was specified with `bind`
                    ptype = 'bound'
                else:
                    # Value was given at the call site
                    ptype = 'given'
            else:
                # Default value is used
                ptype = 'default'
            d = '{} = {} ({})'.format(pname, kwargs[pname], ptype)
            descriptors.append(d)
        params = ('\n    ' +
                  '\n    '.join(descriptors)) if descriptors else ' NONE'
        log.debug(
            'Calling @configurable {func} from {file}:{line} with non-default parameters:{params}'
            .format(
                func=_print_live_object(wrapped),
                file=caller_frame[3],
                line=caller_frame[2],
                params=params))

    # Set the called flag so it can be used by `bind`
    wrapped._called = True

    if not wrapped._serialize or __cache_disabled:
        return wrapped(**kwargs)

    # bound_args_state = [c._bound_args_stack for c in __all_configurables]
    # bound_args_state = {
    #     hash(c): c._bound_args_stack
    #     for c in __all_configurables if c._bound_args_stack
    # }
    bound_args_state = __bound_args_state
    # TODO improve cache hits
    # - in principle we only need to collect the bound arguments of the
    #   configurables that are called (which we can obtain from the first
    #   call to `wrapped` with the given kwargs).
    # - because `wrapped` is in __all_configurables, `cache_key` will
    #   be different depending on whether `wrapped.bind` is used on it,
    #   even if `kwargs` are identical. Can we safely remove `wrapped`
    #   from `bound_args_state`?
    if log.isEnabledFor(logging.DEBUG):
        log.debug("Calling {}({}) with bound args state {}".format(
            _print_live_object(wrapped), kwargs, bound_args_state))
    try:
        cache_key = (wrapped._serialize(kwargs),
                     wrapped._serialize(bound_args_state))
        assert cache_key[0] is not None and cache_key[1] is not None
        log.debug("hash(cache_key) = {}".format(hash(cache_key)))
        # log.debug('cache_key = {}'.format(cache_key))
    except TypeError as e:
        log.info("Cannot determine key to cache {}({})\n {}".format(
            _print_live_object(wrapped), kwargs, str(e)))
        cache_key = None
    if cache_key is None:
        return wrapped(**kwargs)
    try:
        result, result_key = wrapped._cache[cache_key]
        new_result_key = wrapped._serialize(result)
        if new_result_key != result_key:
            raise RuntimeError(
                "Cached result of {}({}) has been modified from\n"
                "  {}\nto\n  {}".format(
                    _print_live_object(wrapped), kwargs, result_key,
                    new_result_key))
        log.debug("Cache hit!")
        return result
    except KeyError:
        log.debug("Cache miss!")
        pass
    result = wrapped(**kwargs)
    try:
        # obtain an immutable representation of result
        result_key = wrapped._serialize(result)
        wrapped._cache[cache_key] = (result, result_key)
    except TypeError as e:
        log.info(
            'Cannot serialize return value of {}, not caching\n {}'.format(
                _print_live_object(wrapped), str(e)))
        wrapped._serialize = None  # do not try to cache further calls
    return result
def configurable(wrapped=None, cached=False):
    """Mark a function as configurable.

    The behaviour of a configurable function can be modified using the bind
    syntax:

    >>> @configurable
    ... def f(a=1):
    ...     return a
    ...
    >>> with f.bind(a=2):
    ...     f()
    ...
    2
    >>> f()
    1
    """
    if wrapped is None:
        # the function to be decorated hasn't been given yet
        # so we just collect the optional keyword arguments.
        return partial(configurable, cached=cached)

    __all_configurables.append(wrapped)
    wrapped._serialize = (partial(json.dumps, default=__cache_serializer)
                          if cached else None)
    wrapped._cache = {}
    wrapped._bound_args_stack = []
    wrapped.bind = _bound_bind(wrapped)
    wrapped.global_bind = _bound_bind(wrapped, scoped=False)
    wrapped._substitute = None
    wrapped.substitute = _bound_substitute(wrapped)
    # Change the name of the code object such that a custom name appears in
    # stack frames and in cProfile. This avoids the decorator squishing
    # call relationships into [f1,f2,...] -> decorator -> [g1,g2,...]
    # See https://stackoverflow.com/q/9386636/1630648
    wrapper = FunctionType(
        _configurable_wrapper.__code__.replace(
            co_name=f"configurable({wrapped.__name__})"),
        _configurable_wrapper.__globals__)
    return FunctionWrapper(wrapped=wrapped, wrapper=wrapper)
@contextmanager
def debug():
    """Context manager that enables debug messaging from tonic."""
    level = log.level
    log.setLevel(logging.DEBUG)
    try:
        yield
    finally:
        log.setLevel(level)
@contextmanager
def disable_cache():
    """Context manager that disables caching of configurable calls."""
    global __cache_disabled
    __cache_disabled = True
    try:
        yield
    finally:
        __cache_disabled = False