gentag: Simple and powerful tagging for Python objects

The Python package gentag provides simple and powerful tagging for arbitrary Python objects. After defining your tags and associated objects, you can query for the difference, intersection and union of tags to select specific objects. The package is currently tested on CPython 2.6, 2.7, 3.4, 3.5, 3.6 and PyPy (2.7).

Status

While the ideas behind gentag have been floating around in my head since 2012, I didn't publish them as a standalone Python package until 2018, which is why I'm publishing the initial version as a beta. Looking ahead towards the future:

  • It may be that the current version serves my needs fine and at some point I decide to replace the ‘beta’ label with a ‘stable’ label without making any substantive changes.
  • Releasing gentag is one step towards releasing another Python package that I've been thinking about for a very long time now. If I turn out to have trouble integrating gentag into that package, I won't hesitate to make (potentially major) changes to gentag.

Installation

The gentag package is available on PyPI which means installation should be as simple as:

$ pip install gentag

There are actually many ways to install Python packages (e.g. the per user site-packages directory, virtual environments or just installing system wide) and I have no intention of getting into that discussion here, so if this intimidates you, read up on your options before returning to these instructions ;-).

Usage

The following sections give an overview of how to get started. For more details about the Python API please refer to the API documentation available on Read the Docs.

Creating a scope

To get started you have to create a Scope object:

>>> from gentag import Scope
>>> tags = Scope()

The purpose of Scope objects is to group together related tags into an evaluation context for tag expressions.

Defining tags

Scope instances allow you to define tags and associated objects:

>>> tags.define('archiving', ['deb', 'tar', 'zip'])
>>> tags.define('compression', ['bzip2', 'deb', 'gzip', 'lzma', 'zip'])
>>> tags.define('encryption', ['gpg', 'luks', 'zip'])

Querying tags

Once you've defined some tags and associated objects, you can query them. For example, here we query for the union of two tags:

>>> tags.evaluate('archiving | encryption')
['deb', 'gpg', 'luks', 'tar', 'zip']

These tag expressions can get arbitrarily complex:

>>> tags.evaluate('(archiving | encryption) & compression')
['deb', 'zip']

Supported operators

The following operators can be used to compose tags:

Operator   Set operation
&          intersection
|          union
-          difference
^          symmetric difference

These operators create new Tag objects that can be composed further. Although tags composed at runtime using Python syntax don't have a name, it is possible to define named composite tags using the Scope.define() method (see below).
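Each operator corresponds to Python's built-in set operator of the same symbol. As an illustration only (plain Python sets standing in for the example tags, not the gentag API itself), the following reproduces the query results shown above and adds the two operators that weren't demonstrated:

```python
# Plain Python sets standing in for the example tags defined earlier.
# This only illustrates the set semantics; it does not use gentag itself.
archiving = {'deb', 'tar', 'zip'}
compression = {'bzip2', 'deb', 'gzip', 'lzma', 'zip'}
encryption = {'gpg', 'luks', 'zip'}

union = sorted(archiving | encryption)
intersection = sorted((archiving | encryption) & compression)
difference = sorted(archiving - compression)
symmetric_difference = sorted(archiving ^ encryption)

print(union)                 # ['deb', 'gpg', 'luks', 'tar', 'zip']
print(intersection)          # ['deb', 'zip']
print(difference)            # ['tar']
print(symmetric_difference)  # ['deb', 'gpg', 'luks', 'tar']
```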

The default tag

There’s one special tag that is always available under the name ‘all’. As you might have guessed it provides access to a set with all tagged objects:

>>> tags.evaluate('all')
['bzip2', 'deb', 'gpg', 'gzip', 'luks', 'lzma', 'tar', 'zip']

This can be useful to select all objects except those with a specific tag:

>>> tags.evaluate('all - encryption')
['bzip2', 'deb', 'gzip', 'lzma', 'tar']
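Conceptually the 'all' tag is the union of the objects of every defined tag. A minimal sketch with plain sets (the tag_sets dictionary is a hypothetical stand-in for a scope, not part of gentag) gives the same results as the two queries above:

```python
# Hypothetical stand-in for a scope: tag names mapped to plain sets.
tag_sets = {
    'archiving': {'deb', 'tar', 'zip'},
    'compression': {'bzip2', 'deb', 'gzip', 'lzma', 'zip'},
    'encryption': {'gpg', 'luks', 'zip'},
}

# The default tag behaves like the union of every tag's objects.
all_objects = set().union(*tag_sets.values())

print(sorted(all_objects))
# ['bzip2', 'deb', 'gpg', 'gzip', 'luks', 'lzma', 'tar', 'zip']
print(sorted(all_objects - tag_sets['encryption']))
# ['bzip2', 'deb', 'gzip', 'lzma', 'tar']
```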

Named composite tags

The expressions shown in the querying tags section above demonstrate that tags can be composed using set operators. You can also define a named tag based on an expression:

>>> tags.define('flexible', 'archiving & compression & encryption')

Such named composite tags can be evaluated like regular tags:

>>> tags.evaluate('flexible')
['zip']

You can also nest composite tags inside other composite tags.
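To show why nesting works, here is a much simplified sketch of a scope that evaluates composite tags lazily. MiniScope and the 'rigid' tag are hypothetical names for this illustration; gentag's real Scope is more elaborate. The key idea is that eval() looks up each name through the scope, so a composite tag is only evaluated when another expression refers to it:

```python
class MiniScope(object):
    """A hypothetical, much simplified scope (not gentag's implementation)."""

    def __init__(self):
        # Tag names map to a set (simple tag) or a string (composite tag).
        self.definitions = {}

    def define(self, name, value):
        if isinstance(value, str):
            self.definitions[name] = value
        else:
            self.definitions[name] = set(value)

    def __getitem__(self, name):
        # eval() looks up every name in the expression here. Composite tags
        # are evaluated on demand, so they can nest other composite tags.
        value = self.definitions[name]
        if isinstance(value, str):
            return self.evaluate_raw(value)
        return value

    def evaluate_raw(self, expression):
        # Passing self as the locals mapping makes eval() call __getitem__.
        return eval(expression, {'__builtins__': {}}, self)

    def evaluate(self, expression):
        return sorted(self.evaluate_raw(expression))


tags = MiniScope()
tags.define('archiving', ['deb', 'tar', 'zip'])
tags.define('compression', ['bzip2', 'deb', 'gzip', 'lzma', 'zip'])
tags.define('encryption', ['gpg', 'luks', 'zip'])
tags.define('flexible', 'archiving & compression & encryption')
tags.define('rigid', 'archiving - flexible')  # nests a composite tag

print(tags.evaluate('flexible'))  # ['zip']
print(tags.evaluate('rigid'))     # ['deb', 'tar']
```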

History

The example in the usage section isn't actually very useful; this is partly because I didn't want a complicated subject matter to distract readers from the usage instructions :-).

The actual use case that triggered the ideas behind gentag presented itself to me in 2012 when I wanted to query a database of more than 200 Linux server names categorized by aspects such as:

  • The distributor id (a string like ‘debian’ or ‘ubuntu’).
  • The distribution codename (a string like ‘trusty’ or ‘xenial’).
  • The server's role (database, mailserver, webserver, etc.).
  • The server’s environment (production, development).

The easy selection of subsets of servers for my Python programs to operate on quickly evolved into my main interface for selecting groups of servers. Since then I’ve wanted to use similar functionality in other places, but found it too much work to develop one-off solutions. This is how gentag was born.

About the name

The name gentag stands for “generative tags”, because the package allows new tags to be composed (generated) from existing tags. I’d like to thank my colleague Seán Murphy for coming up with this name :-).

Contact

The latest version of gentag is available on PyPI and GitHub. The documentation is hosted on Read the Docs. For bug reports please create an issue on GitHub. If you have questions, suggestions, etc. feel free to send me an e-mail at peter@peterodding.com.

License

This software is licensed under the MIT license.

© 2018 Peter Odding.

API documentation

The following documentation is based on the source code of version 2.0 of the gentag package:

gentag

Simple and powerful tagging for Python objects.

class gentag.ObjectFactory(**kw)

A mapping of tag names to set objects.

This class is used by evaluate() during expression parsing to resolve tag names to the associated objects.

__getitem__(name)

Get the objects associated with the given tag.

Parameters: name – The name of the tag (a string).
Returns: A set of objects associated with the tag.
Raises: EmptyTagError when no associated objects are available.

tags

The TagFactory from which objects are retrieved.

Note

The tags property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named tags (unless a custom constructor is defined; in that case, please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

class gentag.Scope(**kw)

To use gentag everything starts with a Scope object.

A Scope object groups together related Tag objects and provides methods to define new tags and evaluate tag expressions.

add_object(value, *tags)

Add an object to the scope.

Parameters:
  • value – The object to add (any hashable value).
  • tags – The names of tags to associate the object with.

define(name, value)

Define the value of a tag.

Parameters:
  • name – The name of the tag (a string).
  • value – A string containing an expression or an iterable of values associated with the given tag.
Returns:

The Tag object.

Raises:

ValueError for unsupported value types.

evaluate(expression)

Get the objects matching the given expression.

Parameters: expression – The tag expression to evaluate (a string).
Returns: A sorted list with matching objects.
Raises: TagExpressionError when the given expression cannot be evaluated due to a syntax error.

This method is a wrapper for evaluate_raw() that calls sorted() on the matching objects before returning them.

evaluate_raw(expression)

Get the objects matching the given expression.

Parameters: expression – The tag expression to evaluate (a string).
Returns: A set with matching objects.
Raises: TagExpressionError when the given expression cannot be evaluated due to a syntax error.

This method uses eval() to evaluate the expression given by the caller; however, it overrides __builtins__ to avoid leaking any built-ins into the eval() call.
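The following sketch illustrates the general technique of evaluating an expression with an empty __builtins__ and a custom name lookup, combined with the error behavior documented for ObjectFactory and evaluate_raw(). The exception classes mirror the documented names but are local stand-ins, and ObjectLookup and the tag_sets dictionary are hypothetical; this is not gentag's actual code:

```python
# Stand-ins mirroring the documented exception names; NOT imported from gentag.
class GenTagError(Exception):
    """Base class for the stand-in exceptions."""


class EmptyTagError(GenTagError):
    """Raised when a name in the expression has no associated objects."""


class TagExpressionError(GenTagError):
    """Raised when the expression can't be parsed."""


class ObjectLookup(object):
    """Hypothetical mapping that eval() consults for every name it sees."""

    def __init__(self, tag_sets):
        self.tag_sets = tag_sets

    def __getitem__(self, name):
        objects = self.tag_sets.get(name)
        if not objects:
            # Unknown names (including built-ins like 'open') end up here too.
            raise EmptyTagError("Tag %r has no associated objects!" % name)
        return objects


def evaluate_raw(expression, tag_sets):
    try:
        # An empty __builtins__ dict hides functions such as open() from the
        # expression. This limits what it can do but isn't a security sandbox.
        return eval(expression, {'__builtins__': {}}, ObjectLookup(tag_sets))
    except SyntaxError as e:
        raise TagExpressionError("Failed to evaluate %r: %s" % (expression, e))


tag_sets = {'archiving': {'deb', 'tar', 'zip'}, 'encryption': {'gpg', 'luks', 'zip'}}
print(sorted(evaluate_raw('archiving & encryption', tag_sets)))  # ['zip']
```

Evaluating 'archiving &' raises TagExpressionError, while a name such as 'open' resolves through ObjectLookup and raises EmptyTagError instead of reaching the built-in function.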

get_all_objects()

Get all objects defined in the scope.

Returns: A set of user defined objects.

This method iterates over the defined tags and collects all tagged objects. Because evaluating tags that are defined by an expression won't change the result of get_all_objects(), such tags are skipped for performance reasons.
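Skipping composite tags is safe because union, intersection, difference and symmetric difference of tagged sets can only select objects that simple tags already contribute. A minimal sketch of that idea, assuming a hypothetical representation where simple tags are sets and composite tags are expression strings:

```python
def get_all_objects(definitions):
    """Collect objects from simple tags only (hypothetical helper)."""
    all_objects = set()
    for value in definitions.values():
        # Composite tags (strings) only select from objects that simple
        # tags already contribute, so evaluating them can't add anything.
        if isinstance(value, str):
            continue
        all_objects.update(value)
    return all_objects


definitions = {
    'archiving': {'deb', 'tar', 'zip'},
    'encryption': {'gpg', 'luks', 'zip'},
    'flexible': 'archiving & encryption',
}
print(sorted(get_all_objects(definitions)))  # ['deb', 'gpg', 'luks', 'tar', 'zip']
```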

objects

A mapping of tag names to set objects (an ObjectFactory instance).

parse(value)

Parse a string expression into a Tag object.

Parameters: value – The tag expression to parse (a string).
Returns: A Tag object.
Raises: ValueError for unsupported value types.

During normal use you won't need the parse() method; in fact, it's not currently used anywhere in gentag. This method was originally created with the idea of having define() parse string expressions up front to validate their syntax, however this approach has since been abandoned. The parse() method now remains because it may be useful to callers for unforeseen use cases.

sorted(objects)

Sort the given objects in a human friendly way.

Parameters: objects – The objects to sort (an iterable).
Returns: The sorted objects (a list).

If all of the objects are strings they are sorted using natural order sorting, otherwise the sorted() function is used.
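Natural order sorting compares runs of digits by numeric value, so 'web2' sorts before 'web10'. The following is a hypothetical sketch of that behavior (natural_sort_key and human_sorted are illustrative names, and gentag may rely on a different helper internally):

```python
import re


def natural_sort_key(text):
    # re.split() with a capturing group alternates text and digit runs, so
    # odd positions always hold digits; compare those as integers.
    parts = re.split(r'(\d+)', text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


def human_sorted(objects):
    objects = list(objects)
    if all(isinstance(obj, str) for obj in objects):
        return sorted(objects, key=natural_sort_key)
    # Mixed or non-string objects fall back to the default ordering.
    return sorted(objects)


print(human_sorted(['web10', 'web2', 'web1']))  # ['web1', 'web2', 'web10']
print(human_sorted([3, 1, 2]))                  # [1, 2, 3]
```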

tags

A mapping of tag names to Tag objects (a TagFactory instance).

Note

The tags property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

class gentag.Tag(**kw)

A Tag represents a set of objects with a common name.

There are three kinds of tags:

Simple tags:
When you set objects, the tag becomes a ‘simple tag’ that associates the name of the tag with the given objects.
Composite tags:
When you set expression, the tag becomes a ‘composite tag’ that associates the name of the tag with an expression that selects a subset of tagged objects.
The special default tag:
When identifier is set to DEFAULT_TAG_NAME, the value of objects is a set that contains all tagged objects.

__and__(other)

Use compose() to create a Tag that gives the intersection of two Tag objects.

__iter__()

Iterate over the matching objects.

__or__(other)

Use compose() to create a Tag that gives the union of two tags.

__sub__(other)

Use compose() to create a Tag that gives the difference of two tags.

__xor__(other)

Use compose() to create a Tag that gives the symmetric difference of two tags.

compose(operator, other)

Create a composite tag.

Parameters:
  • operator – The operator used to compose the tags (a string).
  • other – The other Tag object.
Returns:

A new Tag object or NotImplemented (if other isn’t a Tag object).

The compose() method is a helper for __and__(), __or__(), __sub__() and __xor__() that generates an expression based on the id_or_expr values of the two Tag objects.
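The following hypothetical sketch shows how operator overloading can build composite expressions from id_or_expr values. MiniTag is an illustrative name, and wrapping anonymous expressions in parentheses is an assumption of this sketch (it keeps operator precedence intact when composites are composed further); gentag's real Tag class also tracks objects and its scope:

```python
class MiniTag(object):
    """Hypothetical tag that only tracks the expression text it represents."""

    def __init__(self, identifier=None, expression=None):
        self.identifier = identifier
        self.expression = expression

    @property
    def id_or_expr(self):
        # Named tags contribute their identifier. Anonymous composite tags
        # contribute their expression in parentheses so that precedence is
        # preserved when they are composed further (an assumption here).
        if self.identifier is not None:
            return self.identifier
        return '(%s)' % self.expression

    def compose(self, operator, other):
        if not isinstance(other, MiniTag):
            # Lets Python try the reflected operation and then raise TypeError.
            return NotImplemented
        expression = '%s %s %s' % (self.id_or_expr, operator, other.id_or_expr)
        return MiniTag(expression=expression)

    def __and__(self, other):
        return self.compose('&', other)

    def __or__(self, other):
        return self.compose('|', other)

    def __sub__(self, other):
        return self.compose('-', other)

    def __xor__(self, other):
        return self.compose('^', other)


archiving = MiniTag('archiving')
compression = MiniTag('compression')
encryption = MiniTag('encryption')
print(((archiving | encryption) & compression).expression)
# (archiving | encryption) & compression
```

Because compose() returns NotImplemented for non-tag operands, an expression such as archiving & 42 ends in a TypeError rather than a silently wrong result.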

expression

A Python expression to select matching objects (a string or None).

Note

The expression property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

id_or_expr

The identifier (if set) or expression (a string).

The value of id_or_expr is used by compose() to generate expression values for composite Tag objects.

Note

The id_or_expr property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

identifier

An identifier based on name (a string or None).

Note

The identifier property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

name

A user defined name for the tag (a string or None).

Tags created using define() always have name set, but tags composed using Python expression syntax are created without a name.

Note

The name property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

objects

The values associated with the tag (a set).

If objects isn't set, it defaults to a computed value:

  • If identifier is DEFAULT_TAG_NAME then get_all_objects() is used to get the associated values.
  • If expression is set it will be evaluated and the matching objects will be returned.
  • Otherwise a new, empty set is created, bound to the Tag and returned.

Note

The objects property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

scope

The Scope in which the tag has been defined.

Note

The scope property is a custom_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named scope (unless a custom constructor is defined; in that case, please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

class gentag.TagFactory(**kw)

A mapping of tag names to Tag objects.

The names of tags are normalized using generate_id().

__getitem__(name)

Get or create a tag.

Parameters: name – The name of the tag (a string).
Returns: A Tag object.

__iter__()

Iterate over the defined Tag objects.

map

A dictionary with tags created by this TagFactory.

Note

The map property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

scope

The Scope that’s using this TagFactory.

Note

The scope property is a custom_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named scope (unless a custom constructor is defined; in that case, please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

gentag.generate_id(value, normalized)

Generate a Python identifier from a user provided string.

Parameters:
  • value – The user provided string.
  • normalized – True to normalize the identifier to its canonical form without underscores, False to preserve some readability.
Returns:

The generated identifier (a string).

Raises:

ValueError when nothing remains of value after normalization.

If you just want a Python identifier from a user defined string you can use normalized=False:

>>> generate_id('Any user-defined string', normalized=False)
'any_user_defined_string'

However, if you want to use the identifier for comparison or as a key in a dictionary, then it's better to use normalized=True:

>>> generate_id('Any user-defined string', normalized=True)
'anyuserdefinedstring'

The following example shows that values that would otherwise start with a digit are prefixed with an underscore, because Python identifiers cannot start with a digit:

>>> generate_id('42', normalized=True)
'_42'
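A rough sketch of how such a function could work is shown below. The name generate_id_sketch is hypothetical and the regular expression is an assumption chosen only to reproduce the three documented examples; gentag's actual rules (for instance for non-ASCII letters, which this sketch simply drops) may differ:

```python
import re


def generate_id_sketch(value, normalized):
    # Lowercase, then collapse each run of characters that aren't ASCII
    # letters or digits into one underscore (non-ASCII letters are dropped).
    identifier = re.sub(r'[^a-z0-9]+', '_', value.lower()).strip('_')
    if normalized:
        # The canonical form drops the underscores entirely.
        identifier = identifier.replace('_', '')
    if not identifier:
        raise ValueError("Nothing remains of %r after normalization!" % value)
    if identifier[0].isdigit():
        # Python identifiers can't start with a digit.
        identifier = '_' + identifier
    return identifier


print(generate_id_sketch('Any user-defined string', normalized=False))  # any_user_defined_string
print(generate_id_sketch('Any user-defined string', normalized=True))   # anyuserdefinedstring
print(generate_id_sketch('42', normalized=True))                        # _42
```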

gentag.exceptions

Custom exceptions raised by gentag.

exception gentag.exceptions.GenTagError

Base class for custom exceptions.

exception gentag.exceptions.EmptyTagError

Raised by __getitem__() when an empty tag is encountered during evaluation.

exception gentag.exceptions.TagExpressionError

Raised by evaluate_raw() when a string expression contains syntax errors.

gentag.tests

The test suite for gentag.

The online documentation includes the full source code of the test suite because it provides examples of how to use the gentag module.

class gentag.tests.GenTagTestCase(*args, **kw)

A unittest compatible container for the gentag test suite.

setUp()

Define a new Scope for every test method.

test_add_object()

Test the add_object() method.

test_default_tag()

Test that the default tag matches all tagged objects.

test_define_expression()

Test the definition of a composite tag.

test_define_unsupported()

Test the definition of a tag with an unsupported value type.

test_difference_expression()

Get the difference of two tags (using a Python expression).

test_difference_string()

Get the difference of two tags (using a string expression).

test_empty_tag()

Test which exception is raised when a tag has no associated objects.

test_generate_id()

Test the generate_id() function.

test_intersection_expression()

Get the intersection of two tags (using a Python expression).

test_intersection_string()

Get the intersection of two tags (using a string expression).

test_iterable()

Test that tags can be iterated to get the matching objects.

test_not_implemented()

Check that composition with other types is forbidden.

test_parentheses()

Check that parentheses are used appropriately.

test_parse_expression()

Test the parsing of expressions.

test_parse_invalid()

Test which exception is raised on invalid types by parse().

test_sorting()

Test natural order sorting of string objects.

test_symmetric_difference_expression()

Get the symmetric difference of two tags (using a Python expression).

test_symmetric_difference_string()

Get the symmetric difference of two tags (using a string expression).

test_syntax_error()

Test which exception is raised on syntax errors.

test_union_expression()

Get the union of two tags (using a Python expression).

test_union_string()

Get the union of two tags (using a string expression).