gentag: Simple and powerful tagging for Python objects¶
The Python package gentag provides simple and powerful tagging for arbitrary Python objects. After defining your tags and associated objects you can query for the difference, intersection and union of tags to select specific objects. The package is currently tested on CPython 2.6, 2.7, 3.4, 3.5, 3.6 and PyPy (2.7).
Status¶
While the ideas behind gentag have been floating around in my head since 2012, I didn’t publish them as a standalone Python package until 2018, which is why the initial version is labeled a beta. Looking ahead:
- It may be that the current version serves my needs fine and at some point I decide to replace the ‘beta’ label with a ‘stable’ label without making any substantive changes.
- Releasing gentag is one step towards releasing another Python package that I’ve been thinking about for a very long time. If I have trouble integrating gentag into that package I won’t hesitate to make (potentially major) changes to gentag.
Installation¶
The gentag package is available on PyPI which means installation should be as simple as:
$ pip install gentag
There’s actually a multitude of ways to install Python packages (e.g. the per user site-packages directory, virtual environments or just installing system wide) and I have no intention of getting into that discussion here, so if this intimidates you then read up on your options before returning to these instructions ;-).
Usage¶
The following sections give an overview of how to get started. For more details about the Python API please refer to the API documentation available on Read the Docs.
Creating a scope¶
To get started you have to create a Scope object:
>>> from gentag import Scope
>>> tags = Scope()
The purpose of Scope objects is to group together related tags into an evaluation context for tag expressions.
Defining tags¶
Scope instances allow you to define tags and associated objects:
>>> tags.define('archiving', ['deb', 'tar', 'zip'])
>>> tags.define('compression', ['bzip2', 'deb', 'gzip', 'lzma', 'zip'])
>>> tags.define('encryption', ['gpg', 'luks', 'zip'])
Querying tags¶
Once you’ve defined some tags and associated objects you can query them. For example, here we query for the union of two tags:
>>> tags.evaluate('archiving | encryption')
['deb', 'gpg', 'luks', 'tar', 'zip']
These tag expressions can get arbitrarily complex:
>>> tags.evaluate('(archiving | encryption) & compression')
['deb', 'zip']
Supported operators¶
The following operators can be used to compose tags:
Operator | Set operation |
---|---|
& | intersection |
\| | union |
- | difference |
^ | symmetric difference |
These operators create new Tag objects that can be composed further. Although tags composed at runtime in Python syntax don’t have a name, it is possible to define named composite tags using the Scope.define() method (see below).
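The four operators carry the same meaning as on the built-in set type, so their semantics can be checked without gentag installed. The sketch below uses plain sets named after the tags from the examples above; it illustrates the set operations only and is not the gentag API:

```python
# Plain sets standing in for the three tags defined earlier.
archiving = {'deb', 'tar', 'zip'}
compression = {'bzip2', 'deb', 'gzip', 'lzma', 'zip'}
encryption = {'gpg', 'luks', 'zip'}

intersection = sorted(archiving & compression)         # members of both sets
union = sorted(archiving | encryption)                 # members of either set
difference = sorted(archiving - compression)           # members of the first set only
symmetric_difference = sorted(archiving ^ encryption)  # members of exactly one set

print(intersection)
print(union)
print(difference)
print(symmetric_difference)
```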
The default tag¶
There’s one special tag that is always available under the name ‘all’. As you might have guessed it provides access to a set with all tagged objects:
>>> tags.evaluate('all')
['bzip2', 'deb', 'gpg', 'gzip', 'luks', 'lzma', 'tar', 'zip']
This can be useful to select all objects except those associated with a specific tag:
>>> tags.evaluate('all - encryption')
['bzip2', 'deb', 'gzip', 'lzma', 'tar']
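Conceptually the default tag is the union of every simple tag. A minimal sketch with plain sets, assuming a dictionary named simple_tags as a hypothetical stand-in for a scope:

```python
# Hypothetical stand-in for the simple tags defined in a scope.
simple_tags = {
    'archiving': {'deb', 'tar', 'zip'},
    'compression': {'bzip2', 'deb', 'gzip', 'lzma', 'zip'},
    'encryption': {'gpg', 'luks', 'zip'},
}

# The union of all tagged objects, the counterpart of the 'all' tag.
all_objects = set().union(*simple_tags.values())
without_encryption = all_objects - simple_tags['encryption']

print(sorted(all_objects))
print(sorted(without_encryption))
```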
Named composite tags¶
The expressions shown in the querying tags section above demonstrate that tags can be composed using set operators. You can also define a named tag based on an expression:
>>> tags.define('flexible', 'archiving & compression & encryption')
Such named composite tags can be evaluated like regular tags:
>>> tags.evaluate('flexible')
['zip']
You can also nest composite tags inside other composite tags.
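Nesting works naturally when composite tags are evaluated on demand: resolving one name may trigger the evaluation of another expression. The following self-contained sketch shows that idea with a hypothetical TagResolver class and a hypothetical tag named 'rigid'; it is an illustration of the technique and makes no claim about how gentag implements it:

```python
from collections.abc import Mapping


class TagResolver(Mapping):
    """Resolve tag names to sets, evaluating composite definitions on demand."""

    def __init__(self, definitions):
        self.definitions = definitions

    def __getitem__(self, name):
        value = self.definitions[name]  # raises KeyError for unknown names
        if isinstance(value, str):
            # Composite tag: evaluate the expression with this resolver as the
            # namespace, so nested composite tags are resolved recursively.
            return eval(value, {'__builtins__': {}}, self)
        return set(value)

    def __iter__(self):
        return iter(self.definitions)

    def __len__(self):
        return len(self.definitions)


definitions = {
    'archiving': ['deb', 'tar', 'zip'],
    'compression': ['bzip2', 'deb', 'gzip', 'lzma', 'zip'],
    'encryption': ['gpg', 'luks', 'zip'],
    'flexible': 'archiving & compression & encryption',
    # A composite tag that refers to another composite tag.
    'rigid': 'archiving - flexible',
}
resolver = TagResolver(definitions)
print(sorted(resolver['flexible']))
print(sorted(resolver['rigid']))
```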
History¶
The example in the usage section isn’t actually very useful. This is partly because I didn’t want a complicated subject matter to distract readers from the usage instructions :-).
The actual use case that triggered the ideas behind gentag presented itself to me in 2012 when I wanted to query a database of more than 200 Linux server names categorized by aspects such as:
- The distributor id (a string like ‘debian’ or ‘ubuntu’).
- The distribution codename (a string like ‘trusty’ or ‘xenial’).
- The server’s role (database, mailserver, webserver, etc).
- The server’s environment (production, development).
The easy selection of subsets of servers for my Python programs to operate on quickly evolved into my main interface for selecting groups of servers. Since then I’ve wanted to use similar functionality in other places, but found it too much work to develop one-off solutions. This is how gentag was born.
About the name¶
The name gentag stands for “generative tags”, because the package allows new tags to be composed (generated) from existing tags. I’d like to thank my colleague Seán Murphy for coming up with this name :-).
Contact¶
The latest version of gentag is available on PyPI and GitHub. The documentation is hosted on Read the Docs. For bug reports please create an issue on GitHub. If you have questions, suggestions, etc. feel free to send me an e-mail at peter@peterodding.com.
API documentation¶
The following documentation is based on the source code of version 2.0 of the gentag package:
gentag¶
Simple and powerful tagging for Python objects.
class gentag.ObjectFactory(**kw)¶
  A mapping of tag names to set objects.
  This class is used by evaluate() during expression parsing to resolve tag names to the associated objects.
  __getitem__(name)¶
    Get the objects associated to the given tag.
    Parameters: name – The name of the tag (a string).
    Returns: A set of objects associated to the tag.
    Raises: EmptyTagError when no associated objects are available.
  tags¶
    The TagFactory from which objects are retrieved.
    Note: The tags property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named tags (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.
class gentag.Scope(**kw)¶
  To use gentag everything starts with a Scope object.
  A Scope object groups together related Tag objects and provides methods to define new tags and evaluate tag expressions.
  add_object(value, *tags)¶
    Add an object to the scope.
    Parameters:
    - value – The object to add (any hashable value).
    - tags – The names of tags to associate the object with.
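One way to picture add_object() is as the inverse of define(): instead of listing the objects for a tag, you list the tags for an object. The sketch below models that with a defaultdict. The add_object function and the objects_by_tag mapping are hypothetical stand-ins written for illustration, not the method on Scope:

```python
from collections import defaultdict

# Hypothetical stand-in for the tag storage of a scope.
objects_by_tag = defaultdict(set)


def add_object(value, *tags):
    """Associate a hashable value with every given tag name."""
    for tag in tags:
        objects_by_tag[tag].add(value)


add_object('zip', 'archiving', 'compression', 'encryption')
add_object('tar', 'archiving')
print(sorted(objects_by_tag['archiving']))
print(sorted(objects_by_tag['encryption']))
```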
  define(name, value)¶
    Define the value of a tag.
    Parameters:
    - name – The name of the tag (a string).
    - value – A string containing an expression or an iterable of values associated to the given tag.
    Returns: The Tag object.
    Raises: ValueError for unsupported value types.
  evaluate(expression)¶
    Get the objects matching the given expression.
    Parameters: expression – The tag expression to evaluate (a string).
    Returns: A sorted list with matching objects.
    Raises: TagExpressionError when the given expression cannot be evaluated due to a syntax error.
    This method is a wrapper for evaluate_raw() that calls sorted() on the matching objects before returning them.
  evaluate_raw(expression)¶
    Get the objects matching the given expression.
    Parameters: expression – The tag expression to evaluate (a string).
    Returns: A set with matching objects.
    Raises: TagExpressionError when the given expression cannot be evaluated due to a syntax error.
    This method uses eval() to evaluate the expression given by the caller, however it overrides __builtins__ to avoid leaking any built-ins into the eval() call.
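The effect of overriding __builtins__ can be seen in isolation. This is a minimal sketch of the general technique, assuming an empty dictionary as the replacement (the documentation does not say what gentag substitutes) and using a hypothetical helper named restricted_eval:

```python
def restricted_eval(expression, names):
    """Evaluate an expression that can only see the given names."""
    # An explicit '__builtins__' entry stops Python from inserting the real
    # built-ins module into the globals of the eval() call.
    return eval(expression, {'__builtins__': {}}, dict(names))


names = {
    'archiving': {'deb', 'tar', 'zip'},
    'encryption': {'gpg', 'luks', 'zip'},
}
print(sorted(restricted_eval('archiving & encryption', names)))

try:
    restricted_eval('len(archiving)', names)
except NameError as error:
    # Built-in functions such as len() are not visible inside the expression.
    print('blocked:', error)
```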
  get_all_objects()¶
    Get all objects defined in the scope.
    Returns: A set of user defined objects.
    This method iterates over the defined tags and collects all tagged objects. Because the evaluation of tags with an expression won’t change the result of get_all_objects() such tags are skipped for performance reasons.
  objects¶
    A mapping of tag names to set objects (an ObjectFactory instance).
  parse(value)¶
    Parse a string expression into a Tag object.
    Parameters: value – The tag expression to parse (a string).
    Returns: A Tag object.
    Raises: ValueError for unsupported value types.
    During normal use you won’t need the parse() method, in fact it’s not currently being used anywhere in gentag. This method was originally created with the idea of having define() parse string expressions up front to validate their syntax, however this approach has since been abandoned. The parse() method now remains because it may be useful to callers for unforeseen use cases.
  sorted(objects)¶
    Sort the given objects in a human friendly way.
    Parameters: objects – The objects to sort (an iterable).
    Returns: The sorted objects (a list).
    If all of the objects are strings they are sorted using natural order sorting, otherwise the sorted() function is used.
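The documentation does not specify which natural order algorithm is used, so the following is a sketch of one common approach: split each string into digit and non-digit runs and compare the digit runs as integers. The names natural_key and human_sorted are hypothetical:

```python
import re


def natural_key(text):
    """Split a string into text and number parts so 'x10' sorts after 'x2'."""
    # With a capturing group, re.split() alternates non-digit and digit runs,
    # so the items at odd indexes are always the digit runs.
    parts = re.split(r'(\d+)', text)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


def human_sorted(objects):
    """Use natural order for strings and plain sorted() for everything else."""
    objects = list(objects)
    if all(isinstance(obj, str) for obj in objects):
        return sorted(objects, key=natural_key)
    return sorted(objects)


print(human_sorted(['server10', 'server2', 'server1']))
print(human_sorted([3, 1, 2]))
```

This matters for the server name use case, where plain string sorting would place 'server10' before 'server2'.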
  tags¶
    A mapping of tag names to Tag objects (a TagFactory instance).
    Note: The tags property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.
class gentag.Tag(**kw)¶
  A Tag represents a set of objects with a common name.
  There are three kinds of tags:
  - Simple tags: When you set objects the tag becomes a ‘simple tag’ that associates the name of the tag to the given objects.
  - Composite tags: When you set expression the tag becomes a ‘composite tag’ that associates the name of the tag to an expression that selects a subset of tagged objects.
  - The special default tag: When identifier is set to DEFAULT_TAG_NAME the value of objects is a set that contains all tagged objects.
  __iter__()¶
    Iterate over the matching objects.
  compose(operator, other)¶
    Create a composite tag.
    Parameters:
    - operator – The operator used to compose the tags (a string).
    - other – The other Tag object.
    Returns: A new Tag object or NotImplemented (if other isn’t a Tag object).
    The compose() method is a helper for __and__(), __or__(), __sub__() and __xor__() that generates an expression based on the id_or_expr values of the two Tag objects.
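The pattern described for compose() can be sketched with a small stand-alone class. ExprTag is a hypothetical miniature written for this illustration, and the way it parenthesizes the generated expression is an assumption, not necessarily what gentag produces:

```python
class ExprTag:
    """Hypothetical miniature of a tag that composes expression strings."""

    def __init__(self, id_or_expr):
        self.id_or_expr = id_or_expr

    def compose(self, operator, other):
        if not isinstance(other, ExprTag):
            # Returning NotImplemented lets Python try the reflected
            # operation and raise TypeError if that fails as well.
            return NotImplemented
        expression = '(%s %s %s)' % (self.id_or_expr, operator, other.id_or_expr)
        return ExprTag(expression)

    def __and__(self, other):
        return self.compose('&', other)

    def __or__(self, other):
        return self.compose('|', other)

    def __sub__(self, other):
        return self.compose('-', other)

    def __xor__(self, other):
        return self.compose('^', other)


archiving = ExprTag('archiving')
compression = ExprTag('compression')
encryption = ExprTag('encryption')

composite = (archiving | encryption) & compression
print(composite.id_or_expr)
```

Because every operator returns a new object of the same class, composed tags can be composed further without any special cases.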
  expression¶
    A Python expression to select matching objects (a string or None).
    Note: The expression property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().
  id_or_expr¶
    The identifier (if set) or expression (a string).
    The value of id_or_expr is used by compose() to generate expression values for composite Tag objects.
    Note: The id_or_expr property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.
  identifier¶
    An identifier based on name (a string or None).
    Note: The identifier property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.
  name¶
    A user defined name for the tag (a string or None).
    Tags created using define() always have name set but tags composed using Python expression syntax are created without a name.
    Note: The name property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().
  objects¶
    The values associated to the tag (a set).
    If objects isn’t set it defaults to a computed value:
    - If identifier is DEFAULT_TAG_NAME then get_all_objects() is used to get the associated values.
    - If expression is set it will be evaluated and the matching objects will be returned.
    - Otherwise a new, empty set is created, bound to the Tag and returned.
    Note: The objects property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().
  scope¶
    The Scope in which the tag has been defined.
    Note: The scope property is a custom_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named scope (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.
class gentag.TagFactory(**kw)¶
  A mapping of tag names to Tag objects.
  The names of tags are normalized using generate_id().
  __getitem__(name)¶
    Get or create a tag.
    Parameters: name – The name of the tag (a string).
    Returns: A Tag object.
  map¶
    A dictionary with tags created by this TagFactory.
    Note: The map property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.
  scope¶
    The Scope that’s using this TagFactory.
    Note: The scope property is a custom_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named scope (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.
gentag.generate_id(value, normalized)¶
  Generate a Python identifier from a user provided string.
  Parameters:
  Returns: The generated identifier (a string).
  Raises: ValueError when nothing remains of value after normalization.
  If you just want a Python identifier from a user defined string you can use normalized=False:
  >>> generate_id('Any user-defined string', normalized=False)
  'any_user_defined_string'
  However if you want to use the identifier for comparison or as a key in a dictionary then it’s better to use normalized=True:
  >>> generate_id('Any user-defined string', normalized=True)
  'anyuserdefinedstring'
  The following example shows that values that would otherwise start with a digit are prefixed with an underscore, because Python identifiers cannot start with a digit:
  >>> generate_id('42', normalized=True)
  '_42'
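The behavior documented above can be approximated in a few lines. The function sketch_generate_id below is a hypothetical re-implementation that reproduces the three documented examples and the documented ValueError; the regular expression and the order of the steps are assumptions, and the real function may treat other inputs differently:

```python
import re


def sketch_generate_id(value, normalized):
    """Approximate the documented behavior of generate_id()."""
    # Lowercase the input and collapse every run of characters other than
    # ASCII letters and digits into a single underscore.
    identifier = re.sub(r'[^a-z0-9]+', '_', value.lower()).strip('_')
    if normalized:
        # Drop the separators so differently punctuated names compare equal.
        identifier = identifier.replace('_', '')
    if not identifier:
        raise ValueError('Nothing remains of %r after normalization!' % value)
    if identifier[0].isdigit():
        # Python identifiers cannot start with a digit.
        identifier = '_' + identifier
    return identifier


print(sketch_generate_id('Any user-defined string', normalized=False))
print(sketch_generate_id('Any user-defined string', normalized=True))
print(sketch_generate_id('42', normalized=True))
```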
gentag.exceptions¶
Custom exceptions raised by gentag.
exception gentag.exceptions.GenTagError¶
  Base class for custom exceptions.
exception gentag.exceptions.EmptyTagError¶
  Raised by __getitem__() when an empty tag is encountered during evaluation.
exception gentag.exceptions.TagExpressionError¶
  Raised by evaluate_raw() when a string expression contains syntax errors.
gentag.tests¶
The test suite for gentag.
The online documentation includes the full source code of the test suite because it provides examples of how to use the gentag module.
class gentag.tests.GenTagTestCase(*args, **kw)¶
  A unittest compatible container for the gentag test suite.
  test_add_object()¶
    Test the add_object() method.
  test_default_tag()¶
    Test that the default tag matches all tagged objects.
  test_define_expression()¶
    Test the definition of a composite tag.
  test_define_unsupported()¶
    Test the definition of a tag with an unsupported value type.
  test_difference_expression()¶
    Get the difference of two tags (using a Python expression).
  test_difference_string()¶
    Get the difference of two tags (using a string expression).
  test_empty_tag()¶
    Test which exception is raised when a tag has no associated objects.
  test_generate_id()¶
    Test the generate_id() function.
  test_intersection_expression()¶
    Get the intersection of two tags (using a Python expression).
  test_intersection_string()¶
    Get the intersection of two tags (using a string expression).
  test_iterable()¶
    Test that tags can be iterated to get the matching objects.
  test_not_implemented()¶
    Check that composition with other types is forbidden.
  test_parentheses()¶
    Check that parentheses are used appropriately.
  test_parse_expression()¶
    Test the parsing of expressions.
  test_sorting()¶
    Test natural order sorting of string objects.
  test_symmetric_difference_expression()¶
    Get the symmetric difference of two tags (using a Python expression).
  test_symmetric_difference_string()¶
    Get the symmetric difference of two tags (using a string expression).
  test_syntax_error()¶
    Test which exception is raised on syntax errors.
  test_union_expression()¶
    Get the union of two tags (using a Python expression).
  test_union_string()¶
    Get the union of two tags (using a string expression).