Introduction to boolean_parser

boolean_parser is a Python package for parsing strings that contain conditional expressions combined with boolean logic operators. It uses the pyparsing Python package for all syntax parsing grammar definitions.

Parsing a String Condition

The boolean_parser.parse() function is a convenience function available to start parsing strings right away using the built-in Parser classes.

>>> # import the parse function
>>> from boolean_parser import parse

>>> # parse a string condition
>>> res = parse('x > 3')
>>> res
x>3

The parsed result is a boolean_parser.actions.clause.Condition object containing extracted information about the condition. The data attribute contains a dictionary of all relevant information, while the parameter name, operator, and value are available as attributes on the Condition object.

>>> # show the extracted data
>>> res.data
{'parameter': 'x', 'operator': '>', 'value': '3'}

>>> print(res.name, res.operator, res.value)
x > 3

boolean_parser knows that the string “x > 3” is a condition and parses it correctly because of how the conditional “clause” is defined using pyparsing grammar constructors. See Clause Elements for more information about the available clauses in boolean_parser.
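Once a condition has been reduced to its data dictionary, it is straightforward to act on it in plain Python. The following is a minimal sketch, not part of boolean_parser: the `OPERATORS` table and the `evaluate` helper are hypothetical names introduced here, and the dictionary is copied from the output shown above rather than produced by a live parse.

```python
import operator

# Map the operator strings found in a condition's data to Python functions.
# This table is an assumption made for the example, not part of boolean_parser.
OPERATORS = {
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
    '=': operator.eq,
    '==': operator.eq,
    '!=': operator.ne,
}


def evaluate(data, record):
    """Evaluate a condition's data dictionary against a plain dict of values.

    The parsed value is a string, so it is cast to the type of the
    record's value before the comparison is made.
    """
    left = record[data['parameter']]
    right = type(left)(data['value'])
    return OPERATORS[data['operator']](left, right)


# the dictionary shown above for the condition 'x > 3'
data = {'parameter': 'x', 'operator': '>', 'value': '3'}
print(evaluate(data, {'x': 5}))
print(evaluate(data, {'x': 1}))
```

The cast is needed because the value in the data dictionary is the string '3', not the integer 3.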

Boolean Joins

To combine conditional expressions, join them using the and, or, or not boolean operators within the string. When boolean operators are used, the parsed result is a nested set of Boolean Condition classes (BoolAnd, BoolOr, BoolNot) which contain the Conditions and preserve the order of the condition hierarchy.

>>> example = 'x > 3 and y <= 2 or not z != 5'
>>> res = parse(example)
>>> res
or_(and_(x>3, y<=2), not_(z!=5))

Here the outermost operation is an OR that joins an AND condition and a NOT condition. You can use parentheses to control the condition hierarchy. Without parentheses, the default operator precedence applies: not is evaluated first, then and, then or.

>>> # without parentheses
>>> parse('x > 3 and y <= 2 or not z != 5')
or_(and_(x>3, y<=2), not_(z!=5))

>>> # with parentheses
>>> parse('x > 3 and (y <= 2 or not z != 5)')
and_(x>3, or_(y<=2, not_(z!=5)))
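To see why the two strings group differently, it can help to look at another parser with the same rules. Python's own grammar ranks not above and above or, the same ordering this document gives for the base Parser's boolean precedence, so the standard ast module can stand in for illustration. This is a sketch only: it does not use boolean_parser, the `nesting` helper is a hypothetical name, and it needs Python 3.9 or later for ast.unparse.

```python
import ast


def nesting(expression):
    """Render how Python groups a boolean expression, using the same
    or_/and_/not_ notation as the parsed results above."""
    def render(node):
        if isinstance(node, ast.BoolOp):
            name = 'and_' if isinstance(node.op, ast.And) else 'or_'
            return f"{name}({', '.join(render(v) for v in node.values)})"
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return f'not_({render(node.operand)})'
        # anything else is treated as a leaf condition; drop the spaces
        return ast.unparse(node).replace(' ', '')
    return render(ast.parse(expression, mode='eval').body)


print(nesting('x > 3 and y <= 2 or not z != 5'))
print(nesting('x > 3 and (y <= 2 or not z != 5)'))
```

The two printed lines match the two parsed results shown above.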

Boolean Condition objects expose three attributes: params, conditions, and logicop. params contains a list of all parameters within the object. conditions contains a list of all conditions within the object. logicop indicates the current boolean operator.

>>> # parse a boolean expression
>>> example = 'x > 3 and y <= 2 or not z != 5'
>>> res = parse(example)
>>> res
or_(and_(x>3, y<=2), not_(z!=5))

>>> # show the logic operator
>>> res.logicop
'or'

>>> # list the parameters within boolean OR
>>> res.params
['y', 'x', 'z']

>>> # list the conditions within the boolean OR
>>> res.conditions
[and_(x>3, y<=2), not_(z!=5)]

You can drill down through the conditions until you get to the underlying conditions.

>>> # access the first "boolean and" condition
>>> booland = res.conditions[0]
>>> booland
and_(x>3, y<=2)

>>> # show the conditions inside the boolean and
>>> booland.conditions
[x>3, y<=2]

>>> # access the first "x > 3" condition and print data
>>> xcond = booland.conditions[0]
>>> xcond.data
{'parameter': 'x', 'operator': '>', 'value': '3'}
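The manual drill-down above can be generalised with a short recursive walk. This is a sketch under stated assumptions: `leaf_conditions` is a hypothetical helper, it assumes every boolean node (including not_) exposes a `conditions` list as described above, and it treats any object without that attribute as a leaf. Stand-in objects are used so the example runs without boolean_parser installed.

```python
from types import SimpleNamespace


def leaf_conditions(node):
    """Recursively collect the innermost conditions of a parsed result.

    Assumes boolean nodes expose a `conditions` list and that leaf
    conditions do not; that second point is an assumption of this sketch.
    """
    children = getattr(node, 'conditions', None)
    if children is None:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaf_conditions(child))
    return leaves


# Stand-in objects that mimic or_(and_(x>3, y<=2), not_(z!=5)).
# With boolean_parser installed, the result of parse(...) would be passed instead.
tree = SimpleNamespace(logicop='or', conditions=[
    SimpleNamespace(logicop='and', conditions=['x>3', 'y<=2']),
    SimpleNamespace(logicop='not', conditions=['z!=5']),
])
print(leaf_conditions(tree))
```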

Using a Parser

The boolean_parser.parse() convenience function is essentially a wrapper around the Parser classes in boolean_parser.parsers. The base parser is boolean_parser.parsers.base.Parser. The default parser used by the parse convenience function is the boolean_parser.parsers.sqla.SQLAParser. To change the parser used by the function, set the base keyword argument.

>>> # use the default SQLAlchemy Parser
>>> res = parse('x > 1')

>>> # use the core base Parser
>>> res = parse('x > 1', base='base')

You can also interact with the Parser class directly.

>>> from boolean_parser.parsers import Parser
>>> pp = Parser('x > 1')

To parse the input string expression, use the parse method, which behaves exactly like the parse convenience function.

>>> # parse the expression
>>> res = pp.parse()
>>> res
x>1

The original input string expression, as well as the extracted parameters and conditions, are accessible via the original_input, params, and conditions attributes, respectively.

>>> pp.original_input
'x > 1'

>>> pp.params
x

>>> pp.conditions
x>1

Parsing SQLAlchemy Filters

The SQLAParser class for SQLAlchemy provides an additional filter function that converts a parsed boolean string into a SQLAlchemy filter condition useable in SQLAlchemy queries. Otherwise it behaves exactly the same as the core Parser.

Note

The following is a toy example. All references to module “database.models”, “datamodel.session”, and class “TableModel”, etc are to be replaced by your own code containing your database code and SQLAlchemy models and connections. These are not importable in the boolean_parser package.

Suppose we have a database with a table “table” and columns “x” and “y”. The SQLAlchemy database session is defined in a database module, along with our SQLAlchemy ORM models, including a “TableModel”, defined in a database.models module. We want to parse the string expression “table.x > 5 and table.y < 2” and use it as a filter in a SQLAlchemy query. First we import our database session, ORM TableModel, and the parse function, and parse the string expression using boolean_parser.

>>> # import our database session and Model Class
>>> from database.models import TableModel
>>> from database import session
>>>
>>> # import the parser
>>> from boolean_parser import parse

>>> # create the parser and parse a sql condition
>>> res = parse('table.x > 5 and table.y < 2')
>>> res
and_(x>5, y<2)

Attached to our parsed results is a SQLAMixin.filter method which accepts a SQLAlchemy ORM model, or a list of models, as input. It then traverses the parsed result, converting boolean operations, parameter names, and conditional expressions into the appropriate SQLAlchemy syntax. The returned object is a SQLAlchemy object, usually a BooleanClauseList or a BinaryExpression, both of which represent SQLAlchemy filter clause elements.

>>> # generate the sqlalchemy filter
>>> ff = res.filter(TableModel)
>>> type(ff)
sqlalchemy.sql.elements.BooleanClauseList

>>> # display the SQLAlchemy filter
>>> print(ff.compile(compile_kwargs={'literal_binds': True}))
table.x > 5 AND table.y < 2

You can pass the filter expression directly into the SQLAlchemy filter method during a query.

>>> # perform the sqlalchemy query
>>> session.query(TableModel).filter(ff).all()

SQLAParser supports aliased SQLAlchemy models as well.

>>> from sqlalchemy.orm import aliased
>>> from database.models import TableModel

>>> # create a new model aliased from TableModel
>>> new_table = aliased(TableModel, name='newtable')

>>> res = parse('table.x > 5 and newtable.y < 2')
>>> ff = res.filter([TableModel, new_table])
>>> print(ff.compile(compile_kwargs={'literal_binds': True}))
table.x > 5 AND newtable.y < 2

Supported Operand Syntax

The following table shows the currently supported string operand syntax and its SQLAlchemy operand expression.

Operand Syntax

=============  ========================  ==============================  ================================
String Syntax  Descriptive               Example                         SQLA Equivalent
=============  ========================  ==============================  ================================
<              less than                 table.x < 5                     table.x < 5
<=             less than or equal to     table.x <= 5                    table.x <= 5
>              greater than              table.x > 5                     table.x > 5
>=             greater than or equal to  table.x >= 5                    table.x >= 5
=              equal                     table.x = 5                     table.x = 5
==             strict equal              table.x == 5                    table.x = 5
!=             not equal to              table.x != 5                    table.x != 5
==             strict equal              table.x == Bear                 table.x = Bear
=              ilike                     table.x = Bear                  table.x ilike '%Bear%'
=              ilike                     table.x = '*Bear'               table.x ilike '%Bear'
=              ilike                     table.x = 'Bear*'               table.x ilike 'Bear%'
=              is null                   table.x = Null                  table.x.is_(None)
!=             is not null               table.x != Null                 table.x.is_not(None)
between        between A and B           table.x between 1 and 10        table.x between 1 and 10
&              bitwise & (and)           table.x & 5                     table.x.op(&)(5) > 0
|              bitwise | (or)            table.x | 5                     table.x.op(|)(5) > 0
=              is a boolean              table.x = True                  table.x = true
=              is a date object          table.x = 2020-01-01            table.x = "2020-01-01"
==             is a datetime object      table.x == 2020-01-01T00:00:00  table.x == "2020-01-01 00:00:00"
=============  ========================  ==============================  ================================
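The three ilike rows follow a simple rule: a bare string is matched anywhere in the column, while a “*” wildcard pins the match to the start or end. The sketch below reproduces that mapping in plain Python. It illustrates the table only and is not how boolean_parser implements it; `to_ilike_pattern` is a hypothetical helper, and replacing every “*” (not just a leading or trailing one) is an assumption.

```python
def to_ilike_pattern(value):
    """Translate a string value into an ILIKE pattern following the table:
    every '*' becomes '%', and a value with no wildcard is wrapped in '%'
    on both sides."""
    if '*' in value:
        return value.replace('*', '%')
    return f'%{value}%'


for value in ('Bear', '*Bear', 'Bear*'):
    print(value, '->', to_ilike_pattern(value))
```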

Building a Custom Parser

A custom parser can be built by passing in a list of pyparsing clause elements, and optional clause actions, into the build_parser class method of the base Parser class. Let’s look at an example of how to build a parser to parse a simple street name. We’ll break this example down in the following subsections.

>>> # import the base Parser class
>>> from boolean_parser.parsers import Parser

>>> # define the address clause element with pyparsing grammar constructors
>>> import pyparsing as pp
>>> snumber = pp.Word(pp.nums).setResultsName('street_number')
>>> sname = pp.Word(pp.alphas).setResultsName('street_name')
>>> stype = pp.oneOf(['street', 'avenue', 'circle']).setResultsName('street_type')
>>> street = pp.Group(snumber + sname + stype).setResultsName('street')

>>> # rebuild the Parser with the new street clause
>>> Parser.build_parser(clauses=[street])

>>> parser = Parser()
>>> res = parser.parse('2525 redwood street')
>>> res.asDict()
{'street_number': '2525', 'street_name': 'redwood', 'street_type': 'street'}

Clause Elements

Clause elements are defined using pyparsing grammar constructors. Clause elements are best defined from the bottom up, starting with the simplest structures and grouping them together. Street names can be broken down into the syntax “street_number street_name street_type”. Let’s define each component and build our “street” clause element.

>>> import pyparsing as pp

>>> # define a street number as a "word" of any combination of digits 0-9
>>> snumber = pp.Word(pp.nums).setResultsName('street_number')

>>> # define the street name as a "word" of any combination of alphabet characters
>>> sname = pp.Word(pp.alphas).setResultsName('street_name')

>>> # define the type of street as one option in a set of choices
>>> stype = pp.oneOf(['street', 'avenue', 'circle']).setResultsName('street_type')

>>> # group the components together into a final street clause element
>>> street = pp.Group(snumber + sname + stype).setResultsName('street')

We use pyparsing.ParserElement.setResultsName() to assign a label to each component. This helps break up complex clauses into easily identifiable components, and allows us to use the pyparsing.ParseResults.asDict() method to create a dictionary of named parameters. The pyparsing.ParserElement.parseString() method is the recommended way of parsing a string.

>>> street.parseString('2 blue avenue').asDict()
{'street': {'street_number': '2',
 'street_name': 'blue',
 'street_type': 'avenue'}}

Build the new Parser

Now that we have a clause defined, we can use the build_parser class method on Parser to construct a new parser class capable of parsing street names. We pass in the street clause as a list input to the clauses keyword argument.

>>> # rebuild the Parser with the new street clause
>>> Parser.build_parser(clauses=[street])

Now we instantiate our new parser and call parse on any input “street” string.

>>> parser = Parser()
>>> res = parser.parse('2525 redwood street')
>>> res.asDict()
{'street_number': '2525', 'street_name': 'redwood', 'street_type': 'street'}

>>> res = parser.parse('2 blue avenue')
>>> res.asDict()
{'street_number': '2', 'street_name': 'blue', 'street_type': 'avenue'}

Clause Precedence

The clauses input to build_parser can be a list of any number of constructed clause elements. These clauses are combined into a single clause using pyparsing.MatchFirst(), which combines clauses with “ORS”, i.e. “clause1 | clause2 | clause3”. MatchFirst tries the clauses in the order given and returns the result of the first one that matches the input string.

>>> from boolean_parser.parsers import Parser
>>> from boolean_parser.clauses import words

>>> # build a parser with the street and built-in words clauses
>>> Parser.build_parser(clauses=[street, words])
>>> parser = Parser()

>>> # the first parser match is a street
>>> parser.parse('2 blue avenue hammer')
(['2', 'blue', 'avenue'], {'street_number': ['2'], 'street_name': ['blue'], 'street_type': ['avenue']})

>>> # the first parser match is a word
>>> parser.parse('hammer 2 blue avenue')
(['hammer'], {'parameter': ['hammer']})
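The first-match-wins behaviour seen above can be modelled with nothing more than an ordered list of patterns. The following sketch uses the standard re module rather than pyparsing; the regular expressions are rough stand-ins for the street and words clauses, and `CLAUSES` and `match_first` are hypothetical names introduced for the example.

```python
import re

# Ordered alternatives: (label, compiled pattern). The patterns are rough
# regex stand-ins for the street and words clauses, written for this example.
CLAUSES = [
    ('street', re.compile(r'(\d+) ([A-Za-z]+) (street|avenue|circle)')),
    ('word', re.compile(r'[A-Za-z]+')),
]


def match_first(text):
    """Return (label, matched text) for the first alternative that matches
    at the start of the string, trying the alternatives in list order."""
    for label, pattern in CLAUSES:
        found = pattern.match(text)
        if found:
            return label, found.group(0)
    return None


print(match_first('2 blue avenue hammer'))
print(match_first('hammer 2 blue avenue'))
```

Because the alternatives are tried in list order, the order of the clauses passed in matters whenever more than one clause could match the same input.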

Parse Actions

Parse actions are actions performed on a clause element after a successful match. One or more actions can be assigned to each clause element. When you build a parser, by default there are no special actions applied and the parser returns a standard pyparsing.ParseResults object. This can be overridden by providing the actions keyword argument with a list of actions to assign to each clause element. The list of actions must be of equal length to the input list of clauses. Actions can be any callable. Let’s define a function action that prints the street name during parsing.

>>> # define the action function
>>> def print_name(data):
...     print('The street_name is:', data[0].asDict()['street_name'])

>>> Parser.build_parser(clauses=[street], actions=[print_name])
>>> parser = Parser()
>>> parser.parse('2 blue avenue')
The street_name is: blue
(['2', 'blue', 'avenue'], {'street_number': ['2'], 'street_name': ['blue'], 'street_type': ['avenue']})

When actions are passed to build_parser, it calls pyparsing.ParserElement.setParseAction() to assign each action to the relevant clause element. We can also define more complex action classes and combine multiple actions together. Let’s define a Street class that handles the parsed result. When we pass this class in as an action, the parser will return a new instance of Street as the parsed result. Actions for a single clause element can be chained together by passing in a list or tuple of actions for each clause element. Let’s add the Street action to the print_name action.

>>> # define the action class
>>> class Street(object):
...     def __init__(self, data):
...         dd = data[0].asDict()
...         self.name = dd['street_name']
...         self.number = dd['street_number']
...         self.type = dd['street_type']
...     def __repr__(self):
...         return f'<Street ({self.number} {self.name} {self.type})>'

>>> Parser.build_parser(clauses=[street], actions=[[print_name, Street]])
>>> parser = Parser()
>>> parser.parse('2 blue avenue')
The street_name is: blue
<Street (2 blue avenue)>
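To make the chaining concrete, here is a simplified model of how a list of actions is applied in order. It is a sketch of the behaviour, not pyparsing's implementation: `run_actions` is a hypothetical helper, and a plain dictionary stands in for the parsed tokens, so the actions index it directly instead of calling data[0].asDict().

```python
def run_actions(tokens, actions):
    """Apply each action in order. An action that returns None leaves the
    tokens unchanged; any other return value replaces them."""
    for action in actions:
        result = action(tokens)
        if result is not None:
            tokens = result
    return tokens


def print_name(data):
    # side effect only; returns None, so the tokens pass through unchanged
    print('The street_name is:', data['street_name'])


class Street:
    def __init__(self, data):
        self.number = data['street_number']
        self.name = data['street_name']
        self.type = data['street_type']

    def __repr__(self):
        return f'<Street ({self.number} {self.name} {self.type})>'


parsed = {'street_number': '2', 'street_name': 'blue', 'street_type': 'avenue'}
print(run_actions(parsed, [print_name, Street]))
```

In this model the order of the list matters: with Street first, print_name would receive a Street instance instead of the raw parsed data.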

Boolean Joins

The base Parser class by default applies boolean operator precedence in the order NOTS -> ANDS -> ORS. The default boolean classes used can be overridden by providing a list of boolean class objects to the bools keyword argument. These classes provide the boolean logic and must be given in the same order, [nots, ands, ors].

>>> res = parser.parse('2525 redwood street and 34 blue avenue')
The street_name is: redwood
The street_name is: blue
>>> res
and_(<Street (2525 redwood street)>, <Street (34 blue avenue)>)

>>> type(res)
boolean_parser.actions.boolean.BoolAnd

>>> res.conditions[0]
<Street (2525 redwood street)>

>>> res.conditions[1]
<Street (34 blue avenue)>