Simplifying AND OR Boolean Expression
Solution 1:
Imo, you will need to use a parser here, e.g. PLY
. You need to define all of your bricks and can then build a syntax tree with which you can do whatever you want.
An example could be:
import ply.lex as lex
# List of token names. This is always required
tokens = (
'VARIABLE',
'WHITESPACE',
'OR',
'AND',
'NOT',
'PAR_OPEN',
'PAR_CLOSE',
)
# Regular expression rules for simple tokens
t_VARIABLE = r'\b[a-z]+\b'
t_WHITESPACE = r'\s+'
t_OR = r'\bOR\b'
t_AND = r'\bAND\b'
t_NOT = r'\bNOT\b'
t_PAR_OPEN = r'\('
t_PAR_CLOSE = r'\)'
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
lexer = lex.lex()
lexer.input("a OR (b AND c)")
while True:
token = lexer.token()
if not token:
break
else:
print(token)
This would yield
LexToken(VARIABLE,'a',1,0)
LexToken(WHITESPACE,' ',1,1)
LexToken(OR,'OR',1,2)
LexToken(WHITESPACE,' ',1,4)
LexToken(PAR_OPEN,'(',1,5)
LexToken(VARIABLE,'b',1,6)
LexToken(WHITESPACE,' ',1,7)
LexToken(AND,'AND',1,8)
LexToken(WHITESPACE,' ',1,11)
LexToken(VARIABLE,'c',1,12)
LexToken(PAR_CLOSE,')',1,13)
It will even work with nested parentheses and you can then analyze smaller parts (e.g. from PAR_OPEN
to PAR_CLOSE
, etc.).
Solution 2:
Pyparsing makes it easy to define an expression parser for this kind of notation. Think of your AND, OR, etc. keywords like they were operators and your terms like "security", etc. as operands in an infix notation grammar, and you can use pyparsing's infixNotation
grammar generator to define your parser:
sample = "security OR ((internet OR online OR paperless) AND (bank*)) AND (mobile OR cell OR phone OR access) OR easy OR online WITHIN bank OR transaction OR mumbai OR delhi NEAR/10 agar OR (online OR internet) AND (bank) OR not OR (apple) EXCLUDE (mongo)"
import pyparsing as pp
# enable packrat parsing, since this infix notation gets more complex than usual
pp.ParserElement.enablePackrat()
term = pp.Word(pp.alphas + '*')
SLASH = pp.Suppress('/')
AND = pp.Keyword("AND")
OR = pp.Keyword("OR")
WITHIN = pp.Keyword("WITHIN")
EXCLUDE = pp.Keyword("EXCLUDE")
NEAR_op = pp.Group(pp.Keyword("NEAR") + SLASH + pp.pyparsing_common.integer)
expr = pp.infixNotation(term,
[
(NEAR_op, 2, pp.opAssoc.LEFT),
(WITHIN, 2, pp.opAssoc.LEFT),
(AND, 2, pp.opAssoc.LEFT),
(OR, 2, pp.opAssoc.RIGHT),
(EXCLUDE, 2, pp.opAssoc.LEFT),
])
expr.parseString(sample).pprint()
Prints:
[[['security',
'OR',
[[[['internet', 'OR', ['online', 'OR', 'paperless']], 'AND', 'bank*'],
'AND',
['mobile', 'OR', ['cell', 'OR', ['phone', 'OR', 'access']]]],
'OR',
['easy',
'OR',
[['online', 'WITHIN', 'bank'],
'OR',
['transaction',
'OR',
['mumbai',
'OR',
[['delhi', ['NEAR', 10], 'agar'],
'OR',
[[['online', 'OR', 'internet'], 'AND', 'bank'],
'OR',
['not', 'OR', 'apple']]]]]]]]],
'EXCLUDE',
'mongo']]
(Disclaimer: I am the author of pyparsing.)
GitHub Page: https://github.com/pyparsing/pyparsing
Docs: https://pyparsing-docs.readthedocs.io/en/latest/
Install: pip install pyparsing
Post a Comment for "Simplifying AND OR Boolean Expression"