regex: bug fixes, docs
parent ad7bc37672
commit 36660ce749
				|  | @ -4,14 +4,14 @@ | |||
| 
 | ||||
| ## introduction | ||||
| 
 | ||||
| Write here the introduction | ||||
| Write here the introduction... not today!! -_- | ||||
| 
 | ||||
| ## Basic assumption | ||||
| 
 | ||||
| In this release, during the writing of the code some assumption are made and are valid for all the features. | ||||
| In this release, during the writing of the code some assumptions are made and are valid for all the features. | ||||
| 
 | ||||
| 1. The matching stops at the end of the string not at the newline chars. | ||||
| 2. The basic element of this regex engine are the tokens, in query string a simple char is a token. The token is the atomic unit of this regex engine. | ||||
| 2. The basic elements of this regex engine are the tokens, in a query string a simple char is a token. The token is the atomic unit of this regex engine. | ||||
| 
 | ||||
| ## Match positional limiter | ||||
| 
 | ||||
|  | @ -21,11 +21,11 @@ The module supports the following features: | |||
| 
 | ||||
| `^` (Caret.) Matches at the start of the string | ||||
| 
 | ||||
| `?` Matches at the end of the string | ||||
| `$` Matches at the end of the string | ||||
| 
 | ||||
| ## Tokens | ||||
| 
 | ||||
| The tokens are the atomic unit used by this regex engine and can be ones of the following: | ||||
| The tokens are the atomic units used by this regex engine and can be ones of the following: | ||||
| 
 | ||||
| ### Simple char | ||||
| 
 | ||||
|  | @ -33,11 +33,11 @@ this token is a simple single character like `a`. | |||
| 
 | ||||
| ### Char class (cc) | ||||
| 
 | ||||
| The cc match all the chars specified in its inside, it is delimited by square brackets `[ ]` | ||||
| The cc match all the chars specified inside, it is delimited by square brackets `[ ]` | ||||
| 
 | ||||
| the sequence of chars in the class is evaluated with an OR operation. | ||||
| 
 | ||||
| For example the following cc `[abc]` match any char that is `a` or `b` or `c` but doesn't match `C` or `z`. | ||||
| For example, the following cc `[abc]` match any char that is `a` or `b` or `c` but doesn't match `C` or `z`. | ||||
| 
 | ||||
| Inside a cc is possible to specify a "range" of chars, for example `[ad-f]` is equivalent to write `[adef]`.  | ||||
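
The range rule above can be modelled outside the engine. The following is a minimal Python sketch, not the module's V implementation: `expand_char_class` is a hypothetical helper name, and it deliberately ignores meta-chars and negated classes. It only shows that `[ad-f]` and `[adef]` describe the same set of chars.

```python
def expand_char_class(cc: str) -> set:
    """Expand a simple char class such as '[ad-f]' into the set of chars it accepts.

    Illustrative model only: no escapes, no negation, no meta-chars.
    """
    if len(cc) < 2 or cc[0] != "[" or cc[-1] != "]":
        raise ValueError(f"not a char class: {cc!r}")
    body = cc[1:-1]
    chars = set()
    i = 0
    while i < len(body):
        # "x-y" with a char on both sides of the dash is a range
        if i + 2 < len(body) and body[i + 1] == "-":
            start, end = ord(body[i]), ord(body[i + 2])
            if start > end:
                raise ValueError(f"reversed range in {cc!r}")
            chars.update(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            # any other char stands for itself; members are OR-ed together
            chars.add(body[i])
            i += 1
    return chars
```

With this model, `expand_char_class("[ad-f]")` and `expand_char_class("[adef]")` return the same set, and `"C"` is not a member of `expand_char_class("[abc]")` because matching is case sensitive.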
| 
 | ||||
|  | @ -68,17 +68,17 @@ A meta-char can match different type of chars. | |||
| 
 | ||||
| Each token can have a quantifier that specify how many times the char can or must be matched. | ||||
| 
 | ||||
| **Short quantifier** | ||||
| #### **Short quantifier** | ||||
| 
 | ||||
| - `?` match 0 or 1 time, `a?b` match both `ab` or `b` | ||||
| - `+` match at minimum 1 time, `a+` match both `aaa` or `a` | ||||
| - `*` match 0 or more time, `a*b` match both `aaab` or `ab` or `b` | ||||
| 
 | ||||
| **Long quantifier** | ||||
| #### **Long quantifier** | ||||
| 
 | ||||
| - `{x}` match exactly x time, `a{2}` match `aa` but doesn't match `aaa` or `a` | ||||
| - `{min,}` match at minimum min time, `a{2,}` match `aaa` or `aa` but doesn't match `a` | ||||
| - `{,max}` match at least 1 time and maximum max time, `a{,2}` match `a` and `aa` but doesn't match `aaa` | ||||
| - `{,max}` match at least 0 time and maximum max time, `a{,2}` match `a` and `aa` but doesn't match `aaa` | ||||
| - `{min,max}` match from min times to max times, `a{2,3}` match `aa` and `aaa` but doesn't match `a` or `aaaa` | ||||
| 
 | ||||
| a long quantifier may have a `greedy off` flag that is the `?` char after the brackets, `{2,4}?` means to match the minimum number possible tokens in this case 2. | ||||
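
The long quantifier rules listed above, including the default minimum of 0 for `{,max}` introduced by this commit, can be summarised with a small model. This is a Python sketch under stated assumptions, not the engine's `parse_quantifier`: the function name `parse_long_quantifier` and the value of `MAX_QUANTIFIER` are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-in for "no upper bound"; the real constant's value is not shown here.
MAX_QUANTIFIER = 2**31 - 1


def parse_long_quantifier(q: str):
    """Return (min, max, greedy) for a long quantifier such as '{2,4}?'."""
    greedy = True
    if q.endswith("?"):
        # a trailing '?' is the "greedy off" flag
        greedy = False
        q = q[:-1]
    if not (q.startswith("{") and q.endswith("}")):
        raise ValueError(f"not a long quantifier: {q!r}")
    body = q[1:-1]
    if "," not in body:
        # {x}: exactly x times
        n = int(body)
        return n, n, greedy
    lo, hi = body.split(",", 1)
    q_min = int(lo) if lo else 0               # {,max}: the minimum defaults to 0
    q_max = int(hi) if hi else MAX_QUANTIFIER  # {min,}: no upper bound
    if q_min > q_max:
        raise ValueError(f"min greater than max in {q!r}")
    return q_min, q_max, greedy
```

Under this model `{,2}` yields a minimum of 0, so `a{,2}` also accepts the empty repetition in addition to `a` and `aa`.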
|  | @ -102,7 +102,7 @@ the dot char match any char until the next token match is satisfied. | |||
| 
 | ||||
| the token `|` is a logic OR operation between two consecutive tokens, `a|b` match a char that is `a` or `b`. | ||||
| 
 | ||||
| The or token can work in a "chained way": `a|(b)|cd ` test first `a` if the char is not `a` the test the group `(b)` and if the group doesn't match test the token `c`. | ||||
| The OR token can work in a "chained way": `a|(b)|cd ` test first `a` if the char is not `a` then test the group `(b)` and if the group doesn't match test the token `c`. | ||||
| 
 | ||||
| **note: The OR work at token level! It doesn't work at concatenation level!** | ||||
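
To make the note concrete, the sketch below uses Python's standard `re` module, where `|` works at concatenation level, to contrast the two readings of `a|(b)|cd`. The token-level pattern `(?:a|(b)|c)d` is an assumed translation of the behaviour described above (one token out of `a`, `(b)`, `c`, followed by `d`); it is an illustration, not output produced by this engine.

```python
import re

# Assumed token-level reading: one token out of a, (b), c, then the token d.
TOKEN_LEVEL = re.compile(r"(?:a|(b)|c)d")

# Concatenation-level reading used by most regex engines: "a", or "b", or "cd".
CONCAT_LEVEL = re.compile(r"a|(b)|cd")


def matches(pattern, text: str) -> bool:
    """True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None
```

With the token-level reading `ad`, `bd` and `cd` all match while a lone `a` does not; with the concatenation-level reading it is the other way round for `a` and `ad`. To express alternatives longer than one token in this engine, wrap each alternative in a group.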
| 
 | ||||
|  | @ -181,16 +181,16 @@ re.flag = regex.F_BIN | |||
| 
 | ||||
| ### Initializer | ||||
| 
 | ||||
| These function are helper that create the `RE` struct, a `RE` struct can be created manually if you needed. | ||||
| These functions are helper that create the `RE` struct, a `RE` struct can be created manually if you needed. | ||||
| 
 | ||||
| **Simplified initializer** | ||||
| #### **Simplified initializer** | ||||
| 
 | ||||
| ```v | ||||
| // regex create a regex object from the query string and compile it | ||||
| pub fn regex(in_query string) (RE,int,int) | ||||
| ``` | ||||
| 
 | ||||
| **Base initializer** | ||||
| #### **Base initializer** | ||||
| 
 | ||||
| ```v | ||||
| // new_regex create a REgex of small size, usually sufficient for ordinary use | ||||
|  | @ -199,13 +199,13 @@ pub fn new_regex() RE | |||
| // new_regex_by_size create a REgex of large size, mult specify the scale factor of the memory that will be allocated | ||||
| pub fn new_regex_by_size(mult int) RE | ||||
| ``` | ||||
| After the base initializer use, the regex expression must be compiled with: | ||||
| After a base initializer is used, the regex expression must be compiled with: | ||||
| ```v | ||||
| // compile return (return code, index) where index is the index of the error in the query string if return code is an error code | ||||
| pub fn (re mut RE) compile(in_txt string) (int,int) | ||||
| ``` | ||||
| 
 | ||||
| ### Functions | ||||
| ### Operative Functions | ||||
| 
 | ||||
| These are the operative functions | ||||
| 
 | ||||
|  | @ -227,7 +227,7 @@ pub fn (re mut RE) replace(in_txt string, repl string) string | |||
| 
 | ||||
| This module has few small utilities to help the writing of regex expressions. | ||||
| 
 | ||||
| **Syntax errors highlight** | ||||
| ### **Syntax errors highlight** | ||||
| 
 | ||||
| the following example code show how to visualize the syntax errors in the compilation phase: | ||||
| 
 | ||||
|  | @ -256,7 +256,7 @@ if re_err != COMPILE_OK { | |||
| 
 | ||||
| ``` | ||||
| 
 | ||||
| **Compiled code** | ||||
| ### **Compiled code** | ||||
| 
 | ||||
| It is possible view the compiled code calling the function `get_query()` the result will be something like this: | ||||
| 
 | ||||
|  | @ -279,7 +279,7 @@ PC:  2 ist: 88000000 PROG_END {  0,  0} | |||
| 
 | ||||
| `{m,n}` is the quantifier, the greedy off flag  `?`  will be showed if present in the token | ||||
| 
 | ||||
| **Log debug** | ||||
| ### **Log debug** | ||||
| 
 | ||||
| The log debugger allow to print the status of the regex parser when the parser is running. | ||||
| 
 | ||||
|  | @ -338,6 +338,21 @@ the columns have the following meaning: | |||
| 
 | ||||
| `{2,3}:1?` quantifier `{min,max}`, `:1` is the actual counter of repetition, `?` is the greedy off flag if present | ||||
| 
 | ||||
| ### **Custom Logger output** | ||||
| 
 | ||||
| The debug functions output uses the `stdout` as default, it is possible to  provide an alternative output setting a custom output function: | ||||
| 
 | ||||
| ```v | ||||
| // custom print function, the input will be the regex debug string | ||||
| fn custom_print(txt string) { | ||||
| 	println("my log: $txt") | ||||
| } | ||||
| 
 | ||||
| mut re := new_regex() | ||||
| re.log_func = custom_print  // every debug output from now will call this function | ||||
| 
 | ||||
| ``` | ||||
| 
 | ||||
| ## Example code | ||||
| 
 | ||||
| Here there is a simple code to perform some basically match of strings | ||||
|  |  | |||
|  | @ -200,7 +200,6 @@ pub fn (re RE) get_parse_error_string(err int) string { | |||
| 	} | ||||
| } | ||||
| 
 | ||||
| 
 | ||||
| // utf8_str convert and utf8 sequence to a printable string
 | ||||
| [inline] | ||||
| fn utf8_str(ch u32) string { | ||||
|  | @ -231,7 +230,7 @@ mut: | |||
| 	ist u32 = u32(0) | ||||
| 
 | ||||
| 	// char
 | ||||
| 	ch u32                 = u32(0)// char of the token if any
 | ||||
| 	ch u32                 = u32(0)  // char of the token if any
 | ||||
| 	ch_len byte            = byte(0) // char len
 | ||||
| 
 | ||||
| 	// Quantifiers / branch
 | ||||
|  | @ -245,7 +244,7 @@ mut: | |||
| 	// counters for quantifier check (repetitions)
 | ||||
| 	rep int = 0 | ||||
| 
 | ||||
| 	// validator function pointer and control char
 | ||||
| 	// validator function pointer
 | ||||
| 	validator fn (byte) bool | ||||
| 
 | ||||
| 	// groups variables
 | ||||
|  | @ -280,9 +279,9 @@ pub const ( | |||
| 
 | ||||
| struct StateDotObj{ | ||||
| mut: | ||||
| 	i  int                = 0   // char index in the input buffer
 | ||||
| 	pc int                = 0   // program counter saved
 | ||||
| 	mi int                = 0   // match_index saved
 | ||||
| 	i  int                = -1  // char index in the input buffer
 | ||||
| 	pc int                = -1   // program counter saved
 | ||||
| 	mi int                = -1   // match_index saved
 | ||||
| 	group_stack_index int = -1  // group index stack pointer saved
 | ||||
| } | ||||
| 
 | ||||
|  | @ -648,7 +647,7 @@ fn (re RE) parse_quantifier(in_txt string, in_i int) (int, int, int, bool) { | |||
| 
 | ||||
| 		// min parsing skip if comma present
 | ||||
| 		if status == .start && ch == `,` { | ||||
| 			q_min = 1 // default min in a {} quantifier is 1
 | ||||
| 			q_min = 0 // default min in a {} quantifier is 0
 | ||||
| 			status = .comma_checked | ||||
| 			i++ | ||||
| 			continue | ||||
|  | @ -998,6 +997,7 @@ pub fn (re mut RE) compile(in_txt string) (int,int) { | |||
| 	// Post processing
 | ||||
| 	//******************************************
 | ||||
| 
 | ||||
| 
 | ||||
| 	// count IST_DOT_CHAR to set the size of the state stack
 | ||||
| 	mut pc1 := 0 | ||||
| 	mut tmp_count := 0 | ||||
|  | @ -1007,9 +1007,9 @@ pub fn (re mut RE) compile(in_txt string) (int,int) { | |||
| 		} | ||||
| 		pc1++ | ||||
| 	} | ||||
| 
 | ||||
| 	// init the state stack
 | ||||
| 	re.state_stack = [StateDotObj{}].repeat(tmp_count+1) | ||||
| 	 | ||||
| 	re.state_stack = [StateDotObj{}].repeat(tmp_count+1)	 | ||||
| 	 | ||||
| 	// OR branch
 | ||||
| 	// a|b|cd
 | ||||
|  | @ -1279,7 +1279,8 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 
 | ||||
| 	mut pc := -1                     // program counter
 | ||||
| 	mut state := StateObj{}          // actual state
 | ||||
| 	mut ist := u32(0)                // Program Counter
 | ||||
| 	mut ist := u32(0)                // actual instruction
 | ||||
| 	mut l_ist := u32(0)              // last matched instruction
 | ||||
| 
 | ||||
| 	mut group_stack      := [-1].repeat(re.group_max) | ||||
| 	mut group_data       := [-1].repeat(re.group_max) | ||||
|  | @ -1359,7 +1360,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 								tmp_gr := re.prog[re.prog[pc].goto_pc].group_rep | ||||
| 								buf2.write("GROUP_START #:${tmp_gi} rep:${tmp_gr} ") | ||||
| 							} else if ist == IST_GROUP_END { | ||||
| 								buf2.write("GROUP_END   #:${re.prog[pc].group_id} deep:${group_index} ") | ||||
| 								buf2.write("GROUP_END   #:${re.prog[pc].group_id} deep:${group_index}") | ||||
| 							} | ||||
| 						} | ||||
| 						if re.prog[pc].rep_max == MAX_QUANTIFIER { | ||||
|  | @ -1417,17 +1418,10 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 			} | ||||
| 
 | ||||
| 			// manage IST_DOT_CHAR
 | ||||
| 			if re.state_stack_index >= 0 { | ||||
| 				//C.printf("DOT CHAR text end management!\n")
 | ||||
| 				// if DOT CHAR is not the last instruction and we are still going, then no match!!
 | ||||
| 				if pc < re.prog.len && re.prog[pc+1].ist != IST_PROG_END { | ||||
| 					return NO_MATCH_FOUND,0 | ||||
| 				} | ||||
| 			} | ||||
| 
 | ||||
| 			m_state == .end | ||||
| 			break | ||||
| 			return NO_MATCH_FOUND,0 | ||||
| 			//return NO_MATCH_FOUND,0
 | ||||
| 		} | ||||
| 
 | ||||
| 		// starting and init
 | ||||
|  | @ -1475,12 +1469,13 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 		// check if stop 
 | ||||
| 		if m_state == .stop { | ||||
| 			// if we are in restore state ,do it and restart
 | ||||
| 			if re.state_stack_index >= 0 {	 | ||||
| 			//C.printf("re.state_stack_index %d\n",re.state_stack_index )
 | ||||
| 			if re.state_stack_index >=0 && re.state_stack[re.state_stack_index].pc >= 0 { | ||||
| 				i = re.state_stack[re.state_stack_index].i | ||||
| 				pc = re.state_stack[re.state_stack_index].pc | ||||
| 				state.match_index =	re.state_stack[re.state_stack_index].mi | ||||
| 				group_index = re.state_stack[re.state_stack_index].group_stack_index | ||||
| 				 | ||||
| 
 | ||||
| 				m_state = .ist_load | ||||
| 				continue | ||||
| 			} | ||||
|  | @ -1499,12 +1494,22 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 			// program end
 | ||||
| 			if ist == IST_PROG_END { | ||||
| 				// if we are in match exit well
 | ||||
| 				 | ||||
| 				if group_index >= 0 && state.match_index >= 0 { | ||||
| 					group_index = -1 | ||||
| 				} | ||||
| 								 | ||||
| 
 | ||||
| 				// we have a DOT MATCH on going
 | ||||
| 				//C.printf("IST_PROG_END l_ist: %08x\n", l_ist)
 | ||||
| 				if re.state_stack_index>=0 && l_ist == IST_DOT_CHAR { | ||||
| 					m_state = .stop | ||||
| 					continue | ||||
| 				} | ||||
| 
 | ||||
| 				re.state_stack_index = -1 | ||||
| 				m_state = .stop | ||||
| 				continue | ||||
| 				 | ||||
| 			} | ||||
| 
 | ||||
| 			// check GROUP start, no quantifier is checkd for this token!!
 | ||||
|  | @ -1527,7 +1532,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 					//C.printf("g.id: %d group_index: %d\n", re.prog[pc].group_id, group_index)
 | ||||
| 					if group_index >= 0 { | ||||
| 	 					start_i   := group_stack[group_index] | ||||
| 	 					group_stack[group_index]=-1 | ||||
| 	 					//group_stack[group_index]=-1
 | ||||
| 
 | ||||
| 	 					// save group results
 | ||||
| 						g_index := re.prog[pc].group_id*2 | ||||
|  | @ -1537,6 +1542,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 							re.groups[g_index] = 0 | ||||
| 						} | ||||
| 						re.groups[g_index+1] = i | ||||
| 						//C.printf("GROUP %d END [%d, %d]\n", re.prog[pc].group_id, re.groups[g_index], re.groups[g_index+1])
 | ||||
| 					} | ||||
| 					 | ||||
| 					re.prog[pc].group_rep++ // increase repetitions
 | ||||
|  | @ -1568,6 +1574,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 			else if ist == IST_DOT_CHAR { | ||||
| 				//C.printf("IST_DOT_CHAR rep: %d\n", re.prog[pc].rep)
 | ||||
| 				state.match_flag = true | ||||
| 				l_ist = u32(IST_DOT_CHAR) | ||||
| 
 | ||||
| 				if first_match < 0 { | ||||
| 					first_match = i | ||||
|  | @ -1575,12 +1582,23 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 				state.match_index = i | ||||
| 				re.prog[pc].rep++	 | ||||
| 
 | ||||
| 				if re.prog[pc].rep == 1 { | ||||
| 				//if re.prog[pc].rep >= re.prog[pc].rep_min && re.prog[pc].rep <= re.prog[pc].rep_max {
 | ||||
| 				if re.prog[pc].rep >= 0 && re.prog[pc].rep <= re.prog[pc].rep_max { | ||||
| 					//C.printf("DOT CHAR save state : %d\n", re.state_stack_index)
 | ||||
| 					// save the state
 | ||||
| 					re.state_stack_index++ | ||||
| 					 | ||||
| 					// manage first dot char
 | ||||
| 					if re.state_stack_index < 0 { | ||||
| 						re.state_stack_index++ | ||||
| 					} | ||||
| 
 | ||||
| 					re.state_stack[re.state_stack_index].pc = pc | ||||
| 					re.state_stack[re.state_stack_index].mi = state.match_index | ||||
| 					re.state_stack[re.state_stack_index].group_stack_index = group_index | ||||
| 				} else { | ||||
| 					re.state_stack[re.state_stack_index].pc = -1 | ||||
| 					re.state_stack[re.state_stack_index].mi = -1 | ||||
| 					re.state_stack[re.state_stack_index].group_stack_index = -1 | ||||
| 				} | ||||
| 
 | ||||
| 				if re.prog[pc].rep >= 1 && re.state_stack_index >= 0 { | ||||
|  | @ -1590,19 +1608,11 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 				// manage * and {0,} quantifier
 | ||||
| 				if re.prog[pc].rep_min > 0 { | ||||
| 					i += char_len // next char
 | ||||
| 					l_ist = u32(IST_DOT_CHAR) | ||||
| 				} | ||||
| 				 | ||||
| 				if re.prog[pc+1].ist !=  IST_GROUP_END { | ||||
| 					m_state = .ist_next | ||||
| 					continue | ||||
| 				}  | ||||
| 				// IST_DOT_CHAR is the last instruction, get all
 | ||||
| 				else { | ||||
| 					//C.printf("We are the last one!\n")
 | ||||
| 					pc--  | ||||
| 					m_state = .ist_next_ks | ||||
| 					continue | ||||
| 				} | ||||
| 
 | ||||
| 				m_state = .ist_next | ||||
| 				continue | ||||
| 
 | ||||
| 			} | ||||
| 
 | ||||
|  | @ -1622,6 +1632,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 
 | ||||
| 				if cc_res { | ||||
| 					state.match_flag = true | ||||
| 					l_ist = u32(IST_CHAR_CLASS_POS) | ||||
| 					 | ||||
| 					if first_match < 0 { | ||||
| 						first_match = i | ||||
|  | @ -1645,6 +1656,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 				//C.printf("BSLS in_ch: %c res: %d\n", ch, tmp_res)
 | ||||
| 				if tmp_res { | ||||
| 					state.match_flag = true | ||||
| 					l_ist = u32(IST_BSLS_CHAR) | ||||
| 					 | ||||
| 					if first_match < 0 { | ||||
| 						first_match = i | ||||
|  | @ -1669,6 +1681,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 				if re.prog[pc].ch == ch | ||||
| 				{ | ||||
| 					state.match_flag = true | ||||
| 					l_ist = u32(IST_SIMPLE_CHAR) | ||||
| 					 | ||||
| 					if first_match < 0 { | ||||
| 						first_match = i | ||||
|  | @ -1857,7 +1870,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 			} | ||||
| 
 | ||||
| 			// no other options
 | ||||
| 			//C.printf("NO_MATCH_FOUND\n")
 | ||||
| 			//C.printf("ist_quant_n NO_MATCH_FOUND\n")
 | ||||
| 			result = NO_MATCH_FOUND | ||||
| 			m_state = .stop | ||||
| 			continue | ||||
|  | @ -1873,12 +1886,6 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) { | |||
| 
 | ||||
| 			rep := re.prog[pc].rep | ||||
| 			 | ||||
| 			// clear the actual dot char capture state
 | ||||
| 			if re.state_stack_index >= 0 { | ||||
| 				//C.printf("Drop the DOT_CHAR state!\n")
 | ||||
| 				re.state_stack_index-- | ||||
| 			} | ||||
| 
 | ||||
| 			// under range
 | ||||
| 			if rep > 0 && rep < re.prog[pc].rep_min { | ||||
| 				//C.printf("ist_quant_p UNDER RANGE\n")
 | ||||
|  |  | |||
|  | @ -33,15 +33,13 @@ match_test_suite = [ | |||
| 	TestItem{"this is a good sample.",r"( ?\w+){,4}",0,14}, | ||||
| 	TestItem{"this is a good sample.",r"( ?\w+){,5}",0,21}, | ||||
| 	TestItem{"this is a good sample.",r"( ?\w+){2,3}",0,9}, | ||||
| 	TestItem{"this is a good sample.",r"(\s?\w+){2,3}",0,9}, | ||||
| 	TestItem{"this is a good sample.",r".*i(\w)+",0,4}, | ||||
| 	TestItem{"this is a good sample.",r"(\s?\w+){2,3}",0,9},	 | ||||
| 	TestItem{"this these those.",r"(th[ei]se?\s|\.)+",0,11}, | ||||
| 	TestItem{"this these those ",r"(th[eio]se? ?)+",0,17}, | ||||
| 	TestItem{"this these those ",r"(th[eio]se? )+",0,17}, | ||||
| 	TestItem{"this,these,those. over",r"(th[eio]se?[,. ])+",0,17}, | ||||
| 	TestItem{"soday,this,these,those. over",r"(th[eio]se?[,. ])+",6,23}, | ||||
| 	TestItem{"soday,this,these,those. over",r".*,(th[eio]se?[,. ])+",0,23}, | ||||
| 	TestItem{"soday,this,these,thesa.thesi over",r".*,(th[ei]se?[,. ])+(thes[ai][,. ])+",0,29}, | ||||
| 	 | ||||
| 	TestItem{"cpapaz",r"(c(pa)+z)",0,6}, | ||||
| 	TestItem{"this is a cpapaz over",r"(c(pa)+z)",10,16}, | ||||
| 	TestItem{"this is a cpapapez over",r"(c(p[ae])+z)",10,18}, | ||||
|  | @ -56,16 +54,23 @@ match_test_suite = [ | |||
| 	TestItem{"this cpapaz adce aabe",r"(c(pa)+z)(\s[\a]+){2}",5,21}, | ||||
| 	TestItem{"1234this cpapaz adce aabe",r"(c(pa)+z)(\s[\a]+){2}$",9,25}, | ||||
| 	TestItem{"this cpapaz adce aabe third",r"(c(pa)+z)(\s[\a]+){2}",5,21}, | ||||
| 	TestItem{"123cpapaz ole. pippo",r"(c(pa)+z)(\s+\a+[\.,]?)+",3,20}, | ||||
| 	 | ||||
| 	TestItem{"this is a good sample.",r".*i(\w)+",0,4}, | ||||
| 	TestItem{"soday,this,these,those. over",r".*,(th[eio]se?[,. ])+",0,23}, | ||||
| 	TestItem{"soday,this,these,thesa.thesi over",r".*,(th[ei]se?[,. ])+(thes[ai][,. ])+",0,29}, | ||||
| 	TestItem{"cpapaz ole. pippo,",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,18}, | ||||
| 	TestItem{"cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,17}, | ||||
| 	TestItem{"cpapaz ole. pippo, 852",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,18}, | ||||
| 	TestItem{"123cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,20}, | ||||
| 	TestItem{"...cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,20}, | ||||
| 	TestItem{"123cpapaz ole. pippo",r"(c(pa)+z)(\s+\a+[\.,]?)+",3,20}, | ||||
| 	 | ||||
| 	TestItem{"cpapaz ole. pippo,",r".*c.+ole.*pi",0,14}, | ||||
| 	TestItem{"cpapaz ole. pipipo,",r".*c.+ole.*p([ip])+o",0,18}, | ||||
| 	TestItem{"cpapaz ole. pipipo",r"^.*c.+ol?e.*p([ip])+o$",0,18}, | ||||
| 	TestItem{"abbb",r"ab{2,3}?",0,3}, | ||||
| 	TestItem{" pippo pera",r"\s(.*)pe(.*)",0,11}, | ||||
| 	TestItem{" abb",r"\s(.*)",0,4}, | ||||
| 
 | ||||
| 	// negative
 | ||||
| 	TestItem{"zthis ciao",r"((t[hieo]+se?)\s*)+",-1,0}, | ||||
|  |  | |||