regex: bug fixes, docs
parent ad7bc37672
commit 36660ce749
|
@ -4,14 +4,14 @@
|
||||||
|
|
||||||
## introduction
|
## introduction
|
||||||
|
|
||||||
Write here the introduction
|
Write here the introduction... not today!! -_-
|
||||||
|
|
||||||
## Basic assumption
|
## Basic assumption
|
||||||
|
|
||||||
In this release, during the writing of the code some assumption are made and are valid for all the features.
|
In this release, during the writing of the code some assumptions are made and are valid for all the features.
|
||||||
|
|
||||||
1. The matching stops at the end of the string not at the newline chars.
|
1. The matching stops at the end of the string not at the newline chars.
|
||||||
2. The basic element of this regex engine are the tokens, in query string a simple char is a token. The token is the atomic unit of this regex engine.
|
2. The basic elements of this regex engine are the tokens, in a query string a simple char is a token. The token is the atomic unit of this regex engine.
|
||||||
|
|
||||||
## Match positional limiter
|
## Match positional limiter
|
||||||
|
|
||||||
|
@ -21,11 +21,11 @@ The module supports the following features:
|
||||||
|
|
||||||
`^` (Caret.) Matches at the start of the string
|
`^` (Caret.) Matches at the start of the string
|
||||||
|
|
||||||
`?` Matches at the end of the string
|
`$` Matches at the end of the string
|
||||||
|
|
||||||
## Tokens
|
## Tokens
|
||||||
|
|
||||||
The tokens are the atomic unit used by this regex engine and can be ones of the following:
|
The tokens are the atomic units used by this regex engine and can be ones of the following:
|
||||||
|
|
||||||
### Simple char
|
### Simple char
|
||||||
|
|
||||||
|
@ -33,11 +33,11 @@ this token is a simple single character like `a`.
|
||||||
|
|
||||||
### Char class (cc)
|
### Char class (cc)
|
||||||
|
|
||||||
The cc match all the chars specified in its inside, it is delimited by square brackets `[ ]`
|
The cc match all the chars specified inside, it is delimited by square brackets `[ ]`
|
||||||
|
|
||||||
the sequence of chars in the class is evaluated with an OR operation.
|
the sequence of chars in the class is evaluated with an OR operation.
|
||||||
|
|
||||||
For example the following cc `[abc]` match any char that is `a` or `b` or `c` but doesn't match `C` or `z`.
|
For example, the following cc `[abc]` match any char that is `a` or `b` or `c` but doesn't match `C` or `z`.
|
||||||
|
|
||||||
Inside a cc is possible to specify a "range" of chars, for example `[ad-f]` is equivalent to write `[adef]`.
|
Inside a cc is possible to specify a "range" of chars, for example `[ad-f]` is equivalent to write `[adef]`.
|
||||||
|
|
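The range rule in the char class text above can be modelled in a few lines. This is a minimal Python sketch, not the module's V implementation: `expand_char_class` and `cc_matches` are hypothetical names, and escapes, negated classes and meta-chars are deliberately left out.

```python
def expand_char_class(cc: str) -> str:
    """Expand a class such as "[ad-f]" into its explicit chars, "adef"."""
    if len(cc) < 2 or cc[0] != "[" or cc[-1] != "]":
        raise ValueError("a char class must be delimited by [ ]")
    body = cc[1:-1]
    out = []
    i = 0
    while i < len(body):
        # A range is three chars long: start, dash, end.
        if i + 2 < len(body) and body[i + 1] == "-":
            start, end = ord(body[i]), ord(body[i + 2])
            if start > end:
                raise ValueError("range start is after range end")
            out.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            # Any other char stands for itself.
            out.append(body[i])
            i += 1
    return "".join(out)


def cc_matches(cc: str, ch: str) -> bool:
    """OR semantics: the char matches if it is any member of the class."""
    return ch in expand_char_class(cc)
```

Under these assumptions `[ad-f]` and `[adef]` expand to the same set, and `[abc]` accepts `a` but rejects `C` and `z`, as the README states.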
||||||
|
@ -68,17 +68,17 @@ A meta-char can match different type of chars.
|
||||||
|
|
||||||
Each token can have a quantifier that specify how many times the char can or must be matched.
|
Each token can have a quantifier that specify how many times the char can or must be matched.
|
||||||
|
|
||||||
**Short quantifier**
|
#### **Short quantifier**
|
||||||
|
|
||||||
- `?` match 0 or 1 time, `a?b` match both `ab` or `b`
|
- `?` match 0 or 1 time, `a?b` match both `ab` or `b`
|
||||||
- `+` match at minimum 1 time, `a+` match both `aaa` or `a`
|
- `+` match at minimum 1 time, `a+` match both `aaa` or `a`
|
||||||
- `*` match 0 or more time, `a*b` match both `aaab` or `ab` or `b`
|
- `*` match 0 or more time, `a*b` match both `aaab` or `ab` or `b`
|
||||||
|
|
||||||
**Long quantifier**
|
#### **Long quantifier**
|
||||||
|
|
||||||
- `{x}` match exactly x time, `a{2}` match `aa` but doesn't match `aaa` or `a`
|
- `{x}` match exactly x time, `a{2}` match `aa` but doesn't match `aaa` or `a`
|
||||||
- `{min,}` match at minimum min time, `a{2,}` match `aaa` or `aa` but doesn't match `a`
|
- `{min,}` match at minimum min time, `a{2,}` match `aaa` or `aa` but doesn't match `a`
|
||||||
- `{,max}` match at least 1 time and maximum max time, `a{,2}` match `a` and `aa` but doesn't match `aaa`
|
- `{,max}` match at least 0 time and maximum max time, `a{,2}` match `a` and `aa` but doesn't match `aaa`
|
||||||
- `{min,max}` match from min times to max times, `a{2,3}` match `aa` and `aaa` but doesn't match `a` or `aaaa`
|
- `{min,max}` match from min times to max times, `a{2,3}` match `aa` and `aaa` but doesn't match `a` or `aaaa`
|
||||||
|
|
||||||
a long quantifier may have a `greedy off` flag that is the `?` char after the brackets, `{2,4}?` means to match the minimum number possible tokens in this case 2.
|
a long quantifier may have a `greedy off` flag that is the `?` char after the brackets, `{2,4}?` means to match the minimum number possible tokens in this case 2.
|
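The four long quantifier forms and the greedy off flag can be summarised with a small parser. The following is a hedged Python sketch of the rules as documented in this hunk (including the corrected default minimum of 0 for `{,max}`); it is not the module's `parse_quantifier`, and the name `parse_long_quantifier` and its return shape are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Accepts {x}, {min,}, {,max}, {min,max}, each with an optional trailing '?'.
_QUANT = re.compile(r"\{(\d*)(,?)(\d*)\}(\??)")


def parse_long_quantifier(text: str) -> Tuple[int, Optional[int], bool]:
    """Return (min, max, greedy); max is None when there is no upper bound."""
    m = _QUANT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a long quantifier: {text!r}")
    lo, comma, hi, lazy = m.groups()
    if not comma:
        # {x}: exactly x repetitions.
        if not lo:
            raise ValueError("empty quantifier")
        q_min = q_max = int(lo)
    else:
        if not lo and not hi:
            raise ValueError("at least one bound is required")
        q_min = int(lo) if lo else 0      # {,max}: the minimum defaults to 0
        q_max = int(hi) if hi else None   # {min,}: no upper bound
        if q_max is not None and q_max < q_min:
            raise ValueError("max is smaller than min")
    # A trailing '?' is the greedy off flag.
    return q_min, q_max, lazy == ""
```

With this model `{2,4}?` yields a minimum of 2, a maximum of 4 and greedy off, so the matcher would stop at 2 repetitions as described above.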
||||||
|
@ -102,7 +102,7 @@ the dot char match any char until the next token match is satisfied.
|
||||||
|
|
||||||
the token `|` is a logic OR operation between two consecutive tokens, `a|b` match a char that is `a` or `b`.
|
the token `|` is a logic OR operation between two consecutive tokens, `a|b` match a char that is `a` or `b`.
|
||||||
|
|
||||||
The or token can work in a "chained way": `a|(b)|cd ` test first `a` if the char is not `a` the test the group `(b)` and if the group doesn't match test the token `c`.
|
The OR token can work in a "chained way": `a|(b)|cd ` test first `a` if the char is not `a` then test the group `(b)` and if the group doesn't match test the token `c`.
|
||||||
|
|
||||||
**note: The OR work at token level! It doesn't work at concatenation level!**
|
**note: The OR work at token level! It doesn't work at concatenation level!**
|
||||||
|
|
||||||
|
@ -181,16 +181,16 @@ re.flag = regex.F_BIN
|
||||||
|
|
||||||
### Initializer
|
### Initializer
|
||||||
|
|
||||||
These function are helper that create the `RE` struct, a `RE` struct can be created manually if you needed.
|
These functions are helper that create the `RE` struct, a `RE` struct can be created manually if you needed.
|
||||||
|
|
||||||
**Simplified initializer**
|
#### **Simplified initializer**
|
||||||
|
|
||||||
```v
|
```v
|
||||||
// regex create a regex object from the query string and compile it
|
// regex create a regex object from the query string and compile it
|
||||||
pub fn regex(in_query string) (RE,int,int)
|
pub fn regex(in_query string) (RE,int,int)
|
||||||
```
|
```
|
||||||
|
|
||||||
**Base initializer**
|
#### **Base initializer**
|
||||||
|
|
||||||
```v
|
```v
|
||||||
// new_regex create a REgex of small size, usually sufficient for ordinary use
|
// new_regex create a REgex of small size, usually sufficient for ordinary use
|
||||||
|
@ -199,13 +199,13 @@ pub fn new_regex() RE
|
||||||
// new_regex_by_size create a REgex of large size, mult specify the scale factor of the memory that will be allocated
|
// new_regex_by_size create a REgex of large size, mult specify the scale factor of the memory that will be allocated
|
||||||
pub fn new_regex_by_size(mult int) RE
|
pub fn new_regex_by_size(mult int) RE
|
||||||
```
|
```
|
||||||
After the base initializer use, the regex expression must be compiled with:
|
After a base initializer is used, the regex expression must be compiled with:
|
||||||
```v
|
```v
|
||||||
// compile return (return code, index) where index is the index of the error in the query string if return code is an error code
|
// compile return (return code, index) where index is the index of the error in the query string if return code is an error code
|
||||||
pub fn (re mut RE) compile(in_txt string) (int,int)
|
pub fn (re mut RE) compile(in_txt string) (int,int)
|
||||||
```
|
```
|
||||||
|
|
||||||
### Functions
|
### Operative Functions
|
||||||
|
|
||||||
These are the operative functions
|
These are the operative functions
|
||||||
|
|
||||||
|
@ -227,7 +227,7 @@ pub fn (re mut RE) replace(in_txt string, repl string) string
|
||||||
|
|
||||||
This module has few small utilities to help the writing of regex expressions.
|
This module has few small utilities to help the writing of regex expressions.
|
||||||
|
|
||||||
**Syntax errors highlight**
|
### **Syntax errors highlight**
|
||||||
|
|
||||||
the following example code show how to visualize the syntax errors in the compilation phase:
|
the following example code show how to visualize the syntax errors in the compilation phase:
|
||||||
|
|
||||||
|
@ -256,7 +256,7 @@ if re_err != COMPILE_OK {
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
**Compiled code**
|
### **Compiled code**
|
||||||
|
|
||||||
It is possible view the compiled code calling the function `get_query()` the result will be something like this:
|
It is possible view the compiled code calling the function `get_query()` the result will be something like this:
|
||||||
|
|
||||||
|
@ -279,7 +279,7 @@ PC: 2 ist: 88000000 PROG_END { 0, 0}
|
||||||
|
|
||||||
`{m,n}` is the quantifier, the greedy off flag `?` will be showed if present in the token
|
`{m,n}` is the quantifier, the greedy off flag `?` will be showed if present in the token
|
||||||
|
|
||||||
**Log debug**
|
### **Log debug**
|
||||||
|
|
||||||
The log debugger allow to print the status of the regex parser when the parser is running.
|
The log debugger allow to print the status of the regex parser when the parser is running.
|
||||||
|
|
||||||
|
@ -338,6 +338,21 @@ the columns have the following meaning:
|
||||||
|
|
||||||
`{2,3}:1?` quantifier `{min,max}`, `:1` is the actual counter of repetition, `?` is the greedy off flag if present
|
`{2,3}:1?` quantifier `{min,max}`, `:1` is the actual counter of repetition, `?` is the greedy off flag if present
|
||||||
|
|
||||||
|
### **Custom Logger output**
|
||||||
|
|
||||||
|
The debug functions output uses the `stdout` as default, it is possible to provide an alternative output setting a custom output function:
|
||||||
|
|
||||||
|
```v
|
||||||
|
// custom print function, the input will be the regex debug string
|
||||||
|
fn custom_print(txt string) {
|
||||||
|
println("my log: $txt")
|
||||||
|
}
|
||||||
|
|
||||||
|
mut re := new_regex()
|
||||||
|
re.log_func = custom_print // every debug output from now will call this function
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
## Example code
|
## Example code
|
||||||
|
|
||||||
Here there is a simple code to perform some basically match of strings
|
Here there is a simple code to perform some basically match of strings
|
||||||
|
|
|
@ -200,7 +200,6 @@ pub fn (re RE) get_parse_error_string(err int) string {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
// utf8_str convert and utf8 sequence to a printable string
|
// utf8_str convert and utf8 sequence to a printable string
|
||||||
[inline]
|
[inline]
|
||||||
fn utf8_str(ch u32) string {
|
fn utf8_str(ch u32) string {
|
||||||
|
@ -231,7 +230,7 @@ mut:
|
||||||
ist u32 = u32(0)
|
ist u32 = u32(0)
|
||||||
|
|
||||||
// char
|
// char
|
||||||
ch u32 = u32(0)// char of the token if any
|
ch u32 = u32(0) // char of the token if any
|
||||||
ch_len byte = byte(0) // char len
|
ch_len byte = byte(0) // char len
|
||||||
|
|
||||||
// Quantifiers / branch
|
// Quantifiers / branch
|
||||||
|
@ -245,7 +244,7 @@ mut:
|
||||||
// counters for quantifier check (repetitions)
|
// counters for quantifier check (repetitions)
|
||||||
rep int = 0
|
rep int = 0
|
||||||
|
|
||||||
// validator function pointer and control char
|
// validator function pointer
|
||||||
validator fn (byte) bool
|
validator fn (byte) bool
|
||||||
|
|
||||||
// groups variables
|
// groups variables
|
||||||
|
@ -280,9 +279,9 @@ pub const (
|
||||||
|
|
||||||
struct StateDotObj{
|
struct StateDotObj{
|
||||||
mut:
|
mut:
|
||||||
i int = 0 // char index in the input buffer
|
i int = -1 // char index in the input buffer
|
||||||
pc int = 0 // program counter saved
|
pc int = -1 // program counter saved
|
||||||
mi int = 0 // match_index saved
|
mi int = -1 // match_index saved
|
||||||
group_stack_index int = -1 // group index stack pointer saved
|
group_stack_index int = -1 // group index stack pointer saved
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -648,7 +647,7 @@ fn (re RE) parse_quantifier(in_txt string, in_i int) (int, int, int, bool) {
|
||||||
|
|
||||||
// min parsing skip if comma present
|
// min parsing skip if comma present
|
||||||
if status == .start && ch == `,` {
|
if status == .start && ch == `,` {
|
||||||
q_min = 1 // default min in a {} quantifier is 1
|
q_min = 0 // default min in a {} quantifier is 0
|
||||||
status = .comma_checked
|
status = .comma_checked
|
||||||
i++
|
i++
|
||||||
continue
|
continue
|
||||||
|
@ -998,6 +997,7 @@ pub fn (re mut RE) compile(in_txt string) (int,int) {
|
||||||
// Post processing
|
// Post processing
|
||||||
//******************************************
|
//******************************************
|
||||||
|
|
||||||
|
|
||||||
// count IST_DOT_CHAR to set the size of the state stack
|
// count IST_DOT_CHAR to set the size of the state stack
|
||||||
mut pc1 := 0
|
mut pc1 := 0
|
||||||
mut tmp_count := 0
|
mut tmp_count := 0
|
||||||
|
@ -1007,10 +1007,10 @@ pub fn (re mut RE) compile(in_txt string) (int,int) {
|
||||||
}
|
}
|
||||||
pc1++
|
pc1++
|
||||||
}
|
}
|
||||||
|
|
||||||
// init the state stack
|
// init the state stack
|
||||||
re.state_stack = [StateDotObj{}].repeat(tmp_count+1)
|
re.state_stack = [StateDotObj{}].repeat(tmp_count+1)
|
||||||
|
|
||||||
|
|
||||||
// OR branch
|
// OR branch
|
||||||
// a|b|cd
|
// a|b|cd
|
||||||
// d exit point
|
// d exit point
|
||||||
|
@ -1279,7 +1279,8 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
|
|
||||||
mut pc := -1 // program counter
|
mut pc := -1 // program counter
|
||||||
mut state := StateObj{} // actual state
|
mut state := StateObj{} // actual state
|
||||||
mut ist := u32(0) // Program Counter
|
mut ist := u32(0) // actual instruction
|
||||||
|
mut l_ist := u32(0) // last matched instruction
|
||||||
|
|
||||||
mut group_stack := [-1].repeat(re.group_max)
|
mut group_stack := [-1].repeat(re.group_max)
|
||||||
mut group_data := [-1].repeat(re.group_max)
|
mut group_data := [-1].repeat(re.group_max)
|
||||||
|
@ -1359,7 +1360,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
tmp_gr := re.prog[re.prog[pc].goto_pc].group_rep
|
tmp_gr := re.prog[re.prog[pc].goto_pc].group_rep
|
||||||
buf2.write("GROUP_START #:${tmp_gi} rep:${tmp_gr} ")
|
buf2.write("GROUP_START #:${tmp_gi} rep:${tmp_gr} ")
|
||||||
} else if ist == IST_GROUP_END {
|
} else if ist == IST_GROUP_END {
|
||||||
buf2.write("GROUP_END #:${re.prog[pc].group_id} deep:${group_index} ")
|
buf2.write("GROUP_END #:${re.prog[pc].group_id} deep:${group_index}")
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
if re.prog[pc].rep_max == MAX_QUANTIFIER {
|
if re.prog[pc].rep_max == MAX_QUANTIFIER {
|
||||||
|
@ -1417,17 +1418,10 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
}
|
}
|
||||||
|
|
||||||
// manage IST_DOT_CHAR
|
// manage IST_DOT_CHAR
|
||||||
if re.state_stack_index >= 0 {
|
|
||||||
//C.printf("DOT CHAR text end management!\n")
|
|
||||||
// if DOT CHAR is not the last instruction and we are still going, then no match!!
|
|
||||||
if pc < re.prog.len && re.prog[pc+1].ist != IST_PROG_END {
|
|
||||||
return NO_MATCH_FOUND,0
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
m_state == .end
|
m_state == .end
|
||||||
break
|
break
|
||||||
return NO_MATCH_FOUND,0
|
//return NO_MATCH_FOUND,0
|
||||||
}
|
}
|
||||||
|
|
||||||
// starting and init
|
// starting and init
|
||||||
|
@ -1475,7 +1469,8 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
// check if stop
|
// check if stop
|
||||||
if m_state == .stop {
|
if m_state == .stop {
|
||||||
// if we are in restore state ,do it and restart
|
// if we are in restore state ,do it and restart
|
||||||
if re.state_stack_index >= 0 {
|
//C.printf("re.state_stack_index %d\n",re.state_stack_index )
|
||||||
|
if re.state_stack_index >=0 && re.state_stack[re.state_stack_index].pc >= 0 {
|
||||||
i = re.state_stack[re.state_stack_index].i
|
i = re.state_stack[re.state_stack_index].i
|
||||||
pc = re.state_stack[re.state_stack_index].pc
|
pc = re.state_stack[re.state_stack_index].pc
|
||||||
state.match_index = re.state_stack[re.state_stack_index].mi
|
state.match_index = re.state_stack[re.state_stack_index].mi
|
||||||
|
@ -1499,14 +1494,24 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
// program end
|
// program end
|
||||||
if ist == IST_PROG_END {
|
if ist == IST_PROG_END {
|
||||||
// if we are in match exit well
|
// if we are in match exit well
|
||||||
|
|
||||||
if group_index >= 0 && state.match_index >= 0 {
|
if group_index >= 0 && state.match_index >= 0 {
|
||||||
group_index = -1
|
group_index = -1
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// we have a DOT MATCH on going
|
||||||
|
//C.printf("IST_PROG_END l_ist: %08x\n", l_ist)
|
||||||
|
if re.state_stack_index>=0 && l_ist == IST_DOT_CHAR {
|
||||||
m_state = .stop
|
m_state = .stop
|
||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
|
|
||||||
|
re.state_stack_index = -1
|
||||||
|
m_state = .stop
|
||||||
|
continue
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
// check GROUP start, no quantifier is checkd for this token!!
|
// check GROUP start, no quantifier is checkd for this token!!
|
||||||
else if ist == IST_GROUP_START {
|
else if ist == IST_GROUP_START {
|
||||||
group_index++
|
group_index++
|
||||||
|
@ -1527,7 +1532,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
//C.printf("g.id: %d group_index: %d\n", re.prog[pc].group_id, group_index)
|
//C.printf("g.id: %d group_index: %d\n", re.prog[pc].group_id, group_index)
|
||||||
if group_index >= 0 {
|
if group_index >= 0 {
|
||||||
start_i := group_stack[group_index]
|
start_i := group_stack[group_index]
|
||||||
group_stack[group_index]=-1
|
//group_stack[group_index]=-1
|
||||||
|
|
||||||
// save group results
|
// save group results
|
||||||
g_index := re.prog[pc].group_id*2
|
g_index := re.prog[pc].group_id*2
|
||||||
|
@ -1537,6 +1542,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
re.groups[g_index] = 0
|
re.groups[g_index] = 0
|
||||||
}
|
}
|
||||||
re.groups[g_index+1] = i
|
re.groups[g_index+1] = i
|
||||||
|
//C.printf("GROUP %d END [%d, %d]\n", re.prog[pc].group_id, re.groups[g_index], re.groups[g_index+1])
|
||||||
}
|
}
|
||||||
|
|
||||||
re.prog[pc].group_rep++ // increase repetitions
|
re.prog[pc].group_rep++ // increase repetitions
|
||||||
|
@ -1568,6 +1574,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
else if ist == IST_DOT_CHAR {
|
else if ist == IST_DOT_CHAR {
|
||||||
//C.printf("IST_DOT_CHAR rep: %d\n", re.prog[pc].rep)
|
//C.printf("IST_DOT_CHAR rep: %d\n", re.prog[pc].rep)
|
||||||
state.match_flag = true
|
state.match_flag = true
|
||||||
|
l_ist = u32(IST_DOT_CHAR)
|
||||||
|
|
||||||
if first_match < 0 {
|
if first_match < 0 {
|
||||||
first_match = i
|
first_match = i
|
||||||
|
@ -1575,12 +1582,23 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
state.match_index = i
|
state.match_index = i
|
||||||
re.prog[pc].rep++
|
re.prog[pc].rep++
|
||||||
|
|
||||||
if re.prog[pc].rep == 1 {
|
//if re.prog[pc].rep >= re.prog[pc].rep_min && re.prog[pc].rep <= re.prog[pc].rep_max {
|
||||||
|
if re.prog[pc].rep >= 0 && re.prog[pc].rep <= re.prog[pc].rep_max {
|
||||||
|
//C.printf("DOT CHAR save state : %d\n", re.state_stack_index)
|
||||||
// save the state
|
// save the state
|
||||||
|
|
||||||
|
// manage first dot char
|
||||||
|
if re.state_stack_index < 0 {
|
||||||
re.state_stack_index++
|
re.state_stack_index++
|
||||||
|
}
|
||||||
|
|
||||||
re.state_stack[re.state_stack_index].pc = pc
|
re.state_stack[re.state_stack_index].pc = pc
|
||||||
re.state_stack[re.state_stack_index].mi = state.match_index
|
re.state_stack[re.state_stack_index].mi = state.match_index
|
||||||
re.state_stack[re.state_stack_index].group_stack_index = group_index
|
re.state_stack[re.state_stack_index].group_stack_index = group_index
|
||||||
|
} else {
|
||||||
|
re.state_stack[re.state_stack_index].pc = -1
|
||||||
|
re.state_stack[re.state_stack_index].mi = -1
|
||||||
|
re.state_stack[re.state_stack_index].group_stack_index = -1
|
||||||
}
|
}
|
||||||
|
|
||||||
if re.prog[pc].rep >= 1 && re.state_stack_index >= 0 {
|
if re.prog[pc].rep >= 1 && re.state_stack_index >= 0 {
|
||||||
|
@ -1590,19 +1608,11 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
// manage * and {0,} quantifier
|
// manage * and {0,} quantifier
|
||||||
if re.prog[pc].rep_min > 0 {
|
if re.prog[pc].rep_min > 0 {
|
||||||
i += char_len // next char
|
i += char_len // next char
|
||||||
|
l_ist = u32(IST_DOT_CHAR)
|
||||||
}
|
}
|
||||||
|
|
||||||
if re.prog[pc+1].ist != IST_GROUP_END {
|
|
||||||
m_state = .ist_next
|
m_state = .ist_next
|
||||||
continue
|
continue
|
||||||
}
|
|
||||||
// IST_DOT_CHAR is the last instruction, get all
|
|
||||||
else {
|
|
||||||
//C.printf("We are the last one!\n")
|
|
||||||
pc--
|
|
||||||
m_state = .ist_next_ks
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -1622,6 +1632,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
|
|
||||||
if cc_res {
|
if cc_res {
|
||||||
state.match_flag = true
|
state.match_flag = true
|
||||||
|
l_ist = u32(IST_CHAR_CLASS_POS)
|
||||||
|
|
||||||
if first_match < 0 {
|
if first_match < 0 {
|
||||||
first_match = i
|
first_match = i
|
||||||
|
@ -1645,6 +1656,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
//C.printf("BSLS in_ch: %c res: %d\n", ch, tmp_res)
|
//C.printf("BSLS in_ch: %c res: %d\n", ch, tmp_res)
|
||||||
if tmp_res {
|
if tmp_res {
|
||||||
state.match_flag = true
|
state.match_flag = true
|
||||||
|
l_ist = u32(IST_BSLS_CHAR)
|
||||||
|
|
||||||
if first_match < 0 {
|
if first_match < 0 {
|
||||||
first_match = i
|
first_match = i
|
||||||
|
@ -1669,6 +1681,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
if re.prog[pc].ch == ch
|
if re.prog[pc].ch == ch
|
||||||
{
|
{
|
||||||
state.match_flag = true
|
state.match_flag = true
|
||||||
|
l_ist = u32(IST_SIMPLE_CHAR)
|
||||||
|
|
||||||
if first_match < 0 {
|
if first_match < 0 {
|
||||||
first_match = i
|
first_match = i
|
||||||
|
@ -1857,7 +1870,7 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
}
|
}
|
||||||
|
|
||||||
// no other options
|
// no other options
|
||||||
//C.printf("NO_MATCH_FOUND\n")
|
//C.printf("ist_quant_n NO_MATCH_FOUND\n")
|
||||||
result = NO_MATCH_FOUND
|
result = NO_MATCH_FOUND
|
||||||
m_state = .stop
|
m_state = .stop
|
||||||
continue
|
continue
|
||||||
|
@ -1873,12 +1886,6 @@ pub fn (re mut RE) match_base(in_txt byteptr, in_txt_len int ) (int,int) {
|
||||||
|
|
||||||
rep := re.prog[pc].rep
|
rep := re.prog[pc].rep
|
||||||
|
|
||||||
// clear the actual dot char capture state
|
|
||||||
if re.state_stack_index >= 0 {
|
|
||||||
//C.printf("Drop the DOT_CHAR state!\n")
|
|
||||||
re.state_stack_index--
|
|
||||||
}
|
|
||||||
|
|
||||||
// under range
|
// under range
|
||||||
if rep > 0 && rep < re.prog[pc].rep_min {
|
if rep > 0 && rep < re.prog[pc].rep_min {
|
||||||
//C.printf("ist_quant_p UNDER RANGE\n")
|
//C.printf("ist_quant_p UNDER RANGE\n")
|
||||||
|
|
|
@ -34,14 +34,12 @@ match_test_suite = [
|
||||||
TestItem{"this is a good sample.",r"( ?\w+){,5}",0,21},
|
TestItem{"this is a good sample.",r"( ?\w+){,5}",0,21},
|
||||||
TestItem{"this is a good sample.",r"( ?\w+){2,3}",0,9},
|
TestItem{"this is a good sample.",r"( ?\w+){2,3}",0,9},
|
||||||
TestItem{"this is a good sample.",r"(\s?\w+){2,3}",0,9},
|
TestItem{"this is a good sample.",r"(\s?\w+){2,3}",0,9},
|
||||||
TestItem{"this is a good sample.",r".*i(\w)+",0,4},
|
|
||||||
TestItem{"this these those.",r"(th[ei]se?\s|\.)+",0,11},
|
TestItem{"this these those.",r"(th[ei]se?\s|\.)+",0,11},
|
||||||
TestItem{"this these those ",r"(th[eio]se? ?)+",0,17},
|
TestItem{"this these those ",r"(th[eio]se? ?)+",0,17},
|
||||||
TestItem{"this these those ",r"(th[eio]se? )+",0,17},
|
TestItem{"this these those ",r"(th[eio]se? )+",0,17},
|
||||||
TestItem{"this,these,those. over",r"(th[eio]se?[,. ])+",0,17},
|
TestItem{"this,these,those. over",r"(th[eio]se?[,. ])+",0,17},
|
||||||
TestItem{"soday,this,these,those. over",r"(th[eio]se?[,. ])+",6,23},
|
TestItem{"soday,this,these,those. over",r"(th[eio]se?[,. ])+",6,23},
|
||||||
TestItem{"soday,this,these,those. over",r".*,(th[eio]se?[,. ])+",0,23},
|
|
||||||
TestItem{"soday,this,these,thesa.thesi over",r".*,(th[ei]se?[,. ])+(thes[ai][,. ])+",0,29},
|
|
||||||
TestItem{"cpapaz",r"(c(pa)+z)",0,6},
|
TestItem{"cpapaz",r"(c(pa)+z)",0,6},
|
||||||
TestItem{"this is a cpapaz over",r"(c(pa)+z)",10,16},
|
TestItem{"this is a cpapaz over",r"(c(pa)+z)",10,16},
|
||||||
TestItem{"this is a cpapapez over",r"(c(p[ae])+z)",10,18},
|
TestItem{"this is a cpapapez over",r"(c(p[ae])+z)",10,18},
|
||||||
|
@ -56,16 +54,23 @@ match_test_suite = [
|
||||||
TestItem{"this cpapaz adce aabe",r"(c(pa)+z)(\s[\a]+){2}",5,21},
|
TestItem{"this cpapaz adce aabe",r"(c(pa)+z)(\s[\a]+){2}",5,21},
|
||||||
TestItem{"1234this cpapaz adce aabe",r"(c(pa)+z)(\s[\a]+){2}$",9,25},
|
TestItem{"1234this cpapaz adce aabe",r"(c(pa)+z)(\s[\a]+){2}$",9,25},
|
||||||
TestItem{"this cpapaz adce aabe third",r"(c(pa)+z)(\s[\a]+){2}",5,21},
|
TestItem{"this cpapaz adce aabe third",r"(c(pa)+z)(\s[\a]+){2}",5,21},
|
||||||
|
TestItem{"123cpapaz ole. pippo",r"(c(pa)+z)(\s+\a+[\.,]?)+",3,20},
|
||||||
|
|
||||||
|
TestItem{"this is a good sample.",r".*i(\w)+",0,4},
|
||||||
|
TestItem{"soday,this,these,those. over",r".*,(th[eio]se?[,. ])+",0,23},
|
||||||
|
TestItem{"soday,this,these,thesa.thesi over",r".*,(th[ei]se?[,. ])+(thes[ai][,. ])+",0,29},
|
||||||
TestItem{"cpapaz ole. pippo,",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,18},
|
TestItem{"cpapaz ole. pippo,",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,18},
|
||||||
TestItem{"cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,17},
|
TestItem{"cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,17},
|
||||||
TestItem{"cpapaz ole. pippo, 852",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,18},
|
TestItem{"cpapaz ole. pippo, 852",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,18},
|
||||||
TestItem{"123cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,20},
|
TestItem{"123cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,20},
|
||||||
TestItem{"...cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,20},
|
TestItem{"...cpapaz ole. pippo",r".*(c(pa)+z)(\s+\a+[\.,]?)+",0,20},
|
||||||
TestItem{"123cpapaz ole. pippo",r"(c(pa)+z)(\s+\a+[\.,]?)+",3,20},
|
|
||||||
TestItem{"cpapaz ole. pippo,",r".*c.+ole.*pi",0,14},
|
TestItem{"cpapaz ole. pippo,",r".*c.+ole.*pi",0,14},
|
||||||
TestItem{"cpapaz ole. pipipo,",r".*c.+ole.*p([ip])+o",0,18},
|
TestItem{"cpapaz ole. pipipo,",r".*c.+ole.*p([ip])+o",0,18},
|
||||||
TestItem{"cpapaz ole. pipipo",r"^.*c.+ol?e.*p([ip])+o$",0,18},
|
TestItem{"cpapaz ole. pipipo",r"^.*c.+ol?e.*p([ip])+o$",0,18},
|
||||||
TestItem{"abbb",r"ab{2,3}?",0,3},
|
TestItem{"abbb",r"ab{2,3}?",0,3},
|
||||||
|
TestItem{" pippo pera",r"\s(.*)pe(.*)",0,11},
|
||||||
|
TestItem{" abb",r"\s(.*)",0,4},
|
||||||
|
|
||||||
// negative
|
// negative
|
||||||
TestItem{"zthis ciao",r"((t[hieo]+se?)\s*)+",-1,0},
|
TestItem{"zthis ciao",r"((t[hieo]+se?)\s*)+",-1,0},
|
||||||
|
|
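For readers unfamiliar with the test table, each `TestItem` row appears to read as (source text, query, expected start, expected end), with `-1,0` marking an expected non-match; that reading is inferred from the values, not stated in the diff. The Python sketch below shows the table-driven idea using Python's own `re.search` as a stand-in engine. It only includes rows whose syntax both engines share, and agreement on these rows says nothing about the other rows (for example those using `\a`, or the negative case).

```python
import re

# Rows copied from the suite above, read as (text, query, start, end).
ROWS = [
    ("this is a good sample.", r"( ?\w+){2,3}", 0, 9),
    ("this these those ", r"(th[eio]se? )+", 0, 17),
    ("this is a cpapaz over", r"(c(pa)+z)", 10, 16),
    ("abbb", r"ab{2,3}?", 0, 3),
]


def run_rows(rows):
    """Return the rows whose (start, end) differs from the expected pair."""
    failures = []
    for text, query, start, end in rows:
        m = re.search(query, text)
        # Mirror the suite's convention of (-1, 0) for "no match".
        got = (m.start(), m.end()) if m else (-1, 0)
        if got != (start, end):
            failures.append((text, query, got))
    return failures
```

An empty list from `run_rows(ROWS)` means every listed row produced the expected span in the stand-in engine.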