Tuesday, 20 August 2013

Parenthesized repetitions in Python regular expressions

I have the following string (say the variable is named "str"):
'(((TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)) (TRAIN (0 1
2 3 6 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31
32 34 35 36 37 39 40 41 42 43 44 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
85 86 87 88 89 90 91 92 94 95 96 97 98 99 100 102 103 105 106 107 109 110
111 112 114 115 117 118 119 120 121 122 123 124 125 126 127 128 129 130
131 132 133 134 136 137 138 139 140 141 142 143 144 145 147 149 150 151)))
((TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)) (TRAIN (0 1 2 3
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25 26 27 28 29 30 31
32 33 34 36 37 38 39 40 41 42 43 44 45 49 50 51 52 53 54 55 57 58 60 62 63
64 66 67 68 70 72 73 74 75 76 77 78 79 80 81 82 83 85 86 87 88 89 90 91 92
93 94 95 96 97 98 99 100 101 102 103 104 106 108 109 110 111 112 113 114
115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 131 132 133
134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
151)))'
from which I would like to get
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)', 'TEST (19 35
46 47 48 56 59 61 65 69 71 84 105 107 130)']
using the re.findall() function in Python.
I tried the following:
m = re.findall(r'TEST\s\((\d+\s?)*\)', str)
for which I get the result
['148', '130']
which contains only the last number from each set of numbers I want. I
don't understand why my regexp gives this result. Can someone please help
me fix it?
Thanks!
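
For reference, here is a minimal sketch of what is probably going on, and one possible fix. When a pattern has exactly one capturing group, re.findall() returns what that group matched rather than the whole match. When the group is repeated, only its last repetition is kept. Making the group non-capturing with (?:...) should return the whole match instead. The sample string below is a shortened, hypothetical stand-in for the one in the question, and the variable is named "text" rather than "str" so that it does not shadow the built-in.

```python
import re

# Shortened, hypothetical stand-in for the string in the question.
text = "(((TEST (4 5 17)) (TRAIN (0 1 2 3))) ((TEST (19 35 46)) (TRAIN (0 1 2))))"

# Original pattern: the single capturing group is repeated, so findall()
# returns only the group's last repetition for each match.
original = re.findall(r'TEST\s\((\d+\s?)*\)', text)

# Possible fix: make the repeated group non-capturing, so there are no
# capturing groups and findall() returns the whole match.
fixed = re.findall(r'TEST\s\((?:\d+\s?)*\)', text)

print(original)  # ['17', '46']
print(fixed)     # ['TEST (4 5 17)', 'TEST (19 35 46)']
```

An alternative under the same assumptions is to keep the original pattern and use re.finditer(), reading m.group(0) from each match object.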
