regex guru required
On 09/11/17 17:55, Dave Liquorice wrote:
Given the string:
<td class="l"><span title="Word_1">Word_1</span></td><td
class="l"><span title=""></span></td><td class="l">Word_3</td><td
class="l" style="color: green;">Word_4</td>
What regex magic for PHP's preg_match_all can extract just
the text of the "Word_n" fields, *including* the empty Word_2? That is,
I want a list or four variables filled with Word_1, Word_2, Word_3
and Word_4 even when a field is empty. The "words" change and so does
the color. The actual string is longer but all subsequent fields
follow the same format as Word_3.
Just dumping everything between < and > or collecting everything
between > and < doesn't work, as there are effectively empty matches
between adjacent tags. So you end up with:
$1 = ""
$2 = "Word_1"
$3 = ""
$4 = ""
$5 = ""
$6 = "" (This would be "Word_2" if it wasn't empty)
$7 = ""
$8 = "Word_3"
$9 = ""
$10 = "Word_4"
Rather than:
$1 = "Word_1"
$2 = "" (This would be "Word_2" if it wasn't empty)
$3 = "Word_3"
$4 = "Word_4"
Reliably finding the end of each word is easy with: >(.*?)<\/[s|t]
Finding the beginning is what I'm stuck on.
\">(.*?)<\/[s|t] fails as it leaves the span title tag in place.
\">([^<].*?)<\/[s|t] fails as it strips the empty Word_2.
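A quick check of the last two attempts may help show where they go wrong. This is only a sketch: it assumes the patterns begin with `">` and end with `<\/[s|t]`, and that the sample row carries its usual angle brackets. The variable names are made up for illustration.

```php
<?php
// Sample row from the question, angle brackets assumed.
$html = '<td class="l"><span title="Word_1">Word_1</span></td>'
      . '<td class="l"><span title=""></span></td>'
      . '<td class="l">Word_3</td>'
      . '<td class="l" style="color: green;">Word_4</td>';

// Second attempt: anchoring on "> starts the capture at the end of the
// <td ...> tag, so the <span ...> tag ends up inside the capture.
preg_match_all('/">(.*?)<\/[s|t]/', $html, $m2);

// Third attempt: [^<] demands at least one non-"<" character, so the
// cell with nothing in it can never match and drops out of the list.
preg_match_all('/">([^<].*?)<\/[s|t]/', $html, $m3);

print_r($m2[1]); // first entry still contains the span tag
print_r($m3[1]); // Word_1, Word_3, Word_4 (the empty field is gone)
```

Note that `[s|t]` is a character class, so it also accepts a literal `|`; `[st]` would express the same intent.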
Do it in stages.
Find what is between <td> and </td> first.
Then eliminate anything between < and >.
What's left, if anything, will be the wanted words.
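
The stages above could be sketched in PHP roughly as follows. This is one possible reading of the suggestion, not a tested production solution: the function name `extract_cells` is hypothetical, and the sample row assumes the usual angle brackets around each tag.

```php
<?php
// Hypothetical helper illustrating the staged approach.
function extract_cells($html)
{
    // Stage 1: capture whatever sits between each <td ...> and its </td>.
    // The /s flag lets "." match newlines in case a cell wraps.
    preg_match_all('/<td\b[^>]*>(.*?)<\/td>/s', $html, $matches);

    // Stage 2: remove anything between < and > inside each cell.
    // A cell with no text stays in the list as an empty string.
    return array_map(function ($cell) {
        return trim(preg_replace('/<[^>]*>/', '', $cell));
    }, $matches[1]);
}

// Sample row from the question, angle brackets assumed.
$html = '<td class="l"><span title="Word_1">Word_1</span></td>'
      . '<td class="l"><span title=""></span></td>'
      . '<td class="l">Word_3</td>'
      . '<td class="l" style="color: green;">Word_4</td>';

print_r(extract_cells($html));
```

Because stage 1 yields exactly one entry per cell, the empty Word_2 keeps its position in the result. PHP's built-in strip_tags() could stand in for the stage 2 preg_replace if preferred.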
--
Canada is all right really, though not for the whole weekend.
"Saki"