Posted to uk.d-i-y
The Natural Philosopher[_2_]
Subject: regex guru required

On 09/11/17 17:55, Dave Liquorice wrote:
Given the string:

<td class="l"><span title="Word_1">Word_1</span></td><td
class="l"><span title=""></span></td><td class="l">Word_3</td><td
class="l" style="color: green;">Word_4</td>

What regex magic for PHP's preg_match_all can extract just
the text of the "Word_n" fields, *including* the empty Word_2? That is,
I want a list or four variables filled with Word_1, Word_2, Word_3
and Word_4 even when a field is empty. The "words" change and so does
the color. The actual string is longer but all subsequent fields
follow the same format as Word_3.

Just dumping everything between > and <, or collecting everything
between one tag and the next, doesn't work as there are effectively
empty matches between adjacent tags. So you end up with:

$1 = ""
$2 = "Word_1"
$3 = ""
$4 = ""
$5 = ""
$6 = "" (This would be "Word_2" if it wasn't empty)
$7 = ""
$8 = "Word_3"
$9 = ""
$10 = "Word_4"

Rather than:

$1 = "Word_1"
$2 = "" (This would be "Word_2" if it wasn't empty)
$3 = "Word_3"
$4 = "Word_4"

Reliably finding the end of each word is easy with: (.*?)<\/[s|t]

Finding the beginning is what I'm stuck on.

\">(.*?)<\/[s|t] fails as it leaves the span title tag in place.

\">([^<].*?)<\/[s|t] fails as it strips the empty Word_2.

Do it in stages.

Find what is between <td> and </td> first.

Then eliminate anything between < and >.

What's left, if anything, will be the wanted words.
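
The stages above could be sketched in PHP roughly as follows. This is only an illustration: it assumes the quoted string is ordinary HTML whose angle brackets were lost in posting, and the patterns and the names $html, $cells and $words are made up for the example rather than taken from the thread.

```php
<?php
// Assumption: the sample string as it would look with its angle brackets intact.
$html = '<td class="l"><span title="Word_1">Word_1</span></td><td class="l"><span title=""></span></td><td class="l">Word_3</td><td class="l" style="color: green;">Word_4</td>';

// Stage 1: capture whatever sits between each <td ...> and its </td>.
// The s modifier lets . cross newlines in case the real string is wrapped.
preg_match_all('~<td\b[^>]*>(.*?)</td>~s', $html, $cells);

// Stage 2: eliminate anything between < and > inside each captured cell.
// An empty cell stays in the list as an empty string, because the number
// of entries was already fixed by stage 1.
$words = array_map(
    function ($cell) {
        return trim(preg_replace('~<[^>]*>~', '', $cell));
    },
    $cells[1]
);

print_r($words);
```

With the sample string this should leave four entries in $words: Word_1, an empty string, Word_3 and Word_4. Because the cells are counted before any tags are stripped, the empty second field keeps its place in the list.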



--
Canada is all right really, though not for the whole weekend.

"Saki"