Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>This turned out to be rather complex and I can't say if my solution works for any document. The main issue is as indicated in the Question, that RegexMatch.FirstIndex can not be used to determine were the actually Match is within the MS Word Document. This is due to the fact that regex matching is done on range.Text property (String) and that string just contains different amount of characters than the range object does and hence Indexes don't match.</p> <p>So my solution is for each match, I do a Find in the whole document for that match. the find methods gives a Range object from which the correct page can be determined.</p> <p>In my special case a match could be the same thing also different value. Example: <code>343</code>in my case would be the same as <code>Prefix-343</code>. A second issue was that the matches must be sorted eg <code>123</code>before <code>324</code>regardless which one occurs first in the document.</p> <p>If you require the Sort Functionality you will also need the following to "modules":</p> <p>SortDictionary Function:</p> <p><a href="http://www.cpearson.com/excel/CollectionsAndDictionaries.htm" rel="nofollow noreferrer">http://www.cpearson.com/excel/CollectionsAndDictionaries.htm</a></p> <p>Module "modQSortInPlace":</p> <p><a href="http://www.cpearson.com/Zips/modQSortInPlace.zip" rel="nofollow noreferrer">http://www.cpearson.com/Zips/modQSortInPlace.zip</a></p> <p>If no sort is needed you don't need them but you need to remove the according function call <code>SortDictionary Dict, True</code>from my code.</p> <p>Now to my code. Soem parts you can remove, especially the formatting one. This is specific to my case. Also if your match is "unique", eg. not prefix or so you can simplify the code too. You will need to reference the "Microsoft Scripting Library".</p> <pre><code>Option Explicit Sub ExtractRNumbers() Dim Dict As Scripting.Dictionary Set Dict = CreateObject("Scripting.dictionary") Dim regExp, Match, Matches Dim rNumber As String Dim range As range Set regExp = CreateObject("VBScript.RegExp") regExp.Pattern = "\b(R-)?\d{2}-\d{4,5}(-\d)?\b" regExp.IgnoreCase = False regExp.Global = True ' determine main section, only extract R-Numbers from main section ' and not the Table of contents as example ' main section = section with most characters Dim section As section Dim maxSectionSize As Long Dim sectionSize As Long Dim sectionIndex As Integer Dim currentIndex As Integer maxSectionSize = 0 currentIndex = 1 For Each section In ActiveDocument.Sections sectionSize = Len(section.range.text) If sectionSize &gt; maxSectionSize Then maxSectionSize = sectionSize sectionIndex = currentIndex End If currentIndex = currentIndex + 1 Next Set Matches = regExp.Execute(ActiveDocument.Sections(sectionIndex).range.text) For Each Match In Matches ' If the Document contains Tables, ActiveDocument.range.Text will contain ' BEL charachters (chr(7)) that probably define the table structure. The issue ' is that then Match.FirstIndex does not point to the actual first charachter ' of a Match in the Document. ' Also there are other things (unknwon) that lead to the same issue, eg. ' Match.FirstIndex can not be used to find the actual "matching word" within the ' document. Because of that below commented apporach does not work on a generic document ' Set range = ActiveDocument.range(Match.FirstIndex, Match.FirstIndex + Len(Match.Value)) ' page = range.Information(wdActiveEndAdjustedPageNumber) ' Maybe there is a simpler solution but this works more or less ' the exception beign tables again. see http://support.microsoft.com/kb/274003 ' After a match is found the whole document is searched using the find method. ' For each find result the page number is put into an array (if it is not in the array yet) ' Then the match is formatted properly. ' After formatting, it is checked if the match was previously already found ' ' If not, we add a new entry to the dictionary (key = formatted match, value = array of page numbers) ' ' If match was already found before (but potentially in a different format! eg R-87-1000 vs 87-1000 as example), ' all additional pages are added to the already found pages. Set range = ActiveDocument.Sections(sectionIndex).range With range.Find .text = Match.Value .MatchWholeWord = True .MatchCase = True .Wrap = wdFindStop End With Dim page As Variant Dim pages() As Integer Dim index As Integer index = 0 ReDim pages(0) Do While range.Find.Execute() = True page = range.Information(wdActiveEndAdjustedPageNumber) If Not IsInArray(page, pages) Then ReDim Preserve pages(index) pages(index) = page index = index + 1 End If Loop ' FORMAT TO PROPER R-NUMBER: This is specific to my case rNumber = Match.Value If Not rNumber Like "R-*" Then rNumber = "R-" &amp; rNumber End If ' remove possible batch number as r-number If Len(rNumber) &gt; 11 Then rNumber = Left(rNumber, Len(rNumber) - 2) End If ' END FORMAT If Not Dict.Exists(rNumber) Then Dict.Add rNumber, pages Else Dim existingPages() As Integer existingPages = Dict(rNumber) For Each page In pages If Not IsInArray(page, existingPages) Then ' add additonal pages. this means that the previous match ' was formatted different, eg R-87-1000 vs 87-1000 as example ReDim Preserve existingPages(UBound(existingPages) + 1) existingPages(UBound(existingPages)) = page Dict(rNumber) = existingPages End If Next End If Next 'sort dictionary by key (R-Number) SortDictionary Dict, True Dim fso Set fso = CreateObject("Scripting.FileSystemObject") Dim stream ' Create a TextStream. Set stream = fso.CreateTextFile(ActiveDocument.Path &amp; "\" &amp; ActiveDocument.Name &amp; "-rNumbers.txt", True) Dim key As Variant Dim output As String Dim i As Integer For Each key In Dict.Keys() output = key &amp; vbTab pages = Dict(key) For i = LBound(pages) To UBound(pages) output = output &amp; pages(i) &amp; ", " Next output = Left(output, Len(output) - 2) stream.WriteLine output Next Set Dict = Nothing stream.Close End Sub Private Function IsInArray(page As Variant, pages As Variant) As Boolean Dim i As Integer IsInArray = False For i = LBound(pages) To UBound(pages) If pages(i) = page Then IsInArray = True Exit For End If Next End Function </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload