1111 * with the main goal of pulling out information from those matches, or replacing
1212 * them with something else.
1313 *
14- * There are four classes and three objects, with most of them being members of
15- * Regex companion object. [[scala.util.matching.Regex]] is the class users instantiate
16- * to do regular expression matching.
14+ * [[scala.util.matching.Regex]] is the class users instantiate to do regular expression matching.
1715 *
18- * The remaining classes and objects in the package are used in the following way:
19- *
20- * * The companion object to [[scala.util.matching.Regex]] just contains the other members.
16+ * The companion object to [[scala.util.matching.Regex]] contains supporting members:
2117 * * [[scala.util.matching.Regex.Match]] makes more information about a match available.
22- * * [[scala.util.matching.Regex.MatchIterator]] is used to iterate over multiple matches.
18+ * * [[scala.util.matching.Regex.MatchIterator]] is used to iterate over matched strings.
2319 * * [[scala.util.matching.Regex.MatchData]] is just a base trait for the above classes.
2420 * * [[scala.util.matching.Regex.Groups]] extracts groups from a [[scala.util.matching.Regex.Match]]
2521 * without recomputing the match.
26- * * [[scala.util.matching.Regex.Match]] converts a [[scala.util.matching.Regex.Match]]
27- * into a [[java.lang.String]].
28- *
2922 */
3023package scala.util.matching
3124
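The package overview above names `Regex.Match` and `Regex.Groups` without showing them together. The following is a minimal sketch, not part of this change, of how the two are typically combined; the pattern, the input text, and the object and method names are hypothetical and chosen only for illustration.

```scala
import scala.util.matching.Regex

object GroupsSketch {
  // Hypothetical pattern and input, chosen only for illustration.
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r
  val text = "Released 2004-01-20, revised 2010-10-06"

  // A Match carries the matched text, its position, and its groups.
  def describeFirst: Option[String] =
    date.findFirstMatchIn(text).map { m =>
      s"${m.matched} starts at ${m.start}, year ${m.group(1)}"
    }

  // Regex.Groups destructures an existing Match without rerunning the regex.
  def years: List[String] =
    date.findAllMatchIn(text).map { case Regex.Groups(y, _, _) => y }.toList

  def main(args: Array[String]): Unit = {
    describeFirst.foreach(println)
    println(years)
  }
}
```

Using `Regex.Groups` here avoids matching each substring a second time, which is the point the overview makes about "without recomputing the match".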
@@ -35,6 +28,7 @@ import java.util.regex.{ Pattern, Matcher }
3528/** A regular expression is used to determine whether a string matches a pattern
3629 * and, if it does, to extract or transform the parts that match.
3730 *
31+ * === Usage ===
3832 * This class delegates to the [[java.util.regex]] package of the Java Platform.
3933 * See the documentation for [[java.util.regex.Pattern]] for details about
4034 * the regular expression syntax for pattern strings.
@@ -53,6 +47,7 @@ import java.util.regex.{ Pattern, Matcher }
5347 * Since escapes are not processed in multi-line string literals, using triple quotes
5448 * avoids having to escape the backslash character, so that `"\\d"` can be written `"""\d"""`.
5549 *
50+ * === Extraction ===
5651 * To extract the capturing groups when a `Regex` is matched, use it as
5752 * an extractor in a pattern match:
5853 *
@@ -92,48 +87,68 @@ import java.util.regex.{ Pattern, Matcher }
9287 * }
9388 * }}}
9489 *
90+ * === Find Matches ===
9591 * To find or replace matches of the pattern, use the various find and replace methods.
96- * There is a flavor of each method that produces matched strings and
97- * another that produces `Match` objects.
92+ * For each method, there is a version for working with matched strings and
93+ * another for working with `Match` objects.
9894 *
9995 * For example, pattern matching with an unanchored `Regex`, as in the previous example,
100- * is the same as using `findFirstMatchIn`, except that the findFirst methods return an `Option`,
101- * or `None` for no match:
96+ * can also be accomplished using `findFirstMatchIn`. The `findFirst` methods return an `Option`
97+ * which is non-empty if a match is found, or `None` for no match:
10298 *
10399 * {{{
104100 * val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"
105- * val firstDate = date findFirstIn dates getOrElse "No date found."
106- * val firstYear = for (m <- date findFirstMatchIn dates) yield m group 1
101+ * val firstDate = date.findFirstIn(dates).getOrElse("No date found.")
102+ * val firstYear = for (m <- date.findFirstMatchIn(dates)) yield m.group(1)
107103 * }}}
108104 *
109105 * To find all matches:
110106 *
111107 * {{{
112- * val allYears = for (m <- date findAllMatchIn dates) yield m group 1
108+ * val allYears = for (m <- date.findAllMatchIn(dates)) yield m.group(1)
113109 * }}}
114110 *
115- * But `findAllIn` returns a special iterator of strings that can be queried for the `MatchData`
116- * of the last match:
111+ * To iterate over the matched strings, use `findAllIn`, which returns a special iterator
112+ * that can be queried for the `MatchData` of the last match:
117113 *
118114 * {{{
119- * val mi = date findAllIn dates
120- * val oldies = mi filter (_ => (mi group 1).toInt < 1960) map (s => s"$s: An oldie but goodie.")
115+ * val mi = date.findAllIn(dates)
116+ * while (mi.hasNext) {
117+ * val d = mi.next(); if (mi.group(1).toInt < 1960) println(s"$d: An oldie but goodie.")
118+ * }
121119 * }}}
122120 *
123121 * Note that `findAllIn` finds matches that don't overlap. (See [[findAllIn]] for more examples.)
124122 *
125123 * {{{
126124 * val num = """(\d+)""".r
127- * val all = (num findAllIn "123").toList // List("123"), not List("123", "23", "3")
125+ * val all = num.findAllIn("123").toList // List("123"), not List("123", "23", "3")
126+ * }}}
127+ *
128+ * Also, the "current match" of a `MatchIterator` may be advanced by either `hasNext` or `next`.
129+ * By comparison, the `Iterator[Match]` returned by `findAllMatchIn` or `findAllIn.matchData`
130+ * produces `Match` objects that remain valid after the iterator is advanced.
131+ *
132+ * {{{
133+ * val ns = num.findAllIn("1 2 3")
134+ * ns.next() // "1"
135+ * ns.hasNext // true
136+ * ns.start // 2
137+ * val ms = num.findAllMatchIn("1 2 3")
138+ * val m = ms.next()
139+ * m.start // 0
140+ * ms.hasNext // true
141+ * m.start // still 0
128142 * }}}
129143 *
144+ * === Replace Text ===
130145 * Text replacement can be performed unconditionally or as a function of the current match:
131146 *
132147 * {{{
133- * val redacted = date replaceAllIn (dates, "XXXX-XX-XX")
134- * val yearsOnly = date replaceAllIn (dates, m => m group 1)
135- * val months = (0 to 11) map { i => val c = Calendar.getInstance; c.set(2014, i, 1); f"$c%tb" }
136- * val reformatted = date replaceAllIn (dates, _ match { case date(y,m,d) => f"${months(m.toInt - 1)} $d, $y" })
148+ * val redacted = date.replaceAllIn(dates, "XXXX-XX-XX")
149+ * val yearsOnly = date.replaceAllIn(dates, m => m.group(1))
150+ * val months = (0 to 11).map { i => val c = Calendar.getInstance; c.set(2014, i, 1); f"$c%tb" }
151+ * val reformatted = date.replaceAllIn(dates, _ match { case date(y,m,d) => f"${months(m.toInt - 1)} $d, $y" })
137152 * }}}
138153 *
139154 * Pattern matching the `Match` against the `Regex` that created it does not reapply the `Regex`.
@@ -142,7 +157,7 @@ import java.util.regex.{ Pattern, Matcher }
142157 *
143158 * {{{
144159 * val docSpree = """2011(?:-\d{2}){2}""".r
145- * val docView = date replaceAllIn (dates, _ match {
160+ * val docView = date.replaceAllIn(dates, _ match {
146161 * case docSpree() => "Historic doc spree!"
147162 * case _ => "Something else happened"
148163 * })
@@ -338,22 +353,22 @@ class Regex private[matching](val pattern: Pattern, groupNames: String*) extends
338353 * {{{
339354 * val hat = "hat[^a]+".r
340355 * val hathaway = "hathatthattthatttt"
341- * val hats = (hat findAllIn hathaway).toList // List(hath, hattth)
342- * val pos = (hat findAllMatchIn hathaway map (_.start)).toList // List(0, 7)
356+ * val hats = hat.findAllIn(hathaway).toList // List(hath, hattth)
357+ * val pos = hat.findAllMatchIn(hathaway).map(_.start).toList // List(0, 7)
343358 * }}}
344359 *
345360 * To return overlapping matches, it is possible to formulate a regular expression
346361 * with lookahead (`?=`) that does not consume the overlapping region.
347362 *
348363 * {{{
349364 * val madhatter = "(h)(?=(at[^a]+))".r
350- * val madhats = (madhatter findAllMatchIn hathaway map {
365+ * val madhats = madhatter.findAllMatchIn(hathaway).map {
351366 * case madhatter(x,y) => s"$x$y"
352- * }).toList // List(hath, hatth, hattth, hatttt)
367+ * }.toList // List(hath, hatth, hattth, hatttt)
353368 * }}}
354369 *
355- * Attempting to retrieve match information before performing the first match
356- * or after exhausting the iterator results in [[java.lang.IllegalStateException]].
370+ * Attempting to retrieve match information after exhausting the iterator
371+ * results in [[java.lang.IllegalStateException]].
357372 * See [[scala.util.matching.Regex.MatchIterator]] for details.
358373 *
359374 * @param source The text to match against.
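The revised `findAllIn` documentation says that querying match data after the iterator is exhausted results in `IllegalStateException`. A minimal sketch of that behavior follows; it reuses the `hat` pattern and input from the Scaladoc example, while the object and method names are hypothetical.

```scala
import scala.util.Try

object ExhaustedIteratorSketch {
  val hat = "hat[^a]+".r

  // Drain the iterator, then ask for match data; the query is expected to fail.
  def startAfterExhaustion: Try[Int] = {
    val it = hat.findAllIn("hathatthattthatttt")
    while (it.hasNext) it.next()
    Try(it.start)
  }

  def main(args: Array[String]): Unit =
    println(startAfterExhaustion) // a Failure wrapping IllegalStateException
}
```

Wrapping the query in `Try` is only a convenience for the sketch; callers that need positions after iteration would more likely keep the `Match` objects from `findAllMatchIn`.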
@@ -743,49 +758,76 @@ object Regex {
743758
744759 /** A class to step through a sequence of regex matches.
745760 *
746- * All methods inherited from [[scala.util.matching.Regex.MatchData]] will throw
747- * a [[java.lang.IllegalStateException]] until the matcher is initialized. The
748- * matcher can be initialized by calling `hasNext` or `next()` or causing these
749- * methods to be called, such as by invoking `toString` or iterating through
750- * the iterator's elements.
761+ * This is an iterator that returns the matched strings.
762+ *
763+ * Queries about match data pertain to the current state of the underlying
764+ * matcher, which is advanced by calling `hasNext` or `next`.
765+ *
766+ * When matches are exhausted, queries about match data will throw
767+ * [[java.lang.IllegalStateException]].
751768 *
752769 * @see [[java.util.regex.Matcher]]
753770 */
754771 class MatchIterator(val source: CharSequence, val regex: Regex, val groupNames: Seq[String])
755772 extends AbstractIterator[String] with Iterator[String] with MatchData { self =>
756773
757774 protected[Regex] val matcher = regex.pattern.matcher(source)
758- private var nextSeen = false
759775
760- /** Is there another match? */
776+ // 0 = not yet matched, 1 = matched, 2 = advanced to match, 3 = no more matches
777+ private[this] var nextSeen = 0
778+
779+ /** Return true if `next` will find a match.
780+ * As a side effect, advance the underlying matcher if necessary;
781+ * queries about the current match data pertain to the underlying matcher.
782+ */
761783 def hasNext: Boolean = {
762- if (!nextSeen) nextSeen = matcher.find()
763- nextSeen
784+ nextSeen match {
785+ case 0 => nextSeen = if (matcher.find()) 1 else 3
786+ case 1 => ()
787+ case 2 => nextSeen = 0; hasNext
788+ case 3 => ()
789+ }
790+ nextSeen == 1 // otherwise, 3
764791 }
765792
766- /** The next matched substring of `source`. */
793+ /** The next matched substring of `source`.
794+ * As a side effect, advance the underlying matcher if necessary.
795+ */
767796 def next(): String = {
768- if (!hasNext) throw new NoSuchElementException
769- nextSeen = false
797+ nextSeen match {
798+ case 0 => if (!hasNext) throw new NoSuchElementException; next()
799+ case 1 => nextSeen = 2
800+ case 2 => nextSeen = 0; next()
801+ case 3 => throw new NoSuchElementException
802+ }
770803 matcher.group
771804 }
772805
806+ /** Report emptiness. */
773807 override def toString = super[AbstractIterator].toString
774808
809+ // ensure we're at a match
810+ private[this] def ensure(): Unit = nextSeen match {
811+ case 0 => if (!hasNext) throw new IllegalStateException
812+ case 1 => ()
813+ case 2 => ()
814+ case 3 => throw new IllegalStateException
816+
775817 /** The index of the first matched character. */
776- def start: Int = matcher.start
818+ def start: Int = { ensure(); matcher.start }
777819
778820 /** The index of the first matched character in group `i`. */
779- def start(i: Int): Int = matcher.start(i)
821+ def start(i: Int): Int = { ensure(); matcher.start(i) }
780822
781823 /** The index following the last matched character. */
782- def end: Int = matcher.end
824+ def end: Int = { ensure(); matcher.end }
783825
784826 /** The index following the last matched character in group `i`. */
785- def end(i: Int): Int = matcher.end(i)
827+ def end(i: Int): Int = { ensure(); matcher.end(i) }
786828
787829 /** The number of subgroups. */
788- def groupCount = matcher.groupCount
830+ def groupCount = { ensure(); matcher.groupCount }
789831
790832 /** Convert to an iterator that yields MatchData elements instead of Strings. */
791833 def matchData: Iterator[Match] = new AbstractIterator[Match] {
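The `MatchIterator` changes above tie match-data queries to the current state of the underlying matcher, which both `hasNext` and `next` can advance. The following is a minimal sketch of that difference from `findAllMatchIn`; it reuses the `num` pattern and input from the Scaladoc, and the object and method names are hypothetical.

```scala
object CurrentMatchSketch {
  val num = """(\d+)""".r

  // After next() has returned "1", hasNext looks ahead and moves the
  // underlying matcher to "2", so start now reports the position of "2".
  def iteratorStartAfterLookahead: Int = {
    val ns = num.findAllIn("1 2 3")
    val first = ns.next()  // "1"
    val more = ns.hasNext  // true; advances the matcher as a side effect
    ns.start
  }

  // A Match obtained from findAllMatchIn keeps its own positions,
  // regardless of how far the iterator has moved on.
  def matchStartAfterLookahead: Int = {
    val ms = num.findAllMatchIn("1 2 3")
    val m = ms.next()
    val more = ms.hasNext
    m.start
  }

  def main(args: Array[String]): Unit = {
    println(iteratorStartAfterLookahead) // 2
    println(matchStartAfterLookahead)    // 0
  }
}
```

When positions or groups must outlive the iteration step, holding `Match` values is the safer choice; querying the `MatchIterator` directly suits single-pass loops.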