Vim Find Character Continue Past Newline
Regular Expressions
This chapter will discuss regular expressions (regexp) and related features in detail. As discussed in earlier chapters:
- `/searchpattern` search the given pattern in the forward direction
- `?searchpattern` search the given pattern in the backward direction
- `:range s/searchpattern/replacestring/flags` search and replace
    - `:s` is short for the `:substitute` command
    - the delimiter after `replacestring` is optional if you are not using flags
Documentation links:
- `:h usr_27.txt` for search commands and patterns
- `:h pattern-searches` for the reference manual on patterns and search commands
- `:h :substitute` for the reference manual on the `:substitute` command
Recall that you need to add a `/` prefix for built-in help on regular expressions, `:h /^` for example.
Flags
- `g` replace all occurrences within a matching line
    - by default, only the first matching portion will be replaced
- `c` ask for confirmation before each replacement
- `i` ignore case for `searchpattern`
- `I` don't ignore case for `searchpattern`
These flags are applicable for the substitute command but not / or ? searches. Flags can also be combined, for example:
- `s/cat/Dog/gi` replace every occurrence of `cat` with `Dog`
    - Case is ignored, so `Cat`, `cAt`, `CAT`, etc will also be replaced
    - Note that `i` doesn't affect the case of the replacement string
See :h s_flags for a complete list of flags and more details about them.
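For readers who know Python, the effect of the `g` and `i` flags can be approximated with `re.sub`. This is only a rough sketch: the sample text is made up, Python's `count` and `flags` arguments stand in for Vim's flags, and Vim's default case handling also depends on your `ignorecase` setting.

```python
import re

text = "cat Cat cat CAT"  # made-up sample line

# Roughly s/cat/Dog/ : only the first match, case-sensitive
first_only = re.sub(r"cat", "Dog", text, count=1)

# Roughly s/cat/Dog/g : every match, case-sensitive
all_matches = re.sub(r"cat", "Dog", text)

# Roughly s/cat/Dog/gi : every match, case ignored
all_ignore_case = re.sub(r"cat", "Dog", text, flags=re.IGNORECASE)

print(first_only)
print(all_matches)
print(all_ignore_case)
```

As in Vim, ignoring case affects only what is matched; the replacement string `Dog` is inserted as written.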
Anchors
By default, regexp will match anywhere in the text. You can use line and word anchors to specify additional restrictions regarding the position of matches. These restrictions are made possible by assigning special meaning to certain characters and escape sequences. The characters with special meaning are known as metacharacters in regular expressions parlance. In case you need to match those characters literally, you need to escape them with a \ (discussed in Escaping metacharacters section later in this chapter).
- `^` restricts the match to the start-of-line
    - `^This` matches `This is a sample` but not `Do This`
- `$` restricts the match to the end-of-line
    - `)$` matches `apple (5)` but not `def greeting():`
- `^$` match empty line
- `\<pattern` restricts the match to the start of a word
    - word characters include alphabets, digits and underscore
    - `\<his` matches `his` or `to-his` or `history` but not `this` or `_hist`
- `pattern\>` restricts the match to the end of a word
    - `his\>` matches `his` or `to-his` or `this` but not `history` or `_hist`
- `\<pattern\>` restricts the match between start of a word and end of a word
    - `\<his\>` matches `his` or `to-his` but not `this` or `history` or `_hist`
End-of-line can be `\r` (carriage return), `\n` (newline) or `\r\n` depending on your system and `fileformat` setting.
See :h pattern-atoms for more details.
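The word anchor examples can be cross-checked in Python, with one caveat: Python has a single `\b` boundary instead of separate `\<` and `\>` anchors, and Vim's notion of a word character depends on the `iskeyword` option. For the sample words used here the results should agree. The helper name `matching` is just for illustration.

```python
import re

words = ["his", "to-his", "history", "this", "_hist"]

def matching(pattern):
    """Return the sample words in which the pattern is found."""
    return [w for w in words if re.search(pattern, w)]

start_of_word = matching(r"\bhis")    # like \<his
end_of_word = matching(r"his\b")      # like his\>
whole_word = matching(r"\bhis\b")     # like \<his\>

print(start_of_word)
print(end_of_word)
print(whole_word)
```

Note that `_hist` never matches because `_` counts as a word character, so there is no boundary between `_` and `h`.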
- `.` match any single character other than end-of-line
    - `c.t` matches `cat` or `cot` or `c2t` or `c^t` or `c.t` or `c;t` but not `cant` or `act` or `sit`
- `\_.` match any single character, including end-of-line
As seen above, matching the end-of-line character requires special attention. That is why the examples and descriptions in this chapter assume you are operating line-wise unless otherwise mentioned. You'll later see how `\_` is used in many more places to include end-of-line in the matches.
Greedy Quantifiers
Quantifiers can be applied to literal characters, dot metacharacter, groups, backreferences and character classes. Basic examples are shown below, more will be discussed in the sections to follow.
- `*` match zero or more times
    - `abc*` matches `ab` or `abc` or `abccc` or `abcccccc` but not `bc`
    - `Error.*valid` matches `Error: invalid input` but not `valid Error`
    - `s/a.*b/X/` replaces `table bottle bus` with `tXus` since `a.*b` matches from the first `a` to the last `b`
- `\+` match one or more times
    - `abc\+` matches `abc` or `abccc` but not `ab` or `bc`
- `\?` match zero or one times
    - `\=` can also be used, helpful if you are searching backwards with the `?` command
    - `abc\?` matches `ab` or `abc`. This will match `abccc` or `abcccccc` as well, but only the `abc` portion
    - `s/abc\?/X/` replaces `abcc` with `Xc`
- `\{m,n}` match `m` to `n` times (inclusive)
    - `ab\{1,4}c` matches `abc` or `abbc` or `xabbbcz` but not `ac` or `abbbbbc`
    - if you are familiar with BRE, you can also use `\{m,n\}` (ending brace is escaped)
- `\{m,}` match at least `m` times
    - `ab\{3,}c` matches `xabbbcz` or `abbbbbc` but not `ac` or `abc` or `abbc`
- `\{,n}` match up to `n` times (including `0` times)
    - `ab\{,2}c` matches `abc` or `ac` or `abbc` but not `xabbbcz` or `abbbbbc`
- `\{n}` match exactly `n` times
    - `ab\{3}c` matches `xabbbcz` but not `abbc` or `abbbbbc`
Greedy quantifiers will consume as much as possible, provided the overall pattern is also matched. That's how the Error.*valid example worked. If .* had consumed everything after Error, there wouldn't be any more characters to try to match valid. How the regexp engine handles matching varying amount of characters depends on the implementation details (backtracking, NFA, etc).
See :h pattern-overview for more details.
If you are familiar with other regular expression flavors like Perl, Python, etc, you'd be surprised by the use of `\` in the above examples. If you use the `\v` very magic modifier (discussed later in this chapter), the `\` won't be needed.
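To see the greedy behavior outside Vim, here is a small Python sketch of three of the examples above. Python writes the quantifiers without the leading `\` (`?` and `{1,4}` rather than `\?` and `\{1,4}`), and `count=1` stands in for a substitute command without the `g` flag.

```python
import re

# s/a.*b/X/ : .* stretches from the first 'a' to the last 'b'
greedy = re.sub(r"a.*b", "X", "table bottle bus", count=1)

# s/abc\?/X/ : the optional 'c' is consumed once, the second 'c' is left alone
optional = re.sub(r"abc?", "X", "abcc", count=1)

# ab\{1,4}c : between one and four 'b' characters
samples = ["abc", "abbc", "xabbbcz", "ac", "abbbbbc"]
bounded = [s for s in samples if re.search(r"ab{1,4}c", s)]

print(greedy)
print(optional)
print(bounded)
```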
Non-greedy Quantifiers
Non-greedy quantifiers match as minimally as possible, provided the overall pattern is also matched.
- `\{-}` match zero or more times as minimally as possible
    - `s/t.\{-}a/X/g` replaces `that is quite a fabricated tale` with `XX fabricaXle`
        - the matching portions are `tha`, `t is quite a` and `ted ta`
    - `s/t.*a/X/g` replaces `that is quite a fabricated tale` with `Xle` since `*` is greedy
- `\{-m,n}` match `m` to `n` times as minimally as possible
    - `m` or `n` can be left out as seen in the Greedy Quantifiers section
    - `s/.\{-2,5}/X/` replaces `123456789` with `X3456789` (here `.` matched 2 times)
    - `s/.\{-2,5}6/X/` replaces `123456789` with `X789` (here `.` matched 5 times to satisfy overall pattern)
See :h pattern-overview and stackoverflow: non-greedy matching for more details.
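The same contrast can be reproduced in Python, where the non-greedy forms are written as `*?` and `{2,5}?` instead of `\{-}` and `\{-2,5}`. This is a sketch for comparison only; it assumes the two engines agree on these particular inputs, which the results listed above suggest.

```python
import re

line = "that is quite a fabricated tale"

# s/t.\{-}a/X/g : each match stops at the nearest 'a'
non_greedy = re.sub(r"t.*?a", "X", line)

# s/t.*a/X/g : a single match runs to the last 'a'
greedy = re.sub(r"t.*a", "X", line)

# s/.\{-2,5}/X/ : the minimum of two characters is enough
min_two = re.sub(r".{2,5}?", "X", "123456789", count=1)

# s/.\{-2,5}6/X/ : five characters are needed before '6' can match
min_until_six = re.sub(r".{2,5}?6", "X", "123456789", count=1)

print(non_greedy, greedy, min_two, min_until_six, sep="\n")
```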
Character Classes
To create a custom placeholder for a limited set of characters, you can enclose them inside [] metacharacters. Character classes have their own versions of metacharacters and provide special predefined sets for common use cases.
- `[aeiou]` match any lowercase vowel character
- `[^aeiou]` match any character other than lowercase vowels
- `[a-d]` match any of `a` or `b` or `c` or `d`
    - the range metacharacter `-` can be applied between any two characters
- `\a` match any alphabet character `[a-zA-Z]`
- `\A` match other than alphabets `[^a-zA-Z]`
- `\l` match lowercase alphabets `[a-z]`
- `\L` match other than lowercase alphabets `[^a-z]`
- `\u` match uppercase alphabets `[A-Z]`
- `\U` match other than uppercase alphabets `[^A-Z]`
- `\d` match any digit character `[0-9]`
- `\D` match other than digits `[^0-9]`
- `\o` match any octal character `[0-7]`
- `\O` match other than octals `[^0-7]`
- `\x` match any hexadecimal character `[0-9a-fA-F]`
- `\X` match other than hexadecimals `[^0-9a-fA-F]`
- `\h` match alphabets and underscore `[a-zA-Z_]`
- `\H` match other than alphabets and underscore `[^a-zA-Z_]`
- `\w` match any word character (alphabets, digits, underscore) `[a-zA-Z0-9_]`
    - this definition is the same as seen earlier with word boundaries
- `\W` match other than word characters `[^a-zA-Z0-9_]`
- `\s` match space and tab characters `[ \t]`
- `\S` match other than space and tab characters `[^ \t]`
Here are some examples with character classes:
- `c[ou]t` matches `cot` or `cut`
- `\<[ot][on]\>` matches `oo` or `on` or `to` or `tn` as whole words only
- `^[on]\{2,}$` matches `no` or `non` or `noon` or `on` etc as whole lines only
- `s/"[^"]\+"/X/g` replaces `"mango" and "(guava)"` with `X and X`
- `s/\d\+/-/g` replaces `Sample123string777numbers` with `Sample-string-numbers`
- `s/\<0*[1-9]\d\{2,}\>/X/g` replaces `0501 035 26 98234` with `X 035 26 X` (matches numbers >=100 with optional leading zeros)
- `s/\W\+/ /g` replaces `load2;err_msg--\ant` with `load2 err_msg ant`
To include the end-of-line character, use `\_` instead of `\` for any of the above escape sequences. For example, `\_s` will help you match across lines. Similarly, use `\_[]` for bracketed classes.
The above escape sequences do not have special meaning within bracketed classes. For example, `[\d\s]` will only match `\` or `d` or `s`. You can use named character sets in such scenarios. For example, `[[:digit:][:blank:]]` to match digits or space or tab characters. See :h :alnum: for full list and more details.
The predefined sets are also better in terms of performance compared to bracketed versions. And there are more such sets than the ones discussed above. See :h character-classes for more details.
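Three of the substitution examples above translate almost directly to Python, which may help if you want to experiment outside the editor. Treat this as a sketch: Python uses `\b` in place of `\<` and `\>`, and its `\d` also matches non-ASCII digits, which makes no difference for these ASCII inputs.

```python
import re

# s/"[^"]\+"/X/g : replace each double-quoted chunk
quoted = re.sub(r'"[^"]+"', "X", '"mango" and "(guava)"')

# s/\d\+/-/g : replace each run of digits
digits = re.sub(r"\d+", "-", "Sample123string777numbers")

# s/\<0*[1-9]\d\{2,}\>/X/g : whole numbers >= 100, leading zeros allowed
large = re.sub(r"\b0*[1-9]\d{2,}\b", "X", "0501 035 26 98234")

print(quoted)
print(digits)
print(large)
```

In the last example `035` is left alone because, after the leading zero, only two digits remain and the pattern needs at least three.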
Alternation and Grouping
Alternation helps you to match multiple terms and they can have their own anchors as well (since each alternative is a regexp pattern). Often, there are some common things among the regular expression alternatives. In such cases, you can group them using a pair of parentheses metacharacters. Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions.
- `\|` match either of the specified patterns
    - `min\|max` matches `min` or `max`
    - `one\|two\|three` matches `one` or `two` or `three`
    - `\<par\>\|er$` matches whole word `par` or a line ending with `er`
- `\(pattern\)` group a pattern to apply quantifiers, create a terser regexp by taking out common elements, etc
    - `a\(123\|456\)b` is equivalent to `a123b\|a456b`
    - `hand\(y\|ful\)` matches `handy` or `handful`
    - `hand\(y\|ful\)\?` matches `hand` or `handy` or `handful`
    - `\(to\)\+` matches `to` or `toto` or `tototo` and so on
    - `re\(leas\|ceiv\)\?ed` matches `reed` or `released` or `received`
There are some tricky situations when using alternation. Say you want to match `are` or `spared`: which one should get precedence? The bigger word `spared`, the substring `are` inside it, or something else? The alternative which matches earliest in the input gets precedence, irrespective of the order of the alternatives.
- `s/are\|spared/X/g` replaces `rare spared area` with `rX X Xa`
    - `s/spared\|are/X/g` will also give the same results
In case of matches starting from the same location, for example `spa` and `spared`, the leftmost alternative gets precedence. Sort by longest term first if you don't want shorter terms to take precedence.
- `s/spa\|spared/**/g` replaces `spared spare` with `**red **re`
- `s/spared\|spa/**/g` replaces `spared spare` with `** **re`
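Python's `re` module follows the same two precedence rules for these inputs, so the examples can be replayed there as a quick check. This is only an illustration; `|` is used unescaped in Python, unlike `\|` in Vim's default mode.

```python
import re

text = "rare spared area"

# Earliest match in the input wins, whatever the order of alternatives
are_first = re.sub(r"are|spared", "X", text)
spared_first = re.sub(r"spared|are", "X", text)

# Same starting location: the leftmost alternative wins
short_first = re.sub(r"spa|spared", "**", "spared spare")
long_first = re.sub(r"spared|spa", "**", "spared spare")

print(are_first, spared_first)
print(short_first)
print(long_first)
```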
Backreference
The groupings seen in the previous section are also known as capture groups. The string captured by these groups can be referred to later using the backreference `\N`, where `N` is the number of the capture group you want. Backreferences can be used in both search and replacement sections.
- `\(pattern\)` capture group for later use via backreferences
- `\%(pattern\)` non-capturing group
- leftmost group is `1`, second leftmost group is `2` and so on (maximum `9` groups)
- `\1` backreference to the first capture group
- `\2` backreference to the second capture group
- `\9` backreference to the ninth capture group
- `&` or `\0` backreference to the entire matched portion
Here are some examples:
- `\(\a\)\1` matches two consecutive repeated alphabets like `ee`, `TT`, `pp` and so on
    - recall that `\a` refers to `[a-zA-Z]`
- `\(\a\)\1\+` matches two or more consecutive repeated alphabets like `ee`, `ttttt`, `PPPPPPPP` and so on
- `s/\d\+/(&)/g` replaces `52 apples 31 mangoes` with `(52) apples (31) mangoes` (surround digits with parentheses)
- `s/\(\w\+\),\(\w\+\)/\2,\1/g` replaces `good,bad 42,24` with `bad,good 24,42` (swap words separated by comma)
- `s/\(_\)\?_/\1/g` replaces `_foo_ __123__ _baz_` with `foo _123_ baz` (matches one or two underscores, deletes one underscore)
- `s/\(\d\+\)\%(abc\)\+\(\d\+\)/\2:\1/` replaces `12abcabcabc24` with `24:12` (matches digits separated by one or more `abc` sequences, swaps the numbers with `:` as the separator)
    - note the use of non-capturing group for `abc` since it isn't needed later
    - `s/\(\d\+\)\(abc\)\+\(\d\+\)/\3:\1/` does the same if only capturing groups are used
Referring to text matched by a capture group with a quantifier will give only the last match, not the entire match. Use a capture group around the grouping and quantifier together to get the entire matching portion. In such cases, the inner grouping is an ideal candidate for a non-capturing group.
- `s/a \(\d\{3}\)\+/b (\1)/` replaces `a 123456789` with `b (789)`
    - `a 4839235` will be replaced with `b (923)5`
- `s/a \(\%(\d\{3}\)\+\)/b (\1)/` replaces `a 123456789` with `b (123456789)`
    - `a 4839235` will be replaced with `b (483923)5`
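The backreference behavior, including the "last match only" rule for a quantified capture group, can be checked with Python's `re.sub`. A few syntax differences to keep in mind for this sketch: groups are written `(...)` and `(?:...)`, and the whole match in the replacement is `\g<0>` rather than `&`.

```python
import re

# s/\d\+/(&)/g : wrap each number in parentheses
wrapped = re.sub(r"\d+", r"(\g<0>)", "52 apples 31 mangoes")

# s/\(\w\+\),\(\w\+\)/\2,\1/g : swap comma-separated words
swapped = re.sub(r"(\w+),(\w+)", r"\2,\1", "good,bad 42,24")

# Quantified capture group: only the last repetition is kept in \1
last_only = re.sub(r"a (\d{3})+", r"b (\1)", "a 4839235", count=1)

# Outer capture group around a non-capturing group and its quantifier
entire = re.sub(r"a ((?:\d{3})+)", r"b (\1)", "a 4839235", count=1)

print(wrapped)
print(swapped)
print(last_only)
print(entire)
```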
Lookarounds
Lookarounds help to create custom anchors and add conditions within the searchpattern. These assertions are also known as zero-width patterns because they add restrictions similar to anchors and are not part of the matched portions.
Vim's syntax differs from the syntax usually found in programming languages like Perl, Python and JavaScript. The syntax starting with `\@` is always added as a suffix to the pattern atom used in the assertion. For example, `(?!\d)` and `(?<=pat.*)` in other languages are specified as `\d\@!` and `\(pat.*\)\@<=` respectively in Vim.
- `\@!` negative lookahead assertion
    - `ice\d\@!` matches `ice` as long as it is not immediately followed by a digit character, for example `ice` or `iced!` or `icet5` or `ice.123` but not `ice42` or `ice123`
    - `s/ice\d\@!/X/g` replaces `iceiceice2` with `XXice2`
    - `s/par\(.*\<par\>\)\@!/X/g` replaces `par` with `X` as long as whole word `par` is not present later in the line, for example `parse and par and sparse` is converted to `parse and X and sXse`
    - `at\(\(go\)\@!.\)*par` matches `cat,dog,parrot` but not `cat,god,parrot` (i.e. match `at` followed by `par` as long as `go` isn't present in between, this is an example of negating a grouping)
- `\@<!` negative lookbehind assertion
    - `_\@<!ice` matches `ice` as long as it is not immediately preceded by a `_` character, for example `ice` or `_(ice)` or `42ice` but not `_ice`
    - `\(cat.*\)\@<!dog` matches `dog` as long as `cat` is not present earlier in the line, for example `fox,parrot,dog,cat` but not `fox,cat,dog,parrot`
- `\@=` positive lookahead assertion
    - `ice\d\@=` matches `ice` as long as it is immediately followed by a digit character, for example `ice42` or `ice123` but not `ice` or `iced!` or `icet5` or `ice.123`
    - `s/ice\d\@=/X/g` replaces `ice ice_2 ice2 iced` with `ice ice_2 X2 iced`
- `\@<=` positive lookbehind assertion
    - `_\@<=ice` matches `ice` as long as it is immediately preceded by a `_` character, for example `_ice` or `(_ice)` but not `ice` or `_(ice)` or `42ice`
You can also limit the number of bytes a lookbehind pattern searches, which can significantly speed up the matching process. Specify the number between the `@` and `<` characters. For example, `_\@1<=ice` will look back only one byte before `ice` for matching purposes, and `\(cat.*\)\@10<!dog` will look back only ten bytes before `dog` to check the given assertion.
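For comparison, here is how several of the lookaround examples look in Python's prefix-style syntax. This is a sketch with one notable limitation: Python only accepts fixed-width lookbehind, so variable-length assertions like `\(cat.*\)\@<!dog` have no direct equivalent and are left out. The helper name `hits` is made up for the example.

```python
import re

def hits(pattern, samples):
    """Return the samples in which the pattern is found."""
    return [s for s in samples if re.search(pattern, s)]

# ice\d\@! and ice\d\@= as substitutions
neg_ahead = re.sub(r"ice(?!\d)", "X", "iceiceice2")
pos_ahead = re.sub(r"ice(?=\d)", "X", "ice ice_2 ice2 iced")

# _\@<!ice and _\@<=ice as searches
neg_behind = hits(r"(?<!_)ice", ["ice", "_(ice)", "42ice", "_ice"])
pos_behind = hits(r"(?<=_)ice", ["_ice", "(_ice)", "ice", "_(ice)", "42ice"])

# par\(.*\<par\>\)\@! : no whole word 'par' later in the line
later_par = re.sub(r"par(?!.*\bpar\b)", "X", "parse and par and sparse")

print(neg_ahead, pos_ahead, later_par, sep="\n")
print(neg_behind, pos_behind)
```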
Atomic Grouping
As discussed earlier, both greedy and non-greedy quantifiers will try to satisfy the overall pattern by varying the amount of characters matched by the quantifiers. You can use atomic grouping if you do not want a specific sub-pattern to ever give back characters it has already matched. Similar to lookarounds, you need to use \@> as a suffix, for example \(pattern\)\@>.
- `s/\(0*\)\@>\d\{3,\}/(&)/g` replaces only numbers >= 100 irrespective of any number of leading zeros, for example `0501 035 154` is converted to `(0501) 035 (154)`
    - `\(0*\)\@>` matches the `0` character zero or more times, but it will not give up this portion to satisfy overall pattern
    - `s/0*\d\{3,\}/(&)/g` replaces `0501 035 154` with `(0501) (035) (154)` (here `035` is matched because `0*` will match zero times to satisfy the overall pattern)
Some regexp engines provide this feature as possessive quantifiers.
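If you want to try the leading-zeros example in Python, note that the `(?>...)` atomic group syntax is only available from Python 3.11 onwards. The sketch below instead uses a well-known workaround that runs on older versions too: a lookahead captures `0*` and a backreference then consumes exactly that text, so the engine cannot hand zeros back to `\d{3,}`. This is an emulation chosen for portability, not how Vim implements `\@>`.

```python
import re

text = "0501 035 154"

# Emulated atomic group: (?=(0*))\1 behaves like \(0*\)\@> in Vim
atomic = re.sub(r"(?=(0*))\1\d{3,}", r"(\g<0>)", text)

# Plain greedy version: 0* can match zero times, so 035 also matches
plain = re.sub(r"0*\d{3,}", r"(\g<0>)", text)

print(atomic)
print(plain)
```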
Set start and end of the match
Some of the positive lookbehind and lookahead usage can be replaced with \zs and \ze respectively.
- `\zs` set the start of the match (portion before `\zs` won't be part of the match)
    - `s/\<\w\zs\w*\W*//g` replaces `sea eat car rat eel tea` with `secret`
    - same as `s/\(\<\w\)\@<=\w*\W*//g` or `s/\(\<\w\)\w*\W*/\1/g`
- `\ze` set the end of the match (portion after `\ze` won't be part of the match)
    - `s/ice\ze\d/X/g` replaces `ice ice_2 ice2 iced` with `ice ice_2 X2 iced`
    - same as `s/ice\d\@=/X/g` or `s/ice\(\d\)/X\1/g`
As per :h \zs and :h \ze, these "Can be used multiple times, the last one encountered in a matching branch is used."
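Python's `re` module has no counterpart to `\zs` and `\ze`, but the lookaround and capture group alternatives listed above carry over, which gives a way to confirm that they produce the same results. A minimal sketch:

```python
import re

words = "sea eat car rat eel tea"

# Lookbehind version of s/\<\w\zs\w*\W*//g
keep_first_lookbehind = re.sub(r"(?<=\b\w)\w*\W*", "", words)

# Capture group version of the same substitution
keep_first_group = re.sub(r"(\b\w)\w*\W*", r"\1", words)

# Lookahead and capture group versions of s/ice\ze\d/X/g
ice_lookahead = re.sub(r"ice(?=\d)", "X", "ice ice_2 ice2 iced")
ice_group = re.sub(r"ice(\d)", r"X\1", "ice ice_2 ice2 iced")

print(keep_first_lookbehind, keep_first_group)
print(ice_lookahead)
print(ice_group)
```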
Magic modifiers
These escape sequences change certain aspects of the syntax and behavior of the search pattern that comes after such a modifier. You can use multiple such modifiers as needed for particular sections of the pattern.
Magic and nomagic
- `\m` magic mode (this is the default setting)
- `\M` nomagic mode
    - `.`, `*` and `~` are no longer metacharacters (compared to magic mode)
    - `\.`, `\*` and `\~` will make them behave as metacharacters
    - `^` and `$` would still behave as metacharacters
    - `\Ma.b` matches only `a.b`
    - `\Ma\.b` matches `a.b` as well as `a=b` or `a<b` or `acb` etc
Very magic
The default syntax of Vim regexp has only a few metacharacters like ., *, ^ and so on. If you are familiar with regexp usage in programming languages such as Perl, Python and JavaScript, you can use \v to get a similar syntax in Vim. This will allow the use of more metacharacters such as (), {}, +, ? and so on without having to prefix them with a \ metacharacter. From :h magic documentation:
Use of `\v` means that after it, all ASCII characters except `0`-`9`, `a`-`z`, `A`-`Z` and `_` have special meaning
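- `\v<his>` matches `his` or `to-his` but not `this` or `history` or `_hist`
- `a<b.*\v<end>` matches `c=a<b` when it is followed later in the line by the whole word `end`
    - since `\v` appears after `a<b`, that first `<` is matched literally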