comparison venv/lib/python2.7/site-packages/docutils/parsers/rst/states.py @ 0:d67268158946 draft

planemo upload commit a3f181f5f126803c654b3a66dd4e83a48f7e203b
author bcclaywell
date Mon, 12 Oct 2015 17:43:33 -0400
parents
children
1 # $Id: states.py 7640 2013-03-25 20:57:52Z milde $
2 # Author: David Goodger <goodger@python.org>
3 # Copyright: This module has been placed in the public domain.
4
5 """
6 This is the ``docutils.parsers.rst.states`` module, the core of
7 the reStructuredText parser. It defines the following:
8
9 :Classes:
10 - `RSTStateMachine`: reStructuredText parser's entry point.
11 - `NestedStateMachine`: recursive StateMachine.
12 - `RSTState`: reStructuredText State superclass.
13 - `Inliner`: For parsing inline markup.
14 - `Body`: Generic classifier of the first line of a block.
15 - `SpecializedBody`: Superclass for compound element members.
16 - `BulletList`: Second and subsequent bullet_list list_items.
17 - `DefinitionList`: Second+ definition_list_items.
18 - `EnumeratedList`: Second+ enumerated_list list_items.
19 - `FieldList`: Second+ fields.
20 - `OptionList`: Second+ option_list_items.
21 - `RFC2822List`: Second+ RFC2822-style fields.
22 - `ExtensionOptions`: Parses directive option fields.
23 - `Explicit`: Second+ explicit markup constructs.
24 - `SubstitutionDef`: For embedded directives in substitution definitions.
25 - `Text`: Classifier of second line of a text block.
26 - `SpecializedText`: Superclass for continuation lines of Text-variants.
27 - `Definition`: Second line of potential definition_list_item.
28 - `Line`: Second line of overlined section title or transition marker.
29 - `Struct`: An auxiliary collection class.
30
31 :Exception classes:
32 - `MarkupError`
33 - `ParserError`
34 - `MarkupMismatch`
35
36 :Functions:
37 - `escape2null()`: Return a string, escape-backslashes converted to nulls.
38 - `unescape()`: Return a string, nulls removed or restored to backslashes.
39
40 :Attributes:
41 - `state_classes`: set of State classes used with `RSTStateMachine`.
42
43 Parser Overview
44 ===============
45
46 The reStructuredText parser is implemented as a recursive state machine,
47 examining its input one line at a time. To understand how the parser works,
48 please first become familiar with the `docutils.statemachine` module. In the
49 description below, references are made to classes defined in this module;
50 please see the individual classes for details.
51
52 Parsing proceeds as follows:
53
54 1. The state machine examines each line of input, checking each of the
55 transition patterns of the state `Body`, in order, looking for a match.
56 The implicit transitions (blank lines and indentation) are checked before
57 any others. The 'text' transition is a catch-all (matches anything).
58
59 2. The method associated with the matched transition pattern is called.
60
61 A. Some transition methods are self-contained, appending elements to the
62 document tree (`Body.doctest` parses a doctest block). The parser's
63 current line index is advanced to the end of the element, and parsing
64 continues with step 1.
65
66 B. Other transition methods trigger the creation of a nested state machine,
67 whose job is to parse a compound construct ('indent' does a block quote,
68 'bullet' does a bullet list, 'overline' does a section [first checking
69 for a valid section header], etc.).
70
71 - In the case of lists and explicit markup, a one-off state machine is
72 created and run to parse contents of the first item.
73
74 - A new state machine is created and its initial state is set to the
75 appropriate specialized state (`BulletList` in the case of the
76 'bullet' transition; see `SpecializedBody` for more detail). This
77 state machine is run to parse the compound element (or series of
78 explicit markup elements), and returns as soon as a non-member element
79 is encountered. For example, the `BulletList` state machine ends as
80 soon as it encounters an element which is not a list item of that
81 bullet list. The optional omission of inter-element blank lines is
82 enabled by this nested state machine.
83
84 - The current line index is advanced to the end of the elements parsed,
85 and parsing continues with step 1.
86
87 C. The result of the 'text' transition depends on the next line of text.
88 The current state is changed to `Text`, under which the second line is
89 examined. If the second line is:
90
91 - Indented: The element is a definition list item, and parsing proceeds
92 similarly to step 2.B, using the `DefinitionList` state.
93
94 - A line of uniform punctuation characters: The element is a section
95 header; again, parsing proceeds as in step 2.B, and `Body` is still
96 used.
97
98 - Anything else: The element is a paragraph, which is examined for
99 inline markup and appended to the parent element. Processing
100 continues with step 1.
101 """
102
103 __docformat__ = 'reStructuredText'
104
105
106 import sys
107 import re
108 from types import FunctionType, MethodType
109
110 from docutils import nodes, statemachine, utils
111 from docutils import ApplicationError, DataError
112 from docutils.statemachine import StateMachineWS, StateWS
113 from docutils.nodes import fully_normalize_name as normalize_name
114 from docutils.nodes import whitespace_normalize_name
115 import docutils.parsers.rst
116 from docutils.parsers.rst import directives, languages, tableparser, roles
117 from docutils.parsers.rst.languages import en as _fallback_language_module
118 from docutils.utils import escape2null, unescape, column_width
119 from docutils.utils import punctuation_chars, roman, urischemes
120
121 class MarkupError(DataError): pass
122 class UnknownInterpretedRoleError(DataError): pass
123 class InterpretedRoleNotImplementedError(DataError): pass
124 class ParserError(ApplicationError): pass
125 class MarkupMismatch(Exception): pass
126
127
128 class Struct:
129
130 """Stores data attributes for dotted-attribute access."""
131
132 def __init__(self, **keywordargs):
133 self.__dict__.update(keywordargs)
134
135
136 class RSTStateMachine(StateMachineWS):
137
138 """
139 reStructuredText's master StateMachine.
140
141 The entry point to reStructuredText parsing is the `run()` method.
142 """
143
144 def run(self, input_lines, document, input_offset=0, match_titles=True,
145 inliner=None):
146 """
147 Parse `input_lines` and modify the `document` node in place.
148
149 Extend `StateMachineWS.run()`: set up parse-global data and
150 run the StateMachine.
151 """
152 self.language = languages.get_language(
153 document.settings.language_code)
154 self.match_titles = match_titles
155 if inliner is None:
156 inliner = Inliner()
157 inliner.init_customizations(document.settings)
158 self.memo = Struct(document=document,
159 reporter=document.reporter,
160 language=self.language,
161 title_styles=[],
162 section_level=0,
163 section_bubble_up_kludge=False,
164 inliner=inliner)
165 self.document = document
166 self.attach_observer(document.note_source)
167 self.reporter = self.memo.reporter
168 self.node = document
169 results = StateMachineWS.run(self, input_lines, input_offset,
170 input_source=document['source'])
171 assert results == [], 'RSTStateMachine.run() results should be empty!'
172 self.node = self.memo = None # remove unneeded references
173
174
175 class NestedStateMachine(StateMachineWS):
176
177 """
178 StateMachine run from within other StateMachine runs, to parse nested
179 document structures.
180 """
181
182 def run(self, input_lines, input_offset, memo, node, match_titles=True):
183 """
184 Parse `input_lines` and populate a `docutils.nodes.document` instance.
185
186 Extend `StateMachineWS.run()`: set up document-wide data.
187 """
188 self.match_titles = match_titles
189 self.memo = memo
190 self.document = memo.document
191 self.attach_observer(self.document.note_source)
192 self.reporter = memo.reporter
193 self.language = memo.language
194 self.node = node
195 results = StateMachineWS.run(self, input_lines, input_offset)
196 assert results == [], ('NestedStateMachine.run() results should be '
197 'empty!')
198 return results
199
200
201 class RSTState(StateWS):
202
203 """
204 reStructuredText State superclass.
205
206 Contains methods used by all State subclasses.
207 """
208
209 nested_sm = NestedStateMachine
210 nested_sm_cache = []
211
212 def __init__(self, state_machine, debug=False):
213 self.nested_sm_kwargs = {'state_classes': state_classes,
214 'initial_state': 'Body'}
215 StateWS.__init__(self, state_machine, debug)
216
217 def runtime_init(self):
218 StateWS.runtime_init(self)
219 memo = self.state_machine.memo
220 self.memo = memo
221 self.reporter = memo.reporter
222 self.inliner = memo.inliner
223 self.document = memo.document
224 self.parent = self.state_machine.node
225 # enable the reporter to determine source and source-line
226 if not hasattr(self.reporter, 'get_source_and_line'):
227 self.reporter.get_source_and_line = self.state_machine.get_source_and_line
228 # print "adding get_source_and_line to reporter", self.state_machine.input_offset
229
230
231 def goto_line(self, abs_line_offset):
232 """
233 Jump to input line `abs_line_offset`, ignoring jumps past the end.
234 """
235 try:
236 self.state_machine.goto_line(abs_line_offset)
237 except EOFError:
238 pass
239
240 def no_match(self, context, transitions):
241 """
242 Override `StateWS.no_match` to generate a system message.
243
244 This code should never be run.
245 """
246 self.reporter.severe(
247 'Internal error: no transition pattern match. State: "%s"; '
248 'transitions: %s; context: %s; current line: %r.'
249 % (self.__class__.__name__, transitions, context,
250 self.state_machine.line))
251 return context, None, []
252
253 def bof(self, context):
254 """Called at beginning of file."""
255 return [], []
256
257 def nested_parse(self, block, input_offset, node, match_titles=False,
258 state_machine_class=None, state_machine_kwargs=None):
259 """
260 Create a new StateMachine rooted at `node` and run it over the input
261 `block`.
262 """
263 use_default = 0
264 if state_machine_class is None:
265 state_machine_class = self.nested_sm
266 use_default += 1
267 if state_machine_kwargs is None:
268 state_machine_kwargs = self.nested_sm_kwargs
269 use_default += 1
270 block_length = len(block)
271
272 state_machine = None
273 if use_default == 2:
274 try:
275 state_machine = self.nested_sm_cache.pop()
276 except IndexError:
277 pass
278 if not state_machine:
279 state_machine = state_machine_class(debug=self.debug,
280 **state_machine_kwargs)
281 state_machine.run(block, input_offset, memo=self.memo,
282 node=node, match_titles=match_titles)
283 if use_default == 2:
284 self.nested_sm_cache.append(state_machine)
285 else:
286 state_machine.unlink()
287 new_offset = state_machine.abs_line_offset()
288 # No `block.parent` implies disconnected -- lines aren't in sync:
289 if block.parent and (len(block) - block_length) != 0:
290 # Adjustment for block if modified in nested parse:
291 self.state_machine.next_line(len(block) - block_length)
292 return new_offset
293
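# A hedged sketch of the usual external entry to nested_parse(): a custom
# directive re-parses its own content, rooted at a fresh node.  The
# directive name "sketchbox" is hypothetical; the nested_parse() call
# signature is the one defined above.
#
#     from docutils import nodes
#     from docutils.parsers.rst import Directive, directives
#
#     class SketchBox(Directive):
#         has_content = True
#         def run(self):
#             node = nodes.container()
#             self.state.nested_parse(self.content, self.content_offset,
#                                     node)
#             return [node]
#
#     directives.register_directive('sketchbox', SketchBox)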
294 def nested_list_parse(self, block, input_offset, node, initial_state,
295 blank_finish,
296 blank_finish_state=None,
297 extra_settings={},
298 match_titles=False,
299 state_machine_class=None,
300 state_machine_kwargs=None):
301 """
302 Create a new StateMachine rooted at `node` and run it over the input
303 `block`. Also keep track of optional intermediate blank lines and the
304 required final one.
305 """
306 if state_machine_class is None:
307 state_machine_class = self.nested_sm
308 if state_machine_kwargs is None:
309 state_machine_kwargs = self.nested_sm_kwargs.copy()
310 state_machine_kwargs['initial_state'] = initial_state
311 state_machine = state_machine_class(debug=self.debug,
312 **state_machine_kwargs)
313 if blank_finish_state is None:
314 blank_finish_state = initial_state
315 state_machine.states[blank_finish_state].blank_finish = blank_finish
316 for key, value in extra_settings.items():
317 setattr(state_machine.states[initial_state], key, value)
318 state_machine.run(block, input_offset, memo=self.memo,
319 node=node, match_titles=match_titles)
320 blank_finish = state_machine.states[blank_finish_state].blank_finish
321 state_machine.unlink()
322 return state_machine.abs_line_offset(), blank_finish
323
324 def section(self, title, source, style, lineno, messages):
325 """Check for a valid subsection and create one if it checks out."""
326 if self.check_subsection(source, style, lineno):
327 self.new_subsection(title, lineno, messages)
328
329 def check_subsection(self, source, style, lineno):
330 """
331 Check for a valid subsection header. Return 1 (true) or None (false).
332
333 When a new section is reached that isn't a subsection of the current
334 section, back up the line count (use ``previous_line(-x)``), then
335 ``raise EOFError``. The current StateMachine will finish, then the
336 calling StateMachine can re-examine the title. This will work its way
337 back up the calling chain until the correct section level is reached.
338
339 @@@ Alternative: Evaluate the title, store the title info & level, and
340 back up the chain until that level is reached. Store in memo? Or
341 return in results?
342
343 :Exception: `EOFError` when a sibling or supersection encountered.
344 """
345 memo = self.memo
346 title_styles = memo.title_styles
347 mylevel = memo.section_level
348 try: # check for existing title style
349 level = title_styles.index(style) + 1
350 except ValueError: # new title style
351 if len(title_styles) == memo.section_level: # new subsection
352 title_styles.append(style)
353 return 1
354 else: # not at lowest level
355 self.parent += self.title_inconsistent(source, lineno)
356 return None
357 if level <= mylevel: # sibling or supersection
358 memo.section_level = level # bubble up to parent section
359 if len(style) == 2:
360 memo.section_bubble_up_kludge = True
361 # back up 2 lines for underline title, 3 for overline title
362 self.state_machine.previous_line(len(style) + 1)
363 raise EOFError # let parent section re-evaluate
364 if level == mylevel + 1: # immediate subsection
365 return 1
366 else: # invalid subsection
367 self.parent += self.title_inconsistent(source, lineno)
368 return None
369
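# A worked example of the check above (values are illustrative): suppose
# title_styles == ['=', '-', '~'] and memo.section_level == 3.  A new
# title underlined with '-' maps to level 2 (<= 3), so the parser backs up
# over the title and raises EOFError, letting the enclosing level-1
# section's state machine re-examine it; a title with a brand-new style
# such as '^' is appended as level 4 and accepted as an immediate
# subsection.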
370 def title_inconsistent(self, sourcetext, lineno):
371 error = self.reporter.severe(
372 'Title level inconsistent:', nodes.literal_block('', sourcetext),
373 line=lineno)
374 return error
375
376 def new_subsection(self, title, lineno, messages):
377 """Append new subsection to document tree. On return, check level."""
378 memo = self.memo
379 mylevel = memo.section_level
380 memo.section_level += 1
381 section_node = nodes.section()
382 self.parent += section_node
383 textnodes, title_messages = self.inline_text(title, lineno)
384 titlenode = nodes.title(title, '', *textnodes)
385 name = normalize_name(titlenode.astext())
386 section_node['names'].append(name)
387 section_node += titlenode
388 section_node += messages
389 section_node += title_messages
390 self.document.note_implicit_target(section_node, section_node)
391 offset = self.state_machine.line_offset + 1
392 absoffset = self.state_machine.abs_line_offset() + 1
393 newabsoffset = self.nested_parse(
394 self.state_machine.input_lines[offset:], input_offset=absoffset,
395 node=section_node, match_titles=True)
396 self.goto_line(newabsoffset)
397 if memo.section_level <= mylevel: # can't handle next section?
398 raise EOFError # bubble up to supersection
399 # reset section_level; next pass will detect it properly
400 memo.section_level = mylevel
401
402 def paragraph(self, lines, lineno):
403 """
404 Return a list (paragraph & messages) & a boolean: literal_block next?
405 """
406 data = '\n'.join(lines).rstrip()
407 if re.search(r'(?<!\\)(\\\\)*::$', data):
408 if len(data) == 2:
409 return [], 1
410 elif data[-3] in ' \n':
411 text = data[:-3].rstrip()
412 else:
413 text = data[:-1]
414 literalnext = 1
415 else:
416 text = data
417 literalnext = 0
418 textnodes, messages = self.inline_text(text, lineno)
419 p = nodes.paragraph(data, '', *textnodes)
420 p.source, p.line = self.state_machine.get_source_and_line(lineno)
421 return [p] + messages, literalnext
422
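# Worked examples of the '::' handling above (inputs are illustrative):
#
#     'Example::'   -> paragraph text 'Example:', literalnext = 1
#     'Example ::'  -> paragraph text 'Example',  literalnext = 1
#     '::'          -> no paragraph ([], 1); the literal block follows
#     'No marker.'  -> paragraph text unchanged,  literalnext = 0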
423 def inline_text(self, text, lineno):
424 """
425 Return 2 lists: nodes (text and inline elements), and system_messages.
426 """
427 return self.inliner.parse(text, lineno, self.memo, self.parent)
428
429 def unindent_warning(self, node_name):
430 # the actual problem is one line below the current line
431 lineno = self.state_machine.abs_line_number()+1
432 return self.reporter.warning('%s ends without a blank line; '
433 'unexpected unindent.' % node_name,
434 line=lineno)
435
436
437 def build_regexp(definition, compile=True):
438 """
439 Build, compile and return a regular expression based on `definition`.
440
441 :Parameter: `definition`: a 4-tuple (group name, prefix, suffix, parts),
442 where "parts" is a list of regular expressions and/or regular
443 expression definitions to be joined into an or-group.
444 """
445 name, prefix, suffix, parts = definition
446 part_strings = []
447 for part in parts:
448 if type(part) is tuple:
449 part_strings.append(build_regexp(part, None))
450 else:
451 part_strings.append(part)
452 or_group = '|'.join(part_strings)
453 regexp = '%(prefix)s(?P<%(name)s>%(or_group)s)%(suffix)s' % locals()
454 if compile:
455 return re.compile(regexp, re.UNICODE)
456 else:
457 return regexp
458
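# A small, self-contained example of the `definition` structure accepted by
# build_regexp() (the group names 'number' and 'word' are made up for this
# illustration):
_example_definition = ('number', r'(?<!\w)', r'(?!\w)',
                       [r'[0-9]+',
                        ('word', '', '', [r'[a-z]+'])])
_example_pattern = build_regexp(_example_definition)
# _example_pattern.pattern == r'(?<!\w)(?P<number>[0-9]+|(?P<word>[a-z]+))(?!\w)'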
459
460 class Inliner:
461
462 """
463 Parse inline markup; call the `parse()` method.
464 """
465
466 def __init__(self):
467 self.implicit_dispatch = [(self.patterns.uri, self.standalone_uri),]
468 """List of (pattern, bound method) tuples, used by
469 `self.implicit_inline`."""
470
471 def init_customizations(self, settings):
472 """Setting-based customizations; run when parsing begins."""
473 if settings.pep_references:
474 self.implicit_dispatch.append((self.patterns.pep,
475 self.pep_reference))
476 if settings.rfc_references:
477 self.implicit_dispatch.append((self.patterns.rfc,
478 self.rfc_reference))
479
480 def parse(self, text, lineno, memo, parent):
481 # Needs to be refactored for nested inline markup.
482 # Add nested_parse() method?
483 """
484 Return 2 lists: nodes (text and inline elements), and system_messages.
485
486 Using `self.patterns.initial`, a pattern which matches start-strings
487 (emphasis, strong, interpreted, phrase reference, literal,
488 substitution reference, and inline target) and complete constructs
489 (simple reference, footnote reference), search for a candidate. When
490 one is found, check for validity (e.g., not a quoted '*' character).
491 If valid, search for the corresponding end string if applicable, and
492 check it for validity. If not found or invalid, generate a warning
493 and ignore the start-string. Implicit inline markup (e.g. standalone
494 URIs) is found last.
495 """
496 self.reporter = memo.reporter
497 self.document = memo.document
498 self.language = memo.language
499 self.parent = parent
500 pattern_search = self.patterns.initial.search
501 dispatch = self.dispatch
502 remaining = escape2null(text)
503 processed = []
504 unprocessed = []
505 messages = []
506 while remaining:
507 match = pattern_search(remaining)
508 if match:
509 groups = match.groupdict()
510 method = dispatch[groups['start'] or groups['backquote']
511 or groups['refend'] or groups['fnend']]
512 before, inlines, remaining, sysmessages = method(self, match,
513 lineno)
514 unprocessed.append(before)
515 messages += sysmessages
516 if inlines:
517 processed += self.implicit_inline(''.join(unprocessed),
518 lineno)
519 processed += inlines
520 unprocessed = []
521 else:
522 break
523 remaining = ''.join(unprocessed) + remaining
524 if remaining:
525 processed += self.implicit_inline(remaining, lineno)
526 return processed, messages
527
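# A small sketch of the dispatch step above (the pattern and the dispatch
# table are the class attributes defined further down; the sample strings
# are arbitrary):
#
#     from docutils.parsers.rst import states
#     m = states.Inliner.patterns.initial.search('plain *emphasized* text')
#     m.groupdict()['start']    # '*'  -> dispatched to Inliner.emphasis
#     m = states.Inliner.patterns.initial.search('a ``literal`` span')
#     m.groupdict()['start']    # '``' -> dispatched to Inliner.literal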
528 # Inline object recognition
529 # -------------------------
530 # lookahead and look-behind expressions for inline markup rules
531 start_string_prefix = (u'(^|(?<=\\s|[%s%s]))' %
532 (punctuation_chars.openers,
533 punctuation_chars.delimiters))
534 end_string_suffix = (u'($|(?=\\s|[\x00%s%s%s]))' %
535 (punctuation_chars.closing_delimiters,
536 punctuation_chars.delimiters,
537 punctuation_chars.closers))
538 # print start_string_prefix.encode('utf8')
539 # TODO: support non-ASCII whitespace in the following 4 patterns?
540 non_whitespace_before = r'(?<![ \n])'
541 non_whitespace_escape_before = r'(?<![ \n\x00])'
542 non_unescaped_whitespace_escape_before = r'(?<!(?<!\x00)[ \n\x00])'
543 non_whitespace_after = r'(?![ \n])'
544 # Alphanumerics with isolated internal [-._+:] chars (i.e. not 2 together):
545 simplename = r'(?:(?!_)\w)+(?:[-._+:](?:(?!_)\w)+)*'
546 # Valid URI characters (see RFC 2396 & RFC 2732);
547 # final \x00 allows backslash escapes in URIs:
548 uric = r"""[-_.!~*'()[\];/:@&=+$,%a-zA-Z0-9\x00]"""
549 # Delimiter indicating the end of a URI (not part of the URI):
550 uri_end_delim = r"""[>]"""
551 # Last URI character; same as uric but no punctuation:
552 urilast = r"""[_~*/=+a-zA-Z0-9]"""
553 # End of a URI (either 'urilast' or 'uric followed by a
554 # uri_end_delim'):
555 uri_end = r"""(?:%(urilast)s|%(uric)s(?=%(uri_end_delim)s))""" % locals()
556 emailc = r"""[-_!~*'{|}/#?^`&=+$%a-zA-Z0-9\x00]"""
557 email_pattern = r"""
558 %(emailc)s+(?:\.%(emailc)s+)* # name
559 (?<!\x00)@ # at
560 %(emailc)s+(?:\.%(emailc)s*)* # host
561 %(uri_end)s # final URI char
562 """
563 parts = ('initial_inline', start_string_prefix, '',
564 [('start', '', non_whitespace_after, # simple start-strings
565 [r'\*\*', # strong
566 r'\*(?!\*)', # emphasis but not strong
567 r'``', # literal
568 r'_`', # inline internal target
569 r'\|(?!\|)'] # substitution reference
570 ),
571 ('whole', '', end_string_suffix, # whole constructs
572 [# reference name & end-string
573 r'(?P<refname>%s)(?P<refend>__?)' % simplename,
574 ('footnotelabel', r'\[', r'(?P<fnend>\]_)',
575 [r'[0-9]+', # manually numbered
576 r'\#(%s)?' % simplename, # auto-numbered (w/ label?)
577 r'\*', # auto-symbol
578 r'(?P<citationlabel>%s)' % simplename] # citation reference
579 )
580 ]
581 ),
582 ('backquote', # interpreted text or phrase reference
583 '(?P<role>(:%s:)?)' % simplename, # optional role
584 non_whitespace_after,
585 ['`(?!`)'] # but not literal
586 )
587 ]
588 )
589 patterns = Struct(
590 initial=build_regexp(parts),
591 emphasis=re.compile(non_whitespace_escape_before
592 + r'(\*)' + end_string_suffix, re.UNICODE),
593 strong=re.compile(non_whitespace_escape_before
594 + r'(\*\*)' + end_string_suffix, re.UNICODE),
595 interpreted_or_phrase_ref=re.compile(
596 r"""
597 %(non_unescaped_whitespace_escape_before)s
598 (
599 `
600 (?P<suffix>
601 (?P<role>:%(simplename)s:)?
602 (?P<refend>__?)?
603 )
604 )
605 %(end_string_suffix)s
606 """ % locals(), re.VERBOSE | re.UNICODE),
607 embedded_link=re.compile(
608 r"""
609 (
610 (?:[ \n]+|^) # spaces or beginning of line/string
611 < # open bracket
612 %(non_whitespace_after)s
613 ([^<>\x00]+(\x00_)?) # anything but angle brackets & nulls
614 # except escaped trailing low line
615 %(non_whitespace_before)s
616 > # close bracket w/o whitespace before
617 )
618 $ # end of string
619 """ % locals(), re.VERBOSE | re.UNICODE),
620 literal=re.compile(non_whitespace_before + '(``)'
621 + end_string_suffix),
622 target=re.compile(non_whitespace_escape_before
623 + r'(`)' + end_string_suffix),
624 substitution_ref=re.compile(non_whitespace_escape_before
625 + r'(\|_{0,2})'
626 + end_string_suffix),
627 email=re.compile(email_pattern % locals() + '$',
628 re.VERBOSE | re.UNICODE),
629 uri=re.compile(
630 (r"""
631 %(start_string_prefix)s
632 (?P<whole>
633 (?P<absolute> # absolute URI
634 (?P<scheme> # scheme (http, ftp, mailto)
635 [a-zA-Z][a-zA-Z0-9.+-]*
636 )
637 :
638 (
639 ( # either:
640 (//?)? # hierarchical URI
641 %(uric)s* # URI characters
642 %(uri_end)s # final URI char
643 )
644 ( # optional query
645 \?%(uric)s*
646 %(uri_end)s
647 )?
648 ( # optional fragment
649 \#%(uric)s*
650 %(uri_end)s
651 )?
652 )
653 )
654 | # *OR*
655 (?P<email> # email address
656 """ + email_pattern + r"""
657 )
658 )
659 %(end_string_suffix)s
660 """) % locals(), re.VERBOSE | re.UNICODE),
661 pep=re.compile(
662 r"""
663 %(start_string_prefix)s
664 (
665 (pep-(?P<pepnum1>\d+)(.txt)?) # reference to source file
666 |
667 (PEP\s+(?P<pepnum2>\d+)) # reference by name
668 )
669 %(end_string_suffix)s""" % locals(), re.VERBOSE | re.UNICODE),
670 rfc=re.compile(
671 r"""
672 %(start_string_prefix)s
673 (RFC(-|\s+)?(?P<rfcnum>\d+))
674 %(end_string_suffix)s""" % locals(), re.VERBOSE | re.UNICODE))
675
676 def quoted_start(self, match):
677 """Test if inline markup start-string is 'quoted'.
678
679 'Quoted' in this context means the start-string is enclosed in a pair
680 of matching opening/closing delimiters (not necessarily quotes)
681 or at the end of the match.
682 """
683 string = match.string
684 start = match.start()
685 if start == 0: # start-string at beginning of text
686 return False
687 prestart = string[start - 1]
688 try:
689 poststart = string[match.end()]
690 except IndexError: # start-string at end of text
691 return True # not "quoted" but no markup start-string either
692 return punctuation_chars.match_chars(prestart, poststart)
693
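# Illustration of the rule above: in the text '(*) an asterisk', the '*'
# start-string is immediately preceded by '(' and followed by ')', a
# matching open/close pair, so quoted_start() reports it as quoted and no
# emphasis is started there.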
694 def inline_obj(self, match, lineno, end_pattern, nodeclass,
695 restore_backslashes=False):
696 string = match.string
697 matchstart = match.start('start')
698 matchend = match.end('start')
699 if self.quoted_start(match):
700 return (string[:matchend], [], string[matchend:], [], '')
701 endmatch = end_pattern.search(string[matchend:])
702 if endmatch and endmatch.start(1): # 1 or more chars
703 text = unescape(endmatch.string[:endmatch.start(1)],
704 restore_backslashes)
705 textend = matchend + endmatch.end(1)
706 rawsource = unescape(string[matchstart:textend], 1)
707 return (string[:matchstart], [nodeclass(rawsource, text)],
708 string[textend:], [], endmatch.group(1))
709 msg = self.reporter.warning(
710 'Inline %s start-string without end-string.'
711 % nodeclass.__name__, line=lineno)
712 text = unescape(string[matchstart:matchend], 1)
713 rawsource = unescape(string[matchstart:matchend], 1)
714 prb = self.problematic(text, rawsource, msg)
715 return string[:matchstart], [prb], string[matchend:], [msg], ''
716
717 def problematic(self, text, rawsource, message):
718 msgid = self.document.set_id(message, self.parent)
719 problematic = nodes.problematic(rawsource, text, refid=msgid)
720 prbid = self.document.set_id(problematic)
721 message.add_backref(prbid)
722 return problematic
723
724 def emphasis(self, match, lineno):
725 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
726 match, lineno, self.patterns.emphasis, nodes.emphasis)
727 return before, inlines, remaining, sysmessages
728
729 def strong(self, match, lineno):
730 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
731 match, lineno, self.patterns.strong, nodes.strong)
732 return before, inlines, remaining, sysmessages
733
734 def interpreted_or_phrase_ref(self, match, lineno):
735 end_pattern = self.patterns.interpreted_or_phrase_ref
736 string = match.string
737 matchstart = match.start('backquote')
738 matchend = match.end('backquote')
739 rolestart = match.start('role')
740 role = match.group('role')
741 position = ''
742 if role:
743 role = role[1:-1]
744 position = 'prefix'
745 elif self.quoted_start(match):
746 return (string[:matchend], [], string[matchend:], [])
747 endmatch = end_pattern.search(string[matchend:])
748 if endmatch and endmatch.start(1): # 1 or more chars
749 textend = matchend + endmatch.end()
750 if endmatch.group('role'):
751 if role:
752 msg = self.reporter.warning(
753 'Multiple roles in interpreted text (both '
754 'prefix and suffix present; only one allowed).',
755 line=lineno)
756 text = unescape(string[rolestart:textend], 1)
757 prb = self.problematic(text, text, msg)
758 return string[:rolestart], [prb], string[textend:], [msg]
759 role = endmatch.group('suffix')[1:-1]
760 position = 'suffix'
761 escaped = endmatch.string[:endmatch.start(1)]
762 rawsource = unescape(string[matchstart:textend], 1)
763 if rawsource[-1:] == '_':
764 if role:
765 msg = self.reporter.warning(
766 'Mismatch: both interpreted text role %s and '
767 'reference suffix.' % position, line=lineno)
768 text = unescape(string[rolestart:textend], 1)
769 prb = self.problematic(text, text, msg)
770 return string[:rolestart], [prb], string[textend:], [msg]
771 return self.phrase_ref(string[:matchstart], string[textend:],
772 rawsource, escaped, unescape(escaped))
773 else:
774 rawsource = unescape(string[rolestart:textend], 1)
775 nodelist, messages = self.interpreted(rawsource, escaped, role,
776 lineno)
777 return (string[:rolestart], nodelist,
778 string[textend:], messages)
779 msg = self.reporter.warning(
780 'Inline interpreted text or phrase reference start-string '
781 'without end-string.', line=lineno)
782 text = unescape(string[matchstart:matchend], 1)
783 prb = self.problematic(text, text, msg)
784 return string[:matchstart], [prb], string[matchend:], [msg]
785
786 def phrase_ref(self, before, after, rawsource, escaped, text):
787 match = self.patterns.embedded_link.search(escaped)
788 if match: # embedded <URI> or <alias_>
789 text = unescape(escaped[:match.start(0)])
790 aliastext = unescape(match.group(2), restore_backslashes=True)
791 if aliastext.endswith('_') and not (aliastext.endswith(r'\_')
792 or self.patterns.uri.match(aliastext)):
793 aliastype = 'name'
794 alias = normalize_name(aliastext[:-1])
795 target = nodes.target(match.group(1), refname=alias)
796 target.indirect_reference_name = aliastext[:-1]
797 else:
798 aliastype = 'uri'
799 alias = ''.join(aliastext.split())
800 alias = self.adjust_uri(alias)
801 if alias.endswith(r'\_'):
802 alias = alias[:-2] + '_'
803 target = nodes.target(match.group(1), refuri=alias)
804 target.referenced = 1
805 if not aliastext:
806 raise ApplicationError('problem with embedded link: %r'
807 % aliastext)
808 if not text:
809 text = alias
810 else:
811 target = None
812
813 refname = normalize_name(text)
814 reference = nodes.reference(rawsource, text,
815 name=whitespace_normalize_name(text))
816 node_list = [reference]
817
818 if rawsource[-2:] == '__':
819 if target and (aliastype == 'name'):
820 reference['refname'] = alias
821 self.document.note_refname(reference)
822 # self.document.note_indirect_target(target) # required?
823 elif target and (aliastype == 'uri'):
824 reference['refuri'] = alias
825 else:
826 reference['anonymous'] = 1
827 else:
828 if target:
829 target['names'].append(refname)
830 if aliastype == 'name':
831 reference['refname'] = alias
832 self.document.note_indirect_target(target)
833 self.document.note_refname(reference)
834 else:
835 reference['refuri'] = alias
836 self.document.note_explicit_target(target, self.parent)
837 # target.note_referenced_by(name=refname)
838 node_list.append(target)
839 else:
840 reference['refname'] = refname
841 self.document.note_refname(reference)
842 return before, node_list, after, []
843
844
845 def adjust_uri(self, uri):
846 match = self.patterns.email.match(uri)
847 if match:
848 return 'mailto:' + uri
849 else:
850 return uri
851
852 def interpreted(self, rawsource, text, role, lineno):
853 role_fn, messages = roles.role(role, self.language, lineno,
854 self.reporter)
855 if role_fn:
856 nodes, messages2 = role_fn(role, rawsource, text, lineno, self)
857 return nodes, messages + messages2
858 else:
859 msg = self.reporter.error(
860 'Unknown interpreted text role "%s".' % role,
861 line=lineno)
862 return ([self.problematic(rawsource, rawsource, msg)],
863 messages + [msg])
864
865 def literal(self, match, lineno):
866 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
867 match, lineno, self.patterns.literal, nodes.literal,
868 restore_backslashes=True)
869 return before, inlines, remaining, sysmessages
870
871 def inline_internal_target(self, match, lineno):
872 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
873 match, lineno, self.patterns.target, nodes.target)
874 if inlines and isinstance(inlines[0], nodes.target):
875 assert len(inlines) == 1
876 target = inlines[0]
877 name = normalize_name(target.astext())
878 target['names'].append(name)
879 self.document.note_explicit_target(target, self.parent)
880 return before, inlines, remaining, sysmessages
881
882 def substitution_reference(self, match, lineno):
883 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
884 match, lineno, self.patterns.substitution_ref,
885 nodes.substitution_reference)
886 if len(inlines) == 1:
887 subref_node = inlines[0]
888 if isinstance(subref_node, nodes.substitution_reference):
889 subref_text = subref_node.astext()
890 self.document.note_substitution_ref(subref_node, subref_text)
891 if endstring[-1:] == '_':
892 reference_node = nodes.reference(
893 '|%s%s' % (subref_text, endstring), '')
894 if endstring[-2:] == '__':
895 reference_node['anonymous'] = 1
896 else:
897 reference_node['refname'] = normalize_name(subref_text)
898 self.document.note_refname(reference_node)
899 reference_node += subref_node
900 inlines = [reference_node]
901 return before, inlines, remaining, sysmessages
902
903 def footnote_reference(self, match, lineno):
904 """
905 Handles `nodes.footnote_reference` and `nodes.citation_reference`
906 elements.
907 """
908 label = match.group('footnotelabel')
909 refname = normalize_name(label)
910 string = match.string
911 before = string[:match.start('whole')]
912 remaining = string[match.end('whole'):]
913 if match.group('citationlabel'):
914 refnode = nodes.citation_reference('[%s]_' % label,
915 refname=refname)
916 refnode += nodes.Text(label)
917 self.document.note_citation_ref(refnode)
918 else:
919 refnode = nodes.footnote_reference('[%s]_' % label)
920 if refname[0] == '#':
921 refname = refname[1:]
922 refnode['auto'] = 1
923 self.document.note_autofootnote_ref(refnode)
924 elif refname == '*':
925 refname = ''
926 refnode['auto'] = '*'
927 self.document.note_symbol_footnote_ref(
928 refnode)
929 else:
930 refnode += nodes.Text(label)
931 if refname:
932 refnode['refname'] = refname
933 self.document.note_footnote_ref(refnode)
934 if utils.get_trim_footnote_ref_space(self.document.settings):
935 before = before.rstrip()
936 return (before, [refnode], remaining, [])
937
938 def reference(self, match, lineno, anonymous=False):
939 referencename = match.group('refname')
940 refname = normalize_name(referencename)
941 referencenode = nodes.reference(
942 referencename + match.group('refend'), referencename,
943 name=whitespace_normalize_name(referencename))
944 if anonymous:
945 referencenode['anonymous'] = 1
946 else:
947 referencenode['refname'] = refname
948 self.document.note_refname(referencenode)
949 string = match.string
950 matchstart = match.start('whole')
951 matchend = match.end('whole')
952 return (string[:matchstart], [referencenode], string[matchend:], [])
953
954 def anonymous_reference(self, match, lineno):
955 return self.reference(match, lineno, anonymous=1)
956
957 def standalone_uri(self, match, lineno):
958 if (not match.group('scheme')
959 or match.group('scheme').lower() in urischemes.schemes):
960 if match.group('email'):
961 addscheme = 'mailto:'
962 else:
963 addscheme = ''
964 text = match.group('whole')
965 unescaped = unescape(text, 0)
966 return [nodes.reference(unescape(text, 1), unescaped,
967 refuri=addscheme + unescaped)]
968 else: # not a valid scheme
969 raise MarkupMismatch
970
971 def pep_reference(self, match, lineno):
972 text = match.group(0)
973 if text.startswith('pep-'):
974 pepnum = int(match.group('pepnum1'))
975 elif text.startswith('PEP'):
976 pepnum = int(match.group('pepnum2'))
977 else:
978 raise MarkupMismatch
979 ref = (self.document.settings.pep_base_url
980 + self.document.settings.pep_file_url_template % pepnum)
981 unescaped = unescape(text, 0)
982 return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
983
984 rfc_url = 'rfc%d.html'
985
986 def rfc_reference(self, match, lineno):
987 text = match.group(0)
988 if text.startswith('RFC'):
989 rfcnum = int(match.group('rfcnum'))
990 ref = self.document.settings.rfc_base_url + self.rfc_url % rfcnum
991 else:
992 raise MarkupMismatch
993 unescaped = unescape(text, 0)
994 return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
995
996 def implicit_inline(self, text, lineno):
997 """
998 Check each of the patterns in `self.implicit_dispatch` for a match,
999 and dispatch to the stored method for the pattern. Recursively check
1000 the text before and after the match. Return a list of `nodes.Text`
1001 and inline element nodes.
1002 """
1003 if not text:
1004 return []
1005 for pattern, method in self.implicit_dispatch:
1006 match = pattern.search(text)
1007 if match:
1008 try:
1009 # Must recurse on strings before *and* after the match;
1010 # there may be multiple patterns.
1011 return (self.implicit_inline(text[:match.start()], lineno)
1012 + method(match, lineno) +
1013 self.implicit_inline(text[match.end():], lineno))
1014 except MarkupMismatch:
1015 pass
1016 return [nodes.Text(unescape(text), rawsource=unescape(text, 1))]
1017
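# For example (illustrative, not run here): given the text
# 'see http://docutils.sourceforge.net for details', the uri pattern in
# implicit_dispatch matches, standalone_uri() builds a reference node for
# the URI, and implicit_inline() recurses on the plain text before and
# after it, returning roughly:
#
#     [Text('see '), reference(refuri='http://docutils.sourceforge.net'),
#      Text(' for details')]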
1018 dispatch = {'*': emphasis,
1019 '**': strong,
1020 '`': interpreted_or_phrase_ref,
1021 '``': literal,
1022 '_`': inline_internal_target,
1023 ']_': footnote_reference,
1024 '|': substitution_reference,
1025 '_': reference,
1026 '__': anonymous_reference}
1027
1028
1029 def _loweralpha_to_int(s, _zero=(ord('a')-1)):
1030 return ord(s) - _zero
1031
1032 def _upperalpha_to_int(s, _zero=(ord('A')-1)):
1033 return ord(s) - _zero
1034
1035 def _lowerroman_to_int(s):
1036 return roman.fromRoman(s.upper())
1037
1038
1039 class Body(RSTState):
1040
1041 """
1042 Generic classifier of the first line of a block.
1043 """
1044
1045 double_width_pad_char = tableparser.TableParser.double_width_pad_char
1046 """Padding character for East Asian double-width text."""
1047
1048 enum = Struct()
1049 """Enumerated list parsing information."""
1050
1051 enum.formatinfo = {
1052 'parens': Struct(prefix='(', suffix=')', start=1, end=-1),
1053 'rparen': Struct(prefix='', suffix=')', start=0, end=-1),
1054 'period': Struct(prefix='', suffix='.', start=0, end=-1)}
1055 enum.formats = enum.formatinfo.keys()
1056 enum.sequences = ['arabic', 'loweralpha', 'upperalpha',
1057 'lowerroman', 'upperroman'] # ORDERED!
1058 enum.sequencepats = {'arabic': '[0-9]+',
1059 'loweralpha': '[a-z]',
1060 'upperalpha': '[A-Z]',
1061 'lowerroman': '[ivxlcdm]+',
1062 'upperroman': '[IVXLCDM]+',}
1063 enum.converters = {'arabic': int,
1064 'loweralpha': _loweralpha_to_int,
1065 'upperalpha': _upperalpha_to_int,
1066 'lowerroman': _lowerroman_to_int,
1067 'upperroman': roman.fromRoman}
1068
1069 enum.sequenceregexps = {}
1070 for sequence in enum.sequences:
1071 enum.sequenceregexps[sequence] = re.compile(
1072 enum.sequencepats[sequence] + '$', re.UNICODE)
1073
1074 grid_table_top_pat = re.compile(r'\+-[-+]+-\+ *$')
1075 """Matches the top (& bottom) of a full table)."""
1076
1077 simple_table_top_pat = re.compile('=+( +=+)+ *$')
1078 """Matches the top of a simple table."""
1079
1080 simple_table_border_pat = re.compile('=+[ =]*$')
1081 """Matches the bottom & header bottom of a simple table."""
1082
1083 pats = {}
1084 """Fragments of patterns used by transitions."""
1085
1086 pats['nonalphanum7bit'] = '[!-/:-@[-`{-~]'
1087 pats['alpha'] = '[a-zA-Z]'
1088 pats['alphanum'] = '[a-zA-Z0-9]'
1089 pats['alphanumplus'] = '[a-zA-Z0-9_-]'
1090 pats['enum'] = ('(%(arabic)s|%(loweralpha)s|%(upperalpha)s|%(lowerroman)s'
1091 '|%(upperroman)s|#)' % enum.sequencepats)
1092 pats['optname'] = '%(alphanum)s%(alphanumplus)s*' % pats
1093 # @@@ Loosen up the pattern? Allow Unicode?
1094 pats['optarg'] = '(%(alpha)s%(alphanumplus)s*|<[^<>]+>)' % pats
1095 pats['shortopt'] = r'(-|\+)%(alphanum)s( ?%(optarg)s)?' % pats
1096 pats['longopt'] = r'(--|/)%(optname)s([ =]%(optarg)s)?' % pats
1097 pats['option'] = r'(%(shortopt)s|%(longopt)s)' % pats
1098
1099 for format in enum.formats:
1100 pats[format] = '(?P<%s>%s%s%s)' % (
1101 format, re.escape(enum.formatinfo[format].prefix),
1102 pats['enum'], re.escape(enum.formatinfo[format].suffix))
1103
1104 patterns = {
1105 'bullet': u'[-+*\u2022\u2023\u2043]( +|$)',
1106 'enumerator': r'(%(parens)s|%(rparen)s|%(period)s)( +|$)' % pats,
1107 'field_marker': r':(?![: ])([^:\\]|\\.)*(?<! ):( +|$)',
1108 'option_marker': r'%(option)s(, %(option)s)*( +| ?$)' % pats,
1109 'doctest': r'>>>( +|$)',
1110 'line_block': r'\|( +|$)',
1111 'grid_table_top': grid_table_top_pat,
1112 'simple_table_top': simple_table_top_pat,
1113 'explicit_markup': r'\.\.( +|$)',
1114 'anonymous': r'__( +|$)',
1115 'line': r'(%(nonalphanum7bit)s)\1* *$' % pats,
1116 'text': r''}
1117 initial_transitions = (
1118 'bullet',
1119 'enumerator',
1120 'field_marker',
1121 'option_marker',
1122 'doctest',
1123 'line_block',
1124 'grid_table_top',
1125 'simple_table_top',
1126 'explicit_markup',
1127 'anonymous',
1128 'line',
1129 'text')
1130
1131 def indent(self, match, context, next_state):
1132 """Block quote."""
1133 indented, indent, line_offset, blank_finish = \
1134 self.state_machine.get_indented()
1135 elements = self.block_quote(indented, line_offset)
1136 self.parent += elements
1137 if not blank_finish:
1138 self.parent += self.unindent_warning('Block quote')
1139 return context, next_state, []
1140
1141 def block_quote(self, indented, line_offset):
1142 elements = []
1143 while indented:
1144 (blockquote_lines,
1145 attribution_lines,
1146 attribution_offset,
1147 indented,
1148 new_line_offset) = self.split_attribution(indented, line_offset)
1149 blockquote = nodes.block_quote()
1150 self.nested_parse(blockquote_lines, line_offset, blockquote)
1151 elements.append(blockquote)
1152 if attribution_lines:
1153 attribution, messages = self.parse_attribution(
1154 attribution_lines, attribution_offset)
1155 blockquote += attribution
1156 elements += messages
1157 line_offset = new_line_offset
1158 while indented and not indented[0]:
1159 indented = indented[1:]
1160 line_offset += 1
1161 return elements
1162
1163 # U+2014 is an em-dash:
1164 attribution_pattern = re.compile(u'(---?(?!-)|\u2014) *(?=[^ \\n])',
1165 re.UNICODE)
1166
1167 def split_attribution(self, indented, line_offset):
1168 """
1169 Check for a block quote attribution and split it off:
1170
1171 * First line after a blank line must begin with a dash ("--", "---",
1172 em-dash; matches `self.attribution_pattern`).
1173 * Every line after that must have consistent indentation.
1174 * Attributions must be preceded by block quote content.
1175
1176 Return a tuple of: (block quote content lines, attribution lines,
1177 attribution offset, remaining indented lines, new line offset).
1178 """
1179 blank = None
1180 nonblank_seen = False
1181 for i in range(len(indented)):
1182 line = indented[i].rstrip()
1183 if line:
1184 if nonblank_seen and blank == i - 1: # last line blank
1185 match = self.attribution_pattern.match(line)
1186 if match:
1187 attribution_end, indent = self.check_attribution(
1188 indented, i)
1189 if attribution_end:
1190 a_lines = indented[i:attribution_end]
1191 a_lines.trim_left(match.end(), end=1)
1192 a_lines.trim_left(indent, start=1)
1193 return (indented[:i], a_lines,
1194 i, indented[attribution_end:],
1195 line_offset + attribution_end)
1196 nonblank_seen = True
1197 else:
1198 blank = i
1199 else:
1200 return (indented, None, None, None, None)
1201
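# A small check of the attribution marker above (sample string only):
#
#     from docutils.parsers.rst import states
#     m = states.Body.attribution_pattern.match(u'-- An Author')
#     m.end()    # 3: the '-- ' marker is trimmed from the attribution text
#
# So an indented block quote whose last lines are a blank line followed by
# '-- An Author' is split into the quote body plus an attribution element.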
1202 def check_attribution(self, indented, attribution_start):
1203 """
1204 Check attribution shape.
1205 Return the index past the end of the attribution, and the indent.
1206 """
1207 indent = None
1208 i = attribution_start + 1
1209 for i in range(attribution_start + 1, len(indented)):
1210 line = indented[i].rstrip()
1211 if not line:
1212 break
1213 if indent is None:
1214 indent = len(line) - len(line.lstrip())
1215 elif len(line) - len(line.lstrip()) != indent:
1216 return None, None # bad shape; not an attribution
1217 else:
1218 # return index of line after last attribution line:
1219 i += 1
1220 return i, (indent or 0)
1221
1222 def parse_attribution(self, indented, line_offset):
1223 text = '\n'.join(indented).rstrip()
1224 lineno = self.state_machine.abs_line_number() + line_offset
1225 textnodes, messages = self.inline_text(text, lineno)
1226 node = nodes.attribution(text, '', *textnodes)
1227 node.source, node.line = self.state_machine.get_source_and_line(lineno)
1228 return node, messages
1229
1230 def bullet(self, match, context, next_state):
1231 """Bullet list item."""
1232 bulletlist = nodes.bullet_list()
1233 self.parent += bulletlist
1234 bulletlist['bullet'] = match.string[0]
1235 i, blank_finish = self.list_item(match.end())
1236 bulletlist += i
1237 offset = self.state_machine.line_offset + 1 # next line
1238 new_line_offset, blank_finish = self.nested_list_parse(
1239 self.state_machine.input_lines[offset:],
1240 input_offset=self.state_machine.abs_line_offset() + 1,
1241 node=bulletlist, initial_state='BulletList',
1242 blank_finish=blank_finish)
1243 self.goto_line(new_line_offset)
1244 if not blank_finish:
1245 self.parent += self.unindent_warning('Bullet list')
1246 return [], next_state, []
1247
1248 def list_item(self, indent):
1249 if self.state_machine.line[indent:]:
1250 indented, line_offset, blank_finish = (
1251 self.state_machine.get_known_indented(indent))
1252 else:
1253 indented, indent, line_offset, blank_finish = (
1254 self.state_machine.get_first_known_indented(indent))
1255 listitem = nodes.list_item('\n'.join(indented))
1256 if indented:
1257 self.nested_parse(indented, input_offset=line_offset,
1258 node=listitem)
1259 return listitem, blank_finish
1260
1261 def enumerator(self, match, context, next_state):
1262 """Enumerated List Item"""
1263 format, sequence, text, ordinal = self.parse_enumerator(match)
1264 if not self.is_enumerated_list_item(ordinal, sequence, format):
1265 raise statemachine.TransitionCorrection('text')
1266 enumlist = nodes.enumerated_list()
1267 self.parent += enumlist
1268 if sequence == '#':
1269 enumlist['enumtype'] = 'arabic'
1270 else:
1271 enumlist['enumtype'] = sequence
1272 enumlist['prefix'] = self.enum.formatinfo[format].prefix
1273 enumlist['suffix'] = self.enum.formatinfo[format].suffix
1274 if ordinal != 1:
1275 enumlist['start'] = ordinal
1276 msg = self.reporter.info(
1277 'Enumerated list start value not ordinal-1: "%s" (ordinal %s)'
1278 % (text, ordinal))
1279 self.parent += msg
1280 listitem, blank_finish = self.list_item(match.end())
1281 enumlist += listitem
1282 offset = self.state_machine.line_offset + 1 # next line
1283 newline_offset, blank_finish = self.nested_list_parse(
1284 self.state_machine.input_lines[offset:],
1285 input_offset=self.state_machine.abs_line_offset() + 1,
1286 node=enumlist, initial_state='EnumeratedList',
1287 blank_finish=blank_finish,
1288 extra_settings={'lastordinal': ordinal,
1289 'format': format,
1290 'auto': sequence == '#'})
1291 self.goto_line(newline_offset)
1292 if not blank_finish:
1293 self.parent += self.unindent_warning('Enumerated list')
1294 return [], next_state, []
1295
1296 def parse_enumerator(self, match, expected_sequence=None):
1297 """
1298 Analyze an enumerator and return the results.
1299
1300 :Return:
1301 - the enumerator format ('period', 'parens', or 'rparen'),
1302 - the sequence used ('arabic', 'loweralpha', 'upperroman', etc.),
1303 - the text of the enumerator, stripped of formatting, and
1304 - the ordinal value of the enumerator ('a' -> 1, 'ii' -> 2, etc.;
1305 ``None`` is returned for invalid enumerator text).
1306
1307 The enumerator format has already been determined by the regular
1308 expression match. If `expected_sequence` is given, that sequence is
1309 tried first. If not, we check for Roman numeral 1. This way,
1310 single-character Roman numerals (which are also alphabetical) can be
1311 matched. If no sequence has been matched, all sequences are checked in
1312 order.
1313 """
1314 groupdict = match.groupdict()
1315 sequence = ''
1316 for format in self.enum.formats:
1317 if groupdict[format]: # was this the format matched?
1318 break # yes; keep `format`
1319 else: # shouldn't happen
1320 raise ParserError('enumerator format not matched')
1321 text = groupdict[format][self.enum.formatinfo[format].start
1322 :self.enum.formatinfo[format].end]
1323 if text == '#':
1324 sequence = '#'
1325 elif expected_sequence:
1326 try:
1327 if self.enum.sequenceregexps[expected_sequence].match(text):
1328 sequence = expected_sequence
1329 except KeyError: # shouldn't happen
1330 raise ParserError('unknown enumerator sequence: %s'
1331 % sequence)
1332 elif text == 'i':
1333 sequence = 'lowerroman'
1334 elif text == 'I':
1335 sequence = 'upperroman'
1336 if not sequence:
1337 for sequence in self.enum.sequences:
1338 if self.enum.sequenceregexps[sequence].match(text):
1339 break
1340 else: # shouldn't happen
1341 raise ParserError('enumerator sequence not matched')
1342 if sequence == '#':
1343 ordinal = 1
1344 else:
1345 try:
1346 ordinal = self.enum.converters[sequence](text)
1347 except roman.InvalidRomanNumeralError:
1348 ordinal = None
1349 return format, sequence, text, ordinal
1350
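# Illustrative results of the analysis above (enumerator text -> values):
#
#     '3.'    -> format 'period', sequence 'arabic',     ordinal 3
#     '(c)'   -> format 'parens', sequence 'loweralpha', ordinal 3
#     'ii)'   -> format 'rparen', sequence 'lowerroman', ordinal 2
#     '#.'    -> format 'period', sequence '#',          ordinal 1
#
# The ordinal conversions use enum.converters, e.g.
# _lowerroman_to_int('ii') == 2 and _loweralpha_to_int('c') == 3.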
1351 def is_enumerated_list_item(self, ordinal, sequence, format):
1352 """
1353 Check validity based on the ordinal value and the second line.
1354
1355 Return true if the ordinal is valid and the second line is blank,
1356 indented, or starts with the next enumerator or an auto-enumerator.
1357 """
1358 if ordinal is None:
1359 return None
1360 try:
1361 next_line = self.state_machine.next_line()
1362 except EOFError: # end of input lines
1363 self.state_machine.previous_line()
1364 return 1
1365 else:
1366 self.state_machine.previous_line()
1367 if not next_line[:1].strip(): # blank or indented
1368 return 1
1369 result = self.make_enumerator(ordinal + 1, sequence, format)
1370 if result:
1371 next_enumerator, auto_enumerator = result
1372 try:
1373 if ( next_line.startswith(next_enumerator) or
1374 next_line.startswith(auto_enumerator) ):
1375 return 1
1376 except TypeError:
1377 pass
1378 return None
1379
1380 def make_enumerator(self, ordinal, sequence, format):
1381 """
1382 Construct and return the next enumerated list item marker, and an
1383 auto-enumerator ("#" instead of the regular enumerator).
1384
1385 Return ``None`` for invalid (out of range) ordinals.
1386 """ #"
1387 if sequence == '#':
1388 enumerator = '#'
1389 elif sequence == 'arabic':
1390 enumerator = str(ordinal)
1391 else:
1392 if sequence.endswith('alpha'):
1393 if ordinal > 26:
1394 return None
1395 enumerator = chr(ordinal + ord('a') - 1)
1396 elif sequence.endswith('roman'):
1397 try:
1398 enumerator = roman.toRoman(ordinal)
1399 except roman.RomanError:
1400 return None
1401 else: # shouldn't happen
1402 raise ParserError('unknown enumerator sequence: "%s"'
1403 % sequence)
1404 if sequence.startswith('lower'):
1405 enumerator = enumerator.lower()
1406 elif sequence.startswith('upper'):
1407 enumerator = enumerator.upper()
1408 else: # shouldn't happen
1409 raise ParserError('unknown enumerator sequence: "%s"'
1410 % sequence)
1411 formatinfo = self.enum.formatinfo[format]
1412 next_enumerator = (formatinfo.prefix + enumerator + formatinfo.suffix
1413 + ' ')
1414 auto_enumerator = formatinfo.prefix + '#' + formatinfo.suffix + ' '
1415 return next_enumerator, auto_enumerator
1416
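# Worked examples of the construction above:
#
#     make_enumerator(4, 'loweralpha', 'parens')  -> ('(d) ', '(#) ')
#     make_enumerator(9, 'lowerroman', 'period')  -> ('ix. ', '#. ')
#     make_enumerator(27, 'loweralpha', 'period') -> None  (out of range)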
1417 def field_marker(self, match, context, next_state):
1418 """Field list item."""
1419 field_list = nodes.field_list()
1420 self.parent += field_list
1421 field, blank_finish = self.field(match)
1422 field_list += field
1423 offset = self.state_machine.line_offset + 1 # next line
1424 newline_offset, blank_finish = self.nested_list_parse(
1425 self.state_machine.input_lines[offset:],
1426 input_offset=self.state_machine.abs_line_offset() + 1,
1427 node=field_list, initial_state='FieldList',
1428 blank_finish=blank_finish)
1429 self.goto_line(newline_offset)
1430 if not blank_finish:
1431 self.parent += self.unindent_warning('Field list')
1432 return [], next_state, []
1433
1434 def field(self, match):
1435 name = self.parse_field_marker(match)
1436 src, srcline = self.state_machine.get_source_and_line()
1437 lineno = self.state_machine.abs_line_number()
1438 indented, indent, line_offset, blank_finish = \
1439 self.state_machine.get_first_known_indented(match.end())
1440 field_node = nodes.field()
1441 field_node.source = src
1442 field_node.line = srcline
1443 name_nodes, name_messages = self.inline_text(name, lineno)
1444 field_node += nodes.field_name(name, '', *name_nodes)
1445 field_body = nodes.field_body('\n'.join(indented), *name_messages)
1446 field_node += field_body
1447 if indented:
1448 self.parse_field_body(indented, line_offset, field_body)
1449 return field_node, blank_finish
1450
1451 def parse_field_marker(self, match):
1452 """Extract & return field name from a field marker match."""
1453 field = match.group()[1:] # strip off leading ':'
1454 field = field[:field.rfind(':')] # strip off trailing ':' etc.
1455 return field
1456
1457 def parse_field_body(self, indented, offset, node):
1458 self.nested_parse(indented, input_offset=offset, node=node)
1459
1460 def option_marker(self, match, context, next_state):
1461 """Option list item."""
1462 optionlist = nodes.option_list()
1463 try:
1464 listitem, blank_finish = self.option_list_item(match)
1465 except MarkupError, error:
1466 # This shouldn't happen; pattern won't match.
1467 msg = self.reporter.error(u'Invalid option list marker: %s' %
1468 error)
1469 self.parent += msg
1470 indented, indent, line_offset, blank_finish = \
1471 self.state_machine.get_first_known_indented(match.end())
1472 elements = self.block_quote(indented, line_offset)
1473 self.parent += elements
1474 if not blank_finish:
1475 self.parent += self.unindent_warning('Option list')
1476 return [], next_state, []
1477 self.parent += optionlist
1478 optionlist += listitem
1479 offset = self.state_machine.line_offset + 1 # next line
1480 newline_offset, blank_finish = self.nested_list_parse(
1481 self.state_machine.input_lines[offset:],
1482 input_offset=self.state_machine.abs_line_offset() + 1,
1483 node=optionlist, initial_state='OptionList',
1484 blank_finish=blank_finish)
1485 self.goto_line(newline_offset)
1486 if not blank_finish:
1487 self.parent += self.unindent_warning('Option list')
1488 return [], next_state, []
1489
1490 def option_list_item(self, match):
1491 offset = self.state_machine.abs_line_offset()
1492 options = self.parse_option_marker(match)
1493 indented, indent, line_offset, blank_finish = \
1494 self.state_machine.get_first_known_indented(match.end())
1495 if not indented: # not an option list item
1496 self.goto_line(offset)
1497 raise statemachine.TransitionCorrection('text')
1498 option_group = nodes.option_group('', *options)
1499 description = nodes.description('\n'.join(indented))
1500 option_list_item = nodes.option_list_item('', option_group,
1501 description)
1502 if indented:
1503 self.nested_parse(indented, input_offset=line_offset,
1504 node=description)
1505 return option_list_item, blank_finish
1506
1507 def parse_option_marker(self, match):
1508 """
1509 Return a list of `node.option` and `node.option_argument` objects,
1510 parsed from an option marker match.
1511
1512 :Exception: `MarkupError` for invalid option markers.
1513 """
1514 optlist = []
1515 optionstrings = match.group().rstrip().split(', ')
1516 for optionstring in optionstrings:
1517 tokens = optionstring.split()
1518 delimiter = ' '
1519 firstopt = tokens[0].split('=', 1)
1520 if len(firstopt) > 1:
1521 # "--opt=value" form
1522 tokens[:1] = firstopt
1523 delimiter = '='
1524 elif (len(tokens[0]) > 2
1525 and ((tokens[0].startswith('-')
1526 and not tokens[0].startswith('--'))
1527 or tokens[0].startswith('+'))):
1528 # "-ovalue" form
1529 tokens[:1] = [tokens[0][:2], tokens[0][2:]]
1530 delimiter = ''
1531 if len(tokens) > 1 and (tokens[1].startswith('<')
1532 and tokens[-1].endswith('>')):
1533 # "-o <value1 value2>" form; join all values into one token
1534 tokens[1:] = [' '.join(tokens[1:])]
1535 if 0 < len(tokens) <= 2:
1536 option = nodes.option(optionstring)
1537 option += nodes.option_string(tokens[0], tokens[0])
1538 if len(tokens) > 1:
1539 option += nodes.option_argument(tokens[1], tokens[1],
1540 delimiter=delimiter)
1541 optlist.append(option)
1542 else:
1543 raise MarkupError(
1544 'wrong number of option tokens (=%s), should be 1 or 2: '
1545 '"%s"' % (len(tokens), optionstring))
1546 return optlist
1547
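# A small sketch of what the option marker handling above accepts (the
# sample option strings are arbitrary):
#
#     import re
#     from docutils.parsers.rst import states
#     marker = re.compile(states.Body.patterns['option_marker'], re.UNICODE)
#     m = marker.match('-o FILE, --output=FILE  write results to FILE')
#     m.group().rstrip()   # '-o FILE, --output=FILE'
#
# parse_option_marker() then splits that marker on ', ' and yields two
# option nodes, each with an option_string and an option_argument 'FILE'
# (delimiters ' ' and '=' respectively).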
1548 def doctest(self, match, context, next_state):
1549 data = '\n'.join(self.state_machine.get_text_block())
1550 self.parent += nodes.doctest_block(data, data)
1551 return [], next_state, []
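# Illustration (assumption): a text block starting with the interactive
# prompt, e.g.
#     >>> 1 + 1
#     2
# is captured verbatim as a single doctest_block node.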
1552
1553 def line_block(self, match, context, next_state):
1554 """First line of a line block."""
1555 block = nodes.line_block()
1556 self.parent += block
1557 lineno = self.state_machine.abs_line_number()
1558 line, messages, blank_finish = self.line_block_line(match, lineno)
1559 block += line
1560 self.parent += messages
1561 if not blank_finish:
1562 offset = self.state_machine.line_offset + 1 # next line
1563 new_line_offset, blank_finish = self.nested_list_parse(
1564 self.state_machine.input_lines[offset:],
1565 input_offset=self.state_machine.abs_line_offset() + 1,
1566 node=block, initial_state='LineBlock',
1567 blank_finish=0)
1568 self.goto_line(new_line_offset)
1569 if not blank_finish:
1570 self.parent += self.reporter.warning(
1571 'Line block ends without a blank line.',
1572 line=lineno+1)
1573 if len(block):
1574 if block[0].indent is None:
1575 block[0].indent = 0
1576 self.nest_line_block_lines(block)
1577 return [], next_state, []
1578
1579 def line_block_line(self, match, lineno):
1580 """Return one line element of a line_block."""
1581 indented, indent, line_offset, blank_finish = \
1582 self.state_machine.get_first_known_indented(match.end(),
1583 until_blank=True)
1584 text = u'\n'.join(indented)
1585 text_nodes, messages = self.inline_text(text, lineno)
1586 line = nodes.line(text, '', *text_nodes)
1587 if match.string.rstrip() != '|': # not empty
1588 line.indent = len(match.group(1)) - 1
1589 return line, messages, blank_finish
1590
1591 def nest_line_block_lines(self, block):
1592 for index in range(1, len(block)):
1593 if getattr(block[index], 'indent', None) is None:
1594 block[index].indent = block[index - 1].indent
1595 self.nest_line_block_segment(block)
1596
1597 def nest_line_block_segment(self, block):
1598 indents = [item.indent for item in block]
1599 least = min(indents)
1600 new_items = []
1601 new_block = nodes.line_block()
1602 for item in block:
1603 if item.indent > least:
1604 new_block.append(item)
1605 else:
1606 if len(new_block):
1607 self.nest_line_block_segment(new_block)
1608 new_items.append(new_block)
1609 new_block = nodes.line_block()
1610 new_items.append(item)
1611 if len(new_block):
1612 self.nest_line_block_segment(new_block)
1613 new_items.append(new_block)
1614 block[:] = new_items
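# Sketch: for line-block lines with indents [0, 3, 3, 0], the two more
# deeply indented lines are wrapped in a nested line_block, giving
#     line_block(line, line_block(line, line), line)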
1615
1616 def grid_table_top(self, match, context, next_state):
1617 """Top border of a full table."""
1618 return self.table_top(match, context, next_state,
1619 self.isolate_grid_table,
1620 tableparser.GridTableParser)
1621
1622 def simple_table_top(self, match, context, next_state):
1623 """Top border of a simple table."""
1624 return self.table_top(match, context, next_state,
1625 self.isolate_simple_table,
1626 tableparser.SimpleTableParser)
1627
1628 def table_top(self, match, context, next_state,
1629 isolate_function, parser_class):
1630 """Top border of a generic table."""
1631 nodelist, blank_finish = self.table(isolate_function, parser_class)
1632 self.parent += nodelist
1633 if not blank_finish:
1634 msg = self.reporter.warning(
1635 'Blank line required after table.',
1636 line=self.state_machine.abs_line_number()+1)
1637 self.parent += msg
1638 return [], next_state, []
1639
1640 def table(self, isolate_function, parser_class):
1641 """Parse a table."""
1642 block, messages, blank_finish = isolate_function()
1643 if block:
1644 try:
1645 parser = parser_class()
1646 tabledata = parser.parse(block)
1647 tableline = (self.state_machine.abs_line_number() - len(block)
1648 + 1)
1649 table = self.build_table(tabledata, tableline)
1650 nodelist = [table] + messages
1651 except tableparser.TableMarkupError, err:
1652 nodelist = self.malformed_table(block, ' '.join(err.args),
1653 offset=err.offset) + messages
1654 else:
1655 nodelist = messages
1656 return nodelist, blank_finish
1657
1658 def isolate_grid_table(self):
1659 messages = []
1660 blank_finish = 1
1661 try:
1662 block = self.state_machine.get_text_block(flush_left=True)
1663 except statemachine.UnexpectedIndentationError, err:
1664 block, src, srcline = err.args
1665 messages.append(self.reporter.error('Unexpected indentation.',
1666 source=src, line=srcline))
1667 blank_finish = 0
1668 block.disconnect()
1669 # for East Asian chars:
1670 block.pad_double_width(self.double_width_pad_char)
1671 width = len(block[0].strip())
1672 for i in range(len(block)):
1673 block[i] = block[i].strip()
1674 if block[i][0] not in '+|': # check left edge
1675 blank_finish = 0
1676 self.state_machine.previous_line(len(block) - i)
1677 del block[i:]
1678 break
1679 if not self.grid_table_top_pat.match(block[-1]): # find bottom
1680 blank_finish = 0
1681 # from second-last to third line of table:
1682 for i in range(len(block) - 2, 1, -1):
1683 if self.grid_table_top_pat.match(block[i]):
1684 self.state_machine.previous_line(len(block) - i + 1)
1685 del block[i+1:]
1686 break
1687 else:
1688 messages.extend(self.malformed_table(block))
1689 return [], messages, blank_finish
1690 for i in range(len(block)): # check right edge
1691 if len(block[i]) != width or block[i][-1] not in '+|':
1692 messages.extend(self.malformed_table(block))
1693 return [], messages, blank_finish
1694 return block, messages, blank_finish
1695
1696 def isolate_simple_table(self):
1697 start = self.state_machine.line_offset
1698 lines = self.state_machine.input_lines
1699 limit = len(lines) - 1
1700 toplen = len(lines[start].strip())
1701 pattern_match = self.simple_table_border_pat.match
1702 found = 0
1703 found_at = None
1704 i = start + 1
1705 while i <= limit:
1706 line = lines[i]
1707 match = pattern_match(line)
1708 if match:
1709 if len(line.strip()) != toplen:
1710 self.state_machine.next_line(i - start)
1711 messages = self.malformed_table(
1712 lines[start:i+1], 'Bottom/header table border does '
1713 'not match top border.')
1714 return [], messages, i == limit or not lines[i+1].strip()
1715 found += 1
1716 found_at = i
1717 if found == 2 or i == limit or not lines[i+1].strip():
1718 end = i
1719 break
1720 i += 1
1721 else: # reached end of input_lines
1722 if found:
1723 extra = ' or no blank line after table bottom'
1724 self.state_machine.next_line(found_at - start)
1725 block = lines[start:found_at+1]
1726 else:
1727 extra = ''
1728 self.state_machine.next_line(i - start - 1)
1729 block = lines[start:]
1730 messages = self.malformed_table(
1731 block, 'No bottom table border found%s.' % extra)
1732 return [], messages, not extra
1733 self.state_machine.next_line(end - start)
1734 block = lines[start:end+1]
1735 # for East Asian chars:
1736 block.pad_double_width(self.double_width_pad_char)
1737 return block, [], end == limit or not lines[end+1].strip()
1738
1739 def malformed_table(self, block, detail='', offset=0):
1740 block.replace(self.double_width_pad_char, '')
1741 data = '\n'.join(block)
1742 message = 'Malformed table.'
1743 startline = self.state_machine.abs_line_number() - len(block) + 1
1744 if detail:
1745 message += '\n' + detail
1746 error = self.reporter.error(message, nodes.literal_block(data, data),
1747 line=startline+offset)
1748 return [error]
1749
1750 def build_table(self, tabledata, tableline, stub_columns=0):
1751 colwidths, headrows, bodyrows = tabledata
1752 table = nodes.table()
1753 tgroup = nodes.tgroup(cols=len(colwidths))
1754 table += tgroup
1755 for colwidth in colwidths:
1756 colspec = nodes.colspec(colwidth=colwidth)
1757 if stub_columns:
1758 colspec.attributes['stub'] = 1
1759 stub_columns -= 1
1760 tgroup += colspec
1761 if headrows:
1762 thead = nodes.thead()
1763 tgroup += thead
1764 for row in headrows:
1765 thead += self.build_table_row(row, tableline)
1766 tbody = nodes.tbody()
1767 tgroup += tbody
1768 for row in bodyrows:
1769 tbody += self.build_table_row(row, tableline)
1770 return table
1771
1772 def build_table_row(self, rowdata, tableline):
1773 row = nodes.row()
1774 for cell in rowdata:
1775 if cell is None:
1776 continue
1777 morerows, morecols, offset, cellblock = cell
1778 attributes = {}
1779 if morerows:
1780 attributes['morerows'] = morerows
1781 if morecols:
1782 attributes['morecols'] = morecols
1783 entry = nodes.entry(**attributes)
1784 row += entry
1785 if ''.join(cellblock):
1786 self.nested_parse(cellblock, input_offset=tableline+offset,
1787 node=entry)
1788 return row
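# Resulting doctree shape (sketch):
#     table > tgroup(cols=N) > [colspec ...] [thead > row > entry ...] tbody > row > entry ...
# Each non-empty cell block is handed to nested_parse, so entries may
# contain arbitrary body elements.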
1789
1790
1791 explicit = Struct()
1792 """Patterns and constants used for explicit markup recognition."""
1793
1794 explicit.patterns = Struct(
1795 target=re.compile(r"""
1796 (
1797 _ # anonymous target
1798 | # *OR*
1799 (?!_) # no underscore at the beginning
1800 (?P<quote>`?) # optional open quote
1801 (?![ `]) # first char. not space or
1802 # backquote
1803 (?P<name> # reference name
1804 .+?
1805 )
1806 %(non_whitespace_escape_before)s
1807 (?P=quote) # close quote if open quote used
1808 )
1809 (?<!(?<!\x00):) # no unescaped colon at end
1810 %(non_whitespace_escape_before)s
1811 [ ]? # optional space
1812 : # end of reference name
1813 ([ ]+|$) # followed by whitespace
1814 """ % vars(Inliner), re.VERBOSE | re.UNICODE),
1815 reference=re.compile(r"""
1816 (
1817 (?P<simple>%(simplename)s)_
1818 | # *OR*
1819 ` # open backquote
1820 (?![ ]) # not space
1821 (?P<phrase>.+?) # hyperlink phrase
1822 %(non_whitespace_escape_before)s
1823 `_ # close backquote,
1824 # reference mark
1825 )
1826 $ # end of string
1827 """ % vars(Inliner), re.VERBOSE | re.UNICODE),
1828 substitution=re.compile(r"""
1829 (
1830 (?![ ]) # first char. not space
1831 (?P<name>.+?) # substitution text
1832 %(non_whitespace_escape_before)s
1833 \| # close delimiter
1834 )
1835 ([ ]+|$) # followed by whitespace
1836 """ % vars(Inliner),
1837 re.VERBOSE | re.UNICODE),)
1838
1839 def footnote(self, match):
1840 src, srcline = self.state_machine.get_source_and_line()
1841 indented, indent, offset, blank_finish = \
1842 self.state_machine.get_first_known_indented(match.end())
1843 label = match.group(1)
1844 name = normalize_name(label)
1845 footnote = nodes.footnote('\n'.join(indented))
1846 footnote.source = src
1847 footnote.line = srcline
1848 if name[0] == '#': # auto-numbered
1849 name = name[1:] # autonumber label
1850 footnote['auto'] = 1
1851 if name:
1852 footnote['names'].append(name)
1853 self.document.note_autofootnote(footnote)
1854 elif name == '*': # auto-symbol
1855 name = ''
1856 footnote['auto'] = '*'
1857 self.document.note_symbol_footnote(footnote)
1858 else: # manually numbered
1859 footnote += nodes.label('', label)
1860 footnote['names'].append(name)
1861 self.document.note_footnote(footnote)
1862 if name:
1863 self.document.note_explicit_target(footnote, footnote)
1864 else:
1865 self.document.set_id(footnote, footnote)
1866 if indented:
1867 self.nested_parse(indented, input_offset=offset, node=footnote)
1868 return [footnote], blank_finish
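# Label handling above, illustrated (sketch):
#     .. [1] text      -> manually numbered; label node '1'
#     .. [#] text      -> auto-numbered; footnote['auto'] = 1
#     .. [#note] text  -> auto-numbered with name 'note'
#     .. [*] text      -> auto-symbol; footnote['auto'] = '*'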
1869
1870 def citation(self, match):
1871 src, srcline = self.state_machine.get_source_and_line()
1872 indented, indent, offset, blank_finish = \
1873 self.state_machine.get_first_known_indented(match.end())
1874 label = match.group(1)
1875 name = normalize_name(label)
1876 citation = nodes.citation('\n'.join(indented))
1877 citation.source = src
1878 citation.line = srcline
1879 citation += nodes.label('', label)
1880 citation['names'].append(name)
1881 self.document.note_citation(citation)
1882 self.document.note_explicit_target(citation, citation)
1883 if indented:
1884 self.nested_parse(indented, input_offset=offset, node=citation)
1885 return [citation], blank_finish
1886
1887 def hyperlink_target(self, match):
1888 pattern = self.explicit.patterns.target
1889 lineno = self.state_machine.abs_line_number()
1890 block, indent, offset, blank_finish = \
1891 self.state_machine.get_first_known_indented(
1892 match.end(), until_blank=True, strip_indent=False)
1893 blocktext = match.string[:match.end()] + '\n'.join(block)
1894 block = [escape2null(line) for line in block]
1895 escaped = block[0]
1896 blockindex = 0
1897 while True:
1898 targetmatch = pattern.match(escaped)
1899 if targetmatch:
1900 break
1901 blockindex += 1
1902 try:
1903 escaped += block[blockindex]
1904 except IndexError:
1905 raise MarkupError('malformed hyperlink target.')
1906 del block[:blockindex]
1907 block[0] = (block[0] + ' ')[targetmatch.end()-len(escaped)-1:].strip()
1908 target = self.make_target(block, blocktext, lineno,
1909 targetmatch.group('name'))
1910 return [target], blank_finish
1911
1912 def make_target(self, block, block_text, lineno, target_name):
1913 target_type, data = self.parse_target(block, block_text, lineno)
1914 if target_type == 'refname':
1915 target = nodes.target(block_text, '', refname=normalize_name(data))
1916 target.indirect_reference_name = data
1917 self.add_target(target_name, '', target, lineno)
1918 self.document.note_indirect_target(target)
1919 return target
1920 elif target_type == 'refuri':
1921 target = nodes.target(block_text, '')
1922 self.add_target(target_name, data, target, lineno)
1923 return target
1924 else:
1925 return data
1926
1927 def parse_target(self, block, block_text, lineno):
1928 """
1929 Determine the type of reference of a target.
1930
1931 :Return: A 2-tuple, one of:
1932
1933 - 'refname' and the indirect reference name
1934 - 'refuri' and the URI
1935 - 'malformed' and a system_message node
1936 """
1937 if block and block[-1].strip()[-1:] == '_': # possible indirect target
1938 reference = ' '.join([line.strip() for line in block])
1939 refname = self.is_reference(reference)
1940 if refname:
1941 return 'refname', refname
1942 reference = ''.join([''.join(line.split()) for line in block])
1943 return 'refuri', unescape(reference)
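# Classification sketch (not part of the module):
#     block ['http://www.python.org/']  -> ('refuri', 'http://www.python.org/')
#     block ['other-target_']           -> ('refname', 'other-target')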
1944
1945 def is_reference(self, reference):
1946 match = self.explicit.patterns.reference.match(
1947 whitespace_normalize_name(reference))
1948 if not match:
1949 return None
1950 return unescape(match.group('simple') or match.group('phrase'))
1951
1952 def add_target(self, targetname, refuri, target, lineno):
1953 target.line = lineno
1954 if targetname:
1955 name = normalize_name(unescape(targetname))
1956 target['names'].append(name)
1957 if refuri:
1958 uri = self.inliner.adjust_uri(refuri)
1959 if uri:
1960 target['refuri'] = uri
1961 else:
1962 raise ApplicationError('problem with URI: %r' % refuri)
1963 self.document.note_explicit_target(target, self.parent)
1964 else: # anonymous target
1965 if refuri:
1966 target['refuri'] = refuri
1967 target['anonymous'] = 1
1968 self.document.note_anonymous_target(target)
1969
1970 def substitution_def(self, match):
1971 pattern = self.explicit.patterns.substitution
1972 src, srcline = self.state_machine.get_source_and_line()
1973 block, indent, offset, blank_finish = \
1974 self.state_machine.get_first_known_indented(match.end(),
1975 strip_indent=False)
1976 blocktext = (match.string[:match.end()] + '\n'.join(block))
1977 block.disconnect()
1978 escaped = escape2null(block[0].rstrip())
1979 blockindex = 0
1980 while True:
1981 subdefmatch = pattern.match(escaped)
1982 if subdefmatch:
1983 break
1984 blockindex += 1
1985 try:
1986 escaped = escaped + ' ' + escape2null(block[blockindex].strip())
1987 except IndexError:
1988 raise MarkupError('malformed substitution definition.')
1989 del block[:blockindex] # strip out the substitution marker
1990 block[0] = (block[0].strip() + ' ')[subdefmatch.end()-len(escaped)-1:-1]
1991 if not block[0]:
1992 del block[0]
1993 offset += 1
1994 while block and not block[-1].strip():
1995 block.pop()
1996 subname = subdefmatch.group('name')
1997 substitution_node = nodes.substitution_definition(blocktext)
1998 substitution_node.source = src
1999 substitution_node.line = srcline
2000 if not block:
2001 msg = self.reporter.warning(
2002 'Substitution definition "%s" missing contents.' % subname,
2003 nodes.literal_block(blocktext, blocktext),
2004 source=src, line=srcline)
2005 return [msg], blank_finish
2006 block[0] = block[0].strip()
2007 substitution_node['names'].append(
2008 nodes.whitespace_normalize_name(subname))
2009 new_abs_offset, blank_finish = self.nested_list_parse(
2010 block, input_offset=offset, node=substitution_node,
2011 initial_state='SubstitutionDef', blank_finish=blank_finish)
2012 i = 0
2013 for node in substitution_node[:]:
2014 if not (isinstance(node, nodes.Inline) or
2015 isinstance(node, nodes.Text)):
2016 self.parent += substitution_node[i]
2017 del substitution_node[i]
2018 else:
2019 i += 1
2020 for node in substitution_node.traverse(nodes.Element):
2021 if self.disallowed_inside_substitution_definitions(node):
2022 pformat = nodes.literal_block('', node.pformat().rstrip())
2023 msg = self.reporter.error(
2024 'Substitution definition contains illegal element:',
2025 pformat, nodes.literal_block(blocktext, blocktext),
2026 source=src, line=srcline)
2027 return [msg], blank_finish
2028 if len(substitution_node) == 0:
2029 msg = self.reporter.warning(
2030 'Substitution definition "%s" empty or invalid.' % subname,
2031 nodes.literal_block(blocktext, blocktext),
2032 source=src, line=srcline)
2033 return [msg], blank_finish
2034 self.document.note_substitution_def(
2035 substitution_node, subname, self.parent)
2036 return [substitution_node], blank_finish
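# Typical input handled here (illustration only):
#     .. |biohazard| image:: biohazard.png
# The text after the "|name|" marker is re-parsed in the 'SubstitutionDef'
# state, so the embedded "image" directive supplies the substitution content.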
2037
2038 def disallowed_inside_substitution_definitions(self, node):
2039 if (node['ids'] or
2040 isinstance(node, nodes.reference) and node.get('anonymous') or
2041 isinstance(node, nodes.footnote_reference) and node.get('auto')):
2042 return 1
2043 else:
2044 return 0
2045
2046 def directive(self, match, **option_presets):
2047 """Returns a 2-tuple: list of nodes, and a "blank finish" boolean."""
2048 type_name = match.group(1)
2049 directive_class, messages = directives.directive(
2050 type_name, self.memo.language, self.document)
2051 self.parent += messages
2052 if directive_class:
2053 return self.run_directive(
2054 directive_class, match, type_name, option_presets)
2055 else:
2056 return self.unknown_directive(type_name)
2057
2058 def run_directive(self, directive, match, type_name, option_presets):
2059 """
2060 Parse a directive then run its directive function.
2061
2062 Parameters:
2063
2064 - `directive`: The class implementing the directive. Must be
2065 a subclass of `rst.Directive`.
2066
2067 - `match`: A regular expression match object which matched the first
2068 line of the directive.
2069
2070 - `type_name`: The directive name, as used in the source text.
2071
2072 - `option_presets`: A dictionary of preset options, defaults for the
2073 directive options. Currently, only an "alt" option is passed by
2074 substitution definitions (value: the substitution name), which may
2075 be used by an embedded image directive.
2076
2077 Returns a 2-tuple: list of nodes, and a "blank finish" boolean.
2078 """
2079 if isinstance(directive, (FunctionType, MethodType)):
2080 from docutils.parsers.rst import convert_directive_function
2081 directive = convert_directive_function(directive)
2082 lineno = self.state_machine.abs_line_number()
2083 initial_line_offset = self.state_machine.line_offset
2084 indented, indent, line_offset, blank_finish \
2085 = self.state_machine.get_first_known_indented(match.end(),
2086 strip_top=0)
2087 block_text = '\n'.join(self.state_machine.input_lines[
2088 initial_line_offset : self.state_machine.line_offset + 1])
2089 try:
2090 arguments, options, content, content_offset = (
2091 self.parse_directive_block(indented, line_offset,
2092 directive, option_presets))
2093 except MarkupError, detail:
2094 error = self.reporter.error(
2095 'Error in "%s" directive:\n%s.' % (type_name,
2096 ' '.join(detail.args)),
2097 nodes.literal_block(block_text, block_text), line=lineno)
2098 return [error], blank_finish
2099 directive_instance = directive(
2100 type_name, arguments, options, content, lineno,
2101 content_offset, block_text, self, self.state_machine)
2102 try:
2103 result = directive_instance.run()
2104 except docutils.parsers.rst.DirectiveError, error:
2105 msg_node = self.reporter.system_message(error.level, error.msg,
2106 line=lineno)
2107 msg_node += nodes.literal_block(block_text, block_text)
2108 result = [msg_node]
2109 assert isinstance(result, list), \
2110 'Directive "%s" must return a list of nodes.' % type_name
2111 for i in range(len(result)):
2112 assert isinstance(result[i], nodes.Node), \
2113 ('Directive "%s" returned non-Node object (index %s): %r'
2114 % (type_name, i, result[i]))
2115 return (result,
2116 blank_finish or self.state_machine.is_next_line_blank())
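# Source form dispatched through directive()/run_directive(), illustrated:
#     .. image:: picture.png
#        :alt: alternative text
# Here type_name is 'image', arguments become ['picture.png'], options
# {'alt': 'alternative text'}, and `result` is whatever the directive's
# run() returns (a list of nodes).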
2117
2118 def parse_directive_block(self, indented, line_offset, directive,
2119 option_presets):
2120 option_spec = directive.option_spec
2121 has_content = directive.has_content
2122 if indented and not indented[0].strip():
2123 indented.trim_start()
2124 line_offset += 1
2125 while indented and not indented[-1].strip():
2126 indented.trim_end()
2127 if indented and (directive.required_arguments
2128 or directive.optional_arguments
2129 or option_spec):
2130 for i, line in enumerate(indented):
2131 if not line.strip():
2132 break
2133 else:
2134 i += 1
2135 arg_block = indented[:i]
2136 content = indented[i+1:]
2137 content_offset = line_offset + i + 1
2138 else:
2139 content = indented
2140 content_offset = line_offset
2141 arg_block = []
2142 if option_spec:
2143 options, arg_block = self.parse_directive_options(
2144 option_presets, option_spec, arg_block)
2145 else:
2146 options = {}
2147 if arg_block and not (directive.required_arguments
2148 or directive.optional_arguments):
2149 content = arg_block + indented[i:]
2150 content_offset = line_offset
2151 arg_block = []
2152 while content and not content[0].strip():
2153 content.trim_start()
2154 content_offset += 1
2155 if directive.required_arguments or directive.optional_arguments:
2156 arguments = self.parse_directive_arguments(
2157 directive, arg_block)
2158 else:
2159 arguments = []
2160 if content and not has_content:
2161 raise MarkupError('no content permitted')
2162 return (arguments, options, content, content_offset)
2163
2164 def parse_directive_options(self, option_presets, option_spec, arg_block):
2165 options = option_presets.copy()
2166 for i, line in enumerate(arg_block):
2167 if re.match(Body.patterns['field_marker'], line):
2168 opt_block = arg_block[i:]
2169 arg_block = arg_block[:i]
2170 break
2171 else:
2172 opt_block = []
2173 if opt_block:
2174 success, data = self.parse_extension_options(option_spec,
2175 opt_block)
2176 if success: # data is a dict of options
2177 options.update(data)
2178 else: # data is an error string
2179 raise MarkupError(data)
2180 return options, arg_block
2181
2182 def parse_directive_arguments(self, directive, arg_block):
2183 required = directive.required_arguments
2184 optional = directive.optional_arguments
2185 arg_text = '\n'.join(arg_block)
2186 arguments = arg_text.split()
2187 if len(arguments) < required:
2188 raise MarkupError('%s argument(s) required, %s supplied'
2189 % (required, len(arguments)))
2190 elif len(arguments) > required + optional:
2191 if directive.final_argument_whitespace:
2192 arguments = arg_text.split(None, required + optional - 1)
2193 else:
2194 raise MarkupError(
2195 'maximum %s argument(s) allowed, %s supplied'
2196 % (required + optional, len(arguments)))
2197 return arguments
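# Sketch: with required=1, optional=0 and final_argument_whitespace set,
# an arg_block of ['A multi-word value'] survives the re-split
# (arg_text.split(None, 0)) as the single argument ['A multi-word value'].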
2198
2199 def parse_extension_options(self, option_spec, datalines):
2200 """
2201 Parse `datalines` for a field list containing extension options
2202 matching `option_spec`.
2203
2204 :Parameters:
2205 - `option_spec`: a mapping of option name to conversion
2206 function, which should raise an exception on bad input.
2207 - `datalines`: a list of input strings.
2208
2209 :Return:
2210 - Success value, 1 or 0.
2211 - An option dictionary on success, an error string on failure.
2212 """
2213 node = nodes.field_list()
2214 newline_offset, blank_finish = self.nested_list_parse(
2215 datalines, 0, node, initial_state='ExtensionOptions',
2216 blank_finish=True)
2217 if newline_offset != len(datalines): # incomplete parse of block
2218 return 0, 'invalid option block'
2219 try:
2220 options = utils.extract_extension_options(node, option_spec)
2221 except KeyError, detail:
2222 return 0, ('unknown option: "%s"' % detail.args[0])
2223 except (ValueError, TypeError), detail:
2224 return 0, ('invalid option value: %s' % ' '.join(detail.args))
2225 except utils.ExtensionOptionError, detail:
2226 return 0, ('invalid option data: %s' % ' '.join(detail.args))
2227 if blank_finish:
2228 return 1, options
2229 else:
2230 return 0, 'option data incompletely parsed'
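# Sketch with a hypothetical option_spec:
#     option_spec = {'alt': directives.unchanged}
#     datalines   = [':alt: a picture of a sunset']
# yields (1, {'alt': 'a picture of a sunset'}) via the nested
# 'ExtensionOptions' parse and utils.extract_extension_options().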
2231
2232 def unknown_directive(self, type_name):
2233 lineno = self.state_machine.abs_line_number()
2234 indented, indent, offset, blank_finish = \
2235 self.state_machine.get_first_known_indented(0, strip_indent=False)
2236 text = '\n'.join(indented)
2237 error = self.reporter.error(
2238 'Unknown directive type "%s".' % type_name,
2239 nodes.literal_block(text, text), line=lineno)
2240 return [error], blank_finish
2241
2242 def comment(self, match):
2243 if not match.string[match.end():].strip() \
2244 and self.state_machine.is_next_line_blank(): # an empty comment?
2245 return [nodes.comment()], 1 # "A tiny but practical wart."
2246 indented, indent, offset, blank_finish = \
2247 self.state_machine.get_first_known_indented(match.end())
2248 while indented and not indented[-1].strip():
2249 indented.trim_end()
2250 text = '\n'.join(indented)
2251 return [nodes.comment(text, text)], blank_finish
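# Sketch: ".." alone (followed by a blank line) produces an empty comment
# node; ".. arbitrary text" produces a comment node containing that text.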
2252
2253 explicit.constructs = [
2254 (footnote,
2255 re.compile(r"""
2256 \.\.[ ]+ # explicit markup start
2257 \[
2258 ( # footnote label:
2259 [0-9]+ # manually numbered footnote
2260 | # *OR*
2261 \# # anonymous auto-numbered footnote
2262 | # *OR*
2263 \#%s # auto-numbered, labeled footnote
2264 | # *OR*
2265 \* # auto-symbol footnote
2266 )
2267 \]
2268 ([ ]+|$) # whitespace or end of line
2269 """ % Inliner.simplename, re.VERBOSE | re.UNICODE)),
2270 (citation,
2271 re.compile(r"""
2272 \.\.[ ]+ # explicit markup start
2273 \[(%s)\] # citation label
2274 ([ ]+|$) # whitespace or end of line
2275 """ % Inliner.simplename, re.VERBOSE | re.UNICODE)),
2276 (hyperlink_target,
2277 re.compile(r"""
2278 \.\.[ ]+ # explicit markup start
2279 _ # target indicator
2280 (?![ ]|$) # first char. not space or EOL
2281 """, re.VERBOSE | re.UNICODE)),
2282 (substitution_def,
2283 re.compile(r"""
2284 \.\.[ ]+ # explicit markup start
2285 \| # substitution indicator
2286 (?![ ]|$) # first char. not space or EOL
2287 """, re.VERBOSE | re.UNICODE)),
2288 (directive,
2289 re.compile(r"""
2290 \.\.[ ]+ # explicit markup start
2291 (%s) # directive name
2292 [ ]? # optional space
2293 :: # directive delimiter
2294 ([ ]+|$) # whitespace or end of line
2295 """ % Inliner.simplename, re.VERBOSE | re.UNICODE))]
2296
2297 def explicit_markup(self, match, context, next_state):
2298 """Footnotes, hyperlink targets, directives, comments."""
2299 nodelist, blank_finish = self.explicit_construct(match)
2300 self.parent += nodelist
2301 self.explicit_list(blank_finish)
2302 return [], next_state, []
2303
2304 def explicit_construct(self, match):
2305 """Determine which explicit construct this is, parse & return it."""
2306 errors = []
2307 for method, pattern in self.explicit.constructs:
2308 expmatch = pattern.match(match.string)
2309 if expmatch:
2310 try:
2311 return method(self, expmatch)
2312 except MarkupError, error:
2313 lineno = self.state_machine.abs_line_number()
2314 message = ' '.join(error.args)
2315 errors.append(self.reporter.warning(message, line=lineno))
2316 break
2317 nodelist, blank_finish = self.comment(match)
2318 return nodelist + errors, blank_finish
2319
2320 def explicit_list(self, blank_finish):
2321 """
2322 Create a nested state machine for a series of explicit markup
2323 constructs (including anonymous hyperlink targets).
2324 """
2325 offset = self.state_machine.line_offset + 1 # next line
2326 newline_offset, blank_finish = self.nested_list_parse(
2327 self.state_machine.input_lines[offset:],
2328 input_offset=self.state_machine.abs_line_offset() + 1,
2329 node=self.parent, initial_state='Explicit',
2330 blank_finish=blank_finish,
2331 match_titles=self.state_machine.match_titles)
2332 self.goto_line(newline_offset)
2333 if not blank_finish:
2334 self.parent += self.unindent_warning('Explicit markup')
2335
2336 def anonymous(self, match, context, next_state):
2337 """Anonymous hyperlink targets."""
2338 nodelist, blank_finish = self.anonymous_target(match)
2339 self.parent += nodelist
2340 self.explicit_list(blank_finish)
2341 return [], next_state, []
2342
2343 def anonymous_target(self, match):
2344 lineno = self.state_machine.abs_line_number()
2345 block, indent, offset, blank_finish \
2346 = self.state_machine.get_first_known_indented(match.end(),
2347 until_blank=True)
2348 blocktext = match.string[:match.end()] + '\n'.join(block)
2349 block = [escape2null(line) for line in block]
2350 target = self.make_target(block, blocktext, lineno, '')
2351 return [target], blank_finish
2352
2353 def line(self, match, context, next_state):
2354 """Section title overline or transition marker."""
2355 if self.state_machine.match_titles:
2356 return [match.string], 'Line', []
2357 elif match.string.strip() == '::':
2358 raise statemachine.TransitionCorrection('text')
2359 elif len(match.string.strip()) < 4:
2360 msg = self.reporter.info(
2361 'Unexpected possible title overline or transition.\n'
2362 "Treating it as ordinary text because it's so short.",
2363 line=self.state_machine.abs_line_number())
2364 self.parent += msg
2365 raise statemachine.TransitionCorrection('text')
2366 else:
2367 blocktext = self.state_machine.line
2368 msg = self.reporter.severe(
2369 'Unexpected section title or transition.',
2370 nodes.literal_block(blocktext, blocktext),
2371 line=self.state_machine.abs_line_number())
2372 self.parent += msg
2373 return [], next_state, []
2374
2375 def text(self, match, context, next_state):
2376 """Titles, definition lists, paragraphs."""
2377 return [match.string], 'Text', []
2378
2379
2380 class RFC2822Body(Body):
2381
2382 """
2383 RFC2822 headers are only valid as the first constructs in documents. As
2384 soon as anything else appears, the `Body` state should take over.
2385 """
2386
2387 patterns = Body.patterns.copy() # can't modify the original
2388 patterns['rfc2822'] = r'[!-9;-~]+:( +|$)'
2389 initial_transitions = [(name, 'Body')
2390 for name in Body.initial_transitions]
2391 initial_transitions.insert(-1, ('rfc2822', 'Body')) # just before 'text'
2392
2393 def rfc2822(self, match, context, next_state):
2394 """RFC2822-style field list item."""
2395 fieldlist = nodes.field_list(classes=['rfc2822'])
2396 self.parent += fieldlist
2397 field, blank_finish = self.rfc2822_field(match)
2398 fieldlist += field
2399 offset = self.state_machine.line_offset + 1 # next line
2400 newline_offset, blank_finish = self.nested_list_parse(
2401 self.state_machine.input_lines[offset:],
2402 input_offset=self.state_machine.abs_line_offset() + 1,
2403 node=fieldlist, initial_state='RFC2822List',
2404 blank_finish=blank_finish)
2405 self.goto_line(newline_offset)
2406 if not blank_finish:
2407 self.parent += self.unindent_warning(
2408 'RFC2822-style field list')
2409 return [], next_state, []
2410
2411 def rfc2822_field(self, match):
2412 name = match.string[:match.string.find(':')]
2413 indented, indent, line_offset, blank_finish = \
2414 self.state_machine.get_first_known_indented(match.end(),
2415 until_blank=True)
2416 fieldnode = nodes.field()
2417 fieldnode += nodes.field_name(name, name)
2418 fieldbody = nodes.field_body('\n'.join(indented))
2419 fieldnode += fieldbody
2420 if indented:
2421 self.nested_parse(indented, input_offset=line_offset,
2422 node=fieldbody)
2423 return fieldnode, blank_finish
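# Sketch: an initial document line such as
#     Author: J. Doe
# becomes field(field_name 'Author', field_body <parsed from the indented text>).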
2424
2425
2426 class SpecializedBody(Body):
2427
2428 """
2429 Superclass for second and subsequent compound element members. Compound
2430 elements are lists and list-like constructs.
2431
2432 All transition methods are disabled (redefined as `invalid_input`).
2433 Override individual methods in subclasses to re-enable.
2434
2435 For example, once an initial bullet list item, say, is recognized, the
2436 `BulletList` subclass takes over, with a "bullet_list" node as its
2437 container. Upon encountering the initial bullet list item, `Body.bullet`
2438 calls its ``self.nested_list_parse`` (`RSTState.nested_list_parse`), which
2439 starts up a nested parsing session with `BulletList` as the initial state.
2440 Only the ``bullet`` transition method is enabled in `BulletList`; as long
2441 as only bullet list items are encountered, they are parsed and inserted
2442 into the container. The first construct which is *not* a bullet list item
2443 triggers the `invalid_input` method, which ends the nested parse and
2444 closes the container. `BulletList` needs to recognize input that is
2445 invalid in the context of a bullet list, which means everything *other
2446 than* bullet list items, so it inherits the transition list created in
2447 `Body`.
2448 """
2449
2450 def invalid_input(self, match=None, context=None, next_state=None):
2451 """Not a compound element member. Abort this state machine."""
2452 self.state_machine.previous_line() # back up so parent SM can reassess
2453 raise EOFError
2454
2455 indent = invalid_input
2456 bullet = invalid_input
2457 enumerator = invalid_input
2458 field_marker = invalid_input
2459 option_marker = invalid_input
2460 doctest = invalid_input
2461 line_block = invalid_input
2462 grid_table_top = invalid_input
2463 simple_table_top = invalid_input
2464 explicit_markup = invalid_input
2465 anonymous = invalid_input
2466 line = invalid_input
2467 text = invalid_input
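# Hand-off described in the class docstring, illustrated (sketch):
#     * first item       <- Body.bullet opens the bullet_list and nested parse
#     * second item      <- BulletList.bullet appends to the same list
#     A paragraph.       <- not a list item: invalid_input ends the nested parse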
2468
2469
2470 class BulletList(SpecializedBody):
2471
2472 """Second and subsequent bullet_list list_items."""
2473
2474 def bullet(self, match, context, next_state):
2475 """Bullet list item."""
2476 if match.string[0] != self.parent['bullet']:
2477 # different bullet: new list
2478 self.invalid_input()
2479 listitem, blank_finish = self.list_item(match.end())
2480 self.parent += listitem
2481 self.blank_finish = blank_finish
2482 return [], next_state, []
2483
2484
2485 class DefinitionList(SpecializedBody):
2486
2487 """Second and subsequent definition_list_items."""
2488
2489 def text(self, match, context, next_state):
2490 """Definition lists."""
2491 return [match.string], 'Definition', []
2492
2493
2494 class EnumeratedList(SpecializedBody):
2495
2496 """Second and subsequent enumerated_list list_items."""
2497
2498 def enumerator(self, match, context, next_state):
2499 """Enumerated list item."""
2500 format, sequence, text, ordinal = self.parse_enumerator(
2501 match, self.parent['enumtype'])
2502 if ( format != self.format
2503 or (sequence != '#' and (sequence != self.parent['enumtype']
2504 or self.auto
2505 or ordinal != (self.lastordinal + 1)))
2506 or not self.is_enumerated_list_item(ordinal, sequence, format)):
2507 # different enumeration: new list
2508 self.invalid_input()
2509 if sequence == '#':
2510 self.auto = 1
2511 listitem, blank_finish = self.list_item(match.end())
2512 self.parent += listitem
2513 self.blank_finish = blank_finish
2514 self.lastordinal = ordinal
2515 return [], next_state, []
2516
2517
2518 class FieldList(SpecializedBody):
2519
2520 """Second and subsequent field_list fields."""
2521
2522 def field_marker(self, match, context, next_state):
2523 """Field list field."""
2524 field, blank_finish = self.field(match)
2525 self.parent += field
2526 self.blank_finish = blank_finish
2527 return [], next_state, []
2528
2529
2530 class OptionList(SpecializedBody):
2531
2532 """Second and subsequent option_list option_list_items."""
2533
2534 def option_marker(self, match, context, next_state):
2535 """Option list item."""
2536 try:
2537 option_list_item, blank_finish = self.option_list_item(match)
2538 except MarkupError:
2539 self.invalid_input()
2540 self.parent += option_list_item
2541 self.blank_finish = blank_finish
2542 return [], next_state, []
2543
2544
2545 class RFC2822List(SpecializedBody, RFC2822Body):
2546
2547 """Second and subsequent RFC2822-style field_list fields."""
2548
2549 patterns = RFC2822Body.patterns
2550 initial_transitions = RFC2822Body.initial_transitions
2551
2552 def rfc2822(self, match, context, next_state):
2553 """RFC2822-style field list item."""
2554 field, blank_finish = self.rfc2822_field(match)
2555 self.parent += field
2556 self.blank_finish = blank_finish
2557 return [], 'RFC2822List', []
2558
2559 blank = SpecializedBody.invalid_input
2560
2561
2562 class ExtensionOptions(FieldList):
2563
2564 """
2565 Parse field_list fields for extension options.
2566
2567 No nested parsing is done (including inline markup parsing).
2568 """
2569
2570 def parse_field_body(self, indented, offset, node):
2571 """Override `Body.parse_field_body` for simpler parsing."""
2572 lines = []
2573 for line in list(indented) + ['']:
2574 if line.strip():
2575 lines.append(line)
2576 elif lines:
2577 text = '\n'.join(lines)
2578 node += nodes.paragraph(text, text)
2579 lines = []
2580
2581
2582 class LineBlock(SpecializedBody):
2583
2584 """Second and subsequent lines of a line_block."""
2585
2586 blank = SpecializedBody.invalid_input
2587
2588 def line_block(self, match, context, next_state):
2589 """New line of line block."""
2590 lineno = self.state_machine.abs_line_number()
2591 line, messages, blank_finish = self.line_block_line(match, lineno)
2592 self.parent += line
2593 self.parent.parent += messages
2594 self.blank_finish = blank_finish
2595 return [], next_state, []
2596
2597
2598 class Explicit(SpecializedBody):
2599
2600 """Second and subsequent explicit markup construct."""
2601
2602 def explicit_markup(self, match, context, next_state):
2603 """Footnotes, hyperlink targets, directives, comments."""
2604 nodelist, blank_finish = self.explicit_construct(match)
2605 self.parent += nodelist
2606 self.blank_finish = blank_finish
2607 return [], next_state, []
2608
2609 def anonymous(self, match, context, next_state):
2610 """Anonymous hyperlink targets."""
2611 nodelist, blank_finish = self.anonymous_target(match)
2612 self.parent += nodelist
2613 self.blank_finish = blank_finish
2614 return [], next_state, []
2615
2616 blank = SpecializedBody.invalid_input
2617
2618
2619 class SubstitutionDef(Body):
2620
2621 """
2622 Parser for the contents of a substitution_definition element.
2623 """
2624
2625 patterns = {
2626 'embedded_directive': re.compile(r'(%s)::( +|$)'
2627 % Inliner.simplename, re.UNICODE),
2628 'text': r''}
2629 initial_transitions = ['embedded_directive', 'text']
2630
2631 def embedded_directive(self, match, context, next_state):
2632 nodelist, blank_finish = self.directive(match,
2633 alt=self.parent['names'][0])
2634 self.parent += nodelist
2635 if not self.state_machine.at_eof():
2636 self.blank_finish = blank_finish
2637 raise EOFError
2638
2639 def text(self, match, context, next_state):
2640 if not self.state_machine.at_eof():
2641 self.blank_finish = self.state_machine.is_next_line_blank()
2642 raise EOFError
2643
2644
2645 class Text(RSTState):
2646
2647 """
2648 Classifier of second line of a text block.
2649
2650 Could be a paragraph, a definition list item, or a title.
2651 """
2652
2653 patterns = {'underline': Body.patterns['line'],
2654 'text': r''}
2655 initial_transitions = [('underline', 'Body'), ('text', 'Body')]
2656
2657 def blank(self, match, context, next_state):
2658 """End of paragraph."""
2659 # NOTE: self.paragraph returns [ node, system_message(s) ], literalnext
2660 paragraph, literalnext = self.paragraph(
2661 context, self.state_machine.abs_line_number() - 1)
2662 self.parent += paragraph
2663 if literalnext:
2664 self.parent += self.literal_block()
2665 return [], 'Body', []
2666
2667 def eof(self, context):
2668 if context:
2669 self.blank(None, context, None)
2670 return []
2671
2672 def indent(self, match, context, next_state):
2673 """Definition list item."""
2674 definitionlist = nodes.definition_list()
2675 definitionlistitem, blank_finish = self.definition_list_item(context)
2676 definitionlist += definitionlistitem
2677 self.parent += definitionlist
2678 offset = self.state_machine.line_offset + 1 # next line
2679 newline_offset, blank_finish = self.nested_list_parse(
2680 self.state_machine.input_lines[offset:],
2681 input_offset=self.state_machine.abs_line_offset() + 1,
2682 node=definitionlist, initial_state='DefinitionList',
2683 blank_finish=blank_finish, blank_finish_state='Definition')
2684 self.goto_line(newline_offset)
2685 if not blank_finish:
2686 self.parent += self.unindent_warning('Definition list')
2687 return [], 'Body', []
2688
2689 def underline(self, match, context, next_state):
2690 """Section title."""
2691 lineno = self.state_machine.abs_line_number()
2692 title = context[0].rstrip()
2693 underline = match.string.rstrip()
2694 source = title + '\n' + underline
2695 messages = []
2696 if column_width(title) > len(underline):
2697 if len(underline) < 4:
2698 if self.state_machine.match_titles:
2699 msg = self.reporter.info(
2700 'Possible title underline, too short for the title.\n'
2701 "Treating it as ordinary text because it's so short.",
2702 line=lineno)
2703 self.parent += msg
2704 raise statemachine.TransitionCorrection('text')
2705 else:
2706 blocktext = context[0] + '\n' + self.state_machine.line
2707 msg = self.reporter.warning('Title underline too short.',
2708 nodes.literal_block(blocktext, blocktext), line=lineno)
2709 messages.append(msg)
2710 if not self.state_machine.match_titles:
2711 blocktext = context[0] + '\n' + self.state_machine.line
2712 # We need get_source_and_line() here to report correctly
2713 src, srcline = self.state_machine.get_source_and_line()
2714 # TODO: why is abs_line_number() == srcline+1
2715 # if the error is in a table (try with test_tables.py)?
2716 # print "get_source_and_line", srcline
2717 # print "abs_line_number", self.state_machine.abs_line_number()
2718 msg = self.reporter.severe('Unexpected section title.',
2719 nodes.literal_block(blocktext, blocktext),
2720 source=src, line=srcline)
2721 self.parent += messages
2722 self.parent += msg
2723 return [], next_state, []
2724 style = underline[0]
2725 context[:] = []
2726 self.section(title, source, style, lineno - 1, messages)
2727 return [], next_state, []
2728
2729 def text(self, match, context, next_state):
2730 """Paragraph."""
2731 startline = self.state_machine.abs_line_number() - 1
2732 msg = None
2733 try:
2734 block = self.state_machine.get_text_block(flush_left=True)
2735 except statemachine.UnexpectedIndentationError, err:
2736 block, src, srcline = err.args
2737 msg = self.reporter.error('Unexpected indentation.',
2738 source=src, line=srcline)
2739 lines = context + list(block)
2740 paragraph, literalnext = self.paragraph(lines, startline)
2741 self.parent += paragraph
2742 self.parent += msg
2743 if literalnext:
2744 try:
2745 self.state_machine.next_line()
2746 except EOFError:
2747 pass
2748 self.parent += self.literal_block()
2749 return [], next_state, []
2750
2751 def literal_block(self):
2752 """Return a list of nodes."""
2753 indented, indent, offset, blank_finish = \
2754 self.state_machine.get_indented()
2755 while indented and not indented[-1].strip():
2756 indented.trim_end()
2757 if not indented:
2758 return self.quoted_literal_block()
2759 data = '\n'.join(indented)
2760 literal_block = nodes.literal_block(data, data)
2761 literal_block.line = offset + 1
2762 nodelist = [literal_block]
2763 if not blank_finish:
2764 nodelist.append(self.unindent_warning('Literal block'))
2765 return nodelist
2766
2767 def quoted_literal_block(self):
2768 abs_line_offset = self.state_machine.abs_line_offset()
2769 offset = self.state_machine.line_offset
2770 parent_node = nodes.Element()
2771 new_abs_offset = self.nested_parse(
2772 self.state_machine.input_lines[offset:],
2773 input_offset=abs_line_offset, node=parent_node, match_titles=False,
2774 state_machine_kwargs={'state_classes': (QuotedLiteralBlock,),
2775 'initial_state': 'QuotedLiteralBlock'})
2776 self.goto_line(new_abs_offset)
2777 return parent_node.children
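# Sketch: when the expected literal block is not indented, a run of lines
# all beginning with the same punctuation character, e.g.
#     > quoted line one
#     > quoted line two
# is collected by the QuotedLiteralBlock state into a literal_block.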
2778
2779 def definition_list_item(self, termline):
2780 indented, indent, line_offset, blank_finish = \
2781 self.state_machine.get_indented()
2782 itemnode = nodes.definition_list_item(
2783 '\n'.join(termline + list(indented)))
2784 lineno = self.state_machine.abs_line_number() - 1
2785 (itemnode.source,
2786 itemnode.line) = self.state_machine.get_source_and_line(lineno)
2787 termlist, messages = self.term(termline, lineno)
2788 itemnode += termlist
2789 definition = nodes.definition('', *messages)
2790 itemnode += definition
2791 if termline[0][-2:] == '::':
2792 definition += self.reporter.info(
2793 'Blank line missing before literal block (after the "::")? '
2794 'Interpreted as a definition list item.',
2795 line=lineno+1)
2796 self.nested_parse(indented, input_offset=line_offset, node=definition)
2797 return itemnode, blank_finish
2798
2799 classifier_delimiter = re.compile(' +: +')
2800
2801 def term(self, lines, lineno):
2802 """Return a definition_list's term and optional classifiers."""
2803 assert len(lines) == 1
2804 text_nodes, messages = self.inline_text(lines[0], lineno)
2805 term_node = nodes.term()
2806 (term_node.source,
2807 term_node.line) = self.state_machine.get_source_and_line(lineno)
2808 term_node.rawsource = unescape(lines[0])
2809 node_list = [term_node]
2810 for i in range(len(text_nodes)):
2811 node = text_nodes[i]
2812 if isinstance(node, nodes.Text):
2813 parts = self.classifier_delimiter.split(node.rawsource)
2814 if len(parts) == 1:
2815 node_list[-1] += node
2816 else:
2817
2818 node_list[-1] += nodes.Text(parts[0].rstrip())
2819 for part in parts[1:]:
2820 classifier_node = nodes.classifier('', part)
2821 node_list.append(classifier_node)
2822 else:
2823 node_list[-1] += node
2824 return node_list, messages
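# Sketch: a term line "name : classifier one : classifier two" is split on
# ' +: +', yielding a term node plus two classifier nodes.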
2825
2826
2827 class SpecializedText(Text):
2828
2829 """
2830 Superclass for second and subsequent lines of Text-variants.
2831
2832 All transition methods are disabled. Override individual methods in
2833 subclasses to re-enable.
2834 """
2835
2836 def eof(self, context):
2837 """Incomplete construct."""
2838 return []
2839
2840 def invalid_input(self, match=None, context=None, next_state=None):
2841 """Not a compound element member. Abort this state machine."""
2842 raise EOFError
2843
2844 blank = invalid_input
2845 indent = invalid_input
2846 underline = invalid_input
2847 text = invalid_input
2848
2849
2850 class Definition(SpecializedText):
2851
2852 """Second line of potential definition_list_item."""
2853
2854 def eof(self, context):
2855 """Not a definition."""
2856 self.state_machine.previous_line(2) # so parent SM can reassess
2857 return []
2858
2859 def indent(self, match, context, next_state):
2860 """Definition list item."""
2861 itemnode, blank_finish = self.definition_list_item(context)
2862 self.parent += itemnode
2863 self.blank_finish = blank_finish
2864 return [], 'DefinitionList', []
2865
2866
2867 class Line(SpecializedText):
2868
2869 """
2870 Second line of over- & underlined section title or transition marker.
2871 """
2872
2873 eofcheck = 1 # @@@ ???
2874 """Set to 0 while parsing sections, so that we don't catch the EOF."""
2875
2876 def eof(self, context):
2877 """Transition marker at end of section or document."""
2878 marker = context[0].strip()
2879 if self.memo.section_bubble_up_kludge:
2880 self.memo.section_bubble_up_kludge = False
2881 elif len(marker) < 4:
2882 self.state_correction(context)
2883 if self.eofcheck: # ignore EOFError with sections
2884 lineno = self.state_machine.abs_line_number() - 1
2885 transition = nodes.transition(rawsource=context[0])
2886 transition.line = lineno
2887 self.parent += transition
2888 self.eofcheck = 1
2889 return []
2890
2891 def blank(self, match, context, next_state):
2892 """Transition marker."""
2893 src, srcline = self.state_machine.get_source_and_line()
2894 marker = context[0].strip()
2895 if len(marker) < 4:
2896 self.state_correction(context)
2897 transition = nodes.transition(rawsource=marker)
2898 transition.source = src
2899 transition.line = srcline - 1
2900 self.parent += transition
2901 return [], 'Body', []
2902
2903 def text(self, match, context, next_state):
2904 """Potential over- & underlined title."""
2905 lineno = self.state_machine.abs_line_number() - 1
2906 overline = context[0]
2907 title = match.string
2908 underline = ''
2909 try:
2910 underline = self.state_machine.next_line()
2911 except EOFError:
2912 blocktext = overline + '\n' + title
2913 if len(overline.rstrip()) < 4:
2914 self.short_overline(context, blocktext, lineno, 2)
2915 else:
2916 msg = self.reporter.severe(
2917 'Incomplete section title.',
2918 nodes.literal_block(blocktext, blocktext),
2919 line=lineno)
2920 self.parent += msg
2921 return [], 'Body', []
2922 source = '%s\n%s\n%s' % (overline, title, underline)
2923 overline = overline.rstrip()
2924 underline = underline.rstrip()
2925 if not self.transitions['underline'][0].match(underline):
2926 blocktext = overline + '\n' + title + '\n' + underline
2927 if len(overline.rstrip()) < 4:
2928 self.short_overline(context, blocktext, lineno, 2)
2929 else:
2930 msg = self.reporter.severe(
2931 'Missing matching underline for section title overline.',
2932 nodes.literal_block(source, source),
2933 line=lineno)
2934 self.parent += msg
2935 return [], 'Body', []
2936 elif overline != underline:
2937 blocktext = overline + '\n' + title + '\n' + underline
2938 if len(overline.rstrip()) < 4:
2939 self.short_overline(context, blocktext, lineno, 2)
2940 else:
2941 msg = self.reporter.severe(
2942 'Title overline & underline mismatch.',
2943 nodes.literal_block(source, source),
2944 line=lineno)
2945 self.parent += msg
2946 return [], 'Body', []
2947 title = title.rstrip()
2948 messages = []
2949 if column_width(title) > len(overline):
2950 blocktext = overline + '\n' + title + '\n' + underline
2951 if len(overline.rstrip()) < 4:
2952 self.short_overline(context, blocktext, lineno, 2)
2953 else:
2954 msg = self.reporter.warning(
2955 'Title overline too short.',
2956 nodes.literal_block(source, source),
2957 line=lineno)
2958 messages.append(msg)
2959 style = (overline[0], underline[0])
2960 self.eofcheck = 0 # @@@ not sure this is correct
2961 self.section(title.lstrip(), source, style, lineno + 1, messages)
2962 self.eofcheck = 1
2963 return [], 'Body', []
2964
2965 indent = text # indented title
2966
2967 def underline(self, match, context, next_state):
2968 overline = context[0]
2969 blocktext = overline + '\n' + self.state_machine.line
2970 lineno = self.state_machine.abs_line_number() - 1
2971 if len(overline.rstrip()) < 4:
2972 self.short_overline(context, blocktext, lineno, 1)
2973 msg = self.reporter.error(
2974 'Invalid section title or transition marker.',
2975 nodes.literal_block(blocktext, blocktext),
2976 line=lineno)
2977 self.parent += msg
2978 return [], 'Body', []
2979
2980 def short_overline(self, context, blocktext, lineno, lines=1):
2981 msg = self.reporter.info(
2982 'Possible incomplete section title.\nTreating the overline as '
2983 "ordinary text because it's so short.",
2984 line=lineno)
2985 self.parent += msg
2986 self.state_correction(context, lines)
2987
2988 def state_correction(self, context, lines=1):
2989 self.state_machine.previous_line(lines)
2990 context[:] = []
2991 raise statemachine.StateCorrection('Body', 'text')
2992
2993
2994 class QuotedLiteralBlock(RSTState):
2995
2996 """
2997 Nested parse handler for quoted (unindented) literal blocks.
2998
2999 Special-purpose. Not for inclusion in `state_classes`.
3000 """
3001
3002 patterns = {'initial_quoted': r'(%(nonalphanum7bit)s)' % Body.pats,
3003 'text': r''}
3004 initial_transitions = ('initial_quoted', 'text')
3005
3006 def __init__(self, state_machine, debug=False):
3007 RSTState.__init__(self, state_machine, debug)
3008 self.messages = []
3009 self.initial_lineno = None
3010
3011 def blank(self, match, context, next_state):
3012 if context:
3013 raise EOFError
3014 else:
3015 return context, next_state, []
3016
3017 def eof(self, context):
3018 if context:
3019 src, srcline = self.state_machine.get_source_and_line(
3020 self.initial_lineno)
3021 text = '\n'.join(context)
3022 literal_block = nodes.literal_block(text, text)
3023 literal_block.source = src
3024 literal_block.line = srcline
3025 self.parent += literal_block
3026 else:
3027 self.parent += self.reporter.warning(
3028 'Literal block expected; none found.',
3029 line=self.state_machine.abs_line_number())
3030 # src not available, because statemachine.input_lines is empty
3031 self.state_machine.previous_line()
3032 self.parent += self.messages
3033 return []
3034
3035 def indent(self, match, context, next_state):
3036 assert context, ('QuotedLiteralBlock.indent: context should not '
3037 'be empty!')
3038 self.messages.append(
3039 self.reporter.error('Unexpected indentation.',
3040 line=self.state_machine.abs_line_number()))
3041 self.state_machine.previous_line()
3042 raise EOFError
3043
3044 def initial_quoted(self, match, context, next_state):
3045 """Match arbitrary quote character on the first line only."""
3046 self.remove_transition('initial_quoted')
3047 quote = match.string[0]
3048 pattern = re.compile(re.escape(quote), re.UNICODE)
3049 # New transition matches consistent quotes only:
3050 self.add_transition('quoted',
3051 (pattern, self.quoted, self.__class__.__name__))
3052 self.initial_lineno = self.state_machine.abs_line_number()
3053 return [match.string], next_state, []
3054
3055 def quoted(self, match, context, next_state):
3056 """Match consistent quotes on subsequent lines."""
3057 context.append(match.string)
3058 return context, next_state, []
3059
3060 def text(self, match, context, next_state):
3061 if context:
3062 self.messages.append(
3063 self.reporter.error('Inconsistent literal block quoting.',
3064 line=self.state_machine.abs_line_number()))
3065 self.state_machine.previous_line()
3066 raise EOFError
3067
3068
3069 state_classes = (Body, BulletList, DefinitionList, EnumeratedList, FieldList,
3070 OptionList, LineBlock, ExtensionOptions, Explicit, Text,
3071 Definition, Line, SubstitutionDef, RFC2822Body, RFC2822List)
3072 """Standard set of State classes used to start `RSTStateMachine`."""