Readers

Readers are responsible for language-dependent source parsing.
Reader

class Reader(lexer)

This is the base class for all readers. The public functions exposed to Document are process() and filter_output().

The main tasks of a reader are:

- Recognize lines that can contain directives (comment lines or doc strings).
- Modify the source for language-specific optimizations.
- Filter the processed output.

Parameters:

- lexer – A pygments lexer for the specified language.

::

    re_line_start = re.compile("^", re.M)  #to find the line start indices

    class Reader(object):
        #Public Methods
        <<Reader.__init__>>
        <<Reader.process>>
        <<Reader.filter_output>>

        #Protected Methods
        <<Reader._accept_token>>
        <<Reader._post_process>>
        <<Reader._handle_token>>
        <<Reader._cut_comment>>
__init__(lexer, single_comment_markers, block_comment_markers)

The constructor initialises the language-specific pygments lexer and the comment markers.

Parameters:

- lexer (Lexer) – The language-specific pygments lexer.
- single_comment_markers (string[]) – The language-specific single comment markers.
- block_comment_markers (string[]) – The language-specific block comment markers.

::

    def __init__(self, lexer, single_comment_markers, block_comment_markers):
        self.lexer = lexer
        self.single_comment_markers = single_comment_markers
        self.block_comment_markers = block_comment_markers
process(fname, text)

Reads the source code and identifies the directives. This method is called by Document.

Parameters:

- fname (string) – The file name of the source code.
- text (string) – The source code.

Returns: A list of Line objects.

::

    def process(self, fname, text):
        text = text.replace("\t", " "*8)
        starts = [ mo.start() for mo in re_line_start.finditer(text) ]
        lines = [ Line(fname, i, l) for i, l in enumerate(text.splitlines()) ]

        self.lines = lines    # A list of lines
        self.starts = starts  # the start indices of the lines

        tokens = self.lexer.get_tokens_unprocessed(text)

        for index, token, value in tokens:
            self._handle_token(index, token, value)

        self._post_process(fname, text)

        return self.lines
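The bookkeeping done by process() can be illustrated with a small standalone sketch (standard library only; the Line class and pygments tokenization are left out). It reproduces the tab expansion and the line-start offsets that are stored in self.starts:

```python
import re

# same pattern process() uses to find the start offset of every line
re_line_start = re.compile("^", re.M)

text = "def foo():\n\treturn 1\n"
text = text.replace("\t", " " * 8)  # expand tabs, as process() does

# character offset at which each line begins
starts = [m.start() for m in re_line_start.finditer(text)]
lines = text.splitlines()
```

The starts list is what later allows a token's character index to be mapped back to the line it belongs to.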
filter_output(lines)

This method is called by Document and gives the reader the chance to influence the final output.

::

    def filter_output(self, lines):
        return lines
_handle_token(index, token, value)

Find antiweb directives in valid pygments tokens.

Parameters:

- index (integer) – The index within the source code.
- token – A pygments token.
- value (string) – The token value.

::

    def _handle_token(self, index, token, value):

        if not self._accept_token(token): return

        cvalue = self._cut_comment(index, token, value)
        offset = value.index(cvalue)
        for k, v in list(directives.items()):
            for mo in v.expression.finditer(cvalue):
                li = bisect.bisect(self.starts, index+mo.start()+offset)-1
                line = self.lines[li]
                line.directives = list(line.directives) + [ v(line.index, mo) ]
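The bisect call is what maps a directive match back to its line: self.starts holds the character offset at which each line begins, and bisect finds the last line start that precedes the match. A minimal sketch with a hypothetical offset list:

```python
import bisect

# hypothetical line-start offsets of a three-line file,
# as process() would compute them
starts = [0, 11, 28]

# a directive found at character offset 15 lies on the second line (index 1),
# because 15 falls between starts[1] and starts[2]
line_index = bisect.bisect(starts, 15) - 1
```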
_cut_comment(index, token, text)

Cuts off the comment identifiers.

Parameters:

- index (integer) – The index within the source code.
- token – A pygments token.
- text (string) – The token value.

Returns: The value without comment identifiers.

::

    def _cut_comment(self, index, token, text):
        return text
_post_process(fname, text)

Does some post processing after the directives were found.

::

    def _post_process(self, fname, text):
        #correct the line attribute of directives, in case there have
        #been lines inserted or deleted by subclasses of Reader
        for i, l in enumerate(self.lines):
            for d in l.directives:
                d.line = i

        #give the directives the chance to match
        for l in self.lines:
            for d in l.directives:
                d.match(self.lines)
_accept_token(token)

Checks if the token type may contain a directive.

Parameters:

- token – A pygments token.

Returns: True if the token may contain a directive, False otherwise.

::

    def _accept_token(self, token):
        return True
CReader

class CReader

A reader for C/C++ code. This class inherits Reader.

::

    class CReader(Reader):

        def __init__(self, lexer, single_comment_markers, block_comment_markers):
            super(CReader, self).__init__(lexer, single_comment_markers, block_comment_markers)
            #C/C++ has only one type of single and block comment markers.
            #The markers are retrieved here in order to avoid iterating over
            #the markers every time they are used.
            self.single_comment_marker = single_comment_markers[0]
            self.block_comment_marker_start = block_comment_markers[0]
            self.block_comment_marker_end = block_comment_markers[1]

        def _accept_token(self, token):
            return token in Token.Comment

        def _cut_comment(self, index, token, text):
            if text.startswith(self.block_comment_marker_start):
                text = text[2:-2]
            elif text.startswith(self.single_comment_marker):
                text = text[2:]
            return text

        def filter_output(self, lines):
            """
            .. py:method:: filter_output(lines)

               See :py:meth:`Reader.filter_output`.
            """
            for l in lines:
                if l.type == "d":
                    #remove comment chars in document lines
                    stext = l.text.lstrip()

                    if stext == self.block_comment_marker_start or stext == self.block_comment_marker_end:
                        #remove /* and */ from documentation lines
                        #see the l.text.lstrip()! If the line ends with a white space
                        #the markers will be kept! This is a feature to force the
                        #markers in the output.
                        continue

                    if stext.startswith(self.single_comment_marker):
                        l.text = l.indented(stext[2:])

                yield l
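The marker stripping CReader performs can be sketched standalone (a hypothetical helper, slicing by marker length rather than the hard-coded 2 used in the class):

```python
def cut_comment(text, single="//", block=("/*", "*/")):
    # block comments lose both delimiters, single-line comments their marker
    if text.startswith(block[0]):
        return text[len(block[0]):-len(block[1])]
    if text.startswith(single):
        return text[len(single):]
    return text

cut_comment("/* @start(foo) */")  # → " @start(foo) "
```

Note that the inner whitespace survives; only the markers themselves are cut, so a directive expression can still be matched inside the remaining text.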
CSharpReader

class CSharpReader

A reader for C# code. This class inherits CReader. The CSharpReader is needed because C#-specific output filtering is applied. Compared to C, C# uses XML comments starting with ///, which are reused for the final documentation. Here you can see an overview of the CSharpReader class and its methods.

::

    class CSharpReader(CReader):

        default_xml_block_index = -1

        def __init__(self, lexer, single_comment_markers, block_comment_markers):
            #the call to the base class CReader is needed for initialising the comment markers
            super(CSharpReader, self).__init__(lexer, single_comment_markers, block_comment_markers)

        <<CSharpReader.strip_tags>>
        <<CSharpReader.get_attribute_text>>
        <<CSharpReader.get_stripped_xml_lines>>
        <<CSharpReader.create_new_lines>>
        <<CSharpReader.init_xml_block>>
        <<CSharpReader.filter_output>>
strip_tags(tags)

Removes all C# XML tags. The tags are replaced by their attributes and contents. This method is called recursively; this is needed for nested tags.

Examples:

- <param name="arg1">value</param> will be stripped to: "arg1 value"
- <para>Start <see cref="(String)"/> end</para> will be stripped to: "Start (String) end"

Parameters:

- tags (Tag) – the parsed XML tags.

Returns: a string containing the stripped version of the tags.

::

    def strip_tags(self, tags):
        text = ""

        if isinstance(tags, Tag):
            text = self.get_attribute_text(tags)

            for content in tags.contents:
                if not isinstance(content, NavigableString):
                    content = self.strip_tags(content)
                text += content

        return text
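The recursive idea can be sketched without BeautifulSoup, using only the standard library. This is a simplified stand-in for strip_tags, not antiweb's implementation: each tag is replaced by its attribute values and text content, recursing into nested tags and keeping the text that trails them:

```python
import xml.etree.ElementTree as ET

def strip_tags(element):
    # attribute values first, each followed by a space
    text = "".join(v + " " for v in element.attrib.values())
    # then the tag's own text content
    if element.text:
        text += element.text
    # recurse into nested tags, keeping the text after each child
    for child in element:
        text += strip_tags(child)
        if child.tail:
            text += child.tail
    return text

strip_tags(ET.fromstring('<param name="arg1">value</param>'))  # → "arg1 value"
```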
get_attribute_text(tag)

Returns the values of all XML tag attributes, separated by whitespace.

Examples:

- <param name="arg1">parameterValue</param> returns: "arg1 "
- <include file='file1' path='[@name="test"]' /> returns: "file1 [@name="test"] "

Parameters:

- tag (Tag) – a BeautifulSoup Tag.

Returns: the values of all attributes concatenated.

::

    def get_attribute_text(self, tag):
        #collect all values of xml tag attributes
        attributes = tag.attrs
        attribute_text = ""
        for attribute, value in attributes.items():
            attribute_text += value + " "
        return attribute_text
get_stripped_xml_lines(xml_lines_block)

Removes all XML comment tags from the lines in the xml_lines_block.

Parameters:

- xml_lines_block (Line[]) – A list containing all Line objects which belong to an XML comment block.

Returns: A list of all stripped XML lines.

::

    def get_stripped_xml_lines(self, xml_lines_block):
        #the xml_lines_block contains Line objects. As the xml parser expects a string,
        #the Line objects have to be converted to a string.
        xml_text = "\n".join(map(operator.attrgetter("text"), xml_lines_block))

        #the xml lines will be parsed and then all xml tags will be removed
        xml_tags = BeautifulSoup(xml_text, "html.parser")
        stripped_xml_lines = self.strip_tags(xml_tags)
        return stripped_xml_lines.splitlines()
create_new_lines(stripped_xml_lines, index, fname)

This method is called after all XML comment tags of a comment block have been stripped. For each line in the stripped_xml_lines a new Line object is created.

Note that leading spaces and tabs are stripped, which means that the lines do not have any indentation. Use a C# block comment if indentation should be kept.

Parameters:

- stripped_xml_lines (string[]) – The comment lines where the XML tags have been stripped.
- index (int) – The starting index for the new Line objects.
- fname (string) – The file name of the currently processed file.

Returns: A generator containing the lines which have been created out of the stripped_xml_lines.

::

    def create_new_lines(self, stripped_xml_lines, index, fname):
        for line in stripped_xml_lines:
            #only removing spaces and tabs --> newlines should be kept
            new_line = Line(fname, index, line.lstrip(' \t'))
            index += 1
            yield new_line
init_xml_block()

Initialises the variables which are needed for collecting an XML comment block.

::

    def init_xml_block(self):
        xml_lines_block = []
        xml_start_index = self.default_xml_block_index
        return xml_lines_block, xml_start_index
filter_output(lines)

Applies C#-specific filtering for the final output. XML comment tags are replaced by their attributes and contents.

Four cases have to be handled:

- The current line is a code line: the line is added to the result.
- The current line is a block comment: the line can be skipped.
- The current line is an XML comment: the line is added to the xml_lines_block.
- The current line is a single comment line: the line is added to the result. If the current line is the first line after an XML comment block, the comment block is processed and its lines are added to the result.

Parameters:

- lines (Line[]) – All lines of a file. The directives have already been replaced.

Returns: A generator containing all lines for the final output.

::

    def filter_output(self, lines):
        #first the CReader filters the output,
        #afterwards the CSharpReader does some C# specific filtering
        lines = super(CSharpReader, self).filter_output(lines)

        #the xml_lines_block collects xml comment lines
        #if the end of an xml comment block is identified, the collected xml lines are processed
        xml_lines_block, xml_start_index = self.init_xml_block()

        for l in lines:
            if l.type == "d":
                text = l.text.lstrip()

                if text == self.block_comment_marker_start or text == self.block_comment_marker_end:
                    #remove /* and */ from documentation lines. See the l.text.lstrip()!
                    #If the line ends with a white space the markers will be kept!
                    #This is a feature to force the markers in the output.
                    continue

                if text.startswith("/") and not text.startswith(self.block_comment_marker_start):
                    l.text = l.indented(text[1:])

                    if xml_start_index == self.default_xml_block_index:
                        #indicates that a new xml_block has started
                        xml_start_index = l.index

                    xml_lines_block.append(l)
                    continue
                elif not xml_start_index == self.default_xml_block_index:
                    #an xml comment block has ended, now the block is processed.
                    #at first the xml tags are stripped, afterwards a new line object
                    #is created for each stripped line and added to the final result generator
                    stripped_xml_lines = self.get_stripped_xml_lines(xml_lines_block)
                    new_lines = self.create_new_lines(stripped_xml_lines, xml_start_index, l.fname)

                    for line in new_lines:
                        yield line

                    #reset the xml variables for the next block
                    xml_lines_block, xml_start_index = self.init_xml_block()

            yield l
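The flushing logic in CSharpReader.filter_output follows a common generator pattern: buffer a run of related lines, then emit the processed run once a different kind of line arrives. A minimal standalone sketch, with plain strings instead of Line objects and a trivial stand-in for the XML processing step:

```python
def filter_lines(lines, process=lambda block: ["|".join(block)]):
    block = []
    for line in lines:
        if line.startswith("///"):
            # collect XML comment lines instead of yielding them
            block.append(line[3:].strip())
            continue
        if block:
            # the run has ended: emit the processed block first
            yield from process(block)
            block = []
        yield line
    if block:
        # flush a block that runs to the end of the input
        yield from process(block)

list(filter_lines(["/// <summary>a</summary>", "/// b", "int x;"]))
# → ["<summary>a</summary>|b", "int x;"]
```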
PythonReader

class PythonReader

A reader for Python code. This class inherits Reader. To reduce the number of sentinels, the Python reader does some more sophisticated source parsing.

A construction like::

    @cstart(foo)
    def foo(arg1, arg2):
        """
        Foo's documentation
        """
        code

is replaced by::

    @cstart(foo)
    def foo(arg1, arg2):
        """
        @start(foo doc)
        Foo's documentation
        @include(foo)
        @(foo doc)
        """
        code

The replacement will be done only:

- if the doc string begins with """,
- if the block was started by a @rstart or @cstart directive,
- if there is no antiweb directive in the doc string.

Only a @cstart will insert the @include directive.

Additionally, the Python reader removes all single-line """ and ''' from documentation lines. In the following lines::

    @start(foo)
    Documentation

the """ are automatically removed in the rst output (see filter_output() for details).

::

    class PythonReader(Reader):

        def __init__(self, lexer, single_comment_markers, block_comment_markers):
            super(PythonReader, self).__init__(lexer, single_comment_markers, block_comment_markers)
            self.doc_lines = []

        <<PythonReader._post_process>>
        <<PythonReader._accept_token>>
        <<PythonReader._cut_comment>>
        <<PythonReader.filter_output>>
_post_process(fname, text)

This implementation decorates doc strings with antiweb directives.

::

    def _post_process(self, fname, text):
        #from behind because we will probably insert some lines
        self.doc_lines.sort(reverse=True)

        #handle each found doc string
        for start_line, end_line in self.doc_lines:
            indents = set()

            <<no antiweb directives in doc string>>
            <<find the last directive before the doc string>>

            if isinstance(last_directive, RStart):
                <<decorate beginning and end>>

                if isinstance(last_directive, CStart):
                    <<insert additional include>>

        super(PythonReader, self)._post_process(fname, text)
<<no antiweb directives in doc string>>::

    #If antiweb directives are within the doc string,
    #the doc string will not be decorated!
    directives_between_start_and_end_line = False
    for l in self.lines[start_line+1:end_line]:
        if l:
            #needed for <<insert additional include>>
            indents.add(l.indent)

        if l.directives:
            directives_between_start_and_end_line = True
            break

    if directives_between_start_and_end_line:
        continue
<<find the last directive before the doc string>>::

    last_directive = None
    for l in reversed(self.lines[:start_line]):
        if l.directives:
            last_directive = l.directives[0]
            break
<<decorate beginning and end>>::

    l = self.lines[start_line]
    start = Start(start_line, last_directive.name + " doc")
    l.directives = list(l.directives) + [start]

    l = self.lines[end_line]
    end = End(end_line, last_directive.name + " doc")
    l.directives = list(l.directives) + [end]
<<insert additional include>>::

    l = l.like("")
    include = Include(end_line, last_directive.name)
    l.directives = list(l.directives) + [include]
    self.lines.insert(end_line, l)

    #the include directive should have the same
    #indentation as the .. py:function:: directive
    #inside the doc string. (It should be the second
    #value of sorted indents)
    indents = list(sorted(indents))
    if len(indents) > 1:
        l.change_indent(indents[1]-l.indent)
_accept_token(token)

::

    def _accept_token(self, token):
        return token in Token.Comment or token in Token.Literal.String.Doc
filter_output(lines)

::

    def filter_output(self, lines):
        for l in lines:
            if l.type == "d":
                #remove comment chars in document lines
                stext = l.text.lstrip()

                if stext == '"""' or stext == "'''":
                    #remove """ and ''' from documentation lines
                    #see the l.text.lstrip()! If the line ends with a white space
                    #the quotes will be kept! This is a feature to force the quotes
                    #in the output.
                    continue

                if stext.startswith("#"):
                    #remove comments but not chapters
                    l.text = l.indented(stext[1:])

            yield l
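Stripped of the Line class and the indentation handling, the filtering in PythonReader.filter_output amounts to this simplified sketch:

```python
def filter_doc(lines):
    for text in lines:
        stext = text.lstrip()
        if stext in ('"""', "'''"):
            continue              # drop lines holding only a triple quote
        if stext.startswith("#"):
            text = stext[1:]      # strip the comment marker
        yield text

list(filter_doc(['"""', '# Heading', 'body', '"""']))  # → [' Heading', 'body']
```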
_cut_comment(index, token, text)

::

    def _cut_comment(self, index, token, text):
        if token in Token.Literal.String.Doc:
            if text.startswith('"""'):
                #save the start/end line of doc strings beginning with """
                #for further decoration processing in _post_process
                start_line = bisect.bisect(self.starts, index)-1
                end_line = bisect.bisect(self.starts, index+len(text)-3)-1
                lines = list(filter(bool, text[3:-3].splitlines())) #filter out empty strings
                if lines:
                    self.doc_lines.append((start_line, end_line))

            text = text[3:-3]

        return text
ClojureReader

class ClojureReader

A reader for Clojure code. This class inherits Reader.

::

    class ClojureReader(Reader):

        def _accept_token(self, token):
            return token in Token.Comment

        def _cut_comment(self, index, token, text):
            if text.startswith(";"):
                text = text[1:]
            return text

        def filter_output(self, lines):
            """
            .. py:method:: filter_output(lines)

               See :py:meth:`Reader.filter_output`.
            """
            for l in lines:
                if l.type == "d":
                    #remove comment chars in document lines
                    stext = l.text.lstrip()
                    if stext.startswith(";"):
                        #remove comments but not chapters
                        l.text = l.indented(stext[1:])
                yield l
GenericReader

class GenericReader

A generic reader for languages with single-line and block comments. This class inherits Reader.

::

    class GenericReader(Reader):

        def __init__(self, single_comment_markers, block_comment_markers):
            self.single_comment_markers = single_comment_markers
            self.block_comment_markers = block_comment_markers

        def _accept_token(self, token):
            return token in Token.Comment

        def _cut_comment(self, index, token, text):
            if text.startswith("/*"):
                text = text[2:-2]
            elif text.startswith("//"):
                text = text[2:]
            return text

        def filter_output(self, lines):
            for l in lines:
                if l.type == "d":
                    #remove comment chars in document lines
                    stext = l.text.lstrip()

                    for block_start, block_end in self.block_comment_markers:
                        #comment layout: [("/*", "*/"), ("#", "@")]
                        if stext == block_start or stext == block_end:
                            #remove the block comment markers from documentation lines
                            #see the l.text.lstrip()! If the line ends with a white space
                            #the markers will be kept! This is a feature to force the
                            #markers in the output.
                            continue

                    for comment_start in self.single_comment_markers:
                        #comment layout: ["//", ";"]
                        if stext.startswith(comment_start):
                            #remove comments but not chapters
                            l.text = l.indented(stext[len(comment_start):])

                yield l
RstReader

class RstReader

A reader for reStructuredText. This class inherits Reader.

::

    class RstReader(Reader):

        def _accept_token(self, token):
            return token in Token.Comment

        def _cut_comment(self, index, token, text):
            if text.startswith(".. "):
                text = text[3:]
            return text

        def filter_output(self, lines):
            """
            .. py:method:: filter_output(lines)

               See :py:meth:`Reader.filter_output`.
            """
            for l in lines:
                if l.type == "d":
                    #remove comment chars in document lines
                    stext = l.text.lstrip()

                    if stext == '.. ':
                        #remove the comment marker from documentation lines
                        #see the l.text.lstrip()! If the line ends with a white space
                        #the marker will be kept! This is a feature to force the
                        #marker in the output.
                        continue

                    if stext.startswith(".. "):
                        #remove comments but not chapters
                        l.text = l.indented(stext[3:])

                yield l
XmlReader

class XmlReader

A reader for XML. This class inherits Reader.

Whitespace and comment tokens are removed from text blocks that are defined on single comment lines. If a text block should be indented, define it on multiple lines and indent the include statement. Examples of how to define text blocks with and without indentation (without '_')::

    <!-- @_include(comments2) @_include(comments3) -->

    <!-- @_start(comments2) -->
    <!-- comments2 -->
    <!-- @_(comments2) -->

    <!-- @_start(comments3)
    comments3
    @_(comments3)
    -->

will be resolved as::

    comments2
    comments3

Here you can see an overview of the XmlReader class and its methods.

::

    class XmlReader(Reader):

        def __init__(self, lexer, single_comment_markers, block_comment_markers):
            super(XmlReader, self).__init__(lexer, single_comment_markers, block_comment_markers)
            self.block_comment_marker_start = block_comment_markers[0]
            self.block_comment_marker_end = block_comment_markers[1]

        def _accept_token(self, token):
            return token in Token.Comment

        def _cut_comment(self, index, token, text):
            if text.startswith(self.block_comment_marker_start):
                text = self.remove_block_comment_start(text)
                text = self.remove_block_comment_end(text)
            return text

        def filter_output(self, lines):
            """
            .. py:method:: filter_output(lines)

               See :py:meth:`Reader.filter_output`.
            """
            for l in lines:
                if l.type == "d":
                    stext = l.text.strip()

                    if stext == self.block_comment_marker_start or stext == self.block_comment_marker_end:
                        continue

                    #the block comment markers should be removed for cases like: '<!-- asdasdasd -->'
                    if stext.startswith(self.block_comment_marker_start):
                        stext = self.remove_block_comment_start(stext)
                        l.text = stext.strip()

                    if stext.endswith(self.block_comment_marker_end):
                        stext = self.remove_block_comment_end(stext)
                        l.text = stext.strip()

                yield l

        def remove_block_comment_start(self, text):
            return text[len(self.block_comment_marker_start):]

        def remove_block_comment_end(self, text):
            return text[:-len(self.block_comment_marker_end)]
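The marker removal for one-line XML comments can be sketched as a standalone function (a hypothetical helper mirroring remove_block_comment_start/_end, slicing by marker length):

```python
start_marker, end_marker = "<!--", "-->"

def cut_xml_comment(text):
    # slice off the block comment markers and surrounding whitespace
    text = text.strip()
    if text.startswith(start_marker):
        text = text[len(start_marker):]
    if text.endswith(end_marker):
        text = text[:-len(end_marker)]
    return text.strip()

cut_xml_comment("<!-- @start(demo) -->")  # → "@start(demo)"
```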
The Reader Dictionary

When writing a new reader, register it in this dictionary under the corresponding pygments lexer name for the file type. Note that Readername is the name of the reader class, not of a file.

Format::

    "lexername" : Readername,

::

    readers = {
        "C"       : CReader,
        "C++"     : CReader,
        "C#"      : CSharpReader,
        "Python"  : PythonReader,
        "Clojure" : ClojureReader,
        "rst"     : RstReader,
        "XML"     : XmlReader,
    }
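A sketch of how a caller might pick a reader class from this dictionary, given a pygments lexer name. The placeholder classes and the fallback to a generic reader are assumptions for illustration, not antiweb's actual lookup logic:

```python
# placeholder stand-ins for the real reader classes
class CReader: pass
class PythonReader: pass
class GenericReader: pass

# abbreviated stand-in for the dictionary above
readers = {"C": CReader, "C++": CReader, "Python": PythonReader}

def reader_class_for(lexer_name):
    # fall back to a generic reader for unregistered languages
    return readers.get(lexer_name, GenericReader)
```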