Labs 11 and 12. Implementing a subset of the Markdown lightweight markup language

This lab takes two classes: April 28 and May 5.

Since the lab spans two weeks, you don’t have to submit your solutions in the first class unless you finish by the end of that day.

How to submit your code

Each program (the source code .cpp file) should be submitted through Blackboard (Course Materials > Lab).

You can submit all your programs at the end of the lab session in one submission. This avoids the situation where you quickly write a program, immediately upload it to Blackboard, and then, say 10 minutes later, realize that there is a bug in it.

Basically, submit when you are sure that it will be your final version.

Each program should start with a comment that contains your name and a short program description, for example:

/*
  Author: Your Name 
  Description: Lab 1. Task 1. Hello world
*/

You can submit incomplete programs, but their “incomplete” status must be clearly stated in the comment at the top of the program. In this case, also briefly describe what is implemented and what is not.

Intro, part 1: HTML.

Every web page you see in the browser is actually a text file written in a special format called HTML (HyperText Markup Language).

For example, look at this example web page. Its source code is shown below:

<html>
<head>
    <title>The page title</title>
    <link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>

  <h1>Intro to HTML</h1>
  
  <p> We are going to consider only very basic features of HTML. 
  For text formatting, you use special tags (<em>markup elements enclosed
  between angular brackets</em>). </p>

  <p> Each paragraph should be enclosed in its own paragraph tag (p). 
  A paragraph can span multiple lines: the browser wraps the lines 
  automatically. Multiple whitespace characters and new line characters 
  are discarded by the browser, so you have to rely on the paragraph 
  tags to format the page. </p>

  <h2>Adding emphasis</h2>
  <p> The tag em can be used to <em>emphasize certain phrases</em>,
  the browser shows it with the <em>italic</em> typeface. </p>

</body>
</html>

Intro, part 2: Markdown.

To simplify writing web pages, there are a number of lightweight markup languages that can be translated into HTML.

The most widespread of them is called Markdown.

For example, it is used by services like Stack Overflow, Reddit, Tumblr, GitHub, and WordPress for formatting user-written articles and discussions.

This format is designed to be very readable. For example, the HTML code shown above looks as follows in Markdown:

# Intro to HTML
We are going to consider only very basic features of HTML. 
For text formatting, you use special tags (*markup elements enclosed
between angular brackets*).

Each paragraph should be enclosed in its own paragraph tag (p). 
A paragraph can span multiple lines: the browser wraps the lines 
automatically. Multiple whitespace characters and new line characters 
are discarded by the browser, so you have to rely on the paragraph 
tags to format the page. 

## Adding emphasis
The tag em can be used to *emphasize certain phrases*, the browser 
shows it with the *italic* typeface.

In particular:

Here is an online editor that lets you write Markdown in the left pane and shows the HTML code (or a rendered web page) in the right pane.

Task 0 (don’t submit).

  1. Download the example HTML file example.html and the CSS style file style.css, and save them in the same folder.

  2. Open this HTML file in the text editor and in the browser.

  3. Edit the HTML file, adding a couple of new paragraphs and headers of different levels (tags <h1> through <h6>). Refresh this file in the browser (with F5, or Ctrl+R, or Cmd+R) to see how the changes in the file affect the web page.

  4. Open the online Markdown editor and edit the Markdown text in the left panel. The resulting HTML (or its preview) will be shown in the right panel.

    In this editor, write a Markdown file such that the resulting HTML is identical (or similar) to the HTML file you just created in the previous step.

    Observe how the headers and the paragraphs in Markdown get translated into HTML.

Limited Markdown

This simplified version of the original Markdown has only the following syntactic elements:

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6

They should be replaced with the corresponding header tags
<h1> Header 1 </h1>,
etc.

Task 1.

We are going to write a program that reads a file in the Limited Markdown format and outputs HTML. The program should read the input through cin and print HTML to cout, so it is meant to be executed using stream redirection:

./markdown < input.md > output.html

In the first task, write a program that recognizes the headers # through ###### and replaces them with the corresponding <h1> through <h6> tags.

The rest of the input should be output without any changes (so the resulting HTML still contains the original text, but it is not formatted correctly yet).

To read lines one by one from the input stream until it ends, one can use

string s;

while( getline(cin, s) ) {
    // do something with the string
    cout << s << endl;
}
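The header replacement could be sketched as two small helper functions that are applied to each line inside the loop above. This is only one possible approach, not the required solution. The names headerLevel and convertLine are made up for this illustration, and the sketch assumes that a header is a run of one to six # characters followed by a space, as in the Limited Markdown examples.

```cpp
#include <cstddef>
#include <string>

// Returns the header level (1..6) if the line starts with one to six '#'
// characters followed by a space, or 0 if the line is not a header.
// Assumption: a space after the '#' run is required.
int headerLevel(const std::string& line) {
    std::size_t n = 0;
    while (n < line.size() && line[n] == '#') {
        ++n;
    }
    if (n >= 1 && n <= 6 && n < line.size() && line[n] == ' ') {
        return static_cast<int>(n);
    }
    return 0;
}

// Converts one line: a header becomes <hN> text </hN>,
// any other line is returned unchanged.
std::string convertLine(const std::string& line) {
    int level = headerLevel(line);
    if (level == 0) {
        return line;
    }
    std::string tag = "h" + std::to_string(level);
    std::string text = line.substr(level + 1);  // skip the '#' run and the space
    return "<" + tag + "> " + text + " </" + tag + ">";
}
```

Inside the reading loop, printing convertLine(s) instead of s would then output the converted line.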

Add the following text at the beginning of the produced HTML file to add the CSS style to the generated web page:

<html>
  <head>
  <title>Page title</title>
  <link rel="stylesheet" type="text/css" href="style.css">
  </head>
<body>

And the following should be added at the end:

</body>
</html>

Adding these two pieces can be done, for example, with the command-line tool cat, which lets you concatenate files:

cat beginning.html main_text.html ending.html > page.html

These auxiliary files beginning.html and ending.html can be downloaded too.
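As an alternative to cat, the program itself could add the two pieces around the converted text. The sketch below assumes the beginning and ending are exactly the snippets shown above; the names kPageBeginning, kPageEnding, and buildPage are hypothetical.

```cpp
#include <string>

// Text placed before the converted body (copied from the snippet above).
const std::string kPageBeginning = R"(<html>
  <head>
  <title>Page title</title>
  <link rel="stylesheet" type="text/css" href="style.css">
  </head>
<body>
)";

// Text placed after the converted body.
const std::string kPageEnding = R"(</body>
</html>
)";

// Returns the complete page: beginning, converted body, ending.
std::string buildPage(const std::string& body) {
    return kPageBeginning + body + kPageEnding;
}
```

With this approach the program would collect its converted output in a string and print buildPage(body) to cout at the end, so no separate concatenation step is needed.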

Task 2.

Change the program to support paragraphs. If the text between two consecutive headers contains empty lines, these empty lines are interpreted as paragraph breaks.

To identify whitespace characters, you can use the function isspace(c). It returns a nonzero value (true) if c is one of the whitespace characters ('\t', '\n', '\r', '\v', '\f', ' '), and zero (false) otherwise. To use it, include the header <cctype>.
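A possible way to use isspace for paragraph detection is sketched below. It is an illustration under simplifying assumptions, not the expected solution: the names isBlank and wrapParagraphs are made up, the input is given as a vector of lines instead of being read from cin, and every line starting with # is treated as a header and passed through unchanged.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Returns true if the line is empty or contains only whitespace characters.
bool isBlank(const std::string& line) {
    for (char c : line) {
        // The cast avoids undefined behavior for negative char values.
        if (!std::isspace(static_cast<unsigned char>(c))) {
            return false;
        }
    }
    return true;
}

// Wraps each run of non-blank, non-header lines in <p> ... </p>.
// Assumption: any line starting with '#' is a header; it closes the
// current paragraph and is copied as is (converting it is Task 1).
std::vector<std::string> wrapParagraphs(const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    bool inParagraph = false;
    for (const std::string& line : lines) {
        bool isHeader = !line.empty() && line[0] == '#';
        if (isBlank(line) || isHeader) {
            if (inParagraph) {
                out.push_back("</p>");
                inParagraph = false;
            }
            if (isHeader) {
                out.push_back(line);
            }
        } else {
            if (!inParagraph) {
                out.push_back("<p>");
                inParagraph = true;
            }
            out.push_back(line);
        }
    }
    if (inParagraph) {
        out.push_back("</p>");  // close a paragraph that reaches the end of input
    }
    return out;
}
```

The single flag inParagraph is the whole state: a paragraph opens on the first text line after a break and closes on a blank line, a header, or the end of the input. The same flag works when lines are processed one by one in the getline loop.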

Task 3.

Change the program to support text emphasis. Notice that there may be multiple *emphasized* phrases in a paragraph, and such phrases can start on one line and continue on the next. They can even span multiple lines; however, they should end by the end of the paragraph.

Notice that in standard Markdown the opening * must have a non-whitespace character to its right, and the closing * must have a non-whitespace character to its left, but you may drop this particular feature. Simply assume that every odd * starts an emphasis block, and every even * ends it.
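The odd/even rule can be sketched with a single boolean flag that remembers whether an emphasis block is currently open. The function name convertEmphasis is hypothetical, and the sketch applies the simplified rule only (it does not check the characters next to the *).

```cpp
#include <string>

// Replaces every '*' in the line with <em> or </em>.
// 'open' tells whether an emphasis block is currently open. It is passed
// by reference so that the state carries over to the next line.
std::string convertEmphasis(const std::string& line, bool& open) {
    std::string out;
    for (char c : line) {
        if (c == '*') {
            out += open ? "</em>" : "<em>";
            open = !open;  // odd '*' opens, even '*' closes
        } else {
            out += c;
        }
    }
    return out;
}
```

Because the flag lives outside the function, an emphasized phrase that starts on one line is closed correctly on a later line. Resetting the flag to false at each paragraph break would keep an unclosed * from leaking into the next paragraph.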

Task 4.

Implement some other feature of Markdown that is not a part of Limited Markdown.