Stripping the tags from an HTML file

Recently, I was moving my webpages created through FrontPage 2003 (please stop snickering - it was a great tool for sites with flat web files) to WordPress.

This required clearing the formatting of the HTML files, and one could use http://StripHTML.com for doing so – “StripHTML.com gives you a quick, easy and satisfying way to transform your ugly formatted and/or HTMLified text into a clean and pretty text for you to enjoy.”  However, most of us want to preserve the URL links, titles and the paragraph tags.

The VBA program below, which runs within Excel, takes an input text file, reads it as a single string, removes all tags except the URLs, paragraphs and titles, and writes the result to a new text file.

The VBA program below reads the input file at “C:\Users\Tranq\Documents\abc\inputhtml.txt” and writes the output to “C:\Users\Tranq\Documents\abc\outputhtml.txt”.  You can change these locations in the TextFile_PullData and TextFile_Create functions, respectively.  You can also modify the program to read several files in a directory, which saves running it once per file.
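One way the several-files idea could look is sketched below. It is only an illustration, not part of the original program: the names CleanAllFiles, StripTags, ReadText and WriteText are hypothetical, the "cleaned" output folder is an assumption (it must already exist), and the tag rule is the same one used in CleanTags (keep the p and a tags, strip the rest).

```vba
' Hypothetical sketch: clean every .txt file in a folder.
Sub CleanAllFiles()
    ' Input folder is the one used in this post; output folder is assumed and must exist.
    Const InFolder As String = "C:\Users\Tranq\Documents\abc\"
    Const OutFolder As String = "C:\Users\Tranq\Documents\abc\cleaned\"
    Dim fileName As String

    fileName = Dir(InFolder & "*.txt")      ' first matching file, "" if none
    Do While fileName <> ""
        WriteText OutFolder & fileName, StripTags(ReadText(InFolder & fileName))
        fileName = Dir()                    ' next matching file
    Loop
End Sub

' Same rule as CleanTags: keep <p ...>, </p>, <a ...>, </a>; strip all other tags.
Function StripTags(ByVal html As String) As String
    Dim result As String, stripIt As Boolean, i As Long, c As String
    For i = 1 To Len(html)
        c = Mid(html, i, 1)
        If c = "<" Then
            stripIt = Not (Mid(html, i, 2) = "<p" Or Mid(html, i, 2) = "<a" _
                      Or Mid(html, i, 4) = "</p>" Or Mid(html, i, 4) = "</a>")
        End If
        If Not stripIt Then result = result & c
        If c = ">" Then stripIt = False     ' tag ended; resume copying
    Next i
    StripTags = result
End Function

' Read a whole text file into one string.
Function ReadText(ByVal filePath As String) As String
    Dim f As Integer
    f = FreeFile
    Open filePath For Input As #f
    ReadText = Input(LOF(f), f)
    Close #f
End Function

' Write a string to a new text file, overwriting any existing file.
Sub WriteText(ByVal filePath As String, ByVal content As String)
    Dim f As Integer
    f = FreeFile
    Open filePath For Output As #f
    Print #f, content
    Close #f
End Sub
```

Run CleanAllFiles from the VBA editor (F5) rather than from a cell. Writing to a separate folder keeps the cleaned files from being picked up again by the same Dir loop.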

How to Use: Just open an Excel file. Press Alt+F11, and open a new module. Copy and paste what is given below, or download the text file.  Go back to the Excel file and enter =CleanTags(A1) in cell B1.  Make sure you have the input file at “C:\Users\Tranq\Documents\abc\inputhtml.txt”.   The location can be modified.

**********************************************

Function CleanTags(HTML As String) As String
    'PURPOSE: Clean HTML tags except the paragraph, weblink and title tags
    ' You can comment out the marked lines if you want to suppress these as well.
    ' The instructions are given at the spots.
    ' SOURCE: Unknown and AutarKaw.org
    Dim result As String, StripIt As Boolean, c As String, i As Long
    Dim d As String, e As String
    HTML = TextFile_PullData()

    'StripIt is used to decide whether to include or exclude a character in the output
    StripIt = False

    'Looking at each character in the HTML file
    For i = 1 To Len(HTML)
        'c is the current character
        c = Mid(HTML, i, 1)
        'Some conditions to take care of the end of the input file
        If i <= Len(HTML) - 1 Then
            'd is the two characters starting at position i, to capture <a and <p
            d = Mid(HTML, i, 2)
        Else
            d = ""
        End If
        If i <= Len(HTML) - 3 Then
            'e is the four characters starting at position i, to capture </a> and </p>
            e = Mid(HTML, i, 4)
        Else
            e = ""
        End If

        'Checking for the < character that begins an HTML tag
        If c = "<" Then StripIt = True

        'Comment this if you want to strip paragraphs
        If d = "<p" Then StripIt = False
        If e = "</p>" Then StripIt = False

        'Comment this if you want to strip URL tags and title tags as well.
        If d = "<a" Then StripIt = False
        If e = "</a>" Then StripIt = False

        'Adds to output or skips it
        If StripIt = False Then result = result & c
        'Taking care of the closing > to reset the StripIt Boolean
        If c = ">" Then StripIt = False
    Next i

    'Putting the output in a new file
    TextFile_Create result
    'Run the program by entering =CleanTags(A1) in a blank Excel file
    'where you have this module. Puts Done in the cell if it runs correctly
    CleanTags = "Done"
    'This lets you know the work is done. Uncomment if you like.
    'MsgBox ("Done")
End Function

Function TextFile_PullData() As String
    'PURPOSE: Send All Data From Text File To A String Variable
    'SOURCE: http://www.TheSpreadsheetGuru.com

    Dim TextFile As Integer
    Dim FilePath As String
    Dim FileContent As String

    'File Path of Text File
    FilePath = "C:\Users\Tranq\Documents\abc\inputhtml.txt"

    'Determine the next file number available for use by the Open statement
    TextFile = FreeFile

    'Open the text file
    Open FilePath For Input As TextFile

    'Store file content inside a variable
    FileContent = Input(LOF(TextFile), TextFile)

    'Report Out Text File Contents
    'MsgBox FileContent

    'Close Text File
    Close TextFile
    TextFile_PullData = FileContent

End Function

Function TextFile_Create(HTML As String)
    'PURPOSE: Create A New Text File
    'SOURCE: http://www.TheSpreadsheetGuru.com

    Dim TextFile As Integer
    Dim FilePath As String

    'What is the file path and name for the new text file?
    FilePath = "C:\Users\Tranq\Documents\abc\outputhtml.txt"

    'Determine the next file number available for use by the Open statement
    TextFile = FreeFile

    'Open the text file
    Open FilePath For Output As TextFile

    'Write the cleaned text
    Print #TextFile, HTML

    'Save & Close Text File
    Close TextFile

End Function


This post is brought to you by


New site for the Numerical Method MOOC

Our MOOCs on Numerical Methods that were offered on the canvas.net site have been moved to a new CANVAS Free for Teachers site.  Current students can continue to use the current MOOCs indefinitely.

For future students, the two-part MOOC has now been combined into one, and it is accessible at any time.   There are no deadlines to start or finish.  To enroll, just click this link https://canvas.instructure.com/enroll/KYGTJR and you are on your way to learning Numerical Methods via audiovisual lectures, textbook content and online assessments.



Open Education Resource Repository Links

Open education resources (OER) or open course wares (OCW) are everywhere you look.  Many people have spent valuable time building repositories where one can find links to several of these courses, and these repositories are everywhere as well.  When I was answering a recent survey on OERs, it asked users which repositories they had used.  I thought it would be good to have links to all the mentioned repositories, and they are given below.

And if you are wondering what an open education resource is, here is how the Open Education Consortium defines it. “An OpenCourseWare (OCW) is a free and open digital publication of high quality college and university‐level educational materials. These materials are organized as courses, and often include course planning materials and evaluation tools as well as thematic content. OpenCourseWare are free and openly licensed, accessible to anyone, anytime via the internet.”

Acknowledgement:  I would like to thank USF student Brain G for finding and entering the links.



Maximizing the cross-section of a gutter


Using Watu quizzes and Latex in WordPress

Many years ago, I modified some JavaScript code to develop online quizzes for Numerical Methods.  An example of that is here – right click to “View Page Source”. http://nm.mathforcollege.com/mcquizzes/01aae/quiz_01aae_introduction.html

The above quiz is adequate but looks vintage, and I am currently in the process of migrating my whole numerical methods site to WordPress.  As part of this migration, I tried to update the JavaScript code to work on WordPress, but that turned out to be above my pay grade.

After much searching, I found the WATU quiz plugin and am using it to redevelop the quizzes from scratch, with much of the text being cut and pasted from the old quizzes.

WATU is quite versatile, but I faced issues with the rendering of equations.  For that I use the WP-Katex plugin.  Simply put the equation between an opening shortcode, which is the tag name latex in square brackets, and a closing shortcode, which is /latex in square brackets.  I would show an example to illustrate the use of the shortcode, but it gets rendered.  See usage here.

The latex shortcode worked well for the question statements in the quiz, but when it was put in the distractors, the equations would show up with weird spacing.  This was resolved quickly by WATU support, who suggested editing the WATU style.css file.  See https://wordpress.org/support/topic/latex-not-displaying-properly-in-multiple-choice-answers/ for the solution.  It worked.

Another Latex issue cropped up when a user submitted a quiz and was shown the right answers.  The equations written in Latex would not get rendered; instead, the linear form of the equations would be shown.   This was resolved by going to the general settings in WATU and checking the option for quizzes to not use Ajax.

WATU is a complete solution for posting online quizzes on the web for student practice.  I am not using WATU to collect data or to assign a grade, but these capabilities do exist in WATU.

See the new version of the quiz I have now created on WordPress: https://numericalmethods.autarkaw.com/quiz-chapter-01-01-introduction-to-numerical-methods-2/  Please do not bookmark the quiz, as this is a test website and I will be migrating the test website to the original site (http://mathforcollege.com) by the time the Fall semester starts.  This will create the least disruption and maintain the legacy of the open course ware.

Reducing ordinary differential equations to state variable matrix form

To be able to solve differential equations numerically, one has to reduce them to a set of first-order ordinary differential equations – also called the state variable form.  By writing them in a matrix form, the equations become conducive to programming in languages such as MATLAB.  Here is an example of this reduction to state variable matrix form.

[Images: worked example of the reduction, "08.05 blog", pages 1 and 2]
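As a separate illustration (not taken from the images above), here is a sketch of the reduction for a generic second-order linear ODE. The coefficients a, b, c, the forcing function f(t), the initial values y_0 and v_0, and the new variable z are all assumed for this example only.

```latex
% Generic second-order ODE (a \neq 0) with initial conditions
a\,\frac{d^2 y}{dt^2} + b\,\frac{dy}{dt} + c\,y = f(t), \qquad
y(0) = y_0, \quad \frac{dy}{dt}(0) = v_0

% Introduce the new state variable z = dy/dt, which gives two first-order ODEs
\frac{dy}{dt} = z, \qquad
\frac{dz}{dt} = \frac{f(t) - b\,z - c\,y}{a}

% State variable matrix form
\frac{d}{dt}
\begin{bmatrix} y \\ z \end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ -\dfrac{c}{a} & -\dfrac{b}{a} \end{bmatrix}
\begin{bmatrix} y \\ z \end{bmatrix}
+
\begin{bmatrix} 0 \\ \dfrac{f(t)}{a} \end{bmatrix},
\qquad
\begin{bmatrix} y(0) \\ z(0) \end{bmatrix}
=
\begin{bmatrix} y_0 \\ v_0 \end{bmatrix}
```

In this form, the coefficient matrix and the forcing vector are what one would hand to a numerical ODE solver.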


Matrix Algebra: Eigenvalues and Eigenvectors

Many university STEM major programs have reduced the credit hours for a course in Matrix Algebra or have simply dropped the course from their curriculum.   The content of Matrix Algebra in many cases is taught just in time where needed.  This approach can leave a student with many conceptual holes in the required knowledge of matrix algebra. In this series of blogs, we bring to you ten topics that are of immediate and intermediate interest for Matrix Algebra. Here is the tenth topic, where we talk about the eigenvalues and eigenvectors of a square matrix. Learn how to define and find the eigenvalues and eigenvectors of a square matrix, and get introduced to some key theorems on eigenvalues and eigenvectors.  Get all the resources in the form of textbook content, lecture videos, a multiple choice test, a problem set, and a PowerPoint presentation. Eigenvalues and Eigenvectors