Stripping the tags from an HTML file

Recently, I was moving my webpages created through FrontPage 2003 (please stop snickering -it was a great tool for sites with flat web files) to WordPress.

This required clearing of formatting of the HTML file and one could use http://StripHTML.com for doing so – “StripHTML.com gives you a quick, easy and satisfying way to transform your ugly formatted and/or HTMLified text into a clean and pretty text for you to enjoy.”  However, most of us want to preserve the URL links, titles and the paragraph tags.

This VBA program that can be used within Excel takes an input text file, reads it as a single string, removes all tags except the URLs, paragraphs and titles, and writes to a new text file.

The VBA program below has the input file at “C:\Users\Tranq\Documents\abc\inputhtml.txt”, and outputs to “C:\Users\Tranq\Documents\abc\outputhtml.txt” and you can change these locations in the TextFile_PullData and TextFile_Create functions, respectively.  You can modify the program to read several files in a directory and make it even more efficient.

How to Use: Just open an Excel file. Press Alt+F11, and open a new module. Cut and paste what is given below, or download the text file.  Go to a cell in the excel file and enter =CleanTags(A1) in cell B1.  Make sure you have the input file at C:\Users\Tranq\Documents\abc\inputhtml.txt”.   The location can be modified.

**********************************************

Function CleanTags(HTML As String) As String
‘PURPOSE: Clean HTML tags except the paragraph, weblinks and title
‘ You can comment out if you want these to supress these as well
‘ The instructions are given at the spots.
‘ SOURCE: Unknown and AutarKaw.org
Dim result As String, StripIt As Boolean, c As String, i As Long
HTML = TextFile_PullData()

‘StripIt is used to figure out to include or exclude in output
StripIt = False

‘Looking at each character in the HTML file
For i = 1 To Len(HTML)
‘c is each character
c = Mid(HTML, i, 1)
‘Some conditions to take care for end of input file
If i <= Len(HTML) – 1 Then
‘d is last two characters of file to capture <a and <p
‘Just in case
d = Mid(HTML, i, 2)
Else
d = “”
End If
If i <= Len(HTML) – 3 Then
‘e is last four characters of file to capture </a> and </p>
e = Mid(HTML, i, 4)
Else
e = “”
End If

‘Checking for < character that begins an HTML tag
If c = “<” Then StripIt = True

‘Comment this if you want to strip paragraphs
If d = “<p” Then StripIt = False
If e = “</p>” Then StripIt = False

‘Comment this if you want to strip URL tags and title tags as well.
If d = “<a” Then StripIt = False
If e = “</a>” Then StripIt = False

‘Adds to output or skips it
If StripIt = False Then result = result & c
‘Taking care of closing tag to change the StripIt Boolean
If c = “>” Then StripIt = False
Next i
CleanTags = result

‘Putting the output in a new file
abc = TextFile_Create(result)
‘Run the program by entering =CleanTags(A1) in a blank excel file
‘where you have this module. Puts Done in cell if it runs correctly
CleanTags = “Done”
‘This lets you know the work is done. Comment if you like.
‘MsgBox (“Done”)
End Function

Function TextFile_PullData()
‘PURPOSE: Send All Data From Text File To A String Variable
‘SOURCE: http://www.TheSpreadsheetGuru.com

Dim TextFile As Integer
Dim FilePath As String
Dim FileContent As String

‘File Path of Text File
FilePath = “C:\Users\Tranq\Documents\abc\inputhtml.txt”

‘Determine the next file number available for use by the FileOpen function
TextFile = FreeFile

‘Open the text file
Open FilePath For Input As TextFile

‘Store file content inside a variable
FileContent = Input(LOF(TextFile), TextFile)

‘Report Out Text File Contents
‘MsgBox FileContent

‘Close Text File
Close TextFile
TextFile_PullData = FileContent

End Function

Function TextFile_Create(HTML As String)
‘PURPOSE: Create A New Text File
‘SOURCE: http://www.TheSpreadsheetGuru.com

Dim TextFile As Integer
Dim FilePath As String

‘What is the file path and name for the new text file?
FilePath = “C:\Users\Tranq\Documents\abc\outputhtml.txt”

‘Determine the next file number available for use by the FileOpen function
TextFile = FreeFile

‘Open the text file
Open FilePath For Output As TextFile

‘Write some lines of text
Print #TextFile, HTML

‘Save & Close Text File
Close TextFile

End Function


This post is brought to you by

Advertisements

An FE Exam Math Problem in Complex Algebra

“The Fundamentals of Engineering (FE) exam is generally the first step in the process of becoming a professional licensed engineer (P.E.). It is designed for recent graduates and students who are close to finishing an undergraduate engineering degree from an EAC/ABET-accredited program” – FE Exam NCEES

For most engineering majors, mathematics is a required part of the examination. Here is a question on complex algebra.

complex number fe exam

 

This post is brought to you by

codecademy.com looks promising

codecademy.com is offering promising free online courses to non-CS programmers.  It is an interactive way of learning how to program and it also assesses your learning gains via badges.  Go and give it a try.  You can get a quick glimpse of what they are trying to do by watching a CNN news clip.

__________________________________

This post is brought to you by Holistic Numerical Methods: Numerical Methods for the STEM undergraduate at http://numericalmethods.eng.usf.edu, the textbook on Numerical Methods with Applications available from the lulu storefront, the textbook on Introduction to Programming Concepts Using MATLAB, and the YouTube video lectures available athttp://numericalmethods.eng.usf.edu/videos.  Subscribe to the blog via areader or email to stay updated with this blog. Let the information follow you.

Numerical Methods YouTube Video Progress

Since January 2009, I have been videotaping numerical methods course lectures in the Educational Outreach studio of University of South Florida, Tampa. Videos are made in 10-minute segments, not just because that is the limit of the length of YouTube videos, but because we firmly believe in making our resources pedagogically neutral. Ten minute videos allow an instructor to pick and choose what, when and how he or she wants the students to learn.

But some people have asked me – Why YouTube – why not put the same 10-minute videos on your own website. We do have the links to the YouTube videos on our own website, but there are many good reasons to go the YouTube route. The list of reasons below may be obvious to some, while others may not have thought about some of them.

1. Use YouTube’s storage space for the videos.

2. Use YouTube’s compression technology to make the videos stream faster on slow connections.

3. Use the power of Google and YouTube to tag and search the videos.

4. Use YouTube’s bandwidth as opposed to that of my school. My school’s IT department most probably would start screaming when the downloads pick up pace.

5. Use YouTube’s editing facilities to add annotations, links, and play lists.

6. Use the ubiquity of YouTube to reach a large audience.

7. Simple one stop process to let others embed the videos on their website.

8. Have open discussion on the videos via comments.

9. Get the videos rated so that we can judge their quality.

10. Use the “insight” tool of YouTube to analyze who is watching the videos.

______________________________________________________________________

This post is brought to you by Holistic Numerical Methods: Numerical Methods for the STEM undergraduate at http://numericalmethods.eng.usf.edu, the textbook on Numerical Methods with Applications available from the lulu storefront, and the YouTube video lectures available at http://numericalmethods.eng.usf.edu/videos and http://www.youtube.com/numericalmethodsguy

Subscribe to the blog via a reader or email to stay updated with this blog. Let the information follow you.

Is a square matrix strictly diagonally dominant?

A square matrix A is strictly diagonally dominant if for all rows the absolute value of the diagonal element in a row is strictly greater than than the sum of absolute value of the rest of the elements in that row.

In this posting, I show a MATLAB program that finds whether a square matrix is strictly diagonally dominant by using two different methods. These are academic ways to reinforce programming skills in a student.

The MATLAB program can be downloaded as a Mfile (better to download it, as single quotes from the web-post do not translate correctly with the MATLAB editor). The html file showing the mfile and the command window output is also available.

%% IS A GIVEN SQUARE MATRIX STRICTLY DIAGONALLY DOMINANT?
% Language : Matlab 2007a
% Authors : Autar Kaw
% Last Revised : November 25, 2008
% Abstract: This program shows you two ways of finding out
% if a square matrix is diagonally dominant. A square matrix is
% diagonally dominant if for all rows the absolute value of the
% diagonal element in a row is strictly greater than than the sum
% of absolute value of the rest of the elements in that row
clc
clear all
disp(‘This program shows you two ways of finding out’)
disp(‘if a square matrix is diagonally dominant. A square matrix is’)
disp(‘diagonally dominant if for all rows the absolute value of the’)
disp(‘diagonal element in a row is strictly greater than than the sum’)
disp(‘of absolute value of the rest of the elements in that row’)
disp(‘ ‘)
%% INPUTS
% The square matrix
A=[-12 1 -7 2;1 3.4 1.1 1.1; 1 0 -4.5 0;10 1 1 10];
disp (‘INPUTS’)
disp(‘Here is the square matrix’)
A
disp(‘ ‘)

%% FIRST SOLUTION
% This is based on finding for how many rows the condition
% the absolute value of the diagonal element in a row is
% strictly greater than than the sum of absolute value
% of the rest of the elements in that row.

%size gives how many rows and columns in the A matrix
rowcol=size(A);
n=rowcol(1);
% count = for how many rows is the inequality met that
% the absolute value of the diagonal element in a row is
% strictly greater than than the sum of absolute value
% of the rest of the elements in that row
count=0;
for i=1:1:n
sumrow=0;
for j=1:1:n
if i~=j
sumrow=sumrow+abs(A(i,j));
end
end
if abs(A(i,i))>sumrow
count=count+1;
end
end
disp(‘FIRST WAY’)
if count==n
disp(‘Matrix is strictly diagonal dominant’)
else
disp(‘Matrix is NOT strictly diagonal dominant’)
end

%% SECOND SOLUTION
% This is based on finding for if for any row the condition
% the absolute value of the diagonal element in a row is
% strictly greater than than the sum of absolute value
% of the rest of the elements in that row is NOT met

%size gives how many rows and columns in the A matrix
rowcol=size(A);
n=rowcol(1);
% flag = keeps track if the condition is not met
% flag = 1 if matrix is strictly diagonally dominant
% flag = 2 if matrix is not strictly diagonally dominant

% Assuming matrix is strictly diagonally dominant
flag=1;
for i=1:1:n
sumrow=0;
for j=1:1:n
if i~=j
sumrow=sumrow+abs(A(i,j));
end
end
% As soon as the condition is not met, it is not a strictly
% diagonally dominant matrix
if abs(A(i,i))<=sumrow
flag=2;
break;
end
end
disp(‘ ‘)
disp(‘SECOND WAY’)
if flag==1
disp(‘Matrix is strictly diagonal dominant’)
else
disp(‘Matrix is NOT strictly diagonal dominant’)
end

This post is brought to you by Holistic Numerical Methods: Numerical Methods for the STEM undergraduate at http://numericalmethods.eng.usf.edu and the textbook on Numerical Methods with Applications available from the lulu storefront.

Subscribe to the blog via a reader or email to stay updated with this blog. Let the information follow you.

A better way to show conversion of decimal fractional number to binary

This post is brought to you by Holistic Numerical Methods: Numerical Methods for the STEM undergraduate at http://numericalmethods.eng.usf.edu, the textbook on Numerical Methods with Applications available from the lulu storefront, and the YouTube video lectures available at http://www.youtube.com/numericalmethodsguy.  

Subscribe to the blog via a reader or email to stay updated with this blog. Let the information follow you.

Finding height of atmosphere using nonlinear regression

Here is an example of finding the height of the atmosphere using nonlinear regression of the mass density of air vs altitude above sea level.

This post is brought to you by Holistic Numerical Methods: Numerical Methods for the STEM undergraduate at http://numericalmethods.eng.usf.edu.

An abridged (for low cost) book on Numerical Methods with Applications will be in print (includes problem sets, TOC, index) on December 10, 2008 and available at lulu storefront.

Subscribe to the blog via a reader or email to stay updated with this blog. Let the information follow you.