
How To Remove HTML Tags From Word Content?

I know there are a couple of threads about this which say to simply use Regex.Replace(input, "<.*?>", String.Empty);, but I can't use it on text written in a Word document. My code is lik

Solution 1:

Try the following:

Convert the text containing the HTML additions to a plain string using

string unFormatted = paragrapf2.ToString(SaveOptions.DisableFormatting);

and then replace the content of paragrapf2 with the unFormatted string.

Solution 2:

With some help from the comments, I arrived at the following working solution:

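// assumes: using Word = Microsoft.Office.Interop.Word;
// Missing.Value stands in for every optional argument that is not supplied
object oMissing = System.Reflection.Missing.Value;

findObject.ClearFormatting();
// \< and \> escape the literal angle brackets (unescaped, < and > mean
// "start of word" and "end of word" in Word wildcard searches)
findObject.Text = @"\<*\>";
findObject.MatchWildcards = true;            // required so that * acts as a wildcard
findObject.Replacement.ClearFormatting();
findObject.Replacement.Text = "";            // replace every match with nothing

// wdReplaceAll is passed as the 11th argument (Replace) of Find.Execute
object replaceAll = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAll, ref oMissing, ref oMissing, ref oMissing, ref oMissing);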

This uses the search pattern \<*\>, which contains the wildcard character *; that is why findObject.MatchWildcards must be set to true.
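For comparison with the regex from the question: in practice, Word's \<*\> behaves roughly like the lazy regex <.*?>, matching from each "<" to the nearest following ">". The following is a minimal C# sketch of that plain-string equivalent, assuming the text has already been copied out of the document into an ordinary string. It is an illustration only, not part of the answer above, and the TagStripper class and StripTags method are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Lazy "<.*?>" matches from each '<' to the nearest following '>'.
    // Note: without RegexOptions.Singleline, '.' does not match '\n',
    // so a tag split across lines is left in place.
    private static readonly Regex TagPattern = new Regex("<.*?>");

    // Hypothetical helper: returns a copy of the input with all tags removed.
    public static string StripTags(string input)
    {
        return TagPattern.Replace(input, string.Empty);
    }

    public static void Main()
    {
        string sample = "Hello <b>bold</b> and <i>italic</i> text";
        Console.WriteLine(StripTags(sample)); // prints "Hello bold and italic text"
    }
}
```

This only works on a copy of the text, so by itself it does not change the Word document; the Find-and-Replace approach above edits the document in place.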
