HTML Agility Pack strip tags NOT IN whitelist
<p>I'm trying to create a function that removes HTML tags and attributes that are not in a whitelist. I have the following HTML:</p>
<pre><code>&lt;b&gt;first text &lt;/b&gt;
&lt;b&gt;second text here
    &lt;a&gt;some text here&lt;/a&gt;
    &lt;a&gt;some text here&lt;/a&gt;
&lt;/b&gt;
&lt;a&gt;some twxt here&lt;/a&gt;
</code></pre>
<p>I am using Html Agility Pack, and the code I have so far is:</p>
<pre><code>static List&lt;string&gt; WhiteNodeList = new List&lt;string&gt; { "b" };
static List&lt;string&gt; WhiteAttrList = new List&lt;string&gt; { };
static HtmlNode htmlNode;

public static void RemoveNotInWhiteList(out string _output, HtmlNode pNode, List&lt;string&gt; pWhiteList, List&lt;string&gt; attrWhiteList)
{
    // remove all attributes not on white list
    foreach (var item in pNode.ChildNodes)
    {
        item.Attributes.Where(u =&gt; attrWhiteList.Contains(u.Name) == false).ToList().ForEach(u =&gt; RemoveAttribute(u));
    }

    // remove all html and their innerText and attributes if not on whitelist.
    //pNode.ChildNodes.Where(u =&gt; pWhiteList.Contains(u.Name) == false).ToList().ForEach(u =&gt; u.Remove());
    //pNode.ChildNodes.Where(u =&gt; pWhiteList.Contains(u.Name) == false).ToList().ForEach(u =&gt; u.ParentNode.ReplaceChild(ConvertHtmlToNode(u.InnerHtml),u));
    //pNode.ChildNodes.Where(u =&gt; pWhiteList.Contains(u.Name) == false).ToList().ForEach(u =&gt; u.Remove());

    for (int i = 0; i &lt; pNode.ChildNodes.Count; i++)
    {
        if (!pWhiteList.Contains(pNode.ChildNodes[i].Name))
        {
            HtmlNode _newNode = ConvertHtmlToNode(pNode.ChildNodes[i].InnerHtml);
            pNode.ChildNodes[i].ParentNode.ReplaceChild(_newNode, pNode.ChildNodes[i]);
            if (pNode.ChildNodes[i].HasChildNodes &amp;&amp; !string.IsNullOrEmpty(pNode.ChildNodes[i].InnerText.Trim().Replace("\r\n", "")))
            {
                HtmlNode outputNode1 = pNode.ChildNodes[i];
                for (int j = 0; j &lt; pNode.ChildNodes[i].ChildNodes.Count; j++)
                {
                    string _childNodeOutput;
                    RemoveNotInWhiteList(out _childNodeOutput, pNode.ChildNodes[i], WhiteNodeList, WhiteAttrList);
                    pNode.ChildNodes[i].ReplaceChild(ConvertHtmlToNode(_childNodeOutput), pNode.ChildNodes[i].ChildNodes[j]);
                    i++;
                }
            }
        }
    }

    // Console.WriteLine(pNode.OuterHtml);
    _output = pNode.OuterHtml;
}

private static void RemoveAttribute(HtmlAttribute u)
{
    u.Value = u.Value.ToLower().Replace("javascript", "");
    u.Remove();
}

public static HtmlNode ConvertHtmlToNode(string html)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    if (doc.DocumentNode.ChildNodes.Count == 1)
        return doc.DocumentNode.ChildNodes[0];
    else
        return doc.DocumentNode;
}
</code></pre>
<p>The output I am trying to achieve is:</p>
<pre><code>&lt;b&gt;first text &lt;/b&gt;
&lt;b&gt;second text here
    some text here
    some text here
&lt;/b&gt;
some twxt here
</code></pre>
<p>That means that I only want to keep the <code>&lt;b&gt;</code> tags.<br> The reason I'm doing this is that some users copy and paste from MS Word into my WYSIWYG HTML editor.</p>
<p>Thanks!</p>
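<p>For illustration only, here is a minimal sketch of one way the whitelist filtering described above could be approached. Instead of re-parsing <code>InnerHtml</code> and replacing nodes while indexing into <code>ChildNodes</code>, it takes a snapshot of all element nodes and "unwraps" each non-whitelisted one by moving its children in front of it and then removing it. The class name <code>HtmlWhitelist</code> and the method name <code>Sanitize</code> are hypothetical, the whitelist entries are assumed to be lowercase, and the sketch has not been checked against a specific Html Agility Pack version.</p>

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper; the class and method names are illustrative only.
public static class HtmlWhitelist
{
    public static string Sanitize(string html, ISet<string> allowedTags, ISet<string> allowedAttributes)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the elements first, because the tree is modified inside the loop.
        var elements = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element)
            .ToList();

        foreach (var element in elements)
        {
            if (allowedTags.Contains(element.Name))
            {
                // Keep the tag, but drop attributes that are not whitelisted.
                foreach (var attribute in element.Attributes.ToList())
                {
                    if (!allowedAttributes.Contains(attribute.Name))
                        attribute.Remove();
                }
                continue;
            }

            // Unwrap: move the children in front of the element, then remove the element.
            var parent = element.ParentNode;
            foreach (var child in element.ChildNodes.ToList())
                parent.InsertBefore(child, element);
            parent.RemoveChild(element);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

<p>Under these assumptions, a call such as <code>HtmlWhitelist.Sanitize(html, new HashSet&lt;string&gt; { "b" }, new HashSet&lt;string&gt;())</code> is intended to keep only the <code>&lt;b&gt;</code> tags and the text of everything else. Note that unwrapping keeps the text content of removed tags, so elements such as <code>&lt;script&gt;</code> would likely need to be removed outright rather than unwrapped.</p>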