How do I filter Microsoft Word junk out of pasted content?

Date: 2010-08-24 Author: artlung

I have some users posting on a group blog who cut and paste their text, but the pasted content includes things like this:

<!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:1; mso-generic-font-family:roman; mso-font-format:other; mso-font-pitch:variable; mso-font-signature:0 0 0 0 0 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:-520092929 1073786111 9 0 415 0;} @font-face {font-family:"Trebuchet MS"; panose-1:2 11 6 3 2 2 2 2 2 4; mso-font-charset:0; mso-generic-font-family:swiss; mso-font-pitch:variable; mso-font-signature:647 0 0 0 159 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin-top:0in; margin-right:0in; margin-bottom:10.0pt; margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Trebuchet MS","sans-serif"; mso-fareast-font-family:Calibri; mso-fareast-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; color:black;} p {mso-style-noshow:yes; mso-style-priority:99; mso-margin-top-alt:auto; margin-right:0in; mso-margin-bottom-alt:auto; margin-left:0in; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; font-size:12.0pt; mso-ansi-font-size:12.0pt; mso-bidi-font-size:12.0pt; mso-ascii-font-family:"Trebuchet MS"; mso-fareast-font-family:Calibri; mso-fareast-theme-font:minor-latin; mso-hansi-font-family:"Trebuchet MS"; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; color:black;} .MsoPapDefault {mso-style-type:export-only; margin-bottom:10.0pt; line-height:115%;} @page WordSection1 {size:8.5in 11.0in; margin:1.0in 1.0in 1.0in 1.0in; mso-header-margin:.5in; mso-footer-margin:.5in; mso-paper-source:0;} div.WordSection1 {page:WordSection1;} -->

/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-qformat:yes;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin-top:0in;
mso-para-margin-right:0in;
mso-para-margin-bottom:10.0pt;
mso-para-margin-left:0in;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:"Times New Roman";
mso-fareast-theme-font:minor-fareast;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;}

How can I automatically filter out code like this?

4 Answers
Best answer, by SO user: Chris_O

The visual editor built into WordPress has a button that strips Microsoft Word formatting. It is labeled "Paste from Word".

SO user: John P Bloch

I'd suggest using Ozh's TinyMCE Advanced plugin. It lets you add a "Paste from Word" option that handles all of this for you.

However, if that doesn't appeal to you, you have other options. Something like this:

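function get_rid_of_mso_junk( $content ){
  // Remove declarations whose property name starts with "mso" or "panose",
  // along with any whitespace or newlines that follow them.
  return preg_replace( '@(mso|panose)[^:]{1,25}:[^;]+;(\s+)?(\n+)?@i', '', $content );
}

// Run the cleanup on post content just before it is saved.
add_filter( 'content_save_pre', 'get_rid_of_mso_junk' );

To remove more lines, just keep adding unwanted declaration prefixes to the first capture group in that regex, e.g. (mso|panose|other-junk|annoyance).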
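
As a rough illustration of extending that capture group, the sketch below builds the alternation from a list of prefixes and runs it on sample declarations outside WordPress. The helper name `strip_declarations`, the extra `margin` prefix, and the sample strings are made up for this example; the rest of the pattern is the one from the answer above.

```php
<?php
// Hypothetical helper (not part of WordPress): builds the pattern from a list
// of property-name prefixes and strips every matching declaration.
function strip_declarations( $content, array $prefixes = array( 'mso', 'panose' ) ) {
    $quoted = array();
    foreach ( $prefixes as $prefix ) {
        // Escape regex metacharacters, including the "@" delimiter.
        $quoted[] = preg_quote( $prefix, '@' );
    }
    $pattern = '@(' . implode( '|', $quoted ) . ')[^:]{1,25}:[^;]+;(\s+)?(\n+)?@i';
    return preg_replace( $pattern, '', $content );
}

// Default prefixes: only the mso-* declarations are removed.
$sample = 'mso-style-unhide:no; margin-top:0in; mso-pagination:widow-orphan; color:black;';
echo strip_declarations( $sample ), "\n";

// Extended prefix list: panose-* and margin-* declarations are removed as well.
$more = 'panose-1:2 4 5 3 5 4 6 3 2 4; margin-top:0in; color:black;';
echo strip_declarations( $more, array( 'mso', 'panose', 'margin' ) ), "\n";
```

Note that the match is case-insensitive and not anchored to the start of a property, so a prefix can also match inside a selector such as `p.MsoNormal`; it is worth testing on real pasted content before relying on it.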

SO user: EAMann

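I've worked with clients who run into this problem regularly. The trick I've found is to paste into the HTML view, then switch back to the visual editor to adjust the formatting where necessary.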

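This is also necessary when copying and pasting from other websites. Sometimes you accidentally bring in class definitions and inline styles from the external source, and these can break the display if your site doesn't define or support the same classes or styles.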
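
If pasting through the HTML view is not an option, one possible way to deal with stray classes and inline styles is to strip those attributes in code. The sketch below is only an assumption about how that could look: `strip_class_and_style` is a hypothetical helper, it is regex-based rather than a real HTML parser, and it only handles quoted attribute values in simple markup.

```php
<?php
// Hypothetical helper: removes class="..." and style="..." attributes from
// opening tags. A rough cleanup for simple markup, not a full HTML parser.
function strip_class_and_style( $html ) {
    return preg_replace_callback(
        '/<[a-z][^>]*>/i', // Each opening tag (closing tags start with "</").
        function ( $tag ) {
            // Within one tag, drop quoted class and style attributes.
            return preg_replace( '/\s+(?:class|style)\s*=\s*(?:"[^"]*"|\'[^\']*\')/i', '', $tag[0] );
        },
        $html
    );
}

$pasted = '<p class="MsoNormal" style="margin-bottom:10.0pt">Hello <span style=\'color:black\'>world</span></p>';
echo strip_class_and_style( $pasted ), "\n";
```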

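Another option is to introduce your users to Windows Live Writer. It's a completely free Microsoft product that handles copy and paste from Word well and can talk to WordPress: you can write posts, edit posts, use the built-in spell checker, format a post to look exactly the way you want, and then click "Publish" to push it to WordPress via XML-RPC. It's a fairly polished system, and it makes teaching first-time bloggers how to blog very easy, especially since the UI is so similar to Word's from the start.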

SO user: codeprokanner

For anyone looking for a solution to this problem, this is what I do:

function delete_between($beginning, $end, $string) {
    $beginningPos = strpos($string, $beginning);
    // strpos() returns 0 for a match at the very start, so compare with false.
    if ($beginningPos === false) {
        return $string;
    }

    // Look for the end marker only after the beginning marker.
    $endPos = strpos($string, $end, $beginningPos + strlen($beginning));
    if ($endPos === false) {
        return $string;
    }

    $textToDelete = substr($string, $beginningPos, ($endPos + strlen($end)) - $beginningPos);

    return str_replace($textToDelete, '', $string);
}

function clean_content( $content ){
    // Only clean content on the blog home page and on single posts.
    if( is_home() || is_single() ){
        $content = delete_between('<!--[if gte mso', ';}', $content);
    }
    return $content;
}

add_filter( 'the_content', 'clean_content' );
add_filter( 'the_excerpt', 'clean_content' );

You can replace the strings passed to the delete_between function with whatever you need. But this seems to work for me.
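
As an example of swapping in different strings, here is a minimal sketch that removes every block running from `<!--[if gte mso` to the closing `<![endif]-->`, not just the first one. The name `delete_all_between`, the choice of `<![endif]-->` as the end marker, and the sample post are assumptions made for illustration; it runs outside WordPress.

```php
<?php
// Hypothetical variant: removes every span from $beginning through $end.
function delete_all_between( $beginning, $end, $string ) {
    $offset = 0;
    while ( false !== ( $start = strpos( $string, $beginning, $offset ) ) ) {
        // Search for the end marker only after the beginning marker.
        $stop = strpos( $string, $end, $start + strlen( $beginning ) );
        if ( false === $stop ) {
            break; // Unterminated block: leave the rest untouched.
        }
        $string = substr_replace( $string, '', $start, $stop + strlen( $end ) - $start );
        $offset = $start; // Continue scanning from where the block was removed.
    }
    return $string;
}

$post = '<!--[if gte mso 9]><xml>junk</xml><![endif]--><p>Kept text</p>'
      . '<!--[if gte mso 10]><style>p {mso-style-noshow:yes;}</style><![endif]-->';
echo delete_all_between( '<!--[if gte mso', '<![endif]-->', $post ), "\n";
```

Comparing against `false` rather than using `!` matters here because the first block in the sample starts at position 0.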
