Is there a PHP class out there that can clean up content?
<p>I have been trying to clean up content in PHP using a series of regular expressions and the PHP function preg_replace.</p>
<p>My main aim is to tidy up the content: for example, making sure that each sentence begins with an uppercase letter, that there is a space after a comma, and so on.</p>
<p>Some examples of the tidying I am trying to achieve:</p>
<pre><code>// Remove any spaces around slashes
$content_replacements_from[] = "/\s*\/\s*/";
$content_replacements_to[] = "/";

// Remove any new lines or tabs
$content_replacements_from[] = "/[\r\n\t]/";
$content_replacements_to[] = " ";

// Remove any extra spaces
$content_replacements_from[] = "/\s{2,}/";
$content_replacements_to[] = " ";

// Tidy up joined full stops
$content_replacements_from[] = "/([a-zA-Z]{1})\s*[\.]{1}\s*([^(jpeg|jpg|png|pdf|gif|doc|xls|docx|xlsx|ppt|pptx|html|php|htm)]{1})/";
$content_replacements_to[] = "$1. $2";

// Tidy up joined commas
$content_replacements_from[] = "/([a-zA-Z0-9]{1})\s*[\,]{1}\s*([a-zA-Z0-9]{1})/";
$content_replacements_to[] = "$1, $2";

// Tidy up joined exclamation marks
$content_replacements_from[] = "/([a-zA-Z0-9]{1})\s*[\!]{1}\s*([a-zA-Z0-9]{1})/";
$content_replacements_to[] = "$1! $2";

// Tidy up joined question marks
$content_replacements_from[] = "/([a-zA-Z0-9]{1})\s*[\?]{1}\s*([a-zA-Z0-9]{1})/";
$content_replacements_to[] = "$1? $2";

// Tidy up joined semi colons
$content_replacements_from[] = "/([a-zA-Z0-9]{1})\s*[\;]{1}\s*([a-zA-Z0-9]{1})/";
$content_replacements_to[] = "$1; $2";

// Tidy up joined colons
$content_replacements_from[] = "/([a-zA-Z0-9]{1})\s*[\:]{1}\s*([a-zA-Z0-9]{1})/";
$content_replacements_to[] = "$1: $2";

// Tidy up fluid ounces
$content_replacements_from[] = "/[Ff]{1}[Ll]{1}.?\s?[Oo]{1}[Zz]{1}/";
$content_replacements_to[] = "fl oz";

// Tidy up rpm
$content_replacements_from[] = "/[Rr]{1}[Pp]{1}[Mm]{1}/";
$content_replacements_to[] = "rpm";

// Tidy up UK
$content_replacements_from[] = "/[Uu]{1}[Kk]{1}/";
$content_replacements_to[] = "UK";

// Tidy up Maxi-sense
$content_replacements_from[] = "/[Mm]{1}axi[\s\-]?[Ss]{1}ense/";
$content_replacements_to[] = "maxi-sense";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Mm]{1}axi[\s\-]?[Ss]{1}ense/";
$content_replacements_to[] = ". Maxi-sense";
$content_replacements_from[] = "/^[Mm]{1}axi[\s\-]?[Ss]{1}ense/";
$content_replacements_to[] = "Maxi-sense";

// Tidy up Side-by-side
$content_replacements_from[] = "/[Ss]{1}ide[\s\-]?[Bb]{1}y[\s\-]?[Ss]{1}ide/";
$content_replacements_to[] = "side-by-side";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Ss]{1}ide[\s\-]?[Bb]{1}y[\s\-]?[Ss]{1}ide/";
$content_replacements_to[] = ". Side-by-side";
$content_replacements_from[] = "/^[Ss]{1}ide[\s\-]?[Bb]{1}y[\s\-]?[Ss]{1}ide/";
$content_replacements_to[] = "Side-by-side";

// Tidy up extra large
$content_replacements_from[] = "/[Xx]{1}[Ll]{1}/";
$content_replacements_to[] = "extra large";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Xx]{1}[Ll]{1}/";
$content_replacements_to[] = "Extra large";
$content_replacements_from[] = "/^[Xx]{1}[Ll]{1}/";
$content_replacements_to[] = "Extra large";

// Tidy up D-radius
$content_replacements_from[] = "/[Dd]{1}[\s\-]?[Rr]{1}adius/";
$content_replacements_to[] = "D-radius";

// Tidy up A-rate
$content_replacements_from[] = "/[Aa]{1}[\s\-]?[Rr]{1}ate/";
$content_replacements_to[] = "A-rate";

// Tidy up In-column
$content_replacements_from[] = "/[Ii]{1}n[\s\-]?[Cc]{1}olum[n]?/";
$content_replacements_to[] = "in-column";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Ii]{1}n[\s\-]?[Cc]{1}olum[n]?/";
$content_replacements_to[] = "In-column";
$content_replacements_from[] = "/^[Ii]{1}n[\s\-]?[Cc]{1}olum[n]?/";
$content_replacements_to[] = "In-column";

// Tidy up kW
$content_replacements_from[] = "/[Kk]{1}[Ww]{1}/";
$content_replacements_to[] = "kW";

// Tidy up Built-in
$content_replacements_from[] = "/[Bb]{1}uilt[\s\-]?[Ii]{1}n/";
$content_replacements_to[] = "built-in";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Bb]{1}uilt[\s\-]?[Ii]{1}n/";
$content_replacements_to[] = "Built-in";
$content_replacements_from[] = "/^[Bb]{1}uilt[\s\-]?[Ii]{1}n/";
$content_replacements_to[] = "Built-in";

// Tidy up Built-under
$content_replacements_from[] = "/[Bb]{1}uilt[\s\-]?[Uu]{1}nder/";
$content_replacements_to[] = "built-under";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Bb]{1}uilt[\s\-]?[Uu]{1}nder/";
$content_replacements_to[] = "Built-under";
$content_replacements_from[] = "/^[Bb]{1}uilt[\s\-]?[Uu]{1}nder/";
$content_replacements_to[] = "Built-under";

// Tidy up Under-counter
$content_replacements_from[] = "/[Uu]{1}nder[\s\-]?[Cc]{1}ounter/";
$content_replacements_to[] = "under-counter";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Uu]{1}nder[\s\-]?[Cc]{1}ounter/";
$content_replacements_to[] = "Under-counter";
$content_replacements_from[] = "/^[Uu]{1}nder[\s\-]?[Cc]{1}ounter/";
$content_replacements_to[] = "Under-counter";

// Tidy up Under-cabinet
$content_replacements_from[] = "/[Uu]{1}nder[\s\-]?[Cc]{1}abinet/";
$content_replacements_to[] = "under-cabinet";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Uu]{1}nder[\s\-]?[Cc]{1}abinet/";
$content_replacements_to[] = "Under-cabinet";
$content_replacements_from[] = "/^[Uu]{1}nder[\s\-]?[Cc]{1}abinet/";
$content_replacements_to[] = "Under-cabinet";

// Tidy up integrated
$content_replacements_from[] = "/([a-zA-Z0-9]{1})[\s]{1}[\-]{1}[Ii]{1}ntegrated/";
$content_replacements_to[] = "$1-integrated";

// Tidy up Semi-integrated
$content_replacements_from[] = "/[Ss]{1}emi[\s\-]?[Ii]{1}ntegrated/";
$content_replacements_to[] = "semi-integrated";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Ss]{1}emi[\s\-]?[Ii]{1}ntegrated/";
$content_replacements_to[] = "Semi-integrated";
$content_replacements_from[] = "/^[Ss]{1}emi[\s\-]?[Ii]{1}ntegrated/";
$content_replacements_to[] = "Semi-integrated";

// Tidy up Fully-integrated
$content_replacements_from[] = "/[Ff]{1}ully[\s\-]?[Ii]{1}ntegrated/";
$content_replacements_to[] = "fully-integrated";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Ff]{1}ully[\s\-]?[Ii]{1}ntegrated/";
$content_replacements_to[] = "Fully-integrated";
$content_replacements_from[] = "/^[Ff]{1}ully[\s\-]?[Ii]{1}ntegrated/";
$content_replacements_to[] = "Fully-integrated";

// Tidy up Semi-automatic
$content_replacements_from[] = "/[Ss]{1}emi[\s\-]?[Aa]{1}utomatic/";
$content_replacements_to[] = "semi-automatic";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Ss]{1}emi[\s\-]?[Aa]{1}utomatic/";
$content_replacements_to[] = "Semi-automatic";
$content_replacements_from[] = "/^[Ss]{1}emi[\s\-]?[Aa]{1}utomatic/";
$content_replacements_to[] = "Semi-automatic";

// Tidy up Fully-automatic
$content_replacements_from[] = "/[Ff]{1}ully[\s\-]?[Aa]{1}utomatic/";
$content_replacements_to[] = "fully-automatic";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Ff]{1}ully[\s\-]?[Aa]{1}utomatic/";
$content_replacements_to[] = "Fully-automatic";
$content_replacements_from[] = "/^[Ff]{1}ully[\s\-]?[Aa]{1}utomatic/";
$content_replacements_to[] = "Fully-automatic";

// Tidy up Pull-out
$content_replacements_from[] = "/[Pp]{1}ull[\s\-]?[Oo]{1}ut/";
$content_replacements_to[] = "pull-out";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Pp]{1}ull[\s\-]?[Oo]{1}ut/";
$content_replacements_to[] = "Pull-out";
$content_replacements_from[] = "/^[Pp]{1}ull[\s\-]?[Oo]{1}ut/";
$content_replacements_to[] = "Pull-out";

// Tidy up including
$content_replacements_from[] = "/\s[Ii]{1}nc[l]?[\.]?\s/";
$content_replacements_to[] = " including ";

// Tidy up use
$content_replacements_from[] = "/\s[Uu]{1}se\s/";
$content_replacements_to[] = " use ";

// Tidy up ?-piece
$content_replacements_from[] = "/([2345TtYy]{1})[\s\-]?[Pp]{1}iece/";
$content_replacements_to[] = "$1-piece";

// Tidy up ?-spout
$content_replacements_from[] = "/([Cc]{1})[\s\-]?[Ss]{1}pout/";
$content_replacements_to[] = "$1-spout";

// Tidy up ?-end
$content_replacements_from[] = "/([Cc]{1})[\s\-]?[Ee]{1}nd/";
$content_replacements_to[] = "$1-end";

// Tidy up Brushed Steel
$content_replacements_from[] = "/[Bb]{1}[\-\/]{1}[Ss]{1}teel/";
$content_replacements_to[] = "brushed steel";

// Tidy up Stainless Steel
$content_replacements_from[] = "/[Ss]{1}[\-\/]{1}[Ss]{1}teel/";
$content_replacements_to[] = "stainless steel";

// Tidy up Silk Steel
$content_replacements_from[] = "/[Ss]{1}ilk[\s]?[Ss]{1}teel/";
$content_replacements_to[] = "silk steel";

// Remove trade marks
$content_replacements_from[] = "/™/";
$content_replacements_to[] = "";

// Replace long dashes
$content_replacements_from[] = "/–/";
$content_replacements_to[] = "-";

// Replace single quotes
$content_replacements_from[] = "/’/";
$content_replacements_to[] = "'";
$content_replacements_from[] = "/`/";
$content_replacements_to[] = "'";

// Tidy up m
$content_replacements_from[] = "/[\s]?[Mm]{1}etre/";
$content_replacements_to[] = "m";

// Tidy up m3
$content_replacements_from[] = "/([0-9]{1})[\s]?[Mm]{1}3/";
$content_replacements_to[] = "$1m&amp;sup3;";
$content_replacements_from[] = "/\&amp;sup3\;/";
$content_replacements_to[] = html_entity_decode("&amp;sup3;");

// Tidy up to in between numbers
$content_replacements_from[] = "/([0-9]{1})[\s]?to[\s]?([0-9]{1})/";
$content_replacements_to[] = "$1 - $2";

// Tidy up per hour
$content_replacements_from[] = "/\s[Aa]{1}nd\s[Hh]{1}[Rr]?$/";
$content_replacements_to[] = "ph";

// Tidy up l
$content_replacements_from[] = "/[\s]?[Ll]{1}itre/";
$content_replacements_to[] = "l";

// Tidy up -in
$content_replacements_from[] = "/\-[Ii]{1}n/";
$content_replacements_to[] = "-in";

// Tidy up plus
$content_replacements_from[] = "/\s[Pp]{1}lus\s/";
$content_replacements_to[] = " plus ";

// Tidy up including
$content_replacements_from[] = "/\s[Ii]{1}ncluding\s/";
$content_replacements_to[] = " including ";

// Tidy up including
$content_replacements_from[] = "/[Ii]{1}nc\s/";
$content_replacements_to[] = "Including ";

// Tidy up Push/pull
$content_replacements_from[] = "/[Pp]{1}ush\/[Pp]{1}ull/";
$content_replacements_to[] = "push/pull";
$content_replacements_from[] = "/[\.|\!|\?]{1}\s{1}[Pp]{1}ush\/[Pp]{1}ull/";
$content_replacements_to[] = "Push/pull";
$content_replacements_from[] = "/^[Pp]{1}ush\/[Pp]{1}ull/";
$content_replacements_to[] = "Push/pull";

// Tidy up +
$content_replacements_from[] = "/\s\+\s/";
$content_replacements_to[] = " and ";

// Tidy up *
$content_replacements_from[] = "/\*/";
$content_replacements_to[] = "";

// Tidy up with
$content_replacements_from[] = "/\s[Ww]{1}ith\s/";
$content_replacements_to[] = " with ";

// Tidy up without
$content_replacements_from[] = "/\s[Ww]{1}ithout\s/";
$content_replacements_to[] = " without ";

// Tidy up in
$content_replacements_from[] = "/\s[Ii]{1}n\s/";
$content_replacements_to[] = " in ";

// Tidy up of
$content_replacements_from[] = "/\s[Oo]{1}f\s/";
$content_replacements_to[] = " of ";

// Tidy up for
$content_replacements_from[] = "/\s[Ff]{1}or\s/";
$content_replacements_to[] = " for ";

// Tidy up or
$content_replacements_from[] = "/\s[Oo]{1}r\s/";
$content_replacements_to[] = " or ";

// Tidy up and
$content_replacements_from[] = "/\s[Aa]{1}nd\s/";
$content_replacements_to[] = " and ";

// Tidy up to
$content_replacements_from[] = "/\s[Tt]{1}o\s/";
$content_replacements_to[] = " to ";

// Tidy up too
$content_replacements_from[] = "/\s[Tt]{1}oo\s/";
$content_replacements_to[] = " too ";

// Tidy up &amp;amp;
$content_replacements_from[] = "/\s&amp;amp;\s/";
$content_replacements_to[] = " and ";

// Tidy up &amp;
$content_replacements_from[] = "/\s&amp;\s/";
$content_replacements_to[] = " and ";

// Tidy up mm
$content_replacements_from[] = "/M[Mm]{1}/";
$content_replacements_to[] = "mm";

// Tidy up ize to ise
$content_replacements_from[] = "/([a-zA-Z]{2})ize{1}/";
$content_replacements_to[] = "$1ise";

// Tidy up izer to iser
$content_replacements_from[] = "/([a-zA-Z]{2})izer{1}/";
$content_replacements_to[] = "$1iser";

// Tidy up yze to yse
$content_replacements_from[] = "/([a-zA-Z]{2})yze{1}/";
$content_replacements_to[] = "$1yse";

// Tidy up ization to isation
$content_replacements_from[] = "/([a-zA-Z]{2})ization{1}/";
$content_replacements_to[] = "$1isation";

// Tidy up times symbol
$content_replacements_from[] = "/([0-9]{1})\s*[Xx]\s*([0-9A-Za-z]{1})/";
$content_replacements_to[] = "$1 &amp;times; $2";

// Tidy up times symbol
$content_replacements_from[] = "/\&amp;times\;/";
$content_replacements_to[] = html_entity_decode("&amp;times;");

// Tidy up inches
$content_replacements_from[] = "/([0-9]{1})\s*[Ii]{1}nches/";
$content_replacements_to[] = "$1\"";

// Tidy up inch
$content_replacements_from[] = "/([0-9]{1})\s*[Ii]{1}nch/";
$content_replacements_to[] = "$1\"";

// Make the replacements
$content = preg_replace($content_replacements_from, $content_replacements_to, $content);
</code></pre>
<p>This is obviously complicated and lengthy.</p>
<p>Does anyone know a better way of doing it, or know of an existing class that can do this?</p>
<p>I would then also like to apply this to content within HTML, if possible.</p>
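One possible way to shorten a rule list like the one in the question is sketched below, under stated assumptions rather than as a known library. The function name `tidy_content` and the small `$terms` table are hypothetical and cover only a handful of the terms from the question. The sketch relies on the `i` (case-insensitive) modifier instead of `[Uu]{1}[Kk]{1}`-style classes, on `\b` word boundaries so that "uk" inside "duke" is left alone, and on a single `preg_replace_callback` pass for sentence capitalisation instead of three patterns per term. Full stops are deliberately not touched, because file names and decimals make that rule ambiguous.

```php
<?php
// Hypothetical helper, not part of any existing library.
function tidy_content(string $content): string
{
    // 1. Collapse runs of spaces, tabs and new lines into one space.
    $content = trim(preg_replace('/\s+/', ' ', $content));

    // 2. Fixed terms: pattern => canonical spelling (illustrative subset only).
    $terms = [
        '/\bfl\.?\s?oz\b/i'     => 'fl oz',
        '/\brpm\b/i'            => 'rpm',
        '/\buk\b/i'             => 'UK',
        '/\bkw\b/i'             => 'kW',
        '/\bbuilt[\s\-]?in\b/i' => 'built-in',
        '/\bpull[\s\-]?out\b/i' => 'pull-out',
    ];
    $content = preg_replace(array_keys($terms), array_values($terms), $content);

    // 3. Exactly one space after , ; ! ? when a letter follows
    //    (letters only, so "1,000" is left untouched).
    $content = preg_replace('/\s*([,;!?])\s*(?=[A-Za-z])/', '$1 ', $content);

    // 4. Uppercase the first letter of the text and of each sentence.
    $content = preg_replace_callback(
        '/(^|[.!?]\s+)([a-z])/',
        function (array $m): string {
            return $m[1] . strtoupper($m[2]);
        },
        $content
    );

    return $content;
}

echo tidy_content("this oven is Built In ,with a 2 KW grill!  made in the uk");
// prints: This oven is built-in, with a 2 kW grill! Made in the UK
```

Like the original patterns, the `built-in` rule would also rewrite a phrase such as "built in 1990", so each entry still needs checking against real content. For content inside HTML, one option (again only a suggestion) would be to load the markup with PHP's `DOMDocument` and apply the function to text nodes only, so that tags and attributes are never touched by the patterns.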