Reporting and Analytics for Service

Get Involved. Join the Conversation.

Topic

    Neil
    Extract the middle section of thread noteAnswered
    Topic posted January 14, 2019 by Neil Gold Medal: 3,500+ Points, last edited January 14, 2019 
    44 Views, 11 Comments
    Title:
    Extract the middle section of thread note
    Content:

    Hi,

    I need to extract the middle-part of thread notes for incidents relating to a given product. The format of the note is consistent, but the middle part may sometimes be blank.

    The field is thread.note (n.b: sufficient filters are in-place on the report to only return the thread entry I'm interested in)

    e.g #1: The customer rated us "10 VERY RESPECTFUL" from +441234567890. Message received at 15.12.2018 12:32:46

    In this case, I would want the column in my report to show "VERY RESPECTFUL" (ideally without the speech marks)

    e.g #2: The customer rated us "10" from +447597870439. Message received at 15.12.2018 12:11:40

    In this case, my report column would be blank (a space is also acceptable, a speech mark would be too).

    I have managed to trim the leading text from the column using "substr(threads.note,26)", but I'm struggling to get rid of the trailing text (basically anything after the second/closing speech mark, ideally the second speech mark would not be visible in the report, but this isn't catastrophic if it has to be there).

     

    Is there a solution to this?

    Thanks in advance for any help you can offer.

    Version:
    18A

    Best Comment

    Sebastiaan Draaisma

    Ah yes, my mistake.

    Replace this line of code:

    $rows[0][2]->val = preg_replace("/(^\d+\W+)/", "", $rows[0][2]->val); // Delete leading digits etc.

    With this line

    $rows[0][2]->val = preg_replace("/(^\d+\W*)/", "", $rows[0][2]->val); // Delete leading digits etc.

    Instead of \W+ (one or more) you will be using \W* (0 or more)

    Comment

     

    • Sebastiaan Draaisma

      This can easily be achieved by using a custom script :-)
      The custom script can just extract the part for you through a regular expression

      • Neil

        Easy, if you know how! ;)

        Is there any possibility of you being able to provide a script please, Sebastiaan?

        The column in the report is the 3rd column, FYI.

    • Sebastiaan Draaisma

      Absolutely, no problem Neil :-)

      Please find the attached report example based on the following code:

      $rows[0][1]->val = preg_replace("/(^[a-z ]{1,} \"\d* )/i", "", $rows[0][1]->val); // Delete prefix
      $rows[0][1]->val = preg_replace("/(\" .+$)/", "", $rows[0][1]->val); // Delete suffix

      For the third column you need to replace [1] to [2] in all the variables $rows[0][2]->val (this as the column id starts at 0)

      $rows[0][2]->val = preg_replace("/(^[a-z ]{1,} \"\d* )/i", "", $rows[0][2]->val); // Delete prefix
      $rows[0][2]->val = preg_replace("/(\" .+$)/", "", $rows[0][2]->val); // Delete suffix

      The code is devided into 2 parts. The first part starts from the beginning of the string ^ and looks for characters matching [a-z ] and a space where at least one character needs to match {1,} followed by a double quote with an ecape character (otherwise the code won't work) \" folowed by one or more digits \d* all case insensitive /i

      The second part starts at the double quote \" followed by a space followed by any character .+ untill the end of the string $

      • Neil

        Hi Sebastiaan,

        Thanks for this, however it has not worked.

        The column in my report is #3, so I have edited your code to reflect this - maybe I am wrong?

        $rows[0][3]->val = preg_replace("/(^[a-z ]{1,} \"\d* )/i", "", $rows[0][1]->val); // Delete prefix
        $rows[0][3]->val = preg_replace("/(\" .+$)/", "", $rows[0][1]->val); // Delete suffix

        Also, am I correct in thinking that this column needs to have the "threads.note" definition still? 

        I'm probably being stupid here - I have never used custom scripts before, so am not familiar.

        I've attached the xml of my report after trying your code.

        • Sebastiaan Draaisma

          Yeah, you are wrong :-D

          As the column starts with id 0 the third column will have id 2

          0, 1, 2

          As the source of the text is also located in column 3 (id 2) you will want to change the variable in all 4 places

          $rows[0][2]->val = preg_replace("/(^[a-z ]{1,} \"\d* )/i", "", $rows[0][2]->val); // Delete prefix
          $rows[0][2]->val = preg_replace("/(\" .+$)/", "", $rows[0][2]->val); // Delete suffix

          Also have a look at my beginners tutorial for custom script. Easier than you may think :-)

          A 'How to' on using regular expressions in custom scripts for GDPR

          • Neil

            We're making progress! Thanks!

            It's now doing 'something', but not quite as desired. It is removing the profix from some of them, and the suffix from others.

            Screenshot attached.

    • Sebastiaan Draaisma

      Ah I see some of them have a dot . after the score.

      If you can just create a column after column 3 (otherwise your code won't work) with the original text and list me some entries that did not get matched correctly, than I can adjust the regular expression for you. Or via private message if you prefer :-)

    • Sebastiaan Draaisma

      Use this one (based on substring)

      $string = $rows[0][2]->val;
      $start = strpos($rows[0][2]->val, '"')+1;
      $length = -abs(strpos(strrev($rows[0][2]->val), '"'))-1;

      $rows[0][2]->val = substr($string,$start,$length);
      $rows[0][2]->val = preg_replace("/(^\d+\W*)/", "", $rows[0][2]->val); // Delete leading digits etc.

      This is a little better as it will not cause any trouble when the contact has used double quotes in his text.

      VERY "within quotes" RESPECTFUL

      It looks at the first occurence of a double quote in the string " and stores this as the start position, it adds one  +1 to mark the position directly after the double quote. The length is calculated from the back (reversed counting) counting the first double quote counted from the end of the string strrev it than uses -abs to convert this positive number into a negative number, this as the substr function when fed with a negative number in the length starts counting from the back. We than substract one -1 to stop at the character directly before the double quote. We than use a preg_replace with a regular expression to clean the text from any leading ^ digits \d+ and non word characters \W* such as dot and space characters.

      • Neil

        Hi Sebastiaan,

        This is now working almost perfectly - thank you very much!

        The only thing is, some are blank, and some show the number.

        e.g. The customer rated us "10." from +440000000000. Message received at 08.01.2019 08:35:52

        returns nothing (i.e. 'null'/blank) which is great. However:

        The customer rated us "10" from +440000000000. Message received at 09.01.2019 09:12:37

        returns "10". Also: 

        The customer rated us "8" from +440000000000. Message received at 09.01.2019 09:37:18

        returns "8" (without the speech marks of course).

        I've attached a screenshot, where these three rows are highlighted, FYI.

    • Sebastiaan Draaisma

      Ah yes, my mistake.

      Replace this line of code:

      $rows[0][2]->val = preg_replace("/(^\d+\W+)/", "", $rows[0][2]->val); // Delete leading digits etc.

      With this line

      $rows[0][2]->val = preg_replace("/(^\d+\W*)/", "", $rows[0][2]->val); // Delete leading digits etc.

      Instead of \W+ (one or more) you will be using \W* (0 or more)