Parsing data from complex HTML



  • Hi, I am in a very weird situation. I need to parse the active plan from the HTML below.

    <div class="subscription-plan">
                            <div class="subscription-plan-content">
                                <h2>InActive Plan</h2>
                                <p>Normal sound quality. $9.99/month.</p>
                                
                                    <p>Lossless sound quality. $19.99/month.</p>
                                    
                                                                
                                    
                                                                    <a href="https://xyz.com/us/account/subscription/upgrade/PREMIUM/88cc57a3-4c57-4e9e-8d1f-54c4f84c5bf5" class="clickspinner"><span class="btn">UPGRADE</span></a>
                                    
                                                              
                            </div>
                        </div>
    <div class="subscription-plan">
                            <div class="subscription-plan-content">
                                <h2>Active Plan</h2>
                                <p>Normal sound quality. $9.99/month.</p>
                                
                                    <img class="subscription-activated" src="/assets/images/icons/success-icon-big.png">
                                
                            </div>
                        </div>
    <div class="subscription-plan">
                            <div class="subscription-plan-content">
                                <h2>Inactive Plan</h2>
                                <p>Normal sound quality. $9.99/month.</p>
                                
                                    <p>Lossless sound quality. $19.99/month.</p>
                                    
                                                                
                                    
                                                                    <a href="https://xyz.com/us/account/subscription/upgrade/PREMIUM/88cc57a3-4c57-4e9e-8d1f-54c4f84c5bf5" class="clickspinner"><span class="btn">UPGRADE</span></a>
                                    
                                                              
                            </div>
                        </div>
    
    

    The only difference is that the active subscription block has an image tag with the class name subscription-activated. Is there any way to check which plan is active?


  • Admin

    You should use a regex in this case



  • @Ruri A regex across 3 line breaks? Let's say I select these 3 lines:

    <h2>Active Plan</h2>
                                <p>Normal sound quality. $9.99/month.</p>
                                
    <img class="subscription-activated" src="/assets/images/icons/success-icon-big.png">
    

    Then there are 4 line breaks, and the text in the <p> tag also changes if the currency is different, so I can't hardcode that either.


  • Admin

    But why don't you just parse between the text "Active Plan" and the </p> tag via LR? Then you can trim and remove the other unnecessary <p> using replace.
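The left/right idea above can be sketched outside of OpenBullet. This is a minimal Python illustration, not LoliScript: `parse_lr` is a hypothetical helper, and the sample HTML is trimmed from the question. It uses `<h2>Active Plan</h2>` as the left delimiter, because the bare text "Active Plan" also occurs inside "InActive Plan".

```python
def parse_lr(source, left, right):
    """Return the text between the first `left` and the next `right`, or None."""
    start = source.find(left)
    if start == -1:
        return None
    start += len(left)
    end = source.find(right, start)
    if end == -1:
        return None
    return source[start:end]


# Trimmed sample based on the HTML in the question.
html = """
<h2>InActive Plan</h2>
    <p>Lossless sound quality. $19.99/month.</p>
<h2>Active Plan</h2>
    <p>Normal sound quality. $9.99/month.</p>
    <img class="subscription-activated" src="/assets/images/icons/success-icon-big.png">
"""

raw = parse_lr(html, "<h2>Active Plan</h2>", "</p>")
# Drop the leftover opening tag and the surrounding whitespace.
plan = raw.replace("<p>", "").strip()
print(plan)  # Normal sound quality. $9.99/month.
```

Because nothing between the delimiters is hardcoded, a different price or currency in the <p> text does not matter.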



  • In this case I'd use LoliScript:

    IF "<SOURCE>" Contains "<img class=\"subscription-activated\" src=\"/assets/images/icons/success-icon-big.png\">"
    SET CAP "Subbed" "TRUE"
    ELSE
    SET CAP "Subbed" "FALSE"
    ENDIF

    Not sure what it would return with other subs or no subs at all, but a small change would fix that.
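One possible form of that small change, sketched in Python rather than LoliScript: look for the marker image per plan block instead of in the whole page, so the result says which plan is active and is empty when none is. The function name `active_plan_description` and the shortened sample markup are assumptions made for illustration.

```python
import re

MARKER = 'class="subscription-activated"'


def active_plan_description(source):
    """Return the first <p> text of the plan block holding the marker image, or None."""
    # One chunk per plan block; the text before the first block is discarded.
    blocks = source.split('<div class="subscription-plan">')[1:]
    for block in blocks:
        if MARKER in block:
            match = re.search(r"<p>(.*?)</p>", block)
            return match.group(1).strip() if match else None
    return None


# Shortened sample based on the HTML in the question.
sample = """
<div class="subscription-plan">
  <div class="subscription-plan-content">
    <h2>InActive Plan</h2>
    <p>Lossless sound quality. $19.99/month.</p>
  </div>
</div>
<div class="subscription-plan">
  <div class="subscription-plan-content">
    <h2>Active Plan</h2>
    <p>Normal sound quality. $9.99/month.</p>
    <img class="subscription-activated" src="/assets/images/icons/success-icon-big.png">
  </div>
</div>
"""

print(active_plan_description(sample))
```

A page without the marker image yields `None`, which covers the "no subs at all" case.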



  • Solution

    PARSE "<SOURCE>" LR "<div class=\"subscription-plan-content\">" "</h2>" Recursive=TRUE -> VAR "PLAN"

    FUNCTION Constant "<PLAN[1]>"
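Note that `<PLAN[1]>` assumes the active plan is always the second block on the page. As a rough Python sketch of the same recursive left/right parse (`parse_lr_all` is a hypothetical helper, not an OpenBullet function), the index can be looked up instead of hardcoded:

```python
def parse_lr_all(source, left, right):
    """Return every substring found between `left` and the following `right`."""
    results = []
    pos = 0
    while True:
        start = source.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = source.find(right, start)
        if end == -1:
            break
        results.append(source[start:end])
        pos = end + len(right)
    return results


# Shortened sample: three plan blocks, as in the question.
page = """
<div class="subscription-plan-content">
    <h2>InActive Plan</h2>
</div>
<div class="subscription-plan-content">
    <h2>Active Plan</h2>
</div>
<div class="subscription-plan-content">
    <h2>Inactive Plan</h2>
</div>
"""

pieces = parse_lr_all(page, '<div class="subscription-plan-content">', "</h2>")
# Each piece still carries the opening <h2> and some whitespace.
headings = [piece.replace("<h2>", "").strip() for piece in pieces]
active_index = headings.index("Active Plan") if "Active Plan" in headings else -1
print(headings, active_index)
```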



  • There are many ways to do that:

    1. LR + FUNCTION REPLACE
    PARSE "<SOURCE>" LR "<h2>Active Plan</h2>" "</p>" -> VAR "PLAN" 
    
    FUNCTION Replace "<p>" "" "<PLAN>" -> CAP "Plan" 
    
    2. LR + FUNCTION TRANSLATE
    PARSE "<SOURCE>" LR "<h2>Active Plan</h2>" "</p>" -> VAR "PLAN" 
    
    FUNCTION Translate StopAfterFirstMatch=TRUE
      KEY "<p>" VALUE "" 
      "<PLAN>" -> CAP "Plan" 
    
    3. REGEX
    PARSE "<SOURCE>" REGEX "<h2>Active Plan</h2>\\s+<p>(.*)</p>" "[1]" CreateEmpty=FALSE -> CAP "Plan" 
    

    The response will not always contain an "Active Plan" heading, so you should guard the parse with an IF.
    Example:

    IF "<SOURCE>" Contains "<h2>Active Plan</h2>"
    
    PARSE "<SOURCE>" REGEX "<h2>Active Plan</h2>\\s+<p>(.*)</p>" "[1]" CreateEmpty=FALSE -> CAP "Plan" 
    
    ENDIF
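For comparison, here is the same pattern in Python (an illustration only; the function name `capture_plan` is made up). A failed match returns `None`, which plays the role of the IF guard, and anchoring on `<h2>` keeps "InActive Plan" from matching.

```python
import re

# Same pattern as in the REGEX option; Python needs only a single backslash in a raw string.
PATTERN = re.compile(r"<h2>Active Plan</h2>\s+<p>(.*)</p>")


def capture_plan(source):
    """Return the <p> text that follows the Active Plan heading, or None."""
    match = PATTERN.search(source)
    return match.group(1) if match else None


with_active = """
<h2>InActive Plan</h2>
    <p>Lossless sound quality. $19.99/month.</p>
<h2>Active Plan</h2>
    <p>Normal sound quality. $9.99/month.</p>
"""

without_active = """
<h2>InActive Plan</h2>
    <p>Lossless sound quality. $19.99/month.</p>
"""

print(capture_plan(with_active))     # Normal sound quality. $9.99/month.
print(capture_plan(without_active))  # None
```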
    


  • TY for the replies, solved.

