How to capture data from different pages of one source?



  • This post is deleted!

  • Admin

    1. Parse URLs to a list variable
    2. Read this https://forum.openbullet.dev/topic/7/how-to-loop-on-a-list-variable
    3. In the loop place a request block, a parse block, and a utility block that adds the newly parsed list to a master list you created outside the loop (I think it's called List > Zip)
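The three steps above can be sketched in Python for clarity (OB itself uses blocks, not scripts; `fetch()` and the URLs here are hypothetical stand-ins for the REQUEST block and the parsed list):

```python
import re

def fetch(url):
    # Stand-in for the REQUEST block; a real config performs an HTTP GET here.
    return '<a href="Thread-%s">item</a>' % url.rsplit("/", 1)[-1]

# Step 1: the URLs parsed into a list variable
page_urls = ["https://example.com/page1", "https://example.com/page2"]

master = []                       # master list created OUTSIDE the loop
for url in page_urls:             # step 2: loop over the list
    html = fetch(url)             # REQUEST block
    parsed = re.findall(r'href="([^"]+)"', html)  # PARSE block
    master.extend(parsed)         # step 3: UTILITY block merging into the master list

print(master)
```

The key point is that `master` outlives the loop, so each iteration's parse results accumulate instead of overwriting each other.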


  • Will try and update you, thank you.



  • Ok, I made the config up to the point where it parses all the URLs; now I need to make OB visit them one by one, so I created a request. But what should I put into it?

    <--- Executing Block UTILITY --->
    Executed action Length on file LIST
    SET command executed on field VAR
    "https://test.com/testttttttt/"
    <--- Executing Block FUNCTION --->
    Executed function Compute on input 0+1 with outcome 1
    Parsed variable | Name: INDEX | Value: 1
    
    Jumping to line 19
    "https://test.com/testttttttt1/"
    <--- Executing Block FUNCTION --->
    Executed function Compute on input 1+1 with outcome 2
    Parsed variable | Name: INDEX | Value: 2
    
    Jumping to line 19
    "https://test.com/testttttttt/2"
    <--- Executing Block FUNCTION --->
    Executed function Compute on input 2+1 with outcome 3
    Parsed variable | Name: INDEX | Value: 3
    
    Jumping to line 19
    "https://test.com/testttttttt/3"
    <--- Executing Block FUNCTION --->
    Executed function Compute on input 3+1 with outcome 4
    Parsed variable | Name: INDEX | Value: 4
    
    Jumping to line 19
    Jumping to line 26
    WARNING: The test input data did not respect the validity regex for the selected wordlist type!
    ===== DEBUGGER ENDED AFTER 2,736 SECOND(S) WITH STATUS: NONE =====
    
    

    I have modified the parsed URLs.


  • Admin

    In the request, put <LIST[<INDEX>]> as the address.



  • Yeah, I tried it already and got:

    <--- Executing Block REQUEST --->
    Calling URL: <LIST[4]>
    Sent Headers:
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36
    Pragma: no-cache
    Accept: */*
    Content-Type: application/x-www-form-urlencoded
    Sent Cookies:
    
    Invalid URI: The hostname could not be parsed.
    ERROR: Invalid URI: The hostname could not be parsed.
    ===== DEBUGGER ENDED AFTER 5,076 SECOND(S) WITH STATUS: ERROR =====
    
    

  • Admin

    Maybe because your URL list is not called LIST but some other name? xD


  • Admin

    Also make sure you're on OB 1.2.2, because older versions didn't support this syntax.



  • I'm using 1.2.2 for that.

    Cattura.PNG




  • Works fine for me (ignore the first two lines; they only build an example list):

    c80bf7ee-936b-488a-8a1e-93b1afcbfb78-image.png



  • How stupid of me; I just needed to move the request before the function block. It worked.



  • Thanks a lot for helping me out, problem is solved.

    Kind regards.



  • Turning back to the parsing.

    If my request url is:

    https://testing.com/

    and the source Urls are in this format:

    <span class=" subject_new" id="tid_901"><a href="Thread-test1">347 - 2020</a></span></span>
    <span class=" subject_new" id="tid_902"><a href="Thread-test2">348 - 2020</a></span></span>
    <span class=" subject_new" id="tid_903"><a href="Thread-test3">349 - 2020</a></span></span>
    <span class=" subject_new" id="tid_904"><a href="Thread-test4">350 - 2020</a></span></span>

    How can I parse the URLs, add them to the list, and capture data from them?

    Using [class*=subject_new] as the CSS selector captures the whole element, not just the URL.



  • You can simply use a regex afterwards to capture the desired elements.
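For the spans shown above, a pattern like the following (illustrated in Python; the same expression should work in a PARSE block set to Regex mode) pulls out just the href values:

```python
import re

html = '''<span class=" subject_new" id="tid_901"><a href="Thread-test1">347 - 2020</a></span>
<span class=" subject_new" id="tid_902"><a href="Thread-test2">348 - 2020</a></span>'''

# Capture only what sits inside href="..."
links = re.findall(r'href="([^"]+)"', html)
print(links)
```

Note the captured hrefs are relative paths, so the base URL has to be prepended before requesting them.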



  • Ok, that is done. Could someone please help me parse only the username:password pairs from this?

    </script>
    <meta name="description" content="[email protected]:xxxxxx83" />
    <link rel="canonical" href="https://xxxxx.com/xxxxx-xxxx-xxxxx--xxxxx" />
    </head>
    <body>
    
    
    <meta name="description" content="[email protected]:xxxxxx83 [email protected]:xxxxxx831 [email protected]:xxxxxx835 [email protected]:xxxxxx836" />
    <link rel="canonical" href="https://xxxx/xxxx-xxxx-xxxxx--xxxxxx" />
    </head>
    <body>
    
    
    </script>
    <meta name="description" content="[email protected]:xxxxxx83 | SUP = testttttttttttting |" />
    <link rel="canonical" href="https://xxxxxxxxxxx.com/xxxxxxx-xxxxxxs--xxxxxxxxx" />
    </head>
    <body>
    

    By using

    (?<="description" content=")(.*)(?=")
    

    I can capture the username:password pairs, but from the last line it captures everything. So, how can I avoid that?



  • @kbilly what about 'span[a="href"]' ?



  • @L4roy please re-read my last post. I want to parse this; the second-to-last post has already been solved.



  • This worked, as it will only parse user:pass in all lines:

    meta name="description" content="(.[^\s]*)
    

    I just noticed that it doesn't capture everything in the second line. But you can still take the results you got so far, split them, and then use "RemoveValues" to remove the lines that don't contain "@" or ":".



  • @BULMADB I think the best would be this one then:

    (?<=description" content=")(.[^\s]*)
    

    This code parses only the username:password pairs without any other words, but as you said, it only parses the first one and not every username:password in the second line.

    So, still looking for the complete solution.

    For now I used this and will filter my lists manually:

    (?<=description" content=")(.*)(?=")
    


  • Coming back to the config parsing, what if I have a site named:

    https://test.com/

    which has 20 pages. Every page contains further links that I need to parse and capture data from. For the first page, I am able to parse and capture using the configuration I created above. But now I would like to make it more complex by also parsing and capturing data from the other pages. For example:

    https://test.com/page1
    https://test.com/page2
    https://test.com/page3

    TILL

    https://test.com/page20

    How can I make OB continue after the first page has been parsed, so that it visits https://test.com/page2 through https://test.com/page20 and does the same job?
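One way to approach this (a sketch, not necessarily the only OB-idiomatic one): since the page URLs follow a fixed pattern, the 20 of them can be generated up front and fed into the same INDEX-style loop used earlier, so each iteration requests one page and runs the existing parse/capture steps. In Python terms:

```python
# Hypothetical: build https://test.com/page1 ... page20 before the loop,
# then iterate them exactly like the parsed-URL list earlier in the thread.
base = "https://test.com/page"
page_urls = [f"{base}{i}" for i in range(1, 21)]
print(len(page_urls), page_urls[0], page_urls[-1])
```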

