"Screen scraping" by parsing the HTML from a website is usually a bad idea because:

1. Parsing HTML can be difficult, especially if it's malformed. If you're scraping a very, very simple page, then regular expressions might work. Otherwise, use a parsing framework like the HTML Agility Pack.
2. You'll need to update your code each time the source website changes its markup structure.
3. Screen scraping doesn't play well with JavaScript. If the target website uses any sort of dynamic script to manipulate the page, you're going to have a very hard time scraping it. It's easy to grab the HTTP response; it's a lot harder to scrape what the browser displays in response to client-side script contained in that response.

If screen scraping is the only option, here are some keys to success:

- Make it as easy as possible to change the patterns you look for. If possible, store the patterns as text files or in a resource file somewhere. Make it very easy for other developers (or yourself in 3 months) to understand what markup you expect to find.
- Validate input and throw meaningful exceptions. In your parsing code, take care to make your exceptions very helpful. The target site will change on you, and when that happens you want your error messages to tell you not only what part of the code failed, but why it failed. Mention both the pattern you're looking for AND the text you're comparing against.
- You want it to be very easy to run your scraper in a non-destructive fashion, because you will be doing a lot of iterative development to get the patterns right.
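As a rough illustration of the advice about storing patterns outside the code and raising exceptions that mention both the pattern and the text, here is a minimal Python sketch. Everything named in it is hypothetical: `ScrapeError`, `load_patterns`, `extract`, the JSON layout, and the sample markup are assumptions made for the example, not part of any library. Regular expressions are used only because the sample page is very simple; for real-world HTML a proper parser is the safer choice.

```python
import json
import re


class ScrapeError(Exception):
    """Raised when expected markup is missing; reports the pattern AND the text."""

    def __init__(self, name, pattern, text, max_chars=200):
        # Truncate long pages so the message stays readable in logs.
        snippet = text[:max_chars] + ("..." if len(text) > max_chars else "")
        super().__init__(
            f"Pattern '{name}' did not match.\n"
            f"  pattern: {pattern}\n"
            f"  text:    {snippet!r}"
        )
        self.name = name
        self.pattern = pattern
        self.text = text


def load_patterns(source):
    """Parse a JSON object that maps pattern names to regex strings."""
    raw = json.loads(source)
    if not isinstance(raw, dict) or not raw:
        raise ValueError("pattern source must be a non-empty JSON object")
    return {name: re.compile(expr, re.DOTALL) for name, expr in raw.items()}


def extract(patterns, name, html):
    """Return the first capture group of the named pattern, or raise ScrapeError."""
    if not html or not html.strip():
        raise ValueError(f"empty input while looking for '{name}'")
    regex = patterns[name]
    match = regex.search(html)
    if match is None:
        raise ScrapeError(name, regex.pattern, html)
    return match.group(1)


# Assumption: in a real project this JSON would live in its own text or
# resource file so the patterns can change without touching the code.
PATTERNS_JSON = r'''
{
  "title": "<h1 class=\"title\">(.*?)</h1>",
  "price": "<span id=\"price\">([0-9.]+)</span>"
}
'''

if __name__ == "__main__":
    patterns = load_patterns(PATTERNS_JSON)
    page = '<h1 class="title">Widget</h1><span id="price">9.99</span>'
    print(extract(patterns, "title", page))  # Widget

    try:
        # Simulates the target site changing its markup.
        extract(patterns, "title", "<h2>Widget</h2>")
    except ScrapeError as err:
        print(err)
```

Because `extract` only reads a string, it can be run against saved copies of pages as often as needed, which keeps iterative work on the patterns non-destructive.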