The Prolegis crawler uses an article extraction tool to pull metadata from source content. The documentation below describes how to format your HTML so that we can extract high-quality metadata. We’ve attempted to support open standards for HTML metadata wherever possible, so following these formats may also improve your metadata in other crawler-based sources.

The attributes below are optional. If the HTML does not contain the documented elements, our crawler will attempt to pick the best value. If you're unsure why the crawler is not behaving as expected, or if you have general questions about the crawler, please contact us via the chat tool in the bottom right corner of this screen.

Crawler Page Schema

| Field | Priority | Placement | Format | Example |
|---|---|---|---|---|
| Title | 1 | head | `<meta name="DC.title" content="{value}">` | `<meta name="DC.title" content="Chemical Spills">` |
| Title | 2 | inline | `<h1 class="article">{value}</h1>` | `<h1 class="article">Chemical Spills</h1>` |
| Authors | 1 | head | `<meta name="DC.author" content="{value}">` | `<meta name="DC.author" content="Author 1">` `<meta name="DC.author" content="Author 2">` |
| Authors | 2 | inline | `<span class="by_author">{value}</span>` | `<span class="by_author">Mark Twain</span>` |
| Authors | 3 | inline | `<span class="author">{value}</span>` | `<span class="author">Mark Twain</span>` |
| Published Date | 1 | head | `<meta name="DC.date" content="{ISO 8601 datetime}">` | `<meta name="DC.date" content="2020-01-27">` |
| Published Date | 2 | inline | `<span class="article_datetime">{ISO 8601 datetime}</span>` | `<span class="article_datetime">2020-01-27</span>` |
| Image URL | 1 | head | `<meta property="og:image" content="{image URL}" />` | `<meta property="og:image" content="http://example.com/rock.jpg" />` |
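
To check a page against the schema before it is crawled, the priority order above can be approximated with a short script. The following is a minimal sketch using only Python's standard library. It is not the Prolegis crawler's implementation, and the names `RULES`, `SchemaParser`, `extract`, and `SAMPLE` are hypothetical. It assumes that the lowest priority number with a non-empty value wins, and it does not reproduce the crawler's best-value guessing for pages that contain none of the documented elements.

```python
from html.parser import HTMLParser

# Lookup rules copied from the schema table, in priority order.
# ("meta", attribute, key) reads the content of a <meta> tag;
# ("text", tag, css_class) reads the text inside an inline element.
RULES = {
    "title": [("meta", "name", "DC.title"), ("text", "h1", "article")],
    "authors": [
        ("meta", "name", "DC.author"),
        ("text", "span", "by_author"),
        ("text", "span", "author"),
    ],
    "published_date": [
        ("meta", "name", "DC.date"),
        ("text", "span", "article_datetime"),
    ],
    "image_url": [("meta", "property", "og:image")],
}
TEXT_RULES = {r for rules in RULES.values() for r in rules if r[0] == "text"}


class SchemaParser(HTMLParser):
    """Collects every value that matches a rule, keyed by the rule."""

    def __init__(self):
        super().__init__()
        self.found = {}    # rule -> list of values in document order
        self._open = None  # rule of the inline element currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            for attr in ("name", "property"):
                if attrs.get(attr) and attrs.get("content"):
                    rule = ("meta", attr, attrs[attr])
                    self.found.setdefault(rule, []).append(attrs["content"])
            return
        for css_class in (attrs.get("class") or "").split():
            rule = ("text", tag, css_class)
            if rule in TEXT_RULES:
                self._open = rule
                self.found.setdefault(rule, []).append("")
                break

    def handle_data(self, data):
        if self._open:
            self.found[self._open][-1] += data

    def handle_endtag(self, tag):
        # Simplification: nested elements with the same tag name are not tracked.
        if self._open and tag == self._open[1]:
            self._open = None


def extract(html):
    """Return one value per field, trying each rule in priority order."""
    parser = SchemaParser()
    parser.feed(html)
    parser.close()
    result = {}
    for field, rules in RULES.items():
        values = []
        for rule in rules:
            values = [v.strip() for v in parser.found.get(rule, []) if v.strip()]
            if values:
                break  # highest-priority rule with a value wins
        # Authors may repeat; every other field keeps its first match.
        result[field] = values if field == "authors" else (values[0] if values else None)
    return result


SAMPLE = """
<html>
  <head>
    <meta name="DC.title" content="Chemical Spills">
    <meta name="DC.author" content="Author 1">
    <meta name="DC.author" content="Author 2">
    <meta property="og:image" content="http://example.com/rock.jpg" />
  </head>
  <body>
    <h1 class="article">Ignored because DC.title is present</h1>
    <span class="author">Mark Twain</span>
    <span class="article_datetime">2020-01-27</span>
  </body>
</html>
"""

if __name__ == "__main__":
    for field, value in extract(SAMPLE).items():
        print(f"{field}: {value}")
```

In the sample page, the title and authors come from the head tags even though inline elements are also present, while the published date falls back to the inline `article_datetime` span because no `DC.date` tag exists.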