URL String, Extract Parts
Parses a URL/URI string and extracts its components: Protocol (Scheme), Host, Hostname, Path, Query, Fragment, etc. The extracted values are not URL-decoded; for components that need decoding, apply decodeURIComponent separately in a downstream process.
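As a rough illustration of that downstream decoding step, the sketch below splits an extracted Query string into key/value pairs and decodes each side with decodeURIComponent. The helper name decodeQuery and the sample query string are hypothetical and not part of the add-on; the sketch also assumes "&" separated "key=value" pairs.

```javascript
// Hypothetical helper (not part of the add-on): decodes a raw Query string
// such as the value stored by config B4.
function decodeQuery(strQuery) {
  const objParams = {};
  if (strQuery === "") {
    return objParams; // nothing to decode
  }
  for (const strPair of strQuery.split("&")) {
    const numEq  = strPair.indexOf("=");
    const strKey = numEq === -1 ? strPair : strPair.slice(0, numEq);
    const strVal = numEq === -1 ? ""      : strPair.slice(numEq + 1);
    // Percent-encoded bytes (e.g. "%E6%9D%B1") are decoded here
    objParams[decodeURIComponent(strKey)] = decodeURIComponent(strVal);
  }
  return objParams;
}

// Sample input is an assumption for illustration only
const objDecoded = decodeQuery("q=%E6%9D%B1%E4%BA%AC&lang=ja");
```

Note that decodeURIComponent throws a URIError on malformed percent sequences, so a real workflow may want to wrap the call in try/catch.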
Configs
  • A1: Set URL String (e.g. https://example.com/path/?query=a#ref ) *#{EL}
  • B1: Select STRING that stores Protocol like “https” (update)
  • B2: Select STRING that stores Host like “example.com” (update)
  • B3: Select STRING that stores Pathname like “/path” (update)
  • B4: Select STRING that stores Query like “query=a” (update)
  • B5: Select STRING that stores Anchor like “ref” (update)
  • C1: Select STRING for Top Directory in Path (update)
  • C2: Select STRING for Lowest Directory (WpSlug) in Path (update)
  • C3: Select STRING that stores FileName at end in Path (update)
  • C4: Select STRING that stores UserName in Host (update)
  • C5: Select STRING that stores Passwd in Host (update)
  • C6: Select STRING that stores Port in Host (update)
  • C7: Select STRING that stores Hostname in Host (update)
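To give a feel for what the B1 to B5 data items receive, the following standalone sketch applies an RFC 3986 Appendix B style regular expression with named groups (the same pattern the script builds) to the sample URL from A1. The variable names are illustrative, and the snippet runs outside the workflow platform, so it uses neither configs nor engine.

```javascript
// Standalone sketch; needs an ES2018+ engine for named capturing groups.
const regUri = /^((?<protocol>[^:\/?#]+):)?(\/\/(?<host>[^\/?#]*))?(?<pathname>[^?#]*)(\?(?<query>[^#]*))?(#(?<anchor>.*))?/;

const objParts = "https://example.com/path/?query=a#ref".match(regUri).groups;
// objParts.protocol => "https"   (the trailing ":" sits outside the named group)
// objParts.host     => "example.com"
// objParts.pathname => "/path/"
// objParts.query    => "query=a"
// objParts.anchor   => "ref"
```

Groups that do not occur in the input (for example query when there is no "?") come back as undefined, which is why the script stores null for them.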
Script
// GraalJS Script (engine type: 2)

//////// START "main()" /////////////////////////////////////////////////////////////////
main();
function main(){ 

//// == Config Retrieving ==
const strUri         = configs.get( "StrConfA1" ); // REQUIRED
  if( strUri       === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {A1: URL/URI} is empty \n" );
  }
const strPocketProtocol        = configs.getObject( "SelectConfB1" ); /// NotRequired
const strPocketHost            = configs.getObject( "SelectConfB2" ); /// NotRequired
const strPocketPathname        = configs.getObject( "SelectConfB3" ); /// NotRequired
const strPocketQuery           = configs.getObject( "SelectConfB4" ); /// NotRequired
const strPocketAnchor          = configs.getObject( "SelectConfB5" ); /// NotRequired
const strPocketTopDirectory    = configs.getObject( "SelectConfC1" ); /// NotRequired
const strPocketLowestDirectory = configs.getObject( "SelectConfC2" ); /// NotRequired
const strPocketFilename        = configs.getObject( "SelectConfC3" ); /// NotRequired
const strPocketHostUserName    = configs.getObject( "SelectConfC4" ); /// NotRequired
const strPocketHostUserPasswd  = configs.getObject( "SelectConfC5" ); /// NotRequired
const strPocketHostPort        = configs.getObject( "SelectConfC6" ); /// NotRequired
const strPocketHostHostname    = configs.getObject( "SelectConfC7" ); /// NotRequired


//// == Data Retrieving ==
// (Nothing. Retrieved via Expression Language in Config Retrieving)


//// == Calculating ==
// RFC 3986 appendix
// https://tools.ietf.org/html/rfc3986#appendix-B
// ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
// Without "g" flag, the first complete match and its related capturing groups are returned.
// const regUri = /^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Regular Expression - Capturing group
/*
let strReUri = "^(([^:/?#]+):)?";  // end with ":", eg. "https:", "http:",,
    strReUri += "(//([^/?#]*))?";  // start with "//", not include "/" "?" "#"
    strReUri += "([^?#]*)";        // not include "?" "#"
    strReUri += "(\\?([^#]*))?";   // start with "?", not include "#"
    strReUri += "(#(.*))?";        // start with "#"
*/
// Regular Expression - Named Capturing group - ES2018
let strReUri = "^((?<protocol>[^:/?#]+):)?"; // end with ":", eg. "https:", "http:",,
    strReUri += "(//(?<host>[^/?#]*))?";     // start with "//", not include "/" "?" "#"
    strReUri += "(?<pathname>[^?#]*)";       // not include "?" "#"
    strReUri += "(\\?(?<query>[^#]*))?";     // start with "?", not include "#"
    strReUri += "(#(?<anchor>.*))?";         // start with "#"

const regUri      = new RegExp( strReUri );
const arrUriParts = strUri.match( regUri );
  if( arrUriParts === null ){
    throw new Error( "\n AutomatedTask UnexpectedStringError:" +
                     " No matches are found for URI regular expression \n" );
  }
engine.log( " AutomatedTask: Protocol of Matched URI: " + arrUriParts.groups.protocol );

/// Drill-down Path
const strPath          = arrUriParts.groups.pathname; // eg. "/path1/path2/filename" "/filename" "/" ""
let strTopDirectory    = "";
let strLowestDirectory = "";
let strFilename        = "";
if( strPath !== undefined && strPath !== "" ){
  const arrPath        = strPath.split("/"); // "/a/b/file" => [ "", "a", "b", "file" ]
  if( arrPath.length > 2 ){                  // at least one directory exists
    strTopDirectory    = arrPath[ 1 ];
    strLowestDirectory = arrPath[ arrPath.length - 2 ];
  }
  strFilename          = arrPath[ arrPath.length - 1 ];
}

/// Drill-down Host
const strHost          = arrUriParts.groups.host; // eg. "example.com:8080" "usr:pss@example.com:81"
let strHostUserName    = "";
let strHostUserPasswd  = "";
let strHostPort        = "";
let strHostHostname    = "";
if( strHost !== undefined ){     // undefined when the URI has no "//" part
  const arrHost        = strHost.split("@");
  let strDomainPort    = arrHost[0];
  if( arrHost.length > 1 ){      // eg. "usr:pss@example.com:81"
    const arrUsrPss    = arrHost[0].split(":");
    strHostUserName    = arrUsrPss[0];
    if( arrUsrPss.length > 1 ){
      strHostUserPasswd = arrUsrPss[1];
    }
    strDomainPort      = arrHost[1];
  }
  const arrDomainPort  = strDomainPort.split(":"); // eg. "example.com:8080"
  strHostHostname      = arrDomainPort[0];
  if( arrDomainPort.length > 1 ){
    strHostPort        = arrDomainPort[1];
  }
}


//// == Data Updating ==
if( strPocketProtocol !== null ){
  if( arrUriParts.groups.protocol !== undefined ){
    engine.setData( strPocketProtocol,         arrUriParts.groups.protocol );
  }else{
    engine.setData( strPocketProtocol,         null  );
  }
}
if( strPocketHost !== null ){
  if( arrUriParts.groups.host !== undefined ){
    engine.setData( strPocketHost,             arrUriParts.groups.host  );
  }else{
    engine.setData( strPocketHost,             null  );
  }
}
if( strPocketPathname !== null ){
  if( arrUriParts.groups.pathname !== undefined ){
    engine.setData( strPocketPathname,         arrUriParts.groups.pathname  );
  }else{
    engine.setData( strPocketPathname,         null  );
  }
}
if( strPocketQuery !== null ){
  if( arrUriParts.groups.query !== undefined ){
    engine.setData( strPocketQuery,            arrUriParts.groups.query );
  }else{
    engine.setData( strPocketQuery,            null  );
  }
}
if( strPocketAnchor !== null ){
  if( arrUriParts.groups.anchor !== undefined ){
    engine.setData( strPocketAnchor,           arrUriParts.groups.anchor );
  }else{
    engine.setData( strPocketAnchor,           null  );
  }
}
if( strPocketTopDirectory !== null ){
  engine.setData( strPocketTopDirectory,       strTopDirectory );
}
if( strPocketLowestDirectory !== null ){
  engine.setData( strPocketLowestDirectory,    strLowestDirectory );
}
if( strPocketFilename !== null ){
  engine.setData( strPocketFilename,           strFilename );
}
if( strPocketHostUserName !== null ){
  engine.setData( strPocketHostUserName,       strHostUserName );
}
if( strPocketHostUserPasswd !== null ){
  engine.setData( strPocketHostUserPasswd,     strHostUserPasswd );
}
if( strPocketHostPort !== null ){
  engine.setData( strPocketHostPort,           strHostPort );
}
if( strPocketHostHostname !== null ){
  engine.setData( strPocketHostHostname,       strHostHostname );
}

} //////// END "main()" /////////////////////////////////////////////////////////////////



/*
Notes:
- Parses strings that start with a Protocol ("http:", "https:", etc.).
- The "Host" in this add-on includes not only the "Hostname" but also the Port number and, if present, the user information.
    - To get an accurate internet domain, use "Hostname" instead of "Host".
*/

/*
APPENDIX
- If the Path contains no directory, the directory names are updated with an empty string.
    - This is the case when there is only one "/" in the Path part.
- Parsing a URL in programming languages
    - https://developer.mozilla.org/en-US/docs/Web/API/URL
    - https://www.php.net/manual/en/function.parse-url.php
    - https://docs.oracle.com/javase/tutorial/networking/urls/urlInfo.html
    - https://nodejs.org/api/path.html
*/
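
The directory rule described in the appendix can be sketched as a small standalone function. The name splitPath and the sample paths are hypothetical; this is a minimal sketch of the rule, not the add-on's own code.

```javascript
// Hypothetical helper mirroring the directory rule; names are illustrative.
function splitPath(strPath) {
  const objResult = { top: "", lowest: "", filename: "" };
  if (strPath === "") {
    return objResult;                  // no Path part at all
  }
  const arrPath = strPath.split("/");  // "/a/b/c.html" => ["", "a", "b", "c.html"]
  if (arrPath.length > 2) {            // more than one "/" means a directory exists
    objResult.top    = arrPath[1];
    objResult.lowest = arrPath[arrPath.length - 2];
  }
  objResult.filename = arrPath[arrPath.length - 1];
  return objResult;
}

const objDeep = splitPath("/addons/url-string-extract-parts/index.html");
// => top "addons", lowest "url-string-extract-parts", filename "index.html"
const objFlat = splitPath("/index.html");
// => top "", lowest "", filename "index.html"
```

A Path that ends with "/" (such as "/path/") yields an empty filename, with "path" as both the top and the lowest directory.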

Download

2021-04-04 (C) Questetra, Inc. (MIT License)
https://support.questetra.com/addons/url-string-extract-parts/
The Addon-import feature is available with Professional or Enterprise edition.

Notes

  • Parses strings that start with a Protocol (“http:”, “https:”, etc.).
  • The “Host” in this add-on includes not only the “Hostname” but also the Port number.
    • To get an accurate internet domain, use “Hostname” instead of “Host”.
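
The difference between “Host” and “Hostname” can be illustrated with a short standalone sketch. The helper name splitHost and the sample value are assumptions for illustration; the sketch does not handle IPv6 literals such as "[::1]:8080".

```javascript
// Hypothetical helper: separates user info, hostname and port in a Host string.
function splitHost(strHost) {
  const objResult = { userName: "", passwd: "", hostname: "", port: "" };
  const arrHost = strHost.split("@");
  let strDomainPort = arrHost[0];
  if (arrHost.length > 1) {                  // eg. "usr:pss@example.com:81"
    const arrUsrPss = arrHost[0].split(":");
    objResult.userName = arrUsrPss[0];
    if (arrUsrPss.length > 1) {
      objResult.passwd = arrUsrPss[1];
    }
    strDomainPort = arrHost[1];
  }
  const arrDomainPort = strDomainPort.split(":"); // eg. "example.com:81"
  objResult.hostname = arrDomainPort[0];
  if (arrDomainPort.length > 1) {
    objResult.port = arrDomainPort[1];
  }
  return objResult;
}

const objHost = splitHost("usr:pss@example.com:81");
// => userName "usr", passwd "pss", hostname "example.com", port "81"
```

Only the hostname field corresponds to the internet domain; the full Host string also carries the credentials and the port.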

