
How do I find all YouTube video IDs in a string using a regex?

nasanasas 2020. 9. 4. 07:32


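I have a text field where users can write anything.

For example:

Lorem Ipsum is simply dummy text. http://www.youtube.com/watch?v=DUQi_R4SgWo of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

Now I would like to parse it and find all YouTube video URLs and their IDs.

Any idea how to do that?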


YouTube video URLs may be encountered in a variety of formats:

  • latest short format: http://youtu.be/NLqAF9hrVbY
  • iframe: http://www.youtube.com/embed/NLqAF9hrVbY
  • iframe (secure): https://www.youtube.com/embed/NLqAF9hrVbY
  • object param: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
  • object embed: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
  • watch: http://www.youtube.com/watch?v=NLqAF9hrVbY
  • users: http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo
  • ytscreeningroom: http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I
  • any/thing/goes!: http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4
  • any/subdomain/too: http://gdata.youtube.com/feeds/api/videos/NLqAF9hrVbY
  • more params: http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec
  • query may have dot: http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be
  • nocookie domain: http://www.youtube-nocookie.com

Here is a PHP function with a commented regex that matches each of these URL forms and converts them to links (if they are not links already):

// Linkify youtube URLs which are not already links.
function linkifyYouTubeURLs($text) {
    $text = preg_replace('~(?#!js YouTubeId Rev:20160125_1800)
        # Match non-linked youtube URL in the wild. (Rev:20130823)
        https?://          # Required scheme. Either http or https.
        (?:[0-9A-Z-]+\.)?  # Optional subdomain.
        (?:                # Group host alternatives.
          youtu\.be/       # Either youtu.be,
        | youtube          # or youtube.com or
          (?:-nocookie)?   # youtube-nocookie.com
          \.com            # followed by
          \S*?             # Allow anything up to VIDEO_ID,
          [^\w\s-]         # but char before ID is non-ID char.
        )                  # End host alternatives.
        ([\w-]{11})        # $1: VIDEO_ID is exactly 11 chars.
        (?=[^\w-]|$)       # Assert next char is non-ID or EOS.
        (?!                # Assert URL is not pre-linked.
          [?=&+%\w.-]*     # Allow URL (query) remainder.
          (?:              # Group pre-linked alternatives.
            [\'"][^<>]*>   # Either inside a start tag,
          | </a>           # or inside <a> element text contents.
          )                # End recognized pre-linked alts.
        )                  # End negative lookahead assertion.
        [?=&+%\w.-]*       # Consume any URL (query) remainder.
        ~ix', '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>',
        $text);
    return $text;
}


And here is a JavaScript version with the exact same regex (with comments removed):

// Linkify youtube URLs which are not already links.
function linkifyYouTubeURLs(text) {
    var re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
    return text.replace(re,
        '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>');
}

Notes:

  • The VIDEO_ID portion of the URL is captured in the one and only capture group: $1.
  • If you know that your text does not contain any pre-linked URLs, you can safely remove the negative lookahead assertion which tests for this condition (The assertion beginning with the comment: "Assert URL is not pre-linked.") This will speed up the regex somewhat.
  • The replace string can be modified to suit. The one provided above simply creates a link to the generic "http://www.youtube.com/watch?v=VIDEO_ID" style URL and sets the link text to: "YouTube link: VIDEO_ID".
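
Since the question asks for the URLs and IDs rather than for links, the same pattern can also be used purely for extraction. The following is a minimal sketch, not part of the original answer: extractYouTubeVideos is a hypothetical helper name, and the regex is the JavaScript one shown above, unchanged.

```javascript
// Sketch: collect every non-linked YouTube URL and its video ID from free text.
// extractYouTubeVideos is a hypothetical name; the pattern is copied from the
// linkifyYouTubeURLs() JavaScript version above.
function extractYouTubeVideos(text) {
    const re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
    // matchAll yields one match per URL: m[0] is the whole URL, m[1] the ID.
    return Array.from(text.matchAll(re), (m) => ({ url: m[0], id: m[1] }));
}

// Shortened version of the sample text from the question.
const sampleText = 'Intro http://www.youtube.com/watch?v=DUQi_R4SgWo . More text ' +
    'http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu and the end.';
const foundVideos = extractYouTubeVideos(sampleText);
console.log(foundVideos.map((v) => v.id)); // [ 'DUQi_R4SgWo', 'A_6gNZCkajU' ]
```

Each result keeps the full matched URL (including any query remainder) next to the ID, so both pieces the question asks for are available.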

Edit 2011-07-05: Added - hyphen to ID char class

Edit 2011-07-17: Fixed regex to consume any remaining part (e.g. query) of URL following YouTube ID. Added 'i' ignore-case modifier. Renamed function to camelCase. Improved pre-linked lookahead test.

Edit 2011-07-27: Added new "user" and "ytscreeningroom" formats of YouTube URLs.

Edit 2011-08-02: Simplified/generalized to handle new "any/thing/goes" YouTube URLs.

Edit 2011-08-25: Several modifications:

  • Added a JavaScript version of the linkifyYouTubeURLs() function.
  • Previous version had the scheme (HTTP protocol) part optional and thus would match invalid URLs. Made the scheme part required.
  • Previous version used the \b word boundary anchor around the VIDEO_ID. However, this will not work if the VIDEO_ID begins or ends with a - dash. Fixed so that it handles this condition.
  • Changed the VIDEO_ID expression so that it must be exactly 11 characters long.
  • The previous version failed to exclude pre-linked URLs if they had a query string following the VIDEO_ID. Improved the negative lookahead assertion to fix this.
  • Added + and % to character class matching query string.
  • Changed PHP version regex delimiter from: % to a: ~.
  • Added a "Notes" section with some handy notes.
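
The \b issue mentioned in this list can be reproduced in isolation. The sketch below uses two deliberately simplified patterns that anchor on "v=" (they are not the full regex from this answer), and both sample IDs are hypothetical. It shows that \b fails when the ID starts or ends with a dash, while an explicit lookahead does not.

```javascript
// Simplified patterns for illustration only; both sample IDs are hypothetical.
const withWordBoundary = /v=\b([\w-]{11})\b/;
const withLookahead = /v=([\w-]{11})(?=[^\w-]|$)/;

const leadingDashUrl = 'http://www.youtube.com/watch?v=-abcdefghij';
const trailingDashUrl = 'http://www.youtube.com/watch?v=abcdefghij-';

// Return the captured ID, or null when the pattern does not match.
function idOf(re, url) {
    const m = url.match(re);
    return m ? m[1] : null;
}

// "-" is not a word character, so there is no word boundary between "=" and
// "-", nor between a trailing "-" and the end of the string.
console.log(idOf(withWordBoundary, leadingDashUrl));  // null
console.log(idOf(withWordBoundary, trailingDashUrl)); // null
console.log(idOf(withLookahead, leadingDashUrl));     // -abcdefghij
console.log(idOf(withLookahead, trailingDashUrl));    // abcdefghij-
```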

Edit 2011-10-12: YouTube URL host part may now have any subdomain (not just www.).

Edit 2012-05-01: The consume URL section may now allow for '-'.

Edit 2013-08-23: Added additional format provided by @Mei. (The query part may have a . dot.)

Edit 2013-11-30: Added additional format provided by @CRONUS: youtube-nocookie.com.

Edit 2016-01-25: Fixed regex to handle error case provided by CRONUS.


Here's a method I once wrote for a project that extracts YouTube and Vimeo video keys:

/**
 *  Strip the important information out of any video link.
 *
 *  @param  string  $vid_link  link to a video on the hoster's page
 *  @return mixed   FALSE on failure, array on success
 */
function getHostInfo ($vid_link)
{
  // YouTube: get the video id.
  // Compare with FALSE explicitly, because strpos() returns 0 for a match at offset 0.
  if (strpos($vid_link, 'youtu') !== FALSE)
  {
    // Regular links: everything after "v=" that looks like an id.
    // The hyphen goes last in the class; [\w\d-_] is an invalid range in PCRE2.
    if (preg_match('/(?<=v=)[\w-]+/', $vid_link, $matches))
      return array('host_name' => 'youtube', 'original_key' => $matches[0]);
    // Ajax hash tag links: the id is the last path segment.
    // "~" replaces the multi-byte "§" delimiter, which breaks in UTF-8 source files.
    else if (preg_match('~[\w-]+$~', $vid_link, $matches))
      return array('host_name' => 'youtube', 'original_key' => $matches[0]);
    else
      return FALSE;
  }
  // Vimeo: get the video id (first run of digits after a slash).
  elseif (strpos($vid_link, 'vimeo') !== FALSE)
  {
    if (preg_match('~(?<=/)\d+~', $vid_link, $matches))
      return array('host_name' => 'vimeo', 'original_key' => $matches[0]);
    else
      return FALSE;
  }
  else
    return FALSE;
}
  1. Find a regex that will extract all links from a text. Google will help you there.
  2. Loop over all the links and call getHostInfo() for each one.
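
The two steps can be sketched as follows in JavaScript. This is only an illustration under stated assumptions: URL_RE is a deliberately simple link pattern rather than a full URL grammar, the JavaScript getHostInfo() is a loose, hypothetical port of the PHP function (it uses capture groups instead of lookbehind), findVideos is a made-up name, and the Vimeo ID in the sample text is invented.

```javascript
// Step 1: a deliberately simple link pattern (an assumption, not a full URL grammar).
const URL_RE = /https?:\/\/[^\s"'<>]+/gi;

// Step 2: loose, hypothetical JavaScript port of the PHP getHostInfo().
function getHostInfo(link) {
    if (link.includes('youtu')) {
        // Prefer the v= parameter, else fall back to the last ID-like segment.
        const m = link.match(/[?&]v=([\w-]+)/) || link.match(/([\w-]+)$/);
        return m ? { host_name: 'youtube', original_key: m[1] } : null;
    }
    if (link.includes('vimeo')) {
        // First run of digits that directly follows a slash.
        const m = link.match(/\/(\d+)/);
        return m ? { host_name: 'vimeo', original_key: m[1] } : null;
    }
    return null;
}

// Extract every link, classify each one, and drop links from unknown hosts.
function findVideos(text) {
    return (text.match(URL_RE) || []).map(getHostInfo).filter(Boolean);
}

const mixedText = 'See http://www.youtube.com/watch?v=NLqAF9hrVbY and ' +
    'http://vimeo.com/12345678 or http://example.com/page for details.';
const hostedVideos = findVideos(mixedText);
console.log(hostedVideos);
```

Splitting link extraction from host detection keeps each regex small, at the cost of relying on the link pattern to cut URLs at the right place.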

While ridgerunner's answer is the basis for my answer, his does NOT solve for all urls and I don't believe it is capable of it, due to multiple possible matches of VIDEO_ID in a YouTube URL. My regex includes his aggressive approach as a last resort, but attempts all common matchings first, vastly reducing the possibility of a wrong match later in the URL.

This regex:

/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com(?:\/embed\/|\/v\/|\/watch\?v=|\/ytscreeningroom\?v=|\/feeds\/api\/videos\/|\/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[?=&+%\w-]*/ig;

handles all of the cases originally referenced in ridgerunner's examples, plus any URL that happens to contain another 11-character sequence later on, e.g.:

http://www.youtube.com/watch?v=GUEZCxBcM78&feature=pyv&feature=pyv&ad=10059374899&kw=%2Bwingsuit

Here is a working sample that tests all of the sample YouTube urls:

http://jsfiddle.net/DJSwc/5/
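
The risk of picking up a later 11-character run can be made concrete with the example URL above. In this sketch, greedyCatchAll is a simplified pattern written for illustration with a greedy \S* (an assumption, not the exact regex of any answer here), while orderedAlternatives is the regex from this answer without the g flag. Both variable names are hypothetical.

```javascript
const exampleUrl = 'http://www.youtube.com/watch?v=GUEZCxBcM78&feature=pyv&feature=pyv&ad=10059374899&kw=%2Bwingsuit';

// Simplified catch-all for illustration: a greedy \S* backtracks from the end
// of the URL, so the LAST 11-character run that fits wins.
const greedyCatchAll = /youtube\.com\S*[^\w\s-]([\w-]{11})(?=[^\w-]|$)/i;

// The regex from this answer: known URL shapes are tried first, and the
// catch-all alternative is only reached when none of them applies.
const orderedAlternatives = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com(?:\/embed\/|\/v\/|\/watch\?v=|\/ytscreeningroom\?v=|\/feeds\/api\/videos\/|\/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[?=&+%\w-]*/i;

const greedyId = exampleUrl.match(greedyCatchAll)[1];
const orderedId = exampleUrl.match(orderedAlternatives)[1];
console.log(greedyId);  // 10059374899 (the value of the "ad" parameter)
console.log(orderedId); // GUEZCxBcM78 (the actual video ID)
```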


Try

[^\s]*youtube\.com[^\s]*?v=([-\w]+)[^\s]*

You will find the video IDs in the first capturing group. What I don't know is which characters are valid in a video ID. At the moment I look for v= and capture any run of -, A-Z, a-z, 0-9 and _.

I checked it online here on rubular with your sample string.


Use:

<?php

    // The YouTube URL string

    $youtube_url='http://www.youtube.com/watch?v=8VtUYvwktFQ';

    // Use regex to get the video ID

    $regex='#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=[0-9]/)[^&\n]+|(?<=v=)[^&\n]+#';

    preg_match($regex, $youtube_url, $id);

    // $id[0] now holds the video ID (8VtUYvwktFQ here). Plug that into our HTML
?>

Okay, I made a function of my own. But I believe it's pretty inefficient. Any improvements are welcome:

function get_youtube_videos($string) {

    $ids = array();

    // Find all URLs
    preg_match_all('/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $string, $links);

    foreach ($links[0] as $link) {
        if (preg_match('~youtube\.com~', $link)) {
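            // Capture only the v= parameter, so trailing parameters such as
            // &feature=... do not end up in the ID.
            if (preg_match('/[?&]v=([\w-]+)/', $link, $id)) {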
                $ids[] = $id[1];
            }
        }
    }
    return $ids;
}

I tried a simple expression to get only the videoid:

[?&]v=([^&#]*)

Check it working online here at phpliveregex.
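
For reference, here is how that expression behaves on a few of the URL shapes listed earlier in this thread. It is a minimal JavaScript sketch (simpleVideoId is a hypothetical helper name) showing that the pattern covers watch-style URLs with a v= parameter but not the youtu.be short form. Because the class [^&#] also accepts whitespace, the pattern is best applied to one URL at a time rather than to free text.

```javascript
// The simple expression from above, applied to a single URL at a time.
const SIMPLE_RE = /[?&]v=([^&#]*)/;

// simpleVideoId is a hypothetical helper: returns the ID, or null on no match.
function simpleVideoId(url) {
    const m = url.match(SIMPLE_RE);
    return m ? m[1] : null;
}

console.log(simpleVideoId('http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be')); // spDj54kf-vY
console.log(simpleVideoId('http://www.youtube.com/watch?v=0zM4nApSvMg#t=0m10s'));          // 0zM4nApSvMg
console.log(simpleVideoId('http://youtu.be/NLqAF9hrVbY'));                                 // null
```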


The original poster asked "I would like to parse it and find all YouTube video URLs and their ids." I switched the most popular answer above to a preg_match and returned the video id and URL.

Get YouTube URL and ID from post:

$match[0] = Full URL
$match[1] = video ID

function get_youtube_id($input) {
    // \S*? is lazy, as in the answer above, so the first valid ID in the URL wins.
    preg_match('~https?://(?:[0-9A-Z-]+\.)?(?:youtu\.be/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:[\'"][^<>]*>|</a>))[?=&+%\w.-]*~ix',
               $input, $match);
    return $match;
}

Find a YouTube link easily from a string:

function my_url_search($se_action_data)
{
    $regex = '/https?\:\/\/[^\" ]+/i';
    preg_match_all($regex, $se_action_data, $matches);
    $get_url=array_reverse($matches[0]);
    return array_unique($get_url);
}
// The function returns an array, so print it with print_r() rather than echo.
print_r(my_url_search($se_action_data));

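import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YouTubeIdDemo {
    public static void main(String[] args) {
        String urlid = "";
        String url = "http://www.youtube.com/watch?v=0zM4nApSvMg#t=0m10s";
        // Fixed: the dot in "www." is escaped, and "\\S*" is a regex \S* (not a literal backslash).
        Pattern pattern = Pattern.compile("(?:http|https|)(?::\\/\\/|)(?:www\\.|)(?:youtu\\.be\\/|youtube\\.com(?:\\/embed\\/|\\/v\\/|\\/watch\\?v=|\\/ytscreeningroom\\?v=|\\/feeds\\/api\\/videos\\/|\\/user\\S*[^\\w\\-\\s]|\\S*[^\\w\\-\\s]))([\\w\\-]{11})[a-z0-9;:@#?&%=+\\/\\$_.-]*");
        Matcher result = pattern.matcher(url);
        if (result.find()) {
            urlid = result.group(1); // group 1 is the 11-character video ID
        }
        System.out.println(urlid);
    }
}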

This Java code works for all the YouTube URL formats I have tried so far.

Reference URL: https://stackoverflow.com/questions/5830387/how-do-i-find-all-youtube-video-ids-in-a-string-using-a-regex
