代码之家  ›  专栏  ›  技术社区  ›  Andre Pena

基于关联计算表中行之间相似度的最佳方法是什么?

  •  1
  • Andre Pena  · 技术社区  · 15 年前

    假设每个人都有自己喜欢的书。

    所以我有一张桌子:

    • 人与书的关系(MxN的联合表)

    我想根据最喜欢的书的重叠情况,找出与一个人相似的人。也就是说:他们的共同点越多,他们的相似之处就越多。

    你们专家会用什么解决方案?

    3 回复  |  直到 15 年前
        1
  •  2
  •   mdma    15 年前

    这可能不是最有效的,但相对简单:

    WITH SimlarBookPrefs(person_id, similar_person_id, booksInCommon) AS
    (
     Select p1.person_id, p2.person_id AS simlar_person_id,   
     /* Find the number of books p1 and p2 have in common */
       (SELECT COUNT(*) FROM PersonBook pb1, PersonBook pb2 
         JOIN pb1=book_id=pb2.book_id
       WHERE pb1.person_id=p1.person_id AND pb2.person_id=p2.person_id) As BooksInCommon
       FROM Person p1 CROSS JOIN Person p2
    )
    

    这将给你每个人,其他人的名单和共同的号码簿。

    要获得最相似的人,请添加(在同一查询中)

    SELECT TOP 1 similar_person_id FROM SimilarBookPrefs 
       WHERE person_id = <person_to_match>
       ORDER By booksInCommon DESC;
    

    第一部分不必是CTE(即带有…),它可以是视图,甚至是派生表。为了简洁起见,这里有一个CTE。

        2
  •  1
  •   Anthony Pegram    15 年前

    如果我在C#中这样做,我可能会这样处理它

    var query = from personBook in personBooks
                where personBook.PersonId != basePersonId // ID of person to match
                join bookbase in personBooks
                on personBook.BookId equals bookbase.BookId
                where bookbase.PersonId == basePersonId // ID of person to match
                join person in persons 
                on personBook.PersonId equals person.Id 
                group person by person into bookgroup
                select new
                {
                    Person = bookgroup.Key, 
                    BooksInCommon = bookgroup.Count()
                };
    

    这可以通过实体框架或linqtosql来完成,也可以直接转换成SQL。

    class CommonBooks
    {
        static void Main()
        {
            List<Person> persons = new List<Person>()
            {
                new Person(1, "Jane"), new Person(2, "Joan"), new Person(3, "Jim"), new Person(4, "John"), new Person(5, "Jill")
            };
    
            List<Book> books = new List<Book>()
            {
                new Book(1), new Book(2), new Book(3), new Book(4), new Book(5)
            };
    
            List<PersonBook> personBooks = new List<PersonBook>()
            {
                new PersonBook(1,1), new PersonBook(1,2), new PersonBook(1,3), new PersonBook(1,4), new PersonBook(1,5), 
                new PersonBook(2,2), new PersonBook(2,3), new PersonBook(2,5), 
                new PersonBook(3,2), new PersonBook(3,4), new PersonBook(3,5), 
                new PersonBook(4,1), new PersonBook(4,4),
                new PersonBook(5,1), new PersonBook(5,3), new PersonBook(5,5)
            };
    
            int basePersonId = 4; // person to match likeness
    
            var query = from personBook in personBooks
                        where personBook.PersonId != basePersonId
                        join bookbase in personBooks
                        on personBook.BookId equals bookbase.BookId
                        where bookbase.PersonId == basePersonId
                        join person in persons
                        on personBook.PersonId equals person.Id
                        group person by person into bookgroup
                        select new
                        {
                            Person = bookgroup.Key,
                            BooksInCommon = bookgroup.Count()
                        };
    
            foreach (var item in query)
            {
                Console.WriteLine("{0}\t{1}", item.Person.Name, item.BooksInCommon);
            }
    
            Console.Read();
        }
    }
    
    class Person
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public Person(int id, string name) { Id = id; Name = name; }
    }
    
    class Book
    {
        public int Id { get; set; }
        public Book(int id) { Id = id; }
    }
    
    class PersonBook
    {
        public int PersonId { get; set; }
        public int BookId { get; set; }
        public PersonBook(int personId, int bookId) { PersonId = personId; BookId = bookId; }
    }
    
        3
  •  0
  •   earl    15 年前

    您描述的问题通常被称为“协作过滤”,并使用“推荐系统”解决。在谷歌上搜索这两个词中的任何一个都会让你获得大量有用的信息。

    推荐文章