
How can I make this code thread-safe?

  •  1
  • user697911  · 7 years ago

    Edit: more complete code

    static class Similarity {
            double similarity;
            String seedWord;
            String candidateWord;
    
            public Similarity(double similarity, String seedWord, String candidateWord) {
                this.similarity = similarity;
                this.seedWord = seedWord;
                this.candidateWord = candidateWord;
            }
    
            public double getSimilarity() {
                return similarity;
            }
    
            public String getSeedWord() {
                return seedWord;
            }
    
            public String getCandidateWord() {
                return candidateWord;
            }
        }
    
        static class SimilarityTask implements Callable<Similarity> {
            Word2Vec vectors;
            String seedWord;
            String candidateWord;
            Collection<String> label1;
            Collection<String> label2;
    
            public SimilarityTask(Word2Vec vectors, String seedWord, String candidateWord, Collection<String> label1, Collection<String> label2) {
                this.vectors = vectors;
                this.seedWord = seedWord;
                this.candidateWord = candidateWord;
                this.label1 = label1;
                this.label2 = label2;
            }
    
            @Override
            public Similarity call() {
                double similarity = cosineSimForSentence(vectors, label1, label2);
                return new Similarity(similarity, seedWord, candidateWord);
            }
        }
    

    Now, is this compute method thread-safe? Three variables are involved:

    1) vectors;
    2) tokenizerFactory;
    3) similarities;
    
    public static void compute() throws Exception {
    
            File modelFile = new File("sim.bin");
            Word2Vec vectors = WordVectorSerializer.readWord2VecModel(modelFile);
    
            TokenizerFactory tokenizerFactory = new TokenizerFactory();
    
            List<String> seedList = loadSeeds();
            List<String> candidateList = loadCandidates();
    
            log.info("Computing similarity: ");
    
            ExecutorService POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            List<Future<Similarity>> tasks = new ArrayList<>();
            int totalCount=0;
            for (String seed : seedList) {
                Collection<String> label1 = getTokens(seed.trim(), tokenizerFactory);
                if (label1.isEmpty()) {
                    continue;
                }
                for (String candidate : candidateList) {
                    Collection<String> label2 = getTokens(candidate.trim(), tokenizerFactory);
                    if (label2.isEmpty()) {
                        continue;
                    }
                    Callable<Similarity> callable = new SimilarityTask(vectors, seed, candidate, label1, label2);
                    tasks.add(POOL.submit(callable));
                    log.info("TotalCount:" + (++totalCount));
                }
            }
    
            Map<String, Set<String>> similarities = new HashMap<>();
            int validCount = 0;
            for (Future<Similarity> task : tasks) {
                Similarity simi = task.get();
                Double similarity = simi.getSimilarity();
                String seedWord = simi.getSeedWord();
                String candidateWord = simi.getCandidateWord();
    
                Set<String> similarityWords = similarities.get(seedWord);
                if (similarity >= 0.85) {
                    if (similarityWords == null) {
                        similarityWords = new HashSet<>();
                    }
                    similarityWords.add(candidateWord);
                    log.info(seedWord + " " + similarity + " " + candidateWord);
                    log.info("ValidCount: "  + (++validCount));
                }
    
                if (similarityWords != null) {
                    similarities.put(seedWord, similarityWords);
                }
            }
            POOL.shutdown(); // all results are collected; let the worker threads exit
    }
    

    Added another related method, which is used by call():

    public static double cosineSimForSentence(Word2Vec vectors, Collection<String> label1, Collection<String> label2) {
            try {
                return Transforms.cosineSim(vectors.getWordVectorsMean(label1), vectors.getWordVectorsMean(label2));
            } catch (Exception e) {
                log.warn("OOV: " + label1.toString() + " " + label2.toString());
                //e.getMessage();
                //e.printStackTrace();
                return 0.0;
            }
        }
    
    1 answer  |  up to 7 years ago
  •  0
  •   Tom Hawtin - tackline    7 years ago

    (Answer updated for the changed question.)

    In general, you should profile code before trying to optimise it, particularly when the code is quite complex.

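    I am guessing that nothing you pass to your tasks is modified, though it is hard to tell. Marking the fields final would at least make it clear that the references never change.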
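
    As a rough sketch of what an unmodifiable task result could look like, the class below declares every field final and assigns it once in the constructor. The names ImmutableResultSketch, fakeSimilarity and computeOne are hypothetical and exist only for this illustration; fakeSimilarity stands in for the real Word2Vec call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ImmutableResultSketch {

    // Every field is final and assigned exactly once, so an instance can be
    // read from any thread after construction without extra locking.
    static final class Similarity {
        private final double similarity;
        private final String seedWord;
        private final String candidateWord;

        Similarity(double similarity, String seedWord, String candidateWord) {
            this.similarity = similarity;
            this.seedWord = seedWord;
            this.candidateWord = candidateWord;
        }

        double getSimilarity() { return similarity; }
        String getSeedWord() { return seedWord; }
        String getCandidateWord() { return candidateWord; }
    }

    // Hypothetical stand-in for the real similarity computation.
    static double fakeSimilarity(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }

    // Runs one task on a worker thread and hands the immutable result back.
    static Similarity computeOne(String seed, String candidate) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Similarity> task =
                    () -> new Similarity(fakeSimilarity(seed, candidate), seed, candidate);
            return pool.submit(task).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Similarity s = computeOne("cat", "cat");
        System.out.println(s.getSeedWord() + " " + s.getSimilarity() + " " + s.getCandidateWord());
    }
}
```

    Note that final only fixes the reference: a Collection held in a final field can still be mutated by whoever else holds it, so the label collections would also need to be left alone after they are handed to a task.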

    Assuming you do not break out of the inner loop, the only shared mutable state appears to be similarities and the values it contains.

    You may or may not find that you are still doing too much serially and need to make similarities concurrent:

        ConcurrentMap<String, Set<String>> similarities = new ConcurrentHashMap<>();
    

    The get and put calls on similarities will need to be thread-safe. I suggest always creating the Set.

            Set<String> similarityWords = similarities.getOrDefault(seed, new HashSet<>());

    or, so that a newly created set is also stored in the map atomically:
    

            Set<String> similarityWords = similarities.computeIfAbsent(seed, key -> new HashSet<>());
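
    The two calls differ in one respect that matters here: getOrDefault only returns the fallback set, while computeIfAbsent also stores it under the key. A minimal sketch of that difference follows; the class and method names are made up for the illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ComputeIfAbsentSketch {

    // getOrDefault hands back the fallback set but never puts it in the map,
    // so anything added to it is invisible to other readers of the map.
    static boolean storedByGetOrDefault(String key) {
        ConcurrentMap<String, Set<String>> map = new ConcurrentHashMap<>();
        Set<String> words = map.getOrDefault(key, new HashSet<>());
        words.add("candidate");
        return map.containsKey(key);
    }

    // computeIfAbsent creates the set and stores it under the key in one
    // atomic step on a ConcurrentHashMap, then returns the stored set.
    static boolean storedByComputeIfAbsent(String key) {
        ConcurrentMap<String, Set<String>> map = new ConcurrentHashMap<>();
        Set<String> words = map.computeIfAbsent(key, k -> new HashSet<>());
        words.add("candidate");
        return map.containsKey(key) && map.get(key).contains("candidate");
    }

    public static void main(String[] args) {
        System.out.println("getOrDefault stored the set:    " + storedByGetOrDefault("seed"));
        System.out.println("computeIfAbsent stored the set: " + storedByComputeIfAbsent("seed"));
    }
}
```

    With getOrDefault a separate put would still be needed, and the get-then-put pair would not be atomic.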
    

    You could use a thread-safe Set (for example Collections.synchronizedSet), but I suggest holding the relevant lock for the whole inner loop.

    synchronized (similarityWords) {
        ...
    }
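
    One way the pieces could fit together is sketched below, under the assumption that several workers add words for the same seed: each worker obtains the set with computeIfAbsent and then holds the set's own monitor for its entire loop. LockedSetSketch, fill and the generated word names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LockedSetSketch {

    // Starts `workers` threads that each add `perWorker` distinct words to
    // the same plain HashSet, and returns the final size of that set.
    static int fill(int workers, int perWorker) throws InterruptedException {
        ConcurrentMap<String, Set<String>> similarities = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int id = w;
            pool.execute(() -> {
                Set<String> similarityWords =
                        similarities.computeIfAbsent("seed", k -> new HashSet<>());
                // One lock acquisition for the whole loop instead of one per add.
                synchronized (similarityWords) {
                    for (int i = 0; i < perWorker; i++) {
                        similarityWords.add("w" + id + "-" + i);
                    }
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        Set<String> result = similarities.get("seed");
        synchronized (result) {
            return result.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(fill(4, 1000));
    }
}
```

    Holding the lock across the loop serialises the workers that share a seed, so this only pays off when the work inside the lock is cheap compared with the work done outside it.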
    

    If you want to create similarityWords